Posts tagged reinforcement learning

Working memory hinders learning in schizophrenia
A new study pinpoints working memory as a source of learning difficulties in people with schizophrenia.
Working memory is known to be affected in the millions of people — about 1 percent of the population — who have schizophrenia, but it has been unclear whether that impairment plays a specific role in making learning more difficult, said Anne Collins, a postdoctoral researcher at Brown University and lead author of the study.
“We really tend to think of learning as a unitary, single process, but really it is not,” said Collins, who in 2012 along with co-author Michael Frank, associate professor of cognitive, linguistic, and psychological sciences, developed an experimental task and a computational model of cognition that can distinguish the contributions of working memory and reinforcement in the learning process. “We thought we could try to disentangle that here and see if the impairment was in both aspects, or only one of them.”
In the new study in the Journal of Neuroscience, Collins and Frank applied these methods in collaboration with schizophrenia experts James Waltz and James Gold of the University of Maryland to measure the respective contributions of working memory and reinforcement learning. They found that only working memory was a source of impairment.
Learning about learning’s components
To find that out, they marshaled 49 volunteers with schizophrenia and an otherwise comparable set of 36 people without the condition to participate in the specially designed learning task. In each round, participants were shown a set of images and then were asked to push one of three buttons when they saw each image. With each button push they were told whether they had hit the correct button for that image. Over time, through trial and error, participants could learn which picture called for which button. With perfect memory, one wouldn’t need to see an image more than three times to learn the right button to push when it appeared.
The task explicitly involves employing the brain’s systems for working memory (keeping each image–button association in mind) and for reinforcement learning (wanting to repeat an action that led to the feedback of “correct” and to avoid one that produced “incorrect”). But in different rounds while the degree of reinforcement remained the same, the experimenters varied the number of images in the sets the volunteers saw, from two to six. What varied, therefore, was the degree to which working memory was taxed.
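The interplay the task is built to expose can be sketched computationally. The toy model below is written in the spirit of the authors' 2012 working-memory/reinforcement-learning mixture: a fast but capacity-limited working-memory module is blended with slow, incremental Q-learning, and the working-memory weight shrinks once the set size exceeds capacity. All parameter names and values here are illustrative assumptions, not the fitted model from the paper.

```python
import numpy as np

def simulate_rlwm(set_size, n_trials=150, K=3, rho=0.9, alpha=0.1,
                  beta=8.0, n_actions=3, seed=0):
    """Simulate one block of the image-button learning task with a
    capacity-limited working-memory (WM) module mixed with slow
    Q-learning. Parameter values are illustrative, not fitted."""
    rng = np.random.default_rng(seed)
    correct = rng.integers(n_actions, size=set_size)      # hidden mapping
    Q = np.full((set_size, n_actions), 1.0 / n_actions)   # slow RL values
    WM = np.full((set_size, n_actions), 1.0 / n_actions)  # one-shot WM store
    w = rho * min(1.0, K / set_size)  # WM weight shrinks with set size
    n_correct = 0
    for _ in range(n_trials):
        s = rng.integers(set_size)
        # softmax policy from each module, mixed by the WM weight
        p_rl = np.exp(beta * Q[s]); p_rl /= p_rl.sum()
        p_wm = np.exp(beta * WM[s]); p_wm /= p_wm.sum()
        p = w * p_wm + (1 - w) * p_rl
        a = rng.choice(n_actions, p=p)
        r = 1.0 if a == correct[s] else 0.0
        n_correct += r
        Q[s, a] += alpha * (r - Q[s, a])  # slow incremental RL update
        WM[s] = 1.0 / n_actions           # WM: overwrite in one shot
        WM[s, a] = r
    return n_correct / n_trials
```

Simulating blocks of increasing set size reproduces the qualitative pattern the study reports: accuracy climbs more slowly when more image–button pairs must be tracked, because the capacity-limited working-memory module carries less of the load.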
What the researchers found was that for both people with schizophrenia and for controls, the larger the image set size, the more trials it took to learn to press the correct button consistently for each image and the longer it took to react to each stimulus. People with schizophrenia generally performed worse on the task than healthy controls.
Those results show that the task became harder as it involved more images – a matter of working memory, since the capacity to maintain information explicitly in memory is limited – but that alone did not prove that working memory was a source of learning problems for people with schizophrenia. They could also have been doing worse because of slower reinforcement learning.
To determine that, the researchers used their computational models of how learning occurs in the brain to fit the experimental data. They asked what parameters in the models needed to vary to accurately predict the behavior they measured in people with and without schizophrenia.
That analysis revealed that varying parameters of working memory, such as capacity, but not parameters of reinforcement learning, accounted best for differences in behavior between the groups.
“With model-fitting techniques, I can look quantitatively, trial by trial, and see how well the model predicts subjects’ choices,” Collins said. “The same model explains both the healthy group and the patient group, but with differences in parameters.”
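The fitting logic Collins describes, asking which parameter settings best predict trial-by-trial choices, amounts to maximizing the likelihood of each subject's choice sequence under the model. The self-contained toy below fits only a single softmax inverse-temperature parameter by grid search over fabricated data; the data, parameter range, and function names are all made up for illustration, not the authors' fitting pipeline.

```python
import numpy as np

def choice_nll(beta, values, choices):
    """Negative log-likelihood of observed choices under a softmax
    policy. `values` is (n_trials, n_actions); `choices` holds the
    index of the action taken on each trial."""
    logits = beta * values
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(choices)), choices].sum()

# Toy fit: fabricated value estimates and choices, then a grid
# search over the inverse temperature to minimize the NLL.
values = np.array([[0.8, 0.1, 0.1]] * 50)
rng = np.random.default_rng(0)
choices = rng.choice(3, size=50, p=[0.7, 0.15, 0.15])
betas = np.linspace(0.1, 10, 100)
best = betas[np.argmin([choice_nll(b, values, choices) for b in betas])]
```

Fitting the same model separately to each group, then asking which parameters (working-memory capacity versus reinforcement-learning rate) must differ to explain the data, is the essence of the comparison the researchers ran.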
That confirmed that working memory uniquely affected learning in people with schizophrenia, while reinforcement learning mechanisms did not, Collins said.
The study suggests that working memory could be a more important target than reinforcement learning among researchers and clinicians hoping to help improve learning for people with schizophrenia, Collins said.
Among mentally healthy people as well, the study illustrates that the different components of learning can be understood individually, even as they all interact in the brain to make learning happen.
“More broadly, it brings attention to the fact that we need to consider learning as a multiactor kind of behavior that can’t be just summarized by a single system,” Collins said. “It’s important to design tasks that can separate them out so we can extract different sources of variance and correctly match them to different neural systems.”
Choice bias: A quirky byproduct of learning from reward
The price of learning from rewarding choices may be just a touch of self-delusion, according to a new study in Neuron.
The research by Brown University brain scientists links a fundamental problem in neuroscience called “credit assignment” — how the brain reinforces learning only in the exact circuits that caused a rewarding choice — to an oft-observed quirk of behavior called “choice bias” — the tendency to value rewards we choose more than equivalent rewards we don’t choose. The researchers used computational modeling and behavioral and genetic experiments to find evidence that choice bias is essentially a byproduct of credit assignment.
“We weren’t looking to explain anything about choice bias to start off with,” said lead author Jeffrey Cockburn, a graduate student in the research group of senior author Michael Frank, associate professor of cognitive, linguistic, and psychological sciences. “This just happened to be the behavioral phenomenon we thought would emerge out of this credit assignment model.”
So the next time a friend raves about the movie he chose and is less enthusiastic about the just-as-good one that you chose, you might be able to chalk it up to his basic learning circuitry and a genetic difference that affects it.
Modeled mechanism
The model, developed by Frank, Cockburn, and co-author Anne Collins, a postdoctoral researcher, was based on prior research on the function of the striatum, a part of the brain’s basal ganglia (BG) that is principally involved in representing the reward values of actions and picking one. An interaction between three key BG regions moderates that decision-making process. When a rewarding choice has been made, the substantia nigra pars compacta (SNc) releases dopamine into the striatum to reinforce connections between cortex and striatum, so that rewarded actions are more likely to be repeated. But how does the SNc reinforce just the circuits that made the right call? The authors proposed a mechanism by which another part of the substantia nigra, the SNr, detects when actions are worth choosing and then simultaneously amplifies any dopamine signal coming from the SNc.
“The novel part here is that we have proposed a mechanism by which the BG can detect when it has selected an action and should therefore amplify the dopamine reinforcing event specifically at that time,” Frank said. “When the SNr decides that striatal valuation signals are strong enough for one action, it releases the brakes not only on downstream structures that allow actions to be executed, but also on the SNc dopamine system, so any unexpected rewards are amplified.”
Specifically, dopamine provides reinforcement by enhancing the responsiveness of connections between cells so that a circuit can more easily repeat its rewarding behavior in the future. But along with that process of reinforcing the action of choosing, the value placed on the resulting reward becomes elevated compared to rewards not experienced this way.
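The proposed amplification can be caricatured in a few lines of code: if the update for a chosen, rewarded option is multiplied by a gain, while an equally rewarding but passively observed option gets the plain update, the chosen option's learned value ends up higher. This is a hedged sketch of the idea, not the authors' neural model; the `gain` and `alpha` values and the gating on positive prediction errors are illustrative assumptions.

```python
import numpy as np

def learn_values(reward_prob=0.8, n_trials=200, alpha=0.1, gain=2.0, seed=0):
    """Toy credit-assignment sketch: the value of a *chosen* option is
    updated with an amplified dopamine signal (alpha * gain) on rewarded
    trials, while an equally rewarding *observed* option gets the plain
    update. With gain > 1 the chosen option's learned value ends up
    higher: choice bias. Parameters are illustrative, not fitted."""
    rng = np.random.default_rng(seed)
    v_chosen, v_observed = 0.0, 0.0
    for _ in range(n_trials):
        r1 = float(rng.random() < reward_prob)  # outcome of chosen option
        r2 = float(rng.random() < reward_prob)  # outcome of observed option
        # amplified update only for positive prediction errors on choices
        delta = r1 - v_chosen
        v_chosen += alpha * (gain if delta > 0 else 1.0) * delta
        v_observed += alpha * (r2 - v_observed)
    return v_chosen, v_observed
```

Because the amplification only fires when a reward actually arrives, the sketch also reproduces the model's other behavioral signature: no bias emerges between unrewarded options.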
Experimental evidence
That prediction seemed intriguing, but it still had to be tested. The authors identified both behavioral and genetic tests that would be telling.
They recruited 80 people at Brown and elsewhere in Providence to play a behavioral game and to donate some saliva for genetic testing.
The game first presented the subjects with pictures of arbitrary Japanese characters, each carrying a different probability of reward ranging from a 20 percent to an 80 percent chance of winning or losing a point. For some characters, the player actively chose the character to discover its resulting reward or penalty, whereas for others, the result was simply given to them. After that learning phase, the subjects were presented the characters in pairs and instructed to pick the one they thought had the highest chance of winning, based on what they had learned.
The researchers built the game so that for every character a player could choose, there was an equally rewarding one that had merely been given to them. On average, players showed a clear choice bias in that they were more likely to prefer rewarding characters that they had chosen over equally rewarding characters they had been given.
Notably, they exhibited no choice bias between unrewarding characters, suggesting that choice bias emerges only in relation to reward, one of the key predictions of their model. But the researchers wanted to test further whether the impact of reward on choice bias was related to the proposed biological mechanism: that striatal dopaminergic learning is enhanced for chosen rewards.
The genetic tests focused on single-letter differences in a gene called DARPP-32, which governs how well cells in the striatum respond to the reinforcing influence of dopamine.
People with one version of the gene have been shown in previous research to learn more readily from rewards, while people with other versions were less driven by reward in learning.
“The reason why this gene is interesting is because we know something about the biology of what it does and where it is expressed in the brain,” Frank said. “It’s predominant in the striatum and specifically affects synaptic plasticity induced by dopamine signaling. It’s related to the imbalance by which you learn from really good things or not so good things.
“The logic was if the mechanism that we think describes this choice bias and credit assignment problem is accurate then that gene should predict the impact of how good something was on this choice bias phenomenon,” he said.
Indeed, that’s what the data showed. People with the form of the gene that predisposed them to be responsive to big rewards also showed more choice bias from the most strongly rewarded characters. Interestingly, the other people also showed choice bias, but more strongly for those characters that were more mediocre. This pattern was mirrored by the authors’ model when it simulated the effects of DARPP-32 on reward learning imbalances from positive vs. negative outcomes.
For some people, the plums are sweeter if they picked them.
MIT researchers reveal how the brain keeps eyes on the prize.
“Are we there yet?”
As anyone who has traveled with young children knows, maintaining focus on distant goals can be a challenge. A new study from MIT suggests how the brain achieves this task, and indicates that the neurotransmitter dopamine may signal the value of long-term rewards. The findings may also explain why patients with Parkinson’s disease — in which dopamine signaling is impaired — often have difficulty in sustaining motivation to finish tasks.
The work is described this week in the journal Nature.
Previous studies have linked dopamine to rewards, and have shown that dopamine neurons show brief bursts of activity when animals receive an unexpected reward. These dopamine signals are believed to be important for reinforcement learning, the process by which an animal learns to perform actions that lead to reward.
Taking the long view
In most studies, that reward has been delivered within a few seconds. In real life, though, gratification is not always immediate: Animals must often travel in search of food, and must maintain motivation for a distant goal while also responding to more immediate cues. The same is true for humans: A driver on a long road trip must remain focused on reaching a final destination while also reacting to traffic, stopping for snacks, and entertaining children in the back seat.
The MIT team, led by Institute Professor Ann Graybiel — who is also an investigator at MIT’s McGovern Institute for Brain Research — decided to study how dopamine changes during a maze task that approximates working for delayed gratification. The researchers trained rats to navigate a maze to reach a reward. During each trial a rat would hear a tone instructing it to turn either right or left at an intersection to find a chocolate milk reward.
Rather than simply measuring the activity of dopamine-containing neurons, the MIT researchers wanted to measure how much dopamine was released in the striatum, a brain structure known to be important in reinforcement learning. They teamed up with Paul Phillips of the University of Washington, who has developed a technology called fast-scan cyclic voltammetry (FSCV) in which tiny, implanted, carbon-fiber electrodes allow continuous measurements of dopamine concentration based on its electrochemical fingerprint.
“We adapted the FSCV method so that we could measure dopamine at up to four different sites in the brain simultaneously, as animals moved freely through the maze,” explains first author Mark Howe, a former graduate student with Graybiel who is now a postdoc in the Department of Neurobiology at Northwestern University. “Each probe measures the concentration of extracellular dopamine within a tiny volume of brain tissue, and probably reflects the activity of thousands of nerve terminals.”
Gradual increase in dopamine
From previous work, the researchers expected that they might see pulses of dopamine released at different times in the trial, “but in fact we found something much more surprising,” Graybiel says: The level of dopamine increased steadily throughout each trial, peaking as the animal approached its goal — as if in anticipation of a reward.
The rats’ behavior varied from trial to trial — some runs were faster than others, and sometimes the animals would stop briefly — but the dopamine signal did not vary with running speed or trial duration. Nor did it depend on the probability of getting a reward, something that had been suggested by previous studies.
“Instead, the dopamine signal seems to reflect how far away the rat is from its goal,” Graybiel explains. “The closer it gets, the stronger the signal becomes.” The researchers also found that the size of the signal was related to the size of the expected reward: When rats were trained to anticipate a larger gulp of chocolate milk, the dopamine signal rose more steeply to a higher final concentration.
In some trials the T-shaped maze was extended to a more complex shape, requiring animals to run further and to make extra turns before reaching a reward. During these trials, the dopamine signal ramped up more gradually, eventually reaching the same level as in the shorter maze. “It’s as if the animal were adjusting its expectations, knowing that it had further to go,” Graybiel says.
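All three observations, ramping as the goal nears, scaling with reward size, and stretching over a longer maze to the same final level, are consistent with the dopamine signal tracking a discounted value of the upcoming reward. A minimal sketch, assuming an exponential discount (the functional form and the `gamma` value are illustrative assumptions, not measurements from the study):

```python
import numpy as np

def ramp(path_length, reward=1.0, gamma=0.9):
    """Dopamine-as-discounted-value sketch: at distance d from the goal
    the signal is reward * gamma**d, so it rises steadily as d shrinks.
    The exponential form and gamma are illustrative assumptions."""
    d = np.arange(path_length, -1, -1)  # distance to goal at each step
    return reward * gamma ** d

short = ramp(10)              # short maze
long_maze = ramp(20)          # longer maze: starts lower, same endpoint
big = ramp(10, reward=2.0)    # larger reward: steeper, higher endpoint
```

Under this sketch the signal rises monotonically toward the goal, reaches a higher final level for a bigger reward, and climbs more gradually in the longer maze while ending at the same value, matching the three experimental patterns.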
An ‘internal guidance system’
“This means that dopamine levels could be used to help an animal make choices on the way to the goal and to estimate the distance to the goal,” says Terrence Sejnowski of the Salk Institute, a computational neuroscientist who is familiar with the findings but who was not involved with the study. “This ‘internal guidance system’ could also be useful for humans, who also have to make choices along the way to what may be a distant goal.”
One question that Graybiel hopes to examine in future research is how the signal arises within the brain. Rats and other animals form cognitive maps of their spatial environment, with so-called “place cells” that are active when the animal is in a specific location. “As our rats run the maze repeatedly,” she says, “we suspect they learn to associate each point in the maze with its distance from the reward that they experienced on previous runs.”
As for the relevance of this research to humans, Graybiel says, “I’d be shocked if something similar were not happening in our own brains.” It’s known that Parkinson’s patients, in whom dopamine signaling is impaired, often appear to be apathetic, and have difficulty in sustaining motivation to complete a long task. “Maybe that’s because they can’t produce this slow ramping dopamine signal,” Graybiel says.