Depression is associated with varied symptoms, from mood changes to cognitive impairment. A large proportion of these symptoms may be driven at least in part by abnormal responses to affective stimuli (1). Specifically, depression is associated with a strong “negative” bias: enhanced sensitivity to negative (punishing) stimuli and a behavioral neglect of positive (rewarding) stimuli (2). This affective bias, which is manifested across many facets of learning, memory, and cognition, putatively serves both to instigate and to uphold the debilitating negative and anhedonic mood state (3, 4). A clearer understanding of the neural basis of affective bias in depression will thus lead to a clearer understanding of the overall pathology.
In this study, we focused on affective biases seen in flexible learning in depression. Adaptive behavior in our daily life, where the consequences of our actions are often uncertain and variable, requires individuals to frequently and flexibly update their behavior. The experimental model most often used to examine such flexible behavior is the probabilistic reversal learning paradigm. In this paradigm, subjects learn by trial and error to choose the most rewarding stimulus and then subsequently reverse their choice when contingencies change and this previously rewarding stimulus is unexpectedly followed by punishment. In this probabilistic task, where around one-fourth of the reward and punishment feedback is misleading, depressed individuals reverse more often than do healthy individuals when they receive misleading negative feedback (5–7). This problem has been interpreted to reflect a negative affective bias and may underlie the tendency of depressed individuals to emphasize negative—at the expense of positive—life experiences.
However, this negative affective bias could be driven by at least two different processes: 1) increased behavioral sensitivity to unexpected punishment in depression (encouraging reversal during misleading negative feedback), and/or 2) reduced behavioral sensitivity to reward in depression (reducing the ability to maintain the correct stimulus-reward association). To elucidate the nature of affective biases in reversal learning, we developed a novel reversal learning paradigm that enabled direct comparison of reversals signaled by unexpected reward with reversals signaled by unexpected punishment (8–11). In this task, subjects do not directly choose the rewarded or punished stimulus but rather predict the outcome of stimuli selected by the computer. Unlike the probabilistic tasks, this task is deterministic and subjects are required to reverse their behavior as soon as they receive unexpected outcomes. Critically, our study design involved Pavlovian rather than instrumental conditioning, which allowed the assessment of reversals on the basis of unexpected reward as well as unexpected punishment (8, 11).
Using this task, we previously demonstrated that both punishment and reward reversals rely on overlapping but distinct regions of the striatum (11). This involvement of the striatum is consistent with imaging studies of the classic probabilistic reversal learning task in healthy individuals, in whom increased striatal response precedes behavioral switching (12), and it concurs with the frequently highlighted role of the striatum in dopamine-mediated prediction error learning (13).
Extrapolating from these findings, it is plausible that the behavioral bias in reversal learning seen in depression (5–7) is driven by altered striatal processing. Indeed, attenuated striatal function is seen in depression across multiple cognitive tasks, from higher-order planning to gambling (1, 14, 15). However, previous studies using the classic probabilistic reversal learning paradigm in depressed individuals have not found significant differences in striatal response during reversal learning (5, 16, 17). Thus, although the striatum is a key region for reversal learning in healthy individuals, reversal learning is impaired in depression, and the striatum is involved in the neuropathology of depression (14), studies to date have not demonstrated striatal involvement in the negative bias in reversal learning in depression.
The negative bias in reversal learning in depression therefore might not directly involve the striatum but rather aberrant function in, for example, the orbitofrontal cortex (18–20) or the amygdala (5). Alternatively, however, previous studies may have failed to reveal the contribution of the striatum because they inadequately disentangled the separate reward and punishment components of reversal learning. In this study, we therefore employed our new deterministic reversal learning task to examine differences in the hemodynamic response during separate punishment and reward reversal trials across unmedicated depressed individuals and healthy comparison subjects. We predicted that depressed individuals would demonstrate a negative bias in reversal learning and that this would be associated with a corresponding attenuation in striatal response during reversal trials. However, given the absence of striatal differences across diagnosis in punishment-based probabilistic reversal learning, we predicted that any alteration in striatal response would be restricted to reward-based reversals.
Volunteers (N=27; 15 Caucasian, one Asian, 11 African American; all right-handed) 18–50 years of age underwent screening evaluations that included a medical history, physical examination, laboratory testing, and structural MRI. Psychiatric assessment was conducted using the Structured Clinical Interview for DSM-IV-TR and an unstructured interview with a psychiatrist; 14 volunteers had no psychiatric disorders (healthy comparison subjects), and 13 had major depressive disorder. Exclusion criteria for all participants included psychotropic drug exposure (including nicotine) within the past 3 weeks; major medical or neurological illness; illicit drug use or alcohol abuse within the past year; lifetime history of alcohol or drug dependence; psychiatric disorders other than major depression (except comorbid anxiety disorder and a remote history of substance abuse); current pregnancy or breastfeeding; structural brain abnormalities on MRI; and general MRI exclusions. Additional exclusion criteria for comparison subjects were a history of any psychiatric disorder (except a remote history of substance abuse) and a history of any mood disorder in a first-degree relative. After receiving a complete description of the study, participants provided written informed consent as approved by the National Institutes of Health Combined Neuroscience Institutional Review Board.
Participants were group matched for age (healthy comparison group, mean=31 years [SD=6]; depressed group, mean=36 years [SD=11]), gender (eight male participants in each group), years of education (healthy comparison group, mean=17 years [SD=2]; depressed group, mean=16 years [SD=2]), and IQ (healthy comparison group, mean=120 [SD=15]; depressed group, mean=120 [SD=15]). IQ scores were not available for eight participants (five in the depressed group): four because English was not their first language (one in the depressed group), one (in the depressed group) because he administered IQ tests professionally, and three because they dropped out of the study after scanning but before neuropsychological testing. The mean score on the 21-item Hamilton Depression Rating Scale (HAM-D) (21) was higher in the depressed than in the comparison group (depressed group, mean score=20 [SD=7]; comparison group, mean score=1 [SD=1]; F=95, df=1, 25, p<0.001).
Behavioral and Functional Neuroimaging Measures
The behavioral task was adapted from a previously developed paradigm (8, 9, 11) and programmed using E-PRIME (Psychological Software Tools, Inc., Pittsburgh).
On each trial, participants were presented with two vertically adjacent stimuli, one scene and one face (location randomized), on a projector screen viewed by means of a mirror attached to the head coil in the functional MRI (fMRI) scanner. One of these two stimuli was associated with reward and the other with punishment. Participants were required to learn these deterministic stimulus-outcome associations by trial and error. Unlike standard probabilistic reversal paradigms, however, participants were not required to choose between the two stimuli but were instructed to predict whether a stimulus that was highlighted with a black border (randomized from trial to trial) would lead to reward or to punishment (the task contingencies were thus Pavlovian and expected to be processed more specifically in the ventral striatum). They indicated their outcome prediction for the highlighted stimulus by pressing, with the index or middle finger of their dominant (right) hand, one of two buttons (one for reward, one for punishment; response mappings counterbalanced) on a button box placed on their abdomen. They had up to 1,500 msec to provide a response. Once they responded, the outcome was presented for 500 msec in the center of the screen (between the two stimuli). Reward consisted of a green smiley face and punishment of a red sad face. If they failed to make a response, “Too late!” was displayed instead of the outcome. After the outcome, the screen showed only a fixation cross for a reaction time-dependent interval, so that the interstimulus interval was jittered modestly between 2,000 and 4,000 msec.
Each experimental block consisted of one acquisition stage and a variable number of reversal stages. The task proceeded from one stage to the next following a specific number of consecutive correct trials as determined by a preset learning criterion. This criterion varied between stages (four, five, or six correct responses) to prevent predictability of reversals. The task also terminated after 10 consecutive incorrect trials in order to avoid scanning blocks in which participants were not performing the task correctly (e.g., because of having forgotten the outcome-response mappings). Reversals of contingencies were signaled to participants either by an unexpected reward presented after the previously punished stimulus was highlighted or by an unexpected punishment presented after the previously rewarded stimulus was highlighted. Unexpected reward and unexpected punishment events were interspersed within blocks. Consistent with previous versions of this task (8, 11), the same stimulus was highlighted after the unexpected outcome and was presented until participants correctly reversed their predictions.
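The stage logic described above can be summarized as a small state machine. The Python sketch below simulates one block under stated assumptions: the learner classes (`IdealAgent`, `ContraryAgent`), the function `run_block`, the face/scene labels, and the choice to count the reversal trial toward the next stage's criterion are hypothetical illustrations, not the authors' E-PRIME implementation, and trial timing is omitted. It shows how the variable learning criterion, the stimulus held after an unexpected outcome, and the 10-error stopping rule interact.

```python
import random

REWARD, PUNISH = "reward", "punishment"
OPPOSITE = {REWARD: PUNISH, PUNISH: REWARD}


class IdealAgent:
    """Hypothetical learner that exploits the deterministic, mutually exclusive contingencies."""

    def __init__(self):
        self.belief = {}

    def predict(self, stim):
        # Guess "reward" for a stimulus whose outcome has not been seen yet.
        return self.belief.get(stim, REWARD)

    def observe(self, stim, outcome, other):
        # One stimulus predicts reward and the other punishment, so update both.
        self.belief[stim] = outcome
        self.belief[other] = OPPOSITE[outcome]


class ContraryAgent(IdealAgent):
    """Hypothetical learner that always predicts the opposite of what it has learned."""

    def predict(self, stim):
        return OPPOSITE[super().predict(stim)]


def run_block(agent, max_trials=150, max_errors_in_a_row=10, seed=0):
    """Simulate one block: an acquisition stage followed by reversal stages."""
    rng = random.Random(seed)
    stimuli = ["face", "scene"]
    mapping = {"face": REWARD, "scene": PUNISH}
    criterion = rng.choice([4, 5, 6])  # varies between stages so reversals are unpredictable
    streak = errors_in_a_row = 0
    next_is_unexpected = False  # set once the criterion is met and contingencies reverse
    held = None                 # stimulus kept highlighted until the reversal is made
    log = []
    for trial in range(max_trials):
        stim = held if held is not None else rng.choice(stimuli)
        other = "scene" if stim == "face" else "face"
        prediction, outcome = agent.predict(stim), mapping[stim]
        if next_is_unexpected:
            # First outcome of the new stage: it signals the reversal and is not scored.
            kind, correct = "unexpected", None
            next_is_unexpected, held = False, stim
        else:
            after_signal = bool(log) and log[-1]["kind"] == "unexpected"
            kind = "reversal" if after_signal else "nonreversal"
            correct = prediction == outcome
            if correct:
                streak, errors_in_a_row, held = streak + 1, 0, None
            else:
                streak, errors_in_a_row = 0, errors_in_a_row + 1
        log.append({"trial": trial, "kind": kind, "valence": outcome, "correct": correct})
        agent.observe(stim, outcome, other)
        if errors_in_a_row >= max_errors_in_a_row:
            break  # stop a block in which the task is no longer being performed
        if streak >= criterion:
            mapping = {s: OPPOSITE[o] for s, o in mapping.items()}  # reverse contingencies
            criterion = rng.choice([4, 5, 6])
            streak = 0
            next_is_unexpected = True
    return log


if __name__ == "__main__":
    log = run_block(IdealAgent())
    n_reversals = sum(t["kind"] == "unexpected" for t in log)
    print(f"{len(log)} trials, {n_reversals} reversals")
```

In this sketch an ideal learner makes no reversal errors, whereas a learner that always predicts the wrong outcome triggers the 10-error stopping rule within the first 11 trials.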
During the scan session, participants completed six experimental blocks. Each block terminated automatically after 150 trials (7.4 minutes) and contained on average eight reversal stages (four signaled by punishment), so that each participant performed 900 trials (six blocks) per experimental session (approximately 90 minutes, including breaks). A 30-second fixation period at the beginning and end of each block provided a baseline against which to compare the blood-oxygen-level-dependent (BOLD) response during trials.
All participants performed a practice block before entering the fMRI scanner to familiarize them with the task. The practice task was identical to the main task except that the stimuli were presented on a laptop computer.
Reaction times and accuracy rates were assessed in an analysis of variance with reversal (reversal versus nonreversal trials) and valence (reward versus punishment) as within-subject factors and group (depressed versus healthy comparison group) as the between-subjects factor. Trials on which participants failed to make a response were excluded from reaction time analyses, and the rare trials on which participants coincidentally made a nonreversal error on an unexpected outcome trial were excluded from all analyses (as this meant that they had accidentally preempted the reversal, making the expectancy of the outcome unclear). Accuracy was expressed as error rates, computed as a proportion of the total number of trials of the type being examined; for example, nonreversal reward errors were divided by the total number of nonreversal reward trials, and punishment reversal errors were divided by the total number of punishment reversals. As the task was deterministic, reversal errors were defined as errors on the trial immediately following the unexpected outcome (9, 11). Partial eta-squared (ηp²) effect sizes are reported for all significant contrasts, and p values are Bonferroni adjusted.
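The error-rate definitions above can be illustrated with a short Python sketch. The trial-record format (`kind`, `valence`, `correct`, `exclude`) and the function name `error_rates` are hypothetical; the sketch only shows how each error count is divided by the number of trials of the same reversal-by-valence type, with unexpected-outcome trials themselves left unscored and excluded trials dropped.

```python
from collections import defaultdict


def error_rates(trials):
    """Error rate for each (trial kind, valence) cell.

    Each trial is a dict with hypothetical keys:
      kind    -- "reversal" (trial right after an unexpected outcome),
                 "nonreversal", or "unexpected" (the signal trial itself, not scored)
      valence -- "reward" or "punishment"; for reversal trials, the valence of
                 the unexpected outcome that signaled the reversal
      correct -- True or False
      exclude -- optional flag, e.g. a reversal preempted on the unexpected trial
    """
    n_trials = defaultdict(int)
    n_errors = defaultdict(int)
    for t in trials:
        if t.get("exclude") or t["kind"] not in ("reversal", "nonreversal"):
            continue
        cell = (t["kind"], t["valence"])
        n_trials[cell] += 1
        if not t["correct"]:
            n_errors[cell] += 1
    # Errors divided by the number of trials of the same type,
    # e.g. reward reversal errors / all reward reversals.
    return {cell: n_errors[cell] / n_trials[cell] for cell in n_trials}


example_trials = [
    {"kind": "unexpected", "valence": "reward", "correct": None},
    {"kind": "reversal", "valence": "reward", "correct": False},
    {"kind": "unexpected", "valence": "reward", "correct": None},
    {"kind": "reversal", "valence": "reward", "correct": True},
    {"kind": "unexpected", "valence": "punishment", "correct": None},
    {"kind": "reversal", "valence": "punishment", "correct": True},
    {"kind": "nonreversal", "valence": "reward", "correct": True},
    {"kind": "nonreversal", "valence": "punishment", "correct": False, "exclude": True},
]
print(error_rates(example_trials))
```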
A GE Signa HDxt 3-T scanner (GE Healthcare, Milwaukee) was used to acquire structural and functional MR images. The functional sequence comprised six echo-planar imaging sessions of 255 volume acquisitions (flip angle=90°; repetition time=2,000 msec; echo time=30 msec; field-of-view=24×24 cm; slice thickness=3 mm; slice spacing=0.5 mm; matrix=64×64 sagittal slices with array spatial sensitivity encoding technique). The first 10 volumes from each session were discarded to avoid T1 equilibrium effects. The structural sequence comprised a magnetization-prepared rapid gradient echo anatomical reference image (flip angle=60°; repetition time=7,800 msec; echo time=3,000 msec; field of view=22×22 cm; slice thickness=1.2 mm; slice spacing=0 mm; matrix=246×192 for spatial coregistration and normalization).
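A quick consistency check of the acquisition timing, using only numbers stated in this section: each run of 255 volumes at a repetition time of 2,000 msec lasts 510 seconds, the first 10 volumes (20 seconds) are discarded, and each block comprises two 30-second fixation periods plus up to 150 trials (7.4 minutes). This is illustrative arithmetic rather than part of the authors' pipeline, and it assumes that each block fills one functional run.

```python
TR_S = 2.0              # repetition time (2,000 msec)
VOLUMES_PER_RUN = 255   # echo-planar volumes per session
DISCARDED = 10          # volumes dropped to avoid T1 equilibrium effects
N_RUNS = 6              # one run per experimental block
TRIALS_PER_BLOCK = 150
TASK_MIN = 7.4          # duration of 150 trials, in minutes
FIXATION_S = 30.0       # fixation period at the start and at the end of each block

run_s = VOLUMES_PER_RUN * TR_S                       # acquired time per run
retained_s = (VOLUMES_PER_RUN - DISCARDED) * TR_S    # time left after discarding volumes
block_s = 2 * FIXATION_S + TASK_MIN * 60             # fixation + task time per block
mean_trial_s = TASK_MIN * 60 / TRIALS_PER_BLOCK      # average trial length
total_trials = N_RUNS * TRIALS_PER_BLOCK             # trials per session
total_scan_min = N_RUNS * run_s / 60                 # functional scanning time

print(f"run {run_s:.0f} s, block {block_s:.0f} s, mean trial {mean_trial_s:.2f} s")
print(f"{total_trials} trials over {total_scan_min:.0f} min of functional scanning")
```

On these figures a block (about 504 seconds) fits within a run (510 seconds), and the mean trial length of about 2.96 seconds lies inside the stated 2,000–4,000 msec jitter.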
Images were preprocessed (see the data supplement that accompanies the online edition of this article) and analyzed using SPM8 (Wellcome Department of Cognitive Neurology, London). We estimated a general linear model in which expected and unexpected reward and punishment trials were modeled as events of zero duration at their onsets, which coincided with the response. Consistent with our previous study (11), an unexpected outcome was defined as the first outcome of a new stage, presented after the learning criterion had been reached (i.e., the outcome signaling contingency reversal), and all other outcomes were coded as expected outcomes, irrespective of task performance.
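The event coding and zero-duration model described above can be approximated outside SPM to make them concrete. This NumPy sketch is a simplified stand-in, not SPM8's implementation: it assumes a double-gamma hemodynamic response function in the style of SPM's canonical HRF (gamma densities with shapes 6 and 16 and an undershoot ratio of 1/6), builds regressors on a 0.1-second grid before sampling once per 2-second scan, and uses hypothetical function names (`label_outcomes`, `design_matrix`) and made-up onsets.

```python
import math

import numpy as np


def double_gamma_hrf(dt=0.1, length_s=32.0):
    """Double-gamma HRF: gamma(shape 6) response minus gamma(shape 16) undershoot / 6."""
    t = np.arange(0.0, length_s, dt)

    def gamma_pdf(x, shape):  # unit-scale gamma density
        return x ** (shape - 1) * np.exp(-x) / math.gamma(shape)

    h = gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6.0
    return h / h.sum()


def label_outcomes(n_trials, reversal_starts):
    """Unexpected = first outcome of a new (reversed) stage; everything else expected."""
    starts = set(reversal_starts)
    return ["unexpected" if i in starts else "expected" for i in range(n_trials)]


def design_matrix(onsets_by_condition, n_scans, tr=2.0, dt=0.1):
    """Zero-duration (stick) regressors convolved with the HRF, sampled once per scan.

    Onsets are in seconds from the first retained scan and must fall inside the run.
    Columns are ordered by sorted condition name, followed by a constant term.
    """
    n_fine = int(round(n_scans * tr / dt))
    step = int(round(tr / dt))
    hrf = double_gamma_hrf(dt)
    columns = []
    for name in sorted(onsets_by_condition):
        sticks = np.zeros(n_fine)
        for onset in onsets_by_condition[name]:
            sticks[int(round(onset / dt))] += 1.0
        convolved = np.convolve(sticks, hrf)[:n_fine]
        columns.append(convolved[::step][:n_scans])
    columns.append(np.ones(n_scans))
    return np.column_stack(columns)


# Hypothetical onsets (seconds) for the four trial types in one short run.
onsets = {
    "expected_punishment": [40.0, 70.0],
    "expected_reward": [10.0, 30.0, 60.0],
    "unexpected_punishment": [50.0],
    "unexpected_reward": [20.0, 80.0],
}
X = design_matrix(onsets, n_scans=60)
print(X.shape)
```

Fitting such a matrix to each voxel's time series by least squares yields one parameter estimate per trial type, which is the quantity carried forward to the region-of-interest and group analyses.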
Because of strong a priori hypotheses regarding the role of the striatum in this task, a region-of-interest analysis was performed by extracting standardized β values from the anatomically defined (23) left and right caudate and putamen using the MarsBar software package (24) for each trial type. In line with our hypotheses, between-group analyses were performed separately for each trial type.
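A minimal NumPy sketch of the region-of-interest step, assuming the mean across voxels as the summary statistic and a boolean mask aligned with each participant's β map (MarsBar's own file handling is not reproduced; `roi_mean` and `one_way_f` are hypothetical names, and the values are simulated, not study data). With two groups, the one-way ANOVA F statistic has df=1 and N-2 (1 and 25 for 27 participants) and equals the square of the pooled-variance two-sample t statistic.

```python
import numpy as np


def roi_mean(beta_map, mask):
    """Mean parameter estimate across the voxels of a boolean ROI mask (NaNs ignored)."""
    return float(np.nanmean(beta_map[mask]))


def one_way_f(*groups):
    """One-way ANOVA across independent groups; returns (F, df_between, df_within)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return float(f), df_between, df_within


# Simulated ROI means for 14 healthy and 13 depressed participants.
rng = np.random.default_rng(0)
healthy = rng.normal(0.5, 1.0, size=14)
depressed = rng.normal(0.0, 1.0, size=13)
f, df1, df2 = one_way_f(healthy, depressed)
print(f"F={f:.2f}, df={df1}, {df2}")
```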
Next, to localize more specifically the peak differences in responses within the striatum and to investigate the extended functional anatomical network of regions that may interact with the striatum during task performance, a whole brain voxel-wise analysis was performed post hoc for each of the four trial types. For this whole brain analysis, a one-sample t test was created for each trial type (unexpected punishment and unexpected reward) with group as a covariate. Clusters were defined using a voxel-level threshold of p<0.001, uncorrected (anatomical labels assigned using the automated anatomical labeling toolbox for SPM), and coordinates (Montreal Neurological Institute [MNI]/Talairach) are reported for the peak voxel t value. Family-wise error voxel-level corrected p values are also reported for the peak voxel t values within small-volume-corrected regions of interest.
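At a single voxel, a second-level model with an intercept and a group covariate is an ordinary least-squares regression, and the t statistic on the group regressor equals a pooled-variance two-sample t test between groups. The NumPy sketch below illustrates this for one voxel under stated assumptions (one contrast value per participant, group coded 1 for healthy and 0 for depressed, simulated data); it does not reproduce SPM8's masking, smoothness estimation, or multiple-comparison correction, and `group_t` is a hypothetical name.

```python
import numpy as np


def group_t(y, group):
    """OLS fit of y = b0 + b1 * group; returns (t for b1, residual degrees of freedom)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(group, dtype=float)])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - rank
    sigma2 = residuals @ residuals / dof  # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov_beta[1, 1])), int(dof)


# Simulated contrast values at one voxel: 14 healthy (group=1), 13 depressed (group=0).
rng = np.random.default_rng(1)
group = np.array([1] * 14 + [0] * 13)
y = rng.normal(0.0, 1.0, size=27) + 0.8 * group
t, dof = group_t(y, group)
print(f"t({dof}) = {t:.2f}")
```

Repeating this fit at every voxel and thresholding the resulting t map gives the kind of voxel-wise comparison described above.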
Error rates and reaction times are presented in Table 1. There was a significant three-way interaction of valence, reversal (reversal versus nonreversal), and group for error rates (F=10.4, df=1, 25, p=0.004; ηp²=0.29) but not for reaction times. This interaction was broken down into simple interaction effects analyses for reversal and nonreversal trials separately.
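Partial eta-squared can be recovered from a reported F statistic and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error), which follows from ηp² = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error). The short Python check below (an illustration; `partial_eta_squared` is a hypothetical helper) applies this identity to F values reported in this article and reproduces the reported effect sizes to two decimals.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared from an F statistic: F*df1 / (F*df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta-squared) from the Results
reported = [
    (10.4, 1, 25, 0.29),  # valence x reversal x group, error rates
    (11.7, 1, 25, 0.32),  # group difference, reward reversal errors
    (5.2, 1, 25, 0.17),   # group x valence interaction, error rates
    (10.5, 1, 25, 0.30),  # right putamen, reward reversals
    (7.4, 1, 25, 0.23),   # putamen, reward nonreversal trials
    (7.8, 1, 25, 0.24),   # putamen, punishment nonreversal trials
    (3.1, 13, 13, 0.76),  # valence x reversal x HAM-D score, right putamen
]

for f, df1, df2, eta in reported:
    print(f"F({df1}, {df2})={f}: partial eta^2={partial_eta_squared(f, df1, df2):.2f} (reported {eta})")
```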
Table 1. Behavioral Results on a Reversal Learning Task in Depressed Individuals and Healthy Comparison Subjects

| Group, Stage, and Valence | Error Rate: Mean | Error Rate: SD | Reaction Time (msec): Mean | Reaction Time (msec): SD |
|---|---|---|---|---|
| Major depression group | | | | |
| Healthy comparison group | | | | |
In line with our hypothesis, the main outcome of interest was reward-based reversal learning. Depressed participants made more errors than comparison subjects on reward reversal trials (F=11.7, df=1, 25, p=0.002; ηp²=0.32; Figure 1A) but made comparable numbers of punishment reversal errors, yielding a significant group-by-valence interaction in error rates (F=5.2, df=1, 25, p=0.032; ηp²=0.17). This difference was seen despite comparable reaction times during reward and punishment reversals across groups. Thus, the depressed participants demonstrated a negative affective bias in reversal learning as a result of reduced behavioral responsiveness to reward but not to punishment.
Figure 1. Impaired Reward Reversal Learning and Attenuated Right Putamen Response to Unexpected Reward in Depressed Individuals Relative to Healthy Comparison Subjectsᵃ
ᵃ As shown in panel A, accuracy is lower on reward (F=11.7, df=1, 25, p=0.002) but not punishment reversals in depressed individuals relative to healthy individuals. Panel B shows attenuated right (anatomically defined) putamen response during reward reversal trials in depressed individuals relative to healthy individuals (F=10.5, df=1, 25, p=0.003) but equivalent response during punishment reversal. Error bars indicate standard deviations. In panel C, whole brain analysis confirms that the peak neural response difference between depressed and healthy individuals on reward reversal trials was the right anteroventral putamen (peak voxel x=30, y=3, z=–8; image shows SPM t scores ranging from 2.1 to 4.1).
By contrast, there was no valence specificity on nonreversal trials. Depressed and healthy individuals responded equally well on nonreversal reward and punishment trials.
Neural effects in each of the four regions of interest during the key reward- and punishment-based reversal trials are summarized in Table 2. The most striking pattern was observed in the right putamen (23), which showed a significant three-way interaction of valence, reversal, and HAM-D score included as a continuous variable (F=3.1, df=13, 13, p=0.026; ηp²=0.76). Accordingly, we emphasize the data from this region.
Table 2. Differences During Reward and Punishment Reversal Trials in Anatomically Defined Regions of Interest in Depressed Individuals and Healthy Comparison Subjects (Contrast: Healthy Comparison Subjects > Depressed Individuals)

| Anatomically Defined Region | Unexpected Reward: F | Unexpected Reward: df | Unexpected Reward: p | Unexpected Punishment: F | Unexpected Punishment: df | Unexpected Punishment: p |
|---|---|---|---|---|---|---|
| Right putamen | 10.4 | 1, 25 | 0.003 | 0.8 | 1, 25 | 0.4 |
| Left putamen | 3.7 | 1, 25 | 0.07 | 0.2 | 1, 25 | 0.7 |
| Right caudate | 0.9 | 1, 25 | 0.4 | 0.06 | 1, 25 | 0.8 |
| Left caudate | 0.008 | 1, 25 | 0.9 | 3.5 | 1, 25 | 0.07 |
The main trials of interest were the reward reversal trials. Significantly decreased right putamen response was observed in depressed individuals during reward (F=10.5, df=1, 25, p=0.003; ηp²=0.30) but not punishment reversals. These results are shown in Figure 1B and correspond with the accuracy results presented in Figure 1A. Thus, as predicted, the negative affective bias in the behavior of depressed individuals was accompanied by attenuation in striatal response during reward reversals.
There was, by contrast, no valence specificity in neural responses during the nonreversal trials. Putamen response was significantly higher in the healthy comparison group than in the depressed group during both reward (F=7.4, df=1, 25, p=0.01; ηp²=0.23) and punishment (F=7.8, df=1, 25, p=0.01; ηp²=0.24) nonreversal trials.
In the depressed group, HAM-D score did not correlate with the ventral putamen BOLD response to unexpected reward or the reward reversal errors.
Consistent with the region-of-interest analysis, a whole brain analysis of regions that were more active in the healthy comparison group relative to the depressed group during unexpected reward revealed increased response in the right anteroventral putamen in healthy relative to depressed individuals (whole brain peak voxel: MNI coordinates, x=30, y=3, z=–8; Talairach coordinates, x=30, y=2, z=–7 [right anteroventral putamen]; uncorrected p<0.001; small-volume corrected region-of-interest, family-wise error corrected p=0.011; Figure 1C and Table 3). A comparable whole brain analysis for unexpected punishment trials failed to reveal any significant difference between the comparison and depression groups.
Table 3. Regions More Active at a More Liberal Statistical Threshold in Healthy Versus Depressed Participants During Unexpected Reward Trials

| Brain Region | MNI x | MNI y | MNI z | Talairach x | Talairach y | Talairach z | T | K (Cluster Size) |
|---|---|---|---|---|---|---|---|---|
| Right ventral putamen | 30 | 3 | –8 | 30 | 2 | –7 | 4.05 | 16 |
| Left mid-cingulate cortex | –15 | 0 | 38 | –15 | –1 | 32 | 3.75 | 2 |
| Left mid-occipital cortex | –24 | –57 | 34 | –24 | –54 | 34 | 3.65 | 2 |
Consistent with our hypothesis, a negative bias in reversal learning in depression was accompanied by altered reward-related striatal response. Specifically, we found impaired reward (but not punishment) reversal behavior in depression alongside attenuated ventral striatal response to unexpected reward. Thus, we provide a potential neural basis for the negative bias underlying the flexible-learning impairment in depression.
The attenuated reward-related striatal response in major depressive disorder is consistent with results of several recent studies examining reward processing deficits in different aspects of cognition in depression (1, 2, 15, 25, 26). However, this study is the first to demonstrate valence specificity in the striatal response to reward and punishment in depression and the first to demonstrate that striatal attenuation in depression extends beyond the receipt and anticipation of reward (15) to reward-based reversal learning. This blunted behavioral response to reward and not to punishment also provides an alternative explanation for the previously demonstrated impairment in reversal learning in depression (5–7): it may be driven by attenuated reward responses rather than by elevated punishment responses. Previous studies with the probabilistic reversal learning task, which examined only reversals based on unexpected punishment, failed to reveal differences in striatal function (5, 16, 17). Likewise, despite significant three-way interactions of valence, reversal, and depression, we found no striatum-specific differences between depressed and comparison groups on punishment-based reversals, although the interpretation of this negative finding is limited by the low generalizability and statistical sensitivity conferred by relatively small sample sizes. The group difference in the striatal hemodynamic response was significant only for responses to unexpected reward.
Under a variety of experimental conditions, mood disorders have been associated with abnormal neural processing in structures implicated in appetitive and aversive learning, including the orbitofrontal cortex (18–20) and the amygdala (5), which likely contributes to the overall neurocognitive profile of depression. The locus within the striatum where we observed an attenuated hemodynamic response to unexpected rewards implicated a region of the anterior ventrolateral putamen, which receives projections from both the medial and orbital prefrontal cortical networks (14, 27) as well as the amygdala (28). Thus, the attenuated BOLD response in the putamen may have been driven by abnormal afferent transmission from these cortical regions (27, 29) rather than by a specific abnormality within the striatum. Notably, lesions in the ventral striatum, orbitofrontal cortex, pallidum, or mediodorsal nucleus of the thalamus have all been shown to cause perseverative deficits in stimulus-reward reversal tasks in rats and monkeys, such that the animals have difficulty switching away from previously rewarded but not unrewarded stimuli (14). The present study thus extends the sources of altered neural transmission in depression to encompass attenuated reward reversal-related responses in the ventral striatum, although this finding should be interpreted within the context of the limbic-prefrontal cortical-striatal-pallidal-thalamic circuits involving this part of the striatum (11, 12, 14).
While the negative bias demonstrated with the reversal learning task used here joins the affective biases demonstrated by a range of cognitive tasks in depression (2), the specific direction of the impairment we observed (attenuated reward processing rather than enhanced punishment processing) may be related to the impaired ability to derive pleasure from rewarding activities seen in depression. This hypothesis would be compatible with evidence that the functioning of the mesolimbic dopaminergic system, which plays a major modulatory role within the limbic-cortical-striatal-pallidal-thalamic circuitry (30), is reduced in depression (both in general and in response to unpredicted reward) (1, 14, 31, 32) and with evidence for the involvement of dopamine in punishment and reward learning in the striatum in this (8–10, 33) and other (34, 35) tasks. Individuals with higher dopamine synthesis capacity, for instance, demonstrate improved reward-based relative to punishment-based reversal learning on the task we used here (10, 36). Moreover, amphetamine-induced dopamine release within the anteroventral putamen is correlated with subjective feelings of euphoria (or hedonia) in healthy individuals (37, 38). Thus, the attenuated anteroventral putamen response we identified in depression may reflect a reciprocal process: attenuated striatal response associated with reduced dopamine release and anhedonia. It is conceivable, furthermore, that enhancement of the mesolimbic dopaminergic system would ameliorate the reversal learning impairment and anhedonia in depression (17, 39). Nevertheless, these hypotheses require testing in future studies, since the present study included neither anhedonia ratings nor assessments of central dopaminergic function.
Finally, our findings do not invalidate the proposition that depression is also associated with hypersensitivity to punishment in other contexts, such as when performance declines after a perceived error (and associated aversive feedback) on planning or mnemonic tasks (2). Indeed, alterations to both reward and punishment processing are seen in depression (1), and while this “catastrophic response to perceived failure” (2, p. 64) is likely due to an enhanced impact of negative (punishing) judgment on performance, the task used in this study does not provide patients with explicit judgment about their performance and may therefore tap into distinct reward and punishment processing mechanisms. Indeed, one key advantage of neurocognitive assessment as a measure of pathology is that it is possible to target distinct neural systems with different cognitive tasks, thereby breaking down the underlying architecture of such multifaceted and subjective behaviors. Recent findings in fact implicate a habenula-rostromedial tegmental circuit in the processing of reward omission and expected punishment (40), but our fMRI parameters were not optimized to detect signal change in a structure of this small size. Whether this circuit therefore underlies altered punishment processing in depression is a question for future research.
These results suggest that altered reversal learning in depression is driven by attenuated striatal function and that this effect depends more specifically on an attenuated response to unexpected reward rather than to unexpected punishment. The region of the striatum critical for this bias corresponds with the anteroventral putamen, which is known to play a key role in hedonic processing and may therefore represent the neural underpinnings of anhedonic mood in depression. Improving the ability of depressed patients to learn about rewarding feedback, including social interactions and positive life experiences, is critical for recovery. The findings from this study provide a neural target for such recovery.