Major depressive episodes are characteristic of both major depressive disorder and bipolar disorder. Diagnostic criteria, of course, rely on longitudinal features, namely the presence or absence of manic or hypomanic episodes, to distinguish between the two diagnoses. In some cases, however, patients underreport a history of mood elevation; in others, patients who present in a depressive episode simply have not yet experienced a manic episode (1, 2). Initial misdiagnosis is common (3–6), and delayed or inappropriate treatment may have adverse consequences, including switching into mania, precipitation of a mixed state, more frequent mood episodes, or poorer outcome in general (7–9).
A number of studies have attempted to distinguish the phenomenology of depression in major depressive disorder from that in bipolar disorder. Most studies (10–18), but not all (19), reported a greater prevalence of atypical features or reverse neurovegetative symptoms, such as hypersomnia or hyperphagia, in bipolar depression. Likewise, several reports (17, 20), but not others (21), identified a greater prevalence of melancholic symptoms among bipolar depressed patients. Irritability (22, 23), anger (24, 25), subthreshold mixed symptoms such as overactivity (26), and psychosis (17) have also been associated with bipolar depression. One prospective study suggested that combinations of clinical predictors, such as early onset of symptoms, family history of bipolar disorder, and hypersomnia/slowing, could distinguish the two disorders with specificity as high as 98% (12). These findings, however, are derived from selected samples and have rarely been replicated. We therefore compared clinical and sociodemographic features of major depressive disorder and bipolar disorder in a large cohort of outpatients participating in three clinical trials for the treatment of major depressive episodes.
Baseline data were compared for subjects participating at U.S. outpatient sites in two major depressive disorder treatment studies and one bipolar disorder treatment study conducted between 1999 and 2001. Both major depressive disorder studies were multicenter, parallel, double-blind, randomized, placebo-controlled clinical trials carried out exclusively in the United States. Study 1 compared duloxetine with placebo in the acute (8-week) treatment of patients diagnosed with major depressive disorder, and study 2 compared two doses of duloxetine with paroxetine and placebo. Each study comprised two identical trials run in parallel under the same protocol. Primary results of one trial from each study have been reported previously (27, 28). For this analysis, baseline data from both trials were pooled within each study. Inclusion criteria were as follows: 1) a primary DSM-IV diagnosis of nonpsychotic major depressive disorder, as established by the Mini-International Neuropsychiatric Interview (29); 2) at least moderate depression, defined as a Clinical Global Impression (CGI) severity score of 4 or higher; and 3) a Hamilton Depression Rating Scale (HAM-D) total score of 15 or higher.
The bipolar depression study was a multicenter, parallel, double-blind, randomized, placebo-controlled clinical trial carried out in 13 countries to compare the efficacy and safety of olanzapine and the olanzapine-fluoxetine combination with those of placebo. Primary study results have been reported previously (30). Inclusion criteria were 1) a DSM-IV diagnosis of bipolar I disorder with a current depressive episode, according to the Structured Clinical Interview for DSM-IV, and 2) a baseline Montgomery-Åsberg Depression Rating Scale (MADRS) (31) total score of 20 or higher. HAM-D and CGI severity scores were not used as entry criteria in this study.
The major depressive disorder and bipolar disorder studies also excluded subjects younger than 18 years, those judged by the investigator to be at serious risk of suicide, those with current substance use disorders, and pregnant or breastfeeding women. Subjects with comorbid anxiety disorders were eligible, provided that the anxiety disorder was not the current primary diagnosis.
Because we were interested in the diagnosis of depressive episodes in the outpatient setting, all inpatients and all psychotic patients were excluded from the bipolar disorder study sample. In addition, because the bipolar disorder study enrolled patients both in the United States and in other countries, whereas the major depressive disorder studies enrolled only U.S. patients, non-U.S. patients in the bipolar study were also excluded from this analysis. This yielded 477 nonpsychotic U.S. bipolar outpatients with baseline MADRS data. The major depressive disorder samples comprised 367 patients from study 1 and 707 patients from study 2.
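The sample-selection rules above can be expressed as a simple filter. The following Python sketch is illustrative only: the record layout and field names (`study`, `country`, `inpatient`, `psychotic`, `baseline_madrs`) are hypothetical, and it assumes the major depressive disorder records need no further filtering because those trials enrolled only nonpsychotic U.S. outpatients.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    # Hypothetical record layout; field names are illustrative only.
    study: str                        # "bipolar", "mdd1", or "mdd2"
    country: str                      # e.g., "US"
    inpatient: bool
    psychotic: bool
    baseline_madrs: Optional[float]   # None if the baseline MADRS is missing


def in_analysis_set(s: Subject) -> bool:
    """Selection rules for this analysis, as described in the text."""
    if s.study == "bipolar":
        # Keep only U.S., nonpsychotic outpatients with a baseline MADRS.
        return (s.country == "US" and not s.inpatient
                and not s.psychotic and s.baseline_madrs is not None)
    # Assumption: the MDD trials enrolled only U.S. nonpsychotic outpatients,
    # so their subjects pass through unfiltered.
    return True


def count_by_study(subjects):
    """Number of retained subjects per study."""
    counts = {}
    for s in subjects:
        if in_analysis_set(s):
            counts[s.study] = counts.get(s.study, 0) + 1
    return counts
```

Applied to the full baseline data sets, a filter of this kind would produce the per-study sample sizes reported above.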
Demographic and illness characteristics were compared between the bipolar disorder study and the two major depressive disorder studies by using chi-square tests for categorical variables and analysis of variance for continuous measures. For each MADRS and Hamilton Anxiety Rating Scale (HAM-A) item, the studies were compared by using analysis of covariance (ANCOVA), with the total MADRS score as the covariate; the adjusted (least squares) means from the ANCOVA model are reported. To account for multiple comparisons, we considered a difference statistically significant only if p<0.05 in both sets of comparisons (i.e., the bipolar disorder study versus major depressive disorder study 1 and the bipolar disorder study versus major depressive disorder study 2).
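As a minimal sketch of these comparisons (not the authors' actual code, which is not described), the Python functions below compute a Pearson chi-square test for a 2×2 table (one categorical characteristic, bipolar study versus one major depressive disorder study), the least squares (adjusted) means of a one-way ANCOVA with a single covariate and a common slope, and the rule that a difference counts only if it is significant against both major depressive disorder studies. The function names and the common-slope assumption are ours.

```python
import math
from statistics import mean


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g., rows = study, columns = characteristic yes/no.
    Returns (statistic, p value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def ancova_adjusted_means(groups):
    """groups maps a label to a list of (covariate, outcome) pairs, e.g.,
    (total MADRS, item score). Returns least squares means from a one-way
    ANCOVA with a common slope, evaluated at the grand covariate mean."""
    sxx = sxy = 0.0
    all_x = []
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        all_x.extend(x for x, _ in pairs)
    slope = sxy / sxx            # pooled within-group regression slope
    grand_x = mean(all_x)
    # Adjusted mean = group mean - slope * (group covariate mean - grand mean).
    return {g: mean(y for _, y in pairs)
               - slope * (mean(x for x, _ in pairs) - grand_x)
            for g, pairs in groups.items()}


def significant_in_both(p_vs_study1, p_vs_study2, alpha=0.05):
    """Count a difference only if the bipolar group differs from BOTH
    major depressive disorder studies at the given alpha."""
    return p_vs_study1 < alpha and p_vs_study2 < alpha
```

Evaluating every group at the same covariate value is what removes the severity imbalance: two groups whose item scores differ only because their total MADRS scores differ receive the same adjusted mean.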
A forward stepwise logistic regression was performed to identify the best predictors of major depressive disorder versus bipolar disorder. At each step, the procedure evaluated all candidate variables not yet in the model and selected the one that best discriminated between major depressive disorder and bipolar disorder, based on its chi-square statistic. If the p value for that variable was less than 0.05, the variable was added to the model and the remaining candidates were reevaluated. This process was repeated until no remaining variable was associated with diagnosis at p<0.05. Candidate variables were all individual MADRS and HAM-A items, along with family history, age at onset of illness, and number of prior depressive episodes.
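The forward-selection loop described above can be sketched in plain Python. This is an illustrative implementation, not the software used in the study: it fits each logistic model by Newton-Raphson and uses the likelihood-ratio chi-square (1 df) as the entry statistic, whereas statistical packages often use the score chi-square for entry. The `forced` argument holds variables kept in the model throughout, such as the total MADRS score described in the next paragraph; all function and variable names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def _linear(b, x):
    return sum(bj * xj for bj, xj in zip(b, x))


def fit_logistic(rows, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-x.b)).
    Each row must start with 1.0 for the intercept. Returns (b, log-likelihood)."""
    k = len(rows[0])
    b = [0.0] * k
    for _ in range(iters):
        p = [1 / (1 + math.exp(-_linear(b, x))) for x in rows]
        grad = [sum((yi - pi) * x[j] for x, yi, pi in zip(rows, y, p)) for j in range(k)]
        info = [[sum(pi * (1 - pi) * x[i] * x[j] for x, pi in zip(rows, p))
                 for j in range(k)] for i in range(k)]
        step = _solve(info, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = sum(yi * _linear(b, x) - math.log1p(math.exp(_linear(b, x)))
             for x, yi in zip(rows, y))
    return b, ll


def forward_stepwise(columns, y, candidates, forced=(), alpha=0.05):
    """columns: dict of name -> values. Starts from intercept + forced terms and
    repeatedly adds the candidate with the largest likelihood-ratio chi-square
    while its 1-df p value is below alpha. Returns forced + selected names."""
    def design(names):
        return [[1.0] + [columns[n][i] for n in names] for i in range(len(y))]

    selected = list(forced)
    _, ll_cur = fit_logistic(design(selected), y)
    remaining = [c for c in candidates if c not in selected]
    while remaining:
        best = None
        for name in remaining:
            _, ll_new = fit_logistic(design(selected + [name]), y)
            lr = 2 * (ll_new - ll_cur)
            if best is None or lr > best[1]:
                best = (name, lr, ll_new)
        name, lr, ll_new = best
        p = math.erfc(math.sqrt(max(lr, 0.0) / 2))
        if p >= alpha:
            break
        selected.append(name)
        remaining.remove(name)
        ll_cur = ll_new
    return selected
```

For example, with a dictionary of item scores and a 0/1 diagnosis vector, `forward_stepwise(columns, y, candidates, forced=["madrs_total"])` returns the forced term followed by the selected predictors in order of entry. The sketch does not guard against complete separation, in which case Newton-Raphson would not converge.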
The total MADRS score was forced into the model at the outset and retained throughout the stepwise procedure. This ensured that the additional variables selected captured only differences between the two groups that were not simply driven by overall depression severity. When predictive power was assessed, the MADRS total score was excluded from the model; instead, the variables selected by the stepwise procedure were used as predictors, with each symptom score adjusted for MADRS total score. The adjustment was accomplished by replacing raw predictor scores with their residuals from a regression on the total MADRS score. This step was necessary because the MADRS total score was known to be unbalanced between the studies as a result of their different severity inclusion criteria. A receiver-operating characteristic curve was also constructed, and the area under the curve was calculated to summarize the predictive power of the logistic model.
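The residual adjustment and the area under the receiver-operating characteristic curve can each be computed in a few lines. In this hedged Python sketch, `residualize` replaces each item score with its residual from an ordinary least-squares regression on the total MADRS score (one straightforward reading of the adjustment described above), and `auc` uses the Mann-Whitney formulation, which equals the area under the empirical receiver-operating characteristic curve. Function names are illustrative.

```python
from statistics import mean


def residualize(scores, covariate):
    """Residuals of scores after ordinary least-squares regression on covariate
    (e.g., an item score regressed on the total MADRS score)."""
    mx, my = mean(covariate), mean(scores)
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (y - my) for x, y in zip(covariate, scores)) / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(covariate, scores)]


def auc(probs, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive (label 1) has a higher predicted probability than a randomly
    chosen negative (label 0), with ties counted as 1/2."""
    pos = [p for p, lab in zip(probs, labels) if lab == 1]
    neg = [p for p, lab in zip(probs, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

By construction, the residuals sum to zero and are uncorrelated with the total MADRS score, so the adjusted predictors carry no linear information about overall severity.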
As expected, age at onset of mood symptoms was about 8 years earlier for the bipolar patients than for either major depressive disorder cohort (Table 1). Family history of major depressive disorder did not differ significantly between the groups; however, family history of bipolar disorder was more common among subjects with bipolar disorder. The number of prior depressive episodes was also significantly greater among subjects with bipolar disorder. Indeed, for 40% of the bipolar patients, this number was recorded as "too numerous to count" (these patients are included in the ">25" category in Table 1), suggesting a greater prevalence of indistinct or highly recurrent episodes in this group.
Figure 1 shows univariate comparisons of individual depressive symptoms among the three groups, as measured by the MADRS and adjusted for overall depressive severity. Five items (apparent sadness, tension, reduced sleep, pessimistic thoughts, and suicidal thoughts) differed significantly between the bipolar group and each of the two major depressive disorder groups. Similarly, Figure 2 shows univariate comparisons of anxiety symptoms among the three groups, as measured by the HAM-A and again adjusted for overall depressive severity. In these comparisons, nine items differed significantly between the bipolar group and each of the two major depressive disorder groups. The score for fears was significantly higher for the bipolar patients, whereas the scores for insomnia and for intellectual (cognitive), somatic (muscular), respiratory, gastrointestinal, genitourinary, and autonomic symptoms were all significantly lower. The score for behavior at the interview was also significantly lower for the bipolar patients than for the major depressive disorder patients.
Clinical features and rating scale items were then entered as candidate predictors in a stepwise logistic regression with diagnosis (bipolar disorder versus major depressive disorder) as the outcome; significant predictors are presented in Table 2. (The MADRS total score was forced into the model to adjust for differences in overall severity; when it was not forced into the model, it was the first term to enter, and the results did not change.) With a cutoff point of 0.5, the logistic regression model with the MADRS total score omitted correctly classified 1,316 of 1,514 subjects (86.9%), with a sensitivity of 69.0% (the probability of predicting bipolar disorder when the actual diagnosis was bipolar disorder) and a specificity of 94.9% (the probability of predicting major depressive disorder when the actual diagnosis was major depressive disorder). The model appeared to fit the data adequately (Hosmer-Lemeshow p>0.40, indicating no evidence of lack of fit). The receiver-operating characteristic curve across a range of cutoff points is presented in Figure 3; the area under the curve was estimated to be 0.914. The area under the curve may be interpreted as the probability that predictions and outcomes are concordant, that is, that a randomly selected bipolar subject receives a higher predicted probability of bipolar disorder than a randomly selected subject with major depressive disorder. A value of 0.50 indicates prediction no better than chance, whereas a value of 1.0 indicates perfect discrimination.
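The classification summary and goodness-of-fit test reported above can be sketched as follows. Assumptions: a predicted probability at or above the cutoff is classified as bipolar (label 1); the Hosmer-Lemeshow statistic uses 10 groups of roughly equal size formed from sorted predicted probabilities, giving 8 degrees of freedom; and its p value uses the closed-form chi-square tail that holds only for even degrees of freedom. Software packages differ in how they form groups, particularly with tied predictions, so results from this sketch may not match them exactly.

```python
import math


def classify_metrics(probs, labels, cutoff=0.5):
    """Sensitivity, specificity, and accuracy when P(bipolar) >= cutoff is
    called bipolar (label 1) and anything lower is called MDD (label 0)."""
    tp = sum(1 for p, lab in zip(probs, labels) if p >= cutoff and lab == 1)
    tn = sum(1 for p, lab in zip(probs, labels) if p < cutoff and lab == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(labels)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for
    even df: exp(-x/2) * sum_{i<df/2} (x/2)**i / i!."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


def hosmer_lemeshow(probs, labels, groups=10):
    """Sort by predicted risk, split into roughly equal groups, and sum
    (observed - expected)^2 / (expected * (1 - expected / n)) over groups.
    Groups whose expected count is 0 or n would divide by zero."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        n = len(idx)
        observed = sum(labels[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

Overall accuracy is simply correct classifications divided by total subjects (for the model above, 1,316/1,514 ≈ 0.869). A large Hosmer-Lemeshow p value, such as the p>0.40 reported here, means that observed and expected event counts agree across risk groups about as well as chance would predict.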
The stepwise logistic regression was also rerun with family history and number of previous depressive episodes excluded from the candidate variables, to minimize possible observer bias (i.e., raters in a clinical trial for bipolar disorder might be more thorough in ascertaining family history of bipolar disorder or might expect a greater number of episodes). Compared with the original model, this model added two MADRS items and dropped two HAM-A items (Table 2). With a cutoff point of 0.5, it correctly classified 1,164 of 1,545 subjects (75.3%), with a sensitivity of 42.8% and a specificity of 89.9%. The area under the receiver-operating characteristic curve was estimated to be 0.768.
Delayed recognition of bipolar disorder appears to be common (32), even in more recent investigations (3, 4, 6). Among misdiagnosed bipolar patients, outcomes appear to be poorer when initiation of a mood stabilizer is delayed (9). Exposure to antidepressants, particularly in the absence of mood stabilizers, can precipitate switching into manic or mixed states or cycle acceleration in a subset of bipolar patients (7, 8). Conversely, although this is rarely discussed in the literature, patients with major depressive disorder who are unnecessarily treated with mood stabilizers would also be expected to have poorer outcomes because of side effects or a lower likelihood of treatment response. Distinguishing patients with major depressive disorder from patients with bipolar disorder in a depressive episode is therefore of profound clinical importance.
In this analysis, which is to our knowledge the largest systematic comparison of subjects with major depressive disorder and subjects with bipolar disorder to date, we identified both sociodemographic and clinical features associated with a diagnosis of bipolar disorder versus major depressive disorder. Of note, most of these individual differences are modest, but in aggregate they allowed bipolar disorder to be differentiated from major depressive disorder with good specificity.
Epidemiological studies suggest that the mean age at illness onset is earlier among bipolar patients than among those with major depressive disorder, with one study estimating a mean difference of 6 years (33). Likewise, family and twin studies have established the familiality of bipolar disorder, so our finding that bipolar disorder is more common in family members of bipolar subjects is expected (34). Perhaps more important, rates of family history of major depressive disorder were similar in the two groups, highlighting the fact that bipolar patients frequently have unipolar family members and vice versa (35, 36).
Somatic symptoms of depression and anxiety, in particular the somatic (muscular), respiratory, and genitourinary items from the HAM-A, were more severe in the major depressive disorder group. The role of somatic symptoms has recently received renewed attention in major depressive disorder (37) but, to our knowledge, has not previously been examined in bipolar disorder. Conversely, tension/edginess and fearfulness were more severe among subjects with bipolar disorder than among subjects with major depressive disorder. None of the trials excluded comorbid anxiety disorders unless they were considered primary, i.e., more clinically important than the mood disorder.
A clear limitation of the present report is the source of the patients. Participants in clinical efficacy trials are known to differ from general clinical populations (38, 39). We report these parameters to allow comparison with other studies, not to suggest that the model should be applied clinically before it has been validated.
A second limitation is that some features of depression previously associated with bipolar disorder were not assessed; in particular, reverse neurovegetative symptoms such as hypersomnia or hyperphagia are not captured by the MADRS. Because the rating scales used as entry criteria score reduced sleep but not increased sleep, the inclusion criteria would be expected to yield groups enriched for insomnia rather than hypersomnia. Still, our finding that insomnia is more strongly associated with major depressive disorder indirectly supports the association of at least one atypical depressive feature with bipolar disorder. Including these features might further improve predictive power.
The bipolar sample examined here did not include bipolar II subjects, who may be more difficult than bipolar I subjects to distinguish from subjects with major depressive disorder because their episodes of mood elevation are less severe (40). However, one prospective study suggested that, on the basis of clinical features other than mania, bipolar II patients may be more easily distinguished from patients with major depressive disorder than bipolar I patients are (39).
Finally, it is possible that some major depressive disorder subjects in this study, particularly those early in their course (e.g., in a first depressive episode), will go on to experience a manic/hypomanic episode and be rediagnosed with bipolar disorder (32, 40, 41). However, 79% of major depressive disorder subjects were older than age 30 and thus beyond the peak period of risk for a first manic episode (2); moreover, excluding first-episode subjects from the regression models yielded essentially identical results (not shown).
In summary, this comparative study suggests that, in addition to age at onset, recurrence, and family history, individual symptoms, particularly those related to anxiety (both somatic and cognitive), may be useful in distinguishing bipolar disorder from major depressive disorder. Although no single symptom discriminated between the diagnoses, it was possible to construct a model with significant predictive value even without information about manic or hypomanic symptoms. This approach may be particularly useful for the initial identification of depressed patients at high risk for a bipolar course, who could then be monitored more closely after antidepressant initiation. These subtle symptom differences may also inform future studies that attempt to identify neurobiological features distinguishing the two forms of depressive episode (42).
Received Jan. 10, 2005; revision received March 23, 2005; accepted May 13, 2005. From the Bipolar Clinic and Research Program, Massachusetts General Hospital and Harvard Medical School; and Eli Lilly and Company, Indianapolis. Address correspondence and reprint requests to Dr. Perlis, Massachusetts General Hospital, ACC 812, 15 Parkman St., Boston, MA 02114.
Dr. Perlis is supported by NIMH grant K23 MH-067060. Dr. Perlis has received honoraria or consulting fees from AstraZeneca, Bristol-Myers Squibb, Eli Lilly and Company, GlaxoSmithKline, and Pfizer. Dr. Nierenberg has received honoraria, consulting fees, or grant support from Eli Lilly and Company, GlaxoSmithKline, Janssen, Shire, Innapharma, Wyeth, Cyberonics, Lichtwer, Cederroth, and Forest. Drs. Brown and Baker are employees of, and stockholders in, Eli Lilly and Company.
Figure 1. Least Squares Mean Scores From Individual Baseline Montgomery-Åsberg Depression Rating Scale (MADRS) Items, Adjusted for Total MADRS Scoreᵃ
ᵃ Bipolar patients differed significantly from patients in both studies of major depressive disorder (p<0.05).
Figure 2. Least Squares Mean Scores From Individual Baseline Hamilton Anxiety Rating Scale Items, Adjusted for Total Montgomery-Åsberg Depression Rating Scale Scoreᵃ
ᵃ Bipolar patients differed significantly from patients in both studies of major depressive disorder (p<0.05).
Figure 3. Receiver-Operating Characteristic Curve for Model of Bipolar Depression Versus Major Depressive Disorderᵃ
ᵃ The model corresponds to Table 2 (family history of bipolar disorder, age at onset of illness, previous number of depressive episodes, Montgomery-Åsberg Depression Rating Scale [MADRS] item 1 score, and Hamilton Anxiety Rating Scale items 3, 4, 5, 7, 10, 12, and 14) but excludes the MADRS total score.