A century after Kraepelin’s classic work, controversy persists about the natural history of manic-depressive illness. In this issue of the Journal, a study from McLean Hospital by Tohen and associates presents new data on manic patients followed for up to 4 years after their first hospitalization. This sample was mostly white (85%), predominantly of middle and upper socioeconomic status, and relatively old in terms of age at illness onset (mean=32 years), and subjects were mainly psychotic (89%) and exhibiting pure mania (75%) and low psychiatric comorbidity (8%, excluding substance abuse). This cohort might be thought of as "Kraepelinian," since it has many features that coincide with the kinds of patients Kraepelin described. Grof et al. (1) have suggested that such "classic" Kraepelinian patients may have a different (and less severe) course than less classic populations (e.g., those with earlier age at onset, more mixed states, less psychosis, more comorbidity, and more variable racial and socioeconomic subgroups, such as in the Stanley Network cohort) (2). With this Kraepelinian cohort, we observe almost universal recovery from the acute episode (98%), as Kraepelin famously claimed. However, even in this relatively good-prognosis cohort, compared to Kraepelin’s observations, fewer patients (72%) achieved full symptomatic recovery (i.e., were in complete remission), and even fewer achieved functional recovery (43%). One important conclusion is that full symptomatic recovery (remission) does not guarantee functional recovery. Why do euthymic bipolar patients remain functionally impaired? What nonpharmacologic interventions are needed?
Methodologically, the authors are to be commended for including detailed treatment data in this course of illness study. Unfortunately, papers about the "natural" course of bipolar disorder all too often give no accounting of treatments. The days of true natural history studies are over, since almost all samples today include a large proportion of patients who are being or have been treated. Clearly an accounting of that treatment is necessary to understand what we are observing: how much is natural history, and how much is the effect of treatments? Such data are also important in themselves in providing evidence regarding the real-world impact of medications in nonclinical trial settings.
For instance, it is notable that antidepressants were not infrequently prescribed at discharge following the index manic episode, presumably for the depressive component of mixed symptoms or perhaps in anticipation of a postmanic episode of depression. The use of antidepressants under these circumstances has been observed in numerous studies (3) despite evidence that antidepressants may worsen mixed states (4). Further, antidepressants were associated with more rapid relapse than other agents, as will be subsequently described.
The antidepressant effect may suggest an association with increased mood cycling, as suggested by other reports (5). However, this association was statistically nonsignificant, which might lead some readers to discount it. This raises another methodological point. The use of p values to assess statistical significance does not imply that an observation is "real" or not, except in the primary analysis of a randomized clinical trial. In that setting, there is a hypothesis (e.g., drug is more effective than placebo) tested by using a p value. However, in observational studies, much of the observed data are not predicted beforehand or are not specifically chosen as primary hypotheses. In those cases, p value comparisons are ambiguous. Since relevant specific hypotheses were not made before certain observations and analyses were conducted, one cannot justifiably assert that one is engaged in "hypothesis testing."
A better approach, recommended by mainstream epidemiology today (6), is to use effect estimate methods. Effect estimates are simply the observations (e.g., antidepressant use was associated with a mean of 12.4 weeks until 25% of patients had a new episode, versus 30.7 weeks seen with lithium use). Further, one reports the range of effect estimates one would expect to be observed in repeated studies (based on the observed effect and standard error), i.e., the 95% confidence interval. In the case of antidepressants, the range of possible effect estimates is as low as 5.1 weeks and as high as 31.3 weeks. In the case of lithium, the range is as low as 20.9 weeks and as high as 43.6 weeks. Inspection of the confidence intervals suggests that indeed there may be a difference here: lithium seems to have a much longer range of likely delay until onset of a new episode. Perhaps the best way to make this point is to translate observed mean effects into relative risks (in this case a hazard ratio with confidence intervals). Thus, antidepressants appear to lead to relapse almost twice as fast as lithium.
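As a rough illustration of how an effect estimate and its 95% confidence interval are derived from an observed mean and its standard error, consider the following minimal sketch. The weeks-to-relapse values and the function name `mean_with_ci` are invented for illustration and are not taken from the study by Tohen et al.; the sketch also assumes a simple normal approximation (mean plus or minus 1.96 standard errors), which is unlikely to be the method behind the asymmetric intervals quoted above.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation 95% interval.

    Assumption: the sample mean is roughly normally distributed, so the
    interval is mean +/- z * standard error. Real survival analyses
    typically use other methods.
    """
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Hypothetical weeks-to-relapse values, invented for illustration only.
group_a = [6, 9, 11, 12, 14, 15, 18, 22]
group_b = [20, 24, 27, 30, 31, 35, 38, 41]

for label, weeks in (("group A", group_a), ("group B", group_b)):
    mean, lower, upper = mean_with_ci(weeks)
    print(f"{label}: mean {mean:.1f} weeks, 95% CI {lower:.1f} to {upper:.1f}")
```

Reporting the estimate together with its interval, rather than a bare p value, lets the reader judge both the size of an apparent difference and how precisely it has been measured.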
Even with these analyses, these are observational data and are thus liable to confounding effects, i.e., other variables that could explain or contribute to the outcome. Statistical methods of adjustment for confounding include multivariable regression analysis. Too frequently, even in prominent journals, observational studies are published without any attempt to adjust the results for confounding variables. This is an important flaw that makes large chunks of the published literature essentially uninterpretable. We are surprised how frequently clinicians and even researchers are not aware of the basic rationale for the higher validity of randomized studies, i.e., control for confounding variables. For instance, if we think that female gender is associated with more recovery from the index manic episode, we need to consider the possibility that something else might explain the effect: perhaps most female subjects in our study are older than the male subjects, and thus the apparent gender effect is in fact an age effect. In randomized studies of adequate size, all variables, measured or unmeasured, are on average equally distributed between groups, and thus the experimental effect under study (e.g., drug versus placebo) is not liable to confounding. Outside of randomized datasets, however, statistical analyses (like stratification or regression), as done in the study by Tohen et al., should be the norm. It would have been even more informative if medications (like antidepressant use) had been included in regression models to assess their effects on course.
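The gender and age example above can be made concrete with a small stratification sketch. All counts below are hypothetical and were chosen only to show the mechanism; they are not data from the study by Tohen et al., and the names `counts`, `crude_rate`, and `stratum_rates` are illustrative.

```python
# Hypothetical (recovered, total) counts per age stratum and gender,
# invented for illustration only. Women are mostly older, men mostly younger.
counts = {
    "older": {"female": (60, 80), "male": (15, 20)},
    "younger": {"female": (8, 20), "male": (32, 80)},
}


def crude_rate(gender):
    """Recovery rate for a gender, ignoring age (the unadjusted estimate)."""
    recovered = sum(counts[stratum][gender][0] for stratum in counts)
    total = sum(counts[stratum][gender][1] for stratum in counts)
    return recovered / total


def stratum_rates(gender):
    """Recovery rate for a gender within each age stratum."""
    return {
        stratum: counts[stratum][gender][0] / counts[stratum][gender][1]
        for stratum in counts
    }


for gender in ("female", "male"):
    print(gender, "crude:", round(crude_rate(gender), 2),
          "by age:", stratum_rates(gender))
```

In these invented data the crude recovery rate is higher in women (68% versus 47%), yet within each age stratum the rates for women and men are identical, so the apparent gender effect is entirely an age effect. Regression adjustment pursues the same goal when several potential confounders must be handled at once.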
Using this appropriate methodology, Tohen and colleagues find that better outcome seems to be associated with female gender, older age, less depression, and shorter index hospitalization. Further, among other predictors, lower occupational status appeared to predict relapse into a manic episode, while higher occupational status predicted relapse into a depressive episode. These data confirm previous suggestions that indeed there is a better prognosis for subjects who are female, are older, and have a less depressive presentation. The mechanisms for these predictors deserve more research. Is better prognosis in female subjects a reflection of better compliance (7)? The influence of occupational status on polarity of recurrence is unique and needs to be replicated. It could be a chance finding, and indeed the mechanism of such an association, if real, seems quite unclear. A relevant point is that the functional assessments were conducted by means of a telephone interview. Future research might expand on this topic by using more detailed functional assessments (8).
In another paper in this issue related to bipolar disorder, Maj and colleagues focus on assessing the diagnostic validity of agitated depression. Standard methods for diagnostic validation were used to compare agitated and nonagitated depression groups. Differences were found in phenomenology, no differences were found in family history, and there was a possible difference in course and a probable difference in treatment response. However, all of these conclusions are based on observational data unadjusted for confounding variables and thus are not definitive. For example, the survival curve suggesting a different course for agitated versus nonagitated depression would be more definitive if it were the result of a Cox regression model adjusted for potential confounding factors.
The main finding of this report is that about two-thirds of patients with agitated depression also experienced a number of manic symptoms, which could be considered a mixed state. Yet definitive conclusions about the validity of this subgroup (which we might call "mixed-state agitated depression" as opposed to "non-mixed-state agitated depression") cannot be drawn because the diagnostic validator comparisons were not made between this mixed-state subgroup and the non-mixed-state subgroup of patients with agitated depression (nor was there a comparison between the mixed-state subgroup and the nonagitated depression group). This is unfortunate, since it is an important question whether it is legitimate to see agitated depression as a mixed state, as Kraepelin and Weygandt suggested. If so, antidepressants would be less indicated and mood stabilizers more so. Further, changes in the DSM-V nosological schema would seem to be in order on this point.
Nevertheless, these studies deserve our attention for their possible clinical relevance, notably the relatively poor functional recovery rate in manic patients even in those with a good prognosis; the worse prognosis noted especially in younger, male, and depressed subjects; the observation that antidepressants may speed up the frequency of mood episodes; and the subgroup of patients with agitated depression in bipolar disorder who may be experiencing a mixed state.
Address reprint requests to Dr. Goodwin, Center on Neuroscience, Medical Progress, and Society, Department of Psychiatry, George Washington University, 2150 Pennsylvania Ave., NW, 8th Floor, Washington, D.C. 20037.