The term "mood stabilizer" is in wide use in the context of treating bipolar disorder. However, the U.S. Food and Drug Administration (FDA) does not officially recognize the term, and investigators have no consensus definition. Some have suggested that an agent is a mood stabilizer if it has efficacy in decreasing the frequency or severity of any type of episode in bipolar disorder and if it does not worsen the frequency or severity of other types of episodes (1–3). Others have proposed a more stringent definition that requires that an agent possess efficacy in treating both manic and depressive symptoms (4). To our knowledge, no reviews have systematically assessed evidence from clinical trials in light of either of these definitions.
Based on the treatment needs of persons with bipolar disorder, we propose—similar to the more stringent definition—that an agent be considered a mood stabilizer if it has efficacy in each of four distinct uses: 1) treatment of acute manic symptoms, 2) treatment of acute depressive symptoms, 3) prevention of manic symptoms, and 4) prevention of depressive symptoms.
Doesn’t this "two-by-two" definition set the bar too high? Can we expect any single agent to treat all phases of bipolar disorder? It is clear that monotherapy is the exception rather than the rule (5, 6). Clinical practice has of necessity moved toward rational polypharmacy for various situations (3), and this approach has been endorsed by numerous evidence-based (e.g., references 7–10) and survey-based (11) clinical practice guidelines. Nonetheless, the two-by-two conceptualization clarifies characteristics of the ideal agent.
What about psychotic symptoms? These symptoms are not considered in the definition because they are not a core aspect of bipolar disorder, they occur in a minority of cases, and they are not required for the diagnosis, according to DSM-IV-TR.
What about mixed and hypomanic episodes? Debate continues regarding the definition of mixed episodes and the significance of dysphoria during hypomanic or manic episodes (12, 13). Most studies treat mixed or dysphoric mania as a subset of mania rather than of depressive episodes in terms of phenomenology (12, 14) and treatment response (12, 15, 16). The core symptom sets for mania and hypomania are identical, and the syndromes are distinguished only by the presence of functional impairment, the need for hospitalization, or psychosis in mania (DSM-IV-TR). Therefore, we consider mixed and hypomanic episodes within the overall category of mania and note results for these episodes separately.
Having proposed this conceptualization a priori, we reviewed all available English-language, peer-reviewed, controlled clinical trials of agents used for bipolar disorder through mid-2002 against this definition. We took an "FDA-like" approach, identifying as mood stabilizers those agents with at least two positive placebo-controlled trials in each of the four uses. We subsequently varied this threshold in a sensitivity analysis to provide relaxed and more stringent criteria and, where evidence permitted, also addressed the looser definition of mood stabilizer.
We sought to locate all pharmacologic treatment studies for bipolar disorder that fulfilled the following criteria: 1) published in a peer-reviewed journal through June 2002; 2) investigated an intervention in a group of patients with bipolar disorder (N=4 or more) or reported separately results for subgroups of bipolar disorder subjects from a diagnostically heterogeneous group; 3) specified quantitative outcome variables and formal statistical analyses; and 4) reported in English.
Literature databases, including MEDLINE, PsychLIT, and the Cochrane Collaboration database, were searched. Authors known to be actively working in the field in the United States and Europe were contacted regarding further work in print or in press. The bibliography of each located article was searched for additional articles. This step was repeated iteratively until no further unreviewed references were found.
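The iterative bibliography search described above amounts to following references until no unreviewed article remains. A minimal sketch of that stopping rule is given below; the function name `snowball`, the `get_references` callback, and the toy citation graph are hypothetical and serve only as illustration, not as a description of the tools actually used in this review.

```python
from collections import deque


def snowball(seed_ids, get_references):
    """Follow bibliographies until no unreviewed reference remains.

    seed_ids: articles located by the database search (hypothetical IDs).
    get_references: callable returning the bibliography of one article.
    """
    reviewed = set()
    queue = deque(seed_ids)
    while queue:
        article = queue.popleft()
        if article in reviewed:
            continue  # already searched this bibliography
        reviewed.add(article)
        for ref in get_references(article):
            if ref not in reviewed:
                queue.append(ref)
    return reviewed


# Invented citation graph, for illustration only.
toy_bibliographies = {"A": ["B", "C"], "B": ["C", "D"], "C": [], "D": ["A"]}
found = snowball(["A"], lambda a: toy_bibliographies.get(a, []))
```

The loop terminates because each article is reviewed at most once, which corresponds to repeating the step "until no further unreviewed references were found."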
From the trials that were located we culled class A studies (controlled trials), defined according to the criteria of the U.S. Agency for Healthcare Research and Quality (formerly the U.S. Agency for Health Care Policy and Research) (17) as operationalized in prior studies (9, 18) and summarized in Table 1. Evidence-grading agreement between co-authors was high (kappa=0.85).
The class A trials were further categorized according to the four uses. Trials of treatments for acute manic (including hypomanic or mixed episodes) or depressive episodes included subjects who were acutely ill and examined the effects of treatment during a single episode of illness. Prophylactic trials (we use this term rather than "continuation" or "maintenance," since differentiation between states of recovery and remission [19] was not typically made) typically included subjects with some level of subacute symptoms and examined the effects of treatment over periods of months to years.
Trials were also classified according to the use of placebo versus active control and according to whether they used parallel-groups or within-subjects (ABA) designs alternating agent (A) and no agent, placebo, or active control (B). They were also separated according to whether they examined use of monotherapy or combination therapy, with the latter group including studies of subjects who were receiving an ongoing medication regimen to which an additional agent (or control condition) was added.
The trials were also classified according to agent of interest. With placebo-controlled trials, identification of the agent of interest was straightforward. In trials in which two agents were compared to placebo, both were listed as agents of interest. With active control trials (e.g., lithium versus chlorpromazine or carbamazepine versus lithium for acute mania), a judgment of which agent was the agent of interest and which was the established comparator was made on the basis of study design features reported in the article.
Traditional reviews treat study findings as qualitative data and integrate them according to the reviewer’s judgment of the studies’ relevance, quality, etc. On the other end of the spectrum, meta-analytic techniques treat study findings in aggregate as an interval datum, converting all to a common metric and making a summary judgment based on statistical techniques (20). Recent meta-analytic reviews of treatments for bipolar disorder have evaluated the efficacy of carbamazepine in prophylaxis (21), lithium in acute mania (22), lithium in prophylaxis (23), and valproate in maintenance (24).
Although meta-analytic techniques could potentially be used for establishing each individual use considered in our definition of mood stabilizers, they cannot easily be used for the synthesis across uses. Moreover, we anticipated small numbers of class A trials in some use subcategories for several agents. We therefore treated trial findings as categorical data (positive/negative). As our a priori criterion we opted for an FDA-like approach, requiring at least two placebo-controlled trials with positive findings for an agent to be considered efficacious in a given use. For placebo-controlled trials, the positive-trial criterion was a better effect of the agent of interest than of placebo at p<0.05 (for parallel-groups trials) or a ≥50% relapse rate with placebo substitution (for ABA trials). For active control trials (parallel-groups or ABA trials), the positive-trial criterion was a better effect of the agent of interest than of the comparator or an effect of the agent of interest that was equal to or better than that of the comparator and better than baseline at p<0.05.
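The positive-trial criteria above can be restated as a simple decision rule. The sketch below is one possible encoding, not the procedure actually used to code the trials; the `Trial` fields and the function name `is_positive` are hypothetical, and applying p<0.05 to the superiority branch of active control trials is an assumption, since the text attaches the significance level explicitly only to the baseline comparison.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 0.05  # significance level stated in the text


@dataclass
class Trial:
    control: str                                  # "placebo" or "active"
    design: str                                   # "parallel" or "ABA"
    agent_better: bool = False                    # agent outperformed the control
    p_vs_control: Optional[float] = None          # p value, agent vs. control
    placebo_relapse_rate: Optional[float] = None  # ABA placebo substitution
    equal_or_better: bool = False                 # agent at least equal to comparator
    p_vs_baseline: Optional[float] = None         # p value, agent vs. baseline


def _significant(p: Optional[float]) -> bool:
    return p is not None and p < ALPHA


def is_positive(t: Trial) -> bool:
    """Categorize one trial as positive (True) or negative (False)."""
    if t.control == "placebo":
        if t.design == "parallel":
            return t.agent_better and _significant(t.p_vs_control)
        if t.design == "ABA":
            return (t.placebo_relapse_rate is not None
                    and t.placebo_relapse_rate >= 0.50)
        raise ValueError(f"unknown design: {t.design}")
    if t.control == "active":
        # Assumption: superiority over the comparator is also judged at p<0.05.
        superior = t.agent_better and _significant(t.p_vs_control)
        equal_and_improved = t.equal_or_better and _significant(t.p_vs_baseline)
        return superior or equal_and_improved
    raise ValueError(f"unknown control: {t.control}")
```

Under this encoding, an active control trial showing equivalence to the comparator without documented improvement over baseline is categorized as negative.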
Trials were categorized as positive or negative on the basis of the primary outcome variables. If independent analyses were done for several uses or agents, the trial was categorized for each separately (e.g., two agents versus placebo) and noted as such in the results tables. Most prophylaxis trials measured efficacy in preventing relapse to either (hypo)mania or depression rather than to each type of episode separately; where these latter more specific outcomes could be identified among primary or secondary analyses, each was used as a primary outcome variable.
After having analyzed findings for agents according to our main criterion, we conducted a sensitivity analysis both by raising the threshold to use only parallel-groups, placebo-controlled class A trials and by lowering the threshold to include active control class A trials. We also reviewed available data for mixed and hypomanic episodes. Finally, we reviewed the evidence according to a broader mood stabilizer definition suggested by some authors (1–3).
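The a priori criterion and the two sensitivity thresholds can be viewed as three nested evidence tiers applied to the same set of positive trials. The following sketch illustrates that logic under stated assumptions: the tier names, the tuple layout, and the example agent are hypothetical, the trial counts are invented rather than taken from the results tables, and prophylaxis is split by polarity even though most prophylaxis trials did not report relapse separately for each type of episode.

```python
from collections import Counter

USES = ("acute_mania", "acute_depression",
        "prophylaxis_mania", "prophylaxis_depression")

# (control, design) pairs admitted at each threshold (hypothetical labels).
TIERS = {
    "raised": {("placebo", "parallel")},
    "a_priori": {("placebo", "parallel"), ("placebo", "ABA")},
    "relaxed": {("placebo", "parallel"), ("placebo", "ABA"),
                ("active", "parallel"), ("active", "ABA")},
}


def efficacious_uses(positive_trials, tier="a_priori", min_trials=2):
    """Return the uses supported by at least min_trials admissible positive trials.

    positive_trials: list of (use, control, design) tuples for one agent.
    """
    counts = Counter(use for use, control, design in positive_trials
                     if (control, design) in TIERS[tier])
    return {use for use in USES if counts[use] >= min_trials}


def is_mood_stabilizer(positive_trials, tier="a_priori"):
    """Two-by-two definition: efficacy in all four uses."""
    return efficacious_uses(positive_trials, tier) == set(USES)


# Invented agent: parallel-groups placebo evidence in three uses,
# but only ABA placebo evidence for acute depression.
example_agent = (
    [("acute_mania", "placebo", "parallel")] * 2
    + [("acute_depression", "placebo", "ABA")] * 2
    + [("prophylaxis_mania", "placebo", "parallel")] * 2
    + [("prophylaxis_depression", "placebo", "parallel")] * 2
)
```

In this invented case the agent meets the definition at the a priori and relaxed thresholds but not at the raised threshold, because its acute depression evidence comes only from ABA trials.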
Among several thousand citations, we located 551 candidate articles, which provided 111 class A trials (81 monotherapy trials, 30 combination therapy trials; the findings from the combination trials are not presented because of space limitations, but the results are available from the first author on request). Monotherapy trials provided 95 independent analyses, including 48 for treatment of acute mania, 16 for acute depression, and 31 for prophylaxis.
Several classic trials were excluded because of a lack of study group statistics (e.g., references 25 and 26) or a lack of separation of bipolar disorder subjects from subjects with other disorders (e.g., reference 27). Several earlier reports were superseded by later extensions of the same dataset (e.g., references 28 and 29). Two prophylaxis trials were included, although they focused on subsets of bipolar disorder patients, such as those with rapid cycling (30) or with comorbid borderline personality disorder (31). One trial of treatment for acute mania was excluded because it lasted for only one day (32).
The trial characteristics and results are summarized in Tables 2, 3, and 4. For treatment of acute mania, at least one positive placebo-controlled trial supported the efficacy of lithium, valproate, haloperidol, olanzapine, clonazepam, verapamil, or lecithin. In addition, evidence from at least one active control trial supported the efficacy of carbamazepine, lamotrigine, pimozide, risperidone, lorazepam, ECT, ethylenediaminetetraacetate (EDTA) plus ascorbic acid, l-tryptophan, and d,l-propranolol.
In acute depression, at least one positive placebo-controlled trial supported the efficacy of fluoxetine, imipramine, lithium, lamotrigine, piribedil, and ascorbic acid. Evidence from at least one active control trial supported the efficacy additionally of tranylcypromine, ECT, a low-vanadium diet, and EDTA plus ascorbic acid.
At least one positive placebo-controlled trial supported the prophylactic efficacy of lithium, valproate, and lamotrigine, with one trial indicating a trend toward significant effects of carbamazepine versus placebo (103). No additional agents were supported by evidence from active control trials. However, one trial showed equivalence among low- and high-dose carbamazepine and lithium for prophylaxis of both mania and depression (105), and one trial showed equivalence between carbamazepine and lithium (104). These are not ranked as positive trials because they did not meet the a priori criteria of both including data indicating improvement over baseline and exceeding effects of comparator(s).
Candidate Mood Stabilizers According to the "Two-by-Two" Definition
Data for agents that had at least two class A trials (positive or negative), as shown in Tables 2, 3, and 4, and that were therefore candidates for mood stabilizer status, are summarized in Tables 5, 6, and 7. These tables are organized with the highest quality trials (placebo-controlled, parallel-groups trials) listed in the leftmost column, other placebo-controlled (ABA) trials in the next column to the right, and active control trials following rightward.
The trial quality threshold can be conceptually "moved" to the left to raise the evidence threshold or to the right to relax the requirement in a post hoc categorical sensitivity analysis. Although this categorical analysis does not formally distinguish the weight of evidence on the basis of sample size (as a meta-analysis would indirectly do), interpretation of the findings can be tempered by inspection of such considerations, as noted below. Further, no notation is made as to whether a trial was adequately powered for specific comparisons. In most trials the authors did not comment on power, although the authors who did comment sometimes noted insufficient power (e.g., in a study of lithium versus placebo for prophylaxis [100]); nevertheless, by convention such trials are listed, as are those for which no comment on power is available.
For acute mania, lithium, valproate, and olanzapine unequivocally meet the definition criteria. They continue to meet the criteria if the threshold is raised to include only parallel-groups, placebo-controlled trials. Verapamil presents an equivocal case, which can easily be detected with this data array method. Its antimanic efficacy is supported by two positive placebo-controlled trials; however, the total number of subjects in these two trials, 19, is less than the 32 treated in a higher-quality negative trial. If the criterion is relaxed to include also active control trials, carbamazepine and clonazepam are included, and the evidence for verapamil is strengthened somewhat. Note also that chlorpromazine, the comparator for lithium in early trials, is also included by virtue of its being equal to lithium and better than baseline in five trials summarized in Table 2 (35, 37–39, 41), although it is inferior to lithium but still better than baseline in two trials (36, 40). Haloperidol is also included by virtue of one placebo-controlled trial (55) and one active control trial (41).
For acute depression, only lithium is supported by placebo-controlled trials; however, none of these trials had a parallel-groups design. Relaxing the criterion to allow active control trials allows tranylcypromine and ECT to be included.
For prophylaxis, lithium and lamotrigine are each supported by at least two placebo-controlled trials, with the two lamotrigine trials providing analyses without regard to relapse polarity. Lithium is supported by five trials analyzing relapse to either episode, plus two trials reporting relapse specifically to mania and two reporting relapse specifically to depression; one trial, although underpowered for lithium (100), provided negative data for each polarity, and one trial (97) also provided negative data for depression prophylaxis. Evidence for a prophylactic role for valproate is equivocal, with primary analyses negative for mania and depression in the large trial of Bowden and co-workers (102), plus another trial that included subjects with comorbid personality disorder and that reported positive results for manic-like symptoms and negative results for depressive symptoms using measures relevant to, but not specific for, manic and depressive episodes (31).
Lithium also meets the higher threshold requiring parallel-groups, placebo-controlled evidence for manic (97, 99) and depressive (97, 98) episode prophylaxis, while lamotrigine does not (29). Relaxing the criterion to admit active control trials allows inclusion of three equivocal trials for carbamazepine that showed equivalence to lithium but no evidence for improvement over baseline.
Figure 1 combines data from the foregoing tables for each of the four mood stabilizer uses according to the a priori definition and the sensitivity analyses. Only lithium fulfills the a priori definition. Relaxing the criterion to allow active control trials as evidence does not change this result. Raising the threshold to require at least two parallel-groups, placebo-controlled trials results in no single agent fulfilling the definition.
Subanalyses for Mixed Episodes and Hypomania
Four acute mania trials analyzed agent efficacy in mixed episodes or on depression ratings. Subjects with mixed episodes had similar outcomes to others in trials of olanzapine versus placebo (56, 57). In active control trials, depression ratings were not significantly different in comparisons of olanzapine versus valproate (59) and of carbamazepine versus lithium (48).
Four prophylaxis trials reported results specifically for hypomania. Lithium was more efficacious than placebo in preventing hypomanic episodes (99, 100). Nimodipine response did not differ between bipolar I disorder subjects and bipolar II disorder subjects (108). Lamotrigine demonstrated qualitatively better separation from placebo in bipolar II disorder subjects than in bipolar I disorder subjects with rapid cycling.
Looser Definition of Mood Stabilizer
The looser definition of mood stabilizer noted in the introduction (an agent with efficacy in decreasing the frequency or severity of any type of episode in bipolar disorder while not worsening the frequency or severity of other types of episodes) is actually difficult to evaluate in the clinical trials literature. While many agents have efficacy in at least one use (Figure 1), few trials measured the outcomes of both manic and depressive symptoms concurrently, and lack of evidence of a negative effect does not mean no negative effect.
Two acute mania trials showed improvement in both manic and depressive symptoms, as noted earlier (48, 59), so it is unlikely that these agents worsen either type of episode, at least in the short term. Comparing data across controlled trials, no trials reported that the agent of interest had worse performance than placebo or that the outcome of treatment was worse than baseline. One therefore must rely on noncontrolled trial data to draw conclusions about an agent’s negative effects on course. This approach is clearly needed in the case of antidepressants (109), although the degree of risk from these agents versus other factors (110) and the specific agents of risk remain controversial (111). Thus, the looser definition of mood stabilizer is surprisingly difficult to address with class A data.
This study had several limitations. Only class A trials were reviewed, although much information, and indeed much clinical practice, is based on class B and C studies. The only quality weighting of individual trials was separating class A trials based on control (placebo versus active) and design (parallel-groups versus ABA). Trials were combined in categorical fashion, allowing small trials to carry the same weight as multisite trials with several hundred subjects. This issue of weighting is of clear importance in meta-analytic reviews that seek to reduce effect to a single quantity (20). The reader is encouraged to investigate in particular the Cochrane Collaboration web site (www.cochrane.org), in which meta-analytic studies for a wide range of treatments are presented, with weighting not only for size but for study quality. Trials were categorized as positive or negative on the basis of only the primary outcome variables, while secondary outcome variables may also be informative (102).
There are also several variables that are clinically relevant but that could not be addressed in this analysis. For instance, some (112, 113), although not all (114) studies suggested that among these agents, lithium may have a specific antisuicide effect. Such events are seldom reported in clinical trials and, on the basis of rates reported in clinical population-based studies (e.g., references 112 and 113), would be expected to be rare. Clinical epidemiologic methods will continue to help to address this important issue. We also could not address mechanisms of action, such as the intriguing finding that according to some studies, verapamil has an antimanic effect, even though it does not cross the blood-brain barrier and likely exerts its central effects indirectly (115). Finally, since the focus of this analysis was efficacy, side effects (also often not reported in earlier trials) were not addressed. However, it is clear that issues of tolerability are of substantial importance as one moves from the realm of efficacy trials to the effectiveness of an intervention in general clinical practice (116).
It must also be added that cohort effects and publication bias may affect these results, although these effects are due to the quality of the available literature and not to the methods of this specific study. In terms of cohort effects, agents that have been in use longer—e.g., lithium—have been better studied. The absence of positive data for efficacy for some of the newer agents should not be equated with the absence of efficacy. For instance, the first unequivocally positive trial of prophylaxis with valproate was published in 2002, and this study was a small trial involving an unusual study group (subjects with both bipolar II disorder and borderline personality disorder) (31). Many studies of bipolar disorder are in the "pipeline," and this analysis must be considered only a snapshot at this time.
Of additional interest in this regard is the relatively small number of negative studies. While the heterogeneity of the data precludes formal analysis by using funnel plots or similar methods (117), the small proportion of negative analyses published for mania (7/48, 14.6%), depression (1/16, 6.3%), and prophylaxis (8/31, 25.8%) leads the astute reader to consider whether the published literature tells the entire story, both now and for the future. Similar concerns about reporting bias against negative studies have been voiced for several years by our colleagues in medicine and public health (118).
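The proportions of negative analyses cited above follow directly from the counts of independent monotherapy analyses (48 for acute mania, 16 for acute depression, 31 for prophylaxis). A brief check is sketched below; the helper name `pct` is hypothetical, and half-up rounding to one decimal place is an assumption made to reproduce the reported figures.

```python
from decimal import Decimal, ROUND_HALF_UP

# (negative analyses, total analyses) as reported in the text.
negative_counts = {
    "mania": (7, 48),
    "depression": (1, 16),
    "prophylaxis": (8, 31),
}


def pct(negative, total):
    """Percentage rounded half-up to one decimal place (assumed convention)."""
    value = Decimal(negative) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


percentages = {use: pct(n, t) for use, (n, t) in negative_counts.items()}

# The three denominators should add up to the 95 independent analyses.
total_analyses = sum(t for _, t in negative_counts.values())
```

Decimal arithmetic is used here because 1/16 is exactly 6.25%, a tie that binary floating-point formatting would typically round to 6.2% rather than the reported 6.3%.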
All of these limitations were recognized at the outset. However, to address definitional issues and to cover a sample of trials this large—and to do so with uniformity—the above analytic decisions needed to be made. To compensate, we presented the data as explicitly as possible so that readers can formulate their own interpretations. Nonetheless, comparison with the conclusions of meta-analytic reviews indicates that our main interpretations are reasonable.
Key Findings and Comparison With Meta-Analyses
Among the most striking and clinically important findings are the large number of trials in acute mania and the large number of agents meeting the criteria for efficacy in acute mania. This large number of agents with efficacy for acute mania stands in marked contrast to the relative paucity of agents for acute depression. These findings underscore the "‘bias’ towards the importance of mania over the last 50 years" (3). Moreover, the evidence for efficacy of agents in acute depression is the least robust, since raising the threshold resulted in no agents meeting the criteria for efficacy (Figure 1).
The findings of our categorical review are in overall agreement with meta-analytic reviews that studied individual uses for agents. The findings include documentation of the efficacy for lithium in acute mania (22) and prophylaxis (23) and equivocal support for carbamazepine (21) and valproate (24) in prophylaxis.
Defining "Mood Stabilizer" and Identifying Needs of the Field
By our a priori definition, lithium alone currently fulfills the definition of mood stabilizer, and no other agents achieve that status even with the relaxed criteria (Figure 1). It is important for clinicians and researchers to recognize the strength of the evidence for lithium’s efficacy, particularly in acute mania and prophylaxis, as well as the gaps in efficacy data for other more widely prescribed agents. Taking acute and prophylactic needs together, our analysis supports a role for lithium as first-line treatment for bipolar disorder.
Nevertheless, lithium monotherapy has been the exception rather than the rule (5, 6), and its introduction has had only modest effects on various aspects of outcome for bipolar disorder (119). These studies provide prima facie evidence of the substantial need that lithium has not met, and additional strategies must be developed.
Specific needs include most prominently additional agents with efficacy for acute depression, continued exploration of prophylaxis alternatives for cases where lithium fails or is not tolerated, and greater exploration of combination therapies in high-quality controlled clinical trials. These analyses identify potential components for trials combining agents "from above" and "from below" (respectively, for manic symptoms and for depressive symptoms, per reference 3). They also suggest a second look at several forgotten agents that may yet hold promise (e.g., verapamil, ascorbic acid).
Finally, in addition to providing a standardized review of a wide array of agents used in bipolar disorder, this report suggests a conceptual framework for the field by providing an evidence-based definition for the commonly used but ill-defined term "mood stabilizer." This conceptualization provides a yardstick by which to assess other potential agents and combinations.
From Clinical Trials to Clinical Practice
An analysis such as this presents the evidence basis from only published class A controlled trials. This literature addresses only a small proportion of the needs encountered in clinical practice, as inspection of the major evidence-based guidelines for bipolar disorder has revealed (7–10). Some suspect that the limited scope of clinical needs addressed in the literature is at least in part responsible for the poor rates of adherence to mental health clinical practice guidelines (120). Thus, the caveats that apply to the class A clinical trials literature also apply to the conclusions of this study.
We as a field are therefore left with the important questions: What is the applicability of these findings to those who are too ill or otherwise do not qualify or wish to participate in randomized, controlled trials? What of those for whom first-, second-, and third-line treatments have failed? What of those who cannot tolerate treatment with these agents? As the field becomes more cognizant of the limitations of efficacy data in general (116, 121, 122), we must begin to consider these issues in "messier" effectiveness samples, as several large studies are currently doing (116, 123, 124).
What role, then, can these results play? This analysis may provide a kernel of organization, both conceptually and in terms of data, on which further efficacy and effectiveness studies can build. It can assist investigators in identifying lacunae in the current clinical trials literature that are not readily apparent from comparing multiple review articles with disparate assumptions and methods.
This review can also assist the clinician in evaluating claims from various quarters about this or that agent being a "mood stabilizer." This review will have served its purpose if its readers, rather than accepting or rejecting the claims at face value, respond rather with the next logical questions: "For what specific use? How strong are the data?"
Earlier versions of this work were reported at the 155th annual meeting of the American Psychiatric Association, Philadelphia, May 18–23, 2002. Received Jan. 22, 2003; revision received June 11, 2003; accepted June 13, 2003. From the Department of Psychiatry and Human Behavior, Brown University; and Veterans Affairs Medical Center, Providence, R.I. Address reprint requests to Dr. Bauer, VAMC-116A, 830 Chalkstone Ave., Providence, RI 02806; Mark_Bauer@brown.edu (e-mail). Supported in part by VA Cooperative Study 430.