On average, antidepressant medication and specific psychotherapies have similar success in the first-line treatment of moderate depression (1–3). And on average, different antidepressants show equal or similar efficacy (4, 5). But the fact that treatments have similar efficacy on average does not imply that treatment selection is unimportant (6, 7). Individuals vary widely in response to specific treatments, and poor response to one treatment does not necessarily imply poor response to others. For example, among patients who do not benefit from initial treatment with one antidepressant, up to one-half experience significant improvement after switching to an alternative medication (8), adding a second medication (9), or adding psychotherapy (10). Unfortunately, many patients treated in community practice, especially in primary care, have no chance to benefit from second-line treatments. After starting antidepressant treatment, nearly one-half make no follow-up visits, and only one-quarter return often enough to pursue additional treatment options (11, 12). Accurate selection of the best initial treatment could have tremendous benefits for people living with depression.
Personalized medicine promises to move beyond data regarding the average effectiveness of treatments to identify the best treatment for any individual. In order to provide personalized medicine for depression, we must identify characteristics of individuals that reliably predict differences in benefits and/or adverse effects of alternative depression treatments, including both biological and psychosocial treatments. These personalizing factors might include sociodemographic characteristics, clinical characteristics (such as symptom patterns or comorbidities), and biological markers (such as neuroimaging or genetic variation).
This review examines evidence that specific patient characteristics can guide selection of initial treatment for adult outpatients with unipolar depression. We begin by presenting a conceptual model for personalized treatment in order to clarify the type of evidence relevant to treatment selection. We then consider the following three specific clinical decisions: the choice between psychotherapy and antidepressant medication, selection of a specific antidepressant medication, and selection of a specific psychotherapy. For each of these three decisions, our review first clarifies the types of evidence that can and cannot inform treatment selection. We then review potentially informative evidence regarding specific factors hypothesized to inform initial treatment choice. Since fewer data exist to guide subsequent treatment choices, we do not consider second-line therapies or therapies for treatment-resistant depression.
We hope to identify measurable characteristics of individual patients that can guide selection of treatment. Our question concerns differential efficacy (Do patients with Characteristic X show better response to Treatment A than to Treatment B?). This is often expressed as a moderator effect (Does Characteristic X moderate the difference in response rates between Treatment A and Treatment B?). It is essential to distinguish moderators or predictors of differential efficacy from more general predictors of depression outcome (13). Previous research has often conflated these two concepts.
Two study designs could produce the evidence needed to personalize treatment selection. First, we might compare alternative treatments in an unselected group of patients and examine whether a specific patient characteristic moderates the relationship between treatment type and outcome (14–16). One example would be a randomized trial comparing cognitive therapy and interpersonal psychotherapy in which we examine whether co-occurring personality disorder moderates (or interacts with) the effect of treatment type (17). Second, we might select a group of patients with a specific characteristic and then compare outcomes in patients receiving alternative treatments. One example would be a study limited to patients with depression and co-occurring personality disorder in which we compare cognitive therapy and interpersonal therapy (18). The former strategy is more flexible, allowing study of multiple potential moderators, including moderators (such as genetic variations) that were not identified prior to treatment. The latter strategy (limiting the sample to patients with a specific characteristic of interest) may be more efficient, but it only permits study of a single predictor or potential moderator that is identified in advance.
If a study does not include a direct comparison of alternative treatments, it cannot accurately identify moderators or predictors of differential treatment response. For example, if we study a cohort of patients receiving medication A and observe that characteristic X predicts better outcomes, we cannot determine whether characteristic X is a true moderator (predicting differential response to medication A compared with some alternative medication), a general predictor of good response to any medication, or simply a general predictor of good prognosis regardless of treatment. Alternatively, if we conduct a randomized trial comparing medication A with placebo and observe that characteristic X predicts a greater drug-placebo difference, we still cannot determine whether characteristic X is a true moderator (predicting differential response to medication A compared with some alternative medication) or simply a general predictor of good response to any medication. Since neither of these hypothetical studies includes a comparison of medication A with a specific alternative, neither could possibly yield data to inform the choice between medication A and any alternative.
This distinction between general predictors of prognosis, general predictors of treatment response, and predictors of differential treatment response (true moderators) is illustrated in Figure 1. Characteristic X might predict better outcome regardless of treatment or might predict better outcome with any treatment or might predict better outcome with treatment A than with treatment B. Only in the third situation could we conclude that characteristic X can guide our choice between treatments A and B.
Figure 1. General and Differential Predictors (Moderators) of Treatment Response
We can also illustrate this distinction by examining the evidence that depression severity predicts better response to a specific antidepressant medication. Regardless of treatment, more severe depression at baseline predicts a poorer outcome (19, 20). In contrast, more severe depression at the initiation of treatment typically predicts greater benefit from medication compared with placebo (21, 22), perhaps because benefits of treatment are more apparent in those with a poorer general prognosis. Finally, more severe depression does not appear to predict better response to any specific antidepressant relative to others (4). More severe symptoms prior to starting treatment are therefore negative predictors of overall prognosis, positive predictors of benefit from medication in general (relative to placebo), and null predictors of differential response to specific medication. Severity of depression may have clinical utility as a predictor of benefit from treatment, but it has no apparent utility for predicting better response to one antidepressant over another.
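The three roles that a single characteristic can play can be made concrete with a small numerical sketch. The Python below uses invented response rates (not data from any cited study) for a hypothetical three-arm comparison of placebo, drug A, and drug B, and computes one contrast per role. The function name, the arm labels, and every number are illustrative assumptions, chosen only to mimic the qualitative severity pattern described above.

```python
from statistics import mean

def predictor_contrasts(rates_with_x, rates_without_x):
    """Compare patients with and without characteristic X on three contrasts.

    Each argument maps an arm name ("placebo", "drug_a", "drug_b") to a
    response rate. All names and values here are hypothetical.
    """
    def summarize(rates):
        active = mean([rates["drug_a"], rates["drug_b"]])
        return {
            "overall": mean(rates.values()),           # outcome regardless of arm
            "benefit": active - rates["placebo"],      # drug-placebo difference
            "a_minus_b": rates["drug_a"] - rates["drug_b"],  # drug A vs. drug B
        }

    with_x, without_x = summarize(rates_with_x), summarize(rates_without_x)
    return {
        # Negative value: X predicts poorer outcome whatever the treatment.
        "prognosis": with_x["overall"] - without_x["overall"],
        # Positive value: X predicts a larger benefit from medication in general.
        "treatment_benefit": with_x["benefit"] - without_x["benefit"],
        # Nonzero value: X is a true moderator of the A-versus-B choice.
        "differential": with_x["a_minus_b"] - without_x["a_minus_b"],
    }

# Invented rates: X = more severe depression at baseline.
severe = {"placebo": 0.25, "drug_a": 0.50, "drug_b": 0.50}
milder = {"placebo": 0.45, "drug_a": 0.60, "drug_b": 0.60}

contrasts = predictor_contrasts(severe, milder)
for name, value in contrasts.items():
    print(f"{name}: {value:+.2f}")
```

With these made-up inputs the prognosis contrast is negative, the treatment-benefit contrast is positive, and the differential contrast is zero. Only a nonzero differential contrast would bear on the choice between drug A and drug B.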
Because identifying moderators or differential predictors requires a comparison of alternative treatments, it is often linked (both conceptually and practically) to comparative effectiveness research. Comparative effectiveness research examines the average effects of alternative treatments, while research to personalize treatment examines individual characteristics predicting differential response. In statistical terms, comparative effectiveness research considers main effects while personalized treatment research considers moderators or interaction effects.
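One common way to write this distinction down, offered here as a sketch rather than as the model used in any cited trial, is a logistic regression with a treatment-by-characteristic product term. The symbols are assumptions introduced for illustration: $Y$ is response (1 = responder), $T$ indicates treatment A (1) versus treatment B (0), and $X$ indicates presence (1) or absence (0) of the characteristic.

```latex
\operatorname{logit}\,\Pr(Y = 1 \mid T, X)
  = \beta_0 + \beta_1 T + \beta_2 X + \beta_3 \,(T \times X)
```

In this notation, $\beta_1$ is the A-versus-B contrast among patients without the characteristic, $\beta_2$ captures the association between the characteristic and outcome under treatment B, and $\beta_3$ is the interaction. Comparative effectiveness research is concerned with the treatment contrast averaged over patients, while a characteristic is a moderator of the A-versus-B choice only if $\beta_3$ differs from zero.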
We should first emphasize that many previous studies cited as guides to choosing between psychotherapy and pharmacotherapy do not directly address this clinical decision. Examples of these include cohort studies of patients treated with cognitive therapy or interpersonal psychotherapy showing that sleep EEG abnormalities or dexamethasone nonsuppression predict poorer outcomes (23–25). Studies of patients receiving a single treatment cannot determine whether these biomarkers are true moderators of treatment efficacy (i.e., specifically predict poorer response to psychotherapy than to pharmacotherapy) or simply general predictors of poor prognosis (i.e., predict poorer outcome with any treatment).
Three reports (26–28), including data from six randomized trials, compared the efficacy of specific antidepressants with that of specific psychotherapies among patients with more severe depression, defined as a Hamilton Depression Rating Scale (HAM-D) score ≥20 (Table 1). None of these found a significant advantage of either pharmacotherapy or psychotherapy.
Table 1. Randomized Comparisons of Antidepressant Medication and Specific Psychotherapies in Outpatients With Severe Depressionᵃ
| Study | Medication | Psychotherapy | Medication vs. Psychotherapyᵇ | Analysis (Test Statistic for Difference) |
|---|---|---|---|---|
| DeRubeis et al. (26) | Imipramine or nortriptyline (N=102) | Cognitive-behavioral therapy (N=67) | 13 vs. 12 | t=0.43, p=0.67 |
| DeRubeis et al. (27) | Paroxetine (N=120) | Cognitive therapy (N=60) | 13 vs. 14 | F=0.56, df=1, 231, p=0.46 |
| Dimidjian et al. (28) | Paroxetine (N=57) | Cognitive therapy (N=25) | 8.6 vs. 10.3 | n.s. |
| Dimidjian et al. (28) | Paroxetine (N=57) | Behavioral activation therapy (N=60) | 8.6 vs. 7.6 | n.s. |
Six reports (29–34), including data from three randomized trials, examined clinical characteristics as moderators of response to specific medications versus specific psychotherapies (Table 2). Two such analyses (29, 31) found that personality disorder or maladaptive personality traits predicted better response to selective serotonin reuptake inhibitors (SSRIs) than to cognitive therapy or cognitive-behavioral therapy (CBT), while a third analysis limited to patients with chronic depression (32) found no such moderator effect. In a comparison of paroxetine and cognitive therapy (30), recent life stress, unemployment, and being married or living with a partner predicted more favorable outcome with psychotherapy. Among patients with chronic depression (33), a history of childhood trauma predicted better response to psychotherapy than to nefazodone, while those without trauma histories showed the opposite pattern. In this same trial (34), a stated preference for psychotherapy or pharmacotherapy strongly predicted better response to the preferred treatment, but this analysis was limited to the small minority of patients expressing such a preference.
Table 2. Randomized Comparisons of Antidepressant Medication and Specific Psychotherapies Evaluating Specific Clinical Characteristics as Moderators
| Study | Proposed Moderator | Medication | Psychotherapy | Outcome | Moderator Present: Medication vs. Psychotherapy | Moderator Absent: Medication vs. Psychotherapy | Analysis (Test Statistic for Interaction) |
|---|---|---|---|---|---|---|---|
| Fournier et al. (29) | Personality disorder | Paroxetine (N=120) | Cognitive therapy (N=60) | HAM-D response rate | 66% vs. 44% | 49% vs. 70% | χ2=6.8, p=0.009 |
| Maddux et al. (32)ᵃ | Personality disorder | Nefazodone (N=226) | Cognitive-behavioral analysis therapy (N=228) | Mean HAM-D posttreatment score | | | F=0.88, df=2, 473, p=0.41 |
| Bagby et al. (31) | High neuroticism | SSRI (N=129) | Cognitive-behavioral therapy (N=146) | Mean HAM-D posttreatment score | 8 vs. 5 | 6 vs. 6 | t=2.12, p<0.04 |
| Fournier et al. (30) | Recent life stress | Paroxetine (N=120) | Cognitive therapy (N=60) | Mean HAM-D posttreatment score | 12 vs. 6 | 9 vs. 10 | t=2.17, p=0.03 |
| Fournier et al. (30) | Unemployment | Paroxetine (N=120) | Cognitive therapy (N=60) | Mean HAM-D posttreatment score | 15 vs. 6 | 7 vs. 8 | t=3.04, p=0.003 |
| Fournier et al. (30) | Single marital status | Paroxetine (N=120) | Cognitive therapy (N=60) | Mean HAM-D posttreatment score | 10 vs. 10 | 12 vs. 4 | t=3.13, p=0.002 |
| Nemeroff et al. (33) | Childhood trauma | Nefazodone (N=226) | Cognitive-behavioral analysis therapy (N=228) | Mean HAM-D posttreatment score | 38% vs. 48% | 40% vs. 29% | χ2=6.9, p=0.009 |
| Kocsis et al. (34) | Preference for psychotherapy | Nefazodone (N=24) | Cognitive-behavioral analysis therapy (N=32) | Mean HAM-D posttreatment score | 8% vs. 50% | 45% vs. 22% | χ2=13, p=0.04 |
Two trials examined response to combined treatment, finding that comorbid personality disorder predicted greater benefit from pharmacotherapy combined with either interpersonal psychotherapy (35) or brief psychodynamic psychotherapy (36) compared with pharmacotherapy alone.
Clinical Predictors of Differential Benefit From Antidepressant Medications
Two lines of research have examined symptom patterns as predictors of differential response to alternative antidepressants. McGrath et al. demonstrated that the pattern of atypical depression (oversleeping, overeating, anergy, rejection sensitivity) predicted better response to phenelzine than to imipramine (37). More recent research indicates that this symptom pattern does not predict better response to SSRIs than to imipramine (38), and thus relevance to current practice is limited. Numerous trials have examined whether pretreatment anxiety or insomnia predicts better response to drugs thought to have sedating or anxiolytic effects. In a systematic review of 13 published reports including 3,114 patients, Gartlehner et al. (4) found no evidence that higher levels of either insomnia or anxiety predicted differential response to alternative antidepressant drugs. Papakostas et al. (39) reanalyzed pooled data from 10 trials including 1,275 patients with depression and high levels of anxiety, finding slightly more favorable outcomes (approximately 1 point on the HAM-D) with various SSRIs compared with bupropion.
Past treatment response is often suggested as a guide to medication selection (40). Surprisingly, almost no empirical data exist regarding consistency of response to specific medications over time. Remillard et al. (41) described 59 inpatients with a history of good antidepressant response in a prior inpatient episode. Of 35 patients prescribed the same medication, 57% responded. Of 24 patients prescribed a different antidepressant, 79% responded. We are aware of no other studies examining whether history of favorable response to a specific medication predicts another favorable response in a subsequent treatment episode. Prior treatment during the current depressive episode has also been proposed as a guide to medication selection. Several studies have evaluated this question with respect to SSRIs, but none found that poor response to one SSRI drug predicted response to a subsequent drug in the same class. For example, the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial (8) found that poor response to citalopram did not predict differential probability of subsequent response to sertraline, bupropion, or venlafaxine.
Biomarker Predictors of Differential Benefit From Antidepressant Medications
Early research on biologic predictors of antidepressant response examined markers of neurotransmitter production or metabolism. Preliminary studies indicated that urinary 3-methoxy-4-hydroxyphenylglycol levels might predict differential response to adrenergic versus serotonergic antidepressants (42), but these findings were not replicated (43).
Subsequent research examined a range of biological predictors of treatment response, including endocrine measures (markers of corticosteroid, neuroactive steroid, and thyroid activity), neuroimaging measures, and electroencephalographic measures. While several of these measures were found to predict overall prognosis or general treatment response, none were found to predict greater response to one antidepressant (or type of antidepressant) than to another (13, 44).
Genetic Predictors of Differential Benefit From Antidepressant Medications
As we have noted, many previous studies cannot, by design, inform the selection of one antidepressant over another. This group includes numerous cohort studies examining the associations between specific genetic variations and favorable treatment outcomes in patients treated with a single antidepressant (45–52) or class of antidepressants (53, 54). Cohort studies of this type cannot distinguish between general predictors of prognosis and true moderators that predict differential response to alternative drugs. Similarly, studies identifying genetic variations associated with better response to one active antidepressant than to placebo (55) cannot distinguish general predictors of benefit with antidepressant treatment from specific predictors of differential response to alternative medications. Because none of the aforementioned reports compare alternative active treatments, none can provide evidence to personalize antidepressant selection.
Relatively few studies have examined the association between genetic variations and outcomes in patients treated with alternative antidepressants. These include both randomized comparisons of alternative antidepressants (56–59) and nonrandomized comparisons (60, 61). In some cases, the pattern of associations is consistent with presumed mechanisms of drug action. For example, Kim et al. (60) found that response to an SSRI (either fluoxetine or sertraline) was significantly associated with variation in serotonin transporter genes but not with variation in norepinephrine transporter genes. Using data from a randomized comparison of the two drugs, Szegedi et al. (56) and Tadic et al. (57) found that variations in both monoamine oxidase and catechol-O-methyltransferase genes were associated with response to mirtazapine but not with response to paroxetine. Wakeno et al. (59) found that variation in alpha 2A-adrenergic receptor genes was associated with response to milnacipran but not to paroxetine. Primary analyses in each of these studies addressed the following question: Does a specific genetic variation predict better outcome in patients treated with a specific medication? These analyses are appropriate for testing hypotheses regarding mechanisms of drug action, but they do not directly address our question regarding differential efficacy: Does a specific genetic variation identify a group of patients with significantly more favorable response to one antidepressant than to another? This question regarding differential efficacy is formally addressed by a test for interaction or moderation. Kim et al. (60) did report a post hoc subgroup analysis in which patients with a specific variation of the norepinephrine transporter gene experienced significantly higher response rates during treatment with nortriptyline than with either fluoxetine or sertraline. If replicated, this finding could inform medication choice for this subgroup of patients.
The Genome-Based Therapeutic Drugs for Depression project (62) used genome-wide association analyses in patients treated with nortriptyline or escitalopram to identify moderator effects (at a suggestive level of statistical significance) for two novel genetic variations. If replicated, these findings could inform the choice between these two medications.
Genetic Predictors of Differential Adverse Effects From Antidepressant Medications
We again begin by pointing out that most previous studies cannot, by design, guide treatment selection. Included in this group are cohort studies of patients treated with a single medication or group of medications that examine genetic predictors of specific adverse effects, such as insomnia (63), sexual dysfunction (64, 65), and development of suicidal ideation (66). These cohort studies cannot distinguish general predictors of experiencing adverse effects from moderators or differential predictors of more adverse effects with one treatment than with another. For example, the serotonin receptor variation previously associated with insomnia during fluoxetine treatment may instead be associated with a general tendency to experience adverse effects with a range of medications (67, 68). In the same way, finding that a particular genetic variation is associated with suicidal ideation during treatment with citalopram (66, 69) could simply imply that this variation is associated with either suicidal ideation in general or suicidal ideation during treatment with any antidepressant. Neither of these latter interpretations would argue for or against the use of citalopram in this subgroup of patients.
Of studies examining adverse events, only one has included patients treated with alternative drugs (70), finding that variation in a gene coding for the HTR2A serotonin receptor was associated with adverse effects with paroxetine but not with mirtazapine. Because no test for interaction or moderation was reported, this finding does not definitively address our question regarding personalized treatment: Does variation in HTR2A identify a group of patients who experience significantly fewer adverse effects during mirtazapine treatment than during paroxetine treatment? The finding of a significant relationship in patients treated with one drug and not in those treated with the other can sometimes reflect low statistical power rather than a true interaction or moderator effect.
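The gap between "significant with one drug, not significant with the other" and a demonstrated moderator effect can be shown with a few lines of arithmetic. The sketch below uses invented association estimates and standard errors (for example, log odds ratios for an adverse effect per variant allele); none of the numbers come from the study cited above, and the helper name is hypothetical. It tests each drug separately and then tests the difference between the two estimates directly, using a normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(estimate, standard_error):
    """Two-sided p value for estimate / standard_error under a normal approximation."""
    z = abs(estimate / standard_error)
    return 2 * (1 - NormalDist().cdf(z))

# Invented estimates of the gene-outcome association within each drug arm.
effect_drug_1, se_drug_1 = 0.50, 0.24
effect_drug_2, se_drug_2 = 0.30, 0.24

p_drug_1 = two_sided_p(effect_drug_1, se_drug_1)
p_drug_2 = two_sided_p(effect_drug_2, se_drug_2)

# Test for interaction: is the association different between the two drugs?
# The arms are independent, so the variances of the two estimates add.
difference = effect_drug_1 - effect_drug_2
se_difference = sqrt(se_drug_1 ** 2 + se_drug_2 ** 2)
p_interaction = two_sided_p(difference, se_difference)

print(f"drug 1: p = {p_drug_1:.3f}")
print(f"drug 2: p = {p_drug_2:.3f}")
print(f"interaction: p = {p_interaction:.3f}")
```

With these made-up inputs the association is significant at the 5% level for the first drug and not for the second, yet the direct test of the difference between them is far from significant. Only the last test speaks to differential effects.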
While research on differential response to medications has focused on biomarkers and genetic variation, the limited research on differential response to psychotherapies has focused more on clinical characteristics (Table 3). Four reports (17, 18, 71, 72) have examined avoidant and/or borderline personality traits or disorders as moderators of response to interpersonal therapy versus cognitive therapy or CBT. This evidence does suggest that borderline personality disorder or avoidant attachment style predicts better response to cognitive therapy. Evidence from one study each suggests that obsessive-compulsive personality disorder may predict better response to interpersonal therapy than to cognitive therapy (71) and that more severe depression may predict better outcome with behavioral activation than with cognitive therapy (28).
Table 3. Randomized Comparisons of Alternative Psychotherapies Evaluating Specific Clinical Characteristics as Moderators
| Study | Proposed Moderator | Therapy A | Therapy B | Outcome | Moderator Present: Therapy A vs. Therapy B | Moderator Absent: Therapy A vs. Therapy B | Analysis |
|---|---|---|---|---|---|---|---|
| Barber and Muenz (71) | Avoidant personality disorder | Cognitive therapy (N=37) | Interpersonal therapy (N=47) | Mean HAM-D posttreatment score | 5.7 vs. 10.7 | 8.5 vs. 5.5 | t=3.8, p<0.001 |
| Joyce et al. (17) | Personality disorder (primarily avoidant and borderline) | CBT (N=76) | Interpersonal therapy (N=83) | Decline in Montgomery-Åsberg Depression Rating Scale score (%) | 58% vs. 38%* | 58% vs. 66% | ᵃ |
| Bellino et al. (18) | Borderline personality disorder | Cognitive therapy (N=12) | Interpersonal therapy (N=14) | Mean HAM-D posttreatment score | 13.7 vs. 14.1ᵇ | ᵃ | ᵃ |
| McBride et al. (72) | Avoidant attachment style | CBT (N=28) | Interpersonal therapy (N=27) | Mean Beck Depression Inventory posttreatment scoreᶜ | 0.5 vs. 3.6 | 4.2 vs. 3.7 | z=3.1, p=0.002 |
| Barber and Muenz (71) | Obsessive-compulsive personality disorder | Cognitive therapy (N=37) | Interpersonal therapy (N=47) | Mean HAM-D posttreatment score | 8.1 vs. 4.2 | 7.7 vs. 7.0 | t=2.5, p=0.01 |
| Dimidjian et al. (28) | HAM-D score ≥20 | Behavioral activation (N=43) | Cognitive therapy (N=45) | HAM-D response rate | 76% vs. 48% | 39% vs. 60% | ᵃ |
Regarding initial choice between medication and psychotherapy, severity of depression does not appear to predict greater likelihood of response to medication or psychotherapy. Modest evidence suggests that personality disorder predicts more favorable response to pharmacotherapy and that negative life events (either recent stresses or childhood trauma) predict better response to psychotherapy. One trial suggests that a clear preference for either medication or psychotherapy predicts greater success with the preferred treatment.
Regarding selection of a specific medication, co-occurring anxiety disorder may predict greater improvement with SSRIs than with bupropion, but this difference appears small. Biological markers (such as neuroendocrine or imaging studies) do not appear to predict differential response to specific antidepressants. Most previous studies of genetic predictors have not used appropriate designs and analyses to identify true moderators or predictors of differential efficacy. Consequently, we have no strong evidence that any genetic variation can inform antidepressant selection. Surprisingly, we also have no evidence that history of prior medication response is useful in medication selection.
Regarding selection of a specific psychotherapy, moderate evidence suggests that borderline personality disorder or attachment difficulty predicts more favorable response to cognitive therapy than interpersonal therapy.
Inadequate statistical power to detect moderator effects is a likely explanation for many of the aforementioned "negative" findings. To illustrate, in a study comparing equal numbers of patients receiving alternative treatments with average response rates of 50%, we could examine whether relative efficacy of treatments A and B varies between patients with and without characteristic X. If patients with characteristic X have a 60% response rate with treatment A and a 40% response rate with treatment B and those without characteristic X show the opposite pattern (a relatively large moderator effect), a sample of approximately 300 patients would be necessary to reliably detect this difference (assuming a 5% type I error rate and 20% type II error rate). Of the aforementioned pharmacogenetic studies (56–61), none have included more than 250 patients, and most have included fewer than 150.
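The sample-size arithmetic for a moderator effect can be sketched with a normal approximation for a difference-in-differences of response rates. This is not necessarily the calculation behind the figure quoted above: the choice of test, the equal allocation to treatments within each subgroup, and in particular the prevalence of characteristic X are assumptions made here, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def total_n_for_moderator(rate_a_x, rate_b_x, rate_a_no_x, rate_b_no_x,
                          prevalence_x=0.5, alpha=0.05, power=0.80):
    """Approximate total sample size needed to detect a treatment-by-X interaction.

    Assumes a two-sided z test on the difference between the A-minus-B
    response-rate differences in patients with and without X, with patients
    split equally between treatments inside each subgroup.
    """
    normal = NormalDist()
    z_total = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)

    # Interaction effect: (A - B among X) minus (A - B among non-X).
    effect = (rate_a_x - rate_b_x) - (rate_a_no_x - rate_b_no_x)

    # Share of the total sample that falls in each of the four cells.
    cell_share_x = prevalence_x / 2
    cell_share_no_x = (1 - prevalence_x) / 2

    def binomial_variance(p):
        return p * (1 - p)

    # Variance of the interaction estimate, multiplied by the total N.
    variance_times_n = (
        (binomial_variance(rate_a_x) + binomial_variance(rate_b_x)) / cell_share_x
        + (binomial_variance(rate_a_no_x) + binomial_variance(rate_b_no_x)) / cell_share_no_x
    )
    return ceil(z_total ** 2 * variance_times_n / effect ** 2)

# The 60%/40% versus 40%/60% pattern, under two assumed prevalences of X.
n_half = total_n_for_moderator(0.60, 0.40, 0.40, 0.60, prevalence_x=0.5)
n_fifth = total_n_for_moderator(0.60, 0.40, 0.40, 0.60, prevalence_x=0.2)
print(n_half, n_fifth)
```

Under these assumptions the required total depends strongly on how common characteristic X is: about 190 patients if half of patients have X, and about 295 if one in five do. Smaller moderator effects raise the requirement quickly, because the required sample grows with the inverse square of the effect.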
Research to personalize treatment, especially research focused on genetic moderators or predictors, must consider another source of error. Such research depends on the assumption (usually unstated) that an individual's response to a specific treatment will be stable across different episodes of treatment. This assumption also underlies the common clinical practice of basing medication selection on past treatment experience. This fundamental assumption has only been examined in a single small observational study (41), and it was not supported by these data. It is remarkable that an assumption so central to clinical practice and pharmacogenetic research has so little empirical support. Addressing this gap in knowledge is a priority for future research.
Because studies of depression treatment response typically consider only a single episode of treatment per person, they obscure the distinction between person-level and episode-level predictors of treatment response. Episode-level predictors are those that may vary within individuals across episodes of depression treatment. Examples of these potentially variable characteristics include pretreatment symptom severity, pretreatment episode duration, co-occurring substance abuse, and recent life stresses. In contrast, person-level predictors are expected to show no variability across treatment episodes. Examples of these stable characteristics include race or ethnicity, stable personality traits, genetic variation, family history, or the past experience of physical or sexual abuse. Traditional research designs (examining differences between individuals during a single episode of treatment) may be adequate to identify episode-level predictors of treatment response. Accurate identification of stable or person-level predictors of treatment response will require examining consistency of treatment response within individuals across multiple episodes of care.
Research to date has identified few clinical characteristics and no biomarkers or genetic variations that reliably predict differential effectiveness or adverse effects of specific depression treatments. We have described three conceptual difficulties that may be responsible for our failure to identify differential predictors of response. First, previous research has often conflated predictors of response to specific treatments with more general predictors of prognosis. Second, response to specific treatments may actually vary across episodes of treatment. Third, response to treatment in any episode of illness may be influenced by a mixture of episode-level (or time-varying) and patient-level (or stable) characteristics. These conceptual issues have important implications for future research to identify differential predictors of treatment response.
First, research to personalize treatment for depression will probably require samples of patients considerably larger than those enrolled in traditional clinical trials. Detection of moderators or interaction effects generally requires larger samples than does detection of average or main effects, especially if a potential moderator (such as a genetic variation) influences response to one treatment but not the other.
Second, research to identify person-level predictors of treatment response may need to consider response across multiple episodes of depression. Research designs that consider only a single episode of treatment per person cannot examine whether any particular response pattern (e.g., gastrointestinal side effects with a specific SSRI, responding favorably to interpersonal psychotherapy) is stable within individuals across episodes of treatment.
Third, accurately predicting response to specific treatments may require combinations of several weak predictors rather than a single powerful one (73). For example, response to a specific antidepressant drug might be simultaneously moderated by a large number of variants of small effect. A recent genome-wide association study suggests that this may be the case (54). Developing rules for treatment selection based on multiple predictors will require useful theory regarding treatment mechanisms, large sample sizes, and healthy skepticism regarding predictors identified only after multiple comparisons.
Given the large sample sizes needed to detect moderators of treatment effectiveness, randomized trials to inform personalized treatment may need to adopt broader recruitment strategies and more efficient methods for outcome assessment. Large, pragmatic trials have been proposed as a more efficient strategy for comparative effectiveness research (74). These methods could be extended to address questions of differential treatment effectiveness.
Recruitment for randomized trials to identify moderators of treatment response might incorporate information regarding response to past treatments. Traditional clinical trials typically consider past treatment only for safety reasons, excluding patients with histories of adverse reactions to a study treatment. Future trials might consider past treatment response in order to select particularly informative samples of participants. For example, a study to identify genetic predictors of favorable response to drug A over drug B might preferentially include patients with past exposures to drug A and/or drug B. Information regarding past response could be combined with new data to more accurately distinguish true treatment effects from other sources of variation in outcome.
Another alternative approach would use observational data from large, population-based samples of patients treated under naturalistic conditions. This approach has led to significant advances in personalizing treatment for other chronic health conditions (75, 76). Observational studies permit examination of thousands of patients rather than the hundreds typically enrolled in clinical trials. Most important, longitudinal data would permit study of multiple treatment episodes per patient, including exposures to similar and dissimilar treatments. These data could serve in two ways. First, only such longitudinal data could evaluate the stability of potential treatment response phenotypes prior to expending resources searching for a corresponding genotype. Second, observational studies could identify potential moderators or differential predictors for definitive evaluation in subsequent randomized trials. Given potential biases due to nonrandom assignment of treatments, observational studies would only be appropriate for generating hypotheses regarding differential treatment response rather than confirming them.
When research does identify statistically significant predictors of differential treatment response, caution will be necessary when translating those research findings into clinical practice. The rise and fall of the dexamethasone suppression test as a predictor of need for depression treatment is a useful cautionary tale regarding clinical utility of predictive tests derived from research populations (77). Data from research settings suggested that this test might accurately predict poorer prognosis and greater need for specific depression treatment. But predictive power was much weaker in representative clinical populations, reflecting the different spectrum of illness in everyday practice (78).
Replication is the first step in translating research findings to clinically useful tests or predictors. Given the number of putative predictors available to examine in any study, reliance on the 5% standard for statistical significance is likely too liberal. Many reported predictors are likely to be false positives (79).
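The arithmetic behind this concern can be sketched briefly. The Python example below is illustrative only: it assumes that the candidate predictors are tested independently and that none of them truly predicts differential response, and the figure of 20 candidate predictors is hypothetical rather than drawn from any study cited here. The Bonferroni adjustment shown is one common and conservative correction, not a method endorsed by the sources reviewed.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of nominally significant results when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n independent tests (all nulls true)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that holds the familywise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_predictors = 20  # hypothetical number of candidate predictors examined in one study
    print(expected_false_positives(n_predictors))
    print(round(familywise_error_rate(n_predictors), 3))
    print(bonferroni_threshold(n_predictors))
```

Under these assumptions, a study examining 20 unrelated candidate predictors at the 5% level would expect one spurious finding and would have roughly a 64% chance of reporting at least one.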
Even after replication, calibration studies in representative populations are necessary to assess clinical utility. The expected clinical utility of testing for genetic or other predictors of treatment response could vary widely according to the prevalence of the predictor, the accuracy of the test in actual practice, and both the prevalence and clinical importance of the outcome (80–82). Predictive power often declines significantly during the translation from research to practice. Given this slippage between research accuracy and clinical utility, the number of patients whom we would need to test in order to gain an additional good clinical outcome is often surprisingly large. For example, tests for cytochrome P450 variation might accurately predict antidepressant serum concentrations. But the numerous sources of variation between serum concentrations and clinical response mean that such tests would likely have limited utility for guiding antidepressant prescribing in clinical practice (83).
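A simplified calculation illustrates why the number needed to test can be large. The sketch below is an assumption-laden illustration, not a method taken from the cited studies: the function name, its parameters, and every numerical value are hypothetical. It assumes that patients who test positive receive a different treatment, that true positives gain a fixed absolute increase in the probability of a good outcome, and that false positives may incur a fixed absolute decrease.

```python
import math


def number_needed_to_test(
    prevalence: float,
    sensitivity: float,
    specificity: float,
    benefit_true_positive: float,
    harm_false_positive: float = 0.0,
) -> float:
    """Patients tested per additional good outcome under a simple decision model.

    prevalence: proportion of patients who truly carry the predictor.
    benefit_true_positive: absolute gain in good-outcome probability when a true
        carrier is switched to the alternative treatment.
    harm_false_positive: absolute loss in good-outcome probability when a
        non-carrier is switched because of a false-positive result.
    """
    true_positive_rate = prevalence * sensitivity
    false_positive_rate = (1.0 - prevalence) * (1.0 - specificity)
    net_gain_per_patient = (
        true_positive_rate * benefit_true_positive
        - false_positive_rate * harm_false_positive
    )
    if net_gain_per_patient <= 0:
        return math.inf  # testing yields no net benefit under these inputs
    return 1.0 / net_gain_per_patient


if __name__ == "__main__":
    # All values below are hypothetical and chosen only for illustration.
    print(round(number_needed_to_test(0.10, 0.80, 0.90, 0.15), 1))
    print(round(number_needed_to_test(0.10, 0.80, 0.90, 0.15, 0.05), 1))
```

With these hypothetical inputs (a predictor present in 10% of patients, 80% sensitivity, 90% specificity, and a 15-percentage-point benefit for true carriers), about 83 patients would need to be tested per additional good outcome. Allowing a 5-percentage-point harm for false positives raises the figure to about 133.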
Given our limited ability to predict differential response to specific depression treatments, clinical recommendations in this area derive more from what we do not know than from what we do. When selecting between psychotherapy and medication, modest evidence suggests that personality disorder predicts better initial response to medication and a history of stressful or traumatic life events predicts better response to psychotherapy. When selecting between medications, neither biomarkers nor specific symptom patterns (e.g., anxiety symptoms, insomnia) predict clinically important differences in response. While conventional wisdom would base treatment selection on past response of an individual patient or the patient's family members, we lack evidence that response to a specific treatment is consistent within families or even within individuals over time. Treatment selection should also consider patients' preferences and (in the case of psychotherapy) the availability of adequately trained providers.
Our limited ability to predict response to specific treatments does have important implications for the organization of practice, especially in the selection and prescribing of antidepressant medications. Of patients starting antidepressant treatment, only one-half will experience a good outcome with the first treatment selected. Unfortunately, we have scant evidence that attempts to match specific treatments to specific patients will improve the rate of success. We should contrast this disappointing conclusion with the very strong evidence that organized follow-up programs significantly increase the success of antidepressant treatment (84, 85). Monitoring outcomes and personalizing treatment over time will have a much greater effect on outcomes than will attempts to personalize initial treatment selection. But opportunities for monitoring and tailoring of treatment are frequently missed. Of patients initiating antidepressant treatment, as many as one-half will not return for follow-up visits (11, 12). Of those starting psychotherapy in community practice, one-half make fewer than four visits (86). Because low motivation, discouragement, and self-blame are core features of depression, aggressive outreach may be necessary to reach those who fail to return (87).
Our limited ability to match patients with specific treatments also raises questions about how we share uncertainty with our patients. Communicating hope is an essential element of any healing relationship. But it would be less than honest to imply that we can accurately select the best treatment for any individual. Prudence and respect for patients' autonomy argue for following an honestly optimistic approach, such as: "We have several good treatment options to choose from. On the average, they have about the same chance of success. But you are not an average; you are an individual. At this time, there is no scientific way to predict which treatment will work best for you. Together we will look at your options and decide what treatment to start with. But it is important to remember that there are other options. If the first treatment we pick does not work out for you, some other treatment might work well. Regular follow-up over the next several weeks will tell us whether to stay with our first choice or try something else."