Unipolar depressive disorders have a high prevalence (1, 2) and incidence (3), and they meaningfully impair quality of life for patients and their relatives (4, 5). Moreover, depressive disorders are linked with increased mortality rates (6), high levels of health service use, and huge economic costs (7-9). Major depression ranks fourth in disease burden worldwide, and it is expected to rank first in high-income countries by 2030 (10).
Practice guidelines recommend both pharmacological and psychological interventions for depressive disorders (11-14). Interpersonal psychotherapy (IPT) is recommended in these guidelines as one of the two psychological treatments of choice, the other being cognitive-behavioral therapy (CBT). IPT is a structured, time-limited psychological intervention based on interpersonal theory (15-18) and specifically developed for the treatment of major depression (19, 20).
Although numerous randomized controlled trials have examined the effects of IPT, only one meta-analysis has been conducted to evaluate IPT for depression (21). That analysis included a total of 13 studies and found significant and large effects for IPT compared with placebo or no treatment, and superior effects of IPT compared with CBT. A substantial number of studies of IPT have been published since then. Furthermore, the earlier meta-analysis did not examine heterogeneity, possible effect moderators that may explain heterogeneity, publication bias, or the quality of included studies. We therefore decided to conduct a new meta-analysis to examine whether IPT is an efficacious treatment and deserves the prominent place it currently holds in treatment guidelines.
We searched the literature using several methods. First, we used our existing database on psychological treatment of depression in adults, which is continuously updated and currently contains 1,122 full-text papers. This database, described in detail elsewhere (22), has been used in a series of earlier meta-analyses (www.evidencebasedpsychotherapies.org). We also conducted a comprehensive literature search (from 1966 to January 2010) in PubMed, PsycINFO, EMBASE, the Cochrane Central Register of Controlled Trials, and Dissertation Abstracts International. We identified abstracts by combining terms indicative of psychological treatment and depression. We also collected the primary studies from 42 meta-analyses of psychological treatment for depression. Second, we carried out an additional literature search (from 1966 to January 2010) in PubMed, PsycINFO, and EMBASE to retrieve studies of IPT in adolescents. Third, we collected the primary studies from the previous meta-analysis of IPT (21) and checked the reference lists of included studies.
Inclusion and Exclusion Criteria
We included studies of randomized controlled trials in which IPT for adults or adolescents with a unipolar depressive disorder or an elevated level of depressive symptoms was compared with waiting list, usual care, placebo, psychological treatment, pharmacological treatment, or combination treatment with IPT and pharmacotherapy. We also included maintenance studies in which patients were successfully treated during the acute treatment phase and then randomized to receive IPT or another treatment condition in the continuation phase.
We excluded studies that examined interpersonal counseling for subthreshold depression (23-26) because interpersonal counseling differs considerably from full IPT and is not intended for individuals with full-syndrome depressive disorders. No language restrictions were applied. Eligibility judgment was performed independently by two reviewers (A.S.G. and P.C.). In case of inconsistency, a third reviewer (A.v.S.) was consulted.
We assessed the validity of the studies according to the basic criteria suggested by the Cochrane Handbook for Systematic Reviews of Interventions (27): adequate sequence generation (the randomization scheme was generated correctly); allocation to conditions by an independent (third) party; blinding of assessors of outcomes; completeness of follow-up data; and no selective outcome reporting. (We omitted the criterion of adequacy of random allocation concealment to respondents because concealment of random allocation is impossible in psychological treatment.)
We calculated effect sizes (Cohen's d) for all individual studies on the basis of postscore analysis using the formula d = (Mc - Me)/SDec, where Mc is the mean of the outcome measures in the control group, Me is the mean of the outcome measures in the experimental group, and SDec is the pooled standard deviation of the posttest scores of the two groups (28, 29). An effect size of 0.8 can be considered large, 0.5 moderate, and 0.2 small (30). For studies that reported more than one outcome, a single pooled effect size was calculated for each study.
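As an illustration of the effect size formula above, the following minimal Python sketch computes d from posttest means and standard deviations. It assumes the usual degrees-of-freedom-weighted pooled standard deviation, which the text does not spell out, and the input values are hypothetical rather than taken from any included study.

```python
import math


def pooled_sd(sd_c, n_c, sd_e, n_e):
    """Pooled posttest SD of the control and experimental groups.

    Assumption: degrees-of-freedom-weighted pooling; the text does not
    state which pooling formula was used.
    """
    return math.sqrt(
        ((n_c - 1) * sd_c ** 2 + (n_e - 1) * sd_e ** 2) / (n_c + n_e - 2)
    )


def cohens_d(mean_c, sd_c, n_c, mean_e, sd_e, n_e):
    """d = (Mc - Me) / SDec; positive when the experimental group scores lower."""
    return (mean_c - mean_e) / pooled_sd(sd_c, n_c, sd_e, n_e)


# Hypothetical posttest depression scores (not from any included study).
d = cohens_d(mean_c=16.0, sd_c=6.0, n_c=25, mean_e=13.0, sd_e=6.0, n_e=25)
print(round(d, 2))  # 0.5
```

Because lower scores on depression measures indicate fewer symptoms, a positive d in this convention favors the experimental condition.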
The individual effect sizes were pooled in the Comprehensive Meta-Analysis (CMA) software program (www.meta-analysis.com). Pooled effect sizes were calculated separately for posttest comparisons of IPT with usual care, no treatment, or placebo; with other psychotherapy; and with pharmacotherapy; in addition, combination treatment with IPT and pharmacotherapy was compared with pharmacotherapy alone.
Because we expected considerable heterogeneity, we calculated pooled effect sizes with the random-effects model. However, we first tested the heterogeneity under the fixed-effects model using the I² statistic (31). I² describes the variance between studies as a proportion of the total variance. A value of 25% indicates low heterogeneity, 50% moderate heterogeneity, and 75% high heterogeneity. We also report the p value of the Q statistic. A significant Q value rejects the null hypothesis of homogeneity.
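The heterogeneity statistics and random-effects pooling described above can be sketched as follows. This is not the CMA implementation: the sketch assumes inverse-variance weights and the DerSimonian-Laird estimator of the between-study variance, and the three effect sizes and variances are hypothetical.

```python
import math


def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I2 (in percent)."""
    w = [1.0 / v for v in variances]  # fixed-effects (inverse-variance) weights
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


def random_effects(effects, variances):
    """Pooled effect, its standard error, and tau^2.

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    q, df, _ = heterogeneity(effects, variances)
    w = [1.0 / v for v in variances]
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical study-level effect sizes and variances.
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.04, 0.04]
q, df, i2 = heterogeneity(effects, variances)
pooled, se, tau2 = random_effects(effects, variances)
print(round(q, 2), df, round(i2, 1))             # 4.5 2 55.6
print(round(pooled, 2), round(se, 3), round(tau2, 3))  # 0.5 0.173 0.05
```

In this toy example I² is about 56%, which would count as moderate heterogeneity under the thresholds given above.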
The standardized mean difference is not easy to interpret from a clinical viewpoint. Therefore, we transformed the standardized mean differences into the numbers needed to treat, using the formulas Kraemer and Kupfer provide (32). The number needed to treat indicates the number of patients who must receive treatment to generate an additional positive outcome in the experimental group relative to the comparison group (33).
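The conversion from a standardized mean difference to a number needed to treat can be sketched as below. The sketch assumes the Kraemer and Kupfer approach takes the form AUC = Φ(d/√2) and NNT = 1/(2·AUC - 1), where Φ is the standard normal cumulative distribution function; the function name is illustrative.

```python
from statistics import NormalDist


def nnt_from_d(d):
    """Number needed to treat from a standardized mean difference.

    Assumed form of the Kraemer and Kupfer conversion:
    AUC = Phi(d / sqrt(2)); NNT = 1 / (2 * AUC - 1).
    The sign of d is ignored; the direction is reported separately.
    """
    auc = NormalDist().cdf(abs(d) / 2 ** 0.5)
    success_rate_difference = 2 * auc - 1
    if success_rate_difference == 0:
        return float("inf")  # no effect: no finite number needed to treat
    return 1 / success_rate_difference


print(round(nnt_from_d(0.63), 2))  # 2.91
print(round(nnt_from_d(0.21), 2))  # 8.47
```

Under this assumption, an effect size of 0.63 corresponds to a number needed to treat of about 2.91, and 0.21 to about 8.47.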
For the IPT maintenance studies, we calculated the odds ratio of recurrence of depression in maintenance IPT compared with a control condition, as well as the number needed to treat (in this case as the inverse of the risk difference).
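For a single maintenance study, the two quantities above can be illustrated with the short sketch below. The recurrence counts are hypothetical, the function names are illustrative, and pooling across studies (done in the meta-analysis software) is not shown.

```python
def odds_ratio(events_t, n_t, events_c, n_c):
    """Odds of recurrence under treatment divided by odds under control."""
    odds_t = events_t / (n_t - events_t)
    odds_c = events_c / (n_c - events_c)
    return odds_t / odds_c


def nnt_from_risks(events_t, n_t, events_c, n_c):
    """Number needed to treat as the inverse of the risk difference."""
    risk_difference = events_c / n_c - events_t / n_t
    if risk_difference == 0:
        return float("inf")  # identical recurrence risks
    return 1 / risk_difference


# Hypothetical counts: 10 of 50 recurrences with maintenance IPT,
# 20 of 50 recurrences in the control condition.
print(round(odds_ratio(10, 50, 20, 50), 3))      # 0.375
print(round(nnt_from_risks(10, 50, 20, 50), 2))  # 5.0
```

An odds ratio below 1 indicates fewer recurrences in the treatment arm; the sketch does not guard against arms with zero or all events, which would need a continuity correction.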
We also performed subgroup analyses to test for significant differences between effect sizes in different categories of studies. In these analyses, we used the mixed-effects model, which pooled studies within subgroups with the random-effects model but tested for significant differences between subgroups with the fixed-effects model. If a subgroup contained fewer than three studies, we did not conduct the subgroup analysis. Because there are indications that psychological treatments of dysthymia are less effective (33), we decided to repeat all analyses after removing studies specifically targeting patients with dysthymia.
Publication bias was examined by inspecting the funnel plot. A funnel plot is a plot of a measure of study size (the standard error) on the vertical axis as a function of effect size on the horizontal axis. Large studies appear at the top of the graph and tend to cluster near the mean effect size. Smaller studies appear toward the bottom of the graph. As there is greater sampling variation in effect size estimates in the smaller studies, they will be dispersed across a range of values (34). Visual inspection of a funnel plot can give an indication of publication bias. The studies can be expected to spread symmetrically about the pooled effect size when publication bias is absent. When bias exists, the bottom of the plot will show a higher concentration of studies on one side of the mean than the other. This is because smaller studies are more likely to be published if they have larger than average effects, which makes them more likely to meet the criterion for statistical significance.
We also examined possible publication bias using Duval and Tweedie's trim-and-fill procedure (34). If a meta-analysis has included all relevant studies, the funnel plot should be symmetric and dispersed equally on either side of the mean effect. If there is publication bias, the funnel plot will be asymmetric, with more studies to the right side of the mean effect size (studies with large effect sizes) than to the left of the mean (studies with small or nonsignificant effect sizes, which can be expected to be harder to publish). Duval and Tweedie developed a method for imputing missing studies based on the assumption that studies should be equally distributed on both sides of the mean effect size. This procedure yields an estimate of the effect size after accounting for publication bias (adjusted effect size).
We expected that several comparisons (e.g., IPT compared with pharmacotherapy, or pharmacotherapy compared with combination treatment) would involve only a limited number of studies. We therefore conducted a power calculation to assess whether the included studies had sufficient statistical power to detect small effect sizes. Although there are no clear definitions for what constitutes a small effect size, we defined a small effect as d=0.2, according to the indications of Cohen (30), but we also examined how many studies would be needed to find an effect size of 0.3.
We conducted the power calculation according to the procedures described by Borenstein and colleagues (35). A power calculation indicated that we would need to include at least 32 studies with a mean sample size of 50 (25 participants per condition) to be able to detect an effect size of 0.2 (conservatively assuming a high level of between-study variance [τ²], a statistical power of 0.80, and an alpha of 0.05). Alternatively, we would need 20 studies with 80 participants apiece to detect an effect size of 0.2, or 16 studies with 100 participants. To detect an effect size of 0.3, we would need 14 studies with 50 participants, nine studies with 80 participants, or seven studies with 100 participants.
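A power calculation of this kind can be approximated with the sketch below. It rests on assumptions that the text does not state: equal group sizes, the large-sample variance of d, a two-sided z test, and a "high" between-study variance taken to mean τ² equal to the within-study variance (the hypothetical `tau2_ratio=1.0`). It is an approximation, not the exact procedure used in the analysis.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def variance_of_d(d, n_total):
    """Large-sample variance of Cohen's d with two equal groups."""
    n_arm = n_total / 2
    return 2 / n_arm + d ** 2 / (2 * n_total)


def power_random_effects(d, k, n_total, tau2_ratio=1.0, alpha=0.05):
    """Power of a random-effects meta-analysis of k equally sized studies.

    Assumption: tau^2 = tau2_ratio * within-study variance.
    """
    v = variance_of_d(d, n_total)
    se = sqrt((v + tau2_ratio * v) / k)  # SE of the pooled effect
    lam = d / se
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    return 1 - _NORMAL.cdf(z_crit - lam) + _NORMAL.cdf(-z_crit - lam)


def min_studies(d, n_total, target=0.80, **kwargs):
    """Smallest number of studies reaching the target power."""
    k = 1
    while power_random_effects(d, k, n_total, **kwargs) < target:
        k += 1
    return k


for n_total in (50, 80, 100):
    print(n_total, min_studies(0.2, n_total))
# 50 32
# 80 20
# 100 16
```

Under these assumptions the sketch gives the same study counts as reported above for an effect size of 0.2. For an effect size of 0.3 it can differ by one study, because the approximate power at some of the reported counts falls just below 0.80.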
Selection and Inclusion of Studies
Having examined a total of 10,487 abstracts, we retrieved 1,209 full-text papers for further study. Of these, we excluded 1,171 papers that did not meet inclusion criteria (Figure 1). A total of 38 studies met all inclusion criteria and were included in this meta-analysis (36-73).
Characteristics of Included Studies
The 38 studies included 4,356 patients (1,338 in the IPT conditions, 812 in control conditions, 713 in pharmacotherapy conditions, 468 in other psychotherapy conditions, 510 in combination treatment with IPT and pharmacotherapy, and 515 in maintenance studies). Selected characteristics of the included studies are presented in Table 1.
Thirty-three of the 38 studies examined the effects of IPT as an acute treatment, and the remaining five examined IPT as a maintenance treatment after successful recovery from a depressive disorder. Sixteen studies compared IPT with a control condition (waiting list, usual care, placebo, other), 10 compared IPT with another psychotherapy, 10 contrasted IPT with pharmacotherapy, and 10 compared a combination treatment (IPT plus pharmacotherapy) with pharmacotherapy alone. Of the 16 studies comparing IPT with a control condition, eight used usual care as the control condition, three used a waiting list control group, two used a pill placebo, and three used another control group (monthly 30-minute nontherapeutic sessions; a parenting education control program; and nonscheduled treatment).
In 29 studies patients met criteria for a depressive disorder according to a diagnostic interview (four studies specifically targeted dysthymic patients); in four studies, patients had scored above a cutoff on a depression questionnaire. Seventeen studies treated adults in general, six treated adolescents, four treated older adults, four treated patients with somatic disorders, two treated women with postpartum depression, and the remaining five treated other, more specific target groups. Fourteen studies used the original IPT manual, and 19 reported having adapted the manual to the needs of the study's target population. Adaptations were minor and included adapting the number of sessions, addressing specific needs of the target groups, and changing the individual format to a group format. The 38 studies were conducted in 13 countries, with most in the United States (N=21, including two in Puerto Rico) and Europe (N=7).
The quality of the studies varied. Nineteen studies reported an adequate sequence generation, while the other 19 did not report a sequence generation method. Twelve studies reported allocation to conditions by an independent (third) party. Twenty-five studies reported using blinded outcome assessors, 10 did not report blinding of assessors, and three used self-report outcome measures. In 28 studies intent-to-treat analyses (completeness of follow-up data) were conducted. Nine studies (24%) met all quality criteria.
IPT Compared With Standard or No Treatment
We were able to compare the effects of IPT with a waiting list, usual care, or placebo control condition in 16 studies (Table 2). The mean effect size (Cohen's d) was 0.63 (95% confidence interval [CI]=0.36 to 0.90), which corresponds to a number needed to treat of 2.91. Heterogeneity was high (I²=82.96%). After removal of a possible outlier, the mean effect size decreased to 0.52 (95% CI=0.36 to 0.68; number needed to treat=3.50), with low to moderate heterogeneity (I²=42.84%). Meta-analyses based on the two most commonly used instruments (the 17-item Hamilton Depression Rating Scale [HAM-D] and the Beck Depression Inventory [BDI]) yielded comparable outcomes (Table 2). Figure 2 presents the effect sizes and confidence intervals.
The 16 studies had on average 92 participants (46 in the IPT and 46 in the control conditions). This generated sufficient statistical power to detect an effect size of 0.21 (number needed to treat=8.47).
Inspection of the funnel plot and Duval and Tweedie's trim and fill procedure did not indicate possible publication bias: the effect size adjusted for publication bias exactly equaled the unadjusted effect size. None of the studies in this subsample specifically targeted dysthymic disorder.
We examined some basic moderators in subgroup analyses. We found no indication that type of target group (adults, adolescents, more specific target group), method of diagnosing depressive disorder (diagnostic interview or other), or use of intent-to-treat analyses (yes or no) were significantly associated with effect size, although the number of studies was small in several subgroups. We did find that the studies using the original manual produced significantly lower effect sizes (d=0.29; number needed to treat=6.17) than did studies that used an adapted manual (d=0.67; number needed to treat=2.75) (p<0.01). Heterogeneity was low and not significant in both subgroups.
Studies that used a waiting list control group yielded larger effect sizes than studies that employed usual care or other control conditions. That this difference was not statistically significant (p>0.05) may reflect the small number of studies using a waiting list control group (N=3).
IPT Compared With Other Psychotherapies
Ten studies (13 comparisons) compared posttest effects of IPT to another psychotherapy (Table 2; see also Figure S1 in the data supplement that accompanies the online edition of this article). On average, the 13 comparisons included 74 patients (37 per condition), which sufficed to detect an effect size of 0.25 (number needed to treat=7.14).
The overall effect size for the 13 comparisons was 0.04 (95% CI=-0.14 to 0.21; number needed to treat=45.45) favoring IPT, which was not statistically significant (p=0.40). Heterogeneity was low to moderate (I²=39.81%).
In these analyses we included three studies that compared two psychological treatments with the same control group. This means that multiple comparisons from these three studies were included in the same analysis. The multiple comparisons, however, are not independent of one another, which may have resulted in artificially reduced heterogeneity and affected the pooled effect size. We examined such possible effects by conducting an analysis in which we included only one effect size per study. First, we included only the comparison with the largest effect size from the studies with multiple comparisons. Then, in another analysis, we included only the smallest effect size. As illustrated in Table 2, the resulting effect sizes were almost identical to those of the overall analyses. Heterogeneity did not increase considerably and remained low to moderate in these analyses.
The mean effect size based on the HAM-D resulted in comparable outcomes. The effect size based on the BDI was somewhat larger (d=0.28; number needed to treat=6.41) but did not reach statistical significance. Removal of the studies specifically treating dysthymic disorder had little effect on the overall effect size. There were no outlier studies.
Again, we examined possible moderators in subgroup analyses (Table 2). We did not find that the effect size differed significantly between studies that used an adapted manual and those that used the original manual; between studies in which IPT was compared with CBT and those in which IPT was compared with other psychotherapies; between studies treating adults in general and those targeting more specific groups (e.g., adolescents, people with somatic illnesses); between studies using different treatment formats; and between studies using intent-to-treat analyses and those using per-protocol analyses.
There were some indications of publication bias. After adjustment for publication bias, the effect size decreased to -0.11 (95% CI=-0.31 to 0.09; number of imputed studies=4; number needed to treat=16.13).
IPT Compared With Pharmacotherapy
Ten studies compared IPT with pharmacotherapy. These studies included on average 82 participants and had sufficient power to detect an effect size of 0.28 (number needed to treat=6.41).
A nonsignificant differential overall effect size of -0.12 (95% CI=-0.36 to 0.12; number needed to treat=14.71) favored pharmacotherapy (see Figure S2 in the online data supplement). Heterogeneity was moderate to high (I²=61.98%). After removal of one possible outlier (Finkenzeller et al. [44]), the overall effect size became significant (d=-0.19; 95% CI=-0.38 to -0.01; number needed to treat=9.43, p<0.05; I²=30.95%), indicating a significant superior effect of pharmacotherapy. Removal of the studies aimed at dysthymia resulted in comparable outcomes.
Subgroup analyses indicated that selective serotonin reuptake inhibitors (SSRIs) were significantly more effective than IPT (N=3; d=-0.39, p<0.01; number needed to treat=4.59), whereas tricyclic antidepressants were not (N=4; d=-0.02, p>0.1; number needed to treat=83.33), and the studies comparing SSRIs with IPT differed significantly from those examining tricyclics (p<0.05). Two of the three studies comparing IPT with SSRIs, however, involved patients with dysthymia. Furthermore, the number of studies in each of these subgroups was very small, so these results should be considered with caution.
These analyses gave no indication of publication bias; the unadjusted and adjusted effect sizes were identical, with zero imputed studies.
Pharmacotherapy Compared With Combination Treatment
In 10 studies the combination of IPT and pharmacotherapy was compared with pharmacotherapy alone. These 10 comparisons had on average 80 participants, which yielded enough statistical power to detect an effect size of 0.28 (number needed to treat=6.41). The mean effect size indicating the difference between these two types of treatment was 0.16 (95% CI=-0.03 to 0.36; number needed to treat=11.11) in favor of combination treatment (see Figure S3 in the online data supplement). This difference was not statistically significant, perhaps reflecting the small number of studies and consequent low statistical power. Heterogeneity was low to moderate (I²=39.26%). Removing the studies on dysthymic disorder did not yield a significant difference either. The subgroup analyses (original manual versus adapted manual; adults versus more specific group; intent-to-treat analyses versus per-protocol analyses) identified no significant differences between subgroups. There were some indications of publication bias: the effect size adjusted for publication bias was somewhat smaller than the unadjusted effect size (d=0.07; 95% CI=-0.13 to 0.27; number of imputed studies=3; number needed to treat=25.00).
IPT as Maintenance Treatment
In five studies, we were able to compare maintenance pharmacotherapy with maintenance combination treatment (IPT plus pharmacotherapy) in patients who had recovered from a depressive disorder. Four of these presented recurrence rates. The fifth study reported only means and standard deviations for patients; in this study, the odds ratio was calculated using the procedures integrated in the CMA software. This resulted in an odds ratio of 122.77, which was considered implausible (the other odds ratios ranged from 1 to 3.75). We therefore did not use the study in these analyses. The remaining four studies generated an odds ratio of 0.37 (95% CI=0.19 to 0.73, p<0.01; I²=0%; number needed to treat=7.63), which indicates that maintenance IPT combined with pharmacotherapy significantly reduced the recurrence rate compared with pharmacotherapy alone after successful treatment of acute depression (see Figure S4 in the online data supplement). Because of the small number of studies, we did not conduct additional analyses.
We were also able to compare the combination of maintenance IPT and pill placebo with pill placebo alone in four studies. The resulting odds ratio was 0.47 (95% CI=0.25 to 0.87; I²=0%; number needed to treat=5.95), indicating that maintenance IPT was more protective against relapse than pill placebo alone. As none of the other possible comparisons (maintenance IPT versus control groups; maintenance IPT versus pharmacotherapy) had more than two comparisons, we decided not to perform a meta-analysis.
We identified 38 randomized trials (with a total of 4,356 patients) examining the effects of IPT. Compared with control groups, we found a moderate to large effect of IPT in the acute treatment of depression. We also found some indications that IPT had less efficacy than SSRI pharmacotherapy. However, the overall difference was small, not all analyses were significant, and the number of studies in this subsample was small.
We found indications that combination treatment with IPT and pharmacotherapy was somewhat more efficacious than pharmacotherapy alone, although this difference reached significance only when the HAM-D was used as an outcome measure. However, the effect size was also small, and again this subsample of studies was relatively small, limiting statistical power. In a larger meta-analysis of studies comparing combined psychotherapy and pharmacotherapy with pharmacotherapy alone, we found that combination treatments were significantly better than pharmacotherapy alone (74). Furthermore, combination treatment may have greater efficacy for patients with more severe or chronic major depression (75).
We did not find that IPT had greater efficacy than other psychotherapies, including CBT, although the number of studies was too small to draw definite conclusions. IPT is the only type of psychotherapy for depression aside from CBT that has been compared with control groups, other psychotherapies, antidepressant medication, and combination treatments. Other types of psychotherapy are far less well examined. At present, therefore, IPT and CBT may be considered the best options for psychological treatments for depression. There is no indication that IPT is superior to CBT, and the two seem equally effective overall. Whether to prescribe CBT or IPT should depend on patient preference and moderating factors based on differential therapeutics (76).
Although the number of studies examining the effects of maintenance IPT was small, these studies had relatively high methodological quality. Analysis of these studies indicated that maintenance IPT combined with pharmacotherapy reduced the relapse rate considerably compared with pharmacotherapy alone. We also found that placebo plus IPT was more effective than placebo alone in reducing relapse rates.
Our finding that pharmacotherapy has greater efficacy than IPT is consistent with results from earlier meta-analytic research (77); this should not come as a surprise, since there was considerable overlap of included studies. The superior effect of combination treatment over pharmacotherapy alone also accords with earlier meta-analytic research (74). This suggests that IPT has an additional effect on depression beyond the effects of pharmacotherapy, although the effect size was small.
That IPT is efficacious compared with control conditions and probably augments pharmacotherapy is important. Pharmacotherapy may have limited benefit in situations such as complicated grief, where IPT can be crucial. Medication and psychotherapies presumably work by different mechanisms, and they generally relieve symptoms in different temporal patterns. Effective psychotherapies such as IPT are therefore among the most important instruments available to clinicians.
We found that studies using the original IPT manual produced significantly lower effect sizes than studies that used an adapted manual. This may reflect the fact that the original manual has been examined by several groups other than the inventors of IPT, whereas the adapted versions of the manual have been examined mainly by the researchers who developed them. Yet the original manual would inevitably have received more use than later adaptations, and differences in outcomes might also derive from different treatment populations, therapist skills, and the adaptations themselves. Nonetheless, the larger effect sizes of the adapted versions should be considered with caution.
This study has several limitations. First, the quality of the included studies was not optimal. Only nine of 38 (24%) studies met all quality criteria. Although this proportion is relatively high compared with other studies of psychotherapy for depression (we previously found [77] that only 11 of 115 [10%] controlled trials of psychotherapy for adult depression met all quality criteria), it is still too low. We recommend that future research use and report adequate randomization methods, correct blinding of outcome assessors, and intent-to-treat analyses.
Second, the number of studies in several subanalyses was relatively small and may have lacked statistical power to detect smaller effect sizes. A third limitation is that we found indications for publication bias in some analyses, although the mean effect sizes did not decrease considerably after adjustment for publication bias, and none of the resulting effect sizes differed significantly from the unadjusted effect sizes. This contradicts a recent meta-analysis of publication bias in psychotherapy for adult depression (78), which found no indications for publication bias of IPT studies. The present analysis included more studies, and its results can be assumed to be more up-to-date.
Despite these limitations, we found clear indications for the efficacy of IPT for unipolar depression. IPT is one of the best empirically validated psychological treatments for depression currently available, and its inclusion in treatment guidelines is justified.