The delineation of patient characteristics that usefully predict the outcome of treatment is a long-sought but somewhat elusive goal in psychotherapy research. One goal in the identification of such predictors is to aid in the selection of patients who will or will not respond to treatment. This distinction, if made with sufficient accuracy, may reduce the cost of a treatment, by offering it only to those with a high probability of success, and may allow for early identification of nonresponders to one treatment who may respond to another treatment. The identification of variables associated with poor outcome may also lead to hypotheses concerning the reasons for failure of a particular treatment, possibly leading to improvements in the treatment being studied.
At present, cognitive behavior therapy is recognized as the most effective treatment for bulimia nervosa, having been shown to be superior to most other psychotherapies and to a single trial of antidepressant medication (1). Unfortunately, even with cognitive behavior therapy only about 50% of bulimic patients recover. Hence, it is important to identify the characteristics of those who will and will not respond to cognitive behavior therapy, so that more effective treatment strategies can be developed. Previous studies have identified several such characteristics. Among the pretreatment variables associated with poor outcome in some studies are low self-esteem (2–4), low weight or previous anorexia nervosa (5, 6), a higher frequency or severity of binge eating (2, 7–11), comorbid personality disorder (9, 12–14) or depression (7, 8, 15), and a history of obesity (8); attitudes toward weight and shape have also been related to outcome (4), although in that case the most severely disturbed patients showed greater improvement. A further predictor of outcome may be early response to treatment. In a recent study (11), rapid responders to cognitive behavior therapy were significantly more likely to do well in treatment than slow responders.
On average, 20% of participants in controlled trials of cognitive behavior therapy for bulimia nervosa drop out, with a range from 0% to 35% (16). Hence, dropouts make a considerable contribution to therapy failure rates. To date, findings from some studies have suggested that the following factors characterize dropouts, as compared with those who complete treatment: more severe depression (16), more severe bulimic symptoms (14, 17), interpersonal difficulties as reflected by comorbid personality disorder (4, 14, 18, 19), and impulsivity (20).
Different studies have found different sets of predictors both for treatment outcome and for attrition. In many cases, predictors identified as statistically significant in one study were not found significant in others. Several factors may contribute to these discordant findings. First, the type of therapy, the mode of delivery (i.e., individual or group format, outpatient or inpatient treatment), and the population of bulimic subjects have varied among studies. Second, many studies have had too few subjects to reliably identify outcome predictors. Third, the definition of treatment success has varied; some studies have used abstinence from binge eating and purging, and others have used the criterion of no longer meeting the DSM-III-R diagnostic criteria. Fourth, both pretreatment variables and the methods of assessing treatment outcome have varied among studies. In addition, these studies offer only general guidance to the clinician as to which patient will not do well in therapy.
The present study addresses some of these problems: first, by entering a relatively large number of participants (N=194) meeting the DSM-III-R criteria for bulimia nervosa; second, by treating the participants with a widely used manual-based form of cognitive behavior therapy (21); third, by paying careful attention to the integrity of the therapy delivered; and fourth, by using signal detection analyses to determine whether optimal cutoff points could be established to distinguish patients for whom treatment is likely to succeed from those for whom it is likely to fail. The study involved three treatment sites (Cornell University, University of Minnesota, and Rutgers University) and a data and monitoring center at Stanford University. The data presented here are derived from the first phase of a controlled study of the efficacy of pharmacological and psychotherapeutic treatments for nonresponders to cognitive behavior therapy.
A total of 194 women meeting the DSM-III-R criteria for bulimia nervosa entered the study and were treated with cognitive behavior therapy. Participants were recruited from advertisements in the local media and from eating disorders clinics. Potential participants were first screened by telephone to ascertain their eligibility for the study. Of 851 individuals calling the three treatment centers, 592 were screened out; the major reasons were not meeting the binge-eating or purging frequency criteria for bulimia nervosa, having been treated with an adequate trial of an antidepressant medication, or not being interested in the study. Hence, 259 individuals were offered appointments for further screening, 39 of whom did not keep their appointments. At the interview, the study procedures were described in detail and the potential participants then gave their written consent to participate. A further 26 individuals were screened out at this interview; the principal reason was the absence of one or more criteria for the diagnosis of bulimia nervosa. Therefore, the final number of subjects was 194. Other exclusion factors were current anorexia nervosa, current alcohol or drug abuse, associated severe physical or psychiatric illness (e.g., psychosis, significant suicidal risk, cancer), use of any medication known to affect weight, current psychiatric or psychotherapeutic treatment, or an adequate trial of an antidepressant medication or cognitive behavior therapy. During the treatment phase of the study, six participants were withdrawn: one became pregnant, four developed major depression requiring antidepressant medication, and one developed a manic episode.
The mean age of the participants was 28.1 years (SD=7.9); of the 188 remaining participants, 88% were white (N=166), 5% were African American (N=10), 3% were Hispanic (N=6), and 3% were Asian (N=6). Two-thirds (66%, N=124) had never married, 24% were currently married (N=45), and 10% were divorced (N=19). The participants reported that their bulimic symptoms had begun an average of 10.2 years (SD=7.6) before the study. Their median rate of binge eating was 21.0 episodes during a 4-week period, and their median rate of purging was 34.0 episodes over 4 weeks. Nearly one-quarter of the participants (22%, N=42) reported a previous episode of anorexia nervosa, 59% had a past history of major depression (N=110), and 23% had a current major depression (N=43). Personality disorders were diagnosed in 43% of the participants (N=81); of these disorders, about one-half were in cluster B.
Cognitive behavior therapy, which was manual based and had been used in previous treatment research (21), was carried out by doctorate-level psychologists experienced in the treatment of eating disorders. Treatment consisted of 18 individual 50-minute outpatient sessions over 16 weeks. Sessions were held twice weekly for the first 2 weeks and then weekly. None of the patients received any other psychotherapy or pharmacotherapy during this period. To standardize the therapeutic procedures within and across sites, a training and monitoring process was instituted. This process was aimed at diminishing the likelihood of site-by-treatment interactions, ensuring therapist compliance with the therapeutic procedures, and allowing replication by others. The therapists were trained in the procedures at a workshop, and each therapist practiced the treatment with two patients with weekly on-site supervision. A second workshop was held 6 months later, just before entry of the first participant, to review the therapeutic procedures and address common problems. These workshops continued at 6-month intervals during this phase of the study. In addition, a randomly selected sample of audiotapes was reviewed, and feedback on the accuracy of therapy was provided to the therapist by fax and at times by telephone. On-site supervision of therapy continued at weekly intervals.
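As a quick check on the schedule arithmetic (2 weeks at two sessions per week plus 14 weeks at one session per week gives 18 sessions), the sketch below lists the treatment week of each session. It also shows that session 6 falls in week 4, the point used in the early-response analysis. The function name `session_weeks` is hypothetical; it simply encodes the schedule as described.

```python
def session_weeks(total_weeks=16, twice_weekly_weeks=2):
    """Treatment week of each session: two per week early on, then one per week."""
    weeks = []
    for week in range(1, total_weeks + 1):
        sessions_this_week = 2 if week <= twice_weekly_weeks else 1
        weeks.extend([week] * sessions_this_week)
    return weeks


schedule = session_weeks()
print(len(schedule))  # 18 sessions in total
print(schedule[5])    # session 6 (0-based index 5) falls in week 4
```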
The patients were assessed before treatment through both structured interviews and questionnaires. Weight and height were measured in order to calculate body mass index. Psychopathology was assessed by using the Structured Clinical Interview for DSM-III-R (22). Specific eating-related pathology was assessed before and after treatment by using the Eating Disorder Examination ratings for frequencies of binge eating and purging, dietary restraint, and concern about weight, shape, and eating (23). During treatment the number of purging episodes during the previous week was recorded at 2-week intervals by means of a computerized questionnaire assessment.
Administration of questionnaires was aimed at further assessing 1) specific eating-related pathology, 2) aspects of general psychopathology particularly pertinent to bulimia nervosa, and 3) interpersonal functioning. In the first category was the Bulimic Thoughts Questionnaire, which measures bulimic cognitions (24) and self-efficacy for overcoming binge eating and purging (25). In the second category were the Beck Depression Inventory (26), the Rosenberg Self-Esteem Scale (27), and the impulsivity scale of the Multidimensional Personality Questionnaire (28). In the third category were the Inventory of Interpersonal Problems (29), a measure of interpersonal relationships, and the questionnaire form of the Social Adjustment Scale (30).
Posttreatment status was derived from Eating Disorder Examination interviews, allowing classification of the participants as responders (no binge eating or purging during the past 4 weeks) or nonresponders. The reliability of this measure was determined for 20 participants. With the exception of subjective binges, which were not used in this study, agreement on all measures exceeded r=0.90.
The analytic approach used here was descriptive and hypothesis generating, rather than hypothesis testing. In the first phase of the analysis, the pretreatment characteristics of the dropouts were compared with those of the patients who completed treatment, and the treatment responders were compared with the nonresponders. For continuous measures, Cohen’s d (the standardized mean difference between the groups) was used as the effect size. For binary measures, the natural logarithm of the odds ratio comparing the two groups was used. Although there are no absolute standards for what constitutes small, medium, and large effect sizes, values of 0.2 (odds ratio=1.2) are generally considered small, 0.5 (odds ratio=1.6) moderate, and 0.8 (odds ratio=2.2) large.
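A minimal sketch of the two effect-size measures, assuming that Cohen's d is computed with the pooled sample standard deviation (the text does not say which standard deviation was used) and that the odds ratio has no empty cells. The function names and toy inputs are hypothetical. The first printed line back-transforms the ln(odds ratio) benchmarks, confirming that 0.2, 0.5, and 0.8 correspond to odds ratios of about 1.2, 1.6, and 2.2.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference, using the pooled sample SD (an assumption)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def log_odds_ratio(a_yes, a_no, b_yes, b_no):
    """Natural log of the odds ratio for a binary characteristic in groups A and B.
    Assumes no zero cells."""
    return math.log((a_yes / a_no) / (b_yes / b_no))


# Back-transform the ln(OR) benchmarks quoted in the text
print([round(math.exp(es), 1) for es in (0.2, 0.5, 0.8)])  # [1.2, 1.6, 2.2]

# Hypothetical toy inputs
print(cohens_d([3, 4, 5], [1, 2, 3]))  # 2.0
print(log_odds_ratio(10, 10, 5, 15))   # ln(3), about 1.0986
```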
Because adequate prediction of success and failure often requires use of combinations of, rather than individual, variables, the next step was based on use of signal detection. This method was used to determine the most sensitive and specific algorithm to, first, identify treatment dropouts and, second, identify treatment nonresponders (31). Signal detection is a well-established procedure, in many ways ideally suited to clinical decision making but not as familiar in this context as are standard parametric methods, such as multiple logistic regression analysis or multiple linear discriminant analysis. However, signal detection has major advantages over these methods:
1. Signal detection is nonparametric and distribution free, whereas multiple logistic regression analysis involves linearity assumptions and multiple linear discriminant analysis, in addition, assumes multivariate normal distributions.
2. Signal detection leads to an "and/or" rule that identifies which patients require attention and which do not, a rule that clinicians find easy to apply in practice. In contrast, multiple logistic regression analysis and multiple linear discriminant analysis result in weighted averages of predictors, which are cumbersome for clinicians to compute for individual patients and which, at best, order patients in terms of their need for attention. Such weighted scores are typically useful in research applications but are difficult to apply in routine clinical practice.
3. Signal detection explicitly requires an evaluation of the relative clinical importance of false positives and false negatives. Multiple logistic regression analysis and multiple linear discriminant analysis, by their nature, place equal importance on both, whatever the clinical situation.
4. Signal detection is highly sensitive to interactive effects of predictors. Multiple logistic regression analysis and multiple linear discriminant analysis, like other linear models, require inclusion of all "main effects" before interactions can be considered. As a result, they have relatively low power to detect even strong interactions.
5. Signal detection can identify different subgroups of subjects who have similar probabilities of the outcome but for different reasons. Multiple logistic regression analysis and multiple linear discriminant analysis would merely identify these subjects as having similar probabilities. As a result, the clinician is not alerted to the fact that the type of attention needed might differ among these subgroups.
6. Signal detection can take the costs of evaluations into consideration, although this capacity was not used here. Multiple logistic regression analysis and multiple linear discriminant analysis cannot take costs into consideration.
7. Signal detection, multiple logistic regression analysis, and multiple linear discriminant analysis, when used stepwise, as done here, all are hypothesis-generating, not hypothesis-testing, methods. This limitation is often overlooked in consideration of the results of multiple logistic regression analysis and multiple linear discriminant analysis, but it is hard to overlook in the use of signal detection methods.
Briefly, at the first step, signal detection considers each possible predictor (including a range of different cutoff points for any ordinal predictor). For each, it computes the sensitivity and specificity of that "test" against the outcome. Using the selected weighting of the relative clinical importance of false positives and false negatives, it finds the optimal predictor (and optimal cutoff point for an ordinal predictor). This is then used to split the initial population into two subsets, the one positive on the first "test" and the one that is negative. The process is repeated on each of these two subsets. At this stage the same "test" may be found for both subsets (which would then act like a "main effect" in a linear model) or two different "tests" (like an "interaction"). The process is then repeated on each of the resulting four subsets, then on the resulting eight subsets, etc., ultimately creating a decision tree. The process stops when there are no more "tests," when the sample size in some subset is too small, or when the optimal test does not achieve some preset criterion (often a statistically significant two-by-two chi-square test at the 5% level, here used as a stopping rule, not as a testing procedure for an a priori hypothesis).
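The splitting procedure can be sketched in Python as follows. This is an illustrative reimplementation, not the software used in the study, and it rests on labeled assumptions: the text does not give the formula for weighting false positives against false negatives, so the sketch substitutes a simple weighted average of sensitivity and specificity (weight `w`; `w=0.5` treats both errors equally), and the minimum subset size and maximum depth are arbitrary choices. The stopping rule is the 2×2 chi-square at the 5% level (critical value 3.841 for 1 df), used only as a stopping rule. The toy data are synthetic, and the variable names `btq` and `imp` are hypothetical.

```python
import itertools

CHI2_CRIT_1DF_05 = 3.841  # chi-square critical value for 1 df at the 5% level


def chi_square_2x2(tp, fp, fn, tn):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = tp + fp + fn + tn
    denom = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    return 0.0 if denom == 0 else n * (tp * tn - fp * fn) ** 2 / denom


def best_test(rows, predictors, outcome, w):
    """Scan every predictor, cutoff, and direction; keep the split that
    maximizes w*sensitivity + (1 - w)*specificity (a stand-in criterion)."""
    pos = sum(r[outcome] for r in rows)
    neg = len(rows) - pos
    if pos == 0 or neg == 0:
        return None  # nothing left to discriminate in this subset
    best = None
    for p in predictors:
        values = sorted({r[p] for r in rows})
        for cut in values[:-1]:  # every cutoff leaves both sides non-empty
            for direction in (">", "<="):
                flagged = [r for r in rows if (r[p] > cut) == (direction == ">")]
                tp = sum(r[outcome] for r in flagged)
                fp = len(flagged) - tp
                fn, tn = pos - tp, neg - fp
                score = w * tp / pos + (1 - w) * tn / neg
                if best is None or score > best[0]:
                    best = (score, p, direction, cut, chi_square_2x2(tp, fp, fn, tn))
    return best


def grow(rows, predictors, outcome, depth=0, max_depth=3, min_size=10, w=0.5):
    """Recursively split rows into a decision tree of simple cutoff tests."""
    node = {"n": len(rows), "rate": round(sum(r[outcome] for r in rows) / len(rows), 2)}
    if depth >= max_depth or len(rows) < min_size:
        return node  # subset too small or tree deep enough
    test = best_test(rows, predictors, outcome, w)
    if test is None or test[4] < CHI2_CRIT_1DF_05:
        return node  # chi-square used only as a stopping rule
    _, p, direction, cut, _ = test

    def is_positive(r):
        return (r[p] > cut) == (direction == ">")

    node["test"] = f"{p} {direction} {cut}"
    for label, subset in (("positive", [r for r in rows if is_positive(r)]),
                          ("negative", [r for r in rows if not is_positive(r)])):
        node[label] = grow(subset, predictors, outcome, depth + 1, max_depth, min_size, w)
    return node


# Synthetic toy data (hypothetical): outcome occurs only when btq > 75 AND imp < 6
rows = [
    {"btq": btq, "imp": imp, "dropout": int(btq > 75 and imp < 6)}
    for btq, imp in itertools.product(range(60, 100, 5), range(2, 12))
]
tree = grow(rows, ["btq", "imp"], "dropout")
print(tree["test"], "|", tree["positive"]["test"])  # imp <= 5 | btq > 75
```

On this synthetic grid the first split is `imp <= 5`, and the positive branch is then split on `btq > 75`, so the tree recovers the conjunction built into the data. Because each subset is searched independently, the two branches of an impure subset may be split on different variables, which is how the method picks up interactions.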
One-fourth of the participants (26%, N=48) dropped out of treatment at an average of 4.6 weeks, and 29% of these (N=14) dropped out by the second week. Of the dropouts, 21% (N=10) cited moving or lack of time as the reason for dropping out, 21% (N=10) did not feel that the treatment was suitable, and the remainder dropped out for unknown reasons. According to subject recall for the preceding week, nine of the dropouts had ceased to vomit; however, this is probably an overestimate of the number who can be regarded as recovered because binge eating and other purging methods were not assessed, and a 1-week recall is not a reliable indicator of recovery. Hence, we decided to include all the dropouts in the analyses. Of the 140 participants who completed treatment, 58 (41%) had stopped binge eating and purging according to scores on the Eating Disorder Examination for a 28-day period.
Differences Between Dropouts and Completers
The pretreatment characteristics of those who dropped out of treatment (N=48) were compared with those of the patients who completed treatment (N=140) (Table 1). The magnitude of the effect sizes indicates that the dropouts tended to have higher levels of bulimic cognitions (effect size=0.61), greater concern about shape (effect size=0.58), and greater impulsivity (effect size=–0.53).
A signal detection analysis using the variables listed in Table 1 revealed that the best combination of predictors of dropping out of treatment was a score on the Bulimic Thoughts Questionnaire higher than 75 plus a score on the impulsivity scale of the Multidimensional Personality Questionnaire lower than 6, indicating greater impulsivity (χ2=29.5, df=3, p=0.0001). This test has a sensitivity of 71% and a specificity of 68%. From a clinical viewpoint, if a different treatment were designed for likely dropouts, this test would correctly assign 69% of the group either to the new treatment or to cognitive behavior therapy. However, 32% of the individuals who would not drop out would be incorrectly assigned to the new treatment, thereby either depriving them of cognitive behavior therapy or, if the new therapy were combined with cognitive behavior therapy, giving them an unnecessary additional treatment. In an attempt to select dropouts more precisely, a test with the highest possible specificity was derived. This test identified only six dropouts and misclassified three individuals who would not drop out.
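Reading "plus" as a conjunction, the reported dropout rule reduces to a one-line test that a clinician could apply directly. The overall figure of 69% also follows from the group sizes (48 dropouts, 140 completers), the 71% sensitivity, and the 32% false-positive rate among completers, that is, a specificity of 68%. The function names below are hypothetical.

```python
def flags_likely_dropout(btq_score, mpq_impulsivity):
    """Dropout rule as reported: BTQ score above 75 AND MPQ impulsivity below 6."""
    return btq_score > 75 and mpq_impulsivity < 6


def overall_correct(sensitivity, specificity, n_pos, n_neg):
    """Share of the whole sample correctly assigned by a binary rule."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)


print(flags_likely_dropout(80, 4), flags_likely_dropout(80, 7))  # True False

# 48 dropouts, 140 completers; specificity = 1 - 0.32 false-positive rate
print(round(overall_correct(0.71, 1 - 0.32, 48, 140), 2))  # 0.69
```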
Baseline Differences Between Recovered and Nonrecovered Patients
The pretreatment variables were compared for those who stopped binge eating and purging (N=58), denoted here as recovered, and those who completed treatment but did not (N=82); patients who dropped out before completing treatment were excluded from this comparison. As shown in Table 2, the nonrecovered patients were more likely to have reported current depression (effect size=–0.57), to have a low body mass index (effect size=–0.41) (probably indicating severe dietary restriction), and to have poor social adjustment (effect size=0.46).
A signal detection analysis showed that the best indicator of poor response to treatment was a score on the Social Adjustment Scale higher than 2.0 (indicating poor social adjustment) combined with a body mass index less than 25 (χ2=16.6, df=2, p=0.001). With recovery taken as the positive outcome, this test has a sensitivity of 77% and a specificity of 56%; that is, it correctly classified 77% of the patients who recovered and 56% of those who did not. Put another way, if this test were used to select persons who were not going to respond to cognitive behavior therapy and give them a second treatment, 63% would get the correct treatment, 7% would have responded to cognitive behavior therapy but would be given the second treatment unnecessarily, and 30% would get cognitive behavior therapy but would not respond to that treatment.
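The percentages in this kind of breakdown follow from the classification rates and the group sizes. The sketch below rests on two assumptions: dropouts are counted as nonrecovered, since the text states that all dropouts were included in the analyses (58 recovered versus 82 + 48 = 130 nonrecovered, N=188); and the rule flags 56% of the nonrecovered patients while leaving 77% of the recovered patients on cognitive behavior therapy. Under these assumptions the computed shares match the reported 63%, 7%, and 30% to within rounding. The function name is hypothetical.

```python
def triage_breakdown(p_nonresp_flagged, p_resp_kept, n_nonresp, n_resp):
    """Split the sample into shares that are (1) correctly treated,
    (2) given a second treatment unnecessarily, and
    (3) kept on CBT but not responding."""
    n = n_nonresp + n_resp
    correct = (p_nonresp_flagged * n_nonresp + p_resp_kept * n_resp) / n
    unnecessary = (1 - p_resp_kept) * n_resp / n
    missed = (1 - p_nonresp_flagged) * n_nonresp / n
    return correct, unnecessary, missed


# Social adjustment plus body mass index rule (assumed N=188: 130 nonrecovered, 58 recovered)
print([round(x, 2) for x in triage_breakdown(0.56, 0.77, 130, 58)])  # [0.62, 0.07, 0.3]

# Early-response rule read the same way (69% flagged, 86% retained)
print([round(x, 2) for x in triage_breakdown(0.69, 0.86, 130, 58)])  # [0.74, 0.04, 0.21]
```

Reading the early-response rule below in the same way gives about 74%, 4%, and 21%, close to the reported 74%, 4%, and 22%.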
Differences in Early Performance Between Recovered and Nonrecovered Patients
Because early performance in treatment may predict outcome more successfully than pretreatment variables alone, the purging data at weeks 2, 4, 6, and 8 of treatment were added to the pretreatment variables used in the preceding analysis. The only variable selected in this signal detection analysis was the percentage change in purging after 4 weeks (six sessions) of treatment (χ2=42.5, df=1, p<0.001); patients who reduced purging by less than 70% were more likely to be treatment nonresponders. With recovery taken as the positive outcome, the sensitivity of this test was 86% and the specificity was 69%; that is, it correctly classified 86% of the patients who recovered and 69% of those who did not. If these criteria were used to select patients likely not to respond to cognitive behavior therapy and a second treatment were added, 74% would get the correct treatment, 4% of those who would have recovered with cognitive behavior therapy would get the second treatment unnecessarily, and 22% who were assigned to cognitive behavior therapy would not respond to that treatment.
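The early-response rule can be sketched as below. The helper names and patient figures are hypothetical, and the sketch assumes that the pretreatment purging frequency has been expressed in the same units as the weekly count recorded at week 4 (the text does not say how the baseline was scaled). A reduction of exactly 70% is not flagged, consistent with the recommendation that purging fall by at least 70%.

```python
def percent_reduction(baseline, current):
    """Percentage reduction in purging episodes relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline purging frequency must be positive")
    return 100.0 * (baseline - current) / baseline


def flag_early_nonresponse(baseline_per_week, week4_per_week, threshold=70.0):
    """True if purging fell by less than `threshold` percent by week 4 (session 6)."""
    return percent_reduction(baseline_per_week, week4_per_week) < threshold


# Hypothetical patients (episodes per week)
print(flag_early_nonresponse(10, 4))  # True: 60% reduction
print(flag_early_nonresponse(10, 3))  # False: 70% reduction
print(flag_early_nonresponse(8, 9))   # True: purging increased
```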
In order to model the predictive value of this test in a more clinical manner, the cutoff point was tested on patients who had not dropped out by session 6 (N=162), i.e., the cohort that would be available to the clinician at session 6. For this signal detection analysis, 70% of the group would get the correct treatment, 6% would get the second treatment unnecessarily, and 24% who were assigned to cognitive behavior therapy would not respond to that treatment—results similar to those just reported.
Overall, the previously reported predictors of outcome for cognitive behavior therapy were confirmed in this study. The principal exceptions were high frequencies of binge eating and purging; although previously noted to be associated with both dropping out (13, 16) and poor treatment response (2, 7–11), in this study they were not associated either with the tendency to drop out of treatment or with treatment response. In the present study, the dropouts had significantly more frequent bulimic thoughts and exhibited greater impulsivity than did subjects who completed treatment. Moderate effect sizes suggested that the dropouts were more likely to have a past history of anorexia nervosa, a past history of major depression, and poorer social adjustment than the completers. Patients who did not respond to treatment were significantly more likely to have poor social adjustment and a lower body mass index than were responders; the low body mass index probably indicates a history of severe dietary restraint.
Do these findings provide useful insight into modifications of cognitive behavior therapy either for dropouts or for treatment completers? In the case of dropouts, it is possible that cognitive behavior therapy does not deal with intense bulimic cognitions rapidly enough to prevent the more impulsive individual from dropping out of treatment. This may be because the initial focus of cognitive behavior therapy is on increasing meal regularity and reducing dietary restraint (21). This focus, including regular weighing, may increase concerns about shape and intensify bulimic cognitions, because patients fear that they will gain weight if they eat more regularly. It is possible that reducing the emphasis on regular eating and dealing with bulimic cognitions early in treatment might reduce the probability of dropping out. Alternatively, a treatment approach that does not focus on eating behavior directly, such as interpersonal therapy, which has been shown in one study to be as effective as cognitive behavior therapy although slower to achieve its maximal effect (32), might be valuable for these individuals.
The comparison of treatment responders and nonresponders provides less guidance concerning potential changes in treatment strategies. It is possible that those with a lower body mass index, indicating a long history of dietary restriction, may be more resistant to treatment or require longer treatment. In addition, poor social adjustment may interfere with the therapeutic relationship. However, although such associations may provide some guidance in choosing more effective treatment strategies, they can provide the clinician with only broad and imprecise indications as to which patient will drop out or not respond to treatment.
Signal detection analysis can be used to establish cutoff points for a variable or set of variables that provide maximal efficiency in distinguishing between two mutually exclusive groups; in this case, the first distinction is between dropouts and nondropouts, and the second is between responders and nonresponders to cognitive behavior therapy. In the first case, pretreatment characteristics, while statistically significant, were not clinically useful in identifying who would or would not drop out of treatment, because too high a proportion of nondropouts (32%) would be classified as likely dropouts. In addition, the reason for dropping out was unknown for 58% of the dropouts, a substantial limitation for this analysis.
In the second case, the best predictor of response to treatment was the reduction in purging at session 6, which provided a better prediction than any pretreatment variable. Modeling this cutoff point indicated that 70% of the bulimic patients would be identified and correctly treated, at the cost of unnecessarily assigning to the second treatment only 6% of the patients (those who would have recovered with cognitive behavior therapy). The remaining patients, who would not respond to cognitive behavior therapy, could be given the alternative treatment at the end of cognitive behavior therapy. Assigning the majority of nonresponders to a second treatment early in the course of therapy should decrease the cost of therapy, an important issue in the era of managed care. Such triage would allow treatment providers to better manage the limited mental health care benefits available to their patients. Moreover, abandoning an ineffective treatment more rapidly may lessen patient frustration and lower the dropout rate. The identification of cutoff points that indicate early in treatment who will or will not improve is important for the development of treatment algorithms. When tested in outcome studies, such algorithms can guide clinicians as to when to change a treatment and which treatment to offer next.
These results require replication in another large group of patients with bulimia nervosa treated with cognitive behavior therapy. However, in the interim the results suggest that clinicians using cognitive behavior therapy might consider adding a second treatment, for example, an antidepressant, if six treatment sessions do not lead the patient to reduce purging by at least 70%.
Received April 16, 1999; revisions received Aug. 24 and Nov. 30, 1999; accepted Jan. 6, 2000. From the Department of Psychiatry, Stanford University; the Department of Psychiatry, University of Minnesota, Minneapolis; the Department of Psychiatry, Cornell University, New York; the Department of Neuroscience, University of North Dakota, and Neuropsychiatric Research Institute, Grand Forks, N.D.; and the Department of Psychology, Rutgers University, Piscataway, N.J. Address reprint requests to Dr. Agras, Department of Psychiatry, Stanford University, 401 Quarry Rd., Stanford, CA 94305-5722. Supported in part by grants from the McKnight Foundation to Cornell University, the University of Minnesota, Rutgers University, and Stanford University.