The depressive syndrome is one of the oldest in psychiatry, having been clearly described by the physicians of antiquity (1). Many issues have long been debated in the nosology of depression, including the relationship between "melancholic" and "neurotic/reactive" depression (2–6), the value of the primary-secondary distinction (7, 8) and "familial" subtypes (7, 9), and the diagnostic interface between depression and schizophrenia (10–12). However, less attention has been paid to the boundaries of the depressive syndrome itself. Perhaps this is because the key diagnostic question in patients with depressive symptoms in clinical settings, where most depression research has been carried out, is usually, "What is the best diagnosis?" However, in epidemiologic studies, where the range of symptoms is broader and mild cases are common, the more relevant question in individuals with depressive symptoms is often, "Is this a case of major depression?"
We evaluated two mutually exclusive hypotheses about the syndrome of major depression—as articulated in DSM-IV:
1. Is major depression a discrete syndrome with "points of rarity" at its boundaries? That is, is there a discontinuity in etiologic processes so that major depression differs qualitatively and not just quantitatively from subsyndromal conditions?
2. Is major depression a diagnostic convention imposed on a continuum of depressive symptoms of varying severity and duration?
We evaluated three key features of the syndrome of major depression: 1) number of symptoms listed under criterion A for major depressive episode (DSM-IV requires that at least five be present), 2) level of severity or impairment required for rating individual symptoms as present (DSM-IV requires either significant distress or significant impairment in functioning for the entire syndrome), and 3) duration (DSM-IV requires a minimum of 2 weeks). Our strategy was to examine, in an epidemiologic sample of female twins, individuals with depressive symptoms who reported varying numbers of symptoms, duration, and levels of severity or impairment, both above and below the DSM-IV diagnostic threshold. In these individuals, we examined how varying levels of these features predicted not only future episodes of major depression but also risk for major depression in the co-twin—two of the validating criteria most widely used for psychiatric disorders (13).
As shown in F1, and previously articulated by Kendell and Brockington (12), hypothesis 1 predicts a discontinuity in the relationship among symptoms, duration, and impairment and the validating criteria. For example, this hypothesis predicts a much larger difference in the risk of major depression in co-twins between individuals having four versus five of the criterion A symptoms than between those having three versus four symptoms or five versus six. By contrast, hypothesis 2 predicts that the relationship among symptoms, duration, and impairment will be relatively smooth and continuous with no apparent "breaks" at the points articulated in DSM-IV.
The Caucasian female same-sex twins studied in this report are part of a longitudinal study of genetic and environmental risk factors for common psychiatric disorders. The twins, ascertained from the population-based Virginia Twin Registry, were eligible to participate in this study if both members of the pair had previously responded to a mailed questionnaire, to which the individual response rate was 64%. In our first series of personal interviews, we succeeded in interviewing 92% (N=2,163) of the eligible individuals. Ninety percent of the interviews were face-to-face; the rest were completed by telephone. Written informed consent was obtained before all face-to-face interviews, and personal assent was obtained for all telephone interviews. The mean age of the participating twins was 30.1 years (SD=7.6). Zygosity was determined blindly by using standard questions (14), photographs, and, when necessary, DNA testing (15).
Since the original interview, we have completed two additional series of telephone interviews, which succeeded in interviewing 2,001 (92.5%) and 1,898 (87.7%) of the originally interviewed subjects, respectively. The mean number of months between the first and third interviews was 61.3 (SD=5.1).
For these analyses, we used two subsamples: 1) individuals who completed all three personal interviews (N=1,822), in whom we attempted to predict risk for major depression at the time of either the second or third interview as a function of the characteristics of depressive syndromes reported at the first interview, and 2) members of pairs of known zygosity where both members were interviewed at least once over the first, second, and third interviews (N=2,058).
Information was collected from all respondents at each of the three interviews on the occurrence of 20 individual symptoms during the year before interview. Fourteen of these symptoms were disaggregated versions of the nine symptoms listed under criterion A for major depressive episode in DSM-III-R (p. 222). Six of the nine symptoms were each represented by a single item. Two symptoms were disaggregated into two items each: criterion A4 was divided into separate insomnia and hypersomnia items, while criterion A5 was divided into separate items assessing psychomotor agitation and retardation. Criterion A2 was disaggregated into four items: decreased appetite, increased appetite, decreased weight, and increased weight.
Symptoms were required to have a duration of at least 5 days. For every symptom reported present by the subject, the interviewer inquired as to the possibility that it was due to physical illness or medication. If in the interviewer's judgment this was the case, which occurred 17.4% of the time across all symptoms, then the symptom was considered not present.
For each symptom reported as present, we also inquired about the severity of the symptom and/or symptom-related impairment. This was assessed in several ways. Usually, we asked how much the specific symptom (e.g., feeling of depression, tiredness/fatigue) interfered with the subject's daily life. The response options were "hardly at all," "some," "a lot," and "completely." For weight gain and loss, we asked the number of pounds gained or lost, respectively. For insomnia and hypersomnia, we asked the number of hours of lost sleep or hours spent in extra sleep, respectively. For increased or decreased appetite and for psychomotor agitation, interviewers, after probing, rated the symptom as severe, moderate, or mild. We reduced ratings of severity or impairment for each symptom into three categories: mild, moderate, and severe. For symptoms where we inquired about impairment, we converted the ratings as follows: mild=hardly at all, moderate=some, and severe=a lot or completely. For weight gain and weight loss, we defined mild, moderate, and severe as an episode-related weight change of <10 lb, 10–14 lb, and ≥15 lb, respectively. For insomnia and hypersomnia, we defined mild, moderate, and severe as an episode-related change in sleep of <2 hours, 3–4 hours, and ≥5 hours, respectively. For DSM-III-R criteria that were represented by more than one item, we took the most severe symptom-related impairment reported.
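The recoding rules above can be illustrated with a minimal Python sketch. The function and variable names are hypothetical and are not taken from the study. The sketch assumes weight and sleep changes are recorded in whole units, and it treats a sleep change of exactly 2 hours as mild, a value the stated cutoffs leave unassigned.

```python
# Illustrative recoding of raw severity responses into mild/moderate/severe.
# All names here are hypothetical; only the cutoffs come from the text.

SEVERITY_ORDER = {"mild": 1, "moderate": 2, "severe": 3}

# Impairment response -> severity category, as stated in the text.
IMPAIRMENT_TO_SEVERITY = {
    "hardly at all": "mild",
    "some": "moderate",
    "a lot": "severe",
    "completely": "severe",
}


def weight_severity(pounds_changed):
    """Episode-related weight change: <10 lb, 10-14 lb, >=15 lb."""
    if pounds_changed < 10:
        return "mild"
    if pounds_changed < 15:
        return "moderate"
    return "severe"


def sleep_severity(hours_changed):
    """Episode-related change in sleep, in whole hours.

    Assumption: exactly 2 hours is treated as mild, because the
    published cutoffs (<2, 3-4, >=5) do not assign that value.
    """
    if hours_changed <= 2:
        return "mild"
    if hours_changed <= 4:
        return "moderate"
    return "severe"


def criterion_severity(item_severities):
    """Most severe rating among the items that represent one criterion."""
    return max(item_severities, key=SEVERITY_ORDER.__getitem__)
```

For example, a criterion represented by an insomnia item rated mild and a hypersomnia item rated severe would be scored as severe.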
After inquiring about the individual symptoms, the interviewer asked the twin which if any of the endorsed symptoms co-occurred in the last year. For the purposes of this study, we defined a minimal depressive syndrome as consisting of at least three co-occurring depressive symptoms, regardless of level of severity or impairment—one of which had to be depressed mood or loss of interest/pleasure—and lasting at least 5 days.
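The definition of a minimal depressive syndrome can be written as a simple predicate. This is a sketch only: the symptom labels are hypothetical, and it assumes symptoms have already been collapsed to the criterion level before counting.

```python
# Hypothetical labels for the two core symptoms named in the definition.
CORE_SYMPTOMS = {"depressed_mood", "loss_of_interest"}


def is_minimal_depressive_syndrome(co_occurring_symptoms, duration_days):
    """True if >=3 co-occurring symptoms, at least one core symptom,
    and a duration of at least 5 days. Severity is deliberately ignored."""
    symptoms = set(co_occurring_symptoms)
    return (
        len(symptoms) >= 3
        and bool(symptoms & CORE_SYMPTOMS)
        and duration_days >= 5
    )
```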
To examine the impact of the number of symptoms reported, we required a minimum duration of illness of 2 weeks and, in accord with DSM-III-R, did not require any level of severity or impairment for an individual symptom to be counted as present. To examine the impact of level of severity or impairment, we required a minimum duration of 2 weeks and at least five endorsed criterion A symptoms. We then defined three hierarchically organized groups. Subjects who had five or more of the criterion A symptoms at the severe level were classified as severe. Those who had five or more criterion A symptoms at the moderate level but were not classified in the severe group were classified as moderate. Those who met five or more criteria only at the mild level (and thus did not meet criteria for either the moderate or severe group) were classified as mild. To examine the impact of duration, we required a minimum of five criterion A symptoms but no accompanying level of severity or impairment and had no requirement for a reported duration longer than 5 days.
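The hierarchical severity grouping can be sketched as follows. The function name is hypothetical, and the sketch assumes that "at the moderate level" means at least moderate, so that a severe symptom also counts toward the moderate threshold.

```python
def severity_group(criterion_severities, duration_days):
    """Classify one episode as 'severe', 'moderate', 'mild', or None.

    criterion_severities: one rating ('mild', 'moderate', 'severe') per
    endorsed criterion A symptom. Returns None when the episode does not
    meet the entry requirements (>=2 weeks, >=5 endorsed symptoms).
    """
    if duration_days < 14 or len(criterion_severities) < 5:
        return None
    n_severe = sum(s == "severe" for s in criterion_severities)
    # Assumption: severe symptoms also satisfy the moderate threshold.
    n_moderate_or_worse = sum(
        s in ("moderate", "severe") for s in criterion_severities
    )
    if n_severe >= 5:
        return "severe"
    if n_moderate_or_worse >= 5:
        return "moderate"
    return "mild"
```

Checking the severe group first and the moderate group second is what makes the three groups mutually exclusive.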
In addition to inquiring about the history of depressive symptoms in the last year, at both the first and third interviews we inquired about the lifetime history of major depression, using a section adapted from the Structured Clinical Interview for DSM-III-R (16). We classified a twin as having a lifetime history of major depression if she reported one or more episodes meeting DSM-III-R criteria during any one of the three interviews.
Our analyses examined two potential validators of the nosologic boundaries of major depression. The first of these was risk of recurrence of DSM-III-R-defined major depression at the second and third interviews as predicted by characteristics of the depressive syndrome assessed at the first interview. These analyses were conducted by using logistic regression, operationalized by PROC LOGISTIC in SAS (17). Subgroups of twins with a depressive syndrome (e.g., those with three, four, five, or more criterion A symptoms) were compared with twins who denied a minimal depressive syndrome at the first interview.
The second validator examined was the hazard rate of lifetime major depression, defined by DSM-III-R, in the co-twin as a function of features of the first minimal depressive syndrome reported at the first, second, or third interview. This was performed by using the Cox proportional hazards method, as operationalized in the PHREG procedure in SAS (17). Each group was compared with twins who denied ever experiencing even a minimal depressive syndrome at all assessments. If a twin was not interviewed at all three interviews and reported no lifetime history of major depression, then her age at last interview was treated as her age at censoring.
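A Cox model needs a time and an event indicator for each co-twin. The sketch below shows one way such records could be built. It assumes that age is the time scale and that an age at onset is available for affected twins; neither detail is spelled out in the text, and the function name is hypothetical.

```python
def survival_record(age_at_onset, age_at_last_interview):
    """Return (age, event) for one co-twin.

    event = 1 when onset of major depression was reported, in which case
    the time is the age at onset (an assumed field). Otherwise the twin
    is censored (event = 0) at her age at last interview.
    """
    if age_at_onset is not None:
        return (age_at_onset, 1)
    return (age_at_last_interview, 0)
```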
For both the logistic and Cox analyses, we report the regression coefficient, on which tests for linearity are appropriately made, and chi-square with df=1. One-tailed p values are reported because we had a clear a priori directional hypothesis. For the logistic and Cox regression analyses, we also present, because of their ease of interpretation, the odds ratio and risk ratio, respectively.
To correct for the correlated observations in members of a twin pair, we multiplied the variance of the parameter estimate initially obtained by the following equation: ([1+r]x+y)/(x+y), where r equals the intraclass correlation in twin pairs for the dependent variable, x equals the number of complete twin pairs, and y equals the number of unpaired individuals in the analysis. This formula was algebraically derived from the formula for clustered sampling as described by Kish (18). Chi-square and p values were recalculated based on this new and larger estimate of the sampling variance.
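The correction can be followed step by step in a short sketch. The function names are hypothetical. The sketch assumes a Wald chi-square with df=1 computed from the coefficient and its variance, and a one-tailed p value taken as half the two-tailed value when the estimate lies in the predicted direction.

```python
import math


def variance_inflation(r, n_pairs, n_unpaired):
    """Multiplier ([1 + r]x + y) / (x + y) from the text.

    r: intraclass correlation in twin pairs for the dependent variable
    n_pairs: number of complete twin pairs (x)
    n_unpaired: number of unpaired individuals (y)
    """
    return ((1 + r) * n_pairs + n_unpaired) / (n_pairs + n_unpaired)


def corrected_wald_test(beta, variance, r, n_pairs, n_unpaired):
    """Return (chi_square, one_tailed_p) after inflating the variance."""
    corrected_variance = variance * variance_inflation(r, n_pairs, n_unpaired)
    chi_square = beta ** 2 / corrected_variance
    # Upper tail of a chi-square distribution with df=1.
    two_tailed_p = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, two_tailed_p / 2
```

With r=0 the multiplier is 1 and nothing changes. With a positive r the variance grows, so the chi-square shrinks and the p value rises, which is the conservative direction described above.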
We wished to test statistically whether the relationships between our predictor and validator variables were continuous (hypothesis 2 in F1) or contained a discontinuity (hypothesis 1 in F1). To do so, we compared the fit, measured in log likelihood units, of a series of logistic and Cox regression models, including only subjects with a minimal depressive syndrome. We first fit a "covariates only" model, where the sole predictor variable was year of birth. Next, we examined the improvement in fit obtained when a single linear function was added. This function, as depicted in F1 (hypothesis 2), assumes a continuous linear relationship between the predictor and validator measures and can be specified by a single parameter. Finally, we fitted a range of more complex models containing two or more parameters, which introduced discontinuities into the relationship either in the form of a second linear function or a dummy variable specifying that an individual category of the predictor variable has a unique value. The fits of nested models were compared by the difference in –2×(log likelihood), which approximately follows a chi-square distribution with degrees of freedom equal to the number of added parameters.
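The model comparison can be sketched in Python. The names are hypothetical, and the coding of the "second linear function" as an extra slope above a breakpoint is one plausible reading, not a detail confirmed by the text. Each function supplies one added predictor, so each comparison has df=1.

```python
import math


def second_slope_term(level, breakpoint):
    """Extra predictor that lets the slope change above a breakpoint
    (assumed coding of the 'second linear function')."""
    return float(max(0, level - breakpoint))


def category_dummy(level, category):
    """Extra predictor giving one category of the predictor a unique value."""
    return 1.0 if level == category else 0.0


def chi_square_p_df1(chi_square):
    """Upper-tail p value for a chi-square statistic with df=1."""
    return math.erfc(math.sqrt(chi_square / 2.0))


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Compare two nested models that differ by one parameter.

    Returns (chi_square, p_value), where chi_square is the difference
    in -2 x log likelihood between the reduced and the fuller model.
    """
    chi_square = max(0.0, -2.0 * (loglik_reduced - loglik_full))
    return chi_square, chi_square_p_df1(chi_square)
```

As a check on the arithmetic, `chi_square_p_df1(4.97)` gives about 0.026, which rounds to the p=0.03 reported for a statistic of that size with df=1.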
Number of Criterion A Symptoms
Compared with subjects who denied a minimal depressive syndrome during the previous year at the first interview, the future risk for an episode of DSM-III-R major depression was significantly higher for individuals with a depressive episode that included only three or four criterion A symptoms (T1). In general, the greater the number of criterion A symptoms present, the greater the risk for a future episode of major depression. No substantial discontinuity was apparent between four and five criterion A symptoms as hypothesized by DSM-IV. However, the risk for future depressive episodes was substantially greater in subjects with seven or more than for those with six or fewer criterion A symptoms.
In modeling the relationship between the number of criterion A symptoms in subjects with a minimal depressive syndrome and the risk for future depressive episodes, we found that, compared with a covariates-only model, a marked improvement in fit was obtained by a simple model with a single linear function (χ2=20.61, df=1, p<0.0001). No other significant improvements in fit could be found, the closest being the addition of a second linear function, which increased the slope of the regression line between six and seven criterion A symptoms (χ2=1.60, df=1, n.s.).
Compared with subjects who denied a minimal depressive syndrome during the previous year at the first, second, and third interviews, the risk for lifetime major depression in the co-twins was significantly greater for those twins who reported a depressive syndrome with only three or four criterion A symptoms (T1). In general, the risk of major depression in the co-twin increased with an increasing number of criterion A symptoms. No discontinuity was seen between four and five symptoms, as hypothesized by DSM-IV, or between six and seven symptoms, as seen with the prediction of risk for future depressive episodes.
In modeling this relationship, compared with a covariates-only model, a large improvement in fit was obtained by a simple model with a single linear function (χ2=13.94, df=1, p=0.0002). No significant improvements in fit could be found by the introduction of any discontinuities in the relationship between number of criterion A symptoms and risk for major depression in the co-twin.
Compared with subjects who did not endorse DSM-III-R criteria for major depression in the previous year, those who met criteria for major depression only when symptoms were defined as having mild severity or impairment still had a significantly greater risk for major depression in future years (T2). The magnitude of the increased future risk was actually less for those who met criteria when individual symptoms were required to have at least moderate severity or impairment (T2) and substantially greater for those who met criteria when symptoms had to have severe impairment or severity to be counted present (T2).
In modeling this relationship, compared with a covariates-only model, a significant improvement in fit was obtained by a simple model with a single linear function (χ2=4.97, df=1, p=0.03). The addition of a second, steeper linear function between moderate and severe produced a further significant improvement in fit (χ2=5.16, df=1, p=0.02). No other significant improvements in fit could be found.
The level of severity or impairment required for depressive symptoms predicted an increasing risk for major depression in the co-twin in what appeared to be a monotonic continuous fashion (T2). For subjects who met criteria for major depression only when symptoms were defined as having mild severity or impairment, their co-twins had a significantly higher risk for major depression than co-twins of subjects who did not meet DSM-III-R criteria (T2). These relative risks increased to 1.79 and 2.22, respectively, when, to be counted present, symptoms had to have moderate or severe levels of severity or impairment.
In examining a statistical model of this relationship, we found that, compared with a covariates-only model, a significant improvement in fit was obtained by a simple model with a single linear function (χ2=5.77, df=1, p=0.02). No other significant improvements in fit could be found.
Compared with those twins who reported no major depressive episode in the last year, twins with an episode lasting 5 to 13 days had a significantly higher risk for a future episode of DSM-III-R major depression (T3). The odds ratio for future episodes was slightly lower for those with depressive episodes lasting 14–29 or 30–59 days (T3). A modestly greater risk for future episodes was associated with durations of greater than 60 days.
Compared with a covariates-only model, no significant improvement in fit was found with the addition of a single linear function (χ2=0.22, df=1, n.s.), indicating no statistically significant relationship between duration and future risk for major depression. No significant improvements in fit for this model could be found.
Duration of the reported depressive episode in the index twin was an inconsistent index of the risk for major depression in the co-twin. For episodes shorter than 90 days, the risk in the co-twin increased modestly with increasing duration but decreased again with episodes of 91 or more days in length. Co-twins of subjects with a depressive episode of less than 14 days' duration had a highly significant increase in risk for major depression (T3).
In modeling this relationship, we found that, compared with a covariates-only model, no significant improvement in fit was obtained by a model with a single linear function (χ2=2.44, df=1, p=0.12). The only significant improvement to this was found with a model in which the twins with depressive episodes lasting for 60–90 days differed from all other categories (χ2=8.05, df=1, p=0.005).
Our goal was to evaluate two contrasting hypotheses about the nature of the boundaries of the major depressive syndrome, illustrated in F1. Do DSM-IV criteria for major depression "carve nature at its joints" or is major depression a manmade concept imposed on a diagnostic continuum? We tested three fundamental features of the depressive syndrome against two key validating criteria long used in psychiatric research: occurrence of future depressive episodes and risk of major depression in a co-twin. We failed to find evidence for a discontinuity at the boundaries proposed by DSM-IV. Syndromes that met fewer than five criterion A symptoms, lasted for less than 2 weeks, or were formed of symptoms that were quite mild or produced no impairment had considerable predictive and familial validity. That is, these subsyndromes consistently predicted, at high levels of significance, the risk for subsequent DSM-III-R-defined episodes of major depression and the risk for major depression in a co-twin. We found statistical evidence for only a single discontinuity in the prediction of future depressive episodes (between those meeting symptom definitions requiring mild or moderate versus severe impairment), but this discontinuity was not in line with DSM-IV criteria, nor was it replicated in the prediction of risk of illness in the co-twin.
We have shown that, taken one at a time, three major "gate keeping" DSM-IV criteria for major depression—symptom number, impairment, and duration—do not appear to carve nature at its joints (12). We did not perform a formal taxonomic analysis. Therefore, our results do not directly address the question of whether a discrete depressive syndrome exists in nature, as has been suggested by a number of statistical methods such as cluster analysis (19, 20), grade of membership analysis (21, 22), and latent class analysis (20, 23–25). However, our results do suggest that if a discrete depressive syndrome exists in nature, the current DSM-IV criteria that we evaluated do not perform well in detecting it.
Our results suggesting a continuity of risk between subsyndromal and syndromal major depression are not without precedent. Subsyndromal depressive symptoms have been associated with substantial social morbidity (26–28), a much greater risk for first-onset major depression (29), and a greater risk for mood disorders in first-degree relatives (30). In a recent analysis of the National Comorbidity Survey, Kessler et al. (31) examined three definitions of depression: minor (meeting two, three, or four criterion A symptoms), major (meeting five or six criterion A symptoms), and severe major (meeting seven or more criterion A symptoms). They found, across the three categories, monotonic increases for average number of episodes, average length of longest episodes, impairment, comorbidity, and parental history of psychiatric disorders.
Although the use of operationalized diagnostic criteria has resulted in substantial improvement in the reliability of psychiatric diagnoses, demonstrating their validity has been more problematic. These results suggest that our current DSM-IV diagnostic conventions for major depression, derived largely from the Washington University criteria (32) and Research Diagnostic Criteria (33), may be arbitrary and not reflective of a natural discontinuity in depressive symptoms as experienced in the general population.
The sample studied was entirely female. Results might differ in male subjects. The history of depressive symptoms was assessed, at different points in the interview, for the last year and for lifetime. Given problems in the recall of depressive episodes (34, 35), it is possible that errors of memory might have produced a blurring of a true diagnostic boundary.
Received Jan. 9, 1997; revision received Aug. 5, 1997; accepted Aug. 11, 1997. From the Virginia Institute for Psychiatric and Behavioral Genetics and the Department of Psychiatry and the Department of Human Genetics, Medical College of Virginia of Virginia Commonwealth University, Richmond. Address reprint requests to Dr. Kendler, P.O. Box 980126, Richmond, VA 23298-0126; email@example.com (e-mail). Supported by NIMH grants MH-40828, MH-49492, MH-01277, and MH-54150. The Virginia Twin Registry, established and maintained by W. Nance, M.D., Ph.D., and L. Corey, Ph.D., is supported by grant HD-26746 from the National Institute of Child Health and Human Development and grant NS-31564 from the National Institute of Neurological and Communicative Disorders and Stroke.
Two Hypothesized Relationships Between Levels of Diagnostic Criteria for Major Depression and Validating Variablesᵃ
ᵃ With hypothesis 1, a sharp discontinuity is observed in the validating variable (risk of recurrence of major depression or risk of major depression in a co-twin) as a function of the level of the diagnostic criteria (the number of symptoms listed for criterion A in the DSM-III-R diagnosis of major depressive episode, the level of impairment required for rating a symptom as present, and the duration of episodes). Such a discontinuity may indicate a "true" diagnostic boundary in nature. By contrast, with hypothesis 2, continuity is observed between the validating variable and the diagnostic criteria. Such a pattern of results would argue that, with respect to the validators examined, the diagnostic criteria reflect a continuum of severity and not a discrete syndrome. This figure is adapted from Kendell and Brockington (12).