Reviews and Overviews
The Hamilton Depression Rating Scale: Has the Gold Standard Become a Lead Weight?
R. Michael Bagby, Ph.D.; Andrew G. Ryder, M.A.; Deborah R. Schuller, M.D.; Margarita B. Marshall, B.Sc.
Am J Psychiatry 2004;161:2163-2177. doi:10.1176/appi.ajp.161.12.2163

Abstract

OBJECTIVE: The Hamilton Depression Rating Scale has been the gold standard for the assessment of depression for more than 40 years. Criticism of the instrument has been increasing. The authors review studies published since the last major review of this instrument in 1979 that explicitly examine the psychometric properties of the Hamilton depression scale. The authors’ goal is to determine whether continued use of the Hamilton depression scale as a measure of treatment outcome is justified. METHOD: MEDLINE was searched for studies published since 1979 that examine psychometric properties of the Hamilton depression scale. Seventy studies were identified and selected, and then grouped into three categories on the basis of the major psychometric properties examined—reliability, item-response characteristics, and validity. RESULTS: The Hamilton depression scale’s internal reliability is adequate, but many scale items are poor contributors to the measurement of depression severity; others have poor interrater and retest reliability. For many items, the format for response options is not optimal. Content validity is poor; convergent validity and discriminant validity are adequate. The factor structure of the Hamilton depression scale is multidimensional but with poor replication across samples. CONCLUSIONS: Evidence suggests that the Hamilton depression scale is psychometrically and conceptually flawed. The breadth and severity of the problems militate against efforts to revise the current instrument. After more than 40 years, it is time to embrace a new gold standard for assessment of depression.

The Hamilton Depression Rating Scale (1) was developed in the late 1950s to assess the effectiveness of the first generation of antidepressants and was originally published in 1960. Although Hamilton (1) recognized that the scale had "room for improvement" (p. 56) and that further revision was necessary, the scale quickly became the standard measure of depression severity for clinical trials of antidepressants (2, 3). The Hamilton depression scale has retained this function and is now the most commonly used measure of depression (3). Our objective in this article is to provide a review of the Hamilton depression scale literature published since the last major evaluation of its psychometric properties, more than 20 years ago (4). More recent reviews have appeared (3, 5–7), but they have not systematically examined the literature with regard to a broad range of measurement issues. Significant developments in psychometric theory and practice have been made since the 1950s and need to be applied to instruments currently in use. We evaluate the Hamilton depression scale in light of these current standards and conclude by presenting arguments for and against retaining, revising, or rejecting the Hamilton depression scale as the gold standard for assessment of depression.

Studies for the review were identified by means of MEDLINE searches for both "depression" and "Hamilton." All studies published during the period since the last major review (January 1980 to May 2003) were considered. Studies selected for review had to be explicitly designed to evaluate empirically the psychometric properties of the instrument or to review conceptual issues related to the instrument’s development, continued use, and/or shortcomings. At least 20 published versions of the Hamilton depression scale exist, including both longer and shortened versions. This review was limited to studies that examined the original 17-item version, as the majority of the studies that evaluated the scale’s psychometrics used the 17-item version. Only a small number of studies evaluated other versions, and most of these versions contain the original 17 items. Seventy articles met the selection criteria and were categorized into three groups on the basis of the major psychometric property examined: reliability, item response, and validity. Table 1 lists the articles included in the review.

Reliability

Clinician-rated instruments should demonstrate three types of reliability: 1) internal reliability, 2) retest reliability, and 3) interrater reliability. Cronbach’s alpha statistic (78) is used to evaluate internal reliability, and estimates ≥0.70 reflect adequate reliability (79, 80). The internal reliability of individual items is calculated by using corrected item-to-total correlation with Pearson’s r; items should have a correlation greater than 0.20 (79, 80). Retest reliability assesses the extent to which multiple administrations of the scale generate the same results. When scores on an instrument are expected to change in response to effective treatment, it is necessary to demonstrate that these scores remain the same in the absence of treatment. Interrater reliability assesses the extent to which multiple raters generate the same result. Although Pearson’s r is often used to compute these estimates, the preferred method is the intraclass r (81), which allows for adjustment for agreement by chance. Estimates of retest and interrater reliability should be at a minimum of 0.70 (Pearson’s r) and 0.60 (intraclass r) (82). For retest reliability of scale items, Pearson’s r >0.70 is considered acceptable (83).
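
The two internal reliability statistics described above can be illustrated with a minimal sketch. The ratings matrix is hypothetical (it is not data from any reviewed study), and the function names `cronbach_alpha` and `corrected_item_total` are introduced here only for illustration.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a patients-by-items matrix of ratings."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def corrected_item_total(scores):
    """Pearson's r between each item and the total of the remaining items."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    result = []
    for j in range(scores.shape[1]):
        rest = total - scores[:, j]              # total score without item j
        result.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return np.array(result)

# Hypothetical ratings: 6 patients x 4 items (illustrative only)
ratings = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [2, 2, 1, 2],
    [2, 3, 2, 1],
    [3, 3, 2, 2],
    [4, 4, 3, 2],
])

alpha = cronbach_alpha(ratings)
r_it = corrected_item_total(ratings)

print("alpha meets the 0.70 criterion:", alpha >= 0.70)
print("items meeting the 0.20 criterion:", (r_it > 0.20).sum(), "of", len(r_it))
```

The item is removed from the total before correlating so that the item's own variance does not inflate the coefficient; this is what "corrected" refers to.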

Internal Reliability

Table 2 summarizes the results from studies examining internal reliability of the total Hamilton depression scale. Estimates ranged from 0.46 to 0.97, and 10 studies reported estimates ≥0.70. Table 3 summarizes the studies that examined internal reliability at the item level. The majority of Hamilton depression scale items show adequate reliability. Six items met the reliability criteria in every sample (guilt, middle insomnia, psychic anxiety, somatic anxiety, gastrointestinal, general somatic), and an additional six items met the criteria in all but one sample (depressed mood, suicide, early insomnia, late insomnia, work and interests, hypochondriasis). Loss of insight was the item with the most variable findings, suggesting a potential problem with this item.

Interrater Reliability

Total Hamilton depression scale interrater reliabilities are displayed in Table 2. Pearson’s r ranged from 0.82 to 0.98, and the intraclass r ranged from 0.46 to 0.99. Some investigators provided evidence that the skill level or expertise of the interviewer and the provision of structured queries and scoring guidelines affect reliability (19, 23, 35, 54). Across studies, the mean interrater reliability (best estimate) for studies reporting higher levels of interviewer skill and use of expert raters, structured queries, and scoring guidelines did not differ significantly from that for other studies (z=0.81, n.s.).
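
Pearson's r and the intraclass r can diverge for the same pair of raters. One well-known reason is that Pearson's r ignores a systematic offset between raters. The sketch below assumes a one-way random-effects intraclass coefficient; the reviewed studies do not all state which form they used, and the scores are hypothetical.

```python
import numpy as np

def icc_one_way(ratings):
    """One-way random-effects intraclass correlation, single rater.

    ratings: subjects-by-raters matrix. The choice of this ICC form is an
    assumption made for illustration.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    subject_means = ratings.mean(axis=1)
    grand_mean = ratings.mean()
    # Between-subjects and within-subjects mean squares
    ms_between = k * ((subject_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((ratings - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical total scores: rater B is always 4 points above rater A
rater_a = np.array([4, 8, 12, 16, 20])
rater_b = rater_a + 4

pearson = np.corrcoef(rater_a, rater_b)[0, 1]
icc = icc_one_way(np.column_stack([rater_a, rater_b]))

print(f"Pearson r = {pearson:.3f}, intraclass r = {icc:.3f}")
```

Here the two raters rank patients identically, so Pearson's r is perfect, but the intraclass coefficient is lower because the raters never assign the same score.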

At the individual item level, interrater reliability is poor for many items. Cicchetti and Prusoff (19) assessed reliability before treatment initiation and 16 weeks later at trial end. Only early insomnia was adequately reliable before treatment, and only depressed mood was adequately reliable after treatment. Thirteen items had coefficients <0.50 before treatment, and 11 items had coefficients <0.50 after treatment. Rehm and O’Hara (61) performed a similar analysis with data from two samples. Six items showed adequate reliability in the first sample (early insomnia, middle insomnia, late insomnia, somatic anxiety, gastrointestinal, loss of libido), as did 10 in the second sample (depressed mood, guilt, suicide, early insomnia, middle insomnia, late insomnia, work/interests, psychic anxiety, somatic anxiety, gastrointestinal). Loss of insight showed the lowest interrater agreement in both samples. Craig et al. (20) found that only one item, work/interests, had adequate interrater reliability. Moberg et al. (50) reported that nine items demonstrated adequate reliability when the standard Hamilton depression scale was administered (depressed mood, guilt, suicide, early insomnia, late insomnia, agitation, psychic anxiety, hypochondriasis, loss of insight), but all items showed adequate reliability when the scale was administered with interview guidelines. Potts et al. (59) demonstrated that a single omnibus coefficient can mask specific problems. Using a structured interview version of the Hamilton depression scale, they found an overall intraclass coefficient of 0.92; however, two trained psychiatrists differed at least 20% of the time in their ratings of psychic anxiety, psychomotor agitation, and psychomotor retardation, and they differed by at least two points 15% of the time in their ratings of loss of libido. The ratings of trained raters disagreed with the psychiatrists’ ratings on psychomotor agitation (50% of the time), hypochondriasis (60%), loss of libido (90%), and loss of energy (100%).

Retest Reliability

Retest reliability for the Hamilton depression scale ranged from 0.81 to 0.98 (Table 2). Retest reliability at the item level (Table 3) ranged from 0.00 to 0.85. Williams (76) argued in favor of using structured interview guides to boost item and total scale reliability and developed the Structured Interview Guide for the Hamilton Depression Rating Scale. This effort increased the mean retest reliability across individual items to 0.54, although only four items met the criteria for adequate reliability (depressed mood, early insomnia, psychic anxiety, and loss of libido).

Item Characteristics

Content and scaling

Standard psychometric practice dictates that items within an instrument should measure a single symptom and contain response options linked to increasing or decreasing amounts of that symptom. Each item is assumed to contribute equally to the total score or be backed with evidence in support of differential weighting. These criteria are not consistently met by using the current scaling procedure or the options for rating symptoms. Although improperly scaled items can cause problems in quantitative measurement, evaluation of item scaling takes place first at a qualitative level. Some Hamilton depression scale items measure single symptoms along a meaningful continuum of severity; many do not. The item assessing depressed mood includes a combination of affective, behavioral, and cognitive features, such as gloomy attitude, pessimism about the future, subjective feeling of sadness, and tendency to weep. The general somatic symptoms item, which is also symptomatically heterogeneous, includes feelings of heaviness, diffuse backache, and loss of energy. Headache is coded only as part of somatic anxiety along with such symptoms as indigestion, palpitations, and respiratory difficulties. Genital symptoms for women entail loss of libido and menstrual disturbances. The problems inherent in the heterogeneity of these rating descriptors reduce the potential meaningfulness of these items, a problem exacerbated if the different components of an item actually measure multiple constructs and thus measure different effects.

Most items on the Hamilton depression scale at least are scaled so that increasing scores represent increasing severity. It is less clear whether the anchors used for different scores on certain items actually assess the same underlying construct/syndrome. This ambiguity is most obvious for severity ratings involving psychotic features. The feelings of guilt item, for example, is graded as follows: 0=absent, 1=self-reproach, 2=ideas of guilt or rumination over past errors or sinful deeds, 3=present illness is a punishment, and 4=hears accusatory or denunciatory voices and/or experiences threatening visual hallucinations. A patient with guilt-themed hallucinations may be more severely ill than a patient who has nonpsychotic guilty feelings, but is he/she feeling more guilt? The psychotic features may instead represent a qualitatively different construct/syndrome associated with more severe illness. Similarly, the hypochondriasis item progresses through bodily self-absorption (rated 1) and preoccupation with health (rated 2) before switching to querulous attitude (rated 3) and then again to hypochondriacal delusions (rated 4). These item-scoring anchors violate basic measurement principles, because nominal scaling and ordinal scaling are combined in a single item.

Although Hamilton (1) explained the rationale for the inclusion of both 3-point and 5-point items, the argument was not made on the grounds of differential weighting. Hamilton believed that certain items would be difficult to anchor dimensionally and therefore assigned them fewer response options. The end result is that certain items contribute more to the total score than others. Contrasting psychomotor retardation and psychomotor agitation, for example, reveals that a severe manifestation of the former contributes 4 points, whereas an equally severe manifestation of the latter contributes 2 points. Similarly, someone who weeps all the time can contribute 3 or 4 points on depressed mood, whereas someone who feels tired all the time can contribute only 2 points on the general somatic symptoms item.

Item Response Analysis

A psychiatric rating scale should measure a single psychopathological construct (i.e., an illness or syndrome) and be composed of items that adequately cover a range of symptoms that are consistently associated with the syndrome. Item response theory, a method used increasingly in the evaluation and construction of psychometric instruments, permits empirical evaluation of these premises. It is important to note that this method was not available when the original Hamilton depression scale was developed, although some researchers more recently used this method to evaluate this instrument. According to item response theory, a scale and its constituent items may have good reliability estimates but still fail to meet item response theory criteria. For example, if a depression scale were composed only of items measuring mild depression, the instrument would have great difficulty distinguishing between moderate and severe cases of depression, as both would be characterized by high scores on all items. This issue is particularly pressing in studies of clinical change; not only is a wide range of severity often represented in this research, but individual patients are expected to move along this continuum as they improve. Continued use of items insensitive to change underestimates the strength of actual treatment effects and makes it necessary to have larger samples to demonstrate that an effect is statistically significant. Falsely identifying patients as not having changed represents an additional source of "noise" and weakens the "signal" of a true treatment effect. A pragmatic implication of such lack of sensitivity is that new compounds shown to be promising in the laboratory may appear spuriously ineffective in clinical trials.
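
The example of a scale composed only of mild-depression items can be made concrete with a small sketch. It assumes dichotomous items and a two-parameter logistic item response function, which is a simplification (Hamilton depression scale items have several response options); the difficulty values, the discrimination value, and the severity levels are hypothetical.

```python
import math

def p_endorse(theta, difficulty, discrimination=1.5):
    """Two-parameter logistic item response function.

    theta is the patient's severity on the latent dimension; difficulty is
    the severity at which the item is endorsed half of the time.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def expected_score(theta, difficulties):
    """Expected total score on a scale of dichotomous items."""
    return sum(p_endorse(theta, b) for b in difficulties)

# Hypothetical item difficulties on a standardized severity dimension
mild_only = [-2.0, -1.5, -1.0]   # every item targets mild depression
spread = [-1.0, 1.0, 3.0]        # items cover mild to severe depression

moderate, severe = 1.0, 3.0      # two hypothetical patients

gap_mild = expected_score(severe, mild_only) - expected_score(moderate, mild_only)
gap_spread = expected_score(severe, spread) - expected_score(moderate, spread)

print(f"score gap, mild-only scale: {gap_mild:.2f}")
print(f"score gap, spread scale:    {gap_spread:.2f}")
```

With mild items only, both patients endorse nearly every item and their expected scores are almost identical; with difficulties spread along the dimension, the same two patients are separated by roughly one point on a three-item scale.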

A related issue concerns the extent to which a severity score actually measures a single unidimensional syndrome. To summarize a syndrome with a single score requires a precise understanding of what that score represents. The implicit assumption is that the severity score represents a single dimension (84); if depression is heterogeneous, interpretation of a single summed score is unclear. If, for example, items assessing psychological and physical symptoms were only loosely related, a single score would not distinguish between two potentially different groups of depressed patients—one group whose symptoms were primarily psychological and another group with primarily vegetative symptoms. Any effects of an intervention targeting only one of these aspects would be harder to detect.

Gibbons et al. (85) presented a strategy for identifying a unidimensional set of items from a psychiatric rating scale and evaluating the extent to which these items adequately measure the full range of depression severity. Subsequently, a subset of Hamilton depression scale items that would measure a single dimension of depression across a wide range of severity was developed (30). This subset included depressed mood, which was sensitive at low levels; work/interests, psychic anxiety, and loss of libido, which were sensitive at mild levels; somatic anxiety, psychomotor agitation, and guilt, which were sensitive at moderate levels; and suicide, which was sensitive at severe levels. These items were proposed as a psychometrically stronger form of the full Hamilton depression scale.

Santor and Coyne (64, 65) used item response theory to examine the functioning of the full Hamilton depression scale and its individual items. In one of these studies (65) they examined individual Hamilton depression scale item performance in a combined sample of primary care patients and depressed patients from the National Institute of Mental Health Treatment of Depression Collaborative Research Program. One expects different item ratings at different levels of depression severity, with zeroes more common at mild levels of overall depression and higher item scores more common with more severe overall depression. Moreover, whereas most items on the Hamilton depression scale are, overall, sensitive to depression severity, 12 items had at least one problematic response option (the five items that had no such problems were depressed mood, guilt, suicide, work/interests, and psychic anxiety) (64). For example, the likelihood of receiving a rating of 1 on the insomnia items was essentially the same regardless of the overall severity of depression, but the likelihood of receiving a rating of 4 on somatic anxiety was very low even when overall depression was severe. These findings confirm that the rating scheme is not ideal for many items on the Hamilton depression scale, with the unfortunate effect of decreasing the capacity of the Hamilton depression scale to detect change (6, 7).

Rasch Analysis

Additional efforts to analyze the performance of individual Hamilton depression scale items and to identify an underlying single dimension of depression severity have benefited from a technique known as Rasch analysis, a method similar to item response theory. Rasch analysis proposes an ideal underlying dimension based on mathematical and theoretical reasoning about the construct that is being measured and then assesses the extent to which actual data correspond to this ideal. This approach was first applied to the Hamilton depression scale by Bech et al. (86), who confirmed that six items previously shown to have properties associated with unidimensionality (87) could be combined to create a shorter scale that met the formal Rasch criteria. This six-item scale was thus proposed as a better measure than the full Hamilton depression scale for assessing depression severity along a single dimension; the six-item scale is composed of items for depressed mood, guilt, work/interests, psychomotor retardation, psychic anxiety, and general somatic symptoms (87). The unidimensionality of this six-item subscale has since been confirmed in two studies that used Rasch methods (13, 14). Maier and Philipp (44) used Rasch analysis to confirm unidimensionality for a subset of Hamilton depression scale items. The resulting scale was similar to that obtained by Bech et al. (86). In another study that used Rasch analysis (46), six items were found to be problematic: suicide, psychomotor agitation, somatic anxiety, general somatic symptoms, hypochondriasis, and loss of insight.

Validity

Validity of psychiatric rating scales such as the Hamilton depression scale comprises 1) content, 2) convergent, 3) discriminant, 4) factorial, and 5) predictive validity. Content validity is assessed by examining scale items to determine correspondence with known features of a syndrome. Convergent validity is adequate when a scale shows Pearson’s r values of at least 0.50 in correlations with other measures of the same syndrome. Discriminant validity is established by showing that groups differing in their diagnostic status can be separated by using the scale. Predictive validity for symptom severity measures such as the Hamilton depression scale is determined by a statistically significant (p<0.05) capacity to predict change with treatment. Factorial validity is established by using factor analysis or related techniques (e.g., principal-component analysis) to demonstrate that a meaningful structure can be found in multiple samples. An a priori criterion of 0.40 has been used to identify which items are part of which factors (88).

Content validity

Because of its wide use and long clinical tradition, the Hamilton depression scale seems both to define and to measure depression. One could criticize DSM-IV for not adequately capturing Hamilton depression scale depression as much as one could criticize the Hamilton depression scale for not providing full coverage of DSM-IV depression. Nonetheless, the operational criteria provided in DSM-IV are used as the official nosology for much of psychiatry worldwide. The criteria for major depression have been revised three times in response to developments in field trial research and clinical consensus based on expert opinion, most recently in 1994. Researchers have developed a number of longer versions of the Hamilton depression scale that include additional symptoms such as the reverse vegetative features of atypical depression. However, the core items of the Hamilton depression scale have remained unchanged for more than 40 years. It is reasonable to ask whether this instrument captures depression as it is currently conceptualized. Several symptoms contained within the Hamilton depression scale are not official DSM diagnostic criteria, although they are recognized as features associated with depression (e.g., psychic anxiety). For other symptoms included in the Hamilton depression scale (e.g., loss of insight, hypochondriasis), the link with depression is more tenuous. More critically, important features of DSM-IV depression are often buried within more complex items and sometimes are not captured at all. The work/interests item includes anhedonic features along with listlessness, indecisiveness, social avoidance, and lowered productivity. It is impossible to determine the extent to which anhedonia per se influences severity. Guilt is captured in both Hamilton depression scale depression and DSM-IV depression, but the Hamilton depression scale contains no explicit assessment of feelings of worthlessness. Decision-making difficulties are buried within the work/interests item of the Hamilton depression scale, but concentration difficulties are not included. The reverse vegetative symptoms (weight gain, hyperphagia, and hypersomnia) were provided by Hamilton (1) as additional items but are not scored on the original Hamilton depression scale.

Convergent validity

A wide range of instruments has been used to examine the convergent validity of the Hamilton depression scale (Table 4). Most of the correlation coefficients met the preestablished criterion, and the Hamilton depression scale showed adequate convergent validity in correlations with all but two scales. One of the two exceptions was the major depression section of the Structured Clinical Interview for DSM-IV, a finding that provides evidence of noncorrespondence between the Hamilton depression scale and DSM-IV.

Discriminant validity

Two approaches have been used to evaluate the discriminant validity of the Hamilton depression scale. In the first approach, several studies used the receiver operating characteristic (ROC) curve as a statistical means of determining the cutoff scores for detecting depression and then provided corresponding rates of sensitivity, specificity, positive predictive power, and negative predictive power for the Hamilton depression scale in distinguishing depressed and nondepressed subjects. In other studies, researchers have examined the capacity of the Hamilton depression scale to distinguish different groups of clinical patients (e.g., patients with endogenous versus those with nonendogenous depression, patients with anxiety versus those with depression) using statistical techniques to detect mean group differences. Classification rates resulting from ROC curve analysis have not been widely reported in the Hamilton depression scale literature. Our search identified only seven studies (Table 5), and some of these investigations sought to detect depression in samples of patients with medical conditions other than psychiatric disorders (Table 1). Sensitivity, specificity, and negative predictive power were generally consistent and large, but positive predictive power was more variable, and two studies reported very low positive predictive power.
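
The four classification rates named above follow directly from the counts of correct and incorrect classifications at a given cutoff score. The sketch below uses hypothetical values (sensitivity and specificity of 0.90, base rates of 50% and 5%) rather than figures from the reviewed studies, and the helper names are introduced only for illustration. It shows that positive predictive power depends on the base rate of depression in the sample even when sensitivity and specificity are held constant; whether this accounts for the variability in the reviewed studies is not established here.

```python
def classification_rates(tp, fp, fn, tn):
    """Classification rates from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),   # depressed patients correctly detected
        "specificity": tn / (tn + fp),   # nondepressed subjects correctly excluded
        "ppv": tp / (tp + fp),           # positive predictive power
        "npv": tn / (tn + fn),           # negative predictive power
    }

def counts_for(prevalence, sensitivity, specificity, n=1000):
    """Expected counts in a sample of n subjects (hypothetical helper)."""
    cases = prevalence * n
    noncases = n - cases
    tp = sensitivity * cases
    fn = cases - tp
    tn = specificity * noncases
    fp = noncases - tn
    return tp, fp, fn, tn

# Same cutoff performance, two hypothetical base rates of depression
high_base = classification_rates(*counts_for(0.50, 0.90, 0.90))
low_base = classification_rates(*counts_for(0.05, 0.90, 0.90))

for label, rates in (("base rate 50%", high_base), ("base rate 5%", low_base)):
    print(label, {name: round(value, 2) for name, value in rates.items()})
```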

The second type of discriminant validity study attempts to distinguish different clinical groups. In a comparison of healthy, depressed, and bipolar depressed individuals, Rehm and O’Hara (61) found that the total Hamilton depression scale score clearly differentiated these three categories, with the depressed patients scoring higher than the healthy participants and with the bipolar depressed patients scoring higher than both of the other groups. At the item level, four items (psychomotor agitation, gastrointestinal symptoms, loss of insight, and weight loss) failed to differentiate depressed from healthy subjects. Only psychic anxiety and hypochondriasis significantly differentiated the subjects with unipolar and bipolar depression. Kobak et al. (37) showed significant total scale score differences between individuals with major depression, individuals with minor depression, and healthy comparison subjects. Zheng et al. (77) reported that the Hamilton depression scale was able to discriminate psychiatric patients classified as mildly, moderately, and severely dysfunctional on the basis of Global Severity Scale scores. Thase et al. (73) found that the Hamilton depression scale could distinguish patients with endogenous depression from patients with nonendogenous depression, with patients in the former category having higher scores. Gottlieb et al. (32) reported no significant differences between the Hamilton depression scale scores of patients classified as having low-severity versus high-severity Alzheimer’s disease. Several researchers have investigated the capacity of the Hamilton depression scale to differentiate between patients with anxiety and those with depression. Prusoff and Klerman (89) suggested the Hamilton depression scale could indeed separate these constructs, and Maier et al. (45) demonstrated that the Hamilton depression scale had a higher correlation with an external measure of depression than with an external measure of anxiety, but the saturation of the Hamilton depression scale with anxiety-related concepts was nonetheless considerable.

Predictive validity

Edwards et al. (90) performed a meta-analysis of 19 studies with a total of 1,150 patients that compared the predictive validity of the Hamilton depression scale and the Beck Depression Inventory. Treatments included pharmacotherapy, behavior therapy, cognitive restructuring, dynamic psychotherapy, and various combinations. The Hamilton depression scale was found to be more sensitive to change, compared to the Beck Depression Inventory. Lambert et al. (39) performed a meta-analysis that included 36 studies and a total of 1,850 patients and that compared the Hamilton depression scale to the Beck Depression Inventory and the Zung Self-Rating Depression Scale. They reported that the Hamilton depression scale was more sensitive to change than were the two self-report measures. Sayer et al. (66) also demonstrated that the Hamilton depression scale outperformed the Beck Depression Inventory in detecting change. Lambert et al. (40) reported that the Beck Depression Inventory is more likely to show treatment effects at 12 weeks than the Zung Self-Rating Depression Scale or the Hamilton depression scale; the Zung Self-Rating Depression Scale and the Hamilton depression scale were more likely to detect changes after 3 weeks.

One disadvantage of a multidimensional instrument such as the Hamilton depression scale in detecting change is that specific treatments may affect only a single dimension. If the total score includes somatic symptoms that actually reflect treatment side effects, estimates of treatment response will be spuriously low (44). In two studies and one meta-analysis researchers addressed this issue using the various unidimensional core depression item sets described earlier in the section on item characteristics (91, 92). The six-item subscale developed by Bech et al. (87) was found to be at least as responsive as the full Hamilton depression scale. A meta-analysis of eight fluoxetine studies with 1,658 patients showed that the different unidimensional subscales (44, 87) were more sensitive to change than was the full Hamilton depression scale score. These results were replicated in a second meta-analysis of four tricyclic antidepressant studies (25).

Factorial validity

A total of 15 studies with 17 samples reported a factor analysis of the Hamilton depression scale (Table 6). In most of the studies, researchers used the eigenvalue ≥1 rule to determine the number of factors, extracted those factors from the data using principal-component analysis, and then determined the optimal configuration of items on factors using varimax rotation. The number of factors identified ranged from two to eight. Insomnia items appeared consistently on the same factor in 13 data sets, suggesting a sleep disturbance factor. There was some support for the presence of a general depression factor, as depressed mood, guilt, and suicide appeared together on the same factor in six data sets, and the combination of depressed mood, suicide, and psychic anxiety appeared on the same factor in seven data sets. Support was also found for an anxiety/agitation factor, with the agitation, psychic anxiety, and somatic anxiety items appearing together in six samples. Clearly, the Hamilton depression scale is not unidimensional, as separate sets of items do seem to reliably represent general depression and insomnia factors; however, the exact structure of the Hamilton depression scale’s multidimensionality remains unclear.
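
The eigenvalue ≥1 rule used in most of these studies can be sketched as follows. The correlation matrix is hypothetical: six items forming two clusters of three, with correlations of 0.6 within a cluster and 0.1 between clusters. The varimax rotation step is omitted for brevity, and the function name `n_factors_eigenvalue_rule` is introduced only for illustration.

```python
import numpy as np

def n_factors_eigenvalue_rule(corr):
    """Number of principal components with eigenvalue >= 1.

    corr: item-by-item correlation matrix. Returns the count and the
    eigenvalues sorted from largest to smallest.
    """
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    return int((eigenvalues >= 1.0).sum()), eigenvalues

# Hypothetical correlation matrix: two clusters of three items each
within, between = 0.6, 0.1
corr = np.full((6, 6), between)
corr[:3, :3] = within
corr[3:, 3:] = within
np.fill_diagonal(corr, 1.0)

n_factors, eigenvalues = n_factors_eigenvalue_rule(corr)

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained:", n_factors)
```

Because the rule depends only on the sample correlation matrix, modest differences in item correlations from one sample to another can move an eigenvalue across the cutoff of 1 and change the number of factors retained.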

The Hamilton depression scale has been the standard for the assessment of depression for more than 40 years. Researchers and policy makers charged with the task of providing standards to evaluate treatment outcomes in depression are faced with three possible solutions: retain, revise, or reject. The last of these solutions argues for the development of a new instrument or the replacement of the Hamilton depression scale with existing, psychometrically superior instruments.

Many of the psychometric properties of the Hamilton depression scale are adequate and consistently meet established criteria. The internal, interrater, and retest reliability estimates for the overall Hamilton depression scale are mostly good, as are the internal reliability estimates at the item level. Similarly, established criteria are met for convergent, discriminant, and predictive validity, although predictive validity does suffer somewhat due to multidimensionality. At the item level, interrater and retest coefficients are weak for many items, and the internal reliability coefficients indicate that some items are problematic. The lack of individual item reliability is not necessarily a fatal psychometric flaw; what is critical is that the items as a whole provide adequate reliability.

Evaluation of item response shows that many of the individual items are poorly designed and sum to generate a total score whose meaning is multidimensional and unclear. The problem of multidimensionality was highlighted in the evaluation of factorial validity, which showed a failure to replicate a single unifying structure across studies. Although the unstable factor structure of the Hamilton depression scale may be partly attributable to the diagnostic diversity of population samples, well-designed scales assessing clearly defined constructs produce factor structures that are invariant across different populations (88). Finally, the Hamilton depression scale is measuring a conception of depression that is now several decades old and that is, at best, only partly related to the operationalization of depression in DSM-IV.

These findings indicate that continued use of the Hamilton depression scale requires, at the very least, a complete overhaul of its constituent items. Accumulated empirical evidence offers some hope that substantial revision can redress a number of psychometric problems, thereby providing an improved measure. Shortened versions of the Hamilton depression scale converge on a common set of core features and in general have proven more effective in detecting change. The truncated item sets for these instruments, however, are limited in that they do not permit capture of the full depressive syndrome. Other studies based on item response theory methods have indicated that modifications of the rating scheme are readily implemented and can enhance the unidimensionality of these core symptoms in a manner that allows uniform assessment of change. Identifying a core set of symptoms with proven psychometric qualities, along with making rating scheme changes that would allow consistent assessment of the severity of depression, could provide a foundation for a reconstructed scale. One advantage of such a revision is that it would maintain continuity with the long-standing use of the original Hamilton depression scale. This sort of transition is probably more palatable and therefore more readily acceptable to regulatory commissions.

The Depression Rating Scale Standardization Team produced a revision of the Hamilton depression scale, the GRID-HAMD (93, 94), by employing several of the methodological advances we have been advocating in this article. They used item response theory methods to inform, in part, the revision process; developed clear structured interview prompts and scoring guidelines; and to some extent standardized the scoring system. We nonetheless believe that by making an effort to retain the original 17 items, the Depression Rating Scale Standardization Team failed to address many of the flaws of the original instrument. Most of the items still measure multiple constructs; items that have consistently been shown to be ineffective have been retained; and the scoring system still includes differential weighting of items. Moreover, the GRID-HAMD content is virtually unchanged from the original: all the items that appeared on the Hamilton depression scale in 1960 are included in the GRID-HAMD. Thus, this revision has neither removed items based on outdated concepts nor added items that incorporate contemporary definitions of depression.

Rejection of the Hamilton depression scale and replacement with an alternative existing measure or the implementation of a new instrument has scientifically compelling advantages over revision. The Inventory of Depressive Symptomatology (95) and the Montgomery-Åsberg Depression Rating Scale (96), designed to address the limitations of the Hamilton depression scale, represent two potential replacement alternatives. Although these instruments measure contemporary definitions of depression (33), neither item response theory methods nor other contemporary measurement techniques were employed in their development. As indicated earlier, such techniques, especially item response theory, maximize the capacity of an instrument to detect change. On the other hand, the development and implementation of a new instrument that is based on current knowledge of depression and that takes advantage of psychometric and statistical advances might offer the best solution. The decision to replace the Hamilton depression scale with either an existing instrument or a newly developed instrument would ultimately rest on consensus that such an instrument could capture more adequately the full spectrum of the depression construct and on empirical evidence of the new instrument’s superiority in detecting treatment effects.
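
To make concrete why item response theory bears on sensitivity to change, the sketch below computes the Fisher information of a single item under a two-parameter logistic model. This is a deliberately simplified assumption: Hamilton items are scored in several ordered categories and would call for a polytomous model, and the function name and parameter values here are hypothetical.

```python
import math


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a two-parameter logistic item at severity theta.

    a is the item's discrimination and b its location (both hypothetical here).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


# Two items located at the same severity but differing in discrimination.
weak = item_information(theta=0.0, a=0.5, b=0.0)
strong = item_information(theta=0.0, a=2.0, b=0.0)
print(f"weak item: {weak:.4f}, strong item: {strong:.4f}")
```

At a severity equal to the item's location, information is a²/4, so the item with discrimination 2.0 carries 16 times the information of the item with discrimination 0.5. Because the standard error of a severity estimate shrinks as total information grows, poorly discriminating items add length to a scale without adding much ability to register change.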

In conclusion, we have been struck by the marked contrast between the effort and scientific sophistication involved in designing new antidepressants and the continued reliance on antiquated concepts and methods for assessing change in the severity of the depression that these very medications are intended to affect. Effort in both areas is critical to the accessibility of new medications for patients with depression. Many scales and instruments used in psychiatry today are based on, or at least include, current DSM symptoms, and the measurement of depression should follow this trend. It is time to retire the Hamilton depression scale. The field needs to move forward and embrace a new gold standard that incorporates modern psychometric methods and contemporary definitions of depression.

           

Received Dec. 7, 2003; revision received Feb. 26, 2004; accepted March 22, 2004. From the Centre for Addiction and Mental Health, University of Toronto; and the Department of Psychology, University of British Columbia, Vancouver, B.C. Address reprint requests to Dr. Bagby, Centre for Addiction and Mental Health, 250 College St., Toronto, Ont., Canada M5T 1R8; michael_bagby@camh.net (e-mail). Supported in part by Eli Lilly and Co. and by a Senior Research Fellowship from the Ontario Mental Health Foundation to Dr. Bagby. Mr. Ryder was supported by a postdoctoral fellowship from the Michael Smith Foundation for Health Research, Vancouver, B.C., Canada. The authors thank Arun Ravindrun and Sid Kennedy for their comments and Natasha Owen for assistance with the manuscript.

References

1. Hamilton M: A rating scale for depression. J Neurol Neurosurg Psychiatry 1960; 23:56–62
2. Demyttenaere K, De Fruyt J: Getting what you ask for: on the selectivity of depression rating scales. Psychother Psychosom 2003; 72:61–70
3. Williams JB: Standardizing the Hamilton Depression Rating Scale: past, present, and future. Eur Arch Psychiatry Clin Neurosci 2001; 251(suppl 2):II6-II12
4. Hedlund JL, Vieweg BW: The Hamilton Rating Scale for Depression: a comprehensive review. J Operational Psychiatry 1979; 10:149–165
5. Bech P: Rating scales for affective disorders: their validity and consistency. Acta Psychiatr Scand Suppl 1981; 295:1–101
6. Bech P: Psychometric development of the Hamilton scales: the spectrum of depression, dysthymia and anxiety, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 72–79
7. Maier W: The Hamilton Depression Scale and its alternatives: a comparison of their reliability and validity, ibid, pp 64–71
8. Aben I, Verhey F, Lousberg R, Lodder J, Honig A: Validity of the Beck Depression Inventory, Hospital Anxiety and Depression Scale, SCL-90, and Hamilton Depression Rating Scale as screening instruments for depression in stroke patients. Psychosomatics 2002; 43:386–393
9. Addington D, Addington J, Schissel B: A depression rating scale for schizophrenics. Schizophr Res 1990; 3:247–251
10. Addington D, Addington J, Atkinson M: A psychometric comparison of the Calgary Depression Scale for Schizophrenia and the Hamilton Depression Rating Scale. Schizophr Res 1996; 19:205–212
11. Akdemir A, Turkcapar MH, Orsel SD, Demirergi N, Dag I, Ozbay MH: Reliability and validity of the Turkish version of the Hamilton Depression Rating Scale. Compr Psychiatry 2001; 42:161–165
12. Baca-Garcia E, Blanco C, Saiz-Ruiz J, Rico F, Diaz-Sastre C, Cicchetti DV: Assessment of reliability in the clinical evaluation of depressive symptoms among multiple investigators in a multicenter clinical trial. Psychiatry Res 2001; 102:163–173
13. Bech P, Allerup P, Maier W, Albus M, Lavori P, Ayuso JL: The Hamilton scales and the Hopkins Symptom Checklist (SCL-90): a cross-national validity study in patients with panic disorders. Br J Psychiatry 1992; 160:206–211
14. Bech P, Tanghoj P, Andersen HF, Overo K: Citalopram dose-response revisited using an alternative psychometric approach to evaluate clinical effects of four fixed citalopram doses compared to placebo in patients with major depression. Psychopharmacology (Berl) 2002; 163:20–25
15. Berard RMF, Ahmed N: Hospital Anxiety and Depression Scale (HADS) as a screening instrument in a depressed adolescent and young adult population. Int J Adolesc Med Health 1995; 8:157–166
16. Berrios GE, Bulbena-Villarasa A: The Hamilton Depression Scale and the numerical description of the symptoms of depression, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 80–92
17. Brown C, Schulberg HC, Madonia MJ: Assessing depression in primary care practice with the Beck Depression Inventory and the Hamilton Rating Scale for Depression. Psychol Assess 1995; 7:59–65
18. Carroll BJ, Feinberg M, Smouse PE, Rawson SG, Greden JF: The Carroll Rating Scale for Depression, I: development, reliability and validation. Br J Psychiatry 1981; 138:194–200
19. Cicchetti DV, Prusoff BA: Reliability of depression and associated clinical symptoms. Arch Gen Psychiatry 1983; 40:987–990
20. Craig TJ, Richardson MA, Pass R, Bregman Z: Measurement of mood and affect in schizophrenic inpatients. Am J Psychiatry 1985; 142:1272–1277
21. Daradkeh T, Abou-Saleh M, Karim L: The factorial structure of the 17-item Hamilton Depression Rating Scale. Arab J Psychiatry 1997; 8:6–12
22. Deluty BM, Deluty RH, Carver CS: Concordance between clinicians’ and patients’ ratings of anxiety and depression as mediated by private self-consciousness. J Pers Assess 1986; 50:93–106
23. Demitrack MA, Faries D, Herrera JM, DeBrota D, Potter WZ: The problem of measurement error in multisite clinical trials. Psychopharmacol Bull 1998; 34:19–24
24. Entsuah R, Shaffer M, Zhang J: A critical examination of the sensitivity of unidimensional subscales derived from the Hamilton Depression Rating Scale to antidepressant drug effects. J Psychiatr Res 2002; 36:437–448
25. Faries D, Herrera J, Rayamajhi J, DeBrota D, Demitrack M, Potter WZ: The responsiveness of the Hamilton Depression Rating Scale. J Psychiatr Res 2000; 34:3–10
26. Feinberg M, Carroll BJ, Smouse PE, Rawson SG: The Carroll Rating Scale for Depression, III: comparison with other rating instruments. Br J Psychiatry 1981; 138:205–209
27. Fleck MP, Poirier-Littre MF, Guelfi JD, Bourdel MC, Loo H: Factorial structure of the 17-item Hamilton Depression Rating Scale. Acta Psychiatr Scand 1995; 92:168–172
28. Fuglum E, Rosenberg C, Damsbo N, Stage K, Lauritzen L, Bech P (Danish University Antidepressant Group): Screening and treating depressed patients: a comparison of two controlled citalopram trials across treatment settings: hospitalized patients vs patients treated by their family doctors. Acta Psychiatr Scand 1996; 94:18–25
29. Gastpar M, Gilsdorf U: The Hamilton Depression Rating Scale in a WHO collaborative program, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 10–19
30. Gibbons RD, Clark DC, Kupfer DJ: Exactly what does the Hamilton Depression Rating Scale measure? J Psychiatr Res 1993; 27:259–273
31. Gilley DW, Wilson RS, Fleischman DA, Harrison DW, Goetz CG, Tanner CM: Impact of Alzheimer’s-type dementia and information source on the assessment of depression. Psychol Assess 1995; 7:42–48
32. Gottlieb GL, Gur RE, Gur RC: Reliability of psychiatric scales in patients with dementia of the Alzheimer type. Am J Psychiatry 1988; 145:857–860
33. Gullion CM, Rush AJ: Toward a generalizable model of symptoms in major depressive disorder. Biol Psychiatry 1998; 44:959–972
34. Hammond MF: Rating depression severity in the elderly physically ill patient: reliability and factor structure of the Hamilton and the Montgomery-Åsberg Depression Rating Scales. Int J Geriatr Psychiatry 1998; 13:257–261
35. Hooijer C, Zitman FG, Griez E, van Tilburg W, Willemse A, Dinkgreve MA: The Hamilton Depression Rating Scale (HDRS); changes in scores as a function of training and version used. J Affect Disord 1991; 22:21–29
36. Hotopf M, Sharp D, Lewis G: What’s in a name? a comparison of four psychiatric assessments. Soc Psychiatry Psychiatr Epidemiol 1998; 33:27–31
37. Kobak KA, Greist JH, Jefferson JW, Mundt JC, Katzelnick DJ: Computerized assessment of depression and anxiety over the telephone using interactive voice response. MD Comput 1999; 16:64–68
38. Koenig HG, Pappas P, Holsinger T, Bachar JR: Assessing diagnostic approaches to depression in medically ill older adults: how reliably can mental health professionals make judgments about the cause of symptoms? J Am Geriatr Soc 1995; 43:472–478
39. Lambert MJ, Hatch DR, Kingston MD, Edwards BC: Zung, Beck, and Hamilton Rating Scales as measures of treatment outcome: a meta-analytic comparison. J Consult Clin Psychol 1986; 54:54–59
40. Lambert MJ, Masters KS, Astle D: An effect-size comparison of the Beck, Zung, and Hamilton rating scales for depression: a three-week and twelve-week analysis. Psychol Rep 1988; 63:467–470
41. Leentjens AF, Verhey FR, Lousberg R, Spitsbergen H, Wilmink FW: The validity of the Hamilton and Montgomery-Åsberg depression rating scales as screening and diagnostic tools for depression in Parkinson’s disease. Int J Geriatr Psychiatry 2000; 15:644–649
42. Leung CM, Wing YK, Kwong PK, Lo A, Shum K: Validation of the Chinese-Cantonese version of the Hospital Anxiety and Depression Scale and comparison with the Hamilton Rating Scale of Depression. Acta Psychiatr Scand 1999; 100:456–461
43. McAdams LA, Harris MJ, Bailey A, Fell R, Jeste DV: Validating specific psychopathology scales in older outpatients with schizophrenia. J Nerv Ment Dis 1996; 184:246–251
44. Maier W, Philipp M: Improving the assessment of severity of depressive states: a reduction of the Hamilton Depression Rating Scale. Pharmacopsychiatry 1985; 18:114–115
45. Maier W, Philipp M, Heuser I, Schlegel S, Buller R, Wetzel H: Improving depression severity assessment, I: reliability, internal validity and sensitivity to change of three observer depression scales. J Psychiatr Res 1988; 22:3–12
46. Maier W, Heuser I, Philipp M, Frommberger U, Demuth W: Improving depression severity assessment, II: content, concurrent and external validity of three observer depression scales. J Psychiatr Res 1988; 22:13–19
47. Marcos T, Salamero M: Factor study of the Hamilton Rating Scale for Depression and the Bech Melancholia Scale. Acta Psychiatr Scand 1990; 82:178–181
48. Meyer JS, Li YS, Thornby J: Validating mini-mental status, cognitive capacity screening and Hamilton depression scales utilizing subjects with vascular headaches. Int J Geriatr Psychiatry 2001; 16:430–435
49. Middelboe T, Ovesen L, Mortensen EL, Bech P: Depressive symptoms in cancer patients undergoing chemotherapy: a psychometric analysis. Psychother Psychosom 1994; 61:171–177
50. Moberg PJ, Lazarus LW, Mesholam RI, Bilker W, Chuy IL, Neyman I, Markvart V: Comparison of the standard and structured interview guide for the Hamilton Depression Rating Scale in depressed geriatric inpatients. Am J Geriatr Psychiatry 2001; 9:35–40
51. Mottram P, Wilson K, Copeland J: Validation of the Hamilton Depression Rating Scale and Montgomery and Åsberg Rating Scales in terms of AGECAT depression cases. Int J Geriatr Psychiatry 2000; 15:1113–1119
52. Naarding P, Leentjens AF, van Kooten F, Verhey FR: Disease-specific properties of the Rating Scale for Depression in patients with stroke, Alzheimer’s dementia, and Parkinson’s disease. J Neuropsychiatry Clin Neurosci 2002; 14:329–334
53. O’Brien KP, Glaudin V: Factorial structure and factor reliability of the Hamilton Rating Scale for Depression. Acta Psychiatr Scand 1988; 78:113–120
54. O’Hara MW, Rehm LP: Hamilton Rating Scale for Depression: reliability and validity of judgments of novice raters. J Consult Clin Psychol 1983; 51:318–319
55. Olsen LR, Jensen DV, Noerholm V, Martiny K, Bech P: The internal and external validity of the Major Depression Inventory in measuring severity of depressive states. Psychol Med 2003; 33:351–356
56. Onega LL, Abraham IL: Factor structure of the Hamilton Rating Scale for Depression in a cohort of community-dwelling elderly. Int J Geriatr Psychiatry 1997; 12:760–764
57. Pancheri P, Picardi A, Pasquini M, Gaetano P, Biondi M: Psychopathological dimensions of depression: a factor study of the 17-item Hamilton depression rating scale in unipolar depressed outpatients. J Affect Disord 2002; 68:41–47
58. Paykel ES: Use of the Hamilton Depression Scale in General Practice, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 40–47
59. Potts MK, Daniels M, Burnam MA, Wells KB: A structured interview version of the Hamilton Depression Rating Scale: evidence of reliability and versatility of administration. J Psychiatr Res 1990; 24:335–350
60. Ramos-Brieva JA, Cordero-Villafafila A: A new validation of the Hamilton Rating Scale for Depression. J Psychiatr Res 1988; 22:21–28
61. Rehm LP, O’Hara MW: Item characteristics of the Hamilton Rating Scale for Depression. J Psychiatr Res 1985; 19:31–41
62. Reynolds WM, Kobak KA: Reliability and validity of the Hamilton Depression Inventory: a paper-and-pencil version of the Hamilton Depression Rating Scale clinical interview. Psychol Assess 1995; 7:472–483
63. Riskind JH, Beck AT, Brown G, Steer RA: Taking the measure of anxiety and depression: validity of the reconstructed Hamilton scales. J Nerv Ment Dis 1987; 175:474–479
64. Santor DA, Coyne JC: Evaluating the continuity of symptomatology between depressed and nondepressed individuals. J Abnorm Psychol 2001; 110:216–225
65. Santor DA, Coyne JC: Examining symptom expression as a function of symptom severity: item performance on the Hamilton Rating Scale for Depression. Psychol Assess 2001; 13:127–139
66. Sayer NA, Sackheim HA, Moeller JR, Prudic J, Devanand DP, Coleman EA, Kiersky JE: The relations between observer-rating and self-report of depressive symptomatology. Psychol Assess 1993; 5:350–360
67. Senra Rivera C, Racano Perez C, Sanchez Cao E, Barba Sixto S: Use of three depression scales for evaluation of pretreatment severity and of improvement after treatment. Psychol Rep 2000; 87:389–394
68. Shain BN, Naylor M, Alessi N: Comparison of self-rated and clinician-rated measures of depression in adolescents. Am J Psychiatry 1990; 147:793–795
69. Smouse PE, Feinberg M, Carroll BJ, Park MH, Rawson SG: The Carroll Rating Scale for Depression, II: factor analyses of the feature profiles. Br J Psychiatry 1981; 138:201–204
70. Steinmeyer EM, Möller HJ: Facet theoretic analysis of the Hamilton-D scale. J Affect Disord 1992; 25:53–61
71. Strik JJ, Honig A, Lousberg R, Denollet J: Sensitivity and specificity of observer and self-report questionnaires in major and minor depression following myocardial infarction. Psychosomatics 2001; 42:423–428
72. Teri L, Wagner AW: Assessment of depression in patients with Alzheimer’s disease: concordance among informants. Psychol Aging 1991; 6:280–285
73. Thase ME, Hersen M, Bellack AS, Himmelhoch JM, Kupfer DJ: Validation of a Hamilton subscale for endogenomorphic depression. J Affect Disord 1983; 5:267–278
74. Thompson WM, Harris B, Lazarus J, Richards C: A comparison of the performance of rating scales used in the diagnosis of postnatal depression. Acta Psychiatr Scand 1998; 98:224–227
75. Whisman MA, Strosahl K, Fruzzetti AE, Schmaling KB, Jacobson NS, Miller DM: A structured interview version of the Hamilton Rating Scale for Depression: reliability and validity. Psychol Assess 1989; 1:238–241
76. Williams JB: A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry 1988; 45:742–747
77. Zheng YP, Zhao JP, Phillips M, Liu JB, Cai MF, Sun SQ, Huang MF: Validity and reliability of the Chinese Hamilton Depression Rating Scale. Br J Psychiatry 1988; 152:660–664
78. Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16:297–334
79. Briggs SR, Cheek JM: The role of factor analysis in the development and evaluation of personality scales. J Pers 1986; 54:106–148
80. Nunnally JC, Bernstein IH: Psychometric Theory, 3rd ed. New York, McGraw-Hill, 1994
81. Fleiss JL, Shrout PE: The effects of measurement errors on some multivariate procedures. Am J Public Health 1977; 67:1188–1191
82. Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 1977; 33:159–174
83. Anastasi A, Urbina S: Psychological Testing, 7th ed. New York, MacMillan, 1997
84. Bock RD, Gibbons RD, Murraki E: Full information item factor analysis. Applied Psychol Measurement 1988; 12:261–280
85. Gibbons RD, Clark DC, VonAmmon CS, Davis JM: Application of modern psychometric theory in psychiatric research. J Psychiatr Res 1985; 19:43–55
86. Bech P, Allerup P, Gram LF, Reisby N, Rosenberg R, Jacobsen O, Nagy A: The Hamilton depression scale: evaluation of objectivity using logistic models. Acta Psychiatr Scand 1981; 63:290–299
87. Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG: Quantitative rating of depressive states. Acta Psychiatr Scand 1975; 51:161–170
88. Gorsuch RL: Factor Analysis. Hillside, NJ, Lawrence Erlbaum Associates, 1983
89. Prusoff B, Klerman GL: Differentiating depressed from anxious neurotic outpatients. Arch Gen Psychiatry 1974; 30:302–309
90. Edwards BC, Lambert MJ, Moran PW, McCully T, Smith KC, Ellingson AG: A meta-analytic comparison of the Beck Depression Inventory and the Hamilton Rating Scale for Depression as measures of treatment outcome. Br J Clin Psychol 1984; 23(part 2):93–99
91. O’Sullivan RL, Fava M, Agustin C, Baer L, Rosenbaum JF: Sensitivity of the six-item Hamilton Depression Rating Scale. Acta Psychiatr Scand 1997; 95:379–384
92. Hooper CL, Bakish D: An examination of the sensitivity of the six-item Hamilton Rating Scale for Depression in a sample of patients suffering from major depressive disorder. J Psychiatry Neurosci 2000; 25:178–184
93. Kalai A, Ginertini M, Kobak K, Engelhardt N, Williams JBW, Evans K, Bech P, Lipsitz J, Olin J, Pearson J, Rothman M: The GRID-HAMD: a reliability study in patients with major depression, in Abstracts of the 43rd Annual New Clinical Drug Evaluation Unit (NCDEU) Meeting. Bethesda, Md, NIMH, 2003, Poster I-19
94. Kalai A, Williams JB, Koback KA, Lipsitz J, Engelhardt N, Evans K, Olin J, Pearson J, Rothman M, Bech P: The new GRID HAM-D: pilot testing and international field trials. Int J Neuropsychopharmacol 2002; 5:S147-S148
95. Rush AJ, Giles DE, Schlesser MA, Fulton CL, Weissenburger J, Burns C: The Inventory for Depressive Symptomatology (IDS): preliminary findings. Psychiatry Res 1986; 18:65–87
96. Montgomery SA, Åsberg M: A new depression scale designed to be sensitive to change. Br J Psychiatry 1979; 134:382–389
 
+

References

Hamilton M: A rating scale for depression. J Neurol Neurosurg Psychiatry  1960; 23:56–62
[PubMed]
[CrossRef]
 
Demyttenaere K, De Fruyt J: Getting what you ask for: on the selectivity of depression rating scales. Psychother Psychosom  2003; 72:61–70
[PubMed]
[CrossRef]
 
Williams JB: Standardizing the Hamilton Depression Rating Scale: past, present, and future. Eur Arch Psychiatry Clin Neurosci 2001; 251(suppl 2):II6-II12
 
Hedlund JL, Vieweg BW: The Hamilton Rating Scale for Depression: a comprehensive review. J Operational Psychiatry  1979; 10:149–165
 
Bech P: Rating scales for affective disorders: their validity and consistency. Acta Psychiatr Scand Suppl  1981; 295:1–101
[PubMed]
 
Bech P: Psychometric development of the Hamilton scales: the spectrum of depression, dysthymia and anxiety, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 72–79
 
Maier W: The Hamilton Depression Scale and its alternatives: a comparison of their reliability and validity, ibid, pp 64–71
 
Aben I, Verhey F, Lousberg R, Lodder J, Honig A: Validity of the Beck Depression Inventory, Hospital Anxiety and Depression Scale, SCL-90, and Hamilton Depression Rating Scale as screening instruments for depression in stroke patients. Psychosomatics  2002; 43:386–393
[PubMed]
[CrossRef]
 
Addington D, Addington J, Schissel B: A depression rating scale for schizophrenics. Schizophr Res  1990; 3:247–251
[PubMed]
[CrossRef]
 
Addington D, Addington J, Atkinson M: A psychometric comparison of the Calgary Depression Scale for Schizophrenia and the Hamilton Depression Rating Scale. Schizophr Res  1996; 19:205–212
[PubMed]
[CrossRef]
 
Akdemir A, Turkcapar MH, Orsel SD, Demirergi N, Dag I, Ozbay MH: Reliability and validity of the Turkish version of the Hamilton Depression Rating Scale. Compr Psychiatry  2001; 42:161–165
[PubMed]
[CrossRef]
 
Baca-Garcia E, Blanco C, Saiz-Ruiz J, Rico F, Diaz-Sastre C, Cicchetti DV: Assessment of reliability in the clinical evaluation of depressive symptoms among multiple investigators in a multicenter clinical trial. Psychiatry Res  2001; 102:163–173
[PubMed]
[CrossRef]
 
Bech P, Allerup P, Maier W, Albus M, Lavori P, Ayuso JL: The Hamilton scales and the Hopkins Symptom Checklist (SCL-90): a cross-national validity study in patients with panic disorders. Br J Psychiatry  1992; 160:206–211
[PubMed]
[CrossRef]
 
Bech P, Tanghoj P, Andersen HF, Overo K: Citalopram dose-response revisited using an alternative psychometric approach to evaluate clinical effects of four fixed citalopram doses compared to placebo in patients with major depression. Psychopharmacology (Berl)  2002; 163:20–25
[PubMed]
[CrossRef]
 
Berard RMF, Ahmed N: Hospital Anxiety and Depression Scale (HADS) as a screening instrument in a depressed adolescent and young adult population. Int J Adolesc Med Health  1995; 8:157–166
 
Berrios GE, Bulbena-Villarasa A: The Hamilton Depression Scale and the numerical description of the symptoms of depression, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 80–92
 
Brown C, Schulberg HC, Madonia MJ: Assessing depression in primary care practice with the Beck Depression Inventory and the Hamilton Rating Scale for Depression. Psychol Assess  1995; 7:59–65
[CrossRef]
 
Carroll BJ, Feinberg M, Smouse PE, Rawson SG, Greden JF: The Carroll Rating Scale for Depression, I: development, reliability and validation. Br J Psychiatry  1981; 138:194–200
[PubMed]
[CrossRef]
 
Cicchetti DV, Prusoff BA: Reliability of depression and associated clinical symptoms. Arch Gen Psychiatry  1983; 40:987–990
[PubMed]
 
Craig TJ, Richardson MA, Pass R, Bregman Z: Measurement of mood and affect in schizophrenic inpatients. Am J Psychiatry  1985; 142:1272–1277
[PubMed]
 
Daradkeh T, Abou-Saleh M, Karim L: The factorial structure of the 17-item Hamilton Depression Rating Scale. Arab J Psychiatry  1997; 8:6–12
 
Deluty BM, Deluty RH, Carver CS: Concordance between clinicians’ and patients’ ratings of anxiety and depression as mediated by private self-consciousness. J Pers Assess  1986; 50:93–106
[PubMed]
[CrossRef]
 
Demitrack MA, Faries D, Herrera JM, DeBrota D, Potter WZ: The problem of measurement error in multisite clinical trials. Psychopharmacol Bull  1998; 34:19–24
[PubMed]
 
Entsuah R, Shaffer M, Zhang J: A critical examination of the sensitivity of unidimensional subscales derived from the Hamilton Depression Rating Scale to antidepressant drug effects. J Psychiatr Res  2002; 36:437–448
[PubMed]
[CrossRef]
 
Faries D, Herrera J, Rayamajhi J, DeBrota D, Demitrack M, Potter WZ: The responsiveness of the Hamilton Depression Rating Scale. J Psychiatr Res  2000; 34:3–10
[PubMed]
[CrossRef]
 
Feinberg M, Carroll BJ, Smouse PE, Rawson SG: The Carroll Rating Scale for Depression, III: comparison with other rating instruments. Br J Psychiatry  1981; 138:205–209
[PubMed]
[CrossRef]
 
Fleck MP, Poirier-Littre MF, Guelfi JD, Bourdel MC, Loo H: Factorial structure of the 17-item Hamilton Depression Rating Scale. Acta Psychiatr Scand  1995; 92:168–172
[PubMed]
[CrossRef]
 
Fuglum E, Rosenberg C, Damsbo N, Stage K, Lauritzen L, Bech P (Danish University Antidepressant Group): Screening and treating depressed patients: a comparison of two controlled citalopram trials across treatment settings: hospitalized patients vs patients treated by their family doctors. Acta Psychiatr Scand  1996; 94:18–25
[PubMed]
 
Gastpar M, Gilsdorf U: The Hamilton Depression Rating Scale in a WHO collaborative program, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 10–19
 
Gibbons RD, Clark DC, Kupfer DJ: Exactly what does the Hamilton Depression Rating Scale measure? J Psychiatr Res  1993; 27:259–273
[PubMed]
[CrossRef]
 
Gilley DW, Wilson RS, Fleischman DA, Harrison DW, Goetz CG, Tanner CM: Impact of Alzheimer’s-type dementia and information source on the assessment of depression. Psychol Assess  1995; 7:42–48
[CrossRef]
 
Gottlieb GL, Gur RE, Gur RC: Reliability of psychiatric scales in patients with dementia of the Alzheimer type. Am J Psychiatry  1988; 145:857–860
[PubMed]
 
Gullion CM, Rush AJ: Toward a generalizable model of symptoms in major depressive disorder. Biol Psychiatry  1998; 44:959–972
[PubMed]
[CrossRef]
 
Hammond MF: Rating depression severity in the elderly physically ill patient: reliability and factor structure of the Hamilton and the Montgomery-Åsberg Depression Rating Scales. Int J Geriatr Psychiatry  1998; 13:257–261
[PubMed]
[CrossRef]
 
Hooijer C, Zitman FG, Griez E, van Tilburg W, Willemse A, Dinkgreve MA: The Hamilton Depression Rating Scale (HDRS); changes in scores as a function of training and version used. J Affect Disord  1991; 22:21–29
[PubMed]
[CrossRef]
 
Hotopf M, Sharp D, Lewis G: What’s in a name? a comparison of four psychiatric assessments. Soc Psychiatry Psychiatr Epidemiol  1998; 33:27–31
[PubMed]
 
Kobak KA, Greist JH, Jefferson JW, Mundt JC, Katzelnick DJ: Computerized assessment of depression and anxiety over the telephone using interactive voice response. MD Comput  1999; 16:64–68
[PubMed]
 
Koenig HG, Pappas P, Holsinger T, Bachar JR: Assessing diagnostic approaches to depression in medically ill older adults: how reliably can mental health professionals make judgments about the cause of symptoms? J Am Geriatr Soc  1995; 43:472–478
[PubMed]
 
Lambert MJ, Hatch DR, Kingston MD, Edwards BC: Zung, Beck, and Hamilton Rating Scales as measures of treatment outcome: a meta-analytic comparison. J Consult Clin Psychol  1986; 54:54–59
[PubMed]
[CrossRef]
 
Lambert MJ, Masters KS, Astle D: An effect-size comparison of the Beck, Zung, and Hamilton rating scales for depression: a three-week and twelve-week analysis. Psychol Rep  1988; 63:467–470
[PubMed]
[CrossRef]
 
Leentjens AF, Verhey FR, Lousberg R, Spitsbergen H, Wilmink FW: The validity of the Hamilton and Montgomery-Åsberg depression rating scales as screening and diagnostic tools for depression in Parkinson’s disease. Int J Geriatr Psychiatry  2000; 15:644–649
[PubMed]
[CrossRef]
 
Leung CM, Wing YK, Kwong PK, Lo A, Shum K: Validation of the Chinese-Cantonese version of the Hospital Anxiety and Depression Scale and comparison with the Hamilton Rating Scale of Depression. Acta Psychiatr Scand  1999; 100:456–461
 
McAdams LA, Harris MJ, Bailey A, Fell R, Jeste DV: Validating specific psychopathology scales in older outpatients with schizophrenia. J Nerv Ment Dis  1996; 184:246–251
 
Maier W, Philipp M: Improving the assessment of severity of depressive states: a reduction of the Hamilton Depression Rating Scale. Pharmacopsychiatry  1985; 18:114–115
 
Maier W, Philipp M, Heuser I, Schlegel S, Buller R, Wetzel H: Improving depression severity assessment, I: reliability, internal validity and sensitivity to change of three observer depression scales. J Psychiatr Res  1988; 22:3–12
 
Maier W, Heuser I, Philipp M, Frommberger U, Demuth W: Improving depression severity assessment, II: content, concurrent and external validity of three observer depression scales. J Psychiatr Res  1988; 22:13–19
 
Marcos T, Salamero M: Factor study of the Hamilton Rating Scale for Depression and the Bech Melancholia Scale. Acta Psychiatr Scand  1990; 82:178–181
 
Meyer JS, Li YS, Thornby J: Validating mini-mental status, cognitive capacity screening and Hamilton depression scales utilizing subjects with vascular headaches. Int J Geriatr Psychiatry  2001; 16:430–435
 
Middelboe T, Ovesen L, Mortensen EL, Bech P: Depressive symptoms in cancer patients undergoing chemotherapy: a psychometric analysis. Psychother Psychosom  1994; 61:171–177
 
Moberg PJ, Lazarus LW, Mesholam RI, Bilker W, Chuy IL, Neyman I, Markvart V: Comparison of the standard and structured interview guide for the Hamilton Depression Rating Scale in depressed geriatric inpatients. Am J Geriatr Psychiatry  2001; 9:35–40
 
Mottram P, Wilson K, Copeland J: Validation of the Hamilton Depression Rating Scale and Montgomery and Åsberg Rating Scales in terms of AGECAT depression cases. Int J Geriatr Psychiatry  2000; 15:1113–1119
 
Naarding P, Leentjens AF, van Kooten F, Verhey FR: Disease-specific properties of the Rating Scale for Depression in patients with stroke, Alzheimer’s dementia, and Parkinson’s disease. J Neuropsychiatry Clin Neurosci  2002; 14:329–334
 
O’Brien KP, Glaudin V: Factorial structure and factor reliability of the Hamilton Rating Scale for Depression. Acta Psychiatr Scand  1988; 78:113–120
 
O’Hara MW, Rehm LP: Hamilton Rating Scale for Depression: reliability and validity of judgments of novice raters. J Consult Clin Psychol  1983; 51:318–319
 
Olsen LR, Jensen DV, Noerholm V, Martiny K, Bech P: The internal and external validity of the Major Depression Inventory in measuring severity of depressive states. Psychol Med  2003; 33:351–356
 
Onega LL, Abraham IL: Factor structure of the Hamilton Rating Scale for Depression in a cohort of community-dwelling elderly. Int J Geriatr Psychiatry  1997; 12:760–764
 
Pancheri P, Picardi A, Pasquini M, Gaetano P, Biondi M: Psychopathological dimensions of depression: a factor study of the 17-item Hamilton depression rating scale in unipolar depressed outpatients. J Affect Disord  2002; 68:41–47
 
Paykel ES: Use of the Hamilton Depression Scale in General Practice, in The Hamilton Scales. Edited by Bech P, Coppen A. Berlin, Springer-Verlag, 1990, pp 40–47
 
Potts MK, Daniels M, Burnam MA, Wells KB: A structured interview version of the Hamilton Depression Rating Scale: evidence of reliability and versatility of administration. J Psychiatr Res  1990; 24:335–350
 
Ramos-Brieva JA, Cordero-Villafafila A: A new validation of the Hamilton Rating Scale for Depression. J Psychiatr Res  1988; 22:21–28
 
Rehm LP, O’Hara MW: Item characteristics of the Hamilton Rating Scale for Depression. J Psychiatr Res  1985; 19:31–41
 
Reynolds WM, Kobak KA: Reliability and validity of the Hamilton Depression Inventory: a paper-and-pencil version of the Hamilton Depression Rating Scale clinical interview. Psychol Assess  1995; 7:472–483
 
Riskind JH, Beck AT, Brown G, Steer RA: Taking the measure of anxiety and depression: validity of the reconstructed Hamilton scales. J Nerv Ment Dis  1987; 175:474–479
 
Santor DA, Coyne JC: Evaluating the continuity of symptomatology between depressed and nondepressed individuals. J Abnorm Psychol  2001; 110:216–225
 
Santor DA, Coyne JC: Examining symptom expression as a function of symptom severity: item performance on the Hamilton Rating Scale for Depression. Psychol Assess  2001; 13:127–139
 
Sayer NA, Sackeim HA, Moeller JR, Prudic J, Devanand DP, Coleman EA, Kiersky JE: The relations between observer-rating and self-report of depressive symptomatology. Psychol Assess  1993; 5:350–360
 
Senra Rivera C, Racano Perez C, Sanchez Cao E, Barba Sixto S: Use of three depression scales for evaluation of pretreatment severity and of improvement after treatment. Psychol Rep  2000; 87:389–394
 
Shain BN, Naylor M, Alessi N: Comparison of self-rated and clinician-rated measures of depression in adolescents. Am J Psychiatry  1990; 147:793–795
 
Smouse PE, Feinberg M, Carroll BJ, Park MH, Rawson SG: The Carroll Rating Scale for Depression, II: factor analyses of the feature profiles. Br J Psychiatry  1981; 138:201–204
 
Steinmeyer EM, Möller HJ: Facet theoretic analysis of the Hamilton-D scale. J Affect Disord  1992; 25:53–61
 
Strik JJ, Honig A, Lousberg R, Denollet J: Sensitivity and specificity of observer and self-report questionnaires in major and minor depression following myocardial infarction. Psychosomatics  2001; 42:423–428
 
Teri L, Wagner AW: Assessment of depression in patients with Alzheimer’s disease: concordance among informants. Psychol Aging  1991; 6:280–285
 
Thase ME, Hersen M, Bellack AS, Himmelhoch JM, Kupfer DJ: Validation of a Hamilton subscale for endogenomorphic depression. J Affect Disord  1983; 5:267–278
 
Thompson WM, Harris B, Lazarus J, Richards C: A comparison of the performance of rating scales used in the diagnosis of postnatal depression. Acta Psychiatr Scand  1998; 98:224–227
 
Whisman MA, Strosahl K, Fruzzetti AE, Schmaling KB, Jacobson NS, Miller DM: A structured interview version of the Hamilton Rating Scale for Depression: reliability and validity. Psychol Assess  1989; 1:238–241
 
Williams JB: A structured interview guide for the Hamilton Depression Rating Scale. Arch Gen Psychiatry  1988; 45:742–747
 
Zheng YP, Zhao JP, Phillips M, Liu JB, Cai MF, Sun SQ, Huang MF: Validity and reliability of the Chinese Hamilton Depression Rating Scale. Br J Psychiatry  1988; 152:660–664
 
Cronbach LJ: Coefficient alpha and the internal structure of tests. Psychometrika  1951; 16:297–334
 
Briggs SR, Cheek JM: The role of factor analysis in the development and evaluation of personality scales. J Pers  1986; 54:106–148
 
Nunnally JC, Bernstein IH: Psychometric Theory, 3rd ed. New York, McGraw-Hill, 1994
 
Fleiss JL, Shrout PE: The effects of measurement errors on some multivariate procedures. Am J Public Health  1977; 67:1188–1191
 
Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics  1977; 33:159–174
 
Anastasi A, Urbina S: Psychological Testing, 7th ed. New York, MacMillan, 1997
 
Bock RD, Gibbons RD, Muraki E: Full information item factor analysis. Applied Psychol Measurement  1988; 12:261–280
 
Gibbons RD, Clark DC, VonAmmon CS, Davis JM: Application of modern psychometric theory in psychiatric research. J Psychiatr Res  1985; 19:43–55
 
Bech P, Allerup P, Gram LF, Reisby N, Rosenberg R, Jacobsen O, Nagy A: The Hamilton depression scale: evaluation of objectivity using logistic models. Acta Psychiatr Scand  1981; 63:290–299
 
Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG: Quantitative rating of depressive states. Acta Psychiatr Scand  1975; 51:161–170
 
Gorsuch RL: Factor Analysis. Hillsdale, NJ, Lawrence Erlbaum Associates, 1983
 
Prusoff B, Klerman GL: Differentiating depressed from anxious neurotic outpatients. Arch Gen Psychiatry  1974; 30:302–309
 
Edwards BC, Lambert MJ, Moran PW, McCully T, Smith KC, Ellingson AG: A meta-analytic comparison of the Beck Depression Inventory and the Hamilton Rating Scale for Depression as measures of treatment outcome. Br J Clin Psychol 1984; 23(part 2):93–99
 
O’Sullivan RL, Fava M, Agustin C, Baer L, Rosenbaum JF: Sensitivity of the six-item Hamilton Depression Rating Scale. Acta Psychiatr Scand  1997; 95:379–384
 
Hooper CL, Bakish D: An examination of the sensitivity of the six-item Hamilton Rating Scale for Depression in a sample of patients suffering from major depressive disorder. J Psychiatry Neurosci  2000; 25:178–184
 
Kalali A, Ginertini M, Kobak K, Engelhardt N, Williams JBW, Evans K, Bech P, Lipsitz J, Olin J, Pearson J, Rothman M: The GRID-HAMD: a reliability study in patients with major depression, in Abstracts of the 43rd Annual New Clinical Drug Evaluation Unit (NCDEU) Meeting. Bethesda, Md, NIMH, 2003, Poster I-19
 
Kalali A, Williams JB, Kobak KA, Lipsitz J, Engelhardt N, Evans K, Olin J, Pearson J, Rothman M, Bech P: The new GRID HAM-D: pilot testing and international field trials. Int J Neuropsychopharmacol  2002; 5:S147–S148
 
Rush AJ, Giles DE, Schlesser MA, Fulton CL, Weissenburger J, Burns C: The Inventory for Depressive Symptomatology (IDS): preliminary findings. Psychiatry Res  1986; 18:65–87
 
Montgomery SA, Åsberg M: A new depression scale designed to be sensitive to change. Br J Psychiatry  1979; 134:382–389
 