Letter to the Editor

Why the Hamilton Depression Rating Scale Endures

To the Editor: The article by Dr. Bagby et al. presented a thorough review and argued persuasively for the rejection of the Hamilton depression scale as the gold standard for the measurement of depression. The results are particularly useful for those who might consider using the scale in a clinical trial.

However, we would like to raise a few concerns regarding the psychometric terms and statistical indices used in the study. First, the authors used “predictive validity” to refer to the ability of the Hamilton depression scale to detect change in depression after treatment. However, predictive validity commonly denotes a scale’s ability to predict future health status or use of health services. For example, Lahey et al. (1) examined the predictive validity of the DSM-IV diagnostic criteria for attention deficit hyperactivity disorder in predicting symptoms and associated impairment 3 years later. For a scale’s ability to detect change, the term “responsiveness” is commonly used in the literature (2).

Second, Pearson’s correlation coefficient (r) is not an appropriate statistic for summarizing item-level agreement (i.e., interrater reliability and retest reliability) (3) on the Hamilton depression scale. Pearson’s r measures the linear association, not the agreement, between two continuous measurements whose distributions are assumed to be normal, whereas each item of the scale is measured on an ordinal level. The weighted kappa, which assesses agreement between ordinal measurements while correcting for chance agreement and weighting the degree of disagreement, is the appropriate index in this instance (3). The drawbacks of using Pearson’s r to examine item-level reliability should have been noted.
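The distinction can be made concrete with a minimal sketch in Python (the ratings below are hypothetical, invented purely for illustration, and the example relies on the SciPy and scikit-learn libraries): two raters whose scores on a 0–4 ordinal item differ systematically by about one point are almost perfectly correlated, yet their chance-corrected agreement is modest.

# Hypothetical ordinal ratings (0-4) from two raters; rater B scores
# roughly one point higher than rater A on every subject.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

rater_a = [0, 1, 1, 2, 2, 3, 3, 4]
rater_b = [1, 2, 2, 3, 3, 4, 4, 4]

r, _ = pearsonr(rater_a, rater_b)                              # linear association
kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")  # chance-corrected agreement

print(f"Pearson's r:    {r:.2f}")      # ~0.97: strong linear association
print(f"weighted kappa: {kappa:.2f}")  # ~0.40: only modest actual agreement

The high r merely reflects the systematic one-point offset between the raters; the weighted kappa exposes the disagreement that r conceals.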

Third, the purposes of the criteria used to appraise the reliability of the Hamilton depression scale (e.g., Cronbach’s alpha ≥0.70 reflecting adequate reliability or Pearson’s r>0.7 indicating acceptable reliability) were not clearly specified. These criteria are acceptable for research purposes (i.e., group comparisons) but not for clinical application (i.e., individual comparisons) (3). For example, a retest reliability coefficient of 0.7 (e.g., r=0.7) means that only 49% of the variance is shared between the test and retest measurements, leaving up to 51% attributable to measurement error. A higher benchmark (e.g., alpha ≥0.90) is recommended when a measure is used to monitor an individual’s score (2).
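The arithmetic behind this benchmark can be spelled out in a brief illustrative computation (the 0.70 and 0.90 values are the criteria discussed above; the squared coefficient gives the proportion of shared variance):

# Share of variance implied by a reliability coefficient: r squared is the
# proportion of variance common to test and retest, and the remainder is
# the upper bound on measurement error.
for r in (0.70, 0.90):
    shared = r ** 2
    print(f"r = {r:.2f}: {shared:.0%} shared variance, "
          f"up to {1 - shared:.0%} measurement error")

# r = 0.70: 49% shared variance, up to 51% measurement error
# r = 0.90: 81% shared variance, up to 19% measurement error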

The concerns we raised do not affect the main conclusion of this article. However, they should be clarified for readers.

References

1. Lahey BB, Pelham WE, Loney J, Kipp H, Ehrhardt A, Lee SS, Willcutt EG, Hartung CM, Chronis A, Massetti G: Three-year predictive validity of DSM-IV attention deficit hyperactivity disorder in children diagnosed at 4–6 years of age. Am J Psychiatry 2004; 161:2014–2020

2. Scientific Advisory Committee of the Medical Outcomes Trust: Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002; 11:193–205

3. Tooth LR, Ottenbacher KJ: The kappa statistic in rehabilitation research: an examination. Arch Phys Med Rehabil 2004; 85:1371–1376