
Am J Psychiatry 162:2395, December 2005
doi: 10.1176/appi.ajp.162.12.2395
© 2005 American Psychiatric Association
Why the Hamilton Depression Rating Scale Endures
CHING-LIN HSIEH, PH.D. Taipei, Taiwan, and
CHENG-HIS HSIEH, M.D. Taoyuan, Taiwan
To the Editor: The article by Dr. Bagby et al. presented a thorough review and argued persuasively for the rejection of the Hamilton depression scale as the gold standard for the measurement of depression. The results are particularly useful for those who might consider using the scale in a clinical trial.
However, we would like to raise a few concerns regarding the psychometric terms and the statistical indices used in the study. First, the authors used "predictive validity" to describe the ability of the Hamilton depression scale to detect change in depression after treatment. However, predictive validity usually refers to a measure's ability to predict future health status or use of health services. For example, Lahey et al. (1) examined the predictive validity of the DSM-IV diagnostic criteria for attention deficit hyperactivity disorder in predicting symptoms and associated impairment over 3 years. To describe the extent of a scale's ability to detect change, "responsiveness" is the term often used in the literature (2).
Second, Pearson's correlation coefficient (r) is not appropriate for summarizing the item-level agreement (i.e., interrater reliability and retest reliability) (3) of the Hamilton depression scale. Pearson's r examines the level of linear association, but not agreement, between two (continuous) measurements whose distributions are assumed to follow the normal curve. However, the measurement level of each item of the scale is ordinal. The weighted kappa, by contrast, examines the agreement between ordinal measurements, adjusts for chance agreement, and takes the degree of disagreement into account; it is the appropriate index to use in this instance (3). The drawbacks of using Pearson's r in examining item-level reliability should have been noted.
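The distinction between association and agreement can be illustrated with a minimal sketch. The ratings below are hypothetical (not data from the reviewed studies), the function names are our own, and linear disagreement weights are assumed as one common choice. The second rater scores every patient exactly one point higher than the first, so the two raters never agree on any item score.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def linear_weighted_kappa(x, y, n_categories):
    """Cohen's weighted kappa with linear disagreement weights.

    Scores are assumed to be integers in 0..n_categories-1.
    """
    n = len(x)
    max_dist = n_categories - 1
    # Marginal proportion of each category for each rater.
    px = [x.count(c) / n for c in range(n_categories)]
    py = [y.count(c) / n for c in range(n_categories)]
    # Observed weighted disagreement between paired ratings.
    observed = sum(abs(a - b) for a, b in zip(x, y)) / (n * max_dist)
    # Weighted disagreement expected by chance from the marginals.
    expected = sum(
        px[i] * py[j] * abs(i - j)
        for i in range(n_categories)
        for j in range(n_categories)
    ) / max_dist
    return 1 - observed / expected


# Hypothetical item scores on a 0-4 ordinal item.
rater_a = [0, 1, 2, 3, 0, 1, 2, 3]
rater_b = [score + 1 for score in rater_a]  # always one point higher

r = pearson_r(rater_a, rater_b)
kappa = linear_weighted_kappa(rater_a, rater_b, n_categories=5)
print(f"Pearson's r    = {r:.3f}")      # 1.000
print(f"weighted kappa = {kappa:.3f}")  # 0.333
```

In this constructed case Pearson's r is perfect even though the raters disagree on every patient, whereas the weighted kappa reflects the systematic one-point shift.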
Third, the purposes of the criteria used for appraising reliability (e.g., Cronbach's alpha ≥0.70 reflecting adequate reliability or Pearson's r>0.7 indicating acceptable reliability) of the Hamilton depression scale were not clearly specified. The criteria used for appraising reliability in their study are acceptable for research purposes (i.e., for group comparisons) but not for clinical application (i.e., for individual comparisons) (3). For example, if the retest reliability coefficient of a scale is 0.7 (e.g., r=0.7), only 49% of the variance is shared between the test and retest measurements (i.e., up to 51% may reflect measurement error). A higher benchmark (e.g., alpha ≥0.90) for appraising the reliability of a measure is suggested for monitoring an individual's score (2).
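The arithmetic behind these percentages is the square of the coefficient. A short sketch, using a helper name of our own choosing and following the interpretation given above, compares the research benchmark with the higher benchmark suggested for individual monitoring:

```python
def shared_variance(r):
    """Proportion of variance shared between two measurements (r squared)."""
    return r * r


for r in (0.70, 0.90):
    shared = shared_variance(r)
    print(f"r = {r:.2f}: shared = {shared:.0%}, unexplained = {1 - shared:.0%}")
# r = 0.70: shared = 49%, unexplained = 51%
# r = 0.90: shared = 81%, unexplained = 19%
```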
The concerns we raised do not affect the main conclusion of this article. However, they should be clarified for readers.
References
1. Lahey BB, Pelham WE, Loney J, Kipp H, Ehrhardt A, Lee SS, Willcutt EG, Hartung CM, Chronis A, Massetti G: Three-year predictive validity of DSM-IV attention deficit hyperactivity disorder in children diagnosed at 4–6 years of age. Am J Psychiatry 2004; 161:2014–2020
2. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res 2002; 11:193–205
3. Tooth LR, Ottenbacher KJ: The kappa statistic in rehabilitation research: an examination. Arch Phys Med Rehabil 2004; 85:1371–1376