To the Editor: The article by Dr. Bagby et al. presented a thorough review and argued persuasively for rejecting the Hamilton depression scale as the gold standard for the measurement of depression. Their findings are particularly useful for those considering the scale for use in a clinical trial.
However, we would like to raise a few concerns about the psychometric terms and statistical indices used in the study. First, the authors used "predictive validity" to describe the ability of the Hamilton depression scale to detect change in depression after treatment. In the literature, however, predictive validity commonly refers to a measure's ability to predict future health status or future use of health services. For example, Lahey et al. (1) examined the predictive validity of the DSM-IV diagnostic criteria for attention deficit hyperactivity disorder in predicting symptoms and associated impairment 3 years later. A scale's ability to detect change is instead usually termed "responsiveness" (2).
Second, Pearson’s correlation coefficient (r) is not an appropriate index for summarizing item-level agreement (i.e., interrater reliability and retest reliability) (3) on the Hamilton depression scale. Pearson’s r measures the linear association, but not the agreement, between two continuous measurements whose distributions are assumed to be normal, whereas each item of the scale is measured at the ordinal level. The weighted kappa, by contrast, assesses agreement between ordinal measurements while correcting for chance agreement and weighting the degree of disagreement, and is therefore the appropriate index in this instance (3). The drawbacks of using Pearson’s r to examine item-level reliability should have been noted.
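The distinction between association and agreement can be made concrete with a small numerical sketch (the raters, scores, and helper functions below are hypothetical illustrations, not data or methods from the article): if one rater scores every patient exactly one category higher than another on an ordinal item, Pearson’s r is a perfect 1.0 even though the raters never agree, while the quadratic weighted kappa stays clearly below 1.

```python
# Hypothetical illustration: rater B scores every patient exactly one
# category higher than rater A on a 0-5 ordinal item, so linear association
# is perfect even though the two raters never agree exactly.

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def quadratic_weighted_kappa(x, y, k):
    """Weighted kappa over k ordinal categories with quadratic weights."""
    n = len(x)
    obs = [[0.0] * k for _ in range(k)]            # observed proportion matrix
    for a, b in zip(x, y):
        obs[a][b] += 1 / n
    px = [list(x).count(c) / n for c in range(k)]  # marginal proportions, rater A
    py = [list(y).count(c) / n for c in range(k)]  # marginal proportions, rater B
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2        # quadratic disagreement weight
            num += w * obs[i][j]                   # weighted observed disagreement
            den += w * px[i] * py[j]               # weighted chance disagreement
    return 1 - num / den

rater_a = [0, 1, 2, 3, 4]
rater_b = [1, 2, 3, 4, 5]   # systematically one category higher
print(round(pearson_r(rater_a, rater_b), 3))                    # perfect association
print(round(quadratic_weighted_kappa(rater_a, rater_b, 6), 3))  # imperfect agreement
```

The weighted kappa penalizes the systematic one-category offset that Pearson’s r is blind to, which is exactly why it is preferred for item-level agreement.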
Third, the purposes of the criteria used for appraising the reliability of the Hamilton depression scale (e.g., Cronbach’s alpha ≥0.70 reflecting adequate reliability, or Pearson’s r>0.7 indicating acceptable reliability) were not clearly specified. These criteria are acceptable for research purposes (i.e., for group comparisons) but not for clinical application (i.e., for individual comparisons) (3). For example, a retest reliability coefficient of 0.7 (e.g., r=0.7) means that only 49% of the variance is shared between test and retest, leaving up to 51% to measurement error and other sources. A higher benchmark (e.g., alpha ≥0.90) is therefore suggested when a measure is used to monitor an individual’s score (2).
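The variance arithmetic behind these benchmarks is simply the squared coefficient (a sketch with the two benchmark values named above; the variable names are ours):

```python
# Squared reliability coefficient = proportion of variance shared between
# test and retest; the remainder is measurement error and other sources.
r_group = 0.70   # benchmark often accepted for group-level research
r_indiv = 0.90   # stricter benchmark suggested for individual monitoring

for r in (r_group, r_indiv):
    shared = r ** 2
    print(f"r = {r}: {shared:.0%} shared variance, {1 - shared:.0%} unexplained")
```

Squaring makes clear why 0.70 is too lax for clinical decisions about a single patient: nearly half the retest variance is unaccounted for, whereas 0.90 leaves under a fifth.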
The concerns we raised do not affect the main conclusion of this article. However, they should be clarified for readers.