Letters to the Editor
Response to Spitzer et al. Letter
Helena Chmura Kraemer, Ph.D.; David J. Kupfer, M.D.; Diana E. Clarke, Ph.D.; William E. Narrow, M.D., M.P.H.; Darrel A. Regier, M.D., M.P.H.
Am J Psychiatry 2012;169:537-538. doi:10.1176/appi.ajp.2012.12010083r
Palo Alto, Calif.
Pittsburgh
Arlington, Va.

The authors' disclosures accompany the original commentary.

Accepted for publication in March 2012.

Copyright © American Psychiatric Association

To the Editor: Homage must be paid to the DSM-III field trials (1) that strongly influenced the design of the DSM-5 field trials. It could hardly be otherwise, since methods for evaluating categorical diagnoses were developed for DSM-III by Dr. Spitzer and his colleagues, Drs. Fleiss and Cohen. However, in the 30 years after 1979, the methodology and the understanding of kappa have advanced (2), and DSM-5 reflects that as well.

Like DSM-III, DSM-5 field trials sampled typical clinic patients. However, in the DSM-III field trials, participating clinicians were allowed to select the patients to evaluate and were trusted to report all results. In the DSM-5 field trials, symptomatic patients at each site were referred to a research associate for consent, assigned to an appropriate stratum, and randomly assigned to two participating clinicians for evaluation, with electronic data entry. In DSM-III field trials, the necessary independence of the two clinicians evaluating each patient was taken on trust. Stronger blinding protections were implemented in the DSM-5 field trials. Selection bias and lack of blindness tend to inflate kappas.

The sample sizes used in DSM-III, by current standards, were small. There appear to be only three diagnoses for which 25 or more cases were seen: any axis II personality disorder (kappa=0.54), all affective disorders (kappa=0.59), and the subcategory of major affective disorders (kappa=0.65). Four kappas of 1.00 were reported, each based on three or fewer cases; two kappas below zero were also reported based on 0–1 cases. In the absence of confidence intervals, other kappas may have been badly under- or overestimated. Since the kappas differ from one diagnosis to another, the overall kappa cited is uninterpretable (1).
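
The sampling-variability point above can be made concrete with a small simulation. The sketch below is illustrative only: the rater accuracy, prevalence, and case counts are assumed values, not data from the DSM-III or DSM-5 field trials. It shows how a kappa estimated from only three cases can land anywhere from 1.00 to well below zero even when underlying agreement is moderate, which is why the small-sample kappas of 1.00 and the negative kappas noted above carry so little information.

```python
import numpy as np

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' dichotomous (0/1) diagnoses."""
    p_obs = np.mean(r1 == r2)                          # observed agreement
    p_exp = (np.mean(r1) * np.mean(r2)
             + (1 - np.mean(r1)) * (1 - np.mean(r2)))  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else np.nan

def simulate_kappas(n_patients, n_reps=2000, prevalence=0.5, accuracy=0.85, seed=0):
    """Sampling distribution of kappa when two raters independently diagnose
    each patient correctly with probability `accuracy` (all parameter values
    are assumptions chosen for illustration)."""
    rng = np.random.default_rng(seed)
    kappas = []
    for _ in range(n_reps):
        truth = rng.random(n_patients) < prevalence
        r1 = np.where(rng.random(n_patients) < accuracy, truth, ~truth).astype(int)
        r2 = np.where(rng.random(n_patients) < accuracy, truth, ~truth).astype(int)
        k = cohen_kappa(r1, r2)
        if not np.isnan(k):          # skip replicates where kappa is undefined
            kappas.append(k)
    return np.array(kappas)

for n in (3, 25, 100):
    k = simulate_kappas(n)
    lo, hi = np.percentile(k, [2.5, 97.5])
    print(f"n={n:3d}: middle 95% of kappa estimates spans [{lo:+.2f}, {hi:+.2f}]")
```

Under these assumptions the long-run kappa is roughly 0.5, yet with n=3 individual estimates of 1.00 or below zero are routine; only with samples on the order of 100 do the estimates cluster near the true value.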

Standards should reflect not what we ideally hope to achieve but the reliabilities of diagnoses that are actually useful in practice. Recognizing the possible inflation in the DSM-III and DSM-IV results, DSM-5 did not base its standards for kappa entirely on those findings. Fleiss articulated his standards before 1979, when there was little experience using kappa. Are the experience-based standards (3) we proposed unreasonable? There seems to be major disagreement only about kappas between 0.2 and 0.4. We indicated that such kappas might be acceptable for low-prevalence disorders, where a small amount of random error can overwhelm a weak signal. In such cases, higher kappas may be achievable only when we do longitudinal follow-up rather than a single interview; when we use biological markers not yet known; when we use specialists in that particular disorder; when we deal more effectively with comorbidity; and when we accept that “one size does not fit all” and develop personalized diagnostic procedures.
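
The prevalence argument can also be illustrated with a small worked calculation under assumed values: a hypothetical disorder evaluated by two independent raters who each apply the criteria with 80% sensitivity and 95% specificity (none of these numbers come from the field trials). Holding rater accuracy fixed, the expected kappa falls as the disorder becomes rarer, because chance agreement on "no diagnosis" comes to dominate.

```python
def expected_kappa(prevalence, sensitivity=0.80, specificity=0.95):
    """Expected Cohen's kappa for two independent raters who each apply the
    same imperfect criteria; sensitivity and specificity are assumed
    illustrative values, not field-trial estimates."""
    p, se, sp = prevalence, sensitivity, specificity
    p_yes = p * se + (1 - p) * (1 - sp)              # marginal P(rater says yes)
    p_obs = (p * se**2 + (1 - p) * (1 - sp)**2       # both say yes
             + p * (1 - se)**2 + (1 - p) * sp**2)    # both say no
    p_exp = p_yes**2 + (1 - p_yes)**2                # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

for prev in (0.50, 0.20, 0.05, 0.01):
    print(f"prevalence {prev:.2f}: expected kappa = {expected_kappa(prev):.2f}")
```

With these assumed accuracies the expected kappa drops from roughly 0.6 at 50% prevalence to roughly 0.3 at 5% prevalence and near 0.1 at 1%, even though the raters' diagnostic performance is unchanged; this is the sense in which kappas between 0.2 and 0.4 can be consistent with clinically useful criteria for rare disorders.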

Greater validity may be achievable only with a small decrease in reliability. The goal of DSM-5 is to maintain acceptable reliability while increasing validity based on the accumulated research and clinical experience since DSM-IV. The goal of the DSM-5 field trials is to present accurate and precise estimates of the reliability of DSM-5 diagnoses when they are used for real patients in real clinics by real clinicians trained in DSM-5 criteria.

References

1. Spitzer RL, Forman JBW, Nee J: DSM-III field trials, I: initial interrater diagnostic reliability. Am J Psychiatry 1979; 136:815–817
2. Kraemer HC: Evaluating Medical Tests: Objective and Quantitative Guidelines. Newbury Park, Calif, Sage Publications, 1992
3. Kraemer HC, Kupfer DJ, Clarke DE, Narrow WE, Regier DA: DSM-5: how reliable is reliable enough? Am J Psychiatry 2012; 169:13–15