Letters to the Editor
Standards for DSM-5 Reliability
Robert L. Spitzer, M.D.; Janet B.W. Williams, Ph.D.; Jean Endicott, Ph.D.
Am J Psychiatry 2012;169:537-537. doi:10.1176/appi.ajp.2012.12010083
Princeton, N.J.
New York City

Dr. Spitzer reports no financial relationships with commercial interests. Dr. Williams works for MedAvante, a pharmaceutical services company. Dr. Endicott has received research support from Cyberonics, the New York State Office of Mental Hygiene, and NIH and has served as a consultant or advisory board member for AstraZeneca, Bayer Schering, Berlex, Cyberonics, Eli Lilly, Forest Laboratories, GlaxoSmithKline, Otsuka, Shire, and Wyeth-Ayerst.

Accepted for publication in March 2012.

Copyright © American Psychiatric Association

To the Editor: In the January issue of the Journal, Helena Chmura Kraemer, Ph.D., and colleagues (1) ask, in anticipation of the results of the DSM-5 field trial reliability study, how much reliability is reasonable to expect. They argue that standards for interpreting kappa reliability, which have been widely accepted by psychiatric researchers, are unrealistically high. Historically, psychiatric reliability studies have adopted the Fleiss standard, in which kappas below 0.4 have been considered poor (2). Kraemer and colleagues propose that kappas from 0.2 to 0.4 be considered “acceptable.” After reviewing the results of three test-retest studies in different areas of medicine (diagnosis of anemia based on conjunctival inspection, diagnosis of pediatric skin and soft tissue infections, and bimanual pelvic examinations) in which kappas fall within ranges of 0.36–0.60, 0.39–0.43, and 0.07–0.26, respectively, Kraemer et al. conclude that “to see κI for a DSM-5 diagnosis above 0.8 would be almost miraculous; to see κI between 0.6 and 0.8 would be cause for celebration.” Therefore, they note that for psychiatric diagnoses, “a realistic goal is κI between 0.4 and 0.6, while κI between 0.2 and 0.4 would be acceptable.”
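
To make the two interpretive scales concrete, the following is a minimal sketch, not taken from either article. It computes Cohen's kappa from a 2×2 test-retest table and labels the result under each scheme. The function names are hypothetical, and so is the treatment of boundary values (for example, whether exactly 0.4 counts as "acceptable" or as a "realistic goal"), which the quoted ranges leave open. The κI reported by Kraemer et al. is an intraclass kappa, a related but distinct statistic, so Cohen's kappa stands in here only for illustration. Only the "poor" cutoff of the Fleiss standard is encoded, since that is the only band cited above.

```python
def cohen_kappa(both_yes, first_only, second_only, both_no):
    """Cohen's kappa for two binary ratings of the same patients.

    Arguments are the four cell counts of the 2x2 agreement table.
    """
    n = both_yes + first_only + second_only + both_no
    observed = (both_yes + both_no) / n        # raw agreement
    p1_yes = (both_yes + first_only) / n       # first rating: proportion "yes"
    p2_yes = (both_yes + second_only) / n      # second rating: proportion "yes"
    # Agreement expected by chance if the two ratings were independent
    expected = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    return (observed - expected) / (1 - expected)


def fleiss_label(kappa):
    """Fleiss standard as cited in the letter: below 0.4 is poor."""
    return "poor" if kappa < 0.4 else "not poor"


def kraemer_label(kappa):
    """Ranges quoted from Kraemer et al.; boundary handling is an assumption."""
    if kappa > 0.8:
        return "almost miraculous"
    if kappa >= 0.6:
        return "cause for celebration"
    if kappa >= 0.4:
        return "realistic goal"
    if kappa >= 0.2:
        return "acceptable"
    return "below the proposed acceptable range"


if __name__ == "__main__":
    # 0.39 and 0.07 are lower bounds of ranges cited in the letter;
    # 0.5 is an arbitrary mid-scale value added for contrast.
    for k in (0.07, 0.39, 0.5):
        print(k, "| Fleiss:", fleiss_label(k), "| Kraemer et al.:", kraemer_label(k))
```

Under these assumptions, a kappa of 0.39 is "poor" by the Fleiss cutoff but "acceptable" under the proposed scale, which is the gap at issue.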

When we (R.L.S., J.B.W.W.) conducted the DSM-III field trial, following the Fleiss standard, we considered kappas above 0.7 to be “good agreement as to whether or not the patient has a disorder within that diagnostic class” (3). According to the Kraemer et al. commentary, the DSM-III field trial results should be cause for celebration: the overall kappa for axis I disorders in the test-retest cohort (the one most comparable methodologically to the DSM-5 sample) was 0.66 (3). Therefore, test-retest diagnostic reliability of at least 0.6 is achievable by clinicians in a real-world practice setting, and any results below that standard are a cause for concern.

Kraemer and colleagues' central argument for these diagnostic reliability standards is to ensure that “our expectations of DSM-5 diagnoses…not be set unrealistically high, exceeding the standards that pertain to the rest of medicine.” Although the few cited test-retest studies have kappas averaging around 0.4, it is misleading to depict these as the “standards” of what is acceptable reliability in medicine. For example, the authors of the pediatric skin lesion study (4) characterized their measured test-retest reliability of 0.39–0.43 as “poor.” Calling for psychiatry to accept kappa values that are characterized as unreliable in other fields of medicine is taking a step backward. One hopes that the DSM-5 reliability results are at least as good as the DSM-III results, if not better.

References

1. Kraemer HC, Kupfer DJ, Clarke DE, Narrow WE, Regier DA: DSM-5: how reliable is reliable enough? Am J Psychiatry 2012; 169:13–15

2. Fleiss J: Statistical Methods for Rates and Proportions, 2nd ed. New York, Wiley, 1981

3. Spitzer R, Forman J, Nee J: DSM-III field trials, I: initial interrater diagnostic reliability. Am J Psychiatry 1979; 136:815–817

4. Marin JR, Bilker W, Lautenbach E, Alpern ER: Reliability of clinical examinations for pediatric skin and soft-tissue infections. Pediatrics 2010; 126:925–930