To the Editor: We agree in part with Dr. Janca that our very high levels of interrater reliability regarding the DSM-IV axis V clinician rating scales may have been influenced by extensive training, high motivation on the part of the clinicians, and the clinicians’ working within a larger research protocol. Also, it is fairly common that interrater reliability for a variety of clinical conditions or constructs is higher between raters at the same site than for raters across sites (1, 2). However, our results are quite similar to those from a number of other studies involving the Global Assessment of Functioning Scale and its predecessor, the Global Assessment Scale (3–10). This prior research has demonstrated the interrater reliability of the Global Assessment of Functioning Scale to be in the “good” or “excellent” range (ICC/κ=0.60–0.74 and ICC/κ=0.75 or more, respectively [11]). In addition, the WHO Short Disability Assessment Schedule, which possesses subcomponents similar to those in the Global Assessment of Relational Functioning Scale and the Social and Occupational Functioning Assessment Scale, has been shown to possess “good” interrater reliability (ICC=0.62) in at least one multisite field trial (12).
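For readers who wish to see how a coefficient is placed in the ranges cited above, the following is a minimal sketch and not the procedure used in our study. It assumes a one-way random-effects intraclass correlation for single ratings (one of several ICC forms; the letter does not specify which was used), and the function names and sample ratings are hypothetical. Only the “good” and “excellent” ranges quoted above are encoded; anything lower is simply labeled as below that range.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    `ratings` is a list of rows, one row per patient, each row holding
    the scores given by every rater (assumed: same number of raters per row).
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters per patient
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def reliability_band(coef):
    """Label a coefficient using the two ranges quoted in the text."""
    if coef >= 0.75:
        return "excellent"
    if coef >= 0.60:
        return "good"
    return "below good"


# Hypothetical scores from two raters for five patients (illustrative only).
example = [[55, 57], [70, 68], [41, 45], [62, 60], [80, 81]]
coef = icc_oneway(example)
print(round(coef, 2), reliability_band(coef))
```

A different ICC form (for example, a two-way model) or kappa for categorical ratings would give different values for the same data, so the choice of coefficient should be stated whenever such ranges are reported.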

Furthermore, we disagree with Dr. Janca’s conclusion that our findings may not represent a true psychometric evaluation of these scales. We base this disagreement on three potentially related issues for further research in the assessment of multiaxial psychiatric functioning. Our discussion of these issues is particularly relevant to the rating of patient-clinician interactions and interview narratives in psychology and psychiatry.

First, the high level of agreement between the two raters in our study of the DSM-IV axis V scales suggests that these measures may be used to reliably rate the general severity of psychopathology and relational, social, and occupational functioning. The specific rating criteria developed for the DSM-IV axis V scales appear sufficiently clear to produce high levels of interrater reliability. The extensive supervised training of raters in the use of these scales likely contributed to the high level of agreement between raters. The low interrater reliability coefficients for the DSM-IV axis V scales found in other studies should not be assumed to reflect poor coding criteria or scale definition; they may instead be due to inadequate rater training.

While time constraints may prohibit such extensive training, it provides an optimal level of familiarity with the DSM-IV axis V scales and helps raters make subtle distinctions between scores before rating the patients included in the data analyses. The excellent interrater reliability coefficients achieved in our study suggest that the general severity of psychopathology and relational, social, and occupational functioning can be reliably coded, and they underscore the importance of training judges before coding begins.

Second, we encourage future investigators to examine the differential impact of the time or length of the interview in relation to reliability. The length of interviews used in most reliability field trials usually ranges from approximately 45 minutes up to 2 hours. The ratings from our original study were based on two sessions, each lasting approximately 3 hours. The higher levels of interrater reliability that were found in our work may be related to the clinician’s spending this additional time interacting with the patient. The implications of interview length for reliability have rarely been discussed in the psychiatric literature, and given the current impingement of third-party payers and the reduced support for more thorough evaluations (13, 14), this seems an especially important issue. If clinicians are unduly limited in the time spent on an assessment, then lower reliability, misdiagnosis, and potential problems for treatment may result.

In addition to including extra time spent by the clinicians, both in training and in interacting with their patients, our study also focused parts of the interview on key relational episodes from patients’ lives. This focus on patient narratives during the interview (15), as well as the organization of the interview and feedback session from a therapeutic assessment model (16), may have contributed to the higher reliability of the interview or videotape raters. Rather than focusing simply on the description of psychiatric symptoms or on a structured interview (e.g., the Structured Clinical Interview for DSM), the patients were encouraged to describe and explore relational interactions (thoughts, feelings, and fantasies) associated with the appearance of symptoms. In this manner, the clinicians attempted to enlist the patients to help them clarify and understand the impact of these experiences, both past and present, on their functioning. This relationally based exploration was focused on helping clinicians gain a better understanding of the personal meaning of life experiences related to psychiatric symptoms as well as explore prior successful and unsuccessful ways of coping with problems or symptoms.

The amount of prerequisite training on any scale applied to interview data (or any patient-clinician interaction) will invariably affect the subsequent reliability of that scale or measure. It is also possible that additional time spent and/or the relational focus of an interview can aid clinicians in making more reliable assessments of the general severity of psychopathology and relational, social, and occupational functioning. Perhaps when examining a patient’s general severity of psychopathology and relational, social, and occupational functioning, clinicians should be aided by first training to meet an acceptable criterion for accuracy on a given scale, spending additional time with the patient, and then examining psychiatric symptoms and relational, social, and occupational functioning within an interpersonal and narrative context. In contrast, when adequate prerequisite training, involved patient-clinician interaction, and exploration of functioning within a relational context are not present, the true psychometric properties of any clinician rating scale may be underestimated.

References

1. Keller MB, Klein DN, Hirschfeld RM, Kocsis JH, McCullough JP, Miller I, First MB, Holzer CP III, Keitner GI, Marin DB, Shea T: Results of the DSM-IV Mood Disorders Field Trial. Am J Psychiatry 1995; 152:843-849

2. Perry JC, Hoglend P, Shear K, Vaillant GE, Horowitz M, Kardos ME, Bille H, Kagan D: Field trial of a diagnostic axis for defense mechanisms for DSM-IV. J Personal Disord 1998; 12:56-68

3. Endicott J, Spitzer RL, Fleiss JL, Cohen J: The Global Assessment Scale: a procedure for measuring overall severity of psychiatric disturbance. Arch Gen Psychiatry 1976; 33:766-771

4. Spitzer RL, Forman JB: DSM-III field trials, II: initial experience with the multiaxial system. Am J Psychiatry 1979; 136:818-820

5. Strakowski SM, Keck PE Jr, McElroy SL, West SA, Sax KW, Hawkins JM, Kmetz GF, Upadhyaya VH, Tugrul KC, Bourne ML: Twelve-month outcome after a first hospitalization for affective psychosis. Arch Gen Psychiatry 1998; 55:49-55

6. Hollon SD, DeRubeis RJ, Evans MD, Wiemer MJ, Garvey MJ, Grove WM, Tuason VB: Cognitive therapy and pharmacotherapy for depression: singly and in combination. Arch Gen Psychiatry 1992; 49:774-781

7. Hoglend P: Transference interpretations and long-term change after dynamic psychotherapy of brief to moderate length. Am J Psychother 1993; 47:494-507

8. Hooley JM, Hoffman PD: Expressed emotion and clinical outcome in borderline personality disorder. Am J Psychiatry 1999; 156:1557-1562

9. Durbin CE, Klein DN, Schwartz JE: Predicting the 2½-year outcome of dysthymic disorder: the roles of childhood adversity and family history of psychopathology. J Consult Clin Psychol 2000; 68:57-63

10. Williams JBW, Gibbon M, First MB, Spitzer RL, Davies M, Borus J, Howes MJ, Kane J, Pope HG, Rounsaville B, Wittchen H-U: The Structured Clinical Interview for DSM-III-R (SCID), II: multisite test-retest reliability. Arch Gen Psychiatry 1992; 49:630-636

11. Fleiss J: Statistical Methods for Rates and Proportions, 2nd ed. New York, Wiley, 1981

12. Michels R, Siebel U, Freyberger HJ, Stieglitz RD, Schaub RT, Dilling H: The multiaxial system of ICD-10: evaluation of a preliminary draft in a multicentric field trial. Psychopathology 1996; 29:347-356

13. Eisman E, Dies R, Finn SE, Eyde L, Kay GG, Kubiszyn T, Meyer GJ, Moreland K: Problems and limitations in the use of psychological assessment in contemporary healthcare delivery. Professional Psychol: Res Practice 2000; 31:131-140

14. Piotrowski C: Assessment practices in the era of managed care: current status and future directions. J Clin Psychol 1999; 55:787-796

15. Westen D: Divergences between clinical and research methods for assessing personality disorders: implications for research and the evolution of axis II. Am J Psychiatry 1997; 154:895-903

16. Finn SE, Tonsager M: Information-gathering and therapeutic models of assessment: complementary paradigms. Psychol Assess 1997; 19:374-385