Published Online: https://doi.org/10.1176/ajp.161.10.1909

Abstract

OBJECTIVE: This study determined whether actors could portray depressed patients well enough to be used in establishing the interrater reliability of raters on the Hamilton Depression Rating Scale (HDRS). METHOD: Actors portrayed depressed patients using scripts derived from HDRS assessments obtained at three points during treatment. Four experienced raters blindly viewed videotapes of two patients and two actors. They guessed whether each interviewee was a patient or an actor and rated the certainty of their guesses. For each interview, they also rated the realism of the portrayal and completed the HDRS. RESULTS: The experienced raters could not distinguish actors from patients better than chance and were equally certain of their correct and incorrect guesses. Actors and patients received high scores for the realism of their portrayals. The HDRS scores of the actor-patient pairs were highly correlated. CONCLUSIONS: Actors can effectively portray depressed patients. Future studies will determine whether actors can accurately portray patients with anxiety and psychosis.

In any controlled clinical trial, the validity of the study conclusions is limited by the reliability of the outcomes measured (1). In studies that use more than one rater, it is critical that all raters apply the rating instruments consistently and reliably throughout the study. For continuous measures such as clinical rating scales, interrater reliability can be quantified by calculating intraclass correlation coefficients (2). Despite the importance of statistically establishing rater reliability, a review of the recent literature suggests that this issue is most often ignored (3).
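As a concrete illustration, the following minimal Python sketch computes a single-rater, two-way random-effects intraclass correlation coefficient, ICC(2,1), for a matrix of scale scores. The specific ICC form, the variable names, and the scores are assumptions made for illustration and are not taken from the study or from reference 2.

    import numpy as np

    def icc_2_1(ratings):
        # Two-way random-effects, absolute-agreement, single-rater ICC(2,1)
        # for an n_targets x n_raters matrix of scale scores.
        x = np.asarray(ratings, dtype=float)
        n, k = x.shape
        grand = x.mean()
        ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between interviews
        ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between raters
        ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
        ms_rows = ss_rows / (n - 1)
        ms_cols = ss_cols / (k - 1)
        ms_err = ss_err / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                     + k * (ms_cols - ms_err) / n)

    # hypothetical HDRS totals: six interviews, each scored by four raters
    scores = [[25, 24, 26, 25],
              [14, 15, 13, 14],
              [ 6,  7,  5,  6],
              [23, 22, 24, 23],
              [16, 15, 17, 16],
              [ 8,  9,  7,  8]]
    print(round(icc_2_1(scores), 2))

In practice, a statistics package's ICC routine would serve the same purpose; the hand-rolled version is shown only to make the computation explicit.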

Establishing a pool of reliable raters poses logistical obstacles in multicenter trials, which may involve as many as 40–50 raters at multiple sites. The initial investigators’ meetings typically do not test reliability among the raters present. To establish reliability, videotapes of actual patients can be mailed to each rater, who then returns both the videotapes and the rating scores to a central coordinator. This process is cumbersome, labor-intensive, and often expensive. Furthermore, when a patient decides that he or she no longer wishes to have the videotaped interviews used for training purposes, it may be difficult to verify that all copies have been returned and that privacy has been protected.

To address this problem, we have developed a web-based system to train raters in the use of psychiatric scales and to test for interrater reliability within a defined group of raters. The advantage of this system over traditional videotapes is that the video images can be transmitted anywhere electronically without the unwieldy process of sending and receiving videotapes. Furthermore, through interactive technology, the rating scores can be saved online in a centralized database, and interrater reliability can be calculated in real time. This process also minimizes the coding and data-entry errors associated with paper-and-pencil forms.
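The workflow can be pictured as a single table of submitted scores that is pivoted into an interviews-by-raters matrix whenever reliability needs to be recomputed. The sketch below illustrates that idea only; the table layout, identifiers, and values are hypothetical and do not describe the actual system.

    import sqlite3

    # Hypothetical centralized store of submitted HDRS totals.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hdrs (rater TEXT, interview TEXT, total INTEGER)")

    # hypothetical submissions from four raters for two videotaped interviews
    rows = [("r1", "i1", 24), ("r2", "i1", 25), ("r3", "i1", 23), ("r4", "i1", 24),
            ("r1", "i2", 7),  ("r2", "i2", 6),  ("r3", "i2", 8),  ("r4", "i2", 7)]
    conn.executemany("INSERT INTO hdrs VALUES (?, ?, ?)", rows)

    # pivot into an interviews-by-raters matrix; this matrix would then be
    # passed to an ICC routine such as the one sketched earlier
    raters = [r[0] for r in conn.execute("SELECT DISTINCT rater FROM hdrs ORDER BY rater")]
    interviews = [r[0] for r in conn.execute("SELECT DISTINCT interview FROM hdrs ORDER BY interview")]
    matrix = [[conn.execute("SELECT total FROM hdrs WHERE rater = ? AND interview = ?",
                            (rater, iv)).fetchone()[0] for rater in raters]
              for iv in interviews]
    print(matrix)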

A potential disadvantage of this proposed system is the cost of developing interactive videotapes with actual patients who at any time could withdraw permission to use their interviews. Therefore, we conducted a study to assess the validity of using videotaped interviews of actors portraying depressed patients. Actors simulating patients have been accepted for a variety of training purposes in medical education (4–9). However, this practice has not been widely used in psychiatry. Therefore, the actors’ ability to convey both the verbal and subtle nonverbal cues of a person with a psychiatric illness in the course of an interview needs to be demonstrated. This initial study compares the Hamilton Depression Rating Scale (HDRS) scores of experienced raters blindly rating videotapes of actors and actual patients with various levels of depression.

Method

Two patients with major depression (one older man and one younger woman) had been assessed with an unpublished semistructured interview for the HDRS. These assessments had been videotaped at the initiation of treatment, during treatment, and after successful completion of treatment. Both patients had provided informed consent for the use of their videotapes for research purposes. The six selected interviews illustrated HDRS scores below 10 (an absence of depression), scores of 11–20 (mild to moderate depression), and scores above 21 (severe depression) (Figure 1). Scripts were generated from these videotapes. To create realistic portrayals of different stages of depression, a male actor and a female actor were recruited; both were mental health professionals who had worked for several years with depressed patients. They were trained by viewing the videotaped interviews and by using the scripts derived from them. Each actor then portrayed the three HDRS interviews of the same-gender patient. The actors’ portrayals were videotaped in the same room and with the same camera as the actual patients’ interviews.

The videotapes of both actors and patients were sent to a collaborating research site at Cornell University to ensure that none of the raters had any prior knowledge of either the actors or the patients. Four experienced HDRS raters at Cornell University with previously established reliability for the HDRS participated in six sessions over 3 weeks. During each session, a rater assessed three videotaped interviews of a single subject (either an actor or a patient) shown without depression, with mild to moderate depression, and with severe depression. The rater was told that the person on the videotape was either a patient or an actor.

To determine to what extent the raters could distinguish the actors from the patients, they were asked to guess whether they had rated a patient or an actor portraying a patient and to score their certainty of that guess on a 10-point scale. In addition, the raters were asked to rate the quality of the subjects’ (actors’ or patients’) portrayals of depressive psychopathology by answering the question “Was this subject presenting in a way that was consistent with a depressed person during a course of depression?” on a scale of 0 (not at all) to 10 (very much). Finally, the raters completed the HDRS for each interview. Correlations were calculated for the HDRS scores of each actor-patient pair, and intraclass correlation coefficients were calculated for the ratings of both the actors and the patients.
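For the paired-score analysis, a minimal sketch of the correlation step might look like the following. The HDRS totals shown are illustrative placeholders, not the study data (the actual paired scores appear only in Figure 1).

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical HDRS totals for matched actor-patient interviews;
    # values are placeholders for illustration only.
    actor_totals   = np.array([24, 14, 6, 23, 15, 8, 25, 13, 7, 22, 16, 9])
    patient_totals = np.array([25, 15, 6, 22, 14, 9, 26, 12, 7, 23, 15, 8])

    r, p = pearsonr(actor_totals, patient_totals)   # paired-score correlation
    print(f"r = {r:.2f}, p = {p:.3g}")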

Results

Experienced raters correctly identified actors or patients seven (44%) of 16 times, fewer than would be expected by chance (i.e., 50%). When raters guessed incorrectly, they were as certain of their guesses as when they guessed correctly. In terms of “presenting in a way that was consistent with…depression” on a 10-point scale, the mean scores were 7.1 (SD=2.4) for the actors and 6.5 (SD=2.4) for the patients.
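The report gives only the raw proportion; if one wished to formalize "no better than chance," an exact binomial test against a 50% guess rate, an analysis assumed here rather than reported in the study, could be run as follows.

    from scipy.stats import binomtest

    # 7 correct identifications out of 16 guesses, tested against chance (p = 0.5).
    # This test is an illustrative assumption, not an analysis reported in the study.
    result = binomtest(7, n=16, p=0.5)
    print(f"observed = {7/16:.0%}, two-sided p = {result.pvalue:.2f}")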

The scores generated for the actors’ interviews were highly correlated with the scores generated for the patients’ interviews (r=0.99, p<0.001). Intraclass correlation coefficients calculated with the ratings of the actors and the patients were 0.99 for both groups.

Discussion

These results demonstrate the feasibility of using trained actors to portray depressive psychopathology for the purpose of establishing interrater reliability. Training and testing materials for raters participating in multicenter clinical trials could therefore be developed with trained actors, without the risk of disseminating the clinical information of actual patients. In this study, the two actors had extensive experience with mental health assessments, which probably contributed to a more realistic portrayal of depression. This finding highlights the importance of selecting and training actors appropriately to ensure consistent quality in the portrayal of psychopathology, especially if this method comes into common use in clinical trials.

Received Aug. 29, 2003; revision received Nov. 17, 2003; accepted March 9, 2004. From the Department of Psychiatry and the Katz Graduate School of Business, University of Pittsburgh, Pittsburgh; the Geriatric Research, Education, and Clinical Center, VA Pittsburgh Health Care System, Pittsburgh; Weill Medical College of Cornell University, White Plains, N.Y.; and Fox Learning Systems, Inc., Bridgeville, Pa. Address reprint requests to Dr. Mulsant, 3811 O’Hara St., Pittsburgh, PA 15213; (e-mail). Supported in part by grants AG-19088, HS-11976, MH-01613, MH-01634, MH-52247, and MH-61639. Dr. Rosen and Ms. Fox have a commercial interest in Fox Learning Systems, Inc. Drs. Bruce and Mulsant have been consultants for Fox Learning Systems, Inc.

Figure 1. Scatterplot of Hamilton Depression Rating Scale Scores of Actor-Patient Pairs

Note: Four of the 24 pairs of scores are not visible because of overlapping data.

References

1. Kobak KA, Greist JJ, Jefferson JW, Katzelnick DJ: Computer-administered clinical rating scales: a review. Psychopharmacology (Berl) 1996; 127:291–301

2. Bartko JJ, Carpenter WT: On the methods and theory of reliability. J Nerv Ment Dis 1976; 163:307–317

3. Mulsant BH, Kastango KB, Rosen J, Stone RA, Mazumdar S, Pollock BG: Interrater reliability in clinical trials of depressive disorders. Am J Psychiatry 2002; 159:1598–1600

4. Syder D: The use of simulated clients to develop the clinical skills of speech and language therapy students. Eur J Disord Commun 1996; 31:181–192

5. Peabody JW, Luck J, Glassman P, Dresselhaus TR, Lee M: Comparison of vignettes, standardized patients, and chart abstraction: a prospective validation study of 3 methods for measuring quality. JAMA 2000; 283:1715–1722

6. Hazelkorn HM, Robins LS: Actors play patients: using surrogate patients to look into private practice. Public Health Rep 1996; 111:129–132

7. Nestel D, Muir E, Plant M, Kidd J, Thurlow S: Modelling the lay expert for first-year medical students: the actor-patient as teacher. Med Teach 2002; 24:562–564

8. Silvestre AJ, Gehl MB, Encandela J, Schelzel G: A participant observation study using actors at 30 publicly funded HIV counseling and testing sites in Pennsylvania. Am J Public Health 2000; 90:1096–1099

9. Loayssa JR, Garcia GM, Diez EJ: [Simulated consultation with actors for teaching clinical interviews] (Spanish). Aten Primaria 1993; 11:320