Interview Quality and Signal Detection in Clinical Trials
To the Editor: The quality of assessments in clinical trials is an important but often overlooked methodological variable. Few trials examine raters' applied clinical skills, and to our knowledge no study has examined the impact of interview quality on signal detection. Thus, whether patients who receive high-quality clinical interviews are more or less likely to show separation between active drug and placebo has not been empirically examined.
Data were obtained from all raters (N=34) conducting outcome assessments in a 20-site phase II antidepressant trial. All baseline Hamilton Depression Rating Scale interviews were audiotaped as part of an ongoing quality-control effort. A random 25% sample of the baseline audiotapes (N=56) was reviewed, each tape by one of four external reviewers. Interview quality was evaluated with the Rater Applied Performance Scale (1) along four dimensions: adherence to interview guidelines, use of appropriate follow-up questions, use of questions to clarify ambiguous information, and neutrality (i.e., avoiding leading questions that direct the patient toward specific responses). Each dimension was rated "unsatisfactory," "fair," "good," or "excellent." To protect the confidentiality of the study sponsor, only the active comparator (paroxetine) (N=109) and placebo (N=107) cells of the trial were made available for analysis.
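The letter reports the four dimension ratings only as labels. It then groups subjects by their "mean" baseline rating. One plausible way to compute that grouping is sketched below, under stated assumptions. The 1 to 4 coding of the labels, the dimension keys, and the 2.5 cutoff (rounding the mean to the nearest label) are hypothetical and are not taken from the trial.

```python
from statistics import mean

# Hypothetical numeric coding of the four RAPS anchor labels; the letter
# reports only the labels, not the scores used to combine them.
RAPS_SCORES = {"unsatisfactory": 1, "fair": 2, "good": 3, "excellent": 4}

# Hypothetical keys for the four dimensions named in the letter:
# adherence to guidelines, follow-up questions, clarifying questions, neutrality.
DIMENSIONS = ("adherence", "follow_up", "clarification", "neutrality")


def mean_raps(ratings: dict[str, str]) -> float:
    """Average the coded ratings across the four RAPS dimensions."""
    return float(mean(RAPS_SCORES[ratings[d]] for d in DIMENSIONS))


def quality_group(ratings: dict[str, str], cutoff: float = 2.5) -> str:
    """Assign one interview to a quality stratum.

    Assumption: a mean above 2.5 rounds to "good" or better, so the
    interview falls in the good/excellent stratum; otherwise it falls in
    the fair/unsatisfactory stratum.
    """
    return "good/excellent" if mean_raps(ratings) > cutoff else "fair/unsatisfactory"


tape = {"adherence": "good", "follow_up": "excellent",
        "clarification": "fair", "neutrality": "good"}
print(mean_raps(tape), quality_group(tape))  # prints 3.0 good/excellent
```

Averaging before dichotomizing lets a single weak dimension be offset by strong ones. Whether the original analysis did this is not stated in the letter.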
Overall, paroxetine failed to separate from placebo (change in paroxetine: mean=9.72, change in placebo: mean=9.22, difference: mean=0.50) (t=0.51, df=214, p=0.61). However, subjects whose baseline interviews received a mean Rater Applied Performance Scale rating of "good" or "excellent" showed significant drug-placebo separation (change in paroxetine: mean=11.61, change in placebo: mean=4.78, difference: mean=6.83) (t=2.61, df=20, p<0.02). Subjects whose interviews received a mean rating of "fair" or "unsatisfactory" showed no significant drug-placebo separation (change in paroxetine: mean=7.56, change in placebo: mean=10.44, difference: mean=–2.88) (t=–1.13, df=32, p=0.27). Drug-placebo separation was significantly greater for interviews rated good or excellent (6.83) than for those rated fair or unsatisfactory (–2.88) (F=6.46, df=1, 52, p<0.02).
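The comparison above turns on drug-placebo separation, meaning the mean change on paroxetine minus the mean change on placebo. The first part of the sketch below recomputes these separations from the reported means. It also computes their difference, 6.83 − (−2.88) = 9.71.

The letter does not name the model behind F=6.46. Its degrees of freedom (1, 52) match a drug × interview-quality interaction in a 2×2 between-subjects design with 56 subjects (52 = 56 − 4). The function `interaction_f` is a hypothetical sketch of that test under this assumption. It tests the difference-of-differences contrast against the pooled within-cell variance. No trial data are used.

```python
from statistics import mean


def separation(drug_change: float, placebo_change: float) -> float:
    """Drug-placebo separation: mean change on drug minus mean change on placebo."""
    return drug_change - placebo_change


# Mean change scores as reported in the letter.
overall = separation(9.72, 9.22)
good = separation(11.61, 4.78)   # good/excellent interviews
fair = separation(7.56, 10.44)   # fair/unsatisfactory interviews
print(round(overall, 2), round(good, 2), round(fair, 2), round(good - fair, 2))
# prints 0.5 6.83 -2.88 9.71


def interaction_f(good_drug, good_placebo, fair_drug, fair_placebo):
    """Hypothetical F test for the drug x interview-quality interaction (2x2).

    Each argument is a list of individual change scores for one cell.
    Tests the contrast (drug - placebo | good) - (drug - placebo | fair)
    against the pooled within-cell mean square; this matches the Type III
    interaction test in a 2x2 cell-means model, including unequal cell sizes.
    Returns (F, (1, df_error)) with df_error = N - 4.
    """
    cells = [good_drug, good_placebo, fair_drug, fair_placebo]
    means = [mean(c) for c in cells]
    # Pooled within-cell sum of squares and its degrees of freedom.
    ss_within = sum(sum((x - m) ** 2 for x in c) for c, m in zip(cells, means))
    df_error = sum(len(c) for c in cells) - 4
    mse = ss_within / df_error
    # Difference of the two drug-placebo separations.
    contrast = (means[0] - means[1]) - (means[2] - means[3])
    # Variance of the contrast: MSE times the sum of 1/n over the four cells.
    var_contrast = mse * sum(1 / len(c) for c in cells)
    return contrast ** 2 / var_contrast, (1, df_error)
```

Because the contrast has one degree of freedom, this F equals the square of the corresponding contrast t statistic. A p-value could then be read from an F(1, N − 4) distribution, for example with `scipy.stats.f.sf(F, 1, df_error)`.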
Interview quality had a substantial impact on signal detection in this study. This finding points to the need for greater attention to raters' applied clinical skills in clinical trials. Training programs that target applied clinical skills can be effective if sufficient time is devoted to them. Remote rater training delivered through new technologies such as videoconferencing has been shown empirically to improve both didactic and applied skills (2). Rater training and rater certification (including applied skills), along with ongoing monitoring of rater quality, need to become standard parts of clinical trial methodology (3).
1. Lipsitz J, Kobak K, Feiger A, Sikich D, Moroz G, Engelhardt A: The Rater Applied Performance Scale (RAPS): development and reliability. Psychiatry Res 2004; 127:147–155
2. Kobak KA, Lipsitz JD, Feiger AD: Development of a standardized training program for the Hamilton Depression Scale using Internet-based technologies: results from a pilot study. J Psychiatr Res 2003; 37:509–515
3. Kobak KA, Engelhardt N, Williams JB, Lipsitz JD: Rater training in multicenter clinical trials: issues and recommendations. J Clin Psychopharmacol 2004; 24:113–117