The Diagnostic and Statistical Manual of Mental Disorders (DSM) is under revision. One proposal for the pending DSM-V is dimensionalizing personality disorders, and the Five-Factor Model (FFM [2, 3]) has received the most attention, either as a supplement to or a replacement for axis II. Whereas the DSM-IV classifies maladaptive personality with 10 discrete disorders defined by unique criteria, the FFM describes personality in a continuous manner along 30 traits (facets) grouped into five factors (Figure 1) identified as reflecting the bulk of the variance among personalities (4–6). The FFM is a promising candidate for the DSM-V because it has been shown to be biologically based, universal, and temporally stable, and because it can avoid problems with the DSM-IV axis II categories, including high comorbidity and arbitrary diagnostic thresholds (7, 8).
However, one significant issue seldom examined is whether the FFM will be clinically useful. Clinical utility means the extent to which a diagnostic system assists clinicians in fulfilling key clinical functions, including making treatment plans and prognoses, communicating with patients or other clinicians, and describing a patient’s global personality or important personality problems (9, 10). The current study investigates a potential challenge the FFM may encounter with respect to its clinical utility.
The FFM proposal for psychopathology is to score a person with potential personality problems on each of the 30 facets from low to high (2) as shown in the first column of Figure 1. That is, the FFM uses the same descriptors to profile all cases and all types of personality. However, descriptors general enough to apply to many categories are inherently ambiguous. For instance, a low score on the “gregariousness” facet can mean paranoid fears (as in paranoid personality disorder), fear of not being liked by others (avoidant), or indifference to others (schizoid) (11–13). A high score on “anger” can mean temper tantrums (histrionic) or lack of control over anger (borderline) (14). Indeed, research in cognitive science (15–19) has demonstrated that the meanings of descriptors are relative to the categories they describe (e.g., large molecule versus large mountain; open hand versus open bottle; strong woman versus strong man), and thus a modifier without any category information can be ambiguous. The DSM diagnostic criteria are less likely to suffer from this problem because the descriptors are specific and framed in the context of a diagnosis. We suggest, however, that FFM profiles without a diagnosis may not be specific enough to convey subtle but important clinical information.
In the current study, we attempt to demonstrate the ambiguity of FFM descriptors by having clinicians provide DSM-IV personality disorder diagnoses based on FFM descriptions alone. For instance, clinicians received an FFM description like the one shown in Figure 1 as a description of a hypothetical patient, and made DSM-IV diagnoses based only on that information. Previous studies (20–22) showed that clinicians could translate DSM-IV personality disorders into FFM ratings with high interrater reliability (e.g., a prototype of avoidant personality disorder is agreed to be low on “gregariousness”). However, if FFM descriptors are ambiguous to clinicians, back-translating an FFM profile into a DSM-IV diagnosis should be difficult because many distinct clinical meanings map onto the same facet score (a many-to-one mapping). For instance, one needs to choose one specific meaning from many possible meanings of low “gregariousness” (e.g., paranoid fears or indifference to others) to make a DSM-IV diagnosis. Thus, difficulty in back-translating can serve as a demonstration of the ambiguity in FFM descriptions.
We also hypothesize that if the FFM traits alone are not specific enough to convey clinically important distinctions, clinicians might feel that the FFM’s clinical utility is low. Following First et al.’s initial proposal (9), we also asked clinicians to rate the FFM on measures of clinical utility.
Only a few studies have tested the clinical utility of the FFM and the results are mixed. The general procedure used in this past research was to have clinicians consider a patient, make either a DSM-IV or FFM assessment, and rate the clinical utility of the assessment system. However, the specific methods differed with respect to the level of detail with which clinicians processed each system. Sprock (22) had clinicians assess case vignettes on the five broad factors of the FFM and found that they judged the FFM as less useful than the DSM-IV. But when Samuel and Widiger (23) had clinicians assess case vignettes on the 30 facets of the FFM, requiring more detailed processing of the FFM, they judged the FFM as more useful than the DSM-IV. In a recent study by Spitzer et al. (24), clinicians had to process the DSM-IV in much greater detail than in the previous studies; they read through all the diagnostic criteria of the DSM-IV personality disorders as part of the DSM-IV assessment. The results showed that their clinicians judged the DSM-IV as more useful than the FFM. Thus, past results taken together suggest that clinicians gave higher clinical utility judgments when they processed information in a more detailed way during assessment. This pattern is consistent with our hypothesis that the specificity of descriptors, which could be influenced by more detailed processing of patient information, can affect clinical use. Of interest, Spitzer et al. (24) also found the FFM’s utility to be lower than that of the Shedler-Westen Assessment Procedure (SWAP-200 [25–26]; see Figure 1). This finding is also consistent with our hypothesis because SWAP uses 200 concrete descriptors, only some of which describe any given case, rather than applying the same set of a limited number of traits to all cases.
Although previous studies provide suggestive evidence in support of our hypothesis, the current study more directly examines how ambiguities in patient descriptions may lower clinical use of a diagnostic system. In addition to back-translating FFM descriptions into DSM-IV diagnoses, our clinician participants rated the clinical utility of the FFM descriptions presented as profiles of hypothetical patients without other information about the patients. This method differs from the previous studies (22–24), in which clinicians considered either a vignette or one of the clinician’s actual patients before assessing utility, which could have disambiguated the meanings of the FFM descriptors. We predict that when an FFM description is presented alone without any specific context to disambiguate the description, clinicians would judge the clinical utility of the FFM to be low.
To summarize, we propose that the FFM descriptors may be too ambiguous to capture clinically important but subtle information. To test this proposal, we examine whether FFM descriptions alone are specific enough to allow clinicians to recognize known DSM-IV personality disorders, and whether ambiguities in FFM descriptors result in lower clinical utility of the FFM.
Two studies are reported. The first study examined cases of a single DSM-IV personality disorder (prototypic). The second study examined cases with multiple personality disorders (comorbid). The methods of both studies are presented first, followed by the results of both studies and then an integrated discussion.
Study 1: Prototypic Cases
Psychiatrists identified as psychotherapists by the APA, practicing psychologists (Ph.D.s or Psy.D.s) from the American Psychological Association, and social workers from the 2005 Register of Clinical Social Workers (27) were recruited by mail. Fifty-eight psychiatrists, 64 psychologists, and 65 social workers participated, for response rates of 12%, 26%, and 17%, respectively. The experiment took 21 minutes on average, and participants were compensated with a $30.00 gift certificate to an online retailer. After participants received a complete description of the study, informed consent was obtained.
There were three conditions, which described prototypic cases of the 10 DSM-IV personality disorders in the FFM, DSM, or SWAP style.
The materials for the FFM condition were derived from a previous study (20) in which experienced clinicians thought about a prototypic case of one of the 10 DSM-IV personality disorders and rated it on the 30 FFM facets. For instance, a clinician was asked to consider the most prototypic case of borderline personality disorder and to rate the extent to which the patient is neurotic, etc. To make the FFM prototypes as easy to interpret as possible, we created a graphic (Figure 1) to display the 30 facet scores with a couple of low and high facet adjectives previously used (23).
The DSM condition was included to ensure that any difficulty participants may have in providing diagnoses from FFM profiles was not due to a lack of background knowledge about the personality disorders. This condition used DSM prototypes, each of which listed all of the DSM-IV-TR diagnostic criteria for each disorder (e.g., Figure 1). If participants diagnose cases presented in the DSM-IV format with high accuracy, we can be reasonably assured that they have good knowledge of the DSM-IV.
The SWAP condition was included as an additional contrast because DSM and FFM profiles differ in length. SWAP profiles are as long as FFM profiles, but the descriptors are less ambiguous, like those of the DSM-IV (e.g., tends to react to criticism with feelings of rage or humiliation). SWAP prototypes were taken directly from a previous study (25), in which clinicians identified which SWAP items were most descriptive of the prototypic cases of the 10 DSM-IV personality disorders (e.g., Figure 1).
The 10 personality disorders were divided into three sets, each including one disorder from each of the three DSM-IV-TR clusters, except the third set, which included two disorders from cluster B. Each participant received one of the three sets. In each set, one of the cases was presented in each of the DSM, FFM, and SWAP styles. Overall, the design ensured that each participant saw at least one case in each of the three conditions, and that the order of the three conditions and the disorders within a given set were counterbalanced.
The study was performed online. Participants were told that they would be presented with descriptions of adult patients and were asked to imagine that these patients were referred to them along with a patient description from a previous consultation. Participants were told that the patients “do not have schizophrenia or any other psychotic disorder, and their symptoms do not occur due to the direct effect of any general medical condition.” This instruction was included so that participants would not avoid giving personality disorder diagnoses (e.g., a schizoid personality disorder diagnosis is not allowed if it occurs exclusively during the course of schizophrenia). Finally, participants were instructed not to consult the DSM.
Next, participants saw three (or four) patient descriptions in the FFM, DSM, or SWAP style (e.g., Figure 1). For each description, participants were asked to “provide any DSM-IV diagnoses you believe this patient to have.” Participants also rated the utility of the system with six questions on a five-point scale (not at all, slightly, moderately, very, extremely). The six questions were the following:
How informative is this description in making a prognosis for this person?
How informative is this description in devising treatment plans for this person?
How useful do you feel the system used to describe this person would be for communicating information about this individual with other mental health professionals?
How useful do you feel the system used to describe this person would be for communicating information about the individual to him or herself?
How useful is the system used to describe this person for comprehensively describing all the important personality problems this individual has?
How useful was the system used to describe this person for describing the individual’s global personality?
The order of the diagnosis question and the utility ratings was counterbalanced across participants and presented on different website pages with the patient profile still visible. Finally, participants provided demographic information and familiarity with the diagnostic systems (1: “not at all familiar” to 7: “extremely familiar”). The study was approved by the Yale University Institutional Review Board. The results of Study 1 appear in the section titled “Results, Study 1: Prototypic Cases.”
Study 2: Comorbid Cases
Sixty-six psychiatrists, 58 psychologists, and 67 social workers recruited from the same sources as Study 1 completed Study 2 (response rates of 10%, 16%, and 12%, respectively). The experiment took 21 minutes on average, and participants received a $30.00 gift certificate to an online retailer.
Comorbid cases were used because they are considered more representative of real-world patients (28, 29). The materials were developed based on three cases (Earnest, Madeline, and Ted) from Samuel and Widiger (23), in which participants rated the FFM as more useful than the DSM-IV. Two conditions described the cases in either the FFM or DSM style.
The FFM condition used the clinicians’ average FFM facet ratings on the three cases obtained by Samuel and Widiger (23). No case vignettes were presented.
For the DSM condition, pretesting was necessary to empirically develop symptom-level DSM descriptions of the cases (Figure 1). We asked 29 clinicians to rate on a five-point scale the presence or absence of each DSM-IV diagnostic criterion for all 10 personality disorders in each of the three cases. Using these ratings, we chose a cutoff such that our DSM descriptions contained enough symptoms to match Samuel and Widiger’s (23) participants’ consensus DSM-IV diagnoses (Earnest: avoidant and schizoid; Madeline: narcissistic, histrionic, and borderline; Ted: antisocial and narcissistic) as closely as possible (i.e., including enough symptoms to reach the threshold of only the consensus diagnoses, and not any other diagnoses) so that the two results are comparable. A few symptoms with high ratings from other diagnoses were also included. (See data supplement Figures 1a-c for the DSM and FFM profiles and results broken down by the three cases.)
The SWAP was not included in Study 2 because comorbid case profiles in the SWAP format have not been externally verified as the FFM profiles have.
After giving consent, participants saw all three cases in either the DSM condition (N=95) or the FFM condition (N=96). The presentation order of the cases was counterbalanced using a Latin square design. The procedure was the same as in Study 1 except that the diagnosis and utility ratings were performed on one web page, again in counterbalanced order. The study was approved by the Yale University Institutional Review Board.
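The Latin square counterbalancing of case order can be illustrated with a minimal sketch. The text does not say which Latin square was used or how participants were assigned to orders, so the cyclic construction and the round-robin assignment below are assumptions, and the function names are hypothetical.

```python
def latin_square_orders(items):
    """Build a cyclic Latin square: row i is the item list rotated left by i.

    Each item appears exactly once in every row and once in every position.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(participant_index, orders):
    """Assign presentation orders round-robin (an assumed assignment rule)."""
    return orders[participant_index % len(orders)]


# The three case names come from the text; everything else is illustrative.
cases = ["Earnest", "Madeline", "Ted"]
orders = latin_square_orders(cases)

for i in range(4):
    print(i, order_for_participant(i, orders))
```

With three cases, this yields three presentation orders in which each case occupies the first, second, and third position exactly once.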
Results, Study 1: Prototypic Cases
(See data supplement Table 1 available at http://ajp.psychiatryonline.org for details.) Respondents for Study 1 had spent on average 20 years (SD=9) in clinical practice and worked with patients an average of 32 hours (SD=12) weekly, 11 of those hours (SD=8) with patients with personality disorders. As expected, participants were more familiar with the DSM-IV (mean=5.69, SD=1.26) than the FFM (mean=2.17, SD=1.65) (t=25.49, df=180, p<0.01) and more familiar with the FFM than the SWAP (mean=1.18, SD=0.70) (t=8.17, df=179, p<0.01). Yet, all results reported in both studies correlated only weakly with familiarity with the respective model (all r<0.20).
For prototypic cases, participants gave correct DSM-IV diagnoses much more frequently for DSM (82.4%) and SWAP (75.9%) than for FFM (47.1%) (Figure 2). McNemar tests showed that participants gave significantly more correct diagnoses for both the DSM (χ²=44.94, df=1, N=187, p<0.01) and SWAP (χ²=36.96, df=1, N=187, p<0.01) than FFM, but DSM and SWAP did not differ (χ²=2.01, df=1, N=187, p=0.15).
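For readers unfamiliar with the McNemar test, the following sketch shows how the statistic is computed from paired correct/incorrect outcomes. It depends only on the two discordant counts: participants correct in one condition but not the other. The text does not report these counts or say whether a continuity correction was applied, so the counts below are hypothetical and the correction is left as an option.

```python
import math


def mcnemar_chi2(b, c, continuity_correction=False):
    """McNemar chi-square (df=1) from the two discordant-pair counts.

    b: pairs correct in condition A only; c: pairs correct in condition B only.
    Returns (chi2, p), where p is the upper-tail chi-square probability.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    diff = abs(b - c)
    if continuity_correction:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    # For df=1, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical discordant counts, NOT taken from the study.
chi2, p = mcnemar_chi2(b=10, c=2)
print(f"chi2={chi2:.2f}, p={p:.3f}")
```

Because concordant pairs (correct in both conditions or in neither) drop out of the formula, the test is sensitive only to participants whose accuracy changed between description styles.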
Incorrect diagnoses were defined as any axis I, II, or higher-order diagnosis mismatching the correct diagnosis, and any non-DSM-IV diagnosis. Participants gave significantly more incorrect diagnoses for the FFM (mean=1.13, SD=1.08) than either DSM (mean=0.50, SD=0.84) (t=7.52, df=186, p<0.01) or SWAP (mean=0.67, SD=0.94) (t=5.33, df=186, p<0.01) and more incorrect diagnoses for SWAP than DSM (t=2.20, df=186, p=0.03) (Figure 3). As the goal of SWAP is to define new diagnostic criteria that do not necessarily map onto existing DSM-IV categories, this finding is not unexpected. Other methods of counting correct or incorrect diagnoses (not counting features or traits, or counting “sociopath” for antisocial, “cluster A” for paranoid) did not change the main results.
For prototypic cases, paired t tests showed that for each of the utility measures, participants rated SWAP most useful, then DSM, and finally FFM (Figure 4; all ps<0.01), except that DSM and SWAP did not differ for making a prognosis, p=0.16; and DSM and FFM did not differ for communicating with patients, p=0.35. These results largely replicate those of Spitzer et al. (24). When only looking at conditions presented first to participants, the same general pattern of results held with two exceptions: FFM was higher than DSM for communicating with patients, p=0.02, and not significantly higher than DSM on describing global personality, p=0.27.
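As a rough illustration of the paired t tests used for these within-participant comparisons, the sketch below computes the statistic from per-participant difference scores. The ratings are hypothetical and are not the study's data; the function name is also an assumption.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(x) != len(y):
        raise ValueError("samples must be the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample SD of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical utility ratings (1-5) from five participants for two systems.
ratings_a = [4, 5, 3, 4, 5]
ratings_b = [2, 3, 3, 2, 4]
t, df = paired_t(ratings_a, ratings_b)
print(f"t={t:.2f}, df={df}")
```

The degrees of freedom equal the number of pairs minus one, which matches the df=186 reported for the 187 participants in Study 1.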
The general pattern of results was consistent when broken down by profession and disorder (data supplement Tables 2 and 3 show mean correct/incorrect diagnoses by disorder). Conclusions from these results are discussed in the general discussion.
Results, Study 2: Comorbid Cases
Respondents for Study 2 had spent on average 20 years (SD=9) in clinical practice, and worked with patients 34 hours per week (SD=13), 12 of those hours (SD=10) with patients with personality disorders. (See also data supplement Table 1.) Participants were more familiar with the DSM-IV (mean=5.48, SD=1.40) than the FFM (mean=2.01, SD=1.49) (t=25.64, df=189, p<0.01). Participants in the DSM condition also reported being slightly more familiar with the DSM-IV (mean=5.81, SD=1.18) compared to participants in the FFM condition (mean=5.16, SD=1.53) (t=3.23, df=188, p<0.01). Analyses using familiarity with the FFM as a covariate yielded the same conclusions as the results presented below (see data supplement).
For comorbid cases, we used clinicians’ consensus DSM-IV diagnoses from Samuel and Widiger (23) as correct diagnoses. For each participant, we identified the percentage of correct diagnoses per case (e.g., 50% if one of two correct diagnoses was provided) and averaged across the three comorbid cases. The overall accuracy score was almost three times higher in the DSM condition (mean=60%, SD=0.23) than FFM (mean=21%, SD=0.21) (t=12.03, df=189, p<0.01) (see Figure 2).
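The per-participant accuracy score described above can be sketched as follows. The consensus diagnoses are those listed in the Method for the three cases; the participant's responses and the function names are hypothetical, and the sketch assumes diagnoses have already been coded into standard labels.

```python
# Consensus DSM-IV diagnoses for the three cases, as listed in the text.
CONSENSUS = {
    "Earnest": {"avoidant", "schizoid"},
    "Madeline": {"narcissistic", "histrionic", "borderline"},
    "Ted": {"antisocial", "narcissistic"},
}


def case_accuracy(given, correct):
    """Proportion of the correct diagnoses that the participant provided."""
    return len(set(given) & correct) / len(correct)


def participant_accuracy(responses, consensus=CONSENSUS):
    """Average the per-case accuracy across all cases."""
    scores = [
        case_accuracy(responses.get(case, ()), correct)
        for case, correct in consensus.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical responses from one participant (already coded to labels).
responses = {
    "Earnest": ["avoidant"],                # 1 of 2 correct -> 0.50
    "Madeline": ["borderline", "dependent"],  # 1 of 3 correct -> 0.33
    "Ted": ["antisocial", "narcissistic"],  # 2 of 2 correct -> 1.00
}
print(round(participant_accuracy(responses), 2))
```

Extra diagnoses that are not in the consensus set (such as "dependent" here) do not lower this score; they would be counted separately as incorrect diagnoses.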
Participants gave significantly more incorrect diagnoses (averaged across the three cases) in the FFM condition (mean=0.99, SD=0.70) than DSM (mean=0.59, SD=0.51) (t=4.50, df=189, p<0.01) (see Figure 3).
As with the prototypic cases, participants rated the DSM as more useful than FFM for five of the six utility questions for the comorbid cases (all ps<0.05, see Figure 5). Participants rated the FFM as marginally more useful than the DSM for communicating with patients (t=1.92, df=189, p=0.06).
Discussion
We found that clinicians were largely unable to back-translate prototypic and comorbid FFM profiles of cases into DSM-IV diagnoses, despite being able to recognize the DSM-IV disorders in the DSM condition. These results suggest that the FFM descriptors are ambiguous to clinicians without additional contextual information, and that the FFM may be less able to convey important clinical details than the DSM-IV.
Previous studies have demonstrated that DSM personality disorder concepts can be reliably translated into FFM descriptions (20–22). However, those studies did not assess whether clinicians can use their existing concepts of disorders when thinking about an FFM profile. In the current study, practicing clinicians had difficulty recognizing even prototypic personality disorder cases when presented in the FFM style alone. Although these prior studies have also shown that statistical techniques can produce a DSM-IV diagnosis from an FFM profile (see Clark for a summary), such findings do not address the difficulty practicing clinicians may have forming a coherent image of an FFM profile, as suggested by the present results. This is an important aspect of clinical utility.
We also found that participants judged the FFM to be less clinically useful than the DSM-IV. In past research on clinical utility (22–24), clinicians read or thought about a concrete patient case, potentially disambiguating the meaning of the FFM traits. In contrast, because the current studies presented case profiles using only the information contained within the DSM-IV or FFM descriptions, our studies are able to assess the utility of the systems alone. Overall, these findings suggest serious challenges to the possibility of replacing DSM-IV axis II diagnoses with the FFM.
We emphasize that our goal was not to compare the DSM-IV and the FFM in the exact format proposed to be adopted and determine which system excels. For instance, the methods used in our studies are not based on the assumption that the FFM, if adopted, would be used without case vignettes or diagnostic information. Instead, our goal was to use an experimentally manipulated paradigm to examine specific cognitive difficulties that need to be recognized. We acknowledge that the current methods do not experimentally control for all possible differences between the DSM-IV and FFM (e.g., clinicians’ familiarity with the systems), but we chose this approach so that results would be comparable to previous studies (22–24). Moreover, by not overcontrolling for practicing clinicians’ current understanding of the FFM, the results identify consequences that typical clinicians would face if the FFM replaced the DSM-IV axis II diagnoses. Overall, any potential descriptive system to be incorporated into the DSM-V should take into account not only validity, but also clinicians’ ability to reason with the system.