Predicting Suicide Attempts and Suicide Deaths Following Outpatient Visits Using Electronic Health Records
Abstract
Objective:
The authors sought to develop and validate models using electronic health records to predict suicide attempt and suicide death following an outpatient visit.
Method:
Across seven health systems, 2,960,929 patients age 13 or older (mean age, 46 years; 62% female) made 10,275,853 specialty mental health visits and 9,685,206 primary care visits with mental health diagnoses between Jan. 1, 2009, and June 30, 2015. Health system records and state death certificate data identified suicide attempts (N=24,133) and suicide deaths (N=1,240) over 90 days following each visit. Potential predictors included 313 demographic and clinical characteristics extracted from records for up to 5 years before each visit: prior suicide attempts, mental health and substance use diagnoses, medical diagnoses, psychiatric medications dispensed, inpatient or emergency department care, and routinely administered depression questionnaires. Logistic regression models predicting suicide attempt and death were developed using penalized LASSO (least absolute shrinkage and selection operator) variable selection in a random sample of 65% of the visits and validated in the remaining 35%.
Results:
Mental health specialty visits with risk scores in the top 5% accounted for 43% of subsequent suicide attempts and 48% of suicide deaths. Of patients scoring in the top 5%, 5.4% attempted suicide and 0.26% died by suicide within 90 days. C-statistics (equivalent to area under the curve) for prediction of suicide attempt and suicide death were 0.851 (95% CI=0.848, 0.853) and 0.861 (95% CI=0.848, 0.875), respectively. Primary care visits with scores in the top 5% accounted for 48% of subsequent suicide attempts and 43% of suicide deaths. C-statistics for prediction of suicide attempt and suicide death were 0.853 (95% CI=0.849, 0.857) and 0.833 (95% CI=0.813, 0.853), respectively.
Conclusions:
Prediction models incorporating both health record data and responses to self-report questionnaires substantially outperform existing suicide risk prediction tools.
Suicide accounted for almost 45,000 deaths in the United States in 2016, a 25% increase since 2000 (1). Nonfatal suicide attempts account for almost 500,000 emergency department visits annually (2). Half of people who die by suicide and two-thirds of people who survive suicide attempts received some mental health diagnosis or treatment during the previous year (3, 4). Mindful of those prevention opportunities, a Joint Commission Sentinel Event Alert issued in 2016 recommends detection of suicide risk across health care (5). Unfortunately, traditional clinical detection of suicide risk is hardly better than chance (6).
We previously reported (7) that brief depression questionnaires can predict suicide attempt or death. Outpatients who report having thoughts of death or self-harm “nearly every day” on item 9 of the Patient Health Questionnaire (PHQ-9) are seven times as likely to attempt suicide and six times as likely to die by suicide over the following 90 days as patients who report having such thoughts “not at all” (7). The sensitivity of this tool, however, is only moderate: one-third of suicide attempts and deaths occur among patients who report no suicidal ideation at all. Its ability to identify high-risk patients is also limited: the 6% of patients who report suicidal ideation “more than half the days” or “nearly every day” account for only 35% of suicide attempts and deaths. More accurate tools are needed for identifying both low- and high-risk patients.
Recent research has used various modeling methods to predict suicidal behavior from electronic health records. Examples include prediction of suicide death among Veterans Health Administration service users (8), prediction of suicide death following psychiatric hospitalization among U.S. Army soldiers (9), distinguishing patients attempting suicide from those with other injuries or poisonings (10), and prediction of suicide or accidental death following civilian general hospital discharge (11). Two recent analyses have used health record data to predict suicide attempt or suicide death following outpatient visits. Kessler and colleagues (12) used health records and military service records to predict suicide death among U.S. Army soldiers in the 26 weeks following a mental health visit. Approximately one-quarter of suicide deaths occurred after the 5% of visits rated as highest risk. Barak-Corren and colleagues (13) used health record data to predict suicide attempt or death among outpatients making three or more visits in two large academic health systems. One-third of suicide attempts and deaths occurred in the 5% of patients with highest risk scores.
In this study, we combined data typically available from electronic health records with depression questionnaire data in seven large health systems to develop and validate models predicting suicide attempt and suicide death over the 90 days following a mental health or primary care visit.
Method
The seven health systems that participated in this research (HealthPartners; Henry Ford Health System; and the Colorado, Hawaii, Northwest, Southern California, and Washington regions of Kaiser Permanente) serve a combined population of about 8 million members in nine states. Each system provides insurance coverage and comprehensive health care (including general medical and specialty mental health care) to a defined population enrolled through employer-sponsored insurance, individual insurance, capitated Medicaid or Medicare, and subsidized low-income programs. Members are representative of each system’s service area in age, race/ethnicity, and socioeconomic status. All systems recommend using the PHQ-9 at mental health visits and primary care visits for depression, but implementation varied across systems during the study period.
As members of the Mental Health Research Network, each health system maintains a research data warehouse following the Health Care Systems Research Network’s Virtual Data Warehouse model (14). This resource combines data from insurance enrollment records, electronic health records, insurance claims, pharmacy dispensings, state mortality records, and census-derived neighborhood characteristics. Responsible institutional review boards for each health system approved use of these de-identified data for this research.
The study sample included any outpatient visit by a member age 13 or older either to a specialty mental health clinic or to a primary care clinic when a mental health diagnosis was recorded. Sampling was limited to visits to health system clinics (to ensure availability of electronic health record data) and people insured by the health system’s insurance plan (to ensure availability of insurance claims data). All qualifying visits from Jan. 1, 2009, through June 30, 2015, were included, except at the Henry Ford Health System, where only visits after implementation of a new electronic health record system on Dec. 1, 2012, were included.
Potential predictors extracted from health system records for up to 5 years before each visit included demographic characteristics (age, sex, race, ethnicity, source of insurance, and neighborhood income and educational attainment), current and past mental health and substance use diagnoses (organized in 12 categories), past suicide attempts, other past injury or poisoning diagnoses, dispensed prescriptions for mental health medication (organized in four categories), past inpatient or emergency department mental health care, general medical diagnoses (by Charlson Comorbidity Index [15] categories), and recorded scores on the PHQ-9 (16) (including total score and item 9 score).
Potential predictors were represented as dichotomous indicators. Each diagnosis category was represented by three overlapping indicators (recorded at or within 90 days before the visit, within 1 year before, and within 5 years before). Each category of medication or of emergency or inpatient utilization was represented by three overlapping indicators (occurred within 90 days before the visit, within 1 year before, or any time before). To represent temporal patterns of prior PHQ-9 item 9 scores, 24 indicators were calculated for each encounter. These captured the number of observations, the maximum value, and the modal value (with a missing response treated as a possible value) during three overlapping time periods (previous 90 days, previous 183 days, and previous 365 days). The final set of potential predictors for each encounter included 149 indicators and 164 possible interactions (see Appendix 9A in the online supplement for a complete list).
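The overlapping look-back indicators described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's code (that is in the public repository described below). The function names are hypothetical, "1 year" and "5 years" are approximated as 365 and 1,826 days, item 9 summaries use only visits strictly before the index visit, and the exact binning of those summaries into 24 dichotomous indicators (Appendix 9A) is not reproduced.

```python
from collections import Counter
from datetime import date

# Look-back windows for diagnosis indicators. Converting "1 year" and
# "5 years" to 365 and 1,826 days is an assumption made for this sketch.
DIAGNOSIS_WINDOWS = {"90d": 90, "1y": 365, "5y": 1826}

# Overlapping windows used for PHQ-9 item 9 history.
ITEM9_WINDOWS = {"90d": 90, "183d": 183, "365d": 365}


def window_indicators(event_dates, visit_date, windows=DIAGNOSIS_WINDOWS):
    """Return overlapping 0/1 indicators: 1 if any event fell on the visit
    date or within `days` days before it."""
    lags = [(visit_date - d).days for d in event_dates]
    return {name: int(any(0 <= lag <= days for lag in lags))
            for name, days in windows.items()}


def item9_summaries(prior_visits, visit_date, windows=ITEM9_WINDOWS):
    """Summarize prior item 9 responses in each window (hypothetical helper).

    prior_visits: list of (date, score), where score is 0-3, or None when
    item 9 was not recorded at that visit (reported as the value "missing").
    """
    summaries = {}
    for name, days in windows.items():
        # Assumption: only visits strictly before the index visit are used.
        values = [v for d, v in prior_visits if 0 < (visit_date - d).days <= days]
        recorded = [v for v in values if v is not None]
        labeled = ["missing" if v is None else v for v in values]
        summaries[name] = {
            "n_obs": len(recorded),
            "max": max(recorded) if recorded else None,
            # Ties go to the value encountered first (Counter.most_common behavior).
            "mode": Counter(labeled).most_common(1)[0][0] if labeled else "missing",
        }
    return summaries


visit = date(2015, 6, 30)
print(window_indicators([date(2014, 12, 1)], visit))  # {'90d': 0, '1y': 1, '5y': 1}
```

A diagnosis recorded 211 days before the visit sets the 1-year and 5-year indicators but not the 90-day indicator. This overlap is what lets a model weight recent and remote history separately.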
Diagnoses of self-harm or probable suicide attempt were ascertained from all injury or poisoning diagnoses recorded in electronic health records and insurance claims accompanied by an ICD-9 cause of injury code indicating intentional self-harm (codes E950–E958) or undetermined intent (codes E980–E989). Data from these health systems during the study period indicate that inclusion of injuries and poisonings with undetermined intent increases ascertainment of probable suicide attempts by approximately 25% (7) (see also Appendix 4 in the online supplement). Although use of E-codes varied across the United States during the study period (17), participating health systems were selected for high and consistent rates of E-code use (see Appendix 1 in the online supplement). Record review (7) also supports the positive predictive value of this definition for identification of true self-harm in these health systems (see also Appendix 2 in the online supplement). Furthermore, observation of coding changes across the transition from ICD-9 to the more specific ICD-10 coding scheme indicates that most “undetermined” ICD-9 diagnoses actually reflect self-harm (18) (see also Appendix 3 in the online supplement). Ascertainment of suicide attempts was censored at health system disenrollment, after which insurance claims data regarding self-harm diagnoses at external facilities would not be available.
Suicide deaths were ascertained from state mortality records. Following common recommendations (19, 20), all deaths with an ICD-10 diagnosis of self-inflicted injury (codes X60–X84) or injury/poisoning with undetermined intent (codes Y10–Y34) were considered probable suicide deaths. Inclusion of injury and poisoning deaths with undetermined intent increases ascertainment of probable suicide deaths by 5%–10% (7) (see also Appendix 4 in the online supplement).
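The code ranges used to define probable suicide attempts (ICD-9 E-codes) and probable suicide deaths (ICD-10 underlying cause) can be expressed as two small classifiers. This is a hypothetical sketch that assumes codes arrive as strings such as "E950.3" or "X70.0". It omits the linkage of cause-of-injury codes to injury or poisoning diagnoses, enrollment censoring, and date logic. Keeping "self-harm" and "undetermined" as separate labels allows the definite-only sensitivity analyses to be run by filtering on the first label.

```python
import re


def classify_icd9_ecode(code):
    """Classify an ICD-9 cause-of-injury code (hypothetical helper).

    Returns "self-harm" for E950-E958, "undetermined" for E980-E989, else None.
    """
    m = re.fullmatch(r"E(\d{3})(?:\.\d+)?", code.strip().upper())
    if not m:
        return None
    n = int(m.group(1))
    if 950 <= n <= 958:
        return "self-harm"
    if 980 <= n <= 989:
        return "undetermined"
    return None


def classify_icd10_death(code):
    """Classify an ICD-10 underlying cause-of-death code (hypothetical helper).

    Returns "self-harm" for X60-X84, "undetermined" for Y10-Y34, else None.
    Accepts forms such as "X70", "X70.0", and "X700".
    """
    m = re.fullmatch(r"([XY])(\d{2})(?:\.?\d+)?", code.strip().upper())
    if not m:
        return None
    letter, n = m.group(1), int(m.group(2))
    if letter == "X" and 60 <= n <= 84:
        return "self-harm"
    if letter == "Y" and 10 <= n <= 34:
        return "undetermined"
    return None


# Main analyses count both labels as probable events; the definite-only
# sensitivity analyses would keep "self-harm" alone.
codes = ["E950.3", "E985.0", "E880.9"]
print([classify_icd9_ecode(c) for c in codes])  # ['self-harm', 'undetermined', None]
```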
All predictor and outcome variables were completely specified and calculated prior to model training.
Prediction models were developed separately for mental health specialty and primary care visits, with a random sample of 65% of each used for model training and 35% set aside for validation. Models included multiple visits per person in order to accurately represent changes in risk within patients over time. For each visit, analyses considered any outcome in the following 90 days, regardless of whether a subsequent visit occurred in between. This approach uses all data available at the time of the index visit but avoids informative or biased censoring related to the timing of visits following the index date. In the initial variable selection step, separate models predicting risk of suicide attempt and suicide death were estimated using logistic regression with penalized LASSO (least absolute shrinkage and selection operator) variable selection (21). The LASSO penalty shrinks coefficients for weaker predictors toward zero, and predictors whose estimated coefficients reach exactly zero are excluded from the final sparse prediction model. To avoid overfitting models to idiosyncratic relationships in the training samples, variable selection used 10-fold cross-validation (22) to select the optimal level of tuning or penalization, measured by the Bayesian information criterion (23). In the second calibration step, generalized estimating equations with a logistic link reestimated coefficients in the training sample, accounting both for clustering of visits within patients and for bias toward the null in LASSO coefficients. In the final validation step, logistic models derived from this two-step process were applied in the 35% validation sample to calculate predicted probabilities for each visit. Results are reported as receiver operating characteristic (ROC) curves (24) with c-statistics (equivalent to area under the ROC curve) (25, 26), along with predicted and observed rates in prespecified strata of predicted probability.
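The two-step logic (LASSO selection, then unpenalized re-estimation of the selected coefficients) can be illustrated with a small pure-Python sketch. It differs from the study in several labeled ways. First, it uses proximal gradient descent (ISTA) rather than the coordinate descent in glmnet. Second, it fixes the penalty `lam` instead of tuning it by cross-validation. Third, it refits with ordinary logistic regression instead of generalized estimating equations, so clustering of visits within patients is ignored (with an independence working correlation, GEE point estimates equal ordinary logistic estimates, and only the standard errors change). All names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    # Proximal operator of t*|x|: shrinks x toward zero, exactly 0 inside [-t, t].
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_logistic(X, y, lam, step=None, iters=3000):
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept is not penalized.
    For 0/1 predictors plus an intercept, the gradient's Lipschitz constant is at
    most (p + 1) / 4, so the default step 4 / (p + 1) guarantees convergence.
    """
    n, p = len(X), len(X[0])
    if step is None:
        step = 4.0 / (p + 1)
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residual = predicted probability minus observed 0/1 outcome.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * grad0  # unpenalized intercept: plain gradient step
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def make_toy_data():
    # Hypothetical balanced data: x1 raises the event rate from 10% to 40%;
    # x2 is balanced across cells and unrelated to the outcome.
    X, y = [], []
    for x1 in (0, 1):
        for x2 in (0, 1):
            events = 4 if x1 else 1
            for k in range(10):
                X.append([x1, x2])
                y.append(1 if k < events else 0)
    return X, y


X, y = make_toy_data()
# Step 1 (selection): lam is fixed here; the study tuned it by cross-validation.
b0, beta = lasso_logistic(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
# Step 2 (re-estimation): unpenalized refit using only the selected predictors.
X_sel = [[row[j] for j in selected] for row in X]
refit_b0, refit_beta = lasso_logistic(X_sel, y, lam=0.0)
print(selected, [round(b, 3) for b in refit_beta])
```

The penalty drives the unrelated predictor exactly to zero and shrinks the informative one. The refit then removes that shrinkage and recovers the log-odds ratio log(0.4/0.6) − log(0.1/0.9) = log 6 ≈ 1.79 implied by the toy event rates, which mirrors the "bias toward the null" correction described above.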
Overfitting was evaluated by comparing classification performance in training and validation samples and by comparing predicted and observed risk in the validation sample. Variable selection analyses were conducted using the glmnet (27) and foreach (28) packages for R, version 3.4.0. Confidence intervals for c-statistics were calculated via bootstrap with 10,000 replications.
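The c-statistic and its bootstrap confidence interval can be computed as follows. This is a minimal sketch using the rank-based (Mann-Whitney) form of the c-statistic and a percentile bootstrap with 10,000 replications by default, as in the text. The resampling unit here (individual visits) and the rule of redrawing resamples that lack either outcome class are assumptions; the text does not say whether resampling was clustered by patient.

```python
import random


def c_statistic(scores, labels):
    """Rank-based c-statistic: probability that a randomly chosen event has a
    higher score than a randomly chosen non-event, with ties counted as 1/2."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one event and one non-event")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(scores, labels, reps=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the c-statistic, resampling visits with
    replacement (an assumption) and redrawing resamples lacking either class."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        if 0 < sum(lab) < n:
            stats.append(c_statistic([scores[i] for i in idx], lab))
    stats.sort()
    return stats[round(alpha / 2 * reps)], stats[round((1 - alpha / 2) * reps) - 1]


print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank-based form runs in O(n log n). That matters at the scale of millions of validation visits, where comparing every event with every non-event would be impractical.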
A public repository (www.github.com/MHResearchNetwork) includes specifications and code for defining predictor and outcome variables, a data dictionary and descriptive statistics for analytic data sets, code for variable selection and calibration steps, coefficients and confidence limits from all final models, and comparison of model performance in training and validation samples.
Results
We identified 19,961,059 eligible visits by 2,960,929 patients during the study period, including 10,275,853 mental health specialty visits and 9,685,206 primary care visits with mental health diagnoses (Table 1). Following the specifications above, health system records identified 24,133 unique probable suicide attempts within 90 days of an eligible visit, and state mortality records identified 1,240 unique suicide deaths within 90 days.
Characteristic | Mental Health Specialty, Training N | Mental Health Specialty, Training % | Mental Health Specialty, Validation N | Mental Health Specialty, Validation % | Primary Care, Training N | Primary Care, Training % | Primary Care, Validation N | Primary Care, Validation % |
---|---|---|---|---|---|---|---|---|
Visits | 6,679,128 | | 3,596,725 | | 6,297,465 | | 3,387,741 | |
Female | 4,157,997 | 62 | 2,239,213 | 62 | 3,872,830 | 61 | 2,083,424 | 61 |
Age group (years) | ||||||||
13–17 | 671,313 | 10 | 360,619 | 10 | 250,878 | 4 | 135,070 | 4 |
18–29 | 1,118,492 | 17 | 603,044 | 17 | 822,668 | 13 | 442,774 | 13 |
30–44 | 1,744,704 | 26 | 939,431 | 26 | 1,337,686 | 21 | 720,878 | 21 |
45–64 | 2,453,509 | 37 | 1,321,986 | 37 | 2,466,992 | 39 | 1,326,237 | 39 |
65 or older | 691,110 | 10 | 371,645 | 10 | 1,419,241 | 23 | 762,782 | 23 |
Race | ||||||||
White | 4,562,203 | 68 | 2,455,211 | 68 | 4,162,033 | 66 | 2,237,952 | 66 |
Asian | 302,231 | 5 | 162,400 | 5 | 379,910 | 6 | 204,272 | 6 |
Black | 600,219 | 9 | 324,233 | 9 | 514,021 | 8 | 276,260 | 8 |
Hawaiian/Pacific Islander | 74,473 | 1 | 40,118 | 1 | 103,420 | 2 | 55,833 | 2 |
Native American | 65,309 | 1 | 35,332 | 1 | 69,425 | 1 | 37,717 | 1 |
More than one or other | 38,223 | 1 | 20,485 | 1 | 43,445 | 1 | 23,391 | 1 |
Not recorded | 1,036,470 | 16 | 558,946 | 16 | 1,025,211 | 16 | 552,316 | 16 |
Hispanic ethnicity | 1,486,400 | 22 | 800,547 | 22 | 1,430,611 | 23 | 769,498 | 23 |
Insurance type | ||||||||
Commercial group | 5,057,328 | 76 | 2,724,286 | 76 | 4,198,138 | 67 | 2,258,974 | 67 |
Individual | 827,218 | 12 | 445,749 | 12 | 1,079,401 | 17 | 580,225 | 17 |
Medicare | 363,598 | 5 | 194,773 | 5 | 576,184 | 9 | 310,001 | 9 |
Medicaid | 213,573 | 3 | 114,767 | 3 | 297,710 | 5 | 160,063 | 5 |
Other | 217,411 | 3 | 117,150 | 3 | 146,032 | 2 | 78,478 | 2 |
Patient Health Questionnaire item 9 score recorded at | ||||||||
Index visit | 657,998 | 10 | 354,918 | 10 | 312,065 | 5 | 168,569 | 5 |
Any visit in past year | 1,328,571 | 20 | 714,693 | 20 | 671,643 | 11 | 362,438 | 11 |
Length of enrollment prior to visit | ||||||||
1 year or more | 5,810,841 | 87 | 3,129,151 | 87 | 5,352,845 | 85 | 2,879,580 | 85 |
5 years or more | 3,772,409 | 56 | 2,031,916 | 56 | 3,542,358 | 56 | 1,907,063 | 56 |
Visits followed by | ||||||||
Suicide attempt within 90 days | 41,470 | 0.62 | 22,329 | 0.62 | 16,302 | 0.26 | 8,688 | 0.26 |
Suicide death within 90 days | 1,529 | 0.02 | 854 | 0.02 | 856 | 0.01 | 445 | 0.01 |
Table 1. Characteristics of Sampled Visits to Specialty Mental Health and Primary Care Providers in Seven Health Systems (2009–2015), Randomly Divided Into Model Training (65%) and Validation (35%) Samples
Models predicting probable suicide attempt over 90 days were developed and validated for both mental health and primary care visits, excluding 0.3% of visits because of disenrollment within 90 days. Clinical variables with the largest positive prediction coefficients are listed in Table 2 (see Appendices 9B and 9C in the online supplement for all selected predictors and coefficients). The strongest predictors of suicide attempt were similar in mental health specialty and primary care patients: prior suicide attempt, mental health and substance use diagnoses, responses to PHQ-9 item 9, and prior inpatient or emergency mental health care.
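Table 2. Clinical Predictors With the Largest Positive Coefficients for Suicide Attempt and Suicide Death, by Care Setting
Mental Health Specialty | Primary Care |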
---|---|
Suicide attempt following: | |
Mental health specialty visit (of 94 predictors selected) | Primary care visit (of 102 predictors selected) |
Depression diagnosis in past 5 years | Depression diagnosis in past 5 years |
Drug abuse diagnosis in past 5 years | Suicide attempt diagnosis in past 5 years |
PHQ-9 item 9 score=3 in past year | Drug abuse diagnosis in past 5 years |
Alcohol use disorder diagnosis in past 5 years | Alcohol abuse diagnosis in past 5 years |
Mental health inpatient stay in past year | PHQ-9 item 9 score=3 in past year |
Benzodiazepine prescription in past 3 months | Suicide attempt diagnosis in past 3 months |
Suicide attempt in past 3 months | Suicide attempt diagnosis in past year |
Personality disorder diagnosis in past 5 years | Personality disorder diagnosis in past 5 years |
Eating disorder diagnosis in past 5 years | Anxiety disorder diagnosis in past 5 years |
Suicide attempt in past year | Suicide attempt diagnosis in past 5 years with schizophrenia diagnosis in past 5 years |
Mental health emergency department visit in past 3 months | Benzodiazepine prescription in past 3 months |
Self-inflicted cutting/piercing in past year | Eating disorder diagnosis in past 5 years |
Suicide attempt in past 5 years | Mental health emergency department visit in past 3 months |
Injury/poisoning diagnosis in past 3 months | Injury/poisoning diagnosis in past year |
Antidepressant prescription in past 3 months | Mental health emergency department visit in past year |
Suicide death following: | |
Mental health specialty visit (of 43 predictors selected) | Primary care visit (of 29 predictors selected) |
Suicide attempt diagnosis in past year | Mental health emergency department visit in past 3 months |
Benzodiazepine prescription in past 3 months | Alcohol abuse diagnosis in past 5 years |
Mental health emergency department visit in past 3 months | Benzodiazepine prescription in past 3 months |
Second-generation antipsychotic prescription in past 5 years | Depression diagnosis in past 5 years |
Mental health inpatient stay in past 5 years | Mental health inpatient stay in past year |
Mental health inpatient stay in past 3 months | Injury/poisoning diagnosis in past year |
Mental health inpatient stay in past year | Anxiety disorder diagnosis in past 5 years |
Alcohol use disorder diagnosis in past 5 years | PHQ-9 item 9 score=1 with PHQ-8 score |
Antidepressant prescription in past 3 months | PHQ-9 item 9 score=3 with age |
PHQ-9 item 9 score=3 with PHQ-8 score | Suicide attempt diagnosis in past 5 years with age |
PHQ-9 item 9 score=1 with age | Mental health emergency department visit in past year |
Depression diagnosis in past 5 years with age | PHQ-9 item 9 score=2 with age |
Suicide attempt diagnosis in past 5 years with Charlson score | PHQ-9 item 9 score=3 with PHQ-8 score |
PHQ-9 item 9 score=2 with age | Bipolar disorder diagnosis in past 5 years with age |
Anxiety disorder diagnosis in past 5 years with age | Depression diagnosis in past 5 years with age |
The left portion of Figure 1 presents ROC curves illustrating the sensitivity and specificity of suicide attempt predictions in the training and validation samples. The c-statistics (equivalent to area under the ROC curve) for prediction of suicide attempt in the validation samples were 0.851 (95% CI=0.848, 0.853) for mental health specialty visits and 0.853 (95% CI=0.849, 0.857) for primary care visits. In each graph, comparison of ROC curves shows no appreciable difference in prediction accuracy between the training and validation samples (i.e., no evidence of model overfitting). Table 3 compares predicted and observed risk for specific strata selected a priori. Among mental health specialty visits, the lowest two strata included 75% of all visits and 21% of all suicide attempts, and the highest three strata included 5% of visits and 43% of suicide attempts. Among primary care visits, the 75% of visits with the lowest risk scores accounted for 21% of suicide attempts, and the 5% of visits with the highest scores accounted for 48%. Comparison of predicted risk levels in the training sample and observed risk levels in the validation sample again shows no appreciable decline in model performance or evidence of model overfitting. Sensitivity analyses limited to diagnoses of definite self-harm slightly improved prediction accuracy (especially among primary care patients) but excluded approximately 25% of probable suicide attempts (see Appendix 4 in the online supplement). Sensitivity analyses limited to visits preceded by at least 5 years of complete data yielded essentially identical prediction accuracy (see Appendix 5 in the online supplement). Model fit was consistent across the seven participating health systems and across age and sex subgroups (see Appendix 8 in the online supplement).
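Table 3. Predicted and Observed Risk of Suicide Attempt and Suicide Death Within 90 Days, by Risk Score Stratum
Risk Score Percentile Stratum | Predicted Risk (%) | Observed Risk (%) | % of All Events | Standardized Event Ratio |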
---|---|---|---|---|
Suicide attempts | ||||
Following a mental health specialty visit | ||||
>99.5th | 13.0 | 12.7 | 10 | 20.7 |
99th to 99.5th | 8.5 | 8.1 | 6 | 12.9 |
95th to 99th | 4.1 | 4.2 | 27 | 6.7 |
90th to 95th | 1.9 | 1.8 | 15 | 3.0 |
75th to 90th | 0.9 | 0.9 | 21 | 1.4 |
50th to 75th | 0.3 | 0.3 | 13 | 0.51 |
<50th | 0.1 | 0.1 | 8 | 0.16 |
Following a primary care visit with a mental health diagnosis | ||||
>99.5th | 8.6 | 8.0 | 15 | 30.5 |
99th to 99.5th | 4.1 | 4.2 | 8 | 16.3 |
95th to 99th | 1.6 | 1.6 | 25 | 6.2 |
90th to 95th | 0.7 | 0.7 | 13 | 2.6 |
75th to 90th | 0.3 | 0.3 | 18 | 1.2 |
50th to 75th | 0.1 | 0.1 | 12 | 0.49 |
<50th | 0.04 | 0.04 | 9 | 0.17 |
Suicide deaths | ||||
Following a mental health specialty visit | ||||
>99.5th | 0.654 | 0.694 | 12 | 24.6 |
99th to 99.5th | 0.638 | 0.595 | 11 | 21.5 |
95th to 99th | 0.162 | 0.167 | 25 | 6.3 |
90th to 95th | 0.068 | 0.088 | 16 | 2.3 |
75th to 90th | 0.031 | 0.029 | 16 | 1.1 |
50th to 75th | 0.014 | 0.015 | 13 | 0.54 |
<50th | 0.003 | 0.003 | 6 | 0.12 |
Following a primary care visit with a mental health diagnosis | ||||
>99.5th | 0.536 | 0.435 | 14 | 28.8 |
99th to 99.5th | 0.181 | 0.197 | 7 | 13.0 |
95th to 99th | 0.092 | 0.083 | 22 | 5.6 |
90th to 95th | 0.035 | 0.038 | 13 | 2.5 |
75th to 90th | 0.018 | 0.019 | 19 | 1.3 |
50th to 75th | 0.009 | 0.009 | 15 | 0.62 |
<50th | 0.003 | 0.003 | 10 | 0.19 |
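The table's footnotes defining the standardized event ratio are not reproduced here. The reported values appear consistent with reading it as the share of events divided by the share of visits in each stratum, which is equivalent to stratum risk relative to overall risk. The sketch below applies that inferred reading to the mental health specialty suicide attempt rows. It also recomputes the cumulative shares quoted in the text and the ratio of observed risk in the top 1% to the bottom half. Small discrepancies are expected because the published percentages are rounded.

```python
# Values transcribed from Table 3: suicide attempts after mental health specialty
# visits, strata ordered from >99.5th percentile down to <50th percentile.
visit_share = [0.5, 0.5, 4.0, 5.0, 15.0, 25.0, 50.0]   # % of visits implied by percentile bounds
event_share = [10, 6, 27, 15, 21, 13, 8]               # % of all attempts
observed_risk = [12.7, 8.1, 4.2, 1.8, 0.9, 0.3, 0.1]   # observed 90-day risk, %
reported_ser = [20.7, 12.9, 6.7, 3.0, 1.4, 0.51, 0.16]

# Inferred reading: standardized event ratio = share of events / share of visits.
ser = [e / v for e, v in zip(event_share, visit_share)]

top5_share = sum(event_share[:3])        # attempts following the top 5% of visits
bottom75_share = sum(event_share[-2:])   # attempts following the bottom 75% of visits

# Visit-weighted observed risk in the top 1% (two 0.5% strata) vs. the bottom half.
top1_risk = sum(r * v for r, v in zip(observed_risk[:2], visit_share[:2])) / sum(visit_share[:2])
ratio_top1_vs_bottom_half = top1_risk / observed_risk[-1]

print([round(s, 2) for s in ser])
print(top5_share, bottom75_share, round(ratio_top1_vs_bottom_half))
```

The recomputed ratios track the reported column to within rounding error. The top 5% of visits account for 43% of attempts, and observed risk in the top 1% is roughly 100 times that in the bottom half.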
The same process was implemented for prediction of suicide deaths over 90 days, with separate models for mental health specialty and primary care visits. The clinical variables most strongly associated with suicide death in each group are listed in Table 2 (see Appendices 9D and 9E in the online supplement for a complete list). Predictors of suicide death were similar in mental health specialty and primary care patients, and were similar to predictors of suicide attempt.
The right portion of Figure 1 presents ROC curves for prediction of suicide death in the training and validation samples. The c-statistics for prediction of suicide death in the validation samples were 0.861 (95% CI=0.848, 0.875) for mental health specialty visits and 0.833 (95% CI=0.813, 0.853) for primary care visits. Comparison of ROC curves for the training and validation samples shows no evidence of overfitting in the mental health specialty sample and a minimal separation of training and validation curves in the primary care sample. Table 3 compares predicted and observed risk for risk strata selected a priori. Among mental health specialty visits, the lowest two risk strata included 75% of visits and 19% of suicide deaths, and the highest three risk strata included 5% of visits and 48% of suicide deaths. Among primary care visits, the 75% of visits with the lowest risk scores accounted for 25% of suicide deaths, and the 5% of visits with the highest scores accounted for 43%. Comparison of predicted risk levels in the training sample and observed risk levels in the validation sample shows no evidence of overfitting in the mental health specialty sample and a minimal falloff between the training and validation samples in the primary care sample. Sensitivity analyses limited to deaths coded as due to definite self-inflicted injury or poisoning found no meaningful difference in model fit (see Appendix 4 in the online supplement).
Table 4 lists sensitivity, specificity, positive predictive value, and negative predictive value for all four models at cut-points defined by percentiles of the risk score distribution.
Risk Score Percentile Cut-Points | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
---|---|---|---|---|
Suicide attempts | ||||
Following mental health specialty visits | ||||
>99th | 16.8 | 99.1 | 10.4 | 99.4 |
>95th | 43.7 | 95.2 | 5.4 | 99.6 |
>90th | 58.3 | 90.3 | 3.6 | 99.7 |
>75th | 79.2 | 75.2 | 2.0 | 99.8 |
>50th | 92.1 | 50.0 | 1.1 | 99.9 |
Following primary care visits with a mental health diagnosis | ||||
>99th | 23.5 | 99.1 | 6.1 | 99.8 |
>95th | 48.2 | 95.1 | 2.5 | 99.9 |
>90th | 61.0 | 90.1 | 1.6 | 99.9 |
>75th | 79.1 | 75.1 | 0.8 | 99.9 |
>50th | 91.4 | 50.1 | 0.5 | 99.9 |
Suicide deaths | ||||
Following mental health specialty visits | ||||
>99th | 23.1 | 99.0 | 0.62 | 99.9 |
>95th | 48.1 | 95.0 | 0.26 | 99.9 |
>90th | 64.3 | 90.0 | 0.17 | 99.9 |
>75th | 80.4 | 75.1 | 0.08 | 99.9 |
>50th | 94.0 | 50.0 | 0.05 | 99.9 |
Following primary care visits with a mental health diagnosis | ||||
>99th | 20.9 | 99.0 | 0.31 | 99.9 |
>95th | 43.1 | 95.0 | 0.13 | 99.9 |
>90th | 55.7 | 90.0 | 0.08 | 99.9 |
>75th | 74.8 | 75.1 | 0.05 | 99.9 |
>50th | 90.3 | 50.0 | 0.03 | 99.9 |
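The classification metrics in this table follow from flagging the visits above a percentile cut-point and tabulating a 2×2 table. The sketch below shows that calculation. The helper name is hypothetical, ties in the score are broken by input order (an assumption), and the function assumes at least one visit falls on each side of the cut-point. The sketch also cross-checks Table 4 against Table 3: the PPV for the top 5% should equal the visit-weighted mean observed risk in the three highest strata.

```python
def metrics_at_percentile(scores, labels, pct):
    """Flag visits above the pct-th percentile of risk score (the top
    (100 - pct)% by rank) and return sensitivity, specificity, PPV, and NPV
    as proportions. Ties are broken by input order (sort stability)."""
    n = len(scores)
    k = round(n * (100 - pct) / 100)  # number of visits flagged as high risk
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    tp = sum(1 for i in flagged if labels[i] == 1)
    fp = k - tp
    fn = sum(labels) - tp
    tn = n - k - fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Cross-check against Table 3 (mental health specialty visits, suicide attempts):
# PPV for the top 5% is the visit-weighted mean observed risk of the top three strata.
top_strata = [(0.5, 12.7), (0.5, 8.1), (4.0, 4.2)]  # (% of visits, observed risk %)
ppv_top5 = sum(w * r for w, r in top_strata) / sum(w for w, _ in top_strata)
print(round(ppv_top5, 2))  # 5.44, consistent with the 5.4% reported above
```

The same weighting applied to event shares reproduces the sensitivity column. For example, 10% + 6% + 27% = 43% of attempts fall in the top 5% of mental health specialty visits, compared with 43.7% reported.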
Discussion
In a sample of 20 million visits by 3 million patients in seven health systems, data from electronic health records accurately stratified mental health specialty and primary care visits according to short-term risk of suicide attempt or suicide death. Observed rates of probable suicide attempt and suicide death were more than 100 times as high following visits in the highest 1% of predicted risk as following visits in the bottom half of predicted risk (Table 3). The strongest predictors included mental health diagnoses, substance use diagnoses, use of mental health emergency and inpatient care, and history of self-harm. Absolute risk was lower in primary care, but the predictors selected and the accuracy of prediction were similar across care settings. Responses on the PHQ-9 were selected as important predictors, even though an item 9 score had been recorded in the preceding year for only about 15% of visits.
Potential Limitations
In interpreting these findings, we should consider both false positive and false negative errors in the ascertainment of probable suicide attempts and deaths. Previous research suggests that false positive rates are near zero for suicide deaths diagnosed by medical examiners (20) and below 20% for diagnoses of definite or possible self-inflicted injury in records from these health systems (7) (see also Appendix 2 in the online supplement). Diagnostic data do not distinguish between self-harm with and without intent to die, so our definition of probable suicide attempt may include a small proportion of self-harm episodes without suicidal intent. False negative errors may be more common. Up to one-quarter of suicide deaths may not be identified by medical examiners (19). Health system records will not capture suicide attempts when people do not seek care or when providers do not recognize and record diagnoses of self-harm. Nondifferential error (false positives or false negatives unrelated to predicted risk) would lead to underestimating the accuracy of prediction models (see Appendix 4 in the online supplement). In contrast, differential error (e.g., underascertainment of suicide attempts in patients with low risk scores) could lead to overestimating model performance.
Health system records do not reflect important social risk factors for suicidal behavior, such as job loss, bereavement, and relationship disruption. Suicidal behavior likely reflects the intersection of clinical risk factors, negative life events, and access to means of self-harm. Data on those social risk factors would likely improve the accuracy of prediction.
Our analyses do not consider the one-third to one-half of people who attempt suicide or die by suicide who have no recent mental health treatment or recorded diagnosis (3, 4, 29). Prediction using electronic health record data may also prove useful among patients without recorded mental health diagnoses, but prediction models would necessarily be limited to general medical diagnoses and utilization rather than the mental health diagnoses and treatments selected in this sample.
Methodologic Considerations
We focused on risk over 90 days following an outpatient visit. Risk does vary between visits (30), and near-term risk is most relevant to clinical decisions and quality improvement (31). The interventions that providers or health systems might offer to high-risk patients would typically be delivered over weeks or months (32, 33). Predictors selected in these models (Table 2) include both recent or short-term factors and long-term factors, consistent with previous research (7, 30) indicating that suicidal behavior is influenced by both stable and variable risk factors. Sensitivity analyses using a 30-day outcome window (see Appendix 7 in the online supplement) yielded similar results regarding both predictors selected and accuracy of prediction. Analyses of longer-term risk might identify different predictors of suicidal behavior.
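Visit-level outcome labeling with a configurable window (90 days in the main analyses, 30 days in the sensitivity analysis) can be sketched as follows. The rules here are assumptions for illustration. Events on the visit date itself are not counted. A visit is excluded when enrollment ends before its window closes, which mirrors the disenrollment exclusion described for suicide attempt models. Passing `enrollment_end=None` represents outcomes such as deaths from state mortality records, which remain observable after disenrollment. The function name is hypothetical.

```python
from datetime import date, timedelta


def label_visit(visit_date, event_dates, enrollment_end=None, window_days=90):
    """Return 1 if any outcome event falls in (visit_date, visit_date + window_days],
    0 if none does, or None if the visit is excluded because enrollment ended
    before the window closed (assumed rule). Subsequent visits do not truncate
    the window; each visit is labeled independently."""
    window_end = visit_date + timedelta(days=window_days)
    if enrollment_end is not None and enrollment_end < window_end:
        return None
    return int(any(visit_date < d <= window_end for d in event_dates))


# Two visits 30 days apart and a single hypothetical attempt 50 days after the first.
visits = [date(2014, 1, 1), date(2014, 1, 31)]
attempts = [date(2014, 2, 20)]
enrolled_until = date(2015, 1, 1)
print([label_visit(v, attempts, enrolled_until) for v in visits])                  # [1, 1]
print([label_visit(v, attempts, enrolled_until, window_days=30) for v in visits])  # [0, 1]
```

Because each visit is labeled independently, one attempt can label several preceding visits. This explains why the visit-level counts in Table 1 (for example, 41,470 training-sample mental health specialty visits followed by an attempt) exceed the 24,133 unique attempts.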
Among predictive modeling methods, parametric approaches such as LASSO-penalized logistic regression are closest to traditional regression. Nonparametric methods (34) such as random forests could theoretically improve accuracy of prediction. Direct comparisons to date (12, 35), however, have found equal or superior prediction using parametric methods similar to those used here. Nonparametric methods may offer little advantage when predictors are dichotomous, such as the diagnosis and utilization indicators included in our models. Parametric models are usually more transparent to clinicians (36) and simpler to implement in electronic health records, as is now under way in these health systems and the Veterans Health Administration (35).
Variable selection models are subject to overfitting, that is, selection of predictive relationships idiosyncratic to a specific sample. The large sample used for training these models offers some protection against overfitting. In addition, we present explicit comparisons of performance in the training and randomly selected validation samples for all four models (see Table 3 and Figure 1). These comparisons show no indication of overfitting in prediction of suicide attempts or in prediction of suicide deaths following mental health specialty visits. We do find a slight indication of overfitting in prediction of suicide deaths following primary care visits, likely reflecting the smaller number of events included in these models. Nevertheless, the c-statistic in the held-out validation sample still exceeds 0.80.
In addition to evaluating overfitting within this sample, we should consider generalizability to other care settings or patient populations. This sample included almost 20 million visits in seven health systems serving patients in nine states, including states with high and low rates of suicide mortality. Patients were broadly representative of those service areas in race/ethnicity, socioeconomic status, and source of insurance coverage, including substantial numbers insured by Medicare and Medicaid. Our methods could readily be transported to health systems with standard electronic health records and insurance claims databases. Predicted risk levels, however, could be over- or underestimated in settings with higher or lower average risk of suicidal behavior. The predictors selected and the accuracy of prediction could differ in settings with different patterns of mental health care, especially if patterns of diagnosis or utilization were less closely linked to risk of suicidal behavior. Implementation of effective suicide prevention programs could also weaken the relationship between the identified risk predictors and subsequent suicidal behavior. We therefore recommend replication in other health systems before broad application. All information necessary for replication is available via our online repository.
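A different average risk could shift predicted risk levels without necessarily changing the ranking of visits. Suppose the model's odds ratios carried over unchanged and only baseline risk differed, which is a strong simplifying assumption. A crude recalibration would then shift every prediction by the difference in log-odds between the two settings' base rates. This is only a rough approximation. The Python sketch below is hypothetical; the function name and base rates are illustrative, not values from this study.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def shift_to_new_base_rate(p, old_rate, new_rate):
    """Crude recalibration: add the same log-odds shift to every predicted risk p.

    Because every visit gets the same shift, the ordering of visits
    (and therefore the c-statistic) is unchanged; only risk levels move.
    """
    return expit(logit(p) + logit(new_rate) - logit(old_rate))

# Illustrative base rates only
same = shift_to_new_base_rate(0.054, old_rate=0.01, new_rate=0.01)
higher = shift_to_new_base_rate(0.054, old_rate=0.01, new_rate=0.02)
lower = shift_to_new_base_rate(0.054, old_rate=0.01, new_rate=0.005)
```

Local validation data would still be needed to check whether a shift of this kind is adequate or whether the predictors themselves behave differently in the new setting.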
Context
These empirically derived risk scores outperformed risk stratification based solely on item 9 of the PHQ-9. Regarding sensitivity, selecting mental health visits with any positive response to item 9 would identify only two-thirds of subsequent suicide attempts and deaths (7), whereas selecting visits with risk scores above the 75th percentile would identify 80%. Regarding efficient identification of high risk, selecting the 6% of visits with a response of “more than half the days” or “nearly every day” would identify one-third of subsequent suicide attempts and deaths (7), whereas selecting the 5% of visits with the highest risk scores would identify almost half.
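The sensitivity comparison in this paragraph reduces to one quantity: the share of visits followed by a suicide attempt or death that a rule flags. The Python sketch below computes it for two kinds of rule. The first is a binary rule, standing in for a positive PHQ-9 item 9 response. The second is a percentile cut-point on a risk score. The toy scores, outcomes, and flags are made up for illustration, and ties at the cut-point are broken arbitrarily, which is an assumption.

```python
def top_fraction_flags(scores, fraction):
    """Flag visits whose scores fall in the top `fraction` (0.25 = above the 75th percentile)."""
    n_flag = round(len(scores) * fraction)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flags = [0] * len(scores)
    for i in order[:n_flag]:
        flags[i] = 1
    return flags

def share_of_events_captured(flags, outcomes):
    """Sensitivity: fraction of visits followed by an event that the rule flagged."""
    flagged_among_events = [f for f, o in zip(flags, outcomes) if o]
    return sum(flagged_among_events) / len(flagged_among_events)

# Toy data: 20 visits with scores 0..19; events follow the visits scored 19, 18, 15, and 3
scores = list(range(20))
outcomes = [1 if s in (19, 18, 15, 3) else 0 for s in scores]
item9_flags = [1 if s in (19, 3) else 0 for s in scores]  # hypothetical binary screen

sens_item9 = share_of_events_captured(item9_flags, outcomes)
sens_top25 = share_of_events_captured(top_fraction_flags(scores, 0.25), outcomes)
sens_top5 = share_of_events_captured(top_fraction_flags(scores, 0.05), outcomes)
```

A score cut-point can be moved to trade the number of flagged visits against the share of events captured, whereas a single binary screen offers only one operating point.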
Predictors identified in these models included a range of demographic characteristics, mental health diagnoses, and historical indicators of mental health treatment, generally similar to those identified in previous research (9, 12, 13). Based on results in validation samples, the performance of these prediction models equaled or exceeded that of other published models using health records to predict suicidal behavior (8–13), in which c-statistics ranged from 0.67 to 0.84. These models substantially outperformed other published models predicting suicidal behavior after an outpatient visit, a question of high interest to a wide range of mental health and primary care providers. In this sample, mental health specialty visits with risk scores in the top 5% accounted for 43% of suicide attempts and 48% of suicide deaths in the following 90 days. Primary care visits in the top 5% accounted for 48% of subsequent suicide attempts and 43% of subsequent suicide deaths. For comparison, in two previous models predicting suicidal behavior following outpatient visits (12, 13), the top 5% of patients accounted for between one-quarter and one-third of subsequent suicide attempts and deaths. This improved prediction likely reflects differences in data and methods. First, longitudinal records in integrated health systems may allow more complete ascertainment of risk factors. Second, our analyses consider a larger number of potential predictors and more detailed temporal encoding. Third, responses to PHQ-9 item 9 contributed to prediction, even though such data were available for only 10%–20% of visits. Prediction accuracy would likely improve with greater use of the PHQ-9 or similar measures, which new initiatives promoting routine outcome assessment (37) and identification of suicidal ideation (5) are expected to encourage.
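"More detailed temporal encoding" means that the same underlying fact, such as a depression diagnosis, can enter the model as several indicators defined over different look-back windows before the visit. The Python sketch below is hypothetical. The window lengths (90 days, 1 year, 5 years) and the feature-naming scheme are assumptions chosen only to stay within the 5-year look-back described in the Method. They are not the study's actual variable definitions.

```python
from datetime import date, timedelta

# Hypothetical look-back windows; the study's actual windows may differ
LOOKBACK_DAYS = {"90d": 90, "1y": 365, "5y": 5 * 365}

def encode_history(visit_date, event_dates, prefix):
    """Binary indicators for whether an event (e.g., a diagnosis) was recorded in each window before the visit.

    Assumption: each window is [visit_date - days, visit_date), so same-day records are excluded.
    """
    features = {}
    for label, days in LOOKBACK_DAYS.items():
        start = visit_date - timedelta(days=days)
        features[f"{prefix}_{label}"] = int(any(start <= d < visit_date for d in event_dates))
    return features

recent = encode_history(date(2015, 1, 1), [date(2014, 12, 1)], "depression_dx")
remote = encode_history(date(2015, 1, 1), [date(2013, 1, 1)], "depression_dx")
```

Nested windows like these let a penalized model weight a recent diagnosis differently from an old one. Here, a diagnosis two years before the visit sets only the 5-year indicator.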
The c-statistics for these suicide prediction models also exceed those for models using health record data to predict rehospitalization for heart failure (38), in-hospital mortality from sepsis (39), and high emergency department utilization (40). Suicidal behavior may be more predictable than many adverse medical outcomes.
Among mental health specialty visits, a cut-point at the 95th percentile of risk had a positive predictive value of 5.4% for suicide attempt within 90 days. While that predictive value would be inadequate for a diagnostic test, it is similar to or better than that of widely accepted tools for predicting major medical outcomes, such as stroke in atrial fibrillation (41) and cardiovascular events (42). Furthermore, the predictive values or expected event rates of widely accepted medical prediction tools often reflect adverse outcomes accumulated over many years (41, 42), rather than over the 90-day risk period considered in these analyses.
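The positive predictive value at a cut-point is the fraction of flagged visits that are followed by the outcome. Its reciprocal gives the number of flagged visits per subsequent event, which is a convenient way to think about workload. The short Python sketch below uses toy flags and outcomes. The only study figure it uses is the 5.4% positive predictive value quoted above.

```python
def positive_predictive_value(flags, outcomes):
    """Fraction of flagged visits that were followed by the outcome."""
    outcomes_if_flagged = [o for f, o in zip(flags, outcomes) if f]
    return sum(outcomes_if_flagged) / len(outcomes_if_flagged)

def flagged_visits_per_event(ppv):
    """Number of flagged visits expected per subsequent event."""
    return 1.0 / ppv

# Toy data, not study data: 4 flagged visits, 1 of them followed by an event
toy_ppv = positive_predictive_value([1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 1, 0])

# Using the 5.4% PPV for suicide attempt among mental health specialty visits
per_event = flagged_visits_per_event(0.054)
```

In other words, a 5.4% positive predictive value corresponds to roughly one suicide attempt within 90 days for every 18 to 19 flagged visits.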
Clinical Implications
Some recent discussions of predictive modeling in health care warn that reliance on algorithms could lead to inappropriate causal inference (43–45) or atrophy of clinician judgment (43). Regarding the first point, associations identified by our model should certainly not be interpreted as evidence for independent or causal relationships. For example, a recent benzodiazepine prescription is more likely a marker of increased risk than a cause of suicidal behavior. We report predictors selected (Table 2) to demonstrate that all are expected correlates of suicidal behavior, albeit in specific combinations within specific time periods. Regarding the second point, our model and other models predicting suicidal behavior from records data rely largely on the diagnostic and treatment decisions of treating clinicians. The predictors identified by our analyses would be well known to most mental health providers. Predictive models simply allow us to combine, in a consistent way, the individual judgments that many providers record across millions of visits in order to accurately predict an important but rare event (45).
Prediction models cannot replace clinical judgment, but risk scores can certainly inform both individual clinical decisions and quality improvement programs. Participating health systems now recommend completion of a structured suicide risk assessment (46) after any response of “more than half the days” or “nearly every day” to PHQ-9 item 9, responses that imply a 90-day risk of suicide attempt of 2%–3% (7). A predicted 90-day risk exceeding 5% (i.e., above the 95th percentile for mental health specialty visits) would seem to warrant a similar level of additional assessment. A predicted 90-day suicide attempt risk exceeding 10% (i.e., above the 99th percentile for mental health specialty visits) should warrant creation of a personal safety plan and counseling regarding reducing access to means of self-harm (47, 48). Accurate risk stratification can also inform providers’ and health systems’ decisions regarding frequency of follow-up, referral for intensive treatment, or outreach following missed or canceled appointments (31, 49). Implementing these risk-based care pathways and outreach programs is a central goal of the Zero Suicide prevention model recommended by the U.S. National Action Alliance for Suicide Prevention (48). Empirically derived risk predictions can be an important component of that national suicide prevention strategy.
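The thresholds in this paragraph can be written down as a simple decision rule. The Python sketch below is hypothetical and encodes only the cut-points stated here. PHQ-9 item 9 is scored 0 to 3, where 2 is "more than half the days" and 3 is "nearly every day". Predicted 90-day attempt risk is compared against 5% and 10%. The function and action names are illustrative rather than any health system's actual protocol.

```python
def recommended_actions(predicted_risk_90d, phq9_item9=None):
    """Map a predicted 90-day suicide attempt risk and an optional PHQ-9 item 9 score to follow-up actions.

    phq9_item9: 0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day.
    """
    actions = []
    item9_high = phq9_item9 is not None and phq9_item9 >= 2
    # Risk above 5% (or a high item 9 response) prompts structured assessment
    if predicted_risk_90d > 0.05 or item9_high:
        actions.append("structured suicide risk assessment")
    # Risk above 10% additionally prompts safety planning and means counseling
    if predicted_risk_90d > 0.10:
        actions.append("personal safety plan")
        actions.append("counseling on access to means of self-harm")
    return actions

low = recommended_actions(0.02)
low_with_item9 = recommended_actions(0.02, phq9_item9=2)
very_high = recommended_actions(0.12)
```

The 5% and 10% cut-points correspond to the 95th and 99th percentiles of mental health specialty visits in these health systems, so they may not carry over unchanged to settings with a different risk distribution.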
1. NCHS Data Brief: Mortality in the United States, 2016. Hyattsville, Md, National Center for Health Statistics, 2017
2. Centers for Disease Control and Prevention: Web-Based Injury Statistics Query and Reporting System (WISQARS), Nonfatal Injury Reports, 2000–2014. https://webappa.cdc.gov/sasweb/ncipc/nfirates.html
3. Health care contacts in the year before suicide death. J Gen Intern Med 2014; 29:870–877
4. Racial/ethnic differences in health care visits made before suicide attempt across the United States. Med Care 2015; 53:430–435
5. Patient Safety Advisory Group: Detecting and treating suicidal ideation in all settings. Chicago, Joint Commission Sentinel Event Alerts, 2016, issue 56 (https://www.jointcommission.org/assets/1/18/SEA_56_Suicide.pdf)
6. Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull 2017; 143:187–232
7. Risk of suicide attempt and suicide death following completion of the Patient Health Questionnaire depression module in community practice. J Clin Psychiatry 2016; 77:221–227
8. Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US Department of Veterans Affairs. Am J Public Health 2015; 105:1935–1942
9. Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry 2015; 72:49–57
10. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5:457–469
11. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73:1064–1071
12. Predicting suicides after outpatient mental health visits in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Mol Psychiatry 2017; 22:544–551
13. Predicting suicidal behavior from longitudinal electronic health records. Am J Psychiatry 2017; 174:154–162
14. The HMO Research Network Virtual Data Warehouse: a public data model to support collaboration. EGEMS (Wash DC) 2014; 2:1049
15. Validation of a combined comorbidity index. J Clin Epidemiol 1994; 47:1245–1251
16. The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. Gen Hosp Psychiatry 2010; 32:345–359
17. How complete are E-codes in commercial plan claims databases? Pharmacoepidemiol Drug Saf 2014; 23:218–220
18. Changes in coding of suicide attempts or self-harm with transition from ICD-9 to ICD-10. Psychiatr Serv 2017; 68:215
19. The accuracy of suicide statistics: are true suicide deaths misclassified? Soc Psychiatry Psychiatr Epidemiol 2016; 51:115–123
20. An examination of potential misclassification of army suicides: results from the Army Study to Assess Risk and Resilience in Servicemembers. Suicide Life Threat Behav 2017; 47:257–265
21. Regression shrinkage and selection via the lasso. J R Stat Soc B 1996; 58:267–288
22. The Elements of Statistical Learning, 2nd ed. New York, Springer, 2009
23. Bayes factors. J Am Stat Assoc 1995; 90:773–795
24. Signal Detection Theory and ROC Analysis. New York, Springer Academic Press, 1975
25. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36
26. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 1997; 30:1145–1159
27. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010; 33:1–22
28. Weston S: Foreach looping construct for R, R Package, version 1.4.3, 2015
29. Mental health treatment patterns among adults with recent suicide attempts in the United States. Am J Public Health 2014; 104:2359–2368
30. Between-visit changes in suicidal ideation and risk of subsequent suicide attempt. Depress Anxiety 2017; 34:794–800
31. Focusing suicide prevention on periods of high risk. JAMA 2014; 311:1107–1108
32. Cognitive therapy for the prevention of suicide attempts: a randomized controlled trial. JAMA 2005; 294:563–570
33. Psychosocial treatments of suicidal behaviors: a practice-friendly review. J Clin Psychol 2006; 62:161–170
34. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002; 35:352–359
35. Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res 2017; 26:26
36. Machine learning and electronic health records: a paradigm shift. Am J Psychiatry 2017; 174:93–94
37. HEDIS Depression Measures Specified for Electronic Clinical Data Systems. http://www.ncqa.org/HEDISQualityMeasurement/HEDISLearningCollaborative/HEDISDepressionMeasures.aspx
38. Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol 2017; 2:204–209
39. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data–driven, machine learning approach. Acad Emerg Med 2016; 23:269–278
40. Using the electronic medical record to identify patients at high risk for frequent emergency department visits and high system costs. Am J Med 2017; 130:601.e17–601.e22
41. Can we predict stroke in atrial fibrillation? Clin Cardiol 2012; 35(suppl 1):21–27
42. Accuracy of the atherosclerotic cardiovascular risk equation in a large contemporary, multiethnic population. J Am Coll Cardiol 2016; 67:2118–2130
43. Unintended consequences of machine learning in medicine. JAMA 2017; 318:517–518
44. Machine learning and prediction in medicine: beyond the peak of inflated expectations. N Engl J Med 2017; 376:2507–2509
45. Predicting the future: big data, machine learning, and clinical medicine. N Engl J Med 2016; 375:1216–1219
46. The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry 2011; 168:1266–1277
47. Facilitating action for suicide prevention by learning health care systems. Psychiatr Serv 2016; 67:830–832
48. Suicide prevention: an emerging priority for health care. Health Aff (Millwood) 2016; 35:1084–1090
49. Suicide prevention in an emergency department population: the ED-SAFE study. JAMA Psychiatry 2017; 74:563–570