
Abstract

Objective:

The authors sought to develop and validate models using electronic health records to predict suicide attempt and suicide death following an outpatient visit.

Method:

Across seven health systems, 2,960,929 patients age 13 or older (mean age, 46 years; 62% female) made 10,275,853 specialty mental health visits and 9,685,206 primary care visits with mental health diagnoses between Jan. 1, 2009, and June 30, 2015. Health system records and state death certificate data identified suicide attempts (N=24,133) and suicide deaths (N=1,240) over 90 days following each visit. Potential predictors included 313 demographic and clinical characteristics extracted from records for up to 5 years before each visit: prior suicide attempts, mental health and substance use diagnoses, medical diagnoses, psychiatric medications dispensed, inpatient or emergency department care, and routinely administered depression questionnaires. Logistic regression models predicting suicide attempt and death were developed using penalized LASSO (least absolute shrinkage and selection operator) variable selection in a random sample of 65% of the visits and validated in the remaining 35%.

Results:

Mental health specialty visits with risk scores in the top 5% accounted for 43% of subsequent suicide attempts and 48% of suicide deaths. Of patients scoring in the top 5%, 5.4% attempted suicide and 0.26% died by suicide within 90 days. C-statistics (equivalent to area under the curve) for prediction of suicide attempt and suicide death were 0.851 (95% CI=0.848, 0.853) and 0.861 (95% CI=0.848, 0.875), respectively. Primary care visits with scores in the top 5% accounted for 48% of subsequent suicide attempts and 43% of suicide deaths. C-statistics for prediction of suicide attempt and suicide death were 0.853 (95% CI=0.849, 0.857) and 0.833 (95% CI=0.813, 0.853), respectively.

Conclusions:

Prediction models incorporating both health record data and responses to self-report questionnaires substantially outperform existing suicide risk prediction tools.

Suicide accounted for almost 45,000 deaths in the United States in 2016, a 25% increase since 2000 (1). Nonfatal suicide attempts account for almost 500,000 emergency department visits annually (2). Half of people who die by suicide and two-thirds of people who survive suicide attempts received some mental health diagnosis or treatment during the previous year (3, 4). Mindful of those prevention opportunities, a Joint Commission Sentinel Event Alert issued in 2016 recommends detection of suicide risk across health care (5). Unfortunately, traditional clinical detection of suicide risk is hardly better than chance (6).

We previously reported (7) that brief depression questionnaires can accurately predict suicide attempt or death. Outpatients who report having thoughts of death or self-harm “nearly every day” on item 9 of the Patient Health Questionnaire (PHQ-9) are seven times as likely to attempt suicide and six times as likely to die by suicide over the following 90 days compared with patients who report having such thoughts “not at all” (7). The sensitivity of this tool, however, is only moderate. One-third of suicide attempts and deaths occur among patients reporting having no suicidal ideation at all. Accurate identification of high risk is also only moderate. The 6% of patients who report suicidal ideation “more than half the days” or “nearly every day” account for only 35% of suicide attempts and deaths. More accurate tools are needed for identifying both low- and high-risk patients.

Recent research has used various modeling methods to predict suicidal behavior from electronic health records. Examples include prediction of suicide death among Veterans Health Administration service users (8), prediction of suicide death following psychiatric hospitalization among U.S. Army soldiers (9), distinguishing patients attempting suicide from those with other injuries or poisonings (10), and prediction of suicide or accidental death following civilian general hospital discharge (11). Two recent analyses have used health record data to predict suicide attempt or suicide death following outpatient visits. Kessler and colleagues (12) used health records and military service records to predict suicide death among U.S. Army soldiers in the 26 weeks following a mental health visit. Approximately one-quarter of suicide deaths occurred after the 5% of visits rated as highest risk. Barak-Corren and colleagues (13) used health record data to predict suicide attempt or death among outpatients making three or more visits in two large academic health systems. One-third of suicide attempts and deaths occurred in the 5% of patients with highest risk scores.

In this study, we combined data typically available from electronic health records with depression questionnaire data in seven large health systems to develop and validate models predicting suicide attempt and suicide death over the 90 days following a mental health or primary care visit.

Method

The seven health systems that participated in this research (HealthPartners; Henry Ford Health System; and the Colorado, Hawaii, Northwest, Southern California, and Washington regions of Kaiser Permanente) serve a combined population of about 8 million members in nine states. Each system provides insurance coverage and comprehensive health care (including general medical and specialty mental health care) to a defined population enrolled through employer-sponsored insurance, individual insurance, capitated Medicaid or Medicare, and subsidized low-income programs. Members are representative of each system’s service area in age, race/ethnicity, and socioeconomic status. All systems recommend using the PHQ-9 at mental health visits and primary care visits for depression, but implementation varied across systems during the study period.

As members of the Mental Health Research Network, each health system maintains a research data warehouse following the Health Care Systems Research Network’s Virtual Data Warehouse model (14). This resource combines data from insurance enrollment records, electronic health records, insurance claims, pharmacy dispensings, state mortality records, and census-derived neighborhood characteristics. Responsible institutional review boards for each health system approved use of these de-identified data for this research.

The study sample included any outpatient visit by a member age 13 or older either to a specialty mental health clinic or to a primary care clinic when a mental health diagnosis was recorded. Sampling was limited to visits to health system clinics (to ensure availability of electronic health record data) and people insured by the health system’s insurance plan (to ensure availability of insurance claims data). All qualifying visits from Jan. 1, 2009, through June 30, 2015, were included, except at the Henry Ford Health System, where only visits after implementation of a new electronic health record system on Dec. 1, 2012, were included.
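For readers working with similar data, the following minimal R sketch (using dplyr, which the article does not itself name) illustrates the visit-level sampling rule described above. It is not the study code: the table and column names (visits, visit_diagnoses, dx_group, and so on) are hypothetical stand-ins for Virtual Data Warehouse structures.

library(dplyr)

# Hypothetical visit-level table: one row per outpatient encounter, with assumed
# fields for visit date, patient age, enrollment status, and clinic department.
eligible_visits <- visits %>%
  filter(visit_date >= as.Date("2009-01-01"),
         visit_date <= as.Date("2015-06-30"),
         age_at_visit >= 13,
         enrolled_in_system_plan,                      # insurance claims data available
         department %in% c("mental_health", "primary_care")) %>%
  left_join(visit_diagnoses, by = "visit_id") %>%      # hypothetical diagnosis table
  filter(department == "mental_health" |
           (department == "primary_care" & dx_group == "mental_health")) %>%
  distinct(visit_id, .keep_all = TRUE)
# Site-specific restrictions (e.g., the later start date at Henry Ford) are omitted.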

Potential predictors extracted from health system records for up to 5 years before each visit included demographic characteristics (age, sex, race, ethnicity, source of insurance, and neighborhood income and educational attainment), current and past mental health and substance use diagnoses (organized in 12 categories), past suicide attempts, other past injury or poisoning diagnoses, dispensed prescriptions for mental health medication (organized in four categories), past inpatient or emergency department mental health care, general medical diagnoses (by Charlson Comorbidity Index [15] categories), and recorded scores on the PHQ-9 (16) (including total score and item 9 score).

Potential predictors were represented as dichotomous indicators. Each diagnosis category was represented by three overlapping indicators (recorded at or within 90 days before the visit, recorded within 1 year before, and recorded within 5 years before). Each category of medication or of emergency or inpatient utilization was represented by three overlapping indicators (occurred within 90 days before the visit, 1 year before, or any time before). To represent temporal patterns of prior PHQ-9 item 9 scores, 24 indicators were calculated for each encounter to represent number of observations, maximum value, and modal value (including a category for missing values) during three overlapping time periods (previous 90 days, previous 183 days, and previous 365 days). The final set of potential predictors for each encounter included 149 indicators and 164 possible interactions (see Appendix 9A in the online supplement for a complete list).
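As an illustration of this encoding, the sketch below builds the three overlapping look-back indicators for a single diagnosis category (depression). It assumes hypothetical long-format tables (eligible_visits from the sketch above and dx_history with one row per recorded diagnosis); the actual predictor set repeats this pattern across all diagnosis, medication, and utilization categories.

library(dplyr)

dep_flags <- eligible_visits %>%
  select(visit_id, person_id, visit_date) %>%
  left_join(filter(dx_history, dx_category == "depression"), by = "person_id") %>%
  mutate(days_before = as.numeric(visit_date - dx_date)) %>%
  group_by(visit_id) %>%
  summarise(
    # overlapping windows: at/within 90 days, within 1 year, within 5 years before the visit
    dep_dx_90d = as.integer(any(days_before >= 0 & days_before <= 90,   na.rm = TRUE)),
    dep_dx_1yr = as.integer(any(days_before >= 0 & days_before <= 365,  na.rm = TRUE)),
    dep_dx_5yr = as.integer(any(days_before >= 0 & days_before <= 1825, na.rm = TRUE)))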

Diagnoses of self-harm or probable suicide attempt were ascertained from all injury or poisoning diagnoses recorded in electronic health records and insurance claims accompanied by an ICD-9 cause of injury code indicating intentional self-harm (codes E950–E958) or undetermined intent (codes E980–E989). Data from these health systems during the study period indicate that inclusion of injuries and poisonings with undetermined intent increases ascertainment of probable suicide attempts by approximately 25% (7) (see also Appendix 4 in the online supplement). Although use of E-codes varied across the United States during the study period (17), participating health systems were selected for high and consistent rates of E-code use (see Appendix 1 in the online supplement). Record review (7) also supports the positive predictive value of this definition for identification of true self-harm in these health systems (see also Appendix 2 in the online supplement). Furthermore, observation of coding changes across the transition from ICD-9 to the more specific ICD-10 coding scheme indicates that most “undetermined” ICD-9 diagnoses actually reflect self-harm (18) (see also Appendix 3 in the online supplement). Ascertainment of suicide attempts was censored at health system disenrollment, after which insurance claims data regarding self-harm diagnoses at external facilities would not be available.

Suicide deaths were ascertained from state mortality records. Following common recommendations (19, 20), all deaths with an ICD-10 diagnosis of self-inflicted injury (codes X60–X84) or injury/poisoning with undetermined intent (codes Y10–Y34) were considered probable suicide deaths. Inclusion of injury and poisoning deaths with undetermined intent increases ascertainment of probable suicide deaths by 5%−10% (7) (see also Appendix 4 in the online supplement).
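The two preceding paragraphs reduce to simple code-range filters. The R sketch below shows those ranges; injury_dx and death_records are hypothetical tables rather than the health systems' actual data structures.

library(dplyr)
library(stringr)

# Probable suicide attempts: injury/poisoning diagnoses whose ICD-9 cause-of-injury code
# indicates intentional self-harm (E950-E958) or undetermined intent (E980-E989).
attempt_ecodes <- sprintf("E%d", c(950:958, 980:989))
probable_attempts <- injury_dx %>%
  filter(substr(ecode, 1, 4) %in% attempt_ecodes)

# Probable suicide deaths: underlying cause of death coded as intentional self-harm
# (ICD-10 X60-X84) or undetermined intent (Y10-Y34) in state mortality records.
suicide_death_pattern <- "^(X6[0-9]|X7[0-9]|X8[0-4]|Y1[0-9]|Y2[0-9]|Y3[0-4])"
probable_suicide_deaths <- death_records %>%
  filter(str_detect(cause_of_death_icd10, suicide_death_pattern))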

All predictor and outcome variables were completely specified and calculated prior to model training.

Prediction models were developed separately for mental health specialty and primary care visits, with a random sample of 65% of each used for model training and 35% set aside for validation. Models included multiple visits per person in order to accurately represent changes in risk within patients over time. For each visit, analyses considered any outcome in the following 90 days, regardless of a subsequent visit in between. This approach uses all data available at the time of the index visit but avoids informative or biased censoring related to timing of visits following the index date. In the initial variable selection step, separate models predicting risk of suicide attempt and suicide death were estimated using logistic regression with penalized LASSO (least absolute shrinkage and selection operator) variable selection (21). The LASSO penalization factor selects important predictors by shrinking coefficients for weaker predictors toward zero, excluding predictors with estimated zero coefficients from the final sparse prediction model. To avoid overfitting models to idiosyncratic relationships in the training samples, variable selection used 10-fold cross-validation (22) to select the optimal level of tuning or penalization, measured by the Bayesian information criterion (23). In the second calibration step, generalized estimating equations with a logistic link reestimated coefficients in the training sample, accounting for both clustering of visits within patients and bias toward the null in LASSO coefficients. In the final validation step, logistic models derived from the above two-step process were applied in the 35% validation sample to calculate predicted probabilities for each visit. Results are reported as receiver operating characteristic (ROC) curves (24) with c-statistics (equivalent to area under the ROC curve) (25, 26), along with predicted and observed rates in prespecified strata of predicted probability. Overfitting was evaluated by comparing classification performance in training and validation samples and by comparing predicted risk and observed risk in the validation sample. Variable selection analyses were conducted using the GLMNET (27) and Foreach (28) packages for R, version 3.4.0. Confidence intervals for c-statistics were calculated via bootstrap with 10,000 replications.
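A condensed R sketch of these three steps (selection, calibration, validation) follows, shown for a single outcome and setting. It uses the GLMNET package cited above, but it is only illustrative: the data frame and column names (train, valid, attempt_90d, person_id, predictor_cols) are hypothetical, the GEE step assumes the geepack package (the article does not name one), and the penalty is tuned here by cv.glmnet's default cross-validated deviance rather than the BIC-based selection described in the text.

library(glmnet)
library(geepack)
library(pROC)

# Step 1: LASSO variable selection in the 65% training sample (10-fold cross-validation).
x_train  <- as.matrix(train[, predictor_cols])   # dichotomous indicators and interactions
cv_fit   <- cv.glmnet(x_train, train$attempt_90d, family = "binomial",
                      alpha = 1, nfolds = 10)
beta     <- coef(cv_fit, s = "lambda.min")
selected <- setdiff(rownames(beta)[as.vector(beta) != 0], "(Intercept)")

# Step 2: recalibrate the selected predictors with GEE (logit link), with visits
# clustered within patients, to offset LASSO shrinkage toward the null.
cal_formula <- reformulate(selected, response = "attempt_90d")
cal_fit     <- geeglm(cal_formula, data = train, id = person_id,
                      family = binomial("logit"), corstr = "independence")

# Step 3: apply the calibrated model to the 35% validation sample and summarize
# discrimination with the c-statistic and a bootstrap confidence interval.
X_valid     <- model.matrix(cal_formula, data = valid)
valid$p_hat <- plogis(as.numeric(X_valid %*% coef(cal_fit)))
roc_obj     <- roc(valid$attempt_90d, valid$p_hat)
auc(roc_obj)                                           # c-statistic (area under ROC curve)
ci.auc(roc_obj, method = "bootstrap", boot.n = 10000)  # bootstrap 95% confidence interval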

A public repository (www.github.com/MHResearchNetwork) includes specifications and code for defining predictor and outcome variables, a data dictionary and descriptive statistics for analytic data sets, code for variable selection and calibration steps, coefficients and confidence limits from all final models, and comparison of model performance in training and validation samples.

Results

We identified 19,961,059 eligible visits by 2,960,929 patients during the study period, including 10,275,853 mental health specialty visits and 9,685,206 primary care visits with mental health diagnoses (Table 1). Following the specifications above, health system records identified 24,133 unique probable suicide attempts within 90 days of an eligible visit, and state mortality records identified 1,240 unique suicide deaths within 90 days.

TABLE 1. Characteristics of Sampled Visits to Specialty Mental Health and Primary Care Providers in Seven Health Systems (2009–2015), Randomly Divided Into Model Training (65%) and Validation (35%) Samples

Characteristic | Mental Health Specialty: Training Sample, N (%) | Mental Health Specialty: Validation Sample, N (%) | Primary Care: Training Sample, N (%) | Primary Care: Validation Sample, N (%)
Visits | 6,679,128 | 3,596,725 | 6,297,465 | 3,387,741
Female | 4,157,997 (62) | 2,239,213 (62) | 3,872,830 (61) | 2,083,424 (61)
Age group (years) | | | |
 13–17 | 671,313 (10) | 360,619 (10) | 250,878 (4) | 135,070 (4)
 18–29 | 1,118,492 (17) | 603,044 (17) | 822,668 (13) | 442,774 (13)
 30–44 | 1,744,704 (26) | 939,431 (26) | 1,337,686 (21) | 720,878 (21)
 45–64 | 2,453,509 (37) | 1,321,986 (37) | 2,466,992 (39) | 1,326,237 (39)
 65 or older | 691,110 (10) | 371,645 (10) | 1,419,241 (23) | 762,782 (23)
Race | | | |
 White | 4,562,203 (68) | 2,455,211 (68) | 4,162,033 (66) | 2,237,952 (66)
 Asian | 302,231 (5) | 162,400 (5) | 379,910 (6) | 204,272 (6)
 Black | 600,219 (9) | 324,233 (9) | 514,021 (8) | 276,260 (8)
 Hawaiian/Pacific Islander | 74,473 (1) | 40,118 (1) | 103,420 (2) | 55,833 (2)
 Native American | 65,309 (1) | 35,332 (1) | 69,425 (1) | 37,717 (1)
 More than one or other | 38,223 (1) | 20,485 (1) | 43,445 (1) | 23,391 (1)
 Not recorded | 1,036,470 (16) | 558,946 (16) | 1,025,211 (16) | 552,316 (16)
Hispanic ethnicity | 1,486,400 (22) | 800,547 (22) | 1,430,611 (23) | 769,498 (23)
Insurance type | | | |
 Commercial group | 5,057,328 (76) | 2,724,286 (76) | 4,198,138 (67) | 2,258,974 (67)
 Individual | 827,218 (12) | 445,749 (12) | 1,079,401 (17) | 580,225 (17)
 Medicare | 363,598 (5) | 194,773 (5) | 576,184 (9) | 310,001 (9)
 Medicaid | 213,573 (3) | 114,767 (3) | 297,710 (5) | 160,063 (5)
 Other | 217,411 (3) | 117,150 (3) | 146,032 (2) | 78,478 (2)
PHQ-9 item 9 score recorded at | | | |
 Index visit | 657,998 (10) | 354,918 (10) | 312,065 (5) | 168,569 (5)
 Any visit in past year | 1,328,571 (20) | 714,693 (20) | 671,643 (11) | 362,438 (11)
Length of enrollment prior to visit | | | |
 1 year or more | 5,810,841 (87) | 3,129,151 (87) | 5,352,845 (85) | 2,879,580 (85)
 5 years or more | 3,772,409 (56) | 2,031,916 (56) | 3,542,358 (56) | 1,907,063 (56)
Visits followed by | | | |
 Suicide attempt within 90 days | 41,470 (0.62) | 22,329 (0.62) | 16,302 (0.26) | 8,688 (0.26)
 Suicide death within 90 days | 1,529 (0.02) | 854 (0.02) | 856 (0.01) | 445 (0.01)


Models predicting probable suicide attempt over 90 days were developed and validated for both mental health and primary care visits, excluding 0.3% of visits because of disenrollment within 90 days. Clinical variables with the largest positive prediction coefficients are listed in Table 2 (see Appendices 9B and 9C in the online supplement for all selected predictors and coefficients). The strongest predictors of suicide attempt were similar in mental health specialty and primary care patients: prior suicide attempt, mental health and substance use diagnoses, responses to PHQ-9 item 9, and prior inpatient or emergency mental health care.

TABLE 2. Clinical Characteristics Selected for Prediction of Suicide Attempt and Suicide Death Within 90 Days of Visit in Seven Health Systems (2009–2015), Listed in Order of Coefficients in Logistic Regression Modelsa

Suicide Attempt or Death, by Care Setting

Suicide attempt following:

Mental health specialty visit (of 94 predictors selected) | Primary care visit (of 102 predictors selected)
Depression diagnosis in past 5 years | Depression diagnosis in past 5 years
Drug abuse diagnosis in past 5 years | Suicide attempt diagnosis in past 5 years
PHQ-9 item 9 score=3 in past year | Drug abuse diagnosis in past 5 years
Alcohol use disorder diagnosis in past 5 years | Alcohol abuse diagnosis in past 5 years
Mental health inpatient stay in past year | PHQ-9 item 9 score=3 in past year
Benzodiazepine prescription in past 3 months | Suicide attempt diagnosis in past 3 months
Suicide attempt in past 3 months | Suicide attempt diagnosis in past year
Personality disorder diagnosis in past 5 years | Personality disorder diagnosis in past 5 years
Eating disorder diagnosis in past 5 years | Anxiety disorder diagnosis in past 5 years
Suicide attempt in past year | Suicide attempt diagnosis in past 5 years with schizophrenia diagnosis in past 5 years
Mental health emergency department visit in past 3 months | Benzodiazepine prescription in past 3 months
Self-inflicted cutting/piercing in past year | Eating disorder diagnosis in past 5 years
Suicide attempt in past 5 years | Mental health emergency department visit in past 3 months
Injury/poisoning diagnosis in past 3 months | Injury/poisoning diagnosis in past year
Antidepressant prescription in past 3 months | Mental health emergency department visit in past year

Suicide death following:

Mental health specialty visit (of 43 predictors selected) | Primary care visit (of 29 predictors selected)
Suicide attempt diagnosis in past year | Mental health emergency department visit in past 3 months
Benzodiazepine prescription in past 3 months | Alcohol abuse diagnosis in past 5 years
Mental health emergency department visit in past 3 months | Benzodiazepine prescription in past 3 months
Second-generation antipsychotic prescription in past 5 years | Depression diagnosis in past 5 years
Mental health inpatient stay in past 5 years | Mental health inpatient stay in past year
Mental health inpatient stay in past 3 months | Injury/poisoning diagnosis in past year
Mental health inpatient stay in past year | Anxiety disorder diagnosis in past 5 years
Alcohol use disorder diagnosis in past 5 years | PHQ-9 item 9 score=1 with PHQ-8 score
Antidepressant prescription in past 3 months | PHQ-9 item 9 score=3 with age
PHQ-9 item 9 score=3 with PHQ-8 score | Suicide attempt diagnosis in past 5 years with age
PHQ-9 item 9 score=1 with age | Mental health emergency department visit in past year
Depression diagnosis in past 5 years with age | PHQ-9 item 9 score=2 with age
Suicide attempt diagnosis in past 5 years with Charlson score | PHQ-9 item 9 score=3 with PHQ-8 score
PHQ-9 item 9 score=2 with age | Bipolar disorder diagnosis in past 5 years with age
Anxiety disorder diagnosis in past 5 years with age | Depression diagnosis in past 5 years with age

a Interaction terms are indicated by “with”; see Appendices 9B–9E in the online supplement for a complete list. PHQ-9=9-item Patient Health Questionnaire; PHQ-8=8-item Patient Health Questionnaire depression scale.


The left portion of Figure 1 presents ROC curves illustrating the sensitivity and specificity of suicide attempt predictions in the training and validation samples. The c-statistics (equivalent to area under the ROC curve) for prediction of suicide attempt in the validation samples were 0.851 (95% CI=0.848, 0.853) for mental health specialty visits and 0.853 (95% CI=0.849, 0.857) for primary care visits. In each graph, comparison of ROC curves shows no appreciable difference in prediction accuracy between the training and validation samples (i.e., no evidence of model overfitting). Table 3 compares predicted and observed risk for specific strata selected a priori. Among mental health specialty visits, the lowest two strata included 75% of all visits and 21% of all suicide attempts, and the highest three strata included 5% of visits and 43% of suicide attempts. Among primary care visits, the 75% of visits with the lowest risk scores accounted for 21% of suicide attempts, and the 5% of visits with the highest scores accounted for 48%. Comparison of predicted risk levels in the training sample and observed risk levels in the validation sample again shows no appreciable decline in model performance or evidence of model overfitting. Sensitivity analyses limited to diagnoses of definite self-harm slightly improved prediction accuracy (especially among primary care patients) but excluded approximately 25% of probable suicide attempts (see Appendix 4 in the online supplement). Sensitivity analyses limited to visits preceded by at least 5 years of complete data yielded essentially identical prediction accuracy (see Appendix 5 in the online supplement). Model fit was consistent across the seven participating health systems and across age and sex subgroups (see Appendix 8 in the online supplement).

FIGURE 1. Receiver Operating Characteristic Curves Illustrating Model Performance in the Validation Data Set for Prediction of Suicide Attempts and Suicide Deaths Within 90 Days of Visit in Seven Health Systems, 2009–2015a

a The area below the training curve and above the validation curve indicates potential overfitting in the training sample.

TABLE 3. Classification Accuracy in Predefined Strata for Prediction of Suicide Attempts and Suicide Deaths Within 90 Days of a Mental Health or Primary Care Visit in Seven Health Systems, 2009–2015a

Risk Score Percentile Strata | Predicted Riskb (%) | Actual Riskc (%) | % of All Attemptsd | Standardized Event Ratioe

Suicide attempts

Following a mental health specialty visit
 >99.5th | 13.0 | 12.7 | 10 | 20.7
 99th to 99.5th | 8.5 | 8.1 | 6 | 12.9
 95th to 99th | 4.1 | 4.2 | 27 | 6.7
 90th to 95th | 1.9 | 1.8 | 15 | 3.0
 75th to 90th | 0.9 | 0.9 | 21 | 1.4
 50th to 75th | 0.3 | 0.3 | 13 | 0.51
 <50th | 0.1 | 0.1 | 8 | 0.16

Following a primary care visit with a mental health diagnosis
 >99.5th | 8.6 | 8.0 | 15 | 30.5
 99th to 99.5th | 4.1 | 4.2 | 8 | 16.3
 95th to 99th | 1.6 | 1.6 | 25 | 6.2
 90th to 95th | 0.7 | 0.7 | 13 | 2.6
 75th to 90th | 0.3 | 0.3 | 18 | 1.2
 50th to 75th | 0.1 | 0.1 | 12 | 0.49
 <50th | 0.04 | 0.04 | 9 | 0.17

Suicide deaths

Following a mental health specialty visit
 >99.5th | 0.654 | 0.694 | 12 | 24.6
 99th to 99.5th | 0.638 | 0.595 | 11 | 21.5
 95th to 99th | 0.162 | 0.167 | 25 | 6.3
 90th to 95th | 0.068 | 0.088 | 16 | 2.3
 75th to 90th | 0.031 | 0.029 | 16 | 1.1
 50th to 75th | 0.014 | 0.015 | 13 | 0.54
 <50th | 0.003 | 0.003 | 6 | 0.12

Following a primary care visit with a mental health diagnosis
 >99.5th | 0.536 | 0.435 | 14 | 28.8
 99th to 99.5th | 0.181 | 0.197 | 7 | 13.0
 95th to 99th | 0.092 | 0.083 | 22 | 5.6
 90th to 95th | 0.035 | 0.038 | 13 | 2.5
 75th to 90th | 0.018 | 0.019 | 19 | 1.3
 50th to 75th | 0.009 | 0.009 | 15 | 0.62
 <50th | 0.003 | 0.003 | 10 | 0.19

a Potential overfitting in the training sample is indicated by differences between predicted and actual risks.

b Predicted risk in this stratum using final model predictors and coefficients in the training sample.

c Observed risk in this stratum using final model predictors and coefficients in the validation sample.

d Percentage of all suicide attempts or deaths occurring in this stratum in the validation sample.

e Ratio of observed risk in this stratum of the validation sample to average risk in the full validation sample.


The same process was implemented for prediction of suicide deaths over 90 days, with separate models for mental health specialty and primary care visits. The clinical variables most strongly associated with suicide death in each group are listed in Table 2 (see Appendices 9D and 9E in the online supplement for a complete list). Predictors of suicide death were similar in mental health specialty and primary care patients, and were similar to predictors of suicide attempt.

The right portion of Figure 1 presents ROC curves for prediction of suicide death in the training and validation samples. The c-statistics for prediction of suicide death in the validation samples were 0.861 (95% CI=0.848, 0.875) for mental health specialty visits and 0.833 (95% CI=0.813, 0.853) for primary care visits. Comparison of ROC curves for the training and validation samples shows no evidence of overfitting in the mental health specialty sample and a minimal separation of training and validation curves in the primary care sample. Table 3 compares predicted and observed risk for risk strata selected a priori. Among mental health specialty visits, the lowest two risk strata included 75% of visits and 19% of suicide deaths, and the highest three risk strata included 5% of visits and 48% of suicide deaths. Among primary care visits, the 75% of visits with the lowest risk scores accounted for 25% of suicide deaths, and the 5% of visits with the highest scores accounted for 43%. Comparison of predicted risk levels in the training sample and observed risk levels in the validation sample shows no evidence of overfitting in the mental health specialty sample and a minimal falloff between the training and validation samples in the primary care sample. Sensitivity analyses limited to deaths coded as due to definite self-inflicted injury or poisoning found no meaningful difference in model fit (see Appendix 4 in the online supplement).

Table 4 lists sensitivity, specificity, positive predictive value, and negative predictive value for all four models at cut-points defined by percentiles of the risk score distribution.

TABLE 4. Performance Characteristics at Various Cut-Points for Prediction of Suicide Attempts and Suicide Deaths Within 90 Days of Visit in Seven Health Systems, 2009–2015a

Risk Score Percentile Cut-Points | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%)

Suicide attempts

Following mental health specialty visits
 >99th | 16.8 | 99.1 | 10.4 | 99.4
 >95th | 43.7 | 95.2 | 5.4 | 99.6
 >90th | 58.3 | 90.3 | 3.6 | 99.7
 >75th | 79.2 | 75.2 | 2.0 | 99.8
 >50th | 92.1 | 50.0 | 1.1 | 99.9

Following primary care visits with a mental health diagnosis
 >99th | 23.5 | 99.1 | 6.1 | 99.8
 >95th | 48.2 | 95.1 | 2.5 | 99.9
 >90th | 61.0 | 90.1 | 1.6 | 99.9
 >75th | 79.1 | 75.1 | 0.8 | 99.9
 >50th | 91.4 | 50.1 | 0.5 | 99.9

Suicide deaths

Following mental health specialty visits
 >99th | 23.1 | 99.0 | 0.62 | 99.9
 >95th | 48.1 | 95.0 | 0.26 | 99.9
 >90th | 64.3 | 90.0 | 0.17 | 99.9
 >75th | 80.4 | 75.1 | 0.08 | 99.9
 >50th | 94.0 | 50.0 | 0.05 | 99.9

Following primary care visits with a mental health diagnosis
 >99th | 20.9 | 99.0 | 0.31 | 99.9
 >95th | 43.1 | 95.0 | 0.13 | 99.9
 >90th | 55.7 | 90.0 | 0.08 | 99.9
 >75th | 74.8 | 75.1 | 0.05 | 99.9
 >50th | 90.3 | 50.0 | 0.03 | 99.9

a PPV=positive predictive value; NPV=negative predictive value.
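The cut-point metrics in Table 4 follow directly from the validated predicted probabilities. The short R sketch below shows the underlying arithmetic, continuing the hypothetical valid data frame from the Method sketch; the percentile thresholds are illustrative, matching the structure (not necessarily the exact thresholds) of the published table.

# Classification metrics at a single percentile cut-point of the predicted risk score.
cutpoint_metrics <- function(p_hat, outcome, percentile) {
  threshold <- quantile(p_hat, probs = percentile)
  flagged   <- p_hat > threshold
  c(sensitivity = mean(flagged[outcome == 1]),    # share of true events above the cut-point
    specificity = mean(!flagged[outcome == 0]),   # share of non-events below the cut-point
    ppv         = mean(outcome[flagged] == 1),    # event rate among flagged visits
    npv         = mean(outcome[!flagged] == 0))   # non-event rate among unflagged visits
}

# Example: the 95th percentile cut-point for the suicide attempt model.
cutpoint_metrics(valid$p_hat, valid$attempt_90d, 0.95)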


Discussion

In a sample of 20 million visits by 3 million patients in seven health systems, data from electronic health records accurately stratified mental health specialty and primary care visits according to short-term risk of suicide attempt or suicide death. Observed rates of probable suicide attempt and suicide death were over 200 times as high following visits in the highest 1% of predicted risk compared with visits in the bottom half of predicted risk (Table 3). The strongest predictors included mental health diagnoses, substance use diagnoses, use of mental health emergency and inpatient care, and history of self-harm. The absolute risk was lower in primary care, but the predictors selected and the accuracy of prediction were similar across care settings. Responses on the PHQ-9 were selected as important predictors, even though such data were available for only 15% of visits.

Potential Limitations

In interpreting these findings, we should consider both false positive and false negative errors in the ascertainment of probable suicide attempts and deaths. Previous research suggests that false positive rates are near zero for suicide deaths diagnosed by medical examiners (20) and below 20% for diagnoses of definite or possible self-inflicted injury in records from these health systems (7) (see also Appendix 2 in the online supplement). Diagnostic data do not distinguish between self-harm with and without intent to die. Consequently, our definition of probable suicide attempt may include a small proportion of self-harm episodes without suicidal intent. False negative errors may be more common. Up to one-quarter of suicide deaths may not be identified by medical examiners (19). Health system records will not capture suicide attempts when people do not seek care or when providers do not recognize and record diagnoses of self-harm. Nonspecific error (either false positive or false negative) would lead to underestimating the accuracy of prediction models (see Appendix 4 in the online supplement), whereas selective error in the wrong direction (e.g., underascertainment of suicide attempts in patients with low risk scores) could lead to overestimation of model performance.

Health system records do not reflect important social risk factors for suicidal behavior, such as job loss, bereavement, and relationship disruption. Suicidal behavior likely reflects the intersection of clinical risk factors, negative life events, and access to means of self-harm. Data regarding those social risk factors would certainly improve accuracy of prediction.

Our analyses do not consider the one-third to one-half of people who attempt suicide or die by suicide who have no recent mental health treatment or recorded diagnosis (3, 4, 29). Prediction using electronic health record data may also prove useful among patients without recorded mental health diagnoses, but prediction models would necessarily be limited to general medical diagnoses and utilization rather than the mental health diagnoses and treatments selected in this sample.

Methodologic Considerations

We focused on risk over 90 days following an outpatient visit. Risk does vary between visits (30), and near-term risk is most relevant to clinical decisions and quality improvement (31). The interventions that providers or health systems might provide for high-risk patients would typically be delivered over weeks or months (32, 33). Predictors selected in these models (Table 2) include both recent or short-term factors and long-term factors, consistent with previous research (7, 30) indicating that suicidal behavior is influenced by both stable and variable risk factors. Sensitivity analyses using a 30-day outcome window (see Appendix 7 in the online supplement) yielded similar results regarding both predictors selected and accuracy of prediction. Analyses regarding longer-term risk might identify different predictors of suicidal behavior.

Of predictive modeling methods, parametric methods like LASSO lie closest to traditional regression. Nonparametric methods (34) such as random forest could theoretically improve accuracy of prediction. Direct comparisons to date (12, 35), however, have found equal or superior prediction using parametric methods similar to those used here. Nonparametric methods may have little advantage when predictors are dichotomous, such as the diagnosis and utilization indicators included in our models. Parametric models are usually more transparent to clinicians (36) and simpler to implement in electronic health records, as is now under way in these health systems and the Veterans Health Administration (35).

Variable selection models are subject to overfitting or selection of predictive relationships idiosyncratic to a specific sample. The large sample used for training of these models offers some protection against overfitting. In addition, we present explicit comparisons of performance in the training and randomly selected validation samples for all four models (see Table 3 and Figure 1), finding no indication of overfitting in prediction of suicide attempts or prediction of suicide deaths following mental health specialty visits. We do find a slight indication of overfitting in prediction of suicide deaths following primary care visits, likely reflecting the smaller number of events included in these models. Nevertheless, the overall accuracy of prediction (c-statistic) in the independent validation sample exceeds 80%.

In addition to evaluating overfitting within this sample, we should consider generalizability to other care settings or patient populations. This sample included almost 20 million visits in seven health systems serving patients in nine states, including states with high and low rates of suicide mortality. Patients were broadly representative of those service areas in race/ethnicity, socioeconomic status, and source of insurance coverage, including substantial numbers insured by Medicare and Medicaid. Methods could be easily transported to health systems with standard electronic health records and insurance claim databases. Predicted risk levels, however, could be over- or underestimated in settings with higher or lower average risk of suicidal behavior. The predictors selected and the accuracy of prediction could differ in settings with different patterns of mental health care, especially if patterns of diagnosis or utilization were less closely linked to risk of suicidal behavior. The intervention of effective suicide prevention programs could also weaken the relationship between these identified risk predictors and subsequent suicidal behavior. Consequently, we recommend replication in other health systems prior to broad application. All information necessary for replication is available via our online repository.

Context

These empirically derived risk scores outperformed risk stratification based solely on item 9 of the PHQ-9. Regarding sensitivity, selecting mental health visits with any positive response to item 9 would identify only two-thirds of subsequent suicide attempts and deaths (7), whereas selecting visits with risk scores above the 75th percentile would identify 80%. Regarding efficient identification of high risk, selecting the 6% of visits with a response of “more than half the days” or “nearly every day” would identify one-third of subsequent suicide attempts and deaths (7), whereas selecting the 5% of visits with the highest risk scores would identify almost half.

Predictors identified in these models included a range of demographic characteristics, mental health diagnoses, and historical indicators of mental health treatment generally similar to those identified in previous research (9, 12, 13). Based on results in validation samples, performance of these prediction models equaled or exceeded that of other published models using health records to predict suicidal behavior (8–13), where c-statistics ranged from 0.67 to 0.84. These models significantly outperformed other published models predicting suicidal behavior after an outpatient visit, a question of high interest to a wide range of mental health and primary care providers. In this sample, mental health specialty visits with risk scores in the top 5% accounted for 43% of suicide attempts and 48% of suicide deaths in the following 90 days, and primary care visits in the top 5% accounted for 48% of subsequent suicide attempts and 43% of subsequent suicide deaths. For comparison, in two previous models predicting suicidal behavior following outpatient visits (12, 13), the top 5% of patients accounted for between one-quarter and one-third of subsequent suicide attempts and deaths. This improved prediction likely reflects differences in data and methods. First, longitudinal records in integrated health systems may allow more complete ascertainment of risk factors. Second, our analyses consider a larger number of potential predictors and more detailed temporal encoding. Third, responses to PHQ-9 item 9 contributed to prediction, even though such data were available for only 10%–20% of visits. Prediction accuracy would likely improve with greater use of the PHQ-9 or similar measures, as is expected with new initiatives promoting routine outcome assessment (37) and identification of suicidal ideation (5).

The c-statistics for these suicide prediction models also exceed those for models using health record data to predict rehospitalization for heart failure (38), in-hospital mortality from sepsis (39), and high emergency department utilization (40). Suicidal behavior may be more predictable than many adverse medical outcomes.

Among mental health specialty visits, a cut-point at the 95th percentile of risk had a positive predictive value of 5.4% for suicide attempt within 90 days. While that predictive value would be inadequate for a diagnostic test, it is similar or superior to widely accepted tools for prediction of major medical outcomes such as stroke in atrial fibrillation (41) and cardiovascular events (42). Furthermore, predictive values or expected event rates for widely accepted medical prediction tools often include adverse outcomes accumulated over many years (41, 42), rather than the 90-day risk period considered in these analyses.

Clinical Implications

Some recent discussions of predictive modeling in health care warn that reliance on algorithms could lead to inappropriate causal inference (43–45) or atrophy of clinician judgment (43). Regarding the first point, associations identified by our model should certainly not be interpreted as evidence for independent or causal relationships. For example, a recent benzodiazepine prescription is more likely a marker of increased risk than a cause of suicidal behavior. We report predictors selected (Table 2) to demonstrate that all are expected correlates of suicidal behavior, albeit in specific combinations within specific time periods. Regarding the second point, our model and other models predicting suicidal behavior from records data rely largely on the diagnostic and treatment decisions of treating clinicians. The predictors identified by our analyses would be well known to most mental health providers. Predictive models simply allow us to consistently combine millions of providers’ individual judgments to accurately predict an important but rare event (45).

Prediction models cannot replace clinical judgment, but risk scores can certainly inform both individual clinical decisions and quality improvement programs. Participating health systems now recommend completion of a structured suicide risk assessment (46) after any response of “more than half the days” or “nearly every day” to PHQ-9 item 9—implying a 90-day risk of suicide attempt of 2%−3% (7). A predicted 90-day risk exceeding 5% (i.e., above the 95th percentile for mental health specialty visits) would seem to warrant a similar level of additional assessment. A predicted 90-day suicide attempt risk exceeding 10% (i.e., above the 99th percentile for mental health specialty visits) should warrant creation of a personal safety plan and counseling regarding reducing access to means of self-harm (47, 48). Accurate risk stratification can also inform providers’ and health systems’ decisions regarding frequency of follow-up, referral for intensive treatment, or outreach following missed or canceled appointments (31, 49). Implementing these risk-based care pathways and outreach programs is a central goal of the Zero Suicide prevention model recommended by the U.S. National Action Alliance for Suicide Prevention (48). Empirically derived risk predictions can be an important component of that national suicide prevention strategy.

From the Kaiser Permanente Washington Health Research Institute, Seattle; the Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena; the HealthPartners Institute, Minneapolis; the Center for Health Services Research, Henry Ford Health System, Detroit; the Center for Health Research, Kaiser Permanente Northwest, Portland, Oreg.; the Institute for Health Research, Kaiser Permanente Colorado, Denver; and the Center for Health Research, Kaiser Permanente Hawaii, Honolulu.
Address correspondence to Dr. Simon ().

Supported by cooperative agreement U19 MH092201 with NIMH.

Dr. Simon, Mr. Johnson, Dr. Lawrence, Dr. Lynch, Dr. Beck, Dr. Waitzfelder, Ms. Ziebell, Dr. Penfold, and Dr. Shortreed are employees of Kaiser Permanente. Dr. Simon has received research grants from Otsuka and Novartis. Dr. Penfold has received research funding from Janssen. Dr. Shortreed has worked on grant projects awarded to Kaiser Permanente Washington Health Research Institute (KPWHRI) Institute by Pfizer and is a co-investigator on grant projects awarded to KPWHRI from Syneos Health, which is representing a consortium of pharmaceutical companies carrying out FDA-mandated studies regarding the safety of extended-release opioids. The other authors report no financial relationships with commercial interests.

References

1 Kochanek KD, Murphy SL, Xu JQ, et al.: NCHS Data Brief: Mortality in the United States, 2016. Hyattsville, Md, National Center for Health Statistics, 2017

2 Centers for Disease Control and Prevention: Web-Based Injury Statistics Query and Reporting System (WISQARS), Nonfatal Injury Reports, 2000–2014. https://webappa.cdc.gov/sasweb/ncipc/nfirates.html

3 Ahmedani BK, Simon GE, Stewart C, et al.: Health care contacts in the year before suicide death. J Gen Intern Med 2014; 29:870–877

4 Ahmedani BK, Stewart C, Simon GE, et al.: Racial/ethnic differences in health care visits made before suicide attempt across the United States. Med Care 2015; 53:430–435

5 Patient Safety Advisory Group: Detecting and treating suicidal ideation in all settings. Chicago, Joint Commission Sentinel Event Alerts, 2016, issue 56 (https://www.jointcommission.org/assets/1/18/SEA_56_Suicide.pdf)

6 Franklin JC, Ribeiro JD, Fox KR, et al.: Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull 2017; 143:187–232

7 Simon GE, Coleman KJ, Rossom RC, et al.: Risk of suicide attempt and suicide death following completion of the Patient Health Questionnaire depression module in community practice. J Clin Psychiatry 2016; 77:221–227

8 McCarthy JF, Bossarte RM, Katz IR, et al.: Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US Department of Veterans Affairs. Am J Public Health 2015; 105:1935–1942

9 Kessler RC, Warner CH, Ivany C, et al.: Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry 2015; 72:49–57

10 Walsh CG, Ribeiro JD, Franklin JC: Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5:457–469

11 McCoy TH Jr, Castro VM, Roberson AM, et al.: Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73:1064–1071

12 Kessler RC, Stein MB, Petukhova MV, et al.: Predicting suicides after outpatient mental health visits in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Mol Psychiatry 2017; 22:544–551

13 Barak-Corren Y, Castro VM, Javitt S, et al.: Predicting suicidal behavior from longitudinal electronic health records. Am J Psychiatry 2017; 174:154–162

14 Ross TR, Ng D, Brown JS, et al.: The HMO Research Network Virtual Data Warehouse: a public data model to support collaboration. EGEMS (Wash DC) 2014; 2:1049

15 Charlson M, Szatrowski TP, Peterson J, et al.: Validation of a combined comorbidity index. J Clin Epidemiol 1994; 47:1245–1251

16 Kroenke K, Spitzer RL, Williams JB, et al.: The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. Gen Hosp Psychiatry 2010; 32:345–359

17 Lu CY, Stewart C, Ahmed AT, et al.: How complete are E-codes in commercial plan claims databases? Pharmacoepidemiol Drug Saf 2014; 23:218–220

18 Stewart C, Crawford PM, Simon GE: Changes in coding of suicide attempts or self-harm with transition from ICD-9 to ICD-10. Psychiatr Serv 2017; 68:215

19 Bakst SS, Braun T, Zucker I, et al.: The accuracy of suicide statistics: are true suicide deaths misclassified? Soc Psychiatry Psychiatr Epidemiol 2016; 51:115–123

20 Cox KL, Nock MK, Biggs QM, et al.: An examination of potential misclassification of army suicides: results from the Army Study to Assess Risk and Resilience in Servicemembers. Suicide Life Threat Behav 2017; 47:257–265

21 Tibshirani R: Regression shrinkage and selection via the lasso. J R Stat Soc B 1996; 58:267–288

22 Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning, 2nd ed. New York, Springer, 2009

23 Kass RE, Raftery AE: Bayes factors. J Am Stat Assoc 1995; 90:773–795

24 Egan JP: Signal Detection Theory and ROC Analysis. New York, Academic Press, 1975

25 Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36

26 Bradley AP: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 1997; 30:1145–1159

27 Friedman J, Hastie T, Tibshirani R: Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010; 33:1–22

28 Weston S: Foreach looping construct for R, R package, version 1.4.3, 2015

29 Han B, Compton WM, Gfroerer J, et al.: Mental health treatment patterns among adults with recent suicide attempts in the United States. Am J Public Health 2014; 104:2359–2368

30 Simon GE, Shortreed SM, Johnson E, et al.: Between-visit changes in suicidal ideation and risk of subsequent suicide attempt. Depress Anxiety 2017; 34:794–800

31 Olfson M, Marcus SC, Bridge JA: Focusing suicide prevention on periods of high risk. JAMA 2014; 311:1107–1108

32 Brown GK, Ten Have T, Henriques GR, et al.: Cognitive therapy for the prevention of suicide attempts: a randomized controlled trial. JAMA 2005; 294:563–570

33 Comtois KA, Linehan MM: Psychosocial treatments of suicidal behaviors: a practice-friendly review. J Clin Psychol 2006; 62:161–170

34 Dreiseitl S, Ohno-Machado L: Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002; 35:352–359

35 Kessler RC, Hwang I, Hoffmire CA, et al.: Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res 2017; 26:26

36 Adkins DE: Machine learning and electronic health records: a paradigm shift. Am J Psychiatry 2017; 174:93–94

37 HEDIS Depression Measures Specified for Electronic Clinical Data Systems. http://www.ncqa.org/HEDISQualityMeasurement/HEDISLearningCollaborative/HEDISDepressionMeasures.aspx

38 Frizzell JD, Liang L, Schulte PJ, et al.: Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol 2017; 2:204–209

39 Taylor RA, Pare JR, Venkatesh AK, et al.: Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data–driven, machine learning approach. Acad Emerg Med 2016; 23:269–278

40 Frost DW, Vembu S, Wang J, et al.: Using the electronic medical record to identify patients at high risk for frequent emergency department visits and high system costs. Am J Med 2017; 130:601.e17–601.e22

41 Lip GY: Can we predict stroke in atrial fibrillation? Clin Cardiol 2012; 35(suppl 1):21–27

42 Rana JS, Tabada GH, Solomon MD, et al.: Accuracy of the atherosclerotic cardiovascular risk equation in a large contemporary, multiethnic population. J Am Coll Cardiol 2016; 67:2118–2130

43 Cabitza F, Rasoini R, Gensini GF: Unintended consequences of machine learning in medicine. JAMA 2017; 318:517–518

44 Chen JH, Asch SM: Machine learning and prediction in medicine: beyond the peak of inflated expectations. N Engl J Med 2017; 376:2507–2509

45 Obermeyer Z, Emanuel EJ: Predicting the future: big data, machine learning, and clinical medicine. N Engl J Med 2016; 375:1216–1219

46 Posner K, Brown GK, Stanley B, et al.: The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry 2011; 168:1266–1277

47 Rossom RC, Simon GE, Beck A, et al.: Facilitating action for suicide prevention by learning health care systems. Psychiatr Serv 2016; 67:830–832

48 Hogan MF, Grumet JG: Suicide prevention: an emerging priority for health care. Health Aff (Millwood) 2016; 35:1084–1090

49 Miller IW, Camargo CA Jr, Arias SA, et al.: Suicide prevention in an emergency department population: the ED-SAFE study. JAMA Psychiatry 2017; 74:563–570