Predicting Suicide Attempts and Suicide Deaths Following Outpatient Visits Using Electronic Health Records

Published Online: 24 May 2018. https://doi.org/10.1176/appi.ajp.2018.17101167

Abstract

Objective:

The authors sought to develop and validate models using electronic health records to predict suicide attempt and suicide death following an outpatient visit.

Method:

Across seven health systems, 2,960,929 patients age 13 or older (mean age, 46 years; 62% female) made 10,275,853 specialty mental health visits and 9,685,206 primary care visits with mental health diagnoses between Jan. 1, 2009, and June 30, 2015. Health system records and state death certificate data identified suicide attempts (N=24,133) and suicide deaths (N=1,240) over 90 days following each visit. Potential predictors included 313 demographic and clinical characteristics extracted from records for up to 5 years before each visit: prior suicide attempts, mental health and substance use diagnoses, medical diagnoses, psychiatric medications dispensed, inpatient or emergency department care, and routinely administered depression questionnaires. Logistic regression models predicting suicide attempt and death were developed using penalized LASSO (least absolute shrinkage and selection operator) variable selection in a random sample of 65% of the visits and validated in the remaining 35%.

Results:

Mental health specialty visits with risk scores in the top 5% accounted for 43% of subsequent suicide attempts and 48% of suicide deaths. Of patients scoring in the top 5%, 5.4% attempted suicide and 0.26% died by suicide within 90 days. C-statistics (equivalent to area under the curve) for prediction of suicide attempt and suicide death were 0.851 (95% CI=0.848, 0.853) and 0.861 (95% CI=0.848, 0.875), respectively. Primary care visits with scores in the top 5% accounted for 48% of subsequent suicide attempts and 43% of suicide deaths. C-statistics for prediction of suicide attempt and suicide death were 0.853 (95% CI=0.849, 0.857) and 0.833 (95% CI=0.813, 0.853), respectively.

Conclusions:

Prediction models incorporating both health record data and responses to self-report questionnaires substantially outperform existing suicide risk prediction tools.

Suicide accounted for almost 45,000 deaths in the United States in 2016, a 25% increase since 2000 (1). Nonfatal suicide attempts account for almost 500,000 emergency department visits annually (2). Half of people who die by suicide and two-thirds of people who survive suicide attempts received some mental health diagnosis or treatment during the previous year (3, 4). Mindful of those prevention opportunities, a Joint Commission Sentinel Event Alert issued in 2016 recommends detection of suicide risk across health care (5). Unfortunately, traditional clinical detection of suicide risk is hardly better than chance (6).

We previously reported (7) that brief depression questionnaires can accurately predict suicide attempt or death. Outpatients who report having thoughts of death or self-harm “nearly every day” on item 9 of the Patient Health Questionnaire (PHQ-9) are seven times as likely to attempt suicide and six times as likely to die by suicide over the following 90 days compared with patients who report having such thoughts “not at all” (7). The sensitivity of this tool, however, is only moderate: one-third of suicide attempts and deaths occur among patients who report no suicidal ideation at all. Its identification of high-risk patients is also only moderately accurate: the 6% of patients who report suicidal ideation “more than half the days” or “nearly every day” account for only 35% of suicide attempts and deaths. More accurate tools are needed for identifying both low- and high-risk patients.

Recent research has used various modeling methods to predict suicidal behavior from electronic health records. Examples include prediction of suicide death among Veterans Health Administration service users (8), prediction of suicide death following psychiatric hospitalization among U.S. Army soldiers (9), distinguishing patients attempting suicide from those with other injuries or poisonings (10), and prediction of suicide or accidental death following civilian general hospital discharge (11). Two recent analyses have used health record data to predict suicide attempt or suicide death following outpatient visits. Kessler and colleagues (12) used health records and military service records to predict suicide death among U.S. Army soldiers in the 26 weeks following a mental health visit. Approximately one-quarter of suicide deaths occurred after the 5% of visits rated as highest risk. Barak-Corren and colleagues (13) used health record data to predict suicide attempt or death among outpatients making three or more visits in two large academic health systems. One-third of suicide attempts and deaths occurred in the 5% of patients with highest risk scores.

In this study, we combined data typically available from electronic health records with depression questionnaire data in seven large health systems to develop and validate models predicting suicide attempt and suicide death over the 90 days following a mental health or primary care visit.

Method

The seven health systems that participated in this research (HealthPartners; Henry Ford Health System; and the Colorado, Hawaii, Northwest, Southern California, and Washington regions of Kaiser Permanente) serve a combined population of about 8 million members in nine states. Each system provides insurance coverage and comprehensive health care (including general medical and specialty mental health care) to a defined population enrolled through employer-sponsored insurance, individual insurance, capitated Medicaid or Medicare, and subsidized low-income programs. Members are representative of each system’s service area in age, race/ethnicity, and socioeconomic status. All systems recommend using the PHQ-9 at mental health visits and primary care visits for depression, but implementation varied across systems during the study period.

As members of the Mental Health Research Network, each health system maintains a research data warehouse following the Health Care Systems Research Network’s Virtual Data Warehouse model (14). This resource combines data from insurance enrollment records, electronic health records, insurance claims, pharmacy dispensings, state mortality records, and census-derived neighborhood characteristics. Responsible institutional review boards for each health system approved use of these de-identified data for this research.

The study sample included any outpatient visit by a member age 13 or older either to a specialty mental health clinic or to a primary care clinic when a mental health diagnosis was recorded. Sampling was limited to visits to health system clinics (to ensure availability of electronic health record data) and people insured by the health system’s insurance plan (to ensure availability of insurance claims data). All qualifying visits from Jan. 1, 2009, through June 30, 2015, were included, except at the Henry Ford Health System, where only visits after implementation of a new electronic health record system on Dec. 1, 2012, were included.

Potential predictors extracted from health system records for up to 5 years before each visit included demographic characteristics (age, sex, race, ethnicity, source of insurance, and neighborhood income and educational attainment), current and past mental health and substance use diagnoses (organized in 12 categories), past suicide attempts, other past injury or poisoning diagnoses, dispensed prescriptions for mental health medication (organized in four categories), past inpatient or emergency department mental health care, general medical diagnoses (by Charlson Comorbidity Index [15] categories), and recorded scores on the PHQ-9 (16) (including total score and item 9 score).

Potential predictors were represented as dichotomous indicators. Each diagnosis category was represented by three overlapping indicators (recorded at or within 90 days before the visit, recorded within 1 year before, and recorded within 5 years before). Each category of medication or of emergency or inpatient utilization was represented by three overlapping indicators (occurred within 90 days before the visit, 1 year before, or any time before). To represent temporal patterns of prior PHQ-9 item 9 scores, 24 indicators were calculated for each encounter to represent number of observations, maximum value, and modal value (including value of missing) during three overlapping time periods (previous 90 days, previous 183 days, and previous 365 days). The final set of potential predictors for each encounter included 149 indicators and 164 possible interactions (see Appendix 9A in the online supplement for a complete list).
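
The overlapping-indicator encoding described above can be illustrated with a minimal Python sketch. The window labels, the day counts (365 days per year), the function name, and the diagnosis dates are illustrative assumptions, not the specifications in the study's repository.

```python
from datetime import date, timedelta

# Look-back windows in days; labels and cutoffs are illustrative assumptions.
WINDOWS = {"90d": 90, "1y": 365, "5y": 5 * 365}

def window_indicators(visit_date, diagnosis_dates):
    """Return overlapping 0/1 flags: was a diagnosis recorded within each window?"""
    flags = {}
    for label, days in WINDOWS.items():
        start = visit_date - timedelta(days=days)
        flags[label] = int(any(start <= d <= visit_date for d in diagnosis_dates))
    return flags

visit = date(2014, 6, 1)
old_dx = [date(2013, 1, 15)]     # hypothetical diagnosis more than 1 year before the visit
recent_dx = [date(2014, 5, 1)]   # hypothetical diagnosis within 90 days of the visit

print(window_indicators(visit, old_dx))
print(window_indicators(visit, recent_dx))
```

Because the windows overlap, a diagnosis recorded within 90 days also sets the 1-year and 5-year flags, while an older diagnosis sets only the longer windows.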

Diagnoses of self-harm or probable suicide attempt were ascertained from all injury or poisoning diagnoses recorded in electronic health records and insurance claims accompanied by an ICD-9 cause of injury code indicating intentional self-harm (codes E950–E958) or undetermined intent (codes E980–E989). Data from these health systems during the study period indicate that inclusion of injuries and poisonings with undetermined intent increases ascertainment of probable suicide attempts by approximately 25% (7) (see also Appendix 4 in the online supplement). Although use of E-codes varied across the United States during the study period (17), participating health systems were selected for high and consistent rates of E-code use (see Appendix 1 in the online supplement). Record review (7) also supports the positive predictive value of this definition for identification of true self-harm in these health systems (see also Appendix 2 in the online supplement). Furthermore, observation of coding changes across the transition from ICD-9 to the more specific ICD-10 coding scheme indicates that most “undetermined” ICD-9 diagnoses actually reflect self-harm (18) (see also Appendix 3 in the online supplement). Ascertainment of suicide attempts was censored at health system disenrollment, after which insurance claims data regarding self-harm diagnoses at external facilities would not be available.

Suicide deaths were ascertained from state mortality records. Following common recommendations (19, 20), all deaths with an ICD-10 diagnosis of self-inflicted injury (codes X60–X84) or injury/poisoning with undetermined intent (codes Y10–Y34) were considered probable suicide deaths. Inclusion of injury and poisoning deaths with undetermined intent increases ascertainment of probable suicide deaths by 5%–10% (7) (see also Appendix 4 in the online supplement).
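
The two outcome definitions above amount to simple code-range checks. The following Python sketch shows one way to express them; the function names and the assumed string formats (for example "E950.3" or "X70") are hypothetical, and the study's actual specifications are in its public repository.

```python
def is_probable_attempt_icd9(ecode):
    """True for ICD-9 E950-E958 (intentional self-harm) or E980-E989 (undetermined intent)."""
    code = ecode.strip().upper()
    digits = code[1:4]
    if not code.startswith("E") or len(digits) != 3 or not digits.isdigit():
        return False
    n = int(digits)
    return 950 <= n <= 958 or 980 <= n <= 989

def is_probable_suicide_death_icd10(cause):
    """True for ICD-10 X60-X84 (self-inflicted) or Y10-Y34 (undetermined intent)."""
    code = cause.strip().upper()
    if len(code) < 3 or not code[1:3].isdigit():
        return False
    letter, n = code[0], int(code[1:3])
    return (letter == "X" and 60 <= n <= 84) or (letter == "Y" and 10 <= n <= 34)

print(is_probable_attempt_icd9("E950.3"), is_probable_attempt_icd9("E849.0"))
print(is_probable_suicide_death_icd10("X70"), is_probable_suicide_death_icd10("W19"))
```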

All predictor and outcome variables were completely specified and calculated prior to model training.

Prediction models were developed separately for mental health specialty and primary care visits, with a random sample of 65% of each used for model training and 35% set aside for validation. Models included multiple visits per person in order to accurately represent changes in risk within patients over time. For each visit, analyses considered any outcome in the following 90 days, regardless of a subsequent visit in between. This approach uses all data available at the time of the index visit but avoids informative or biased censoring related to timing of visits following the index date. In the initial variable selection step, separate models predicting risk of suicide attempt and suicide death were estimated using logistic regression with penalized LASSO (least absolute shrinkage and selection operator) variable selection (21). The LASSO penalization factor selects important predictors by shrinking coefficients for weaker predictors toward zero, excluding predictors with estimated zero coefficients from the final sparse prediction model. To avoid overfitting models to idiosyncratic relationships in the training samples, variable selection used 10-fold cross-validation (22) to select the optimal level of tuning or penalization, measured by the Bayesian information criterion (23). In the second calibration step, generalized estimating equations with a logistic link reestimated coefficients in the training sample, accounting for both clustering of visits under patients and bias toward the null in LASSO coefficients. In the final validation step, logistic models derived from the above two-step process were applied in the 35% validation sample to calculate predicted probabilities for each visit. Results are reported as receiver operating characteristic (ROC) curves (24) with c-statistics (equivalent to area under the ROC curve) (25, 26), along with predicted and observed rates in prespecified strata of predicted probability. 
Overfitting was evaluated by comparing classification performance in training and validation samples and by comparing predicted risk and observed risk in the validation sample. Variable selection analyses were conducted using the glmnet (27) and foreach (28) packages for R, version 3.4.0. Confidence intervals for c-statistics were calculated via bootstrap with 10,000 replications.
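
To make the variable selection step concrete, the sketch below shows how an L1 (LASSO) penalty can drive the coefficient of an uninformative predictor to exactly zero. It is not the study's implementation: the analyses used glmnet in R, whereas this example substitutes a plain proximal-gradient solver in standard-library Python. The toy data, the penalty value `lam`, the step size, and all function names are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_l1_logistic(X, y, lam, lr=1.0, n_iter=2000):
    """Proximal-gradient fit of logistic regression with an L1 penalty on the weights."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (hypothetical): column 0 tracks the outcome, column 1 is unrelated to it.
X, y = [], []
for x1, n_events in ((1, 3), (0, 1)):
    for x2 in (0, 1):
        for k in range(4):
            X.append([x1, x2])
            y.append(1 if k < n_events else 0)

w, b = fit_l1_logistic(X, y, lam=0.05)
print("coefficients:", w, "intercept:", b)
```

In this toy example the unrelated predictor is dropped, and the retained coefficient is smaller than the unpenalized estimate for the same data (2 ln 3, about 2.20). That shrinkage is the bias toward the null that the second calibration step is meant to correct.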

A public repository (www.github.com/MHResearchNetwork) includes specifications and code for defining predictor and outcome variables, a data dictionary and descriptive statistics for analytic data sets, code for variable selection and calibration steps, coefficients and confidence limits from all final models, and comparison of model performance in training and validation samples.

Results

We identified 19,961,059 eligible visits by 2,960,929 patients during the study period, including 10,275,853 mental health specialty visits and 9,685,206 primary care visits with mental health diagnoses (Table 1). Following the specifications above, health system records identified 24,133 unique probable suicide attempts within 90 days of an eligible visit, and state mortality records identified 1,240 unique suicide deaths within 90 days.

TABLE 1. Characteristics of Sampled Visits to Specialty Mental Health and Primary Care Providers in Seven Health Systems (2009–2015), Randomly Divided Into Model Training (65%) and Validation (35%) Samples

	Mental Health Specialty				Primary Care
	Training Sample		Validation Sample		Training Sample		Validation Sample
Characteristic	N	%	N	%	N	%	N	%
Visits	6,679,128		3,596,725		6,297,465		3,387,741
Female	4,157,997	62	2,239,213	62	3,872,830	61	2,083,424	61
Age group (years)
13–17	671,313	10	360,619	10	250,878	4	135,070	4
18–29	1,118,492	17	603,044	17	822,668	13	442,774	13
30–44	1,744,704	26	939,431	26	1,337,686	21	720,878	21
45–64	2,453,509	37	1,321,986	37	2,466,992	39	1,326,237	39
65 or older	691,110	10	371,645	10	1,419,241	23	762,782	23
Race
White	4,562,203	68	2,455,211	68	4,162,033	66	2,237,952	66
Asian	302,231	5	162,400	5	379,910	6	204,272	6
Black	600,219	9	324,233	9	514,021	8	276,260	8
Hawaiian/Pacific Islander	74,473	1	40,118	1	103,420	2	55,833	2
Native American	65,309	1	35,332	1	69,425	1	37,717	1
More than one or other	38,223	1	20,485	1	43,445	1	23,391	1
Not recorded	1,036,470	16	558,946	16	1,025,211	16	552,316	16
Hispanic ethnicity	1,486,400	22	800,547	22	1,430,611	23	769,498	23
Insurance Type
Commercial group	5,057,328	76	2,724,286	76	4,198,138	67	2,258,974	67
Individual	827,218	12	445,749	12	1,079,401	17	580,225	17
Medicare	363,598	5	194,773	5	576,184	9	310,001	9
Medicaid	213,573	3	114,767	3	297,710	5	160,063	5
Other	217,411	3	117,150	3	146,032	2	78,478	2
Patient Health Questionnaire item 9 score recorded at
Index visit	657,998	10	354,918	10	312,065	5	168,569	5
Any visit in past year	1,328,571	20	714,693	20	671,643	11	362,438	11
Length of enrollment prior to visit
1 year or more	5,810,841	87	3,129,151	87	5,352,845	85	2,879,580	85
5 years or more	3,772,409	56	2,031,916	56	3,542,358	56	1,907,063	56
Visits followed by
Suicide attempt within 90 days	41,470	0.62	22,329	0.62	16,302	0.26	8,688	0.26
Suicide death within 90 days	1,529	0.02	854	0.02	856	0.01	445	0.01

Models predicting probable suicide attempt over 90 days were developed and validated for both mental health and primary care visits, excluding 0.3% of visits because of disenrollment within 90 days. Clinical variables with the largest positive prediction coefficients are listed in Table 2 (see Appendices 9B and 9C in the online supplement for all selected predictors and coefficients). The strongest predictors of suicide attempt were similar in mental health specialty and primary care patients: prior suicide attempt, mental health and substance use diagnoses, responses to PHQ-9 item 9, and prior inpatient or emergency mental health care.

TABLE 2. Clinical Characteristics Selected for Prediction of Suicide Attempt and Suicide Death Within 90 Days of Visit in Seven Health Systems (2009–2015), Listed in Order of Coefficients in Logistic Regression Models^a

Suicide Attempt or Death, by Care Setting
Suicide attempt following:
Mental health specialty visit (of 94 predictors selected)	Primary care visit (of 102 predictors selected)
Depression diagnosis in past 5 years	Depression diagnosis in past 5 years
Drug abuse diagnosis in past 5 years	Suicide attempt diagnosis in past 5 years
PHQ-9 item 9 score=3 in past year	Drug abuse diagnosis in past 5 years
Alcohol use disorder diagnosis in past 5 years	Alcohol abuse diagnosis in past 5 years
Mental health inpatient stay in past year	PHQ-9 item 9 score=3 in past year
Benzodiazepine prescription in past 3 months	Suicide attempt diagnosis in past 3 months
Suicide attempt in past 3 months	Suicide attempt diagnosis in past year
Personality disorder diagnosis in past 5 years	Personality disorder diagnosis in past 5 years
Eating disorder diagnosis in past 5 years	Anxiety disorder diagnosis in past 5 years
Suicide attempt in past year	Suicide attempt diagnosis in past 5 years with schizophrenia diagnosis in past 5 years
Mental health emergency department visit in past 3 months	Benzodiazepine prescription in past 3 months
Self-inflicted cutting/piercing in past year	Eating disorder diagnosis in past 5 years
Suicide attempt in past 5 years	Mental health emergency department visit in past 3 months
Injury/poisoning diagnosis in past 3 months	Injury/poisoning diagnosis in past year
Antidepressant prescription in past 3 months	Mental health emergency department visit in past year
Suicide death following:
Mental health specialty visit (of 43 predictors selected)	Primary care visit (of 29 predictors selected)
Suicide attempt diagnosis in past year	Mental health emergency department visit in past 3 months
Benzodiazepine prescription in past 3 months	Alcohol abuse diagnosis in past 5 years
Mental health emergency department visit in past 3 months	Benzodiazepine prescription in past 3 months
Second-generation antipsychotic prescription in past 5 years	Depression diagnosis in past 5 years
Mental health inpatient stay in past 5 years	Mental health inpatient stay in past year
Mental health inpatient stay in past 3 months	Injury/poisoning diagnosis in past year
Mental health inpatient stay in past year	Anxiety disorder diagnosis in past 5 years
Alcohol use disorder diagnosis in past 5 years	PHQ-9 item 9 score=1 with PHQ-8 score
Antidepressant prescription in past 3 months	PHQ-9 item 9 score=3 with age
PHQ-9 item 9 score=3 with PHQ-8 score	Suicide attempt diagnosis in past 5 years with age
PHQ-9 item 9 score=1 with age	Mental health emergency department visit in past year
Depression diagnosis in past 5 years with age	PHQ-9 item 9 score=2 with age
Suicide attempt diagnosis in past 5 years with Charlson score	PHQ-9 item 9 score=3 with PHQ-8 score
PHQ-9 item 9 score=2 with age	Bipolar disorder diagnosis in past 5 years with age
Anxiety disorder diagnosis in past 5 years with age	Depression diagnosis in past 5 years with age

^aInteraction terms are indicated by “with”; see Appendices 9B–9E in the online supplement for a complete list. PHQ-9=Patient Health Questionnaire; PHQ-8=Patient Health Questionnaire depression scale.

The left portion of Figure 1 presents ROC curves illustrating the sensitivity and specificity of suicide attempt predictions in the training and validation samples. The c-statistics (equivalent to area under the ROC curve) for prediction of suicide attempt in the validation samples were 0.851 (95% CI=0.848, 0.853) for mental health specialty visits and 0.853 (95% CI=0.849, 0.857) for primary care visits. In each graph, comparison of ROC curves shows no appreciable difference in prediction accuracy between the training and validation samples (i.e., no evidence of model overfitting). Table 3 compares predicted and observed risk for specific strata selected a priori. Among mental health specialty visits, the lowest two strata included 75% of all visits and 21% of all suicide attempts, and the highest three strata included 5% of visits and 43% of suicide attempts. Among primary care visits, the 75% of visits with the lowest risk scores accounted for 21% of suicide attempts, and the 5% of visits with the highest scores accounted for 48%. Comparison of predicted risk levels in the training sample and observed risk levels in the validation sample again shows no appreciable decline in model performance or evidence of model overfitting. Sensitivity analyses limited to diagnoses of definite self-harm slightly improved prediction accuracy (especially among primary care patients) but excluded approximately 25% of probable suicide attempts (see Appendix 4 in the online supplement). Sensitivity analyses limited to visits preceded by at least 5 years of complete data yielded essentially identical prediction accuracy (see Appendix 5 in the online supplement). Model fit was consistent across the seven participating health systems and across age and sex subgroups (see Appendix 8 in the online supplement).
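
For readers less familiar with the c-statistic, the following Python sketch computes it as the probability that a randomly chosen visit followed by an event has a higher risk score than a randomly chosen visit without one (ties count one-half), and adds a percentile bootstrap interval. The toy scores and outcomes are hypothetical. Resampling individual visits independently is an assumption of this sketch; the text does not state the resampling unit, and this simple version ignores clustering of visits within patients.

```python
import random

def c_statistic(scores, outcomes):
    """Concordance probability (area under the ROC curve), with ties counted as 1/2."""
    pairs = sorted(zip(scores, outcomes))
    n = len(pairs)
    rank_sum_events, n_events, i = 0.0, 0, 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1..j
        k = sum(1 for _, o in pairs[i:j] if o == 1)
        rank_sum_events += avg_rank * k
        n_events += k
        i = j
    n_non = n - n_events
    if n_events == 0 or n_non == 0:
        raise ValueError("need at least one event and one non-event")
    u = rank_sum_events - n_events * (n_events + 1) / 2.0
    return u / (n_events * n_non)

def bootstrap_ci(scores, outcomes, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples visits with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples lacking events or non-events
            stats.append(c_statistic([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[round((alpha / 2) * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical predicted risks and 90-day outcomes for eight visits.
scores = [0.02, 0.05, 0.10, 0.10, 0.30, 0.40, 0.70, 0.90]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
print(c_statistic(scores, outcomes), bootstrap_ci(scores, outcomes))
```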

TABLE 3. Classification Accuracy in Predefined Strata for Prediction of Suicide Attempts and Suicide Deaths Within 90 Days of a Mental Health or Primary Care Visit in Seven Health Systems, 2009–2015^a

Risk Score Percentile Strata	Predicted Risk^b (%)	Actual Risk^c (%)	% of All Attempts or Deaths^d	Standardized Event Ratio^e
Suicide attempts
Following a mental health specialty visit
>99.5th	13.0	12.7	10	20.7
99th to 99.5th	8.5	8.1	6	12.9
95th to 99th	4.1	4.2	27	6.7
90th to 95th	1.9	1.8	15	3.0
75th to 90th	0.9	0.9	21	1.4
50th to 75th	0.3	0.3	13	0.51
<50th	0.1	0.1	8	0.16
Following a primary care visit with a mental health diagnosis
>99.5th	8.6	8.0	15	30.5
99th to 99.5th	4.1	4.2	8	16.3
95th to 99th	1.6	1.6	25	6.2
90th to 95th	0.7	0.7	13	2.6
75th to 90th	0.3	0.3	18	1.2
50th to 75th	0.1	0.1	12	0.49
<50th	0.04	0.04	9	0.17
Suicide deaths
Following a mental health specialty visit
>99.5th	0.654	0.694	12	24.6
99th to 99.5th	0.638	0.595	11	21.5
95th to 99th	0.162	0.167	25	6.3
90th to 95th	0.068	0.088	16	2.3
75th to 90th	0.031	0.029	16	1.1
50th to 75th	0.014	0.015	13	0.54
<50th	0.003	0.003	6	0.12
Following a primary care visit with a mental health diagnosis
>99.5th	0.536	0.435	14	28.8
99th to 99.5th	0.181	0.197	7	13.0
95th to 99th	0.092	0.083	22	5.6
90th to 95th	0.035	0.038	13	2.5
75th to 90th	0.018	0.019	19	1.3
50th to 75th	0.009	0.009	15	0.62
<50th	0.003	0.003	10	0.19

^aPotential overfitting in the training sample is indicated by differences between predicted and actual risks.

^bPredicted risk in this stratum using final model predictors and coefficients in the training sample.

^cObserved risk in this stratum using final model predictors and coefficients in the validation sample.

^dPercentage of all suicide attempts or deaths occurring in this stratum in the validation sample.

^eRatio of observed risk in this stratum of the validation sample to average risk in the full validation sample.
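
The stratum-level quantities defined in footnotes d and e can be computed directly from visit and event counts, as in this minimal Python sketch. The stratum labels and counts are hypothetical, chosen only so the arithmetic is easy to follow; they are not the study data.

```python
def stratum_summary(strata):
    """strata: list of (label, n_visits, n_events).
    Returns observed risk, share of all events, and standardized event ratio per stratum."""
    total_visits = sum(n for _, n, _ in strata)
    total_events = sum(e for _, _, e in strata)
    overall_risk = total_events / total_visits
    rows = []
    for label, n, e in strata:
        risk = e / n
        rows.append({
            "stratum": label,
            "observed_risk": risk,
            "share_of_events": e / total_events,
            "ser": risk / overall_risk,  # stratum risk relative to average risk
        })
    return rows

# Hypothetical counts: 100,000 visits and 500 events in total.
strata = [(">95th", 5000, 215), ("75th to 95th", 20000, 180), ("<75th", 75000, 105)]
rows = stratum_summary(strata)
for row in rows:
    print(row)
```

The standardized event ratio equals a stratum's share of events divided by its share of visits, so a stratum holding 5% of visits and 43% of events has a ratio of 8.6.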

The same process was implemented for prediction of suicide deaths over 90 days, with separate models for mental health specialty and primary care visits. The clinical variables most strongly associated with suicide death in each group are listed in Table 2 (see Appendices 9D and 9E in the online supplement for a complete list). Predictors of suicide death were similar in mental health specialty and primary care patients, and were similar to predictors of suicide attempt.

The right portion of Figure 1 presents ROC curves for prediction of suicide death in the training and validation samples. The c-statistics for prediction of suicide death in the validation samples were 0.861 (95% CI=0.848, 0.875) for mental health specialty visits and 0.833 (95% CI=0.813, 0.853) for primary care visits. Comparison of ROC curves for the training and validation samples shows no evidence of overfitting in the mental health specialty sample and a minimal separation of training and validation curves in the primary care sample. Table 3 compares predicted and observed risk for risk strata selected a priori. Among mental health specialty visits, the lowest two risk strata included 75% of visits and 19% of suicide deaths, and the highest three risk strata included 5% of visits and 48% of suicide deaths. Among primary care visits, the 75% of visits with the lowest risk scores accounted for 25% of suicide deaths, and the 5% of visits with the highest scores accounted for 43%. Comparison of predicted risk levels in the training sample and observed risk levels in the validation sample shows no evidence of overfitting in the mental health specialty sample and a minimal falloff between the training and validation samples in the primary care sample. Sensitivity analyses limited to deaths coded as due to definite self-inflicted injury or poisoning found no meaningful difference in model fit (see Appendix 4 in the online supplement).

Table 4 lists sensitivity, specificity, positive predictive value, and negative predictive value for all four models at cut-points defined by percentiles of the risk score distribution.

TABLE 4. Performance Characteristics at Various Cut-Points for Prediction of Suicide Attempts and Suicide Deaths Within 90 Days of Visit in Seven Health Systems, 2009–2015^a

Risk Score Percentile Cut-Points	Sensitivity (%)	Specificity (%)	PPV (%)	NPV (%)
Suicide attempts
Following mental health specialty visits
>99th	16.8	99.1	10.4	99.4
>95th	43.7	95.2	5.4	99.6
>90th	58.3	90.3	3.6	99.7
>75th	79.2	75.2	2.0	99.8
>50th	92.1	50.0	1.1	99.9
Following primary care visits with a mental health diagnosis
>99th	23.5	99.1	6.1	99.8
>95th	48.2	95.1	2.5	99.9
>90th	61.0	90.1	1.6	99.9
>75th	79.1	75.1	0.8	99.9
>50th	91.4	50.1	0.5	99.9
Suicide deaths
Following mental health specialty visits
>99th	23.1	99.0	0.62	99.9
>95th	48.1	95.0	0.26	99.9
>90th	64.3	90.0	0.17	99.9
>75th	80.4	75.1	0.08	99.9
>50th	94.0	50.0	0.05	99.9
Following primary care visits with a mental health diagnosis
>99th	20.9	99.0	0.31	99.9
>95th	43.1	95.0	0.13	99.9
>90th	55.7	90.0	0.08	99.9
>75th	74.8	75.1	0.05	99.9
>50th	90.3	50.0	0.03	99.9

^aPPV=positive predictive value; NPV=negative predictive value.
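
The low positive predictive values in Table 4 follow from the low base rate of events. As a rough consistency check, the Python sketch below applies Bayes' rule to the sensitivity and specificity reported for suicide attempts after mental health specialty visits at the 95th percentile cut-point (43.7% and 95.2%), using the 0.62% attempt rate from the validation sample in Table 1 as the prevalence. The function name is hypothetical, and treating the Table 1 rate as the exact prevalence is an approximation.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Inputs taken from Table 4 (>95th percentile row) and Table 1 (validation sample).
ppv, npv = predictive_values(sensitivity=0.437, specificity=0.952, prevalence=0.0062)
print(f"PPV = {100 * ppv:.1f}%, NPV = {100 * npv:.1f}%")
```

Under these inputs the calculation gives a positive predictive value of about 5.4% and a negative predictive value of about 99.6%, in line with the corresponding row of Table 4.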

Discussion

In a sample of 20 million visits by 3 million patients in seven health systems, data from electronic health records accurately stratified mental health specialty and primary care visits according to short-term risk of suicide attempt or suicide death. Observed rates of probable suicide attempt and suicide death were over 200 times as high following visits in the highest 1% of predicted risk compared with visits in the bottom half of predicted risk (Table 3). The strongest predictors included mental health diagnoses, substance use diagnoses, use of mental health emergency and inpatient care, and history of self-harm. The absolute risk was lower in primary care, but the predictors selected and the accuracy of prediction were similar across care settings. Responses on the PHQ-9 were selected as important predictors, even though such data were available for only 15% of visits.

Potential Limitations

In interpreting these findings, we should consider both false positive and false negative errors in the ascertainment of probable suicide attempts and deaths. Previous research suggests that false positive rates are near zero for suicide deaths diagnosed by medical examiners (20) and below 20% for diagnoses of definite or possible self-inflicted injury in records from these health systems (7) (see also Appendix 2 in the online supplement). Diagnostic data do not distinguish between self-harm with and without intent to die. Consequently, our definition of probable suicide attempt may include a small proportion of self-harm episodes without suicidal intent. False negative errors may be more common. Up to one-quarter of suicide deaths may not be identified by medical examiners (19). Health system records will not capture suicide attempts when people do not seek care or when providers do not recognize and record diagnoses of self-harm. Nonspecific error (either false positive or false negative) would lead to underestimating the accuracy of prediction models (see Appendix 4 in the online supplement), whereas selective error in the wrong direction (e.g., underascertainment of suicide attempts in patients with low risk scores) could lead to overestimation of model performance.

Health system records do not reflect important social risk factors for suicidal behavior, such as job loss, bereavement, and relationship disruption. Suicidal behavior likely reflects the intersection of clinical risk factors, negative life events, and access to means of self-harm. Data regarding those social risk factors would likely improve accuracy of prediction.

Our analyses do not consider the one-third to one-half of people who attempt suicide or die by suicide who have no recent mental health treatment or recorded diagnosis (3, 4, 29). Prediction using electronic health record data may also prove useful among patients without recorded mental health diagnoses, but prediction models would necessarily be limited to general medical diagnoses and utilization rather than the mental health diagnoses and treatments selected in this sample.

Methodologic Considerations

We focused on risk over 90 days following an outpatient visit. Risk does vary between visits (30), and near-term risk is most relevant to clinical decisions and quality improvement (31). The interventions that providers or health systems might provide for high-risk patients would typically be delivered over weeks or months (32, 33). Predictors selected in these models (Table 2) include both recent or short-term factors and long-term factors, consistent with previous research (7, 30) indicating that suicidal behavior is influenced by both stable and variable risk factors. Sensitivity analyses using a 30-day outcome window (see Appendix 7 in the online supplement) yielded similar results regarding both predictors selected and accuracy of prediction. Analyses regarding longer-term risk might identify different predictors of suicidal behavior.

Of predictive modeling methods, parametric methods like LASSO lie closest to traditional regression. Nonparametric methods (34) such as random forest could theoretically improve accuracy of prediction. Direct comparisons to date (12, 35), however, have found equal or superior prediction using parametric methods similar to those used here. Nonparametric methods may have little advantage when predictors are dichotomous, such as the diagnosis and utilization indicators included in our models. Parametric models are usually more transparent to clinicians (36) and simpler to implement in electronic health records, as is now under way in these health systems and the Veterans Health Administration (35).

Variable selection models are subject to overfitting or selection of predictive relationships idiosyncratic to a specific sample. The large sample used for training of these models offers some protection against overfitting. In addition, we present explicit comparisons of performance in the training and randomly selected validation samples for all four models (see Table 3 and Figure 1), finding no indication of overfitting in prediction of suicide attempts or prediction of suicide deaths following mental health specialty visits. We do find a slight indication of overfitting in prediction of suicide deaths following primary care visits, likely reflecting the smaller number of events included in these models. Nevertheless, the overall accuracy of prediction (c-statistic) in the independent validation sample exceeds 0.80 for all four models.

In addition to evaluating overfitting within this sample, we should consider generalizability to other care settings or patient populations. This sample included almost 20 million visits in seven health systems serving patients in nine states, including states with high and low rates of suicide mortality. Patients were broadly representative of those service areas in race/ethnicity, socioeconomic status, and source of insurance coverage, including substantial numbers insured by Medicare and Medicaid. Methods could be easily transported to health systems with standard electronic health records and insurance claim databases. Predicted risk levels, however, could be over- or underestimated in settings with higher or lower average risk of suicidal behavior. The predictors selected and the accuracy of prediction could differ in settings with different patterns of mental health care, especially if patterns of diagnosis or utilization were less closely linked to risk of suicidal behavior. The introduction of effective suicide prevention programs could also weaken the relationship between these identified risk predictors and subsequent suicidal behavior. Consequently, we recommend replication in other health systems prior to broad application. All information necessary for replication is available via our online repository.

Context

These empirically derived risk scores outperformed risk stratification based solely on item 9 of the PHQ-9. Regarding sensitivity, selecting mental health visits with any positive response to item 9 would identify only two-thirds of subsequent suicide attempts and deaths (7), whereas selecting visits with risk scores above the 75th percentile would identify 80%. Regarding efficient identification of high risk, selecting the 6% of visits with a response of “more than half the days” or “nearly every day” would identify one-third of subsequent suicide attempts and deaths (7), whereas selecting the 5% of visits with the highest risk scores would identify almost half.

Predictors identified in these models included a range of demographic characteristics, mental health diagnoses, and historical indicators of mental health treatment generally similar to those identified in previous research (9, 12, 13). Based on results in validation samples, performance of these prediction models equaled or exceeded that of other published models using health records to predict suicidal behavior (8–13), where c-statistics ranged from 0.67 to 0.84. These models substantially outperformed other published models predicting suicidal behavior after an outpatient visit, a question of high interest to a wide range of mental health and primary care providers. In this sample, mental health specialty visits with risk scores in the top 5% accounted for 43% of suicide attempts and 48% of suicide deaths in the following 90 days, and primary care visits in the top 5% accounted for 48% of subsequent suicide attempts and 43% of subsequent suicide deaths. For comparison, in two previous models predicting suicidal behavior following outpatient visits (12, 13), the top 5% of patients accounted for between one-quarter and one-third of subsequent suicide attempts and deaths. This improved prediction likely reflects differences in data and methods. First, longitudinal records in integrated health systems may allow more complete ascertainment of risk factors. Second, our analyses consider a larger number of potential predictors and more detailed temporal encoding. Third, responses to PHQ-9 item 9 contributed to prediction, even though such data were available for only 10%–20% of visits. Prediction accuracy would likely improve with greater use of the PHQ-9 or similar measures, as is expected with new initiatives promoting routine outcome assessment (37) and identification of suicidal ideation (5).
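The "top 5%" comparisons above rest on a simple calculation: rank visits by risk score, flag the highest-scoring fraction, and count the share of all subsequent events that fall in the flagged group. A minimal sketch in Python follows; the function name `share_of_events_in_top` and its arguments are hypothetical, and the sketch is illustrative rather than the code used in this study.

```python
def share_of_events_in_top(scores, outcomes, top_fraction=0.05):
    """Fraction of all events that occur among the highest-scoring visits.

    scores   -- predicted risk for each visit
    outcomes -- 1 if an event followed the visit, else 0
    """
    total_events = sum(outcomes)
    if total_events == 0:
        raise ValueError("no events in sample")
    # Flag at least one visit, even in very small samples.
    n_flagged = max(1, round(len(scores) * top_fraction))
    # Visit indices ordered from highest to lowest risk score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    events_flagged = sum(outcomes[i] for i in ranked[:n_flagged])
    return events_flagged / total_events
```

Applied to a validation sample with `top_fraction=0.05`, this returns the quantity reported above as the percentage of attempts or deaths accounted for by visits in the top 5% of risk scores.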

The c-statistics for these suicide prediction models also exceed those for models using health record data to predict rehospitalization for heart failure (38), in-hospital mortality from sepsis (39), and high emergency department utilization (40). Suicidal behavior may be more predictable than many adverse medical outcomes.

Among mental health specialty visits, a cut-point at the 95th percentile of risk had a positive predictive value of 5.4% for suicide attempt within 90 days. While that predictive value would be inadequate for a diagnostic test, it is similar or superior to widely accepted tools for prediction of major medical outcomes such as stroke in atrial fibrillation (41) and cardiovascular events (42). Furthermore, predictive values or expected event rates for widely accepted medical prediction tools often include adverse outcomes accumulated over many years (41, 42), rather than the 90-day risk period considered in these analyses.
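The modest positive predictive value of a well-discriminating model follows from the rarity of the outcome. The sketch below, in Python with hypothetical function names, computes positive predictive value directly from scored visits and also from the identity PPV = sensitivity × event rate ÷ fraction of visits flagged. The numbers in the usage note are invented for illustration and are not study results.

```python
def ppv_at_cut_point(scores, outcomes, cut_point):
    """Proportion of visits at or above the cut-point followed by an event."""
    flagged = [y for s, y in zip(scores, outcomes) if s >= cut_point]
    if not flagged:
        raise ValueError("no visits at or above the cut-point")
    return sum(flagged) / len(flagged)


def ppv_from_sensitivity(sensitivity, event_rate, flagged_fraction):
    """Bayes' rule: P(event | flagged) =
    P(flagged | event) * P(event) / P(flagged)."""
    if not 0 < flagged_fraction <= 1:
        raise ValueError("flagged_fraction must be in (0, 1]")
    return sensitivity * event_rate / flagged_fraction
```

For example, if flagging 5% of visits captured half of all events and events followed 1% of visits overall, `ppv_from_sensitivity(0.5, 0.01, 0.05)` gives a positive predictive value of about 10%, even though the flagged group carries ten times the average risk.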

Clinical Implications

Some recent discussions of predictive modeling in health care warn that reliance on algorithms could lead to inappropriate causal inference (43–45) or atrophy of clinician judgment (43). Regarding the first point, associations identified by our model should certainly not be interpreted as evidence for independent or causal relationships. For example, a recent benzodiazepine prescription is more likely a marker of increased risk than a cause of suicidal behavior. We report predictors selected (Table 2) to demonstrate that all are expected correlates of suicidal behavior, albeit in specific combinations within specific time periods. Regarding the second point, our model and other models predicting suicidal behavior from records data rely largely on the diagnostic and treatment decisions of treating clinicians. The predictors identified by our analyses would be well known to most mental health providers. Predictive models simply allow us to consistently combine millions of providers’ individual judgments to accurately predict an important but rare event (45).

Prediction models cannot replace clinical judgment, but risk scores can certainly inform both individual clinical decisions and quality improvement programs. Participating health systems now recommend completion of a structured suicide risk assessment (46) after any response of “more than half the days” or “nearly every day” to PHQ-9 item 9, a response implying a 90-day risk of suicide attempt of 2%–3% (7). A predicted 90-day risk exceeding 5% (i.e., above the 95th percentile for mental health specialty visits) would seem to warrant a similar level of additional assessment. A predicted 90-day suicide attempt risk exceeding 10% (i.e., above the 99th percentile for mental health specialty visits) should warrant creation of a personal safety plan and counseling regarding reducing access to means of self-harm (47, 48). Accurate risk stratification can also inform providers’ and health systems’ decisions regarding frequency of follow-up, referral for intensive treatment, or outreach following missed or canceled appointments (31, 49). Implementing these risk-based care pathways and outreach programs is a central goal of the Zero Suicide prevention model recommended by the U.S. National Action Alliance for Suicide Prevention (48). Empirically derived risk predictions can be an important component of that national suicide prevention strategy.
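The two thresholds discussed above could be encoded as a simple lookup when risk scores are surfaced in an electronic health record. The following Python sketch is illustrative only: the function name, the tier labels, and the "usual care" default are assumptions, and the 5% and 10% thresholds are taken from the text rather than from any deployed system. Such a mapping would prompt, not replace, clinician assessment.

```python
def suggested_follow_up(predicted_90_day_risk):
    """Map a predicted 90-day suicide attempt risk (0 to 1) to the level
    of follow-up discussed in the text. Labels are hypothetical."""
    if not 0.0 <= predicted_90_day_risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    if predicted_90_day_risk > 0.10:
        # Above roughly the 99th percentile for mental health specialty visits.
        return "safety plan and counseling on access to means"
    if predicted_90_day_risk > 0.05:
        # Above roughly the 95th percentile for mental health specialty visits.
        return "structured suicide risk assessment"
    # Assumed default; the text does not specify a pathway below 5%.
    return "usual care"
```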

From the Kaiser Permanente Washington Health Research Institute, Seattle; the Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena; the HealthPartners Institute, Minneapolis; the Center for Health Services Research, Henry Ford Health System, Detroit; the Center for Health Research, Kaiser Permanente Northwest, Portland, Oreg.; the Institute for Health Research, Kaiser Permanente Colorado, Denver; and the Center for Health Research, Kaiser Permanente Hawaii, Honolulu.

Address correspondence to Dr. Simon ([email protected]).

Supported by cooperative agreement U19 MH092201 with NIMH.

Dr. Simon, Mr. Johnson, Dr. Lawrence, Dr. Lynch, Dr. Beck, Dr. Waitzfelder, Ms. Ziebell, Dr. Penfold, and Dr. Shortreed are employees of Kaiser Permanente. Dr. Simon has received research grants from Otsuka and Novartis. Dr. Penfold has received research funding from Janssen. Dr. Shortreed has worked on grant projects awarded to Kaiser Permanente Washington Health Research Institute (KPWHRI) by Pfizer and is a co-investigator on grant projects awarded to KPWHRI from Syneos Health, which is representing a consortium of pharmaceutical companies carrying out FDA-mandated studies regarding the safety of extended-release opioids. The other authors report no financial relationships with commercial interests.

References

1 Kochanek KD, Murphy SL, Xu JQ, et al.: NCHS Data Brief: Mortality in the United States, 2016. Hyattsville, Md, National Center for Health Statistics, 2017

2 Centers for Disease Control and Prevention: Web-Based Injury Statistics Query and Reporting System (WISQARS), Nonfatal Injury Reports, 2000–2014. https://webappa.cdc.gov/sasweb/ncipc/nfirates.html

3 Ahmedani BK, Simon GE, Stewart C, et al.: Health care contacts in the year before suicide death. J Gen Intern Med 2014; 29:870–877

4 Ahmedani BK, Stewart C, Simon GE, et al.: Racial/ethnic differences in health care visits made before suicide attempt across the United States. Med Care 2015; 53:430–435

5 Patient Safety Advisory Group: Detecting and treating suicidal ideation in all settings. Chicago, Joint Commission Sentinel Event Alerts, 2016, issue 56 (https://www.jointcommission.org/assets/1/18/SEA_56_Suicide.pdf)

6 Franklin JC, Ribeiro JD, Fox KR, et al.: Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull 2017; 143:187–232

7 Simon GE, Coleman KJ, Rossom RC, et al.: Risk of suicide attempt and suicide death following completion of the Patient Health Questionnaire depression module in community practice. J Clin Psychiatry 2016; 77:221–227

8 McCarthy JF, Bossarte RM, Katz IR, et al.: Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US Department of Veterans Affairs. Am J Public Health 2015; 105:1935–1942

9 Kessler RC, Warner CH, Ivany C, et al.: Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry 2015; 72:49–57

10 Walsh CG, Ribeiro JD, Franklin JC: Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5:457–469

11 McCoy TH Jr, Castro VM, Roberson AM, et al.: Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73:1064–1071

12 Kessler RC, Stein MB, Petukhova MV, et al.: Predicting suicides after outpatient mental health visits in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Mol Psychiatry 2017; 22:544–551

13 Barak-Corren Y, Castro VM, Javitt S, et al.: Predicting suicidal behavior from longitudinal electronic health records. Am J Psychiatry 2017; 174:154–162

14 Ross TR, Ng D, Brown JS, et al.: The HMO Research Network Virtual Data Warehouse: a public data model to support collaboration. EGEMS (Wash DC) 2014; 2:1049

15 Charlson M, Szatrowski TP, Peterson J, et al.: Validation of a combined comorbidity index. J Clin Epidemiol 1994; 47:1245–1251

16 Kroenke K, Spitzer RL, Williams JB, et al.: The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. Gen Hosp Psychiatry 2010; 32:345–359

17 Lu CY, Stewart C, Ahmed AT, et al.: How complete are E-codes in commercial plan claims databases? Pharmacoepidemiol Drug Saf 2014; 23:218–220

18 Stewart C, Crawford PM, Simon GE: Changes in coding of suicide attempts or self-harm with transition from ICD-9 to ICD-10. Psychiatr Serv 2017; 68:215

19 Bakst SS, Braun T, Zucker I, et al.: The accuracy of suicide statistics: are true suicide deaths misclassified? Soc Psychiatry Psychiatr Epidemiol 2016; 51:115–123

20 Cox KL, Nock MK, Biggs QM, et al.: An examination of potential misclassification of army suicides: results from the Army Study to Assess Risk and Resilience in Servicemembers. Suicide Life Threat Behav 2017; 47:257–265

21 Tibshirani R: Regression shrinkage and selection via the lasso. J R Stat Soc B 1996; 58:267–288

22 Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning, 2nd ed. New York, Springer, 2009

23 Kass RE, Raftery AE: Bayes factors. J Am Stat Assoc 1995; 90:773–795

24 Egan JP: Signal Detection Theory and ROC Analysis. New York, Springer Academic Press, 1975

25 Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29–36

26 Bradley AP: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 1997; 30:1145–1159

27 Friedman J, Hastie T, Tibshirani R: Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010; 33:1–22

28 Weston S: Foreach looping construct for R, R Package, version 1.4.3, 2015

29 Han B, Compton WM, Gfroerer J, et al.: Mental health treatment patterns among adults with recent suicide attempts in the United States. Am J Public Health 2014; 104:2359–2368

30 Simon GE, Shortreed SM, Johnson E, et al.: Between-visit changes in suicidal ideation and risk of subsequent suicide attempt. Depress Anxiety 2017; 34:794–800

31 Olfson M, Marcus SC, Bridge JA: Focusing suicide prevention on periods of high risk. JAMA 2014; 311:1107–1108

32 Brown GK, Ten Have T, Henriques GR, et al.: Cognitive therapy for the prevention of suicide attempts: a randomized controlled trial. JAMA 2005; 294:563–570

33 Comtois KA, Linehan MM: Psychosocial treatments of suicidal behaviors: a practice-friendly review. J Clin Psychol 2006; 62:161–170

34 Dreiseitl S, Ohno-Machado L: Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002; 35:352–359

35 Kessler RC, Hwang I, Hoffmire CA, et al.: Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res 2017; 26:26

36 Adkins DE: Machine learning and electronic health records: a paradigm shift. Am J Psychiatry 2017; 174:93–94

37 HEDIS Depression Measures Specified for Electronic Clinical Data Systems. http://www.ncqa.org/HEDISQualityMeasurement/HEDISLearningCollaborative/HEDISDepressionMeasures.aspx

38 Frizzell JD, Liang L, Schulte PJ, et al.: Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol 2017; 2:204–209

39 Taylor RA, Pare JR, Venkatesh AK, et al.: Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data–driven, machine learning approach. Acad Emerg Med 2016; 23:269–278

40 Frost DW, Vembu S, Wang J, et al.: Using the electronic medical record to identify patients at high risk for frequent emergency department visits and high system costs. Am J Med 2017; 130:601.e17–601.e22

41 Lip GY: Can we predict stroke in atrial fibrillation? Clin Cardiol 2012; 35(suppl 1):21–27

42 Rana JS, Tabada GH, Solomon MD, et al.: Accuracy of the atherosclerotic cardiovascular risk equation in a large contemporary, multiethnic population. J Am Coll Cardiol 2016; 67:2118–2130

43 Cabitza F, Rasoini R, Gensini GF: Unintended consequences of machine learning in medicine. JAMA 2017; 318:517–518

44 Chen JH, Asch SM: Machine learning and prediction in medicine: beyond the peak of inflated expectations. N Engl J Med 2017; 376:2507–2509

45 Obermeyer Z, Emanuel EJ: Predicting the future: big data, machine learning, and clinical medicine. N Engl J Med 2016; 375:1216–1219

46 Posner K, Brown GK, Stanley B, et al.: The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry 2011; 168:1266–1277

47 Rossom RC, Simon GE, Beck A, et al.: Facilitating action for suicide prevention by learning health care systems. Psychiatr Serv 2016; 67:830–832

48 Hogan MF, Grumet JG: Suicide prevention: an emerging priority for health care. Health Aff (Millwood) 2016; 35:1084–1090

49 Miller IW, Camargo CA Jr, Arias SA, et al.: Suicide prevention in an emergency department population: the ED-SAFE study. JAMA Psychiatry 2017; 74:563–570

Volume 175
Issue 10

October 01, 2018
Pages 951-960

This article is featured in AJP Audio.

History

Received 27 October 2017

Revised 11 February 2018

Accepted 19 March 2018

Published online 24 May 2018

Published in print 1 October 2018
