Editorial
How Good Are Observational Studies in Assessing Psychiatric Treatment Outcomes?
T. Michael Kashner, Ph.D., J.D.
Am J Psychiatry 2012;169:244-247. doi:10.1176/appi.ajp.2012.12010005
From Loma Linda University Medical School, Loma Linda, Calif.; University of Texas Southwestern Medical Center at Dallas; and the Office of Academic Affiliations, Department of Veterans Affairs, Washington, D.C.

Editorial accepted for publication January 2012.

Dr. Kashner reports no financial relationships with commercial interests.

Address correspondence to Dr. Kashner (michael.kashner@va.gov).

Copyright © American Psychiatric Association

In comparative effectiveness studies, treatment effects are calculated by contrasting outcomes between patients who have been assigned to different treatment groups. While randomized treatment assignments are preferred, constraints on resources, timeliness of results, ethical concerns, low frequency of outcomes, and demands for patient subgroup analyses often lead psychiatric investigators to rely on observational data after patients and their physicians have self-selected their treatments (1). With observational studies, researchers must account for how patients select their treatment to properly adjust estimates of treatment effects that control for these selection biases.

In this issue of the Journal, Leon et al. (2) provide a strong case for an informative observational study. Their analysis was conducted to further test the 2009 Food and Drug Administration warnings that suicidal behavior could accompany the use of antiepileptic medications. To account for selection biases in this observational cohort, the authors applied an advanced statistical method called “propensity scoring” (3, 4). When the authors analyzed panel data for 199 participants with bipolar disorder who were followed for 30 years, they found no association between antiepileptic medication use and risk of suicide attempts or completed suicides.

The natural question for the practitioner is: Do these analytic techniques lead to scientifically valid findings that can guide clinical decisions? To make this judgment, it might be helpful to explain how statisticians control for selection biases.

Leon et al. (2) computed a treatment effect size by comparing suicidal outcomes (rate of suicide attempt or suicide) between treatment groups (patients who were exposed and who were not exposed to antiepileptic medications) after adjusting for differences in patient demographic and clinical factors. To yield valid findings, these adjustments must be based on all relevant confounding factors and computed using a correctly specified outcomes model.

To be relevant, confounding factors must 1) vary across treatment groups and 2) be expected to have an impact on patient outcomes directly. Randomized treatment assignments that yield equivalent treatment groups are said to be unconditionally “exogenous.” Thus, no factor will vary across treatment groups, and calculating effect sizes is reduced to simple comparisons of outcomes across treatment groups. However, when patients and their physicians self-select treatments, treatment groups are not expected to be equivalent. Researchers analyzing the outcomes must then identify all confounding factors (e.g., in the Leon et al. study, clinical and demographic characteristics), determine covariates from the data set to measure these confounding factors (e.g., in this case, prior symptom severity, suicidal behaviors, and comorbidities as clinical factors and socioeconomic status, marital status, age, and gender as demographic factors), and then specify an outcomes model to compute effect sizes that are adjusted for these covariates (e.g., here, a mixed-effect, grouped-time survival model).

Outcomes models specify outcome as the dependent variable (here, time between initial period and onset of suicidal behavior, if any) and the confounding covariates, along with a treatment indicator variable, as the independent variables. Treatment indicators assume a value of 1 when the patient selects the treatment of interest (e.g., exposed to an antiepileptic in the initial period) and a value of 0 otherwise (e.g., not exposed). The outcomes model is fitted to the data set, and the effect size is computed from estimates of the model parameters.
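To make that structure concrete, here is a minimal sketch in Python with entirely hypothetical data: a continuous outcome is regressed on an intercept, a 0/1 treatment indicator, and one confounding covariate, and the adjusted treatment effect is read off the indicator's coefficient. (Leon et al. fit a mixed-effect, grouped-time survival model; ordinary least squares is used here only to illustrate how the indicator and covariates enter the model.)

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical rows: (treatment indicator 0/1, baseline severity, outcome score).
rows = [(1, 3.0, 7.2), (1, 2.5, 6.4), (1, 1.0, 3.9), (0, 2.8, 6.0),
        (0, 1.2, 3.1), (0, 0.5, 1.8), (1, 0.8, 3.5), (0, 2.0, 4.6)]

# Design matrix: intercept, treatment indicator, confounding covariate.
X = [[1.0, t, sev] for t, sev, _ in rows]
y = [out for _, _, out in rows]

# Ordinary least squares via the normal equations (X'X) beta = X'y.
XtX = [[sum(a[i] * a[j] for a in X) for j in range(3)] for i in range(3)]
Xty = [sum(a[i] * yi for a, yi in zip(X, y)) for i in range(3)]
beta = solve(XtX, Xty)
print("adjusted treatment effect:", round(beta[1], 3))
```

The covariate-adjusted effect (here roughly 1.0) is smaller than the raw difference in group means, because part of the raw difference is driven by the treated group's higher baseline severity.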

Few medical data sets will contain all relevant confounding factors (e.g., patient access to the means to commit suicide, patient access to psychiatric care for symptom relief). To account for these unobserved covariates, instrumental variables (5) are added to the list of independent variables in the outcomes model (e.g., here, the geographic location of patient residence, reflecting variations in gun regulations, drug trafficking enforcement, and availability of psychiatric services). Instruments must be observable in the data set, vary by treatment group, and be associated with one or more of the unobserved confounding factors. Unlike covariates, instruments are not expected to directly drive patient outcomes. Thus, any association observable in the data set between an instrumental variable and outcomes variables can be attributed to the instrument's association with one or more unobserved factors. If the observable covariates and instrumental variables included in the outcomes model reflect all relevant confounding factors, we say that the treatment assignment is exogenous conditional to the data, or “conditionally exogenous.”

The second problem is how to specify the outcomes model. Outcomes models that do not reflect the data set's true “data-generating process” are said to be misspecified (6). Adjusting for confounding factors using misspecified models could also lead to incorrect estimates of effect size (7).

To address both the exogeneity and specification problems in their outcomes model, Leon et al. summarized both covariates and instruments into a single score. This score was estimated by fitting a second model to the data set. Unlike outcomes models, these propensity models are designed to predict treatment assignment (e.g., exposed or not exposed to treatment with antiepileptics during the initial period), with covariates and instruments as independent variables. Effect sizes are computed by comparing outcomes between exposed and unexposed patients who have been matched by their respective propensity scores.
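The two-stage logic can be sketched as follows, again with hypothetical data: a logistic propensity model predicts treatment assignment from two covariates, and each exposed subject is then matched to the unexposed subject with the nearest fitted score. This is an illustrative simplification (nearest-neighbor matching), not the grouped propensity strata and survival model that Leon et al. used.

```python
import math

# Hypothetical rows: (severity covariate, SES covariate, treated 0/1, outcome).
rows = [(3.0, 1.0, 1, 7.2), (2.5, 1.0, 1, 6.4), (1.0, 0.0, 1, 3.9),
        (0.8, 1.0, 1, 3.5), (2.8, 0.0, 0, 6.0), (1.2, 1.0, 0, 3.1),
        (0.5, 0.0, 0, 1.8), (2.0, 0.0, 0, 4.6)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stage 1: fit a logistic propensity model P(treated | covariates)
# by stochastic gradient ascent on the log-likelihood.
w = [0.0, 0.0, 0.0]                      # intercept, severity, SES weights
for _ in range(2000):
    for sev, ses, t, _ in rows:
        p = sigmoid(w[0] + w[1] * sev + w[2] * ses)
        for j, xj in enumerate((1.0, sev, ses)):
            w[j] += 0.05 * (t - p) * xj  # gradient step toward observed assignment

score = [sigmoid(w[0] + w[1] * s + w[2] * e) for s, e, _, _ in rows]
treated = [i for i, r in enumerate(rows) if r[2] == 1]
control = [i for i, r in enumerate(rows) if r[2] == 0]

# Stage 2: match each treated subject to the control with the nearest
# propensity score, then average the matched outcome differences.
diffs = []
for i in treated:
    j = min(control, key=lambda c: abs(score[c] - score[i]))
    diffs.append(rows[i][3] - rows[j][3])
effect = sum(diffs) / len(diffs)
print("matched effect estimate:", round(effect, 3))
```

Note that the covariates influence the effect estimate only through the fitted score, which is exactly the source of the misspecification risk discussed below.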

There are advantages to the Leon et al. approach. Combining covariates and instruments into a single score 1) reduces the number of free parameters in the outcomes model and thus increases power to detect treatment effect sizes; 2) permits more variables to be included in the analyses of small sample sizes; and 3) reduces the exogeneity problem to searching for variables that predict treatment assignment and the specification problem to determining how patients should be divided into discrete propensity groups.

But these advantages do not come without a price. The more successfully the propensity model predicts treatment assignment, the less likely it will be to find untreated and treated patients with matchable propensity scores (e.g., in the Leon et al. study, 21% of sampled patient-time intervals could not be matched). Replacing covariates and instruments by a single score may introduce a misspecification error because the impact of each variable on outcomes is assessed only through its association with the propensity score. When the study's purpose is to determine whether exposure to antiepileptic medication increases hazard rates for suicidal behaviors, what is needed is the propensity for suicidal behaviors, rather than the propensity for medication exposure. For instance, both severe symptoms and low socioeconomic status are positively associated with suicidal behaviors (8), while severe symptoms but high socioeconomic status often drive the decision to use medication (2). If these characteristics hold, then low-socioeconomic-status patients with severe symptoms would have a very different initial suicidal behavior profile than their high-socioeconomic-status counterparts with mild symptoms, although the two groups may have comparable propensity scores.

While citing prior successes is informative, findings should be tested for robustness each time an analytic method is applied to a given data set. Leon et al. did show that results were stable across different approaches to classifying patients into discrete propensity groups. However, more can be done here to help the practitioner judge the validity of the reported findings. For instance, a test for robustness inspired by White and Lu (9) and Rubin and Thomas (10) involves recomputing effect size estimates in which exposed and unexposed patients are rematched based on the propensity score plus one or more selected confounding covariates (e.g., propensity scores and socioeconomic status). Since both matched and rematched estimates are designed to measure the same effect size, any observed difference would allow the investigator to reject the null hypothesis that estimates were robust. By repeating across different sets of selected covariates (e.g., propensity and marital status, propensity and age group), the rematched sample that yields the greatest deviation from the original effect size estimate can be determined and tested for significance by bootstrapping the original data set.
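The rematching check can be sketched as follows, with hypothetical propensity scores and outcomes: the effect size is recomputed after forcing matches to also agree on one covariate (here an SES indicator), and the rematched estimate is compared with the propensity-only estimate. In the full procedure the observed deviation would be judged against a bootstrap distribution; this sketch shows only the rematching step.

```python
# Hypothetical rows: (propensity score, SES 0/1, treated 0/1, outcome).
rows = [(0.81, 1, 1, 7.2), (0.74, 1, 1, 6.4), (0.35, 0, 1, 3.9),
        (0.62, 0, 1, 3.5), (0.44, 0, 0, 6.0), (0.58, 1, 0, 3.1),
        (0.21, 0, 0, 1.8), (0.39, 0, 0, 4.6)]

treated = [r for r in rows if r[2] == 1]
control = [r for r in rows if r[2] == 0]

def match_effect(dist):
    """Average treated-minus-control outcome over nearest-neighbor matches."""
    diffs = []
    for t in treated:
        c = min(control, key=lambda c: dist(t, c))
        diffs.append(t[3] - c[3])
    return sum(diffs) / len(diffs)

# Original matching: propensity score only.
base = match_effect(lambda t, c: abs(t[0] - c[0]))

# Rematching: propensity score plus (near-exact) agreement on SES,
# enforced here with a large penalty for SES mismatch.
rematch = match_effect(lambda t, c: abs(t[0] - c[0]) + 10.0 * abs(t[1] - c[1]))

print("propensity-only:", round(base, 3), "rematched:", round(rematch, 3))
```

In this toy data set the two estimates diverge because one treated subject's nearest-score control has a different SES value; under the null hypothesis of robustness, the two estimates should agree up to sampling noise.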

This discussion is intended to point to an “analysis gap” that exists between advanced analytic methods that are known among mathematical and computational statisticians and actual methods that medical researchers apply in observational studies. The Leon et al. study offers a good example of methodologists and clinical investigators working closely together to narrow that gap and apply advanced statistical methods to observational outcome studies. As the National Institutes of Health continues its support for observational studies (1), medical researchers should, rather than restating theory, reciting prior successes, or limiting results to those computable with a popular commercial software program, comb the statistical literature, apply the best analytic methods for their study purpose, and test the applicability of such methods against their data set. Only then can practitioners have confidence that observational findings are offering correct statistical inferences on the risks and benefits of medical treatments.

References

1. Lauer MS, Collins FS: Using science to improve the nation's health system: NIH's commitment to comparative effectiveness research. JAMA 2010; 303:2182–2183
2. Leon AC, Solomon DA, Li C, Fiedorowicz JG, Coryell WH, Endicott J, Keller MB: Antiepileptic drugs for bipolar disorder and the risk of suicidal behavior: a 30-year observational study. Am J Psychiatry 2012; 169:285–291
3. Rubin DB: Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997; 127:757–763
4. Cepeda MS, Boston R, Farrar JT, Strom BL: Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003; 158:280–287
5. Heckman JJ, Vytlacil EJ: Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc Natl Acad Sci USA 1999; 96:4730–4734
6. Golden RM, Henley SS, White H, Kashner TM: New directions in information matrix testing: eigenspectrum tests, in Causality, Prediction, and Specification Analysis: Recent Advances and Future Directions. Edited by Swanson NR. New York, Springer (in press)
7. Kashner TM, Henley SS, Golden RM, Rush AJ, Jarrett RB: Assessing the preventive effects of cognitive therapy following relief of depression: a methodologic innovation. J Affect Disord 2007; 104:251–261
8. Qin P, Agerbo E, Mortensen PB: Suicide risk in relation to socioeconomic, demographic, psychiatric, and familial factors: a national register-based study of all suicides in Denmark, 1981–1997. Am J Psychiatry 2003; 160:765–772
9. White H, Lu X: Robustness checks and robustness tests in applied economics (discussion paper). San Diego, University of California San Diego, Department of Economics, 2010
10. Rubin DB, Thomas N: Combining propensity score matching with additional adjustments for prognostic covariates. J Am Stat Assoc 2000; 95:573–585

