Editorials

Overt and Hidden Bias in Large Observational Studies

In this issue, Köhler et al. (1) present interesting findings on the antidepressant effect of the concomitant use of selective serotonin reuptake inhibitors (SSRIs) and statins as compared with SSRIs alone, from a population-based study. Using data from national registers, the authors identified nearly 900,000 incident SSRI users, of whom 13% used a statin concomitantly. Those taking an SSRI and a statin concomitantly were found to experience significantly lower risks for psychiatric hospital contacts in general and for psychiatric hospital contacts specifically due to depression, and no increased risks for all-cause mortality. One important strength of the study is the use of high-quality data from a very large nationwide sample, which allowed the investigators to estimate the effectiveness of treatment in the general population.

However, the study was susceptible to potential selection bias associated with an observational study design. One common issue in observational studies is that background characteristics of comparison groups may differ in ways that affect the outcome of interest. In the data used in the Köhler et al. study, the concomitant-treatment group was about 10 years older on average than the SSRI-only group (a median age of 56 years, compared with 46 years), and age has been shown in a univariate analysis to be associated with depression (2). To handle potential selection bias in observational studies, several statistical methods have been used in medical studies comparing effectiveness of treatments or exposures.

One common method is propensity-score matching, as used by Köhler et al. in their comparison of the concomitant and SSRI-only groups. The propensity score is the probability of an individual receiving a particular treatment (or exposure) given a set of observed covariates. In propensity-score matching, a subsample of control subjects is usually chosen to match the treated individuals on background covariates, based on propensity score values. More specifically, investigators form pairs of treated and control subjects by matching on their propensity score values using a matching algorithm. Nearest-neighbor matching was used in the Köhler et al. study, although matching on multivariate distance within propensity score calipers (e.g., a quarter of the standard deviation of propensity score values) tends to work better than matching directly on the propensity score (3). Two comparison groups that are matched on propensity score are in general comparable in terms of the covariates used to estimate the propensity score (3). Propensity-score matching can be thought of as creating a “quasi-randomized” experiment in which two subjects with the same propensity score, one in the treated group and the other in the control group, are treated as if they had been randomly assigned to each group. Thus, one might expect fair comparisons between the study groups after propensity-score matching. In practice, some subjects may have extreme propensity score values that prevent them from being matched effectively. These subjects are often excluded (trimmed) from the matching, as was done in the Köhler et al. study. To evaluate how well the samples are balanced on covariates before and after matching, the standardized difference in percent (the mean difference as a percentage of the average standard deviation) is computed for each covariate. Often, including in the Köhler et al. study, the standardized difference is substantially smaller after matching and falls below 10%, which is considered the threshold for a meaningful difference (4). Proper analyses are then performed to compare treatment effects while accounting for the matched-pair design.
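The matching and balance-checking steps described above can be illustrated with a minimal Python sketch. It is not the procedure used by Köhler et al.: it performs plain greedy 1:1 nearest-neighbor matching on the propensity score itself, without replacement, and does not implement the multivariate-distance variant. The function names, the pooling of both groups when computing the caliper, and the use of the square root of the average of the two group variances as the "average standard deviation" are assumptions made for illustration.

```python
import math
import statistics


def caliper_match(ps_treated, ps_control, caliper_sd=0.25):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    A pair is accepted only if the two propensity scores differ by at most
    caliper_sd standard deviations of the scores (pooled over both groups,
    an assumption of this sketch). Treated subjects with no control inside
    the caliper are left unmatched, which mimics trimming.
    Returns a list of (treated_index, control_index) pairs.
    """
    caliper = caliper_sd * statistics.pstdev(list(ps_treated) + list(ps_control))
    available = list(range(len(ps_control)))
    pairs = []
    for i, score in enumerate(ps_treated):
        if not available:
            break
        # Closest remaining control on the propensity score.
        j = min(available, key=lambda k: abs(ps_control[k] - score))
        if abs(ps_control[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def standardized_difference_pct(x_treated, x_control):
    """Mean difference as a percentage of sqrt of the average group variance."""
    pooled_sd = math.sqrt(
        (statistics.variance(x_treated) + statistics.variance(x_control)) / 2.0
    )
    diff = statistics.mean(x_treated) - statistics.mean(x_control)
    return 100.0 * diff / pooled_sd


# Hypothetical propensity scores, for illustration only.
pairs = caliper_match([0.30, 0.55, 0.90], [0.10, 0.28, 0.33, 0.52, 0.60])
```

In this toy input the third treated subject (score 0.90) has no control within the caliper and is left unmatched. In a real analysis, `standardized_difference_pct` would be computed for each covariate before and after matching and compared against the 10% threshold.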

Other methods include propensity-score stratification (also called subclassification), weighting, and multivariate regression adjustment. Instead of forming specific pairs, propensity-score stratification splits the subjects into, for example, five strata of equal size by estimated propensity score, which typically removes over 90% of the selection bias (5). If there are no significant differences in covariate means within each stratum, either a stratified analysis is performed or the analyses are done separately within each stratum. Another way to remove selection bias is to weight individuals in the treated group by the inverse of the propensity score and to weight the controls by the inverse of one minus the propensity score. This creates a pseudo sample with approximately the same numbers of treated and control subjects with comparable propensity score values. Lastly, we can adjust for selection bias by using the treatment indicator and the propensity score (preferably in its logit form) as predictors, along with other important covariates, in a multiple regression model.
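As a rough illustration of the weighting and stratification ideas above, the following Python sketch computes inverse-probability weights and assigns subjects to rank-based strata. The function names are hypothetical, the propensity scores are assumed to have been estimated already, and the weighted difference in means is only one simple way to use the weights.

```python
def ipw_weights(treated, propensity):
    """Weights of 1/e for treated subjects and 1/(1 - e) for controls,
    where e is the estimated propensity score."""
    return [
        1.0 / e if t == 1 else 1.0 / (1.0 - e)
        for t, e in zip(treated, propensity)
    ]


def weighted_mean_difference(outcome, treated, propensity):
    """Difference in weighted outcome means (treated minus control)."""
    w = ipw_weights(treated, propensity)

    def wmean(group):
        num = sum(wi * y for wi, y, t in zip(w, outcome, treated) if t == group)
        den = sum(wi for wi, t in zip(w, treated) if t == group)
        return num / den

    return wmean(1) - wmean(0)


def assign_strata(propensity, n_strata=5):
    """Rank-based strata of (nearly) equal size, labelled 0..n_strata-1."""
    order = sorted(range(len(propensity)), key=lambda i: propensity[i])
    strata = [0] * len(propensity)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(propensity)
    return strata
```

With five strata, `assign_strata` reproduces the quintile split mentioned in the text; covariate balance and treatment effects would then be examined within each stratum. Very small or very large propensity scores produce extreme weights in `ipw_weights`, which is one practical difficulty with weighting.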

If the treated and control groups have similar variances, covariate adjustment using the propensity score works well and is quite common in applications, because of its simplicity. However, if the covariance matrices of the two groups are very different, one may consider using propensity score methods for matching (as done in the Köhler et al. study) or stratification, rather than using covariate adjustment (6). If the match is not complete, as in the Köhler et al. study, one should consider both matching and nonmatching (stratification/regression) analyses, balancing between bias and variance (4). Weighting has been used less frequently in the clinical literature, despite its attractive theoretical properties, probably because of the complexity of running weighted models and the potential issue with extreme weights. These various propensity score methods can also be used in combination. For example, covariate adjustment with the propensity score can be used in each stratum after propensity-score stratification (4), further subclassification can be performed after matching (7), or covariate adjustment can be used for the matched sample (8).

No matter which of these methods is implemented, the propensity score is calculated in the same way. Logistic regression is commonly used to estimate assignment probability based on all covariates that are related to treatment exposure (and to the outcome). If including all available covariates is not feasible or favorable, we could select a tentative list of covariates for adjustments using problem knowledge and exploratory comparisons of treatment groups. Based on a tentative adjustment method, we next apply it to the covariates excluded from the list and identify large imbalances after adjustment, then reconsider the tentative list in light of this (5). As the goal is to find the best estimation of propensity score, one can include a large number of covariates, their higher-order terms, and interactions in the logistic regression. In the next step, the propensity score can be included in the model for outcomes, along with a subset of key covariates if the covariate adjustment method is adopted. This two-step bias-removing method is more advantageous than fitting a regression model directly using all of the background covariates. In multiple regression, we are more concerned about including too many covariates and affecting the validity of the coefficient estimates. By including the propensity score in the model, we are able to control for a large set of covariates without sacrificing the validity of the model.
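The first step, estimating the propensity score by logistic regression, might look like the following sketch. It fits the model by plain gradient ascent using only the Python standard library so that the mechanics are visible; in practice one would normally use a statistical package. The function names, learning rate, iteration count, and toy data are all assumptions made for illustration.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(X, treated, lr=0.1, n_iter=2000):
    """Fit a logistic regression of treatment on covariates by gradient ascent.

    X is a list of covariate rows (ideally standardized); an intercept is
    added internally. Returns coefficients [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(row) for row in X]
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, t in zip(rows, treated):
            p = _sigmoid(sum(b * x for b, x in zip(beta, row)))
            for k, x in enumerate(row):
                grad[k] += (t - p) * x  # gradient of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated probability of treatment for each covariate row."""
    return [
        _sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
        for row in X
    ]


# Toy data: one standardized covariate (e.g., age) and a treatment indicator.
X = [[-2.0], [-1.0], [-1.0], [0.0], [0.0], [1.0], [1.0], [2.0]]
treated = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_propensity(X, treated)
scores = propensity_scores(X, beta)
```

Higher-order terms and interactions can be included simply by adding columns to `X`. The resulting `scores` (or their logits) would then enter the outcome model, matching, stratification, or weighting in the second step.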

It is worth pointing out that propensity score methods are designed to correct overt bias, assuming no hidden bias. That is, the covariates collected in an observational study need to contain all relevant information about treatment assignment. If there is a hidden covariate that is not measured or not included in the calculation of the propensity score, and it is closely related to the outcome of interest, seriously imbalanced between groups, and uncorrelated with the propensity score, then our conclusions about the treatment will be affected. Several sensitivity analyses, such as sensitivity to an unobserved bias (9), have been suggested, but they are used relatively infrequently (8). Based on the reported numbers in the Köhler et al. study, it can be estimated that to attribute the lower risk for psychiatric hospital contacts to an unobserved binary covariate rather than to the effect of an SSRI plus a statin, the covariate would need both to be an excellent predictor of psychiatric hospital contacts and to increase by roughly 1.1-fold the odds of concomitant treatment with an SSRI and a statin. For example, if the estimated probability of an individual receiving concomitant treatment based on all observed covariates in the propensity score model is 0.2, then accounting for this hidden covariate, if it were known, would increase that probability to approximately 0.22. Thus, the findings in the Köhler et al. study may not be robust against hidden bias and should be interpreted with caution.
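The arithmetic behind the 0.2 to 0.22 example can be checked with a short Python sketch: convert the probability to odds, multiply by the 1.1-fold factor, and convert back. The function name is hypothetical.

```python
def shift_probability(p, odds_ratio):
    """Multiply the odds implied by probability p and convert back."""
    odds = p / (1.0 - p) * odds_ratio
    return odds / (1.0 + odds)


# Odds of 0.2 / 0.8 = 0.25 become 0.275, i.e. a probability of 0.275 / 1.275.
print(round(shift_probability(0.2, 1.1), 2))  # 0.22
```

The unrounded value is about 0.216, so a hidden covariate of this strength shifts the probability of concomitant treatment only slightly.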

Nevertheless, the results from this large population study are consistent with those from a small randomized controlled trial (10) as well as with other studies. Therefore, further investigation of the antidepressant potential of the combined treatment in larger randomized trials is warranted.

From the Departments of Statistics and Psychiatry, University of Pittsburgh, Pittsburgh.
Address correspondence to Dr. Cheng.

The author reports no financial relationships with commercial interests.

References

1 Köhler O, Gasse C, Petersen L, et al.: The effect of concomitant treatment with SSRIs and statins: a population-based study. Am J Psychiatry 2016; 173:807–815

2 Blazer D, Burchett B, Service C, et al.: The association of age and depression among the elderly: an epidemiologic exploration. J Gerontol 1991; 46:M210–M215

3 Rosenbaum PR, Rubin DB: Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 1985; 39:33–38

4 Austin PC, Mamdani MM: A comparison of propensity score methods: a case-study estimating the effectiveness of post-AMI statin use. Stat Med 2006; 25:2084–2106

5 Rosenbaum PR, Rubin DB: Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984; 79:516–524

6 D’Agostino RB Jr: Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998; 17:2265–2281

7 Rubin DB: Using propensity scores to help design observational studies: application to the tobacco litigation. Health Serv Outcomes Res Methodol 2001; 2:169–188

8 Stuart EA: Matching methods for causal inference: a review and a look forward. Stat Sci 2010; 25:1–21

9 Rosenbaum PR: Observational Studies, 2nd ed. New York, Springer, 2002

10 Ghanizadeh A, Hedayati A: Augmentation of fluoxetine with lovastatin for treating major depressive disorder, a randomized double-blind placebo controlled-clinical trial. Depress Anxiety 2013; 30:1084–1088