Item-Level Genome-Wide Association Study of the Alcohol Use Disorders Identification Test in Three Population-Based Cohorts
Abstract
Objective:
Genome-wide association studies (GWASs) of the Alcohol Use Disorders Identification Test (AUDIT), a 10-item screen for alcohol use disorder (AUD), have elucidated novel loci for alcohol consumption and misuse. However, these studies also revealed that GWASs can be influenced by numerous biases (e.g., measurement error, selection bias), which may have led to inconsistent genetic correlations between alcohol involvement and AUD, as well as paradoxically negative genetic correlations between alcohol involvement and psychiatric disorders and/or medical conditions. The authors used genomic structural equation modeling to elucidate the genetics of alcohol consumption and problematic consequences of alcohol use as measured by AUDIT.
Methods:
To explore these unexpected differences in genetic correlations, the authors conducted the first item-level and the largest GWAS of AUDIT items (N=160,824) and applied a multivariate framework to mitigate previous biases.
Results:
The authors identified novel patterns of similarity (and dissimilarity) among the AUDIT items and found evidence of a correlated two-factor structure at the genetic level (“consumption” and “problems,” rg=0.80). Moreover, by applying empirically derived weights to each of the AUDIT items, the authors constructed an aggregate measure of alcohol consumption that was strongly associated with alcohol dependence (rg=0.67), moderately associated with several other psychiatric disorders, and no longer positively associated with health and positive socioeconomic outcomes. Lastly, by conducting polygenic analyses in three independent cohorts that differed in their ascertainment and prevalence of AUD, the authors identified novel genetic associations between alcohol consumption, alcohol misuse, and health.
Conclusions:
This work further emphasizes the value of AUDIT for both clinical and genetic studies of AUD and the importance of using multivariate methods to study genetic associations that are more closely related to AUD.
Over the past decade, genome-wide association studies (GWASs) have advanced our understanding of alcohol use disorder (AUD) (1). Many of these studies have relied on a categorical approach to AUD phenotypes, comparing clinically ascertained case and control subjects (e.g., 2), but recent studies have increasingly employed a complementary approach leveraging dimensional measures of alcohol consumption and screen-based AUD symptoms in population-based cohorts (e.g., 3–6). Compared to clinical diagnostic phenotypes, these dimensional measures can often be administered more easily at scale via self-report questionnaires, thus accelerating genetic discovery through drastic increases in sample size. The Alcohol Use Disorders Identification Test (AUDIT) (7), a 10-item questionnaire that screens for drinking habits and problems by measuring aspects of alcohol use and misuse in the past year, is one such measure. A recent GWAS meta-analysis of AUD and AUDIT phenotypes identified 29 novel loci (5), representing one of the biggest advances in AUD genetics to date (2–4, 6).
Notably, several studies using self-report instruments have revealed that not all aspects of alcohol involvement are interchangeable. While AUDIT can be used as a unidimensional screen (i.e., AUDIT total score), previous research has shown that AUDIT can differentiate between two related but distinct facets of AUD: alcohol consumption (sum of items 1–3, AUDIT-C), which is necessary but not sufficient for a diagnosis of AUD, and problematic consequences of alcohol consumption (sum of items 4–10, AUDIT-P), which more closely resemble the diagnostic criteria of AUD. We previously found that AUDIT-C and AUDIT-P have distinct genetic relationships with clinically defined AUD (6) as well as other forms of psychopathology. Surprisingly, AUDIT-C was positively associated with socioeconomic variables, negatively associated with some forms of psychopathology, and only moderately positively associated with alcohol dependence, whereas AUDIT-P exhibited strong positive associations with alcohol dependence and numerous other psychiatric disorders. Although this divergence may reflect true differences between the biological mechanisms underlying alcohol consumption and problems, it may be confounded by other factors, such as sources of selection bias, genetic heterogeneity among the individual items, and measurement error (1, 8).
Because AUDIT-C and AUDIT-P are computed using an unweighted composite score approach, they inherently rely on the assumptions that the scale is unidimensional and that each item is equally informative of the construct being measured. This approach is not based on any empirical evidence but rather reflects a holdover from the original use of AUDIT as a screen for use in primary health care settings. Therefore, it is possible that the lack of item-specific weights introduces error in downstream analyses. While these issues have been thoroughly studied at the phenotypic level via factor analysis (see Table S1 in the online supplement), they have not yet been investigated at the genetic level. Using methods that can account for, or mitigate, such measurement problems will allow researchers to better capitalize on the potential of dimensional measures like AUDIT for genetic discovery.
In the present study, we sought to elucidate the genetics of alcohol consumption and problematic consequences of alcohol use measured via AUDIT using genomic structural equation modeling (9), a novel multivariate framework that allows structural equation modeling techniques to be applied to genetic covariance matrices based on GWAS results. Accordingly, we undertook the first item-level and the largest-to-date GWAS meta-analyses of AUDIT (N=160,824), using data from three population-based cohorts of European ancestry. We then used genomic structural equation modeling to analyze the item-level GWAS results with the aims of 1) investigating the latent genetic factor structure of AUDIT, based on prior knowledge (see Table S1 in the online supplement), and 2) conducting multivariate GWASs of the resulting latent genetic factor(s). We posited that applying this approach would yield more nuanced, empirically derived weights for each of the AUDIT items when constructing our aggregate measures (as opposed to giving each item equivalent weight), which is a novel approach for GWASs of AUD phenotypes. Finally, to characterize the biology and liability associated with each latent genetic factor, we used a variety of in silico tools and polygenic analyses spanning three independent cohorts that varied in method of ascertainment and prevalence of AUD.
We hypothesized that a higher resolution of each of the alcohol phenotypes measured in AUDIT would further our understanding of the differences among indices of alcohol consumption (items 1–3) and problematic alcohol use (items 4–10) and how they relate to health. We anticipated that the genetic contributions to alcohol consumption and problematic use would not be completely overlapping and that genomic modeling using item-level data would ameliorate the confounding issues between alcohol consumption, AUD, and indices of health that complicated previous GWAS efforts.
METHODS
Discovery Samples and Phenotype Construction
We collected AUDIT and genotype data from three population-based cohorts: the UK Biobank (maximum N=147,267), the Netherlands Twin Register (maximum N=9,975), and the Avon Longitudinal Study of Parents and Children (ALSPAC; maximum N=3,582). We used the same phenotyping strategies across the three cohorts, which are described in section 2 of the online supplement. AUDIT scores and demographic characteristics for each cohort are reported in Table S2 in the online supplement. Genotyping, imputation, and quality control procedures have been extensively described in previous publications (10–12). Because AUDIT was administered with skip logic in UK Biobank, we used multiple imputation by chained equations to minimize the impact of missing data on our item-level GWAS (see section 2.1 in the online supplement for details).
Univariate Genome-Wide Association and Meta-Analyses
In UK Biobank, we used BOLT-LMM, version 2.3.2 (13), to conduct GWASs for each of the 10 AUDIT items with the first 40 ancestry principal components, sex, age, sex-by-age interactions, and batch as covariates. In the Netherlands Twin Register, we used the fastgwa function of GCTA (14) and included the first five ancestry principal components, sex, birth year, and genotyping platform as covariates. In ALSPAC, we used PLINK, version 2.0 (15), to analyze unrelated participants, including the first 10 ancestry principal components, sex, and age as covariates. Note that both BOLT-LMM and fastgwa are capable of analyzing related individuals. Further details are included in section 3 in the online supplement and in previous work (16). We then used METAL (17) to conduct sample-size-weighted meta-analyses of the cohort-level GWAS summary statistics for each AUDIT item after quality control procedures (see section 4 in the online supplement). A total of 8,596,116 single-nucleotide polymorphisms (SNPs) were included in the meta-analyses.
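The sample-size-weighted scheme used for the meta-analyses can be illustrated with a minimal Python sketch. It assumes the standard formulation in which each cohort's Z-score is weighted by the square root of its sample size, and that Z-scores have already been aligned to the same effect allele. The function names and the Z-scores are hypothetical; only the maximum cohort sizes come from the text. This is an illustration of the weighting idea, not the METAL implementation.

```python
import math


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores for one SNP, weighting each by sqrt(N)."""
    if len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Maximum cohort sizes reported in the text (UK Biobank, NTR, ALSPAC).
cohort_n = [147_267, 9_975, 3_582]
# Hypothetical per-cohort Z-scores for a single SNP (illustrative only).
cohort_z = [2.5, 1.0, -0.5]

meta_z = sample_size_weighted_z(cohort_z, cohort_n)
meta_p = two_sided_p(meta_z)
```

Because the weights scale with the square root of N, the UK Biobank estimate dominates the combined statistic in this example, which mirrors its share of the total sample.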
Phenotypic and Genetic Correlations
We used the lavaan package, version 0.6.5 (18), in R to estimate polychoric phenotypic correlations (rp) among AUDIT items. We used the GenomicSEM package, version 0.0.2, in R, which is based on linkage disequilibrium (LD) score regression (19), to estimate the heritability of each of the 10 AUDIT items and the genetic correlations between them. We applied standard quality control procedures prior to all analyses (e.g., use of precomputed LD scores, exclusion of the major histocompatibility region, restriction of SNPs to HapMap 3, application of minor allele frequency ≥1%, and information score >0.90 filters). Lastly, we used GenomicSEM (9) to estimate genetic correlations between latent genetic factors and complex traits and disorders broadly related to human health (see section 5.1.2 in the online supplement). We applied a standard Benjamini-Hochberg false discovery rate correction (FDR 5%) to account for multiple testing.
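For readers less familiar with the Benjamini-Hochberg procedure, the following minimal Python sketch shows the step-up rule at an FDR of 5%. The function name and the example p values are hypothetical and are not taken from the study; the analyses in the text were run in R.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans marking which tests are significant at the given FDR."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical p values for ten genetic correlations (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p)
```

Note that the rule is step-up: a test can be declared significant even if its own p value exceeds its rank-specific threshold, provided a larger p value further down the ranking meets its threshold.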
Phenotypic and Genetic Factor Analysis
To empirically model the phenotypic and genetic relationships among AUDIT items, we used lavaan and GenomicSEM to conduct phenotypic and genetic confirmatory factor analyses, respectively, using weighted least squares estimation. This process has been described extensively elsewhere (9, 16, 20, 21), and further details are provided in section 5.1 in the online supplement. We tested three models: a parallel factor model (i.e., a sum-score model), a common factor model, and a correlated factors model. The common and correlated factors models were selected based on prior research (see Table S1 in the online supplement), and the parallel factor model served to test the restrictive assumptions of sum-score approaches. We assessed model fit using conventional indices that were available in both the lavaan and GenomicSEM software packages (see section 5 in the online supplement). Only data from UK Biobank (the largest sample) were included in the phenotypic factor analyses. For the genetic factor analyses, GWAS summary statistics from the meta-analyses for each AUDIT item were subjected to standard quality control practices, as described above. GenomicSEM’s multivariable version of LD score regression was then used to estimate the genetic covariance and sampling covariance matrices for the AUDIT items, which were used to test the specified confirmatory factor models. The sampling covariance matrix was smoothed beforehand, as it was slightly non-positive-definite. Factor extension analysis was used to estimate the expected factor loading of item 6 (i.e., “eye opener”; see section 5.1.1 in the online supplement), as it was excluded from the final genetic confirmatory factor model because of nonsignificant SNP heritability.
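To make the difference between the common factor and correlated factors models concrete, the sketch below computes model-implied correlations for a standardized model in which each item loads on exactly one factor, and then summarizes misfit as the root mean square of the off-diagonal residuals. This is a simplified, SRMR-style index, not the exact statistic reported by lavaan or GenomicSEM. All loadings are hypothetical; only the item-to-factor assignment, the exclusion of item 6, and the factor correlation of 0.80 follow the text. No parameters are estimated here.

```python
import math


def implied_correlations(loadings, factor_of_item, factor_corr):
    """Model-implied correlation matrix for a standardized two-factor model.

    loadings[i] is item i's standardized loading on its single factor,
    factor_of_item[i] names that factor, and factor_corr is the correlation
    between the two factors.
    """
    k = len(loadings)
    implied = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue  # diagonal stays at 1.0 for standardized items
            same = factor_of_item[i] == factor_of_item[j]
            link = 1.0 if same else factor_corr
            implied[i][j] = loadings[i] * loadings[j] * link
    return implied


def rms_residual(observed, implied):
    """Root mean square residual over the unique off-diagonal correlations."""
    k = len(observed)
    squared = [(observed[i][j] - implied[i][j]) ** 2
               for i in range(k) for j in range(i)]
    return math.sqrt(sum(squared) / len(squared))


# Nine indicators: items 1-3 (consumption) and items 4, 5, 7-10 (problems).
items = ["1", "2", "3", "4", "5", "7", "8", "9", "10"]
factor_of_item = ["consumption"] * 3 + ["problems"] * 6
# Hypothetical loadings; item 1 is set lower to echo its weaker correlations.
loadings = [0.4, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]

# Stand-in for the observed matrix: two factors correlated at 0.80.
two_factor = implied_correlations(loadings, factor_of_item, factor_corr=0.80)
# Forcing the factor correlation to 1.0 collapses the model to one factor.
one_factor = implied_correlations(loadings, factor_of_item, factor_corr=1.0)
misfit = rms_residual(two_factor, one_factor)
```

In this toy setup the single-factor solution overpredicts every cross-factor correlation, which is the kind of systematic residual pattern that fit indices such as the standardized root mean square residual are designed to capture.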
Multivariate GWASs
Using GenomicSEM (9), we conducted multivariate GWAS analyses by estimating SNP associations with the AUDIT latent genetic factors from the best-fitting model. The details of these analyses are provided in section 5.1 in the online supplement. Individual SNP effects were estimated for the latent genetic factors in each model if they were available in all univariate summary statistics, had a minor allele frequency ≥0.5%, and were present in the 1000 Genomes Phase 3 (version 5) reference panel. The effective sample size for each latent factor was estimated using the approach described by Mallard et al. (16).
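The three SNP inclusion criteria can be read as a simple set filter. The Python sketch below is an illustration only: the function name, the SNP identifiers, and the data structures are hypothetical, and it assumes that the minor allele frequency is looked up from the reference panel, which the text does not specify.

```python
def snps_for_multivariate_gwas(item_snp_sets, maf_by_snp, reference_snps,
                               min_maf=0.005):
    """Return SNPs that pass all three inclusion criteria.

    item_snp_sets: one set of SNP IDs per univariate (item-level) GWAS.
    maf_by_snp: dict mapping SNP ID to minor allele frequency (assumed source:
        the reference panel).
    reference_snps: set of SNP IDs present in the reference panel.
    """
    # Criterion 1: available in every univariate summary statistics file.
    shared = set.intersection(*item_snp_sets)
    # Criteria 2 and 3: MAF >= 0.5% and present in the reference panel.
    return {snp for snp in shared
            if snp in reference_snps and maf_by_snp.get(snp, 0.0) >= min_maf}


# Hypothetical inputs for three items (illustrative only).
item_snp_sets = [
    {"rsA", "rsB", "rsC", "rsD"},
    {"rsA", "rsB", "rsC"},
    {"rsA", "rsB", "rsC", "rsE"},
]
maf_by_snp = {"rsA": 0.25, "rsB": 0.002, "rsC": 0.10, "rsD": 0.30, "rsE": 0.40}
reference_snps = {"rsA", "rsB", "rsD", "rsE"}

kept = snps_for_multivariate_gwas(item_snp_sets, maf_by_snp, reference_snps)
```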
Biological Annotation, Gene, and Transcriptome-Based Association Analyses
We performed multiple in silico analyses to compare the results from each of the AUDIT latent genetic factors. First, we used FUMA, version 1.2.8 (22), to identify independent SNPs and study their functional consequences, which included ANNOVAR categories, Combined Annotation Dependent Depletion scores, and RegulomeDB scores. Second, we used MAGMA, version 1.08 (22, 23), to conduct competitive gene-set and pathway analyses for each of the AUDIT genetic latent factors. SNPs were mapped to 18,546 protein coding genes from Ensembl, build 85. Gene sets were obtained from MSigDB, version 7.0 (“curated gene sets,” “GO terms”). We also used an extension of this method, Hi-C-coupled MAGMA (H-MAGMA) (24), to assign noncoding (intergenic and intronic) SNPs to genes based on their chromatin interactions. Exonic and promoter SNPs are assigned to genes based on physical position. We used four Hi-C data sets, which were derived from fetal brain, adult brain, and induced pluripotent stem cell–derived neurons and astrocytes (https://github.com/thewonlab/H-MAGMA). Lastly, we used S-PrediXcan, version 0.6.2 (25), to predict gene expression levels in 13 brain tissues and to test whether the predicted gene expression showed divergent correlation patterns with each of the AUDIT latent genetic factors. Precomputed tissue weights from the Genotype-Tissue Expression (GTEx, version 8) project database (https://www.gtexportal.org/) were used as the reference transcriptome data set. Further details are provided in section 6 in the online supplement.
Polygenic Risk Score Analyses
Prediction of alcohol phenotypes in UK Biobank and COGA.
We used the PRS-CS “auto” version (26) to compute polygenic risk scores (PRSs) for the latent genetic AUDIT factors (“consumption” and “problems”) and their sum-score counterparts (AUDIT-C and AUDIT-P) in two independent samples: an independent subset of unrelated individuals of European ancestry in the UK Biobank who did not fill out the AUDIT and a subset of individuals of European ancestry from the Collaborative Study on the Genetics of Alcoholism (COGA) (27), which includes probands meeting criteria for alcohol dependence, their family members, and community control families. Using the score algorithm in PLINK, version 1.90, we computed individual-level PRSs to predict additional alcohol phenotypes (drinking quantity, drinking frequency, and lifetime AUD diagnosis) measured in UK Biobank and COGA (see section 7 in the online supplement). We tested for associations between AUDIT PRSs and alcohol phenotypes using linear (quantity and frequency phenotypes) or logistic (AUD) regression models in R, version 3.6.3. In UK Biobank, we included sex, age at first assessment, Townsend deprivation index score (28), and the first 10 ancestry principal components as covariates. In COGA, we included age, sex, array type, income, and the first 10 ancestry principal components as fixed-effect covariates, with family identity included as a random effect (i.e., allowing the intercept to vary by family).
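At its core, an individual-level PRS is a weighted sum of effect-allele counts, with the per-SNP weights supplied by the PRS-CS posterior effect sizes. The minimal Python sketch below illustrates that calculation under simplifying assumptions: the function name, SNP identifiers, weights, and genotypes are hypothetical, missing genotypes are simply skipped, and the result is a plain sum, whereas PLINK's scoring routine has its own defaults for averaging and missing data.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping SNP ID to effect-allele count (0, 1, or 2),
        or None when the genotype is missing.
    weights: dict mapping SNP ID to its per-allele effect size.
    """
    total = 0.0
    for snp, weight in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue  # simplifying assumption: skip missing genotypes
        total += weight * dosage
    return total


# Hypothetical per-allele weights for three SNPs (illustrative only).
weights = {"rsA": 0.02, "rsB": -0.01, "rsC": 0.005}

# Hypothetical genotypes for two individuals.
person_1 = {"rsA": 2, "rsB": 1, "rsC": 0}
person_2 = {"rsA": 0, "rsB": 2, "rsC": None}

score_1 = polygenic_score(person_1, weights)
score_2 = polygenic_score(person_2, weights)
```

The resulting scores would then enter the linear or logistic regression models as predictors alongside the covariates listed above.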
We sought to compare the performance of the latent factor–based PRSs (the consumption and problems PRSs) against the performance of their sum-score counterparts (the AUDIT-C and AUDIT-P PRSs) in predicting different alcohol phenotypes. To this end, we applied two approaches to our PRS analyses: cross-dimension PRS models (i.e., the consumption and problems PRSs included as simultaneous predictors) and cross-method PRS models (i.e., the consumption and AUDIT-C PRSs included as simultaneous predictors in a model, and the problems and AUDIT-P PRSs included as simultaneous predictors in a model). We corrected for the total number of outcome phenotypes across the validation samples using a conservative Bonferroni p value of 8.33 × 10⁻³, since the same PRSs were used as predictors across models (and were correlated with each other).
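The Bonferroni threshold can be reproduced with a short calculation. The sketch below assumes a family-wise alpha of 0.05 and six outcome phenotypes in total (three in UK Biobank and three in COGA); the count of six is inferred from the reported threshold and the phenotypes named in the text rather than stated explicitly, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumption: three outcome phenotypes in each of the two validation samples.
n_outcomes = 3 + 3
threshold = bonferroni_threshold(0.05, n_outcomes)
```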
Phenome-wide association study in BioVU.
To examine exploratory associations between PRSs and hundreds of medical diagnoses, we used the PRS-CS method (26) described above to compute consumption and problems PRSs for each of the 66,915 unrelated genotyped individuals of European ancestry from the Vanderbilt University Medical Center biobank (BioVU) (29). Using electronic health record data in BioVU, we performed phenome-wide association studies (PheWASs) for consumption and problems PRSs using the PheWAS package, version 0.12 (30), in R. Specifically, we fitted a logistic regression model to each of the 1,335 case/control phenotypes in BioVU (“phecodes”; see section 7.3 in the online supplement) in order to estimate the effect of a given PRS on each diagnosis. Sex, median age of the longitudinal electronic health record measurements, and the first 10 principal components were included as covariates. We then repeated the PheWAS analyses using AUD diagnoses (phecodes 317, 317.1) as additional covariates. A standard Benjamini-Hochberg false discovery rate (FDR 5%) correction was applied to account for multiple testing.
RESULTS
Phenotypic and Genetic Analyses Reveal a Consistent Two-Factor Structure of Alcohol Consumption and Problematic Use
Phenotypic and genetic analyses showed that AUDIT items were positively correlated with each other, with correlation estimates ranging from moderate to large (see Tables S3 and S4 in the online supplement). The one exception to this pattern was item 1 (frequency of consumption), which was generally less correlated with the other AUDIT items. Moreover, we found that genetic correlations tended to be moderately larger than the phenotypic correlations (mean absolute difference=0.198), an effect that was driven by stronger genetic correlations among items 4 through 10 (the problematic alcohol use phenotypes). Of note, all AUDIT items exhibited significant SNP heritability, with the exception of item 6 (see Table S5 in the online supplement). We suspect this may be attributable to the low rates of endorsement for the item in all three cohorts (see Table S2 in the online supplement). For this reason, we excluded item 6 from all subsequent analyses, and a factor extension analysis was used to estimate its expected factor loading in the final model.
We found that a correlated factors model provided the best fit to both the genetic and the phenotypic covariance matrices (phenotypic model: χ²=4,252.963, df=26, comparative fit index=0.994, standardized root mean square residual=0.041; genetic model: χ²=142.689, df=26, comparative fit index=0.982, standardized root mean square residual=0.067) (Figure 1; see also Tables S6 and S7 in the online supplement). That is, the patterns of genetic and phenotypic correlations among the AUDIT items could both be represented by a factor model with two correlated factors: one that captured the covariance among alcohol consumption items (items 1–3) and one that captured the covariance among alcohol-related problems (items 4–10). These two latent factors were highly correlated with each other, phenotypically (rp=0.825, SE=0.002) and genetically (rg=0.801, SE=0.037). Nearly all items had large factor loadings across both levels of analyses except item 1, which consistently had markedly smaller factor loadings and larger residual variances.
The two correlated factors model was compared with other solutions. A model with a single common factor provided acceptable fit for the phenotypic (χ²=14,967.064, df=27, comparative fit index=0.978, standardized root mean square residual=0.070) and genetic (χ²=350.785, df=27, comparative fit index=0.949, standardized root mean square residual=0.094) factor analyses, but it did not minimize the standardized difference between the observed and predicted correlations as well as the correlated factors model (see Table S7 in the online supplement). The parallel factor model (i.e., the sum-score model) exhibited poor fit, reflected by the strong, unanimous bias observed in the model-implied correlations (phenotypic model: χ²=43,655.530, df=34, comparative fit index=0.936, standardized root mean square residual=0.143; genetic model: χ²=607.196, df=43, comparative fit index=0.911, standardized root mean square residual=0.470). Accordingly, we identified the correlated factors model as the best fitting and most appropriate model for further genetic analyses.
Latent Variable Approach Characterizes and Ameliorates Bias in GWAS of Alcohol Consumption
By estimating genetic correlations in a genomic structural equation modeling framework, we identified interesting patterns of relationships between 100 exogenous phenotypes (chosen based on previous findings or hypothesized relationships) and the consumption and problems latent genetic factors. We also examined correlations with the residual genetic variance in item 1 (i.e., the genetic variance in item 1 that is unrelated to other AUDIT items; henceforth the “frequency residual”). Results are reported in Table S8 in the online supplement.
For the consumption and problems factors, we found that their patterns of genetic correlation with other phenotypes were much more similar than previously reported for AUDIT-C and AUDIT-P (4). Both the consumption and problems factors showed strong positive genetic correlations with alcohol dependence. The consumption and problems factors were also positively related to other measures of substance use (e.g., cannabis use disorder, impulsivity). Furthermore, the previous positive associations that we observed between AUDIT and indices of socioeconomic status (e.g., educational attainment) were now attenuated.
We did still observe that, compared with the consumption factor, the problems factor was more strongly related to psychopathology (e.g., posttraumatic stress disorder, depression, bipolar disorder, schizophrenia). We also identified novel divergent associations with pain phenotypes, malnutrition, and measures of social satisfaction (e.g., the problems factor showing genetic overlap with these conditions), suggesting that, as we anticipated, the genetic contributions to alcohol consumption and misuse reflect both complementary and distinct genetic factors.
Finally, the frequency residual was negatively associated with alcohol dependence (Figure 2). We also found positive genetic correlations between the frequency residual and socioeconomic outcomes, including educational attainment, household income, and intelligence. Furthermore, we observed consistently negative genetic correlations between the frequency residual and other psychiatric and substance use disorders, such as major depressive disorder and cannabis use disorder. This result suggests that many of the puzzling genetic correlations previously reported for alcohol consumption were driven by variance related to socially stratified differences in behavior rather than variance related to the alcohol phenotypes of clinical interest.
Multivariate GWAS Confirms a Distinct Genetic Basis Between Alcohol Consumption and Misuse
The results of our multivariate GWAS for the consumption and problems factors are presented in Figure 3. We identified eight independent loci that were associated with the consumption factor (see Table S9 in the online supplement). For the problems factor, we replicated two loci on chromosome 4, located in the ethanol metabolizing gene ADH1B (see Table S10 in the online supplement). The signal associated with the latent factors is convergent with that of the sum scores, with a few exceptions (see section 6.1.1 and Tables S11 and S12 in the online supplement).
Some loci included genes that were only associated with the consumption factor (see Table S31 in the online supplement). For example, KLB, RCF1, and the MAPT/CRHR1 region, which were previously associated with alcohol consumption behaviors (3–5, 31), were only associated with the consumption factor. We also identified other novel candidate genes for alcohol consumption, such as CPS1, which has previously been associated with metabolic conditions (see Table S13 in the online supplement).
We performed in silico gene-based and transcriptome-based analyses (see Tables S15–S30 in the online supplement), which revealed both convergent and divergent associations for the consumption and problems factors (see Table S31 in the online supplement). For example, both factors robustly implicated ethanol metabolizing genes (ADH1B, ADH1C) and dopamine transmission (DRD2, involved in mediating the rewarding effects of drugs [32]), as well as pleiotropic genes previously implicated in anthropometric and metabolic traits (e.g., CELF1 [5, 33]), and intelligence (e.g., MTCH2 [34], FAM180B/NDUFS3 [35]).
Lastly, gene-set analyses revealed that genes more closely linked to cellular responses to alcohol drinking (e.g., cellular response to retinoic acid) were associated with the consumption factor (see Table S17 in the online supplement), while the gene sets related to postsynaptic modulation of chemical synaptic transmission were associated with the problems factor (see Table S18 in the online supplement).
Polygenic Risk Analyses
UK Biobank.
In UK Biobank, we found that both the consumption and problems PRSs were robustly associated with drinking frequency, drinking quantity, and lifetime AUD (Figure 4). However, the consumption PRS outperformed the problems PRS (i.e., explained more variance) for alcohol consumption phenotypes (see Table S32 in the online supplement). When the latent-factor PRSs and sum-score PRSs for the same construct were both included in the multiple regression model (e.g., the consumption and AUDIT-C PRSs), the consumption PRS outperformed the AUDIT-C PRS in predicting AUD diagnosis and drinking quantity (but not frequency), while the AUDIT-P PRS outperformed the problems PRS across all three phenotypes (see Table S33 in the online supplement).
COGA.
In COGA, PRS results aligned with those observed in UK Biobank, with a few exceptions. When both the consumption and problems PRSs were included in the same model, only the consumption PRS showed significant associations with drinks per week, maximum number of drinks per 24-hour period, and AUD (see Table S34 in the online supplement). As observed in UK Biobank, when latent-factor PRSs and sum-score PRSs for the same construct were both included in the multiple regression model, the consumption PRS outperformed the AUDIT-C PRS, and the AUDIT-P PRS outperformed the problems PRS (see Table S35 in the online supplement). Interestingly, in those models, we found that the strongest associations were between the consumption PRS and AUD and between the AUDIT-P PRS and AUD.
BioVU.
We performed two independent PheWASs of the consumption and problems PRSs to identify whether these two variables would show different patterns of genetic associations with medical outcomes. Of 1,335 phenotypes, 15 were FDR-significantly associated with the consumption PRS (Figure 5; see also Table S36 in the online supplement) and 17 with the problems PRS (see Table S37 in the online supplement). Both PRSs were significantly associated with AUD and other tobacco and substance use disorders. Replicating our previous results for AUDIT-C and AUDIT-P, we observed paradoxical negative associations between the consumption PRS and metabolic conditions, including diabetes mellitus and obesity phenotypes, whereas the problems PRS was primarily positively associated with other psychiatric disorders, including depression, anxiety disorder, bipolar disorder, schizophrenia, and suicidal ideation or attempt. Intriguingly, the problems PRS was also negatively associated with type 2 diabetes with renal manifestations. Most of the associations did not persist after correcting for AUD, although the direction of effects remained consistent (see Tables S38 and S39 in the online supplement).
DISCUSSION
In this study, we performed the first item-level and the largest GWAS of AUDIT to date (N=160,824), and we used genomic structural equation modeling to elucidate the genetic etiology of alcohol consumption and problematic alcohol use. By conducting phenotypic and genetic factor analyses of the individual AUDIT items, we provide evidence that two correlated latent factors (consumption and problems) parsimoniously explained the covariance in measures of alcohol consumption and problematic alcohol use across both levels of analysis. Moreover, by applying empirically derived weights to the AUDIT items in a genomic structural equation modeling framework, we demonstrated that our method can ameliorate confounding biases that have complicated previous work with consumption phenotypes (in particular, the bias present in item 1). Notably, both the consumption and problems factors share a strong positive genetic correlation with alcohol dependence (both rg values ∼0.7), and we show, for the first time, that the polygenic signal of the consumption factor is strongly associated with several AUD phenotypes in three independent cohorts. Finally, the results of our bioinformatic analyses further illustrate that the consumption and problems factors have unique components of their genetic etiology. Collectively, our novel framework provides a means to study two genetic liabilities that are more closely related to AUD and advances our understanding of the associated biology in several ways, as we delineate below.
First, we built on recent investigations of the genetic etiology of AUD and related traits by analyzing each of the 10 unique items that comprise AUDIT. At this higher resolution, we were able to identify sources of genetic heterogeneity among the items, such as the consistently weaker genetic correlations between frequency of alcohol consumption (item 1) and other drinking patterns (items 2–3) and AUD symptoms (items 4–10). Our item-level approach also allowed us to empirically model the genetic relationships between AUDIT items, providing the first empirical evidence of a correlated, two-factor structure for AUD symptoms at the genetic level. In doing so, we also generated empirically derived weights to determine how individual items contribute to aggregate measures of alcohol consumption and problematic use. This is an important advance from most quantitative or dimensional genetic studies of AUD (and other forms of psychopathology), which often use composite score measures that lack statistical justification.
Second, and perhaps most importantly, we found that the consumption factor was a good genetic proxy of AUD when appropriate weights were applied to the individual items using genomic structural equation modeling. This is a striking change from previous investigations into the divergent genetic bases of alcohol consumption and problematic use, including our own prior analyses of AUDIT. GWASs of alcohol consumption phenotypes have consistently reported low to moderate overlap with AUD, which has surprised many researchers (2–5), and even paradoxical negative associations with a variety of diseases and disorders. Our multivariate approach has ameliorated these issues, producing an aggregate measure of alcohol consumption that is more consistent with the known patterns of alcohol phenotype associations established in the literature, such as a strong genetic correlation with alcohol dependence. Furthermore, we used genetic correlation analyses to characterize the residual genetic variance in frequency of consumption (frequency residual) that is unrelated to other AUDIT items. These analyses revealed that the frequency residual had consistently positive associations with measures of socioeconomic status and consistently negative associations with measures of substance use and psychopathology. Indeed, these genetic correlations are very similar to those observed in GWASs of AUDIT-C (4, 5) and other GWASs of alcohol consumption (3, 4), suggesting that single-item frequency-based measures of alcohol consumption may be particularly susceptible to confounding and/or selection bias. For example, Marees et al. (36) reported that greater frequency of alcohol consumption was associated with higher socioeconomic status and lower risk of other psychiatric and substance use disorders in UK Biobank. In population-based cohorts with a “healthy volunteer” bias, such as the UK Biobank, the relationship between frequency of alcohol consumption and aspects of physical and mental health may not be fully generalizable (37). This degree of bias, we speculate, will likely vary from population to population.
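The general mechanism by which selective participation can distort associations can be sketched with a toy simulation. This is an assumption-laden illustration, not an analysis of UK Biobank: the two traits, the participation rule (`trait_a + trait_b > 1.0`), the sample size, and the seed are all hypothetical. Two traits that are independent in the full population become correlated once inclusion depends on both of them; the sign and size of the induced correlation depend entirely on the assumed selection rule.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_selection(n=20000, threshold=1.0, seed=1):
    """Return (correlation in everyone, correlation among participants)."""
    rng = random.Random(seed)
    # Two HYPOTHETICAL traits drawn independently, so their true
    # population correlation is zero.
    trait_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    trait_b = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # HYPOTHETICAL participation rule: a person enters the cohort only
    # if the sum of the two traits exceeds the threshold.
    selected = [(a, b) for a, b in zip(trait_a, trait_b) if a + b > threshold]
    sel_a = [a for a, _ in selected]
    sel_b = [b for _, b in selected]

    return pearson(trait_a, trait_b), pearson(sel_a, sel_b)


r_population, r_participants = simulate_selection()
print(f"correlation in full population: {r_population:.3f}")
print(f"correlation among participants: {r_participants:.3f}")
```

In this sketch the correlation is close to zero in the full simulated population and clearly negative among the selected participants, even though neither trait influences the other.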
Third, we confirmed that the genetic contributions to alcohol consumption are partially distinct from those pertaining to problematic consequences of alcohol use. In silico analyses revealed the value of dissecting the two phenotypes, as gene- and transcriptome-based analyses identified partially divergent biological mechanisms for the consumption and problems factors. For example, the corticotropin-releasing hormone receptor 1 gene (CRHR1), which has been associated with alcohol use in animals and humans (38, 39), was associated with consumption only. As a result, we are now beginning to uncover genetic signals for aspects of alcohol involvement that have the potential to be further analyzed at the molecular, cellular, and circuit levels in cellular and animal model systems.
Fourth, we found that the consumption PRS was strongly associated with AUD even in higher-risk cohorts like COGA. This demonstrates the important downstream effects of allowing items to have different weights in phenotype construction. Whereas our current and previous PRSs for AUDIT-C have been disproportionately influenced by a single item (frequency of consumption) (40), our consumption PRS was composed of the genetic effects shared among all consumption-focused items. The consumption and problems PRSs were both strongly associated with AUD in UK Biobank, even when both scores were entered in the same model. In COGA, both the consumption and problems PRSs were associated with AUD, but the consumption PRS was more strongly associated than the problems PRS. The increased influence of binge drinking (item 3), which had a large factor loading on the consumption factor, may be partially responsible for these stronger associations in a high-risk sample. However, these differences are perhaps more likely explained by differences in item endorsement, and thus in the predictive power of the discovery GWASs (e.g., the consumption factor had a greater mean chi-square than the problems factor).
Finally, our comprehensive PheWAS analyses have linked different facets of AUD liability (via the latent factor–based consumption and problems PRSs) to myriad health-related outcomes in a large, independent biobank. We found that the consumption PRS was consistently negatively associated with a broad range of metabolic and congenital conditions. While it is possible that there is still residual bias in the discovery GWAS, it is important to note that this pattern of paradoxical associations with consumption is not observed in the genetic correlation analyses. Thus, it is possible that these negative associations are illustrative of selection bias or other confounding in BioVU (41), where patients with certain conditions may elect not to drink because of unmeasured factors (e.g., family history, medical advice, contraindications for prescribed medications). Mirroring the genetic correlation results, we also found that the problems PRS was uniquely associated with numerous psychiatric disorders that are commonly reported to co-occur with AUD. However, we determined that the associations between the problems PRS and mental health did not persist in the absence of the clinical manifestation of AUD. These findings suggest that the associations with mental health are not a result of horizontal pleiotropy. Instead, they may be a consequence of AUD, be correlated with other risk factors for AUD (along with and/or aside from genetic risk), or be related to ascertainment of patients with diagnosed AUD in the medical record. These results also encouragingly suggest that treating AUD could yield widespread improvements in overall health.
These findings should be interpreted in light of several limitations. AUDIT is a self-report measure that can be influenced by misreporting, and it only captures alcohol use in the past year, so it can be influenced by longitudinal changes in drinking that may be a consequence of, for example, other illnesses (42). People who stopped drinking or who never drank may represent genetically distinct groups; in our data set, 4,511 individuals were never drinkers, and 4,290 were previous drinkers. While our approach has substantially reduced bias in AUDIT without excluding any individuals from discovery, future studies might consider employing multiple techniques (e.g., separating never drinkers from former drinkers) to further alleviate potential biases associated with frequency of alcohol use in population-based cohorts. Additionally, while the AUDIT PRSs tended to perform similarly in UK Biobank and COGA, the portability of PRSs can be influenced by demographic characteristics such as socioeconomic status, age, and sex (43). It remains to be determined how generalizable the genetics of AUDIT are across different populations, especially in samples of different ancestries (as we included only individuals of European ancestry in the present study) or cultures (e.g., United Kingdom versus United States). A similar point also applies to sex-stratified samples, considering that AUDIT scores differ in men and women. Finally, it is important to note that the problems PRS exhibited weaker associations with AUD and other alcohol phenotypes in comparison to its AUDIT-P counterpart. Although the two predictors generally had similar effects in single-PRS models, the problems PRS was rendered redundant in the cross-method analyses when both of the highly correlated AUDIT-P and problems PRSs (e.g., r=0.84 in UK Biobank) were included in the regression models. However, we caution against the interpretation that the univariate GWAS approach is preferable. The multivariate GWAS function of genomic structural equation modeling is not only more flexible than traditional univariate GWAS, but its results may be more robust to confounding, as the software automatically applies a correction for population stratification (9). Furthermore, genomic structural equation modeling is better suited to investigate nuanced genetic influences, including the possibility of identifying SNPs with heterogeneous effects across symptoms or items.
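The redundancy of one of two highly correlated predictors in a joint model can be illustrated with the standard closed-form expression for standardized partial regression coefficients in a two-predictor linear model. This is a simplified sketch: the models in the study were not necessarily linear, the function name `standardized_partial_betas` is hypothetical, and the two predictor-outcome correlations (0.10 and 0.08) are invented for illustration. Only the predictor intercorrelation of 0.84 is taken from the text.

```python
def standardized_partial_betas(r_1y, r_2y, r_12):
    """Standardized coefficients when two predictors enter one linear model.

    With all variables standardized:
      beta_1 = (r_1y - r_12 * r_2y) / (1 - r_12 ** 2)
      beta_2 = (r_2y - r_12 * r_1y) / (1 - r_12 ** 2)
    """
    denom = 1.0 - r_12 ** 2
    if denom <= 0.0:
        raise ValueError("predictors are perfectly collinear")
    beta_1 = (r_1y - r_12 * r_2y) / denom
    beta_2 = (r_2y - r_12 * r_1y) / denom
    return beta_1, beta_2


# r_12 = 0.84 is the PRS intercorrelation reported for UK Biobank.
# The outcome correlations below are HYPOTHETICAL illustration values.
beta_stronger, beta_weaker = standardized_partial_betas(
    r_1y=0.10, r_2y=0.08, r_12=0.84
)
print(f"stronger predictor, joint model: {beta_stronger:.4f}")
print(f"weaker predictor, joint model:   {beta_weaker:.4f}")
```

Under these assumed values, the slightly stronger predictor retains its association in the joint model, whereas the coefficient of the slightly weaker one shrinks to approximately zero, even though both would look similar in single-predictor models.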
Analyzing alternative phenotypes as a complementary approach to studying clinically defined AUD, and psychiatric disorders in general, has generated considerable interest in recent years (44). Collectively, our work demonstrates how AUDIT can inexpensively facilitate such efforts. Here, we have shown that, after correcting for some potential biases, item- or symptom-level analyses can help unpack the genetic etiology of AUD by breaking down genetic influences into specific and shared components; notably, this is possible only because we can contrast our results against gold-standard, clinically ascertained AUD GWAS data sets. While composite scores have shown some utility in previous genetic association studies, such studies often rely on strong assumptions that the scale is unidimensional and that each item is equally informative of the construct being measured. In this study, we have shown that the latter assumption is false for AUDIT. In particular, a large proportion of the genetic variance of item 1 appears to be uninformative about a broader consumption construct, as it is related to socially stratified differences in behavior rather than the alcohol phenotypes of clinical interest. Moreover, although we found a notable degree of unidimensionality among the AUDIT items, our results demonstrate that the consumption and problems factors remain distinct in their associations with health.
1. Recent efforts to dissect the genetic basis of alcohol use and abuse. Biol Psychiatry 2020; 87:609–618
2. Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with psychiatric disorders. Nat Neurosci 2018; 21:1656–1669
3. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat Genet 2019; 51:237–244
4. Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations. Nat Commun 2019; 10:1499
5. Meta-analysis of problematic alcohol use in 435,563 individuals identifies 29 risk variants and yields insights into biology, pleiotropy, and causality. Nat Neurosci 2020; 23:809–818
6. Genome-wide association study meta-analysis of the Alcohol Use Disorders Identification Test (AUDIT) in two population-based cohorts. Am J Psychiatry 2019; 176:107–118
7. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO Collaborative Project on Early Detection of Persons With Harmful Alcohol Consumption–II. Addiction 1993; 88:791–804
8. Heterogeneity of alcohol use disorder: understanding mechanisms to advance personalized treatment. Alcohol Clin Exp Res 2015; 39:579–584
9. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat Hum Behav 2019; 3:513–525
10. Cohort profile: the “children of the 90s”: the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol 2013; 42:111–127
11. Netherlands Twin Register: a focus on longitudinal research. Twin Res 2002; 5:401–406
12. The UK Biobank resource with deep phenotyping and genomic data. Nature 2018; 562:203–209
13. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat Genet 2015; 47:284–290
14. A resource-efficient tool for mixed model association analysis of large-scale data. Nat Genet 2019; 51:1749–1755
15. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 2015; 4:7
16. Multivariate GWAS of psychiatric disorders and their cardinal symptoms reveal two dimensions of cross-cutting genetic liabilities. BioRxiv, September 8, 2020 (doi: https://doi.org/10.1101/603134)
17. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26:2190–2191
18. lavaan: an R package for structural equation modeling. J Stat Softw 2012; 48:1–36
19. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015; 47:291–295
20. Multivariate genomic analysis of 1.5 million people identifies genes related to addiction, antisocial behavior, and health. Nat Neurosci (in press)
21.
22. Functional mapping and annotation of genetic associations with FUMA. Nat Commun 2017; 8:1826
23. MAGMA: generalized gene-set analysis of GWAS data. PLOS Comput Biol 2015; 11:e1004219
24. A computational tool (H-MAGMA) for improved prediction of brain-disorder risk genes by incorporating brain chromatin interaction profiles. Nat Neurosci 2020; 23:583–593
25. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun 2018; 9:1825
26. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 2019; 10:1776
27. The Collaborative Study on the Genetics of Alcoholism. Alcohol Health Res World 1995; 19:228–236
28. The development of a standardized neighborhood deprivation index. J Urban Health 2006; 83:1041–1062
29. Genetic risk for major depressive disorder and loneliness in sex-specific associations with coronary artery disease. Mol Psychiatry 2019 (https://doi.org/10.1038/s41380-019-0614-y)
30. R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment. Bioinformatics 2014; 30:2375–2376
31. New alcohol-related genes suggest shared genetic mechanisms with neuropsychiatric disorders. Nat Hum Behav 2019; 3:950–961
32. The brain on drugs: from reward to addiction. Cell 2015; 162:712–725
33. Genetic variation at the CELF1 (CUGBP, elav-like family member 1 gene) locus is genome-wide associated with Alzheimer’s disease and obesity. Am J Med Genet B Neuropsychiatr Genet 2014; 165B:283–293
34. Genome-wide association study of cognitive functions and educational attainment in UK Biobank (N=112 151). Mol Psychiatry 2016; 21:758–767
35. Genome-wide association meta-analysis in 269,867 individuals identifies new genetic and functional links to intelligence. Nat Genet 2018; 50:912–919
36. Potential influence of socioeconomic status on genetic correlations between alcohol consumption measures and mental health. Psychol Med 2020; 50:484–498
37. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol 2017; 186:1026–1034
38. Genome-wide association study of maximum habitual alcohol intake in >140,000 US European and African American veterans yields novel risk loci. Biol Psychiatry 2019; 86:365–376
39. Corticotropin releasing factor: a key role in the neurobiology of addiction. Front Neuroendocrinol 2014; 35:234–244
40. Polygenic contributions to alcohol use and alcohol use disorders across population-based and clinically ascertained samples. Psychol Med (Online ahead of print, January 20, 2020)
41. Collider scope: when selection bias can substantially influence observed associations. Int J Epidemiol 2018; 47:226–235
42. Genome-wide analyses of behavioural traits are subject to bias by misreports and longitudinal changes. Nat Commun 2021; 12:20211
43. Variable prediction accuracy of polygenic scores within an ancestry group. eLife 2020; 9:e48376
44. Emerging phenotyping strategies will advance our understanding of psychiatric genetics. Nat Neurosci 2020; 23:475–480