Schizophrenia patients differ greatly in observed levels of hallucinations, delusions, and negative, disorganized, manic, and depressive symptoms, as well as in age at onset, course of illness, and comorbidities. Historically, categorical subtypes of schizophrenia were identified, such as the paranoid, catatonic, and hebephrenic subtypes that were combined by Kraepelin into dementia praecox, and the positive and negative subtypes (1). Twin studies of schizophrenia as a category have provided the best evidence to date for strong heritability (70%–80%) (2). Thus, it is possible that most clinical diversity is due to the kind of individual variation in the underlying pathology that is seen in, for example, Huntington’s disease, which has one common genetic basis. An alternative approach is to define distinct dimensional features (e.g., a definition based on factor analyses of symptom data) (3).
A number of studies suggest that there is a genetic basis for clinical heterogeneity (4–6). Associations with dimensional components of schizophrenia could provide insights into targets for pharmacological therapy, factors influencing specific functional impairments, or the clinical subgroups most likely to be relevant to associations with the categorical diagnosis. We previously hypothesized two classes of genetic effects (5): clinical modifier genes that influence features of illness without altering the risk of illness itself and susceptibility modifier genes that influence the risk of illness in a way that affects its clinical features (akin to subtypes of illness).
Genome-wide association study (GWAS) data provide an opportunity to consider dimensional approaches in new ways. Schizophrenia GWAS analyses have detected three types of highly significant effects: 1) common single-nucleotide polymorphisms (SNPs) in at least seven genes or nongenic regions are strongly associated with schizophrenia, although with small individual effects (7); 2) a set of rare chromosomal deletions and duplications (copy number variants) have large effects on risk but only in a small proportion of cases (8); and 3) a robust polygenic effect can be observed by predicting case-control status in a schizophrenia data set by computing scores for each subject that depend on association test results for large numbers of SNPs from a different schizophrenia data set (7, 9). The success of the GWAS approach suggests that it might also be used to explore the genetic basis of clinical heterogeneity.
To our knowledge, this study is the first GWAS analysis of clinical symptom dimensions in schizophrenia. We used data from one of the largest single GWAS (Molecular Genetics of Schizophrenia [MGS] study) (10), in which the assessment protocol included completion of a dimensional rating scale by an expert diagnostician after reviewing multiple sources of current and historical data. We used factor analysis to derive positive, negative/disorganized, and mood factors from these data and tested the association of each factor score with the SNPs from the MGS GWAS (10). We then used data from 16 other data sets in the Psychiatric GWAS Consortium (PGC) schizophrenia analysis (7) to generate polygenic scores for the MGS participants and carried out analyses to determine whether the strong polygenic effect observed across schizophrenia data sets is more strongly associated with any of the clinical dimensions.
Clinical Sample and Assessments
The clinical methods of the study have been described elsewhere (10). Briefly, we examined 2,454 individuals of European ancestry for whom both GWAS data and valid dimensional rating data were available (2,436 individuals for chromosome X analyses because of additional quality-control exclusions). Participants were recruited through 10 university-based sites in the United States and Australia under a common protocol. They received consensus diagnoses of either DSM-IV schizophrenia (90%) or schizoaffective disorder (with criterion A schizophrenia symptoms for at least 6 months) based on available information from the Diagnostic Interview for Genetic Studies, version 2.0, informant reports, and psychiatric treatment records. At the same time that diagnoses were assigned (i.e., when all sources had been reviewed), a diagnostician also rated clinical features using the Lifetime Dimensions of Psychosis Scale (http://depressiongenetics.stanford.edu/ldps.html), which was designed to quantify the schizophrenia symptom dimensions identified by previous factor-analytic studies (11). The 14 scale items we used are listed in Table 1. For each item, separate ratings were made on 4-point subscales for typical severity and total duration, and these ratings were summed to produce a score for the item for this analysis. An additional four items were partially redundant or had insufficient variance to be useful in this study. Interrater reliability was measured for 41 participants (drawn from all sites) for whom complete scale ratings were obtained from pairs of raters at different sites, with acceptable intraclass correlation coefficients for the positive (0.74), negative/disorganized (0.66), and mood (0.67) factor scores described below.
TABLE 1. Factor Loadings in the Exploratory Factor Analysis of the Lifetime Dimensions of Psychosis Scale Ratings in the Molecular Genetics of Schizophrenia Sample
|Item||Positive Factor||Negative/Disorganized Factor||Mood Factor|
|Abnormal perception of thought||0.504||0.100||0.118|
|Poverty of speech||0.076||0.707||−0.162|
|Formal thought disorder||0.175||0.597||0.084|
|Depression with psychosis||0.263||−0.172||0.465|
|Mania with psychosis||0.015||0.140||0.934|
DNA Extraction and Genotyping
DNA specimens were extracted from lymphocytes or from Epstein-Barr virus-transformed lymphoblastic cell lines and were assayed at the Broad Institute (Cambridge, Mass.) using Affymetrix 6.0 genotyping arrays (Affymetrix, Santa Clara, Calif.). Part of the MGS GWAS sample was genotyped under the auspices of the Genetic Association Information Network, and the remaining samples were genotyped (at the same laboratory several months later) under grant funding, but they constitute a single MGS sample. After quality control, 671,422 autosomal and 25,069 X-chromosome SNPs were selected for analysis (10).
Factor Analysis of the Lifetime Dimensions of Psychosis Scale
Exploratory factor analysis using all MGS GWAS participants with Lifetime Dimensions of Psychosis Scale ratings was performed in Mplus (http://www.statmodel.com/index.shtml) using an oblique geomin rotation (12). Prior to exploratory factor analysis, missing data points were imputed using the Proc MI statistical procedure in SAS (SAS Institute, Cary, N.C.) after excluding participants for whom data were missing for ≥50% of the items. The exploratory factor analysis included 2,454 participants of European ancestry and 1,137 African American participants, but we report only on the larger European ancestry data set rather than combining both data sets, since the genetic architecture of the two groups appears to differ (10). A three-factor solution was selected as providing the most parsimonious and interpretable factors. Based on the results from the exploratory factor analysis, a variable with a loading of at least 0.4 on a factor was selected as an indicator for that factor in the confirmatory factor analysis if its loadings on each of the other factors were at least 0.2 units lower. Confirmatory factor analysis was performed (12) following the exploratory factor analysis structure, specifying a simple model with no cross loadings of items on factors. Goodness-of-fit was assessed using the comparative fit index, Tucker-Lewis index, and root mean square error of approximation from the confirmatory factor analysis.
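The indicator selection rule described above can be illustrated with a minimal Python sketch. The function name `select_indicators`, the use of signed (rather than absolute) loadings, and the column order (positive, negative/disorganized, mood) are assumptions made for illustration; the actual analysis was carried out in Mplus. The loadings are those of the Table 1 rows shown in this article.

```python
def select_indicators(loadings, min_loading=0.4, min_gap=0.2):
    """Map each factor index to the items that qualify as its indicators.

    An item qualifies for a factor when its loading on that factor is at
    least `min_loading` and its loading on every other factor is at least
    `min_gap` units lower (signed loadings are compared; an assumption).
    """
    indicators = {}
    for item, row in loadings.items():
        for k, value in enumerate(row):
            others = [v for j, v in enumerate(row) if j != k]
            if value >= min_loading and all(value - v >= min_gap for v in others):
                indicators.setdefault(k, []).append(item)
    return indicators


# Loadings from the Table 1 rows shown above; factor order is assumed to be
# 0 = positive, 1 = negative/disorganized, 2 = mood.
table1 = {
    "Abnormal perception of thought": [0.504, 0.100, 0.118],
    "Poverty of speech": [0.076, 0.707, -0.162],
    "Formal thought disorder": [0.175, 0.597, 0.084],
    "Depression with psychosis": [0.263, -0.172, 0.465],
    "Mania with psychosis": [0.015, 0.140, 0.934],
}
indicators = select_indicators(table1)
```

Under these assumptions, each of the five items shown is assigned to exactly one factor, which is the pattern a confirmatory model with no cross loadings requires.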
We implemented a case-only association test of allelic effects on three quantitative traits: positive, negative/disorganized, and mood factors. We used linear regression as implemented in PLINK (13) to test for allelic effects on scores for these three factors, with covariates including study site (categorical), age, sex, and principal components scores reflecting ancestry effects (five for autosomal SNPs and three for chromosome-X SNPs) (10). Because three different dimensions were tested, the threshold for genome-wide significance was set at 1.67×10⁻⁸ (5×10⁻⁸ divided by 3).
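The significance threshold above is consistent with a Bonferroni correction of a base threshold across the three dimensions. The short sketch below reproduces that arithmetic; the base value of 5×10⁻⁸ is the conventional single-trait GWAS threshold and is an assumption here, as is the helper name `bonferroni_threshold`.

```python
def bonferroni_threshold(base_alpha, n_tests):
    """Divide a base significance threshold evenly across n_tests tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return base_alpha / n_tests


# Assumed base threshold: the conventional 5e-8 for a single GWAS.
threshold = bonferroni_threshold(5e-8, 3)
print(f"{threshold:.3g}")
```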
We tested whether any known gene pathways (sets of functionally related genes) were overrepresented in the locations of the best association findings for each dimension using the ALIGATOR (Association LIst Go AnnoTatOR) method, which corrects for confounding factors and sources of bias, such as linkage disequilibrium between SNPs, variable gene size, overlapping genes, and multiple nonindependent gene ontology categories (14). We included pathways from the Gene Ontology, KEGG (Kyoto Encyclopedia of Genes and Genomes), Mouse Genome Informatics, PANTHER (Protein Analysis THrough Evolutionary Relationships), BioCarta, and Reactome databases.
It has been well established that GWAS results from one schizophrenia data set can be used to predict case-control status in a second data set (7, 9). A large genome-wide set of independent autosomal SNPs (which have been genotyped or imputed in each data set) is selected after pruning to restrict linkage disequilibrium between SNPs. The effect size (beta) from the test of association of each tested allele in the first data set is then used as a weighting factor to create a polygenic score for each subject in the second data set: the sum, across all SNPs, of the number of test alleles carried by the subject multiplied by the weight for that allele. The proportion of variance explained is small but increases with the sizes of the two data sets.
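The polygenic score described above is a weighted sum of allele counts, and a minimal Python sketch may make the computation concrete. The function name `polygenic_score` and the example weights and genotypes are hypothetical values chosen for illustration; they are not results from any data set.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of test-allele counts for one subject.

    allele_counts: number of test alleles carried at each SNP (0, 1, or 2).
    weights: effect size (beta) for each SNP from the first data set.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have equal length")
    return sum(count * beta for count, beta in zip(allele_counts, weights))


# Hypothetical effect sizes for three pruned SNPs and one subject's genotypes.
betas = [0.12, -0.05, 0.30]
subject = [2, 1, 0]
score = polygenic_score(subject, betas)  # 2*0.12 + 1*(-0.05) + 0*0.30
```

In practice, the sum runs over many thousands of SNPs, and the resulting scores are then entered as a predictor in a regression model in the second data set.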
We assessed whether the polygenic signal was more closely related to any one symptom dimension. We used MGS dimensional GWAS data for each factor score as a training data set and the remaining 16 PGC data sets (case subjects, N=6,715; comparison subjects, N=9,978) (7) as the test data set. From all HapMap 3 SNPs that were either genotyped or imputed (with information content >0.9 using the Beagle genetic analysis software package for the PGC samples), 110,942 autosomal SNPs were selected, with a linkage disequilibrium (r²) <0.25 in 500-SNP windows. For each symptom factor separately, analyses were carried out for each of 10 bins of SNPs (Table 2); each bin included SNPs with p values in the MGS GWAS for that dimension that were below the specified values listed in Table 2. In each analysis, effect size beta values from the MGS dimensional analysis were used as weighting factors to compute polygenic scores for each participant in the 16 PGC data sets. PGC case-control status was then predicted by logistic regression analysis of polygenic scores plus covariates (PGC study site and nine principal component scores reflecting ancestry). Each analysis yielded a p value for the overall significance of the prediction of PGC case-control status, while correcting for covariates, and an estimate of the variance in case-control status that was explained (Nagelkerke’s R² for the full model using the polygenic score plus the covariates, minus R² for the covariates alone).
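The variance-explained estimate described above is a difference between two Nagelkerke R² values. The following sketch computes it from model log-likelihoods using the standard definition of Nagelkerke's R²; the function names and the example log-likelihood values are hypothetical, and fitting the logistic regression models themselves is assumed to be done elsewhere.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R2 from the log-likelihoods of an intercept-only model
    (ll_null) and a fitted model (ll_model) on n subjects."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def variance_explained_by_score(ll_null, ll_covariates, ll_full, n):
    """R2 of the full model (score plus covariates) minus R2 of the
    covariates-only model."""
    return nagelkerke_r2(ll_null, ll_full, n) - nagelkerke_r2(ll_null, ll_covariates, n)


# Hypothetical log-likelihoods for 200 subjects.
increment = variance_explained_by_score(-100.0, -90.0, -85.0, 200)
```

Subtracting the covariates-only value isolates the contribution of the polygenic score from that of site and ancestry.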
TABLE 2. Polygenic Score Analyses of Prediction of Psychiatric GWAS Consortium (PGC) Case-Control Status by Results of Each Molecular Genetics of Schizophrenia (MGS) Dimensional GWAS
|Symptom Factor||Single-Nucleotide Polymorphisms (SNPs)||Symptom Factorb|
|p-Value Threshold (Dimensional GWAS) to Select SNPs||N||p||Variance Explainedc||p||Variance Explainedc||p||Variance Explainedc|
We also examined the same effect in the opposite direction (i.e., not an independent analysis). Polygenic scores for the MGS GWAS case subjects were computed using association test results for the 16 PGC data sets combined, using the subset of the same SNPs that produced the most significant polygenic analysis for the categorical schizophrenia diagnosis (the best 20% of p values in the 16 PGC data sets, predicting MGS case-control status with p=2.45×10⁻⁵⁴, 6.35% of variance explained). We then used linear regression to determine whether polygenic scores for the MGS case subjects were predicted by each factor score plus MGS ancestry and site covariates, and we report the p value for the effect of each factor score.
Eigenvalues, exploratory factor analysis model fit indices, and clinical judgment were used to select a three-factor model as the most adequate and parsimonious representation of the item associations. Exploratory factor analysis factors and their item loadings are listed in Table 1. The three factors (clinical dimensions) were labeled as positive, negative/disorganized, and mood. The confirmatory factor analysis model fit indices for this three-factor model were the comparative fit index (0.91), Tucker-Lewis index (0.90), and root mean square error of approximation (0.12). Additional factors could have been extracted to improve the statistical fit, but such factors were more poorly marked and less likely to be replicable and meaningfully interpreted. The three-factor solution is clinically intuitive and consistent with previous studies. It is possible that ascertainment or rater differences across sites may have also contributed to the lower fit index values. However, as noted above, we accounted for the site mean differences as well as age and sex effects on the factor scores in the association regression models.
GWAS of Symptom Dimensions
Genomic inflation factors (λ) for analyses of positive, negative/disorganized, and mood factors were 0.98, 1.0, and 1.01, respectively, indicating no significant inflation of results by technical factors or population stratification. No genome-wide significant associations were observed for any clinical dimension. Data for SNPs with a p value <10⁻⁵ are summarized in Table 3, including gene symbols and a brief summary of functions. Only one region (chromosome 20q13.31) produced moderate evidence for association with two different factors (positive and negative/disorganized).
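For readers unfamiliar with the genomic inflation factor, the sketch below shows its conventional definition: the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null hypothesis (approximately 0.455). How λ was computed in this study is not stated above, so this definition, the function name, and the example statistics are assumptions for illustration.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Ratio of the observed median chi-square statistic to its null
    expectation; values near 1.0 indicate little inflation."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical statistics whose median equals the null median.
lam = genomic_inflation([0.1, 0.4549364, 2.0])
```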
TABLE 3. Single-Nucleotide Polymorphisms (SNPs) of Moderate Association to Each Symptom Dimension
|SNP||Chromosome/Band||Location (Base Pair)||Beta||p||Closest Gene (Symbol, Distance [base pair], Gene Name)||Function/Relevance|
|rs7233060||18q23||75,493,367||0.1225||2.53×10⁻⁷||CTDP1, −47421, CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) phosphatase, subunit 1||Makes POLR2A available for initiation of gene expression; mutations cause Charcot-Marie-Tooth (demyelinating) disease|
|rs17206232||5q12.3||64,469,156||0.1350||1.45×10⁻⁶||ADAMTS6, 11162, ADAM metallopeptidase with thrombospondin type 1 motif, 6||ADAMTS4/ADAMTS5 induce neurite extension in cultured neurons (25)|
|rs2323266||13q21.2||60,863,304||−0.1032||3.13×10⁻⁶||PCDH20, 18515, protocadherin 20||Neuronal survival, synaptogenesis (26). Hippocampal circuitry formation, synaptic plasticity (27). Variants in PCDH19 associated with epilepsy (28, 29).|
|rs10900020||10q11.21||44,147,203||−0.1536||3.46×10⁻⁶||CXCL12, 38407, chemokine (C-X-C motif) ligand 12||Diverse roles in neuronal migration, growth factor signaling, neuroprotection (30). Increased GABA, glutamate, dopamine release (31).|
|rs959770||4p16.3||2,365,095||−0.1198||9.40×10⁻⁶||ZFYVE28, within, zinc finger, FYVE domain containing 28||Regulation of epidermal growth factor receptor activity|
|rs1455244||18p11.21||11,484,199||−0.06046||3.22×10⁻⁶||Intergenic (195 kb upstream of closest gene, GNAL, guanine nucleotide binding protein [G protein], alpha activating activity polypeptide, olfactory type)||Coupled to mesolimbic and mesocortical dopamine-1 receptors (32)|
|rs7172342||15q22.2||59,123,734||−0.0795||3.83×10⁻⁶||RORA, within, RAR-related orphan receptor A||Transcription factor involved in cerebellar dendritic development and synapse formation (33). Decreased expression in autism (34).|
|rs4530903||6p21.32||32,689,867||0.09191||4.83×10⁻⁶||Between HLA-DRB1 and HLA-DQA1, 35343, −23293, major histocompatibility complex, class II genes||Immunity. Common SNPs in this region are strongly associated with schizophrenia (7, 9, 10, 22).|
|rs10924245||1q44||243,800,231||−0.1661||6.93×10⁻⁶||KIF26B, within, kinesin family member 26B||Regulation of cell-cell adhesion|
|rs17290922||16q13||55,581,818||−0.1338||7.78×10⁻⁶||NLRC5, within, nucleotide-binding oligomerization domains 27||Induce major histocompatibility class-I genes (35)|
|rs4702765||5p15.2||10,980,604||0.2237||1.06×10⁻⁵||CTNND2, 44347, catenin (cadherin-associated protein), delta 2||Binds presenilin-1. Maintenance of dendrites and dendritic spines (36). Mutations cause cri du chat syndrome (37). Rare copy number variant observed in schizophrenia (38).|
Only one of the regions in Table 3 overlapped with a region that produced evidence for genome-wide significant association in the PGC two-stage analysis (full GWAS data for 9,394 case subjects and 12,462 comparison subjects and the addition of data for the most significant SNPs from 8,442 case subjects and 21,397 comparison subjects) (7). The PGC observed significant association for multiple SNPs across the major histocompatibility complex region, spanning the HLA (human leukocyte antigen) genes, and we observed moderate evidence for association of negative/disorganized factor scores with SNPs downstream of HLA-DQA1.
Pathway analyses were performed separately for SNPs within genes (267,899 SNPs, 15,998 genes) and then for SNPs within 20 kb of genes (360,811 SNPs, 22,604 genes). The threshold for selecting significant SNPs in this context was set such that 5% of genes included one such SNP (p=0.007 and 799 genes for SNPs within genes; p=0.005 and 1,130 genes for SNPs within 20 kb of genes). In both analyses, the number of pathways that were enriched (i.e., pathways that contained more significant genes than expected by chance) did not reach overall significance after correction for multiple testing.
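The SNP-selection threshold described above can be illustrated with a rough sketch: given the best (lowest) SNP p value in each gene, the threshold is the value at which the stated fraction of genes contains at least one qualifying SNP. This is a simplified illustration under our own assumptions and not the ALIGATOR implementation; the function names and the example p values are hypothetical.

```python
def genes_selected(n_genes, fraction=0.05):
    """Number of genes to treat as significant (rounded down, at least 1)."""
    return max(1, int(n_genes * fraction))


def threshold_for_gene_fraction(best_p_per_gene, fraction=0.05):
    """Largest p value such that `fraction` of genes have a best SNP
    p value at or below it (ties ignored for simplicity)."""
    ranked = sorted(best_p_per_gene)
    return ranked[genes_selected(len(ranked), fraction) - 1]


# Hypothetical best-SNP p values for 30 genes: 0.001, 0.002, ..., 0.030.
example_p = [(i + 1) / 1000 for i in range(30)]
cutoff = threshold_for_gene_fraction(example_p)
```

With the gene counts reported above, rounding 5% down gives 799 of 15,998 genes and 1,130 of 22,604 genes, which matches the numbers of genes stated in the text.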
Results for the prediction of PGC case-control status with polygenic scores based on each MGS dimensional GWAS analysis are summarized in Table 2. For the negative/disorganized factor, p values became nominally significant when polygenic scores for participants in the PGC study were computed based on results of the best 10% of SNPs in the dimensional GWAS analysis, with the lowest p value (0.007) obtained using all SNPs, although only 0.05% of the variance in PGC case-control status was predicted. There was no evidence that polygenic scores based on the positive or mood factor GWAS results could predict PGC case-control status.
In a related analysis of the MGS case subjects, polygenic scores were computed based on log odds ratio values from the other 16 PGC data sets and were used to predict (by linear regression) each factor score, with site, sex, age at interview, and MGS ancestry principal components as covariates; p values for negative/disorganized, positive, and mood factors were 0.03, 0.5, and 0.7, respectively. There was no significant interaction between sex and polygenic scores in predicting negative/disorganized factor scores. To further explore the relationship between the negative/disorganized factor and polygenic scores, we carried out separate linear regression analyses of the raw sums of the severity and duration ratings for Lifetime Dimensions of Psychosis Scale items for negative (blunted affect and poverty of speech) and disorganized (formal thought disorder and bizarre behavior) symptoms as predictors of polygenic scores, with site and ancestry component covariates. A significant effect was observed for disorganized symptoms (p=0.004) but not negative symptoms (p=0.37); analyzed separately, both disorganized symptom items contributed to the prediction of polygenic scores (formal thought disorder, p=0.01; bizarre behavior, p=0.03).
To our knowledge, this is the first GWAS of clinical dimensions of schizophrenia. There have been several previous reports of relationships between putative schizophrenia candidate genes and clinical measures (16–20). SNPs in DTNBP1 were reported to be more strongly associated with negative symptoms and SNPs in COMT with manic symptoms in two independent samples (16–19). SNPs in ZNF804A were reported to be more strongly associated with manic-like symptoms in one sample (20). Another study presented association results for a small case-control sample in regions with previously demonstrated evidence for linkage to schizophrenia symptom factors and reported SNPs with moderate levels of association with positive and disorganized symptom scores (21).
In the present study, we did not detect any association for clinical factor scores at a genome-wide threshold of significance, which is not surprising given that much larger samples have been required to detect significant associations of schizophrenia with common SNPs (7, 9, 10, 22). With one exception, there was no overlap between the best MGS dimensional GWAS association signals and the significant associations detected by the PGC. This suggests either that differential genetic effects on symptoms (if they exist) are largely distinct from those on risk of illness or that much larger samples are needed to detect individual SNPs that influence both symptom dimensions and illness risk. The exception was the moderate association that we observed between negative/disorganized symptoms and SNPs between HLA-DRB1 and HLA-DQA1, part of the broad major histocompatibility complex region (spanning all of the HLA genes) in which many SNPs are significantly associated with schizophrenia (7, 9, 10, 22). It is not yet known how sequence variation in HLA genes predisposes to schizophrenia or whether and why this might be more related to negative/disorganized symptoms.
The most intriguing result is that case-control status of participants in the PGC analysis was predicted by polygenic scores that were computed on the basis of MGS association test results for negative/disorganized scores for thousands of SNPs, with the signal here apparently generated primarily by ratings of disorganized symptoms (formal thought disorder and bizarre behavior). This suggests that the well-replicated polygenic effect seen in cross-data-set analyses of schizophrenia (7, 9) might be most closely related to these aspects of the disorder, which in turn suggests that treatments might be able to target these features. Note that within-subject analyses of MGS factor scores are not inherently related to case-control analyses: even when case subjects have a higher frequency of specific SNP alleles than comparison subjects, the polygenic effect observed in our study would not be detected if factor scores were randomly distributed among case subjects. The effect is modest and is difficult to correct for multiple testing because 10 partially correlated analyses were carried out for each factor score. However, the pattern of results is typical of other schizophrenia polygenic analyses, becoming gradually more significant as larger proportions of SNPs are included. This is believed to be the case because many SNPs influence risk, many of them with very small effect sizes that produce completely nonsignificant individual p values in most GWAS data sets such that their effects can only be detected in aggregate (9, 23). By comparison, when we used MGS case-control GWAS results as weights for polygenic scores in the remaining 16 PGC data sets, 2.2% of the variance in case-control status in those data sets could be predicted, much larger than the 0.05% of variance that can be predicted with polygenic scores based on MGS negative/disorganized GWAS results.
The size of the polygenic effect that can be detected for symptom scores may be restricted by what we view as an inherent noisiness of clinical ratings in schizophrenia, such that it is noteworthy to detect any genetic association signal using factor scores. Clinical ratings rely on the self-report of patients who may fail to recognize or may deny their symptoms, as well as on records (often cursory) from brief hospital stays and clinic visits. We also observed site differences in factor score means, and we cannot determine whether these were due to true differences in sampling or subtle differences in rater styles. Nevertheless, our three-factor solution is clinically intuitive and consistent with previous work. Factor analytic studies of schizophrenia have been reviewed by Peralta and Cuesta (3). Selected models have typically included three to five factors, including various combinations of positive, “bizarre” positive (Schneiderian), negative, disorganized, manic, and depressive factors. It has not been unusual to see (as in our study) negative symptoms combined in one factor with disorganized symptoms, positive with bizarre positive symptoms, and manic with depressive features.
Larger sample sizes are needed to determine whether significant associations with symptom dimensions can be detected for individual SNPs, genes, and pathways. We note that most of the best-supported genes in our study have functions (as summarized in Table 3) that could plausibly be related to schizophrenia, including involvement in known CNS diseases and roles in neurodevelopment, neuroprotection, and neurotransmission.
A number of methodological limitations of this study should be considered. We cannot rule out the possibility that other factor solutions (e.g., with disorganization, bizarre psychosis, mania, or depression symptoms in separate factors) or other rating scales or procedures might produce stronger genetic associations. We also lacked sufficient systematic information to study environmental variables, such as lifetime cannabis abuse, immigration, and urbanicity, which tend to exert their putative effects through early exposures that are difficult to capture retrospectively (24). Additionally, we lacked formal cognitive testing of subjects, which might shed light on whether the clinical ratings of disorganized symptoms were related to specific neuropsychological impairments. The most critical limitations are those that constrain the power of the analyses (as discussed above): sample size, which was insufficient to produce genome-wide significant association results for individual SNPs, and the imprecision with which clinical symptoms can be measured.
Regarding sample size, this is the largest schizophrenia genetics study with a single assessment protocol that included detailed lifetime symptom ratings by expert raters, and thus our results deserve to be considered separately as well as in combination with other samples that were rated by other methods. The PGC is undertaking such a cross-data-set analysis (in which we are taking part), which could shed additional light on whether significant associations can be observed between individual SNPs and symptom dimensions and whether the polygenic effect on negative/disorganized symptoms can be replicated and strengthened despite the need to combine different types of rating systems from different studies.
In conclusion, we carried out GWAS analyses of positive, negative/disorganized, and mood factor scores in 2,454 individuals with schizophrenia. No single SNP produced significant evidence for association at a genome-wide threshold, and thus larger samples will be required to search for these associations. However, a polygenic score analysis produced evidence that there is a relationship between negative/disorganized factor scores and the polygenic signal that is observed in cross-sample analyses of schizophrenia GWAS data sets, with further analyses suggesting that this effect was primarily due to disorganized symptoms (duration and severity of formal thought disorder and bizarre behavior). This suggests that at least part of the effect of multiple common SNPs is on the deteriorative course of illness that has generally been considered the hallmark of the syndrome.
The authors thank the study participants and research staff at the study sites.