Genetic factors clearly play a substantial role in the etiology of schizophrenia, as evidenced by family and twin studies that indicate a heritability of up to 80% for the disorder (1, 2). Although a number of replicated linkages have been reported, implicating multiple chromosomal regions (3–5), none of these linkage findings has led to cloning of causative genes for schizophrenia. Several neurobiologically plausible candidate genes have, however, been identified (6, 7).
An alternative to linkage and other agnostic analysis methods that may aid in the genetic dissection of complex diseases is interrogation of candidate genes thought to be associated with both the qualitative diagnostic category and quantitative endo- or intermediate phenotypes. This neurobiologically informed strategy utilizes existing knowledge of the underlying neural substrates of the disorder and may be particularly informative in unraveling the genetic architecture of schizophrenia. As part of the Consortium on the Genetics of Schizophrenia (COGS; 8–10), we constructed a custom single-nucleotide polymorphism (SNP) array containing 1,536 SNPs in 94 genes of relevance to schizophrenia and related phenotypes. We used information regarding putatively important neurobiological systems, as well as an extensive review of published linkage, association, and model organism studies, to identify and rank genes in terms of their level of importance in understanding schizophrenia. The resulting COGS SNP chip provides excellent coverage of many previously suggested candidate genes for schizophrenia, including AKT1, CHRNA7, COMT, DAO, DAOA, DISC1, DTNBP1, ERBB4, GRM3, GSK3B, NOS1AP, NRG1, PAFAH1B1, PPP3CC, PRODH, RELN, and RGS4 (6, 7), as well as several novel genes from putatively important pathways.
We utilized the COGS SNP chip to evaluate the associations of these 94 candidate genes with 12 heritable neurophysiological and neurocognitive endophenotypes that have been shown to be characteristically impaired in schizophrenia: prepulse inhibition of the startle response, P50 suppression, the antisaccade task for eye movements, continuous performance, letter-number span, verbal learning, abstraction and mental flexibility, face memory, spatial memory, spatial processing, sensorimotor dexterity, and emotion recognition. Our goal was not only to identify singular genetic associations with the COGS endophenotypes but also to assess the degree of pleiotropy (genetic associations with multiple endophenotypes). Since genes exhibiting pleiotropic effects across several endophenotypes may have far-reaching neurobehavioral implications, these genes may be optimal candidates to serve as biomarkers for early identification and intervention in schizophrenia in at-risk populations, as well as targets for treatment with novel pharmaceutical and psychosocial therapies. To confirm the collective significance of our findings, we also developed a novel multiple testing strategy, the bootstrap total significance test, which overcomes some of the limitations of similar methods currently used in genomics.
Families were ascertained through the identification of probands at each of the seven COGS sites who met the DSM-IV-TR criteria for schizophrenia on the basis of administration of the Diagnostic Interview for Genetic Studies (11) and the Family Interview for Genetic Studies (12). The minimal requirement for pedigree ascertainment was a schizophrenia proband, both parents, and at least one unaffected sibling. This sampling strategy provides greater potential for phenotypic contrasts between and among the siblings for quantitative genetic analyses. Additional affected and unaffected siblings were collected whenever possible. The age range was set a priori at 18–65 years. All subjects received urine toxicology screens for drugs of abuse before phenotyping (negative screens were required). The ascertainment and screening procedures, inclusion and exclusion criteria, and descriptive statistics for the study group are discussed in detail elsewhere (10). After a detailed description of study participation, written informed consent was obtained for each subject in accordance with the protocols of the local institutional review boards.
Each subject in the study group was assessed for 12 endophenotypes, as described elsewhere in detail (13, 14), all of which have been shown to be heritable (15). Prepulse inhibition was measured as the percentage inhibition of the startle reflex in response to a weak prestimulus with a 60-msec interval from prepulse to startle stimulus (16–18). P50 suppression was measured as the difference between the amplitudes of the P50 event-related potentials generated in response to conditioning and test stimuli presented with a 500-msec interstimulus interval (19, 20). Although the ratio is the more commonly used measure of P50 suppression, we have found the difference score to be more heritable in our COGS families (15). The “overlap” antisaccade task of oculomotor inhibition, which requires subjects to fixate on a central target and respond to a peripheral cue by looking in the opposite direction at the same distance, was measured as the ratio of correct antisaccades to total interpretable saccades (21, 22). The degraded stimulus version of the Continuous Performance Test (referred to later as “continuous performance”), a widely used measure of deficits in sustained, focused attention with a high perceptual load, was used to assess correct target detections and incorrect responses to nontargets (d′) (23 and 1999 software by K.H. Nuechterlein and R.F. Asarnow, version 8.11). The letter-number span, part of the Wechsler Memory Scale, is a prototypical task to assess storage of working memory information with manipulation; it requires the correct reordering of intermixed numbers and letters. For the assessment of verbal learning and memory, we used the California Verbal Learning Test (24) (“verbal learning”), an established list-learning test that provides a total score for recall of a list of 16 verbally presented items summed over five trials.
We also employed a modified version of the University of Pennsylvania Computerized Neurocognitive Battery (25, 26), excluding measures of attention and verbal and working memory, which were assessed as described above. Six measures were evaluated by using this battery. The test for abstraction and mental flexibility (“abstraction”) presents four objects from which the subject must choose the one that does not belong. An assessment of face memory requires subjects to recognize 20 previously presented target faces among 20 distracter faces. The assessment of spatial memory uses Euclidean shapes as learning stimuli in a recognition paradigm identical to that used for face memory. For an assessment of spatial processing, two lines are presented at an angle, and the corresponding lines must be identified on a simultaneously presented array. The assessment of sensorimotor dexterity requires the subject to click with a mouse as quickly as possible on a target that gets increasingly smaller. The assessment of emotion recognition involves the correct identification of a variety of facial expressions of emotion. Each of these tests was measured as “efficiency,” calculated as accuracy/log₁₀(speed) and expressed as standard equivalents (z scores).
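The efficiency measure can be illustrated with a minimal sketch. The function names, the example values, and the choice to standardize against the sample itself are assumptions made for illustration; the text does not state the units of speed or the reference distribution used for the z scores.

```python
import math
import statistics

def efficiency(accuracy, speed):
    """Efficiency as described in the text: accuracy / log10(speed)."""
    return accuracy / math.log10(speed)

def z_scores(values):
    """Standardize values to mean 0 and SD 1 (sample SD).

    Assumption: standardization within the sample at hand.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical (accuracy, speed) pairs for four subjects.
raw = [(18, 1500.0), (15, 2200.0), (19, 1200.0), (12, 3000.0)]
eff = [efficiency(a, s) for a, s in raw]
z = z_scores(eff)
```

Dividing by the logarithm of speed rather than by speed itself dampens the influence of very slow responses on the score.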
Genes of interest were identified and ranked in terms of their level of importance in understanding schizophrenia according to complementary information from a number of research domains: 1) linkage and association studies of schizophrenia and related phenotypes, 2) model organism, gene expression, and brain imaging studies, and 3) genetic networks and biological pathways relevant to schizophrenia. We mined public databases for general information about the genes and polymorphic variants of interest, including haplotype-tagging and potentially functional SNPs and those with previous reports of association with schizophrenia or related phenotypes. These data were then combined and compared to the list of SNPs available from Illumina, Inc. (San Diego, Calif.), for choice of the final 1,536 SNPs in 94 genes. A total of 1,417 haplotype-tagging SNPs obtained from the TAMAL web site (Technology And Money Are Limiting; 27) were selected from Caucasian HapMap populations (28) to efficiently interrogate 86 of the genes with an r² threshold of 0.8 in our primarily (89%) Caucasian subjects. We included 5 kilobases (kb) of flanking sequence on either side of each gene to capture nearby regulatory elements in the tagged regions. The TAGGER SNP selection algorithm (29), with an aggressive tagging mode forcing all coding SNPs into the model, was used to select tagging SNPs for 76 of the genes. The SNP selection algorithm of Gabriel et al. (30) with a pairwise tagging mode was used to select tagging SNPs for an additional 10 genes to achieve sufficient gene coverage with the available SNPs. A combination of gene-spanning and putatively associated SNPs was used for the remaining eight genes because suitable tagging SNPs were not available. For CHRNA7, SNPs were selected within exons/introns 1–4 only, as the remainder of the gene cannot be screened because of a partial duplication, CHRFAM7A (31).
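For readers unfamiliar with the r² criterion used in tag SNP selection, the following sketch computes the standard pairwise linkage disequilibrium r² from phased two-SNP haplotypes. It is not the TAGGER or Gabriel et al. algorithm; the function name and toy haplotypes are hypothetical, and both SNPs are assumed to be polymorphic (otherwise the denominator is zero).

```python
def ld_r2(haplotypes):
    """Pairwise r^2 from phased haplotypes.

    haplotypes: list of (a, b) tuples, alleles coded 0/1 at two SNPs.
    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical haplotypes: the two SNPs always co-occur (perfect LD).
perfect = [(0, 0), (1, 1), (0, 0), (1, 1)]
# Hypothetical haplotypes: every allele combination equally frequent (no LD).
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

R2_THRESHOLD = 0.8  # threshold quoted in the text
tagged = ld_r2(perfect) >= R2_THRESHOLD
```

Under a pairwise scheme, an untyped SNP counts as captured when at least one genotyped tag SNP reaches the threshold with it.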
The custom array includes 109 SNPs in 33 genes with reported evidence of association, 29 coding sequence variants in 17 genes (25 nonsynonymous and four synonymous), and 18 SNPs located in putative promoter regions or transcription factor binding sites. On average, there is one SNP per 10 kb for each gene with variance due to linkage disequilibrium patterns, SNP availability, etc. Minor allele frequencies for these SNPs ranged from 0.01 to 0.50, with an average of 0.23. The complete list of all 1,536 SNPs and 94 candidate genes included on the COGS SNP chip and the specific details from our research are available in Supplemental Table 1, which accompanies the online version of this article; this information includes RefSNP accession identification numbers (rs numbers), chromosomal locations, gene information, designation of SNPs (e.g., as tagging, coding, putatively functional, or associated, including p values and references), relevant sequence information, and minor allele frequencies for the four HapMap populations. Ingenuity Pathway Analysis (Ingenuity Systems, Redwood City, Calif.) was used to investigate the molecular interactions among the included genes and to provide information regarding pathway membership.
A group of 534 subjects from 130 families was selected for genotyping on the basis of the availability of locally collected blood from five of the seven COGS sites. For each family, both endophenotype data and DNA were available for all schizophrenia probands and at least one unaffected sibling, for a total of 130 sibling pairs discordant for schizophrenia. DNA was also available for 217 parents, 130 of whom were phenotyped. An additional 73 phenotyped siblings were included across the family set as well, 57 of whom were also genotyped and six of whom had schizophrenia. On average, cleaned endophenotype data were available for 370 (SD=41) subjects. This study group has more than 80% power to detect SNPs explaining 3% of the variance at p<0.01, 4% of the variance at p<10⁻³, and 5.5% of the variance at p<10⁻⁴.
Genotyping and Cleaning
Genotyping was performed by the Biomedical Genomics Laboratory at the University of California, San Diego, by means of an Illumina BeadStation 500 scanner (Illumina, San Diego) and 20 μl of genomic DNA at 50 ng/μl plated on 96-well plates with three positive controls per plate. Genotype data were cleaned by using Illumina's BeadStudio software, version 3. Each subject was evaluated across all 1,536 SNPs, and six subjects were excluded for having poor allele call rates, defined as an average call rate below 80% and a median genotype call score below 0.76. Each SNP was then evaluated across all remaining subjects, and 38 SNPs were excluded for having average call rates below 90% and cluster separation scores below 0.05. Another 95 SNPs were eliminated after a manual examination of all SNPs with call rates above 90% but cluster separation scores between 0.05 and 0.25. A total of 133 SNPs were thus removed, resulting in a 91.3% SNP assay conversion rate. An additional 0.03% of the genotypes were removed because of Mendelian inconsistencies. The final group of 1,403 passing SNPs had a genotype call rate of 99.98% (749,052 genotypes called out of a possible 749,202). Accuracy estimated from 72 replicate DNA samples genotyped across the panel indicated a 99.98% reproducibility rate (100,139 identical genotypes out of a possible 100,163). Further quality control assessments using the PLINK analysis tool set (32) identified 15 SNPs with minor allele frequencies less than 0.01 in the unrelated individuals (i.e., parents) and three SNPs with Hardy-Weinberg equilibrium p values less than 10⁻⁴. Removal of these additional SNPs resulted in the final 1,385 SNPs with minor allele frequencies approximating those observed in the Caucasian HapMap sample.
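The SNP counts and rates reported in this paragraph can be checked with simple arithmetic. The sketch below uses only figures quoted in the text; the variable names are ours.

```python
# Figures quoted in the text.
total_snps = 1536
removed_low_quality = 38      # call rate < 90% and cluster separation < 0.05
removed_after_review = 95     # removed after manual examination
removed_maf = 15              # minor allele frequency < 0.01 in parents
removed_hwe = 3               # Hardy-Weinberg p value below the cutoff

passing = total_snps - (removed_low_quality + removed_after_review)
conversion_rate = 100 * passing / total_snps        # percent
final_snps = passing - removed_maf - removed_hwe

call_rate = 100 * 749_052 / 749_202                 # percent
reproducibility = 100 * 100_139 / 100_163           # percent

print(passing, round(conversion_rate, 1), final_snps)
print(round(call_rate, 2), round(reproducibility, 2))
```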
Covariate Selection and Population Stratification
Multidimensional scaling, as implemented in PLINK, was used to assess the degree of population stratification in this study group and to validate the self-reported subject ancestries, which are not always reliable. These results suggested that subjects of Caucasian ancestry formed the largest and most genetically homogeneous group, encompassing 89% of the subjects. Although the remaining subjects reported varying degrees of Hispanic, Native American, Asian, and African American ancestry, most generally clustered with the Caucasian group. To further evaluate the effects of the observed admixture, the first two principal components from the multidimensional scaling analysis were investigated as covariates, along with age and sex, through heritability analyses of the endophenotypes using SOLAR software (33). All factors found to be significant covariates were incorporated in the subsequent association analyses. Bivariate genetic correlations between endophenotypes were also explored. The heritability estimates and genetic correlations obtained in these analyses were similar to those we previously reported in a larger group that includes the current subjects (15), and those findings are not reiterated here. Schizophrenia was not included as a covariate in these analyses, since that would effectively remove the part of the gene-endophenotype association specifically related to schizophrenia. Therefore, the analysis could not reveal significant SNP associations with an endophenotype perfectly correlated with schizophrenia status, no matter the causal pathway between genotype and endophenotype.
Variance-Component Family Association Analyses
We employed the variance-component association module of the Merlin software package, version 1.1.2 (34), to assess the degree of association between the 1,385 SNPs and 12 endophenotypes in the 130 families. The association analyses were adjusted for age (all endophenotypes except P50 suppression), sex (prepulse inhibition, P50 suppression, verbal learning, spatial processing, emotion recognition), and ancestry as the first principal component from the multidimensional scaling analysis (antisaccade, continuous performance, verbal learning, spatial memory, spatial processing). A secondary analysis was also performed in Merlin to assess the independence of multiple associations by using the most significant SNP as a covariate, thereby decreasing the significance of any other SNPs in linkage disequilibrium with it. Independent signals were considered those remaining at p<0.01. For comparison purposes, the DFAM module of PLINK was used to perform an analysis of sibling pairs discordant for schizophrenia with the 1,361 autosomal SNPs. The effective number of independent SNPs tested, accounting for redundancies in linkage disequilibrium due to the inclusion of putatively functional and/or associated SNPs along with tagging SNPs and gene-spanning SNPs, was determined to be 977, with a corresponding Bonferroni correction for multiple comparisons of p=5×10⁻⁵ (35) for a given endophenotype and 4.2×10⁻⁶ for all 12 endophenotypes. The latter number is very conservative because of the observed between-endophenotype correlations, which complicate exact adjustment across endophenotypes. The multiple testing issue is further addressed by the total significance test, described in the following.
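The Bonferroni thresholds quoted above follow from dividing a nominal significance level by the number of effectively independent tests. The sketch below assumes a nominal level of 0.05, which the text does not state explicitly but which is consistent with the reported values to within rounding.

```python
ALPHA = 0.05                  # assumed nominal significance level
N_EFFECTIVE_SNPS = 977        # effective number of independent SNPs (from the text)
N_ENDOPHENOTYPES = 12

# Threshold for a single endophenotype.
per_endophenotype = ALPHA / N_EFFECTIVE_SNPS

# Threshold across all 12 endophenotypes, treating them as independent.
all_endophenotypes = ALPHA / (N_EFFECTIVE_SNPS * N_ENDOPHENOTYPES)

# Total number of SNP-endophenotype tests actually run.
total_tests = 1385 * N_ENDOPHENOTYPES

print(f"{per_endophenotype:.2e}  {all_endophenotypes:.2e}  {total_tests}")
```

Because the endophenotypes are correlated, the 12-fold division overstates the number of independent tests, which is why the text calls the second threshold very conservative.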
Total Significance Test for Multiple Tests
To test whether the observed genotype-endophenotype associations significantly exceed what would be seen by chance given that there are 16,620 total tests (1,385 SNPs and 12 endophenotypes), we developed and implemented a separate, novel multiple testing strategy, the bootstrap total significance test. Our strategy introduces two innovations that together overcome several limitations of existing genomic multiple testing methods.
First, we base our approach on bootstrap sampling instead of permutation sampling. The bootstrap works in settings where permutation tests cannot be applied or can be applied only with difficulty (36). Bootstrapping allows straightforward handling of family data even with complex patterns of missing data. In contrast, permutation procedures for family data are difficult to construct and do not use all information in the data if genetic variants drive phenotypic differences between families (36). Bootstrapping also handles confounding variables easily. Permutation tests do not, as the confounder is potentially associated with both predictor and outcome under the null hypothesis. Most important, this problem arises when covariates are included to adjust for cryptic population stratification. Bootstrapping will also work when the goal is to test an interaction in the presence of main effects.
Second, we introduce the concept of a total significance test to determine whether the strongest genotype-endophenotype associations are more extreme than expected by chance alone. The total significance test provides a rigorous statistical p value that collectively applies to the strongest results in the data but is less conservative than standard p value adjustments for multiple tests. This test is less dependent than other multiple testing methods on extremely small p values, which are difficult to obtain with moderate group sizes and may, even in large groups, be due more to rare sampling events or statistical flukes than to replicable biological findings. Last, we use the results of the bootstrap total significance test to provide an a posteriori predictive value for each genotype-endophenotype association, giving a measure of how likely each detected association is to be true. When the preceding factors are not present and bootstrapping is not required, the total significance test can also be based on permutation sampling.
We implemented the bootstrap total significance test in MATLAB (MathWorks, Natick, Mass.). Specifically, we first applied a multiple regression model to the original data for each of the 1,385×12 SNP-endophenotype combinations, with the endophenotype as the dependent variable and the SNP (coded as the number of copies of the minor allele) and relevant covariates (those used in the variance-component Merlin analyses) as independent predictors. Thus, the multiple regression model used in the total significance test was identical to the variance-component model used in the Merlin analyses, with the sole exception of the within-family correlation structure. For each SNP-endophenotype combination in the original data, we calculated a z statistic, Z=(B–0)/s, where B is the estimated regression coefficient corresponding to the SNP and s is its estimated standard error (SE) in the multiple regression. The value 0 is the expected value of B under the null hypothesis of no SNP-endophenotype association.
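The z statistic for one SNP-endophenotype combination can be sketched as follows. This is an illustrative ordinary least squares implementation in plain Python, not the authors' MATLAB code; the helper names `solve` and `snp_z` and the example data are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def snp_z(y, snp, covariates):
    """Return (B, s, Z) for the SNP term in y ~ 1 + snp + covariates.

    snp is the minor-allele count (0, 1, 2); covariates is one list per subject.
    """
    X = [[1.0, float(g)] + list(c) for g, c in zip(snp, covariates)]
    n, p = len(X), len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    # Var(B) = sigma^2 * [(X'X)^-1]_{11}; obtain that entry by solving X'X v = e1.
    e1 = [0.0] * p
    e1[1] = 1.0
    var_b = sigma2 * solve(XtX, e1)[1]
    s = math.sqrt(var_b)
    B = beta[1]
    return B, s, (B - 0.0) / s

# Hypothetical data: endophenotype score, minor-allele count, age covariate.
y = [10.2, 11.1, 12.9, 9.8, 11.9, 12.2, 10.5, 13.1]
snp = [0, 1, 2, 0, 1, 2, 0, 2]
age = [25, 40, 33, 51, 29, 46, 38, 60]
B, s, Z = snp_z(y, snp, [[a] for a in age])
```

As in the text, this regression ignores within-family correlation; that structure is handled later by resampling whole families.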
We then simulated the same statistics under the null hypothesis by generating one group of 10,000 random bootstrap data sets (the training group) and a second group of 1,000 identically generated bootstrap data sets (the test group), following standard bootstrap theory for clustered data (37). To create each bootstrap data set, we randomly selected families from the original set of 130 families with replacement (i.e., families were selected randomly without respect to whether they were previously selected). Any one of the original families can appear multiple times in a single bootstrap data set or not at all. Each time a family appears, all data associated with it are placed in the new data set, including all family members, their covariates, endophenotypes, and genotypes. No data are rearranged, as they would be in a permutation test. For each bootstrap data set, z statistics were calculated as Z*=(B*–B)/s*, where B* and s* are the regression coefficient and SE calculated from the multiple regression applied to the bootstrap data. For a bootstrap sample, the true null hypothesis is given by the value estimated in the original data set, so that B replaces 0 in the formula for Z. The bootstrap Z* values provide an empirical estimate of the joint distribution of the Z values in the original data set, under the null hypothesis of no association. This is an application of standard bootstrap theory that is used, for example, in construction of the bootstrap-t confidence interval (37). The bootstrap automatically incorporates the empirically observed family-level correlation structure without relying on the assumption of a multivariate normal distribution, as well as inter-SNP correlations due to linkage disequilibrium.
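The family-level resampling step can be sketched as below. The data layout (a dictionary mapping a family identifier to that family's rows), the function names, and the toy records are assumptions for illustration only; refitting the regression on each bootstrap data set is omitted.

```python
import random

def bootstrap_families(families, rng):
    """Draw len(families) families with replacement.

    Each drawn family contributes all of its members' rows unchanged,
    so within-family structure and genotype-phenotype pairing are preserved.
    """
    ids = list(families)
    drawn = rng.choices(ids, k=len(ids))
    rows = []
    for fid in drawn:
        rows.extend(families[fid])
    return drawn, rows

def centered_z(b_star, s_star, b_orig):
    """Z* = (B* - B) / s*: the bootstrap estimate is centered on the
    original-data estimate B, which plays the role of the null value."""
    return (b_star - b_orig) / s_star

# Hypothetical families with per-member records.
families = {
    "F1": [{"id": "F1-1", "snp": 0, "y": 10.1}, {"id": "F1-2", "snp": 1, "y": 11.0}],
    "F2": [{"id": "F2-1", "snp": 2, "y": 12.4}],
    "F3": [{"id": "F3-1", "snp": 1, "y": 9.7}, {"id": "F3-2", "snp": 1, "y": 10.9},
           {"id": "F3-3", "snp": 0, "y": 10.2}],
}
rng = random.Random(0)
drawn, rows = bootstrap_families(families, rng)
```

Because a family can be drawn more than once or not at all, the number of individuals in a bootstrap data set varies even though the number of families is fixed.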
To evaluate the p value for the total significance test, we then compared the test statistics for the original data to their bootstrap distributions. For each Z, we evaluated whether it was so extreme as to fall outside the range of Z* values for the same SNP-endophenotype combination in the 10,000 training bootstrap data sets. As each SNP-endophenotype combination is compared to its own distinct bootstrap distribution, an advantage of our approach is that there is no implicit assumption about exchangeability or identically distributed SNPs or endophenotypes. Let T0 be the total number of tested associations in the original data for which Z is either less than the minimum Z* or more than the maximum Z* in the 10,000 data set training group. Similarly, let T0* be the total number for each of the 1,000 independent bootstrap data sets in the test group, also based on comparison to the 10,000 data set training group. The collective p value for T0 is the proportion of bootstrap test data sets for which T0*≥T0. The question addressed by the total significance test is different from that for the “no family-wise error” criterion provided by a Bonferroni correction or a traditional permutation test. Thus, p values from the total significance test are not comparable to those provided by these other methods.
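The counting step and the collective p value can be illustrated with a small sketch. The function names and toy values are hypothetical; each data set is represented as a list of z statistics in a fixed order of SNP-endophenotype combinations.

```python
def count_out_of_range(z, z_train):
    """Number of statistics in z that fall outside the [min, max] range
    of the training bootstrap values for the same combination."""
    lows = [min(col) for col in zip(*z_train)]
    highs = [max(col) for col in zip(*z_train)]
    return sum(1 for v, lo, hi in zip(z, lows, highs) if v < lo or v > hi)

def collective_p(z_obs, z_train, z_test):
    """Return (T0, list of T0*, p), where p is the proportion of test
    bootstrap data sets with T0* >= T0."""
    t0 = count_out_of_range(z_obs, z_train)
    t0_star = [count_out_of_range(zs, z_train) for zs in z_test]
    p = sum(1 for t in t0_star if t >= t0) / len(t0_star)
    return t0, t0_star, p

# Toy example with two combinations, three training sets, four test sets.
z_train = [[-1.0, -2.0], [1.0, 2.0], [0.0, 0.0]]
z_obs = [3.0, 0.0]
z_test = [[0.0, 0.0], [2.0, 0.0], [0.0, -3.0], [5.0, 5.0]]
t0, t0_star, p = collective_p(z_obs, z_train, z_test)
```

Each combination is compared only with its own training distribution, which is why no exchangeability across SNPs or endophenotypes is needed.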
To obtain a posterior predictive value for the associations that were so significant as to be outside the range of the training group, we calculated the expected number of false positives F0 at this level as the average T0* in the 1,000 test data sets. The posterior predictive value for all associations in this initial group was then calculated as (R0–F0)/R0, where R0 is the number actually out of range in the original data (the quantity denoted T0 above).
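The posterior predictive value formula translates directly into code. The function name and example counts below are hypothetical.

```python
def posterior_predictive_value(r0, t_star):
    """(R0 - F0) / R0, with F0 estimated as the mean out-of-range count
    across bootstrap data sets generated under the null hypothesis."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    f0 = sum(t_star) / len(t_star)
    return (r0 - f0) / r0

# Hypothetical counts: 10 out-of-range results observed, null counts averaging 1.
ppv = posterior_predictive_value(10, [0, 1, 1, 2])
```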
We then extended this approach to determine if somewhat weaker results, those within the range of the tails of the bootstrap distribution, also significantly exceeded results expected by chance. Let T1 and T1* refer to totals based on comparison to the training group after the smallest and largest Z* values for each SNP-endophenotype combination are discarded. The subscript denotes the number discarded from each tail. A cumulative p value for T1 was calculated as the proportion of bootstrap data sets in the test group for which either T1*≥T1, T0*≥T0, or both. T2, T3, and so on were computed, and a cumulative p value was calculated for each, with consideration of all prior tests of greater stringency. This p value simultaneously accounts for all stronger results and therefore cannot decrease from one step to the next. We considered all results satisfying a total cumulative, collective p value of ≤0.05 to be significant by the total significance test and calculated posterior predictive values for each analogous to those described in the preceding.
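The sequential extension can be sketched as follows. Function names and toy values are hypothetical, and the trimming level k must satisfy 2k < number of training data sets.

```python
def trimmed_count(z, z_train, k):
    """Count statistics outside the training range after discarding the
    k smallest and k largest training values for each combination."""
    count = 0
    for v, col in zip(z, zip(*z_train)):
        ordered = sorted(col)
        lo, hi = ordered[k], ordered[-1 - k]
        if v < lo or v > hi:
            count += 1
    return count

def cumulative_p_values(z_obs, z_train, z_test, max_k):
    """Return [(k, T_k, cumulative p_k)] for k = 0..max_k.

    A test data set counts against level k if T_j* >= T_j at any level j <= k,
    so the cumulative p value can never decrease as k grows.
    """
    exceeded = [False] * len(z_test)
    results = []
    for k in range(max_k + 1):
        t_obs = trimmed_count(z_obs, z_train, k)
        for i, zs in enumerate(z_test):
            if trimmed_count(zs, z_train, k) >= t_obs:
                exceeded[i] = True
        results.append((k, t_obs, sum(exceeded) / len(z_test)))
    return results

# Toy example: five training sets, two combinations, three test sets.
z_train = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
z_obs = [3.0, 1.5]
z_test = [[0.0, 0.0], [2.5, 0.0], [1.5, 1.5]]
steps = cumulative_p_values(z_obs, z_train, z_test, max_k=1)
```

Stopping at the largest k whose cumulative p value is still at or below 0.05 gives the set of results declared collectively significant.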
Variance-Component Family Association Analyses
The results of the single-marker variance-component analyses implemented in Merlin, as shown in Figure 1 and detailed in the online Supplemental Table 2, revealed associations between the 12 endophenotypes and 46 of the 94 genes collectively. There were three SNPs having associations with p<10⁻⁴, 27 SNPs with p<10⁻³, and 147 SNPs with p<0.01, all of which may be of interest, given the a priori selection of these genes. There were 22 genes associated with at least one endophenotype at p<10⁻³, as indicated in Figure 1. The most significant finding in these analyses was the association of an SNP in NRG1 with spatial processing, which gave a p value of 6.4×10⁻⁶ and explained 6.9% of the genetic variation in this endophenotype. Two other SNPs gave p values less than 10⁻⁴, in GRIK4 (p=8.3×10⁻⁵) and CHRNA4 (p=9.0×10⁻⁵), explaining 5.4% and 4.5% of the genetic variation in antisaccade and sensorimotor dexterity, respectively. We also found evidence to support associations with four nonsynonymous SNPs and one synonymous SNP: GRM1 Gly884Glu (p=1.1×10⁻³ for verbal learning), NRG1 Arg38Gln (p=5.6×10⁻⁴ for verbal learning), SLC18A1 Val392Leu (p=9.7×10⁻³ for antisaccade), TAAR6 Val265Ile (p=1.1×10⁻³ for continuous performance), and HTR2A Ser34Ser (p=9.0×10⁻³ for letter-number span).
Figure 2 provides a summary of the minimum p value observed for each gene and endophenotype with the number of independent associations indicated, highlighting the associations of genes across endophenotypic domains. Although half of these genes were found to be associated (p<0.01) with two or more endophenotypes, eight genes in particular (CTNNA2, DISC1, ERBB4, GRID2, GRM1, NOS1AP, NRG1, and RELN) displayed extensive evidence for pleiotropy, revealing associations with four or more endophenotypes in this data set. In contrast, other genes (e.g., GRM3) were found to be associated with a single endophenotype (e.g., P50 suppression). Bivariate analyses revealed genetic correlations between continuous performance, abstraction, spatial processing, and emotion recognition that remained significant following correction for multiple testing (data presented elsewhere; 15). The genes that were generally associated with all of these four endophenotypes in the Merlin analyses were CTNNA2, GRM1, and RELN.
The COGS SNP chip includes a total of 40 genes that have shown prior allelic or haplotypic associations with schizophrenia or related endophenotypes. Specific SNPs for which evidence of association has been previously reported in the literature were included for 33 of these genes. Although associations with schizophrenia have also been reported for DRD2 (38), DRD4 (39), GRM4 (40), NRG1 (41–45), PPP3CC (46), PRODH (47), and SLC1A2 (48), we were unable to include the specifically associated polymorphisms on this array because quality genotyping assays using this method were not available for these SNPs. The SNPs included on the array for 33 of the genes with prior evidence of association are presented in Table 1 with a comparison of associations in the previous and current studies. We have found evidence for association of 25 of the 40 previously associated genes (AKT1, CHRNA7, COMT, DAO, DISC1, DRD2, DRD3, ERBB4, GABRB2, GRID1, GRIK3, GRIK4, GRIN1, GRIN2B, GRM3, GRM4, HTR2A, NCAM1, NRG1, PRODH, SLC18A1, SLC1A2, SLC6A3, SP4, and TAAR6) with one or more endophenotypes, as detailed in Figure 2, including associations with 10 specific SNPs previously reported to be associated with schizophrenia (see Figure 2, Table 1, and online Supplemental Tables 1 and 2). The majority of the associations with specific SNPs (eight of 10) were in the same direction as in the previous studies. Although this study group was not recruited for an assessment of schizophrenia and is thus not well powered for this purpose, an analysis of discordant sibling pairs did indicate associations of SNPs within ERBB4, HTR4, and GRM5 with schizophrenia as well (p<0.01, data not presented). We did not find evidence for association of any endophenotype with ADRBK2, BDNF, CACNG2, DAOA, DGCR2, DRD4, DTNBP1, GAD1, HTR7, NEUROG1, NOTCH4, PPP1R1B, PPP3CC, RGS4, or ZDHHC8, despite previous reports of associations with schizophrenia.
As shown in Figure 3, the genes included on the COGS SNP chip cluster into several putatively important pathways, including cell signal transduction, axonal guidance signaling, amino acid metabolism, and glutamate, serotonin, dopamine, and γ-aminobutyric acid (GABA) receptor signaling. The 46 genes found to be associated with at least one endophenotype were distributed among all of these pathways, with notably higher concentrations of associated genes observed in the glutamate signaling pathway. Of the 16 genes tested in the glutamate pathway, 14 revealed associations with at least one endophenotype, 10 of which were associated with two or more endophenotypes. Figure 4 further details the molecular interactions of a subset of the genes on the chip, highlighting the interactions between many of the 46 genes associated with at least one endophenotype. These data reveal a network of genes directly or indirectly related to glutamate signaling and suggest that disturbances of this pathway may contribute to schizophrenia susceptibility.
Total Significance Test for Multiple Tests
Given that the association analyses involved 16,620 tests (1,385 SNPs and 12 endophenotypes), we expect some positive results due to chance. We therefore developed the bootstrap total significance test to evaluate whether there were more highly significant findings than would be expected by chance alone. Forty-seven of the z statistics in the original data were entirely outside the range observed in 10,000 bootstrap training data sets, simulated under the null hypothesis. The median number of such z statistics in the 1,000 test data sets was only two, and in 95% of the test data sets it was at most 12. Only one test data set yielded the 47 out-of-range z statistics seen in the observed data (p=0.001). These 47 findings have an estimated posterior predictive value of 93%.
We extended the total significance test sequentially to identify 292 SNP-endophenotype associations that collectively satisfied a cumulative p value of 0.05, discarding the 40 lowest and 40 highest bootstrap values for each test in each training data set. The corresponding posterior predictive value is 53%, indicating that each of these 292 findings more likely than not represents a true positive result. As a less stringent criterion is used and more values are trimmed from each end of the bootstrap training distribution, the posterior predictive value decreases (i.e., the corresponding results include more false positives) and results become less significant. The 292 most significant findings are summarized in Table 2 by gene, along with their significance in the separate variance-component analyses (see online Supplemental Table 3 for a complete list). Of the 94 genes on the array, 55 contained at least one SNP with an a posteriori chance of 53% or greater of being a true finding of association with at least one endophenotype. For the 12 endophenotypes, the number of significant findings ranged from 14 to 34. Negative results obtained through this approach should not be overinterpreted, since they are based only on the most significant associations in the data. Failure for a gene to have a test reaching this strict level of significance does not preclude the existence of more modest levels of association.
This study combined the analysis of 94 neurobiologically relevant genes and 12 heritable endophenotypes involving schizophrenia-related deficits (15), identifying an interesting pattern of association results for 46 genes across all endophenotypes. Given the observed correlations between many of these endophenotypes (15), we expected that some genes would exhibit pleiotropy and contribute to the variance in two or more endophenotypes. Additionally, some of the genes, such as NRG1, have been shown to play a role in neurodevelopment and therefore may affect more than one physiological or cognitive function. Even with the limited number of genes tested here, we did indeed find evidence of this pleiotropy. Of the eight genes revealing extensive evidence for pleiotropy across the 12 endophenotypes, six genes (ERBB4, GRID2, GRM1, NOS1AP, NRG1, and RELN) involved either directly or indirectly in glutamate signaling featured prominently, each showing associations with five or more endophenotypes. We also observed association for 14 of 16 tested genes in the glutamate signaling pathway with at least one endophenotype, 10 of which were associated with two or more endophenotypes. These results are consistent with the glutamate hypothesis, which proposes that compromised N-methyl-D-aspartate (NMDA) receptor function contributes to the development of schizophrenia (117, 118), and with the observation of a disproportionate disruption of genes in the neuregulin and glutamate pathways in schizophrenia patients (119). Collectively, these results support a strong role for genes involved in glutamate signaling in mediating schizophrenia susceptibility.
The associations of NRG1 and ERBB4 with five and eight endophenotypes, respectively, in this study add to the growing body of human molecular genetic studies implicating these genes, offering a compelling picture of the importance of neuregulin-mediated ErbB4 signaling in the pathophysiology of schizophrenia and its associated heritable deficits (83, 120–122). The successful use of endophenotypes for schizophrenia in model organism studies provides additional support for the involvement of NRG1 in schizophrenia, as well as for this strategy of gene identification. For example, murine NRG1 hypomorphs show deficits in prepulse inhibition (120). Such deficits are well documented in schizophrenia patients (123–126) and were found to be associated with NRG1 in our analyses. Neuregulin-1 is a trophic factor that signals through the activation of the ErbB receptor tyrosine kinases, such as ErbB4. ErbB4 plays a crucial role in neurodevelopment and in the modulation of NMDA receptor signaling, processes often disturbed in schizophrenia (127–129). Neuregulin-mediated ErbB4 signaling has thus become an important pathway of consideration in schizophrenia research.
Custom SNP arrays, such as the COGS chip and the addiction array (130), have several advantages. They are affordable, are flexible with regard to the inclusion of desired variants, are focused by strong inference-based candidate gene selection to achieve disease specificity (for example, see reference 131), and may be much more feasible for use with smaller, yet well-defined, study groups that are underpowered for genome-wide association studies. Although more comprehensive, genome-wide arrays are nonspecific with regard to disease and may thus lack adequate representation of specific SNPs that have either been associated with or are thought to be of biological relevance to a particular disease. Some genes of interest, particularly smaller ones, may also be represented with insufficient coverage (e.g., SLC6A3) or not at all (e.g., DRD4) on genome-wide arrays. Large-scale analyses of candidate genes by means of custom arrays may therefore complement genome-wide association studies for investigators interested in specific genes and SNPs relevant to a particular disorder. This new array can serve as a publicly available resource for other investigators studying schizophrenia and related phenotypes, with the flexibility for modification of the SNP list to optimize it for the particular focus of the research group.
The novel bootstrap-based total significance test developed for this study demonstrates the overall significance of the associations between the SNPs on the COGS chip and the endophenotypes. This total significance test goes beyond current multiple testing methods in order to provide a collective test of significance for the strongest results in an entire data set (or, if desired, over an individual gene or pathway), as well as to address situations where simple permutation schemes are not available, such as for family data and confounders that, by assumption, are associated with both genotype and phenotype (e.g., population stratification). Furthermore, it allows for the assignment of meaningful posterior predictive values to individual test results in the context of multiple testing. One limitation of the total significance test is that it focuses on the most significant test results and ignores the contribution of more modest association results. In addition, it was not practical, in terms of either software development or computer time, to embed the Merlin variance-component, pedigree-based analysis within the computationally complex total significance test. The bootstrap for clustered data (i.e., family data) is a well-validated statistical tool for obtaining accurate significance levels in this situation and is expected to correctly calibrate the statistical inferences despite this limitation, but a total significance test that included within-family correlations in its statistical model might yield somewhat more efficient and powerful results than the present version.
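The resampling unit in a bootstrap for clustered data can be sketched as follows. This is a generic Python illustration of the technique, not the study's code: the function name, the family identifiers, and the toy records are hypothetical, and the step that imposes the null hypothesis on each resampled data set is not shown.

```python
import random

def cluster_bootstrap(families, rng):
    """Resample whole families with replacement, keeping relatives together."""
    return [rng.choice(families) for _ in families]

# Toy pedigrees (hypothetical): each family is a list of (subject_id, score).
families = [
    [("F1-01", 0.2), ("F1-02", -0.5), ("F1-03", 1.1)],
    [("F2-01", 0.9), ("F2-02", 0.4)],
    [("F3-01", -1.3), ("F3-02", 0.0), ("F3-03", 0.7), ("F3-04", -0.2)],
]
rng = random.Random(0)
resample = cluster_bootstrap(families, rng)
print(len(resample), sum(len(family) for family in resample))
```

Because whole families are drawn rather than individuals, within-family correlation is preserved in every resampled data set, although the number of subjects can vary from one resample to the next.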
Some caveats should be noted regarding this study. First, genetic analyses of schizophrenia are replete with failures to replicate previous findings (e.g., 132), despite the striking heritability of the disorder (2). Such failures are understandable in the context of ascertainment biases, population stratification, and cohort variance due to such factors as gender, smoking, treatment, and age at onset. Here, too, we have found no evidence for association with some prominent schizophrenia candidate genes, such as DAOA, DRD4, DTNBP1, PPP3CC, and RGS4 (6, 7). However, we have found further evidence to support association with 25 genes that have previously been reported to be associated with schizophrenia, including several specific SNPs for which the effect was in the same direction as in the previous study. Second, the family ascertainment scheme in this study focused on endophenotypes associated with schizophrenia and may thus be underpowered for detecting genetic variants associated with the disorder itself. As might be expected for a heterogeneous disorder such as schizophrenia, not all individuals exhibit deficits across all of the endophenotypes studied. Additionally, antipsychotic medications may affect these results, although they tend to normalize endophenotypic scores, thus reducing, rather than increasing, the probability of significant associations. Although our study group was primarily (89%) of Caucasian ancestry, with most other subjects of partial Caucasian ancestry, we must also consider the possible confound of genetic admixture. We have used multidimensional scaling components as a measurement of ancestry to correct for this admixture in our analyses. We also note that allele frequencies from the Caucasian and African HapMap populations show an average difference of only 4% across our SNPs with p<0.01 and 6% across SNPs with p<0.001.
Last, the degree of allelic, locus, and phenotypic heterogeneity associated with complex disorders now appears to be far more extensive than previously appreciated, with substantial contributions of rare de novo genetic variants, as well as epigenetic and environmental effects, none of which were assessed in this study (133).
Thus, we have observed many interesting associations between our endophenotypes and genes thought to be of biological relevance to schizophrenia. The extensive pleiotropy for some genes and singular associations for others in our data suggest alternative, independent pathways mediating schizophrenia pathogenesis. Further analyses of the genes associated with each of the endophenotypes will likely provide information regarding the underlying genetic pathways involved in schizophrenia susceptibility, as well as information regarding the interaction among these endophenotypes within the disorder. The illumination of the genetic basis of schizophrenia offers the exciting possibilities of early detection of the disorder and identification of novel pharmacologic targets to facilitate therapeutic intervention.