Copy number variants (CNVs) are deletions or duplications of DNA segments. Those as small as 10,000–100,000 base pairs (bp) can be detected by analyzing variations of fluorescent intensity from microarrays used in genome-wide association studies (GWAS). There are replicated associations of schizophrenia with rare CNVs on chromosomes 1q21.1 (1, 2), 15q13.3 (1, 2), and 16p11.2 (3), with suggestive support for exon-disrupting deletions in the gene for neurexin-1 (NRXN1) (see Table S1 in the data supplement accompanying the online version of this article) (4, 5). Cytogenetic methods were previously used in detecting the chromosome 22q11.21 deletions seen in patients with DiGeorge or velocardiofacial syndrome, of whom 20%–30% develop schizophrenia (6). These CNVs substantially increase the risk of schizophrenia. Remarkably, each of them is also associated with autism spectrum disorders, mental retardation, and epilepsy (6). An overall increase in the number of CNVs has been reported in individuals with schizophrenia versus comparison subjects (2, 7, 8), suggesting that additional pathogenic CNVs remain to be identified.
Here we report on a genome-wide study of rare autosomal CNVs in 3,945 subjects with schizophrenia or schizoaffective disorder and 3,611 screened comparison subjects from the Molecular Genetics of Schizophrenia study (MGS). The results support the four multigenic CNV associations already noted and establish NRXN1 as a specific gene associated with schizophrenia. Using data from MGS and other available data sets, we also report suggestive evidence for association with additional CNVs, including a 1.6-Mb deletion on chromosome 3q29 previously observed in individuals with mental retardation, autistic features, and/or microcephaly (9); exonic duplications in VIPR2, the gene for vasoactive intestinal peptide receptor 2, a receptor for peptides with hypothesized roles in autism (10) and schizophrenia (11); and exonic duplications in C16orf72, whose function is unknown.
Subjects and DNA Specimens
Clinical methods were described elsewhere (12). Briefly, subjects affected by schizophrenia or schizoaffective disorder (case subjects) who either were of European ancestry or were African American were recruited by 10 university-based sites in the United States and Australia under a common protocol. They received consensus diagnoses of DSM-IV schizophrenia (90%) or schizoaffective disorder (with schizophrenia criterion A for at least 6 months) based on available information from interviews, informants, and medical records. The comparison subjects were recruited through a nationally representative survey research panel (Knowledge Networks, Inc., Menlo Park, Calif.) (100% of subjects with European ancestry, 41% of African Americans) and Internet banner ads (Survey Sampling International, Shelton, Conn.) (59% of African Americans). They denied a history or treatment of psychosis or bipolar disorder in an online questionnaire. Table 1 describes the two groups.
TABLE 1. Sources of Data for Analyses of Copy Number Variants (CNVs) in Schizophrenia
|Subjects in Molecular Genetics of Schizophrenia Study (MGS)||Comparison Subjects (Children's Hospital of Philadelphia)ᵇ|
|Group||Total||DNA From Lymphoblastic Cell Lines||DNA From Blood||Subjects in International Schizophrenia Consortium (ISC) Data Set With CNVs >100 kbᵃ||Assayed With Illumina 550K Array (13)||Assayed With Illumina 610K Arrayᶜ||All Subjects|
|Subjects with schizophrenia or schizoaffective disorder|
| European ancestry||2,671||1,998||673||3,391||—||—||6,062|
| African American||1,274||1,004||270||0||—||—||1,274|
|Comparison subjects|
| European ancestry||2,648||2,623||25||3,181||886||3,532||10,243|
| African American||963||962||1||0||582||3,029||4,574|
Most DNA specimens were extracted from Epstein-Barr virus-transformed lymphoblastic cell lines. Some were extracted from blood (primarily for the case subjects, for whom the National Institute of Mental Health repository expected fewer access requests) (Table 1). Because Epstein-Barr virus transformation can create CNVs (14), we excluded samples and chromosomal regions with possible artifacts and tested the case subjects for CNV differences between DNA from lymphoblastic cell lines and blood. The lymphoblastic cell lines from the case and comparison groups had similar estimated doubling times (25–30) prior to cryopreservation.
GWAS Assay and Detection of CNVs
The specimens were assayed at the Broad Institute, Cambridge, Mass., by using Affymetrix 6.0 genotyping arrays (Affymetrix, Santa Clara, Calif.); the assays included approximately 900,000 single-nucleotide polymorphisms (SNPs) plus approximately 900,000 copy number probes. CNVs were detected, or "called," with the Birdseye module of the Birdsuite software package (15), version 2 (internal version 1.3), which uses a hidden Markov model algorithm. The data were normalized within plates of up to 92 DNA samples. HG18 human genome build locations are reported.
CNV Quality Control Analyses
We relied on four quality control steps to reduce the calling errors expected to result from background variations in probe fluorescent intensities.
CNV call quality control.
Separately for each copy number, we identified criteria that maximized concordance between duplicate assays for 151 specimens (although some errors are concordant). We merged nearby pairs (or sequential pairs) of deleted or duplicated segments flanking a "normal" segment containing less than 20% of the probes in the merged CNV (primarily in segmental duplication regions); alternative merger procedures did not achieve better concordance. Table S2 in the online data supplement lists narrow and broad call criteria, which produced concordance rates of 93% and 83%–84%, respectively, for deletions and 78% and 72% for duplications; calls for larger duplications (50 or more probes) were more concordant (82%). We also excluded CNV calls (Table S3 in online data supplement) that overlapped (50%) with telomeres (100,000 bp) and centromeres, where CNV calls may be unreliable, or immunoglobulin gene regions where Epstein-Barr virus transformation causes structural changes (14). We also excluded CNVs seen predominantly on one or two plates, suggesting artifact. See Table S4 in the online data supplement for further details.
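The merging rule described above can be illustrated with a minimal sketch. It assumes that calls of one type (for example, deletions) in one sample are represented by inclusive probe-index ranges, and that "probes in the merged CNV" counts the flanking segments plus the intervening normal segment. The function name, the data layout, and that counting convention are assumptions made for illustration; this is not the Birdsuite implementation or the authors' code.

```python
from typing import List, Tuple

# (first_probe_index, last_probe_index), inclusive; a hypothetical layout
Segment = Tuple[int, int]


def merge_flanking_segments(
    calls: List[Segment], max_gap_fraction: float = 0.20
) -> List[Segment]:
    """Merge same-type calls separated by a short 'normal' segment.

    Two neighbouring calls are merged when the normal segment between them
    holds less than `max_gap_fraction` of the probes in the merged CNV.
    Because each new call is compared with the already-merged segment,
    sequential pairs are merged as well.
    """
    merged: List[Segment] = []
    for first, last in sorted(calls):
        if merged:
            prev_first, prev_last = merged[-1]
            # probes lying strictly between the two calls
            gap_probes = max(0, first - prev_last - 1)
            # probes in the candidate merged CNV (assumed to include the gap)
            merged_probes = max(last, prev_last) - prev_first + 1
            if gap_probes / merged_probes < max_gap_fraction:
                merged[-1] = (prev_first, max(last, prev_last))
                continue
        merged.append((first, last))
    return merged


if __name__ == "__main__":
    # A 5-probe normal gap inside a 100-probe span is merged;
    # a 20-probe gap inside a 40-probe span is not.
    print(merge_flanking_segments([(0, 39), (45, 99)]))
    print(merge_flanking_segments([(0, 9), (30, 39)]))
```

Working in probe indices rather than base-pair coordinates keeps the 20% criterion a simple count ratio, which matches how the rule is worded in the text.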
Quality control of DNA samples.
We excluded 1) samples with total numbers of narrowly defined deletions or duplications exceeding the group mean by 3 standard deviations, 2) those with more than two chromosomes with outlier call numbers, 3) data for outlier chromosomes for subjects with one or two such chromosomes, and 4) samples (mostly lymphoblastic cell lines) with probe intensity variances exceeding the group mean by 4 standard deviations (predicting fewer CNV calls).
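As an illustration of the first and fourth criteria, the sketch below flags samples whose value exceeds the group mean by a chosen number of standard deviations. It assumes a single group, a sample standard deviation, and a mean and standard deviation computed with the outliers still included; the text does not specify these details, and the function and variable names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Set


def flag_outlier_samples(values: Dict[str, float], n_sd: float = 3.0) -> Set[str]:
    """Return sample IDs whose value exceeds the group mean by n_sd SDs.

    `values` maps a sample ID to a per-sample quantity, such as the total
    number of narrowly defined deletions (n_sd=3) or the probe intensity
    variance (n_sd=4).
    """
    group = list(values.values())
    threshold = mean(group) + n_sd * stdev(group)
    return {sample for sample, value in values.items() if value > threshold}


if __name__ == "__main__":
    # Toy data: twenty ordinary samples and one with far too many calls.
    counts = {f"s{i}": 10 + (i % 3) for i in range(20)}
    counts["noisy"] = 100
    print(flag_outlier_samples(counts, n_sd=3.0))
```

One design caveat: extreme samples inflate the mean and standard deviation used for the threshold, so a robust or iterative variant may behave differently on real data.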
For 633 CNV calls in 36 regions of interest (to be described in the following), visual inspection of plots of (log of the mean intensity of probes for each location divided by the plate mean) supported 97.9% of the calls (including all of the calls reported here for new and confirmed CNV findings), and a second calling algorithm (16) confirmed 92.6% on the basis of point-by-point copy number estimates. Although agreement of two or more algorithms has been required for CNV calls by some studies (8, 17), we were unable to improve duplicate concordance with a second algorithm (14) and relied instead on the algorithm developed specifically for this platform (15).
Quantitative polymerase chain reaction.
Selected CNVs were confirmed with quantitative polymerase chain reaction (qPCR) (Table S5, online data supplement).
Analysis of CNV Association
Narrowly and broadly defined deletion and duplication data sets were created for each ethnic group and for all subjects. Using PLINK software (18), we scanned the genome with pointwise analyses for each file, for all rare CNVs (with <1% frequency) and those of more than 100,000 bp, and for DNA samples of case subjects derived from lymphoblastic cell lines versus blood. PLINK defines points as the start and end of all CNVs plus 1 bp beyond each endpoint, counts the number of CNVs at each point, and excludes CNVs with a specified length overlap (here, 50%) with points having CNVs in a specified proportion of subjects (here, 1%). One-sided pointwise nominal and genome-wide empirical p values were computed from 50,000 permutations of case-control status. Regions containing points with empirical genome-wide p values less than 1 (suggestive association) were examined to discern effects of the call criteria or DNA source and to identify the segments contributing to the signal. Then, counts of CNVs disrupting at least one exon were determined for each RefSeq gene (according to HG18 locations). Suggestive associations were observed only for genes with total frequencies less than 0.5%. On the basis of pointwise and exonic results, regions and genes were identified for visual and experimental validation and for analysis using additional data sets.
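The idea behind pointwise and genome-wide empirical p values can be sketched as follows. This is a simplified illustration, not PLINK's implementation: the test statistic (case carrier rate minus comparison carrier rate), the data layout, and all names are assumptions. The genome-wide value compares each observed statistic with the maximum statistic across all points in each permutation.

```python
import random
from typing import List, Sequence, Tuple


def permutation_p_values(
    carrier: Sequence[Sequence[int]],
    is_case: Sequence[int],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """One-sided pointwise and genome-wide empirical p values.

    carrier[j][i] is 1 if subject i carries a CNV covering point j, else 0.
    is_case[i] is 1 for case subjects and 0 for comparison subjects.
    """
    rng = random.Random(seed)
    n_points = len(carrier)

    def statistics(labels: Sequence[int]) -> List[float]:
        n_case = sum(labels)
        n_ctrl = len(labels) - n_case
        out = []
        for row in carrier:
            case_hits = sum(c for c, lab in zip(row, labels) if lab)
            ctrl_hits = sum(row) - case_hits
            out.append(case_hits / n_case - ctrl_hits / n_ctrl)
        return out

    observed = statistics(is_case)
    pointwise_hits = [0] * n_points
    genomewide_hits = [0] * n_points
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case-control status
        permuted = statistics(labels)
        permuted_max = max(permuted)
        for j in range(n_points):
            if permuted[j] >= observed[j]:
                pointwise_hits[j] += 1
            if permuted_max >= observed[j]:
                genomewide_hits[j] += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0.
    pointwise = [(h + 1) / (n_perm + 1) for h in pointwise_hits]
    genomewide = [(h + 1) / (n_perm + 1) for h in genomewide_hits]
    return pointwise, genomewide


if __name__ == "__main__":
    # Toy data: 20 case and 20 comparison subjects, two genomic points.
    status = [1] * 20 + [0] * 20
    point_a = [1] * 8 + [0] * 32               # eight carriers, all cases
    point_b = [1, 1] + [0] * 18 + [1, 1] + [0] * 18  # two carriers per group
    print(permutation_p_values([point_a, point_b], status))
```

Because the maximum over points is never smaller than the statistic at any single point, the genome-wide value in this sketch is always at least as large as the pointwise value.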
After excluding the five regions already shown to be associated with schizophrenia and CNVs greater than 4,000,000 bp (which showed a large excess of lymphoblastic cell line specimens but no case-control difference), we performed case-control analyses of the number of CNVs per subject genome-wide (using PLINK) for deletions or duplications in five size ranges and for gene-disrupting, exon-disrupting, and "singleton" (unique in the data set) CNVs; we also analyzed the numbers of CNVs in DNA samples from lymphoblastic cells versus blood (in case subjects).
For 1q21.1 and 15q13.3, we incorporated published data from the International Schizophrenia Consortium (ISC) (2) and the SGENE collaboration (1), which included deCODE Genetics (Reykjavik, Iceland); the Scottish data were omitted from the SGENE report because they were also used by ISC. For 16p11.2 duplications, we included ISC and a published meta-analysis (without the data set from the Genetic Association Information Network, which is from MGS) (3). We added MGS and ISC data to a meta-analysis of exonic NRXN1 deletions (deleting the Bulgarian data, which overlap with those in ISC) (4). These large CNVs should be well assayed by the diverse platforms.
For candidate regions, we used publicly available ISC data for rare CNVs larger than 100 kb (http://pngu.mgh.harvard.edu/isc/isc-r1.cnv.bed) (Affymetrix 500K assay, 5.0 or 6.0 arrays) and two childhood comparison data sets from Children's Hospital of Philadelphia primary care clinics: 1,464 unrelated children ages 0-18 with no recorded serious medical or neurodevelopmental diagnoses (assessed with Illumina 550K arrays [Illumina, San Diego] and the DNAcopy 1.7 module of the R statistical package) (13) and 6,561 unscreened children (mean age=12.75 years, SD=4.2) (assessed with Illumina 610K arrays and PennCNV software; data provided by H.H. and K.W.). Illumina calls containing three or more probes were counted. Because the numbers of probes differed in each region, the analyses with the Philadelphia comparison subjects were exploratory, but virtually identical CNVs were detected by each platform in these regions, with similar frequencies in the Philadelphia and MGS comparison subjects. The use of childhood comparison subjects is justified because unscreened comparison subjects do not reduce the power for rare diseases (i.e., those with less than 1% frequency). Any excess of undetected neuropsychiatric disorders (expected to be small in these primary care patients) would make these analyses conservative.
Statistical Analyses and Thresholds of Significance
For candidate CNVs (defined by narrow call criteria), two types of association tests were performed: Fisher's exact case-control tests on the pooled groups and the (one-sided) stratified Cochran-Mantel-Haenszel exact test (19) (http://sekhon.berkeley.edu/stats/html/mantelhaen.test.html). These tests differ when one or more strata have imbalances in the case-control ratio, which is the case for some of the additional data sets included in our analyses. For Cochran-Mantel-Haenszel tests, we separated the MGS European-ancestry and African American groups and other data sets.
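To make the difference between the pooled and stratified approaches concrete, here is a minimal sketch using only the Python standard library. It computes a one-sided Fisher's exact p value and a pooled odds ratio from a single 2×2 table, and a Mantel-Haenszel common odds ratio across strata. It assumes that the paired count columns in Table 2 are subjects with and without the CNV (the pairs sum to the group sizes). The Mantel-Haenszel estimator is a standard stratified estimator swapped in for illustration; it is not the exact conditional procedure the authors used, so its values need not match the table. Function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple

# (case carriers, case non-carriers, comparison carriers, comparison non-carriers)
Table = Tuple[int, int, int, int]


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(at least `a` carriers among cases), with all margins held fixed."""
    n_case = a + b
    n_total = a + b + c + d
    carriers = a + c
    upper = min(carriers, n_case)
    tail = sum(
        comb(carriers, k) * comb(n_total - carriers, n_case - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_case)


def pooled_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio of one 2x2 table; infinite if b*c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata: Iterable[Table]) -> float:
    """Mantel-Haenszel common odds ratio across 2x2 strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return float("inf") if denominator == 0 else numerator / denominator


if __name__ == "__main__":
    # C16orf72 exonic duplications, all studies pooled (counts from Table 2).
    print(fisher_one_sided(10, 3935, 2, 10170))
    print(pooled_odds_ratio(10, 3935, 2, 10170))
    # VIPR2 duplications in MGS, stratified by ancestry group (Table 2).
    print(mantel_haenszel_odds_ratio([(7, 2664, 1, 2647), (3, 1271, 1, 962)]))
```

Stratifying keeps each case group paired with its own comparison group, which matters when added data sets contribute comparison subjects but no case subjects, as with the Philadelphia samples.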
In the online data supplement (Supplementary Methods, p. S18) we discuss the problem of selecting exact test thresholds for genome-wide significant association (corrected p<0.05) and suggestive association (expected less than once per genome-wide study). As guidelines for interpreting results, we suggest thresholds of p<10⁻⁵ for significant and p<0.0005 for suggestive association.
We examined clinical data for case subjects with selected CNVs versus all other case subjects.
Table 2 shows results for the candidate CNVs that showed the strongest evidence for association in all of the available data. Suggestive association (p<0.0005) was observed, after addition of the Children's Hospital of Philadelphia comparison subjects, for C16orf72 exonic duplications and 1.6-Mb 3q29 deletions. VIPR2 duplications showed consistency across all available data sets. Figure 1 shows plots of large 3q29 deletions in five MGS case subjects.
TABLE 2. Most Significant New Association Results for Copy Number Variants (CNVs) in Schizophrenia
|Group or Analysis||VIPR2: Chromosome 7, 158.51-158.63 Mb, Exonic Duplications||AGTPBP1: Chromosome 9, 87.35-87.55 Mb, Exonic Duplications||GLB1L3/2: Chromosome 11, 133.65-133.69 Mb, Exonic Deletions||C16orf72: Chromosome 16, 9.09-9.12 Mb, Exonic Duplications||NEDD4L: Chromosome 18, 53.86-54.22 Mb, Exonic Duplications||3q29 Multigenic CNV: Chromosome 3, 197.2-198.83 Mb, Deletions||3q26.1 Intergenic CNV: Chromosome 3, 165.61-165.66 Mb, Deletions|
|Molecular Genetics of Schizophrenia study (MGS) (current study)|
| European-ancestry case subjects||7||2,664||4||2,667||6||2,665||8||2,663||0||2,671||4||2,667||2||2,669|
| European-ancestry comparison subjects||1||2,647||0||2,648||2||2,646||0||2,648||0||2,648||0||2,648||0||2,648|
| African American case subjects||3||1,271||1||1,273||9||1,265||2||1,272||6||1,268||1||1,273||3||1,271|
| African American comparison subjects||1||962||0||963||1||962||0||963||1||962||0||963||0||963|
| Total case subjects||10||3,935||5||3,940||15||3,930||10||3,935||6||3,939||5||3,940||5||3,940|
| Total comparison subjects||2||3,609||0||3,611||3||3,608||0||3,611||1||3,610||0||3,611||0||3,611|
|p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)|
|Meta-analysis, Cochran-Mantel-Haenszel exact testᵃ||0.03||0.04||0.009||0.002||0.12||0.04||0.05|
| One-sided odds ratio (CI, lower bound)ᵇ||4.6 (1.2)||(1.1)||4.3 (1.4)||(2.7)||4.5 (0.7)||(1.1)||(1.0)|
| Two-sided odds ratio (CI)||4.6 (1.0-42.9)||(0.9)||4.3 (1.2-23.2)||(2.1)||4.5 (0.6-209.5)||(0.9)||(0.8)|
|Pooled analysis, Fisher's exact testᵃ||0.03||0.04||0.007||0.002||0.08||0.04||0.04|
| One-sided odds ratio (CI, lower bound)ᵇ||4.6 (1.2)||(1.1)||4.6 (1.5)||(2.6)||5.5 (0.8)||(1.1)||(1.1)|
| Two-sided odds ratio (CI)||4.6 (1.0)||(0.8)||4.6 (1.3-24.8)||(2.1)||5.5 (0.7-252.7)||(0.9)||(0.9)|
|International Schizophrenia Consortium study (ISC)ᶜ|
| Case subjects||4||3,387||—||—||—||—||—||—||—||—||2||3,389||—||—|
| Comparison subjects||0||3,181||—||—||—||—||—||—||—||—||0||3,181||—||—|
|p||Odds Ratio (CI)||p||Odds Ratio (CI)|
|Meta-analysis of MGS and ISC studies, Cochran-Mantel-Haenszel exact testᵈ||0.004||0.01|
| One-sided odds ratio (CI, lower bound)ᵇ||6.4 (1.8)||(1.8)|
| Two-sided odds ratio (CI)||6.4 (1.5-58.3)||(1.4)|
|Comparison subjects (Children's Hospital of Philadelphia)ᵉ|
| European ancestry||3||4,415||5||4,413||6||4,412||1||3,531||0||4,418||0||4,418||1||4,417|
| African Americans||2||3,609||1||3,610||6||3,605||1||3,028||4||3,607||0||3,611||3||3,608|
|Total subjects from all studies|
| Case subjects||14||7,322||5||3,940||15||3,930||10||3,935||6||3,939||7||7,329||5||3,940|
| Comparison subjects||7||14,814||6||11,634||15||11,625||2||10,170||5||11,635||0||14,821||4||11,636|
| Proportion of case subjects||0.0019||—||0.0013||—||0.0038||—||0.0025||—||0.0015||—||0.0010||—||0.0013||—|
| Proportion of comparison subjects||0.0005||—||0.0005||—||0.0013||—||0.0002||—||0.0004||—||0.0000||—||0.0003||—|
|p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)|
|Pooled analysis (Fisher's exact test)||0.002||0.12||0.003||0.0001||0.04||0.0004||0.05|
| One-sided odds ratio (CI, lower bound)ᵇ||4.0 (1.8)||2.5 (0.7)||3.0 (1.5)||12.9 (3.3)||3.5 (1.1)||(3.8)||3.7 (1.0)|
| Two-sided odds ratio (CI)||4.0 (1.5-11.9)||2.5 (0.6-9.7)||3.0 (1.3-6.5)||12.9 (2.8-121.4)||3.5 (0.9-14.7)||(2.9)||3.7 (0.8-18.6)|
|MGS subjects with CNV >100 kb||11/12||2/5||0/18||5/10||0/7||5/5||0/5|
|Probes in region, Affymetrix 6.0/Illumina 610K/Illumina 550Kᶠ||92/32/19||94/28/20||26/15/15||55/14/—ᵉ||292/131/124||855/296/261||28/5/6|
FIGURE 1. Intensity Plots of Large 3q29 Microdeletions in Five Subjects With Schizophrenia or Schizoaffective Disorderᵃ
a Deletions of approximately 1.6 Mb were observed in five case subjects from the Molecular Genetics of Schizophrenia study (MGS), two in the International Schizophrenia Consortium study (ISC), and none of the comparison subjects in MGS, ISC, or the Children's Hospital of Philadelphia group (plotted with genomic coordinates from the Human Genome 18 reference sequence). Each subject's mean intensity for probes at each location was divided by the mean intensity for all subjects on the DNA plate; each point in the plot is the log of this result. Values of −1, 0, and 1 represent copy numbers of 0, 2, and 4, respectively; the deletions shown here have a copy number of 1. Copy number variants (CNVs) were called with the Birdseye module of the Birdsuite software package (15), version 2 (internal version 1.3). Copy numbers were also estimated for each point by a second algorithm (16). The browser plot at the bottom of the figure (from the University of California, Santa Cruz, Genome Browser, http://genome.ucsc.edu) shows the genes in the region and the segmental duplications that surround (and probably generate) the typical 21-gene deletion, including TFRC to BDH1 (see Table S7 in the online data supplement). The first plot illustrates the ambiguities of microarray intensity data, with the two algorithms interpreting the variability of intensity somewhat differently at each boundary. Several small CNVs in the region, including some in comparison subjects, are not shown.
Table 3 shows results for five previously reported CNV regions. Combined analyses produced high odds ratios and p values that were genome-wide significant (p<10⁻⁵) by either test for long deletions of chromosomes 1q21.1, 15q13.3, and 22q11.21, duplications in 16p11.2, and deletions of NRXN1 exons. Weaker association evidence was observed for duplications in 1q21.1. No association (p>0.05) was observed in the combined MGS and ISC group for 15q13.3 duplications (seven case subjects, two comparison subjects) or NRXN1 exonic duplications (two case subjects and no comparison subjects).
TABLE 3. Results in New Study Groups for Previously Reported Copy Number Variants (CNVs) in Schizophreniaᵃ
|Groupᵇ,ᶜ or Analysis||1q21.1 Deletions, 144.6-146.3 Mb||1q21.1 Duplications, 144.6-146.3 Mb||15q13.3 Deletions, 28.7-30.3 Mb||22q11.21 Deletions, 17.1-20.2 Mb||16p11.2 Duplications, 29.5-30.1 Mb||16p11.2 Deletions, 29.5-30.1 Mb||NRXN1 Exonic Deletions, Chromosome 2, 50.0-51.1 Mb|
|MGS total case subjects (current study)||4||3,941||7||3,938||7||3,938||21||3,924||13||3,932||1||3,944||10||3,935|
|MGS total comparison subjects (current study)||1||3,610||0||3,611||1||3,610||0||3,611||1||3,610||3||3,608||1||3,610|
|MGS European-ancestry case subjects||2||2,669||5||2,666||7||2,664||18||2,653||10||2,661||1||2,670||9||2,662|
|MGS European-ancestry comparison subjects||1||2,647||0||2,648||1||2,647||0||2,648||0||2,648||3||2,645||1||2,647|
|MGS African American case subjects||2||1,272||2||1,272||0||1,274||3||1,271||3||1,271||0||1,274||1||1,273|
|MGS African American comparison subjects||0||963||0||963||0||963||0||963||1||962||0||963||0||963|
|ISC case subjects||9||3,382||3||3,388||8||3,383||11||3,380||6||3,385||1||3,390||3||3,388|
|ISC comparison subjects||1||3,180||1||3,180||0||3,181||0||3,181||1||3,180||3||3,178||1||3,180|
|deCODE (1) Iceland case subjects||1||645||1||647||1||645||1||645||—||—||—||—||—||—|
|deCODE (1) Iceland comparison subjects||8||32,434||12||32,430||7||32,435||0||32,442||—||—||—||—||—||—|
|deCODE (1) Denmark case subjects||3||439||—||—||—||—||—||—||—||—||—||—||—||—|
|deCODE (1) Denmark comparison subjects||0||1,437||—||—||—||—||—||—||—||—||—||—||—||—|
|deCODE (1) the Netherlands case subjects||0||806||—||—||3||803||1||805||—||—||—||—||—||—|
|deCODE (1) the Netherlands comparison subjects (1)||0||4,039||—||—||1||4,038||0||4,039||—||—||—||—||—||—|
|deCODE (1) other case subjects||3||2,159||0||579||2||2,097||1||1,723||—||—||—||—||—||—|
|deCODE (1) other comparison subjects||0||2,611||1||574||0||2,649||0||2,148||—||—||—||—||—||—|
|Weiss et al. (20) case subjects||—||—||—||—||—||—||—||—||0||648||1||647||—||—|
|Weiss et al. (20) comparison subjects||—||—||—||—||—||—||—||—||5||18,829||2||18,832||—||—|
|McCarthy et al. (3)ᵈ case subjects||—||—||—||—||—||—||—||—||12||1,894||1||1,905||—||—|
|McCarthy et al. (3)ᵈ comparison subjects||—||—||—||—||—||—||—||—||1||3,970||3||3,968||—||—|
|Walsh et al. (8) case subjects||—||—||—||—||—||—||—||—||—||—||—||—||1||232|
|Walsh et al. (8) comparison subjects||—||—||—||—||—||—||—||—||—||—||—||—||0||268|
|Need et al. (21) case subjects||—||—||—||—||—||—||—||—||—||—||—||—||3||1,070|
|Need et al. (21) comparison subjects||—||—||—||—||—||—||—||—||—||—||—||—||0||1,148|
|Kirov et al. (4) case subjects||—||—||—||—||—||—||—||—||—||—||—||—||1||470|
|Kirov et al. (4) comparison subjects||—||—||—||—||—||—||—||—||—||—||—||—||3||2,789|
|Ikeda et al. (22) case subjects||—||—||—||—||—||—||—||—||—||—||—||—||0||560|
|Ikeda et al. (22) comparison subjects||—||—||—||—||—||—||—||—||—||—||—||—||0||547|
|Rujescu et al. (5) case subjects||—||—||—||—||—||—||—||—||—||—||—||—||5||2,972|
|Rujescu et al. (5) comparison subjects||—||—||—||—||—||—||—||—||—||—||—||—||5||33,741|
|Total case subjects||20||11,372||11||8,552||21||10,866||35||11,365||31||9,859||4||9,886||23||12,627|
|Total comparison subjects||10||47,311||14||39,795||9||45,913||0||45,361||8||29,589||11||29,586||10||45,284|
|Proportion of case subjects||0.0018||—||0.0013||—||0.0019||—||0.0031||—||0.0031||—||0.0004||—||0.0018||—|
|Proportion of comparison subjects||0.0002||—||0.0004||—||0.0002||—||0.0000||—||0.0003||—||0.0004||—||0.0002||—|
|p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)||p||Odds Ratio (CI)|
|Meta-analysis, Cochran-Mantel-Haenszel exact test||8.5×10⁻⁶||0.02||6.9×10⁻⁷||7.3×10⁻¹³||2.6×10⁻⁸||0.88||1.3×10⁻⁶|
| One-sided odds ratio (CI, lower bound)ᵉ||9.5 (3.5)||4.5 (1.4)||12.1 (4.4)||(20.3)||9.5 (4.1)||0.6 (0.2)||7.5 (3.4)|
| Two-sided odds ratio (CI)||9.5 (3.0-33.4)||4.5 (1.2-17.5)||12.1 (3.8-42.0)||(15.6-∞)||9.5 (3.6-28.4)||0.6 (0.1-2.2)||7.5 (3.0-19.8)|
|Pooled analysis, Fisher's exact test||2.2×10⁻⁸||0.002||2.0×10⁻⁹||<1.0×10⁻¹⁶||1.5×10⁻¹²||0.54||5.5×10⁻⁹|
| One-sided odds ratio (CI, lower bound)ᵉ||8.3 (4.2)||3.7 (1.7)||9.9 (4.9)||(44.7)||11.6 (5.8)||1.09 (0.32)||8.2 (4.2)|
| Two-sided odds ratio (CI)||8.3 (3.7-19.9)||3.7 (1.5-8.7)||9.9 (4.3-24.4)||(35.9-∞)||11.6 (5.6-29.3)||1.09 (0.3-3.7)||8.2 (3.8-19.4)|
Results with pointwise empirical suggestive significance are shown in Table S6 in the online data supplement. Table S7 provides information about genes within the multigenic CNVs of interest (Tables 2 and 3). Table S8 shows additional suggestive MGS results for exonic CNVs.
In MGS analyses of genome-wide CNV number (Table S9), the case subjects showed no excess of duplications, but for large CNVs (>100 kb) they had more exonic deletions per subject than the comparison subjects (0.304 versus 0.282, empirical p<0.05) and increases in several other variables related to CNVs spanning genes (e.g., 0.628 versus 0.531 genes per CNV, p<0.05). They also had more genic and exonic deletions larger than 1 Mb and more large singleton exonic deletions per subject (0.075 versus 0.063, p<0.05). Effects were in the same direction in European-ancestry and African American subjects (Table S9c). The case subjects with DNA samples from lymphoblastic cell lines and those with samples from blood did not differ in large deletions (Table S4b), but specimens from lymphoblastic cell lines had a small excess (4.5%) of deletions smaller than 100 kb.
Confirmation was provided by qPCR for all CNVs reported in Table 3, an atypical 22q11.21 distal deletion referred to in the Discussion section, and CNVs in 3q29, VIPR2, and selected additional regions (Table S5).
Table 4 summarizes clinical data for case subjects with seven large CNVs. Uncorrected p values are shown to illustrate differences, but none was significant after Bonferroni correction. Most differences were for learning problems and seizures (see Discussion). The analyses were based on self-reported medical information because it was available for all subjects and was generally consistent with the additional data available for some patients.
TABLE 4. Clinical Features of Subjects With Schizophrenia or Schizoaffective Disorder Who Were Tested for Copy Number Variants (CNVs)ᵃ
|Clinical Feature||No CNV||Strongest New Findings From Current Study||Confirmed Association With Schizophrenia|
|DSM-IV criteria metᵇ|
| Bizarre (impossible) delusions||—||0.63||—||0.60||—||0.80||—||0.75||—||0.60||—||0.67||—||0.46||—||0.52|
| Commentary or conversing hallucinations||—||0.50||—||0.40||—||0.50||—||0.00*||—||0.40||—||0.86||—||0.46||—||0.52|
| Formal thought disorder||—||0.67||—||0.60||—||0.70||—||0.75||—||0.80||—||0.43||—||0.85||—||0.52|
| Disorganized behavior||—||0.67||—||1.00||—||0.70||—||0.25||—||0.70||—||0.57||—||0.46||—||0.62|
| Negative symptoms||—||0.83||—||0.80||—||1.00||—||1.00||—||0.60*||—||0.86||—||0.77||—||0.81|
|Self-reported comorbid medical conditionsᶜ|
| Thyroid disorder||355||0.10||2||0.40**||1||0.11||1||0.25||1||0.10||0||0.00||1||0.08||5||0.25**|
| Learning problem||760||0.22||2||0.50||3||0.38||1||0.25||5||0.50**||3||0.43||6||0.46**||11||0.52***|
|Number with clinical factor scores||3,406||5||8||3||10||7||13||20|
|Clinical factor scoresᵈ|
| Positive symptoms||0.00||0.67||0.21||—||0.50||—||−0.30||—||0.04||—||−0.09||—||0.57+||—||−0.08||—|
| Negative and disorganized symptoms||0.01||0.47||0.06||—||0.22||—||0.02||—||0.00||—||−0.22||—||0.11||—||−0.03||—|
| Mood (depressive and manic) symptoms||0.14||0.74||0.15||—||0.10||—||−0.30||—||0.79**||—||0.06||—||0.10||—||0.11||—|
|Age at onset (years)||21.26||6.93||19.60**||0.89||19.50||4.45||19.75||10.66||20.40||5.39||22.00||8.83||20.62||7.33||22.10||4.40|
| Total months of illness (lifetime)||260||137||353||—||293||—||192||—||285||—||309||—||287||262||—|
| Months with mood syndrome (lifetime)||18||48||2†||—||17||—||4**||—||10*||—||31||—||37||24||—|
| Proportion of illness with mood syndrome||0.08||0.18||0.01†||—||0.12||—||0.13||—||0.09||—||0.10||—||0.12||0.08||—|
Previously reported 500-kb deletions on chromosome 15q11.2 (1) were observed in 19 MGS case subjects and 17 comparison subjects. Exonic APBA2 duplications (21, 24) were observed in four case subjects (one duplication was 1.5 Mb) and one comparison subject. In the combined group of MGS subjects, ISC subjects, and Philadelphia subjects assessed with Illumina 550K arrays, we observed an odds ratio of 2.81 (n.s.) for exonic deletions in CNTNAP2 (seven case subjects and three comparison subjects).
These results convincingly support the findings of substantial increases in schizophrenia risk in individuals carrying large deletions on chromosomes 1q21.1, 15q13.3, and 22q11.21, exonic NRXN1 deletions, and 16p11.2 duplications. In this section we will discuss these regions, as well as new candidate CNVs, including 1.6-Mb deletions in chromosome 3q29 and exonic duplications of VIPR2, the hypothesis of a global increase in rare CNVs, and implications for future research.
While combined analysis showed a strong association of schizophrenia with deletions on chromosome 1q21.1, the frequency of duplications in case subjects was also higher than in comparison subjects in MGS (p=0.02 in all data by meta-analysis). The patients with 1q21.1 deletions did not report a higher number of seizures or learning problems. The typical 1.67-Mb deletion contained 11 genes (FAM108A3 to NBPF11, Table S7) listed (annotated) in the RefSeq database. It is not known which gene or genes underlie the pathogenic effects. One proposed candidate is HYDIN (144.9 Mb), an evolutionary duplication of HYDIN on 16q22.2. It is not annotated on 1q because segmental duplications prevent it from being sequenced confidently. (Note that 1q21.1 CNVs produce false positives on 16q22.2.) The 1q21.1 isoform is expressed in the brain. Microcephaly is observed in mice with homozygous HYDIN deletions and in neurodevelopmentally impaired children and their parents with long 1q21.1 deletions (and macrocephaly with duplications) (25).
Our data support associations of schizophrenia with 1.5-Mb deletions on chromosome 15q13.3 containing ARHGAP11B, MTMR15, MTMR10, TRPM1, KLF13, OTUD7A, and CHRNA7. Duplications were observed in seven schizophrenia patients in MGS and ISC and two comparison subjects (p=0.11). Shorter CNVs in the region are not associated with schizophrenia. Case subjects with these deletions reported more seizures, but clinical details are not available. For both long and short exonic deletions, the ratio of their occurrence in the case and comparison subjects was 35:20 for CHRNA7 and 30:17 for ARHGAP11B, higher (7:1 to 10:2) for the other five genes, and highest in OTUD7A (8:1), which codes for a deubiquitinating enzyme.
Chromosome 16p11.2 is the one region where different copy numbers are more strongly associated with schizophrenia (duplications) than with autism (deletions) (3). In children, deletions produce macrocephaly and behavioral problems, including autism; duplications produce microcephaly and attention deficit hyperactivity disorder; and neurodevelopmental delays, learning disorders, congenital anomalies, and seizures occur in both groups (26). Our case subjects reported more learning problems and seizures. The region contains 26 annotated genes (SPN to CORO1A) plus three genes duplicated in each flanking segmental duplication region (see Table S7).
Exonic Deletions in NRXN1
Neurexin-1 is a presynaptic neuronal cell surface molecule that participates with postsynaptic neuroligins in cell adhesion and synaptic signaling. Neurexin and neuroligin mutations have been implicated in autism, and mice with NRXN1 deletions have deficient prepulse inhibition, a schizophrenia endophenotype (27). Diverse CNVs are seen in this 1.1-Mb-long gene. Association of rare exonic NRXN1 deletions, reported by Rujescu et al. (5) and supported by a meta-analysis (4), is strongly confirmed here. Exonic duplications were observed in only two MGS and ISC case subjects and in no comparison subjects. Deletions of specific exons and/or regulatory regions may be critical, given the approximately 1,000 alternative splicings producing proteins. The patients with NRXN1 deletions reported more learning problems and seizures, consistent with observations of deletions in mental retardation and of homozygous deletions in the Pitt-Hopkins syndrome of mental retardation and epilepsy (28). Pitt-Hopkins syndrome is also caused by mutations in TCF4, in which common SNPs are associated with schizophrenia (29). Owing to the strong association of exonic NRXN1 deletions with schizophrenia and to the large effect size, this appears to be the first single gene that has been shown to be involved in the etiology of schizophrenia.
Our findings for 22q11.21 deletions are consistent with previous reports. We observed 19 longer deletions, typically 2.5 Mb (from 17,256,428 to 19,795,835 bp), and two shorter (1.4–2.0 Mb) proximal deletions (spanning 43 and 29 genes, respectively) (Table S7). An additional case subject (not counted in this analysis) had a 760-kb distal deletion of unknown significance (19,035,775 to 19,795,835 bp), not previously reported in schizophrenia nor overlapping the proximal deletions. There are no robust SNP association findings within these genes (12). Our case subjects with 22q11.21 deletions reported more learning problems, seizures, and thyroid problems (seen in DiGeorge syndrome) than did those with no CNVs. We observed deletions in 0.53% of the case subjects, 18 (0.67%) of European ancestry and three (0.23%) African American. Velocardiofacial syndrome is associated with early mortality (30). The African American patients with deletions in this region were all under age 40, but deletions were seen in 0.35% and 0.65% of the case subjects of European ancestry under and over age 40; the age at interview did not differ between the two ethnic groups. It is unclear why there were no older African American patients, but the prevalence in the patients with European ancestry (0.67%) does not seem to have been underestimated because of early mortality.
These 1.6-Mb deletions were observed in five MGS and two ISC case subjects and no comparison subjects. They are identical to the 3q29 microdeletions reported to cause mild to moderate mental retardation, microcephaly in half of patients, autism in a minority, and inconsistent physical anomalies (31). Reciprocal duplications (not observed here) are also associated with mental retardation and microcephaly. Two of the five MGS case subjects (one reporting seizures in infancy) received consensus diagnoses of definite or possible mild mental retardation, and a third attributed seizures to a drug reaction. Walsh et al. (8) reported one similar deletion in a schizophrenia subject in a group of 150. After this article was submitted, Mulle et al. (17) reported an association between schizophrenia and 3q29 deletions on the basis of one 836-kb deletion in an Ashkenazi group and five longer deletions: the one reported by Walsh et al., the two ISC case subjects included in Table 2, and two MGS case subjects from the Genetic Association Information Network data set available from the Database of Genotypes and Phenotypes (dbGaP) (http://www.ncbi.nlm.nih.gov/gap). We have not added the Ashkenazi data to our analysis because all of the other reported deletions in case subjects cover the full 1.6-Mb region, and it is unknown whether 800-kb deletions are associated with schizophrenia; seven short exonic DLG1 deletions were observed in the MGS and Philadelphia comparison subjects.
The 1.6-Mb deletion spans 21 genes (TFRC to BDH1, Table S7). PAK2 and DLG1 are homologues of X-linked mental retardation genes PAK3 and DLG3 (listed as OMIM 300142 and OMIM 300189, respectively, in the Online Mendelian Inheritance in Man [OMIM] catalog; http://www.ncbi.nlm.nih.gov/omim). Expression of DLG1 (also known as SAP97) was lower than normal in postmortem prefrontal cortex from individuals with schizophrenia (32). (Exonic deletions in the homologue DLG2 were seen in four MGS case subjects and no comparison subjects, in two ISC case subjects and one comparison subject, and in one Philadelphia comparison subject assessed with the Illumina 550K microarray.) Other plausible candidates include MFI2, which shows high expression in amyloid plaques (33); PCYT1A (choline-phosphate cytidylyltransferase A), which controls synthesis of the phospholipid phosphatidylcholine, itself hypothesized to play a role in schizophrenia (34); TNK2, a tyrosine kinase involved in adult synaptic function and plasticity and in brain development; TM4SF19, involved in cell fusion and signaling and related to TM4SF2/A15, which is associated with mental retardation (35); FBXO45, a ubiquitin ligase involved in regulation of synaptic activity, neuronal migration, and patterning of neuronal connectivity; and PIGX/PIGZ, involved in biosynthesis of glycosylphosphatidylinositol, which anchors cell adhesion molecules and other proteins to the plasma membrane.
Exonic VIPR2 duplications were associated with schizophrenia with an odds ratio of 4.0, with consistency observed across MGS (10:2 case:control ratio), ISC (4:0), and the Philadelphia comparison group (five of 8,029). The MGS case subjects had high ratings for positive symptoms. VIPR2 encodes a receptor for vasoactive intestinal peptide and pituitary adenylate cyclase-activating polypeptide, peptides with diverse roles in embryonic neural development, neuroprotection and response to neural injury, and inflammatory processes. Both have been hypothesized to have roles in autism (10) and schizophrenia (11).
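To make the odds-ratio reasoning concrete, the sketch below computes a crude odds ratio from a 2×2 table of carrier counts. It is a hedged illustration rather than the analysis used in the study: it uses only the MGS counts quoted above (10 case carriers, 2 comparison carriers), and it assumes the sample sizes given in the introduction (3,945 case subjects and 3,611 comparison subjects) as denominators. The function name `odds_ratio` is hypothetical. Because the reported odds ratio of 4.0 pools additional data sets, this MGS-only figure is not expected to reproduce it.

```python
def odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Crude odds ratio from carrier counts in case and comparison groups."""
    a = case_carriers                 # case subjects carrying the CNV
    b = n_cases - case_carriers       # case subjects without it
    c = control_carriers              # comparison subjects carrying the CNV
    d = n_controls - control_carriers # comparison subjects without it
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (a * d) / (b * c)


# Assumption: denominators are the sample sizes stated in the introduction.
mgs_or = odds_ratio(case_carriers=10, n_cases=3_945,
                    control_carriers=2, n_controls=3_611)
print(f"MGS-only crude odds ratio: {mgs_or:.2f}")
```

The guard against a zero cell matters in practice: a data set such as ISC, with four case carriers and no comparison carriers, has no finite crude odds ratio, which is one reason pooled estimates are reported.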
The strongest evidence for association (p=0.002 in MGS; p=0.0001 overall, including Philadelphia comparison subjects assessed with the Illumina 610K array; odds ratio=12.9) was observed for exonic duplications in C16orf72, a brain-expressed gene of unknown function. More data are needed to evaluate this region because of the diverse size range and the small number of probes on the Illumina arrays (we excluded data for the 550K array, which had only three probes). Other genes and regions shown in Table 2 are also plausible candidates: AGTPBP1, encoding the zinc carboxypeptidase NNA1 (nervous system nuclear protein induced by axotomy); NEDD4L (neural precursor cell expressed, developmentally down-regulated 4-like), a ubiquitin-protein ligase involved in inhibition of transforming growth factor beta (TGF-β) signaling and ubiquitination of several plasma membrane channels; and GLB1L3 and GLB1L2, two forms of beta-galactosidase (neurodegeneration is a major feature of the recessive GM1 gangliosidosis caused by homozygous GLB1 mutations). We noted four large duplications in CSMD3 in case subjects and none in comparison subjects (but the ratio was 3:2 in ISC); two patients with translocation breakpoints near CSMD3 had autistic behavior and developmental delay (36). The GWAS of the MGS subjects of European ancestry produced moderate evidence for an association of schizophrenia with the paralogue CSMD1 (p=4.45×10⁻⁵) (12).
The case subjects had more large deletions (but not duplications) in the affected genes or exons, suggesting that additional associations will be discovered in larger data sets. Some CNVs may be too rare for association to be proven. The observed increase is modest and might explain a few more percent of schizophrenia cases.
Conclusions and Implications for Future Research
Microarrays have detected rare CNVs that substantially increase the risk of neuropsychiatric disorders. Although these CNVs explain only a small portion of schizophrenia risk in the population, fundamental discoveries could result from efforts to explain the biological mechanisms underlying the association of these CNVs with schizophrenia, autism, mental retardation, and epilepsy and perhaps from SNP associations with similar phenotypic overlap or biological mechanisms. It is not yet clear whether these mechanisms are relevant to most cases of schizophrenia, given that many susceptibility genes might be located in regions not prone to CNVs and thus not detectable by CNV scans; however, carriers of associated CNVs had typical clinical features similar to those in the rest of a large study group. We note that several schizophrenia-associated CNVs produce microcephaly, including 1q21.1 deletions (with duplications associated with macrocephaly), possibly related to HYDIN; 3q29 deletions, possibly related to PAK2 and/or DLG1; and 16p11.2 duplications.
The strong evidence for the association of NRXN1 with schizophrenia provides clear impetus for research into related neurodevelopmental and neural signaling processes. CNVs in VIPR2 or other genes could provide additional clues. The 1q, 3q29, 15q, 16p, and 22q CNVs involve many genes. While it has been hoped that shorter CNVs would provide clues as to which genes were critical to the association, we note several regions where shorter CNVs in obvious candidate genes (such as CHRNA7 in 15q13.3 or DLG1 in 3q29) have much lower odds ratios than the long CNVs. It is possible that pathogenic effects are due to the CNV's effects on combinations of genes within or (through expression changes) outside its boundaries, perhaps interacting with genotypes on the patient's intact or duplicated chromosome.
Some have argued that the association of rare CNVs with schizophrenia demonstrates that much of the risk for this disease will be explained by high-penetrance rare variants (37). However, it is not yet clear how many rare SNPs or small insertions/deletions will be as pathogenic as are these long, multigenic CNVs. Whole-genome sequencing is likely to produce additional surprises about the genomic basis of disease risk.
We would also urge caution about the use of high-penetrance CNVs as presymptomatic tests for schizophrenia (e.g., prenatally or in infants or children). If approximately 1% of 300,000,000 Americans will develop broadly defined schizophrenia, our data suggest that 1.25% of them (37,500) will carry one of the CNVs listed in Table 3, as will 0.09% (270,000) of individuals who never develop schizophrenia (possibly an underestimate if, as we suspect, individuals with mild learning or other neuropsychiatric problems are underrepresented in comparison groups) (38). We do not know the true proportions of carriers with severe, mild, or no neuropsychiatric disorder, and the overall positive predictive value for schizophrenia may be 12% or less.
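The arithmetic behind this estimate can be laid out explicitly. The following Python sketch is illustrative only: it uses the population size, lifetime risk, and carrier rates quoted above, and the function and variable names are hypothetical. The positive predictive value is taken as the share of all carriers who develop schizophrenia.

```python
POPULATION = 300_000_000          # approximate U.S. population used in the text
LIFETIME_RISK = 0.01              # about 1% develop broadly defined schizophrenia
CARRIER_RATE_CASES = 0.0125       # 1.25% of affected individuals carry a listed CNV
CARRIER_RATE_UNAFFECTED = 0.0009  # 0.09% of unaffected individuals carry one


def carrier_counts(population, risk, rate_cases, rate_unaffected):
    """Expected numbers of CNV carriers among affected and unaffected individuals."""
    affected = population * risk
    unaffected = population - affected
    return affected * rate_cases, unaffected * rate_unaffected


def positive_predictive_value(carriers_affected, carriers_unaffected):
    """Fraction of all carriers who are affected."""
    return carriers_affected / (carriers_affected + carriers_unaffected)


carriers_affected, carriers_unaffected = carrier_counts(
    POPULATION, LIFETIME_RISK, CARRIER_RATE_CASES, CARRIER_RATE_UNAFFECTED
)
ppv = positive_predictive_value(carriers_affected, carriers_unaffected)

print(f"carriers who develop schizophrenia: {carriers_affected:,.0f}")
print(f"carriers who do not: {carriers_unaffected:,.0f}")
print(f"positive predictive value: {ppv:.1%}")
```

Applying the 0.09% rate to the 297,000,000 unaffected individuals gives about 267,300 carriers, which the text rounds to 270,000; either figure yields a positive predictive value of roughly 12%. If unaffected carriers are undercounted in comparison groups, the true value would be lower.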
We are reaching the point, however, where CNV testing could be indicated for individuals with schizophrenia. Several strongly associated CNVs have implications for clinical management and preconception reproductive counseling of patients. For example, a higher rate of premature death was observed in patients with 22q11.2 deletion, even those without congenital heart disease or schizophrenia (39). Among 558 adults (mean age, 34.7 years) with tetralogy of Fallot or pulmonary atresia, 24 (54%) of 44 with 22q11.2 deletions discovered by screening had not previously been diagnosed (40), and aortic root dilation has been detected in 22q11.2 patients without other cardiac anomalies (41). There are other medical features, as well as a specific mathematical learning disability that is relevant to rehabilitation and vocational planning (42). The development of cost-effective clinical assays for known CNVs would be valuable for these patients.
The authors thank the study participants, the research staff at the study sites, the GAIN quality control team (G.R. Abecasis and J. Paschall), S. Purcell for assistance with PLINK, Knowledge Networks, Inc., for recruiting the comparison group, and Dr. Steve McCarroll, Josh Korn, and Alec Wysoker for assistance with Birdsuite.