Attention deficit hyperactivity disorder (ADHD) is a highly heritable disorder (heritability estimates range from 75% to 90% [1, 2]). Rare genetic variants, specifically large, rare copy number variants (CNVs), play an important role in ADHD (3–5), but so far, genome-wide searches have not identified common risk variants. Four published genome-wide association studies (GWAS) of ADHD (6–9) and a recent meta-analysis (10) of all available data have failed to yield genome-wide significant results for any single-nucleotide polymorphism (SNP).
There are several explanations as to why it has been difficult to identify common genetic risk variants for psychiatric disorders (11), including ADHD (12). One important factor is that the effect size of any individual SNP is likely to be small (13). This means that with currently available sample sizes, true common risk alleles are unlikely to achieve the stringent statistical thresholds required for genome-wide significance (14), although, as has repeatedly been demonstrated for other phenotypes, this can in part be overcome for at least a proportion of risk variants as larger samples become available for performing meta-analyses. For GWAS of childhood-onset psychiatric disorders, such as ADHD and autism, the types of sample sizes required, even with international collaboration, have yet to be achieved (15). Another possibility is that if ADHD is genetically heterogeneous (in the sense that there are multiple phenotypes with limited or no overlap at the level of common risk alleles), the effects of each allele might be diluted, resulting in lower apparent effect sizes. However, it is currently unclear how best to subdivide ADHD in a way that might overcome this problem or whether such subdivisions are possible.
An alternative explanation for the negative GWAS findings might be that ADHD risk is entirely explained by multiple low-frequency variants that are not well captured by the genotyping arrays. In reality, population genetics theory predicts that risk is most likely conferred by alleles that span the spectrum of frequencies (13). If it is the case that both common and rare variants contribute to ADHD risk, but genome-wide significant association cannot be a realistic goal with currently sized samples, we might expect to see a convergence of subthreshold signals from both types of variants influencing common biological risk pathways.
In the present study, we investigated whether specific biological pathways were enriched for associated SNPs and for CNVs, and whether these overlapped.
Subjects and Clinical Measures
The ADHD patient sample consisted of 799 Caucasian children from Cardiff, Wales (N=559); St. Andrews, Scotland (N=44); and Dublin, Ireland (N=196). All children were recruited from community clinics and met DSM-IV or ICD-10 criteria for ADHD or hyperkinetic disorder. To be comparable with other GWAS, we excluded children with a major medical or neurological condition (including epilepsy), autism, bipolar disorder, or intellectual disability (IQ <70).
We obtained approval from North West England, Wales, NHS Tayside, and Eastern Regional Health Authority research ethics committees. Written informed consent from parents and assent from children were obtained.
Trained interviewers used the Child and Adolescent Psychiatric Assessment—Parent Version (16), a semistructured research diagnostic interview, to assess psychiatric diagnoses. Pervasiveness of ADHD symptoms (in school) was assessed using the Child Attention-Deficit Hyperactivity Disorder Teacher Telephone Interview (17) or the Conners Teacher Questionnaire (18). IQ was assessed using the WISC-IV (19).
The children were between 4 and 18 years old (mean=10 years 3 months [SD=3 years]). The sample consisted of 699 boys (87.4%) and 100 girls (12.6%). Table 1 summarizes ADHD subtypes and comorbidities.
Table 1. ADHD Subtypes and Comorbid Disorder Rates in 799 Children With ADHD
|Diagnosis||N||%|
|ADHD diagnoses (lifetime)|
| DSM-IV ADHD, combined type||498||64.8|
| DSM-IV ADHD, predominantly inattentive type||162||21.1|
| DSM-IV ADHD, predominantly hyperactive-impulsive type||62||8.1|
| DSM-III-R ADHD||46||6.0|
|Other diagnoses (current)|
| DSM-IV conduct disorder||107||13.7|
| DSM-IV oppositional defiant disorder||364||46.5|
| DSM-IV anxiety disorder (generalized anxiety disorder, separation anxiety, or social phobia)||37||4.7|
| DSM-IV depressive disorder (any)||22||2.8|
Genotype control data were obtained from the Wellcome Trust Case Control Consortium–Phase 2 (20). They comprised 3,000 individuals born in the United Kingdom during 1 week in 1958 (the 1958 British Birth Cohort) and 3,000 individuals from the U.K. Blood Services collection. It has previously been shown that it is valid to combine these two samples for use as comparison subjects in genetic association studies using U.K. case samples (20). The comparison subjects were not screened for psychiatric disorders. However, the potential loss of power incurred by using unscreened comparison subjects is more than offset by the large number of comparison samples available (21).
SNP data for our 100 most strongly associated SNPs were requested from deCODE Genetics and the ADHD GWAS Consortium. The deCODE sample included 1,142 Icelandic individuals who met DSM-IV criteria for ADHD. Patients were recruited from outpatient psychiatric clinics in Iceland. Diagnoses were based on standardized diagnostic assessments and were reviewed by experienced clinicians as previously described (22). A total of 35,243 Icelandic individuals were available as comparison subjects (22). The second sample consisted of 2,064 parent-child trios, 896 case subjects, and 2,455 comparison subjects from the ADHD GWAS Consortium meta-analysis and has been described in detail elsewhere (10). This data set consists of four projects: the Children's Hospital of Philadelphia, phase I and phase II of the International Multisite ADHD Genetics Project, and a Pfizer-funded study from the University of California, Los Angeles, Washington University, and Massachusetts General Hospital.
DNA samples for our ADHD case subjects were genotyped on the Illumina (San Diego) Human660W-Quad BeadChip according to the manufacturer's instructions. Comparison subjects were genotyped by the Wellcome Trust Case Control Consortium–Phase 2 using the Illumina Human 1.2M BeadChip. BeadStudio (version 2.0) was used to call genotypes and inspect cluster plots. Analysis was based on 518,511 SNPs that were present on both chips.
Quality Control Assessment
Sample and SNP quality control assessments were performed using PLINK, version 1.07 (23). Sample quality control assessment was performed separately for case and comparison subjects. Full details are provided in the data supplement that accompanies the online edition of this article. In brief, case and comparison subjects were excluded if there was a call rate less than 0.99, low or high heterozygosity, evidence of relatedness, duplication, or non-European ancestry. Exclusions included one member of each related pair. SNPs were excluded if they had a call rate less than 0.99, had a minor allele frequency less than 0.01, deviated from Hardy-Weinberg equilibrium at p<1×10⁻⁵, or had more than 1% discordant genotypes between the Illumina 550K and the Illumina Human 1.2M BeadChip arrays. After quality control, 502,702 SNPs were tested for association in 727 case subjects and 5,081 comparison subjects.
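The SNP-level exclusion rules above can be written as a single predicate. The following is only an illustrative sketch: the actual filtering was done in PLINK, and the function name `snp_passes_qc` and its argument names are hypothetical, introduced here for clarity.

```python
def snp_passes_qc(call_rate, maf, hwe_p, discordance_rate):
    """Return True if a SNP survives the four exclusion rules quoted in the text.

    All argument names are illustrative; the thresholds are those stated above.
    """
    return (
        call_rate >= 0.99              # excluded if call rate < 0.99
        and maf >= 0.01                # excluded if minor allele frequency < 0.01
        and hwe_p >= 1e-5              # excluded if Hardy-Weinberg p < 1e-5
        and discordance_rate <= 0.01   # excluded if > 1% discordant genotypes across arrays
    )
```

A SNP is retained only if it passes all four checks; failing any one of them is sufficient for exclusion.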
The ADHD sample is an extension of 366 cases previously examined for large, rare CNVs (5). All quality control and CNV detection protocols were identical to those previously described. BeadStudio was used to determine the log R ratio and B allele frequency at each SNP according to standard Illumina protocols. CNVs were defined by PennCNV (24) with loci spanning at least 15 consecutive informative SNPs, with those having copy number calls <2 and >2 being classed as deletions and duplications, respectively. Samples with a high standard deviation in their genome-wide log R ratio (>0.30) and carrying more than 30 apparent CNVs over 100 kb were also excluded. Large (classified as those >500 kb) and rare (<1% frequency) CNVs were used in this analysis because they are called with greater accuracy, have better concordance across different platforms, and show the most robust associations with neurodevelopmental disorders (5).
SNPs were tested for association with ADHD using logistic regression in PLINK assuming an additive model. The EIGENSTRAT software package was used to calculate principal components by inferring continuous variation in allele frequencies reflecting ancestral differences in individuals (25). Two principal components were identified and used to control for population stratification since they had the maximum impact on the genomic control inflation factor λ. Genome-wide significance was considered to be achieved when the p value reached 5×10⁻⁸ (26).
Genotype data for the top 100 SNPs from the present GWAS were requested from the ADHD genetics consortium (10) and deCODE. In these samples, we tested for enrichment of association of our top 100 SNPs after linkage disequilibrium pruning (see the online data supplement). There were 204 samples from the present GWAS that overlapped with those included in the ADHD genetics consortium GWAS meta-analysis. Overlap was statistically accounted for in the analysis (see the online data supplement). There was no overlap between those two data sets and that of deCODE. Two methods were used to test for enrichment of association signal in the combined set of SNPs. The first of these was the Simes test (27), a more powerful and less conservative version of the Bonferroni method, which tests SNPs one at a time. The other was Fisher's method for combining p values, which aggregates the evidence for all SNPs simultaneously. Since odds ratios must be in the same direction as our GWAS to count as replication, one-sided p values were used in the analysis. Enrichment was tested in the ADHD genetics consortium and deCODE samples separately, and in both data sets combined (see the online data supplement). A meta-analysis of all three samples (Cardiff ADHD GWAS, ADHD genetics consortium, and deCODE) was also performed on each of the top 100 SNPs (without pruning) separately.
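The two enrichment tests can be illustrated with a short sketch. It uses the standard textbook definitions of the Simes test and Fisher's method and is not the authors' analysis code; the function names are hypothetical. Fisher's method assumes independent p values, which is why it is applied to the pruned SNP set, and the adjustment for overlapping samples described in the data supplement is not reproduced here.

```python
import math


def simes_p(p_values):
    """Simes combined p value: the minimum over ranks i of n * p_(i) / i."""
    ordered = sorted(p_values)
    n = len(ordered)
    return min(1.0, min(n * p / rank for rank, p in enumerate(ordered, start=1)))


def fisher_p(p_values):
    """Fisher's method for n independent p values (each must be > 0).

    X = -2 * sum(ln p) follows a chi-square distribution with 2n degrees of
    freedom under the null. For even degrees of freedom the upper tail is
    exp(-X/2) * sum_{k=0}^{n-1} (X/2)^k / k!, so no statistics library is needed.
    """
    n = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total
```

The Simes value is driven by the strongest few SNPs, whereas Fisher's statistic accumulates modest evidence across all SNPs, so the two tests are sensitive to different patterns of replication.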
Pathway analysis of Cardiff GWAS data.
Gene sets used for pathway analyses of our GWAS data came from four sources (28): Gene Ontology (29), the Kyoto Encyclopedia of Genes and Genomes (ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/hsa_pathway.list), the Mouse Genome Informatics database (30), and PANTHER (Protein Analysis Through Evolutionary Relationships). Gene sets were required to contain between three and 1,000 genes to be included in the analysis, giving a total of 12,371 gene sets. Analysis was carried out using ALIGATOR (28), which converts a list of significant and nominally significant SNPs into a list of significant genes and tests this list for enrichment for genes within the gene sets, allowing for variable numbers of SNPs per gene. ALIGATOR generates p values for enrichment for each gene set and corrects these for testing multiple nonindependent gene sets. It also tests whether the number of significantly enriched gene sets is higher than expected given the observed set of SNP p values in the GWAS. Gene sets required at least two signals to be tested to remove the possibility of a small gene set being deemed significantly enriched based on one signal. An important modification to the original ALIGATOR method is that significant genes in the same gene set that are less than 1 Mb apart (and thus could be explained by the same association signal) were counted as a single signal. SNPs within the boundaries of a gene (genome build 36.3) were assigned to that gene: if SNPs mapped to more than one gene, they were assigned to all such genes. Using this method, 203,663 SNPs were assigned to 14,929 genes. As before (32), the significant genes included the top 5% of all genes represented by SNPs, which was a total of 746 genes with at least one SNP (p<0.0054).
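The rule that nearby significant genes count as a single signal can be sketched as follows. This is not ALIGATOR's implementation. The function name, the input layout, and the choice to measure distance from the end of the running cluster to the start of the next gene (so that genes chain together) are all assumptions made for illustration.

```python
def count_signals(genes, max_gap=1_000_000):
    """Count independent signals among the significant genes of one gene set.

    genes: iterable of (chromosome, start, end) tuples, with chromosome as a string.
    Genes on the same chromosome separated by less than max_gap base pairs are
    merged into one signal (assumed distance definition, see lead-in).
    """
    signals = 0
    last_chrom, last_end = None, None
    for chrom, start, end in sorted(genes):
        if chrom != last_chrom or start - last_end >= max_gap:
            signals += 1                    # far from the previous cluster: new signal
            last_chrom, last_end = chrom, end
        else:
            last_end = max(last_end, end)   # extend the current cluster
    return signals
```

Under the criterion in the text, a gene set would be tested only if this count is at least two.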
Overlap of GWAS and CNV pathways.
Gene sets with nominally significant (p<0.05) enrichment in the pathway analysis of the GWAS data were tested for an excess of genes affected by large, rare CNVs in case subjects by fitting the following logistic model, which overcomes biases relating to gene and CNV size (33), to the combined set of CNVs:
logit (pr[case]) = CNV size + total number of annotated genes affected outside the gene set + number of genes affected in the gene set
and comparing the change in deviance between it and the model
logit (pr[case]) = CNV size + total number of annotated genes affected outside the gene set.
The comparison of case to control CNVs allows for the possibility of nonrandom CNV location unrelated to disease (i.e., CNVs tend to occur in specific locations of the genome, and this is unrelated to case status). A one-sided test for an excess of genes affected by case CNVs was performed. The inclusion of CNV size in the regression allows for case CNVs being of different size than typical CNVs (and thus likely to affect more genes, regardless of function). Inclusion of the total number of genes affected outside the gene set in the regression corrects for case CNVs affecting more genes overall (regardless of function) than control CNVs. Analysis was restricted to gene sets containing at least eight gene hits in total (case and control combined), since pathways with a large number of gene hits are more likely to be biologically meaningful. This criterion is different from that used for the ALIGATOR analysis of GWAS data (two significant genes) for two reasons. First, each gene is counted only once in ALIGATOR but can be counted multiple times in the CNV analysis (if hit by multiple CNVs). Second, two significant genes may be sufficient to flag a pathway of interest in a GWAS context if these gene associations are sufficiently significant. Correction for multiple testing was applied by randomly permuting case/control status of CNVs and repeating the analysis 5,000 times. This procedure gave a corrected p value for enrichment of gene hits in case CNVs for each gene set as well as a test of whether more gene sets than expected are significantly enriched. The latter gives a test of overlap in the pathways enriched for rare CNVs and common associated SNPs.
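The model comparison described above can be written out explicitly. The symbols below are introduced here for illustration and do not appear in the original: s is CNV size, g_out is the number of annotated genes affected outside the gene set, g_in is the number of genes affected within the gene set, and the intercept β0 is assumed.

```latex
\begin{align}
M_1:\quad \operatorname{logit}\,\Pr(\text{case}) &= \beta_0 + \beta_1 s + \beta_2\, g_{\text{out}} + \beta_3\, g_{\text{in}} \\
M_0:\quad \operatorname{logit}\,\Pr(\text{case}) &= \beta_0 + \beta_1 s + \beta_2\, g_{\text{out}} \\
\Delta D &= D(M_0) - D(M_1)
\end{align}
```

Here D denotes the model deviance, so ΔD measures how much the fit improves when the gene-set term is added. Under the null hypothesis β3 = 0, a change in deviance for one added parameter is conventionally referred to a chi-square distribution with 1 degree of freedom; the text does not state which reference distribution was used for the uncorrected p values. An excess of gene hits in case CNVs corresponds to β3 > 0, which is the direction of the one-sided test.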
The quantile-quantile plot of the observed versus expected chi-square tests is presented in Figure 1. The genomic control inflation factor λ was 1.069. Standardized to a sample size of 1,000, λ1,000 was 1.054. No SNP achieved genome-wide significance. Table 2 lists the top 20 independent SNPs ordered by significance.
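The standardized value can be checked with the commonly used rescaling of λ to 1,000 case subjects and 1,000 comparison subjects. The sketch below assumes that this is the formula the authors applied; the function name is hypothetical.

```python
def lambda_1000(lam, n_cases, n_controls):
    """Rescale a genomic control inflation factor to 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lam - 1) * scale


print(round(lambda_1000(1.069, 727, 5081), 3))  # 1.054
```

With 727 case subjects and 5,081 comparison subjects the scale factor is about 0.786, so the reported λ of 1.069 rescales to 1.054, matching the value given in the text.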
Figure 1. Quantile-Quantile Plot for 502,702 Single-Nucleotide Polymorphisms (SNPs) Genotyped in 727 Case Subjects and 5,081 Comparison Subjects With Genomic Control Inflation Factor λ=1.069 and λ1,000=1.054 in a Study of Common Genetic Variants and ADHD Risk
Table 2. Top 20 Independent Single-Nucleotide Polymorphisms (SNPs) in an ADHD Genome-Wide Association Study (GWAS)
|SNP||Chromosome||Position||Closest Gene||Location Relative to Gene||Minor Allele||Other Allele||Minor Allele Frequency||p||Odds Ratio ||95% CI|
|rs1744062||6||137350879||IL20RA||Within noncoding gene||G||A||0.43||4.16E-06||0.75||0.67–0.85|
|rs790531||13||49623515||DLEU2||Within noncoding gene||G||A||0.06||1.50E-05||1.62||1.30–2.02|
|rs1050567||2||61559167||XPO1||3′ untranslated region||T||C||0.11||2.89E-05||1.44||1.22–1.72|
We next sought replication for our top SNPs (see Table S1 in the online data supplement). To obtain independent SNPs from our top 100, we pruned the GWAS data set for linkage disequilibrium using PLINK. For pairs of SNPs less than 1 Mb apart with r²>0.2, only the most significant SNP in the Cardiff GWAS in each pair was retained, leaving a total of 60 SNPs. No significant excess signal was observed among these SNPs in the ADHD Genetics Consortium meta-analysis data set (Simes p=0.176, Fisher's p=0.159) or in deCODE (Simes p=0.291, Fisher's p=0.621), or when both data sets (all published reports) were meta-analyzed (Simes p=0.135, Fisher's p=0.095). The individual p values for each of the 60 independent SNPs from these analyses are listed in Table S2 in the online data supplement. The individual p values for the top 100 Cardiff SNPs (see Table S1 in the online data supplement) and the p values from meta-analysis of the Cardiff ADHD GWAS, ADHD Genetics Consortium meta-analysis, and deCODE data sets are listed in Table S3 in the online data supplement. In the combined analysis of all data, no marker SNPs approached genome-wide significance (pmin=6.38×10⁻⁶ at rs11698703).
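The pairwise pruning rule can be sketched as below. This is a literal reading of the rule as stated, not a reproduction of the PLINK procedure that was actually run (the text does not specify the command). The function name, the dictionary keys, and the `r2` lookup callable are hypothetical.

```python
from itertools import combinations


def ld_prune(snps, r2, max_distance=1_000_000, r2_threshold=0.2):
    """Drop the less significant SNP from every pair in linkage disequilibrium.

    snps: list of dicts with keys 'id', 'chrom', 'pos', and 'p' (assumed layout).
    r2:   callable (id_a, id_b) -> pairwise r-squared (assumed to be supplied).
    """
    dropped = set()
    for a, b in combinations(snps, 2):
        close = a["chrom"] == b["chrom"] and abs(a["pos"] - b["pos"]) < max_distance
        if close and r2(a["id"], b["id"]) > r2_threshold:
            weaker = a if a["p"] > b["p"] else b   # larger p value is less significant
            dropped.add(weaker["id"])
    return [s for s in snps if s["id"] not in dropped]
```

Applied to the top 100 SNPs with the thresholds in the text, a rule of this kind is what reduces the list to the 60 independent SNPs carried forward to the enrichment tests.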
Pathway Analysis of Cardiff SNP Data
In the ALIGATOR analysis of our GWAS data set, 315 pathways were enriched at p<0.05 and 81 at p<0.01. More categories were enriched at the more stringent threshold than expected (p=0.033) given the distribution of p values in the genes in the data set as a whole, but no individual pathway remained significant after correction for multiple testing. Enrichment p values for the top 100 significant pathways are listed in Table S4 in the online data supplement.
Overlap of Enriched Pathways Between CNV and SNP Data
We included 727 ADHD case subjects and 1,047 comparison subjects in the CNV analysis. We observed a significantly (p=0.002) higher rate of large (>500 kb), rare CNVs in case subjects (N=85, 21 deletions, 64 duplications) than in comparison subjects (N=78, 13 deletions, 65 duplications). In the subset of the 727 ADHD case subjects (N=409) that had not been included in the previous report (5), the rate of large, rare CNVs was also significantly greater than in the comparison subjects (rate=0.112, compared with 0.075; p=0.02). More information is provided in Table S5 in the online data supplement.
More of the gene sets that were nominally significantly enriched in the ALIGATOR analysis of the SNP data were also significantly enriched for case CNVs (Figure 2). Thus, of the 315 pathways with enrichment at p<0.05 from the SNP data, in the CNV data 13 were enriched at p<0.05, eight at p<0.01, and seven at p<0.001. These numbers are significantly higher than expected by chance (p=0.0080, p=0.0022, p<0.0001, respectively). The 13 pathways significantly enriched (p<0.05) in both the SNP data and the CNV data are listed in Table 3. Although there was strong evidence of SNP and CNV signal convergence at the level of pathways, this was not evident at the individual gene level. Within the 13 significantly enriched pathways, 63 genes for which there were gene-wide (Simes) p values from the GWAS were affected by at least one CNV in a case subject or a comparison subject. Among these, there was some correlation (r=0.236) between genes showing evidence for association (-log GWAS Simes p and -log CNV enrichment p) at the level of SNPs and CNVs, but this did not quite achieve statistical significance (p=0.063).
Figure 2. Significant Overlap of Biological Pathways, Including Cholesterol-Related and CNS Development, Enriched for Single-Nucleotide Polymorphism (SNP) Association, and Those Enriched for Rare Copy Number Variants (CNVs) in a Sample of 727 Children With ADHD and 5,081 Comparison Subjects
Table 3. Pathways Showing Nominally Significant Enrichment (p<0.05) in Both the Single-Nucleotide Polymorphism Data and the Copy Number Variant (CNV) Data in a Genome-Wide Association Study of ADHD
|Pathway Number||Number of Genes||Gene Hits (Cases)||Gene Hits (Comparison)||p (CNV)||p (corr)||p (GWAS)||Description|
|MGI:5278||188||14||0||1.47E-05||0.002||0.030||Abnormal cholesterol homeostasis|
|MGI:3947||182||13||0||2.61E-05||0.004||0.023||Abnormal cholesterol level|
|MGI:180||169||13||0||2.61E-05||0.004||0.026||Abnormal circulating cholesterol level|
|GO:16746||214||14||0||1.42E-04||0.009||0.004||Transferase activity, transferring acyl groups|
|GO:16747||205||13||0||1.43E-04||0.008||0.004||Transferase activity, transferring acyl groups other than amino-acyl groups|
|GO:32680||34||7||1||8.83E-03||0.341||0.014||Regulation of tumor necrosis factor production|
|GO:5261||271||17||6||1.84E-02||0.547||0.042||Cation channel activity|
|GO:7417||441||28||10||2.77E-02||0.683||0.002||Central nervous system development|
|GO:16247||56||8||2||3.07E-02||0.719||0.026||Channel regulator activity|
|GO:70011||553||25||8||4.89E-02||0.848||0.038||Peptidase activity, acting on L-amino acid peptides|
The exception to this was CHRNA7, which is a member of the Gene Ontology (GO) categories “cation channel activity” (GO:5261; case CNV hits enrichment p=0.0184, GWAS enrichment p=0.042), “channel regulator activity” (GO:16247; p=0.0307, p=0.026), and “regulation of tumor necrosis factor production” (GO:32680; p=0.0088, p=0.014). CHRNA7 was affected by six duplications in case subjects but none in comparison subjects (p=9.08×10⁻⁴) and had a Simes-corrected gene-wide p value of 0.0002 from the GWAS.
In a sample of 727 children with ADHD and 5,081 comparison subjects, there was no evidence of genome-wide significant association with any SNP. In keeping with previous results from a subsample of the present study, we found an increased burden of large and rare CNVs. Analysis of our top 100 SNPs in the ADHD genetics consortium meta-analysis and deCODE data sets yielded no significant evidence of association, whether the SNPs were tested individually (allowing for multiple testing), considered together, or analyzed after combining the discovery GWAS data with those from the other data sets. These results add to the four published GWAS of ADHD and the meta-analysis of all available data (6–10), none of which yielded genome-wide significant findings. The lack of significant GWAS findings could simply reflect sample sizes that are inadequate for the multiple testing burden, and it may be that when much larger samples are assembled for extended meta-analyses, common risk variants will be detected. That more sets of genes than expected were significantly enriched for subthreshold association signals is consistent with this hypothesis, as it implies that the distribution of the association signals with respect to genes is not random.
One major motivation for undertaking genetic studies is to identify underlying biological risk mechanisms. In the present study, we sought evidence on whether the pathways enriched for SNP association converge with those enriched for rare CNVs. Our finding of significant evidence for such a convergence underscores our contention that it is premature to dismiss the contribution of SNP variation, but more importantly, it begins to provide evidence that genome-wide studies of ADHD, based on common or rare variants, are likely to inform processes of relevance to pathophysiology. At present, our study is not sufficiently powered to identify any of these categories unambiguously. Significant pathways included those related to cholesterol (four pathways) and CNS development. The latter has been previously implicated in ADHD (3), although different methods were used. The lack of a clear overlap at the level of individual genes may reflect true differences in the specific genes within pathways implicated by SNPs and CNVs, perhaps arising from the different mutational mechanisms responsible for generating large CNVs and SNPs, neither of which occur randomly with respect to the genomic sequence context. However, it is also likely that it reflects low power to identify specific risk genes. Although not supported at a genome-wide level of significance, the convergence of SNP and CNV association at CHRNA7, which encodes the cholinergic receptor nicotinic alpha 7, is intriguing. CHRNA7 is widely expressed in the brain, especially the hippocampus (34), and is involved in rapid synaptic transmission. CHRNA7 has been examined in relation to schizophrenia, associated cognitive deficits, and nicotine dependence (35, 36), although findings have not been entirely consistent. There has been little published work on ADHD, although incomplete evaluations of the gene in much smaller samples have not been supportive (37). 
Thus, this gene has yet to be comprehensively investigated in relation to ADHD.
Small duplications and deletions on 15q13.3 have been found to be associated with neuropsychiatric phenotypes that include ADHD. Recurrent deletions of chromosome 15q13.3 are associated with developmental delay and a variety of neuropsychiatric phenotypes. It has been suggested that haploinsufficiency of CHRNA7 may have a causal role (38). Duplications spanning CHRNA7 have also been found to be associated with a broad range of neuropsychiatric phenotypes that include ADHD (39, 40). Increased dosage of CHRNA7 in these microduplications has been considered to be responsible.
GWAS and CNV studies capture only a proportion of genetic variation and do not allow for the effects of unmeasured genetic and environmental risk factors. In the future, the next generation of sequencing studies will go some way toward addressing some of these gaps. The pathway analysis using ALIGATOR relies on Gene Ontology, the Kyoto Encyclopedia of Genes and Genomes pathways, Mouse Genome Informatics, and PANTHER-defined functional categories (28). The ability to detect enriched pathways will depend on how well and how accurately biological processes are defined, and again, this knowledge will evolve over time.
In summary, in keeping with similarly sized previous genome-wide association studies of ADHD, we failed to find significantly associated common variants. We previously found large, rare CNVs to be associated with ADHD, and the results remain similar in this newly extended sample. Contrary to what some might expect, we found a highly significant overlap of biological pathways hit by both CNVs and SNPs. This implies that both types of gene variants are relevant to ADHD risk. Finally, our results suggest that CHRNA7 is a promising candidate to examine further.
The authors thank the families who participated in this project and the clinicians who supported it; the field team members for sample collection (Charlotte Davies, Emma Evans, Elisabeth Felter, Kate Greening, Amy Hensey, Andrew Martin, Joanna Martin, Joanne Park, Rachel Roberts, Janet Robinson, Sarah Scott, Nina Smyth, Sharifah Syed, Lauren Whittington); Dr. Dobril Ivanov for setting up and maintaining the software and server; Dr. Kiran Mantripragada for assistance with genotyping; and Dr. Denise Harold for advice with analysis. The authors also acknowledge the support of the Health Research Board, Dublin (Drs. Gill and Hawi), the Dublin Molecular Medicine Centre, and the Hyperactive and Attention Disorder Group Ireland.