Genes across the 2q31 region were screened for association in two stages as detailed in the Method section. The genes analyzed included glutamate decarboxylase 1 (GAD1) (in collaboration with Drs. Shigeo Kure, Kiyoshi Kanno, and Yoichi Matsubara), four hypothetical proteins (FLJ13096, FLJ13984, PRO2037, and FLJ23462, recently identified as duodenal cytochrome b) (in collaboration with Drs. Paolo Gasparini and Massimo Carella), histone acetylase-1 (HAT-1) (in collaboration with Dr. Salah Uddin Qureshi), the cytoplasmic dynein subunit DNCI2, the aspartate/glutamate carrier SLC25A12, and the homeobox protein DLX2 (Figure 1). These candidate genes were chosen on the basis of their position relative to the positive linkage results from three studies (13–15), their expression in brain tissue, and, in some cases, their known function, their novelty, or the existence of related genes within the region of chromosome 7 showing linkage to autism. For this latter criterion, we note that the linked region of chromosome 7 contains genes paralogous to DNCI2, SLC25A12, DLX1, and DLX2 (i.e., DNCI1, SLC25A13, DLX5, and DLX6).
In the first stage, all known exons (with flanking intronic sequence) of these genes were screened by single-strand conformation polymorphism and denaturing high-performance liquid chromatography for variants in 35 to 47 unrelated individuals chosen from families showing linkage to D2S335, as described in the Method section. In the nine genes, 82 exons were screened and 29 SNPs were identified. Frequencies of each variant were then evaluated in autistic patients (using only one affected individual per family, N=38) and in 50 ethnically matched nonautistic subjects, after confirming that the distribution of allele frequencies was in Hardy-Weinberg equilibrium. Only two SNPs, both within the SLC25A12 gene, showed significant differences in allele frequencies between autistic and nonautistic subjects using both allele-based (p<0.004) and genotype-based (p<0.03) tests.
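The Hardy-Weinberg check mentioned above can be illustrated with a standard chi-square goodness-of-fit test. The following is a minimal sketch only: the text does not state which test was used, the function name `hwe_chi_square` is hypothetical, and the genotype counts in the usage note are invented for illustration rather than taken from the study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium
    at a biallelic SNP, given the three observed genotype counts.

    Returns (chi2, p_value) with 1 degree of freedom.
    Assumption: this is a textbook test, not necessarily the one used
    by the authors.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with df=1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `hwe_chi_square(25, 50, 25)` describes a sample whose genotype counts match Hardy-Weinberg proportions exactly, so the statistic is zero. With samples as small as those in the first-stage screen, an exact test is often preferred over the chi-square approximation.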
Within the SLC25A12 gene, we identified a total of five variants in the first stage screen (including the two meeting criteria for further study). Figure 1 presents the five variants of the SLC25A12 gene identified in 47 affected subjects linked to the chromosome 2q24-q33 region. The two polymorphisms meeting criteria in the first stage, rs2056202 (I3-21A/G) and rs2292813 (I16+70A/G), are G/A variants in flanking intronic sequence located 21 base pairs upstream of exon 4 and 70 base pairs downstream of exon 16, respectively. Two variants, a C-T variant at nucleotide 99 (rs1878583) and a G-A variant at nucleotide 1418, were within coding regions. G1418A changes arginine 473 to glutamine, while the C99T variant is silent. G1418A is a new SNP (i.e., not reported in the National Center for Biotechnology Information dbSNP database) located in a region conserved across mammalian species, but the amino acid glutamine is observed in mice. The final variant appears in the 3′ untranslated region. We did not find SNP rs1059299, which is reported in the public database and changes amino acid 600, in our sample.
Given the evidence for association of rs2056202 and rs2292813 in a small number of affected and nonaffected subjects, the entire sample was genotyped at these SNPs for analysis by the Transmission Disequilibrium Test, which makes use of family-based comparison subjects. Of the 411 families studied, 197 had at least one parent heterozygous for at least one SNP. This group consisted of 140 multiplex and 57 singleton families. To test for association by the Transmission Disequilibrium Test, transmission from heterozygous parents to one affected child was analyzed (Table 1). Transmission Disequilibrium Test analysis demonstrated association for rs2056202 (p=0.001) and for rs2292813 (p=0.01). Similar excess transmission was assessed by TRANSMIT for rs2056202 (χ2=10.71, df=1, p=0.001) and rs2292813 (χ2=7.24, df=1, p=0.007). In both cases, the G allele appeared to be the risk allele (or the A allele the protective allele). For simplicity, Table 1 and Table 2 show transmission data for just the G allele for both SNPs.
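The logic of the Transmission Disequilibrium Test for a single biallelic marker can be sketched as a McNemar-type statistic on transmissions from heterozygous parents. This is a minimal sketch under stated assumptions: the function names `chi2_sf_df1` and `tdt` are hypothetical, the counts in the usage note are invented (the transmission counts of Table 1 are not reproduced here), and TRANSMIT itself uses a different, likelihood-based score test that this code does not implement.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def tdt(transmitted, untransmitted):
    """Classic single-marker TDT.

    transmitted   -- times heterozygous parents passed the candidate
                     allele (here, G) to an affected child
    untransmitted -- times they passed the other allele instead

    Returns (chi2, p_value) with df=1. Under the null hypothesis the
    two counts are expected to be equal.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    return chi2, chi2_sf_df1(chi2)
```

With hypothetical counts of 60 transmissions and 30 non-transmissions, `tdt(60, 30)` gives a statistic of 10.0. The same tail function also converts the TRANSMIT statistics reported above into their p values: `chi2_sf_df1(10.71)` and `chi2_sf_df1(7.24)` round to 0.001 and 0.007, respectively.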
Association studies were also carried out using multiple affected individuals per family (Table 2). Such analysis is more properly a measure of linkage rather than association, while providing increased power. Transmission disequilibrium was observed for both rs2056202 (p=0.003) and rs2292813 (p=0.007). Similar results were found with TRANSMIT for rs2056202 (χ2=10.3, df=1, p=0.001) and rs2292813 (χ2=8.17, df=1, p=0.004).
Looking at haplotypes, there was an increased transmission of the G*G haplotype in autism when analyzing either one affected individual per family (p=0.000003) (Table 1) or all affected individuals (p=0.000006) (Table 2). Using a global analysis, the two-locus Transmission Disequilibrium Test showed disequilibrium of transmission of the four observed haplotypes for both one affected individual per family (χ2=32.31, df=3, p=0.0000005) and all affected individuals (χ2=28.76, df=3, p=0.000003). Similar observations were made with TRANSMIT for transmission of the G*G haplotype to either one affected subject (χ2=8.1, df=1, p=0.004) or all affected subjects (χ2=12.37, df=1, p=0.0004).
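The global haplotype test has three degrees of freedom because four haplotypes were observed. As a small sketch of how the reported global p values follow from the statistics, the chi-square upper tail for df=3 has a closed form that needs only the standard library. The function name `chi2_sf_df3` is hypothetical.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Statistics reported in the text for the two-locus global test.
p_one_affected = chi2_sf_df3(32.31)   # on the order of 5e-7
p_all_affected = chi2_sf_df3(28.76)   # on the order of 3e-6
```

Both values agree, to the precision reported, with the p values given above for one affected individual per family and for all affected individuals.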
Genotype relative risk was estimated for individuals carrying one or two copies of the risk alleles (the G alleles for both SNPs). Using one affected subject per family, genotype relative risk could be estimated as 1.56 and 3 in heterozygotes and 2.51 and 4.81 in homozygotes for rs2056202 and rs2292813, respectively. Using all affected subjects, the values were 1.92 and 2 in heterozygotes and 2.36 and 2.88 in homozygotes. Note that genotype relative risk tends to be underestimated in family studies such as these (25).
Two-point linkage analysis using nonparametric lod score analysis indicated some evidence for linkage between autism and rs2056202 or rs2292813 (Table 3). Two-point heterogeneity lod score supported this linkage, with maximal heterogeneity lod scores of 1.52 (p=0.06) and 1.79 (p=0.04) for rs2056202 and rs2292813, respectively. However, information was low at these SNPs (estimated as 0.21 and 0.28 for rs2056202 and rs2292813, respectively). To increase information we used multipoint linkage analyses with these two SNPs. Under these conditions, maximal multipoint nonparametric linkage scores of 1.57 and maximal multipoint heterogeneity lod scores of 2.11 were observed. The two markers showed linkage disequilibrium with each other as determined by analyzing linkage disequilibrium in unrelated patients (D′=0.79, SD=0.06).
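The D′ measure reported above is Lewontin's normalized disequilibrium coefficient. The sketch below shows the standard definition for two biallelic markers. It assumes haplotype and allele frequencies are already available (the text does not say how haplotype frequencies were estimated), the function name `d_prime` is hypothetical, and the frequencies in the usage note are invented for illustration.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for two biallelic markers.

    p_ab -- frequency of the haplotype carrying allele A at marker 1
            and allele B at marker 2
    p_a  -- frequency of allele A at marker 1
    p_b  -- frequency of allele B at marker 2

    Returns a signed value in [-1, 1]; the absolute value is what is
    usually reported.
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a marker is monomorphic")
    return d / d_max
```

For instance, with both allele frequencies at 0.5, a haplotype frequency of 0.4 gives `d_prime(0.4, 0.5, 0.5)` of about 0.6, while a haplotype frequency of 0.25 (independence) gives 0.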
To examine the relationship between the linkage and the association, we first identified a subset (selected from the 197 informative families) of 76 families that showed linkage, defined as a positive multipoint nonparametric linkage (NPL) value. In this subset, we found an increased transmission for rs2056202 with one affected subject per family (Transmission Disequilibrium Test: χ2=6.06, df=1, p=0.01; TRANSMIT: χ2=7.23, df=1, p=0.01) or all affected subjects (Transmission Disequilibrium Test: χ2=12.31, df=1, p=0.0005; TRANSMIT: χ2=14.64, df=1, p=0.0001) and for rs2292813 with all affected subjects (Transmission Disequilibrium Test: χ2=8.91, df=1, p=0.003; TRANSMIT: χ2=9.24, df=1, p=0.002), but not with one affected subject (Transmission Disequilibrium Test: χ2=2.27, df=1, p=0.13; TRANSMIT: χ2=3.33, df=1, p=0.12). The G*G haplotype showed association with either one affected subject per family (Transmission Disequilibrium Test: χ2=7.14, df=1, p=0.008; TRANSMIT: χ2=8.97, df=1, p=0.004) or all affected subjects (Transmission Disequilibrium Test: χ2=21.25, df=1, p=0.000004; TRANSMIT: χ2=17.26, df=1, p=0.0006). In contrast, in the 121 informative families that did not show linkage, neither rs2056202 (for all affected subjects, Transmission Disequilibrium Test: χ2=1.28, df=1, p=0.26; TRANSMIT: χ2=2.03, df=1, p=0.15) nor rs2292813 (for all affected subjects, Transmission Disequilibrium Test: χ2=1.00, df=1, p=0.32; TRANSMIT: χ2=2.00, df=1, p=0.16) showed such evidence for association.