High-density SNP association study and copy number variation analysis of the AUTS1 and AUTS5 loci implicate the IMMP2L–DOCK4 gene region in autism susceptibility

ORIGINAL ARTICLE


E Maestrini1,11, AT Pagnamenta2,11, JA Lamb2,3,11, E Bacchelli1, NH Sykes2, I Sousa2, C Toma1, G Barnby2, H Butler2, L Winchester2, TS Scerri2, F Minopoli1, J Reichert4, G Cai4, JD Buxbaum4, O Korvatska5, GD Schellenberg6, G Dawson7,8, A de Bildt9, RB Minderaa9, EJ Mulder9, AP Morris2, AJ Bailey10 and AP Monaco2, IMGSAC12

1Department of Biology, University of Bologna, Bologna, Italy; 2The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; 3Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK; 4Department of Psychiatry, Seaver Autism Research Center, Mount Sinai School of Medicine, New York, NY, USA; 5Geriatric Research Education and Clinical Centre, Veterans Affairs Puget Sound Health Care System, Seattle Division, Seattle, WA, USA; 6Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA; 7Autism Speaks, New York, NY, USA; 8Department of Psychology, University of Washington, Seattle, WA, USA; 9Department of Psychiatry, Child and Adolescent Psychiatry, University Medical Center Groningen, Groningen, The Netherlands and 10University Department of Psychiatry, Warneford Hospital, Oxford, UK

Autism spectrum disorders are a group of highly heritable neurodevelopmental disorders with a complex genetic etiology. The International Molecular Genetic Study of Autism Consortium previously identified linkage loci on chromosomes 7 and 2, termed AUTS1 and AUTS5, respectively. In this study, we performed a high-density association analysis in AUTS1 and AUTS5, testing more than 3000 single nucleotide polymorphisms (SNPs) in all known genes in each region, as well as SNPs in non-genic highly conserved sequences. SNP genotype data were also used to investigate copy number variation within these regions. The study sample consisted of 127 and 126 families showing linkage to the AUTS1 and AUTS5 regions, respectively, and 188 gender-matched controls. Further investigation of the strongest association results was conducted in an independent European family sample containing 390 affected individuals. Association and copy number variant analysis highlighted several genes that warrant further investigation, including IMMP2L and DOCK4 on chromosome 7. Evidence for the involvement of DOCK4 in autism susceptibility was supported by independent replication of association at rs2217262 and the finding of a deletion segregating in a sib-pair family.

Molecular Psychiatry (2010) 15, 954–968; doi:10.1038/mp.2009.34; published online 28 April 2009

Keywords: autistic disorder; disease susceptibility; single nucleotide polymorphisms; linkage disequilibrium; chromosome 7; chromosome 2

Introduction

Autism (OMIM: %209850) is a complex neurodevelopmental disorder characterized by impairments in reciprocal social interaction, difficulties in verbal and nonverbal communication, stereotyped behaviors and interests, and an onset in the first 3 years of life. Autism belongs to the group of pervasive developmental disorders (PDD), also known as autism spectrum disorders (ASDs), which also include Asperger syndrome and pervasive developmental disorder, not otherwise specified (PDD-NOS). The estimated population prevalence of core autism is around 15–20 in 10 000, with a male/female sex ratio of approximately 4:1. When all ASD subtypes are combined, the prevalence is several times higher, reaching 116 in 10 000.1–3

Several lines of evidence indicate that genetic factors are important in susceptibility to idiopathic autism. Twin studies show a concordance of 60–92% for monozygotic (MZ) twins and 0–10% for dizygotic (DZ) twins, depending on phenotypic definitions, and the sibling recurrence risk is 25–60 times higher than the population prevalence.4 Furthermore, relatives of affected probands show a higher incidence of milder cognitive or behavioral features, consistent with the hypothesis of a ‘spectrum’ of severity.5

Received 20 October 2008; revised 19 February 2009; accepted 2 April 2009; published online 28 April 2009

Correspondence: Professor AP Monaco, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK, or Professor AJ Bailey, University Department of Psychiatry, Warneford Hospital, Headington, Oxford OX3 7JX, UK.
11These authors contributed equally to this work.
12IMGSAC: see list of authors in Supplementary Information.

Molecular Psychiatry (2010) 15, 954–968 © 2010 Macmillan Publishers Limited All rights reserved 1359-4184/10
www.nature.com/mp

Autism spectrum disorders exhibit wide clinical variability and a high degree of genetic heterogeneity. A variety of chromosomal abnormalities are found in a small proportion of affected individuals (6–7%), most frequently in syndromic cases with dysmorphic features and cognitive impairment.6 The autism phenotype is also associated with known genetic conditions such as Fragile X syndrome and tuberous sclerosis. Recently, rare ASD-causing mutations were reported in a number of genes, including NLGN3, NLGN4,7 NRXN1,8 SHANK39 and NHE9.10

In recent years, the development of DNA microarray technologies has revealed that submicroscopic deletions and duplications of DNA, known as copy number variants (CNVs), may be significant in autism susceptibility.11–14 Recent surveys identified a higher rate of de novo CNVs in autism pedigrees compared with controls, with the increase more pronounced in singleton than in multiplex families.10,12,13 Nevertheless, it remains difficult to interpret the significance of the numerous CNVs identified in ASDs, to distinguish those that influence susceptibility from normal polymorphic variation, and to understand how they might interact with other genetic and non-genetic factors.

Although individually rare, highly penetrant abnormalities, such as microdeletions/microduplications or point mutations, may have a significant role in ASDs. It is also likely that genetic susceptibility results in part from the combined action of several common genetic variants. Common variation in several candidate genes has been implicated in autism (MET, CNTNAP2, SLC6A4, RELN, GABRB3),15 but in most cases consistent replication has not been achieved.

Because the strong genetic component in ASDs was clearly demonstrated over a decade ago, a large number of molecular genetic studies have searched for susceptibility genes, following the general approach of a genome-wide linkage scan using affected sibling/relative pair families. The International Molecular Genetic Study of Autism Consortium (IMGSAC) identified the first autism linkage locus on chromosome 7q21–q32 (designated autism susceptibility locus 1, AUTS1) with a multipoint maximum LOD score (MLS) of 2.53 in 87 families.16 This result was confirmed in follow-up studies conducted by the IMGSAC using additional families and markers.17,18

Another linkage susceptibility locus (AUTS5) was identified by IMGSAC on chromosome 2q24–q33 with an MLS of 3.74 in 152 affected sibling pairs.17

Replication of linkage signals in independent studies has proven difficult for ASDs. To date, 13 whole-genome linkage scans for ASDs have been published,15 and no single locus has been consistently confirmed in all studies. This is likely to result from the small effect size attributable to individual genes, as well as from the clinical and genetic complexity of ASDs; differences in ascertainment and inclusion criteria may have been additional factors. However, AUTS1 is one of the few identified loci that has been supported by overlapping positive results in multiple multiplex collections,19,20 and in meta-analyses.21,22 Similarly, the chromosome 2q locus is supported by overlapping linkage findings in another two independent genome scans,23,24 and by homozygosity mapping in consanguineous families.10

The largest genome scan published to date, carried out by the Autism Genome Project (AGP) using Affymetrix 10K single nucleotide polymorphism (SNP) arrays and 1181 multiplex families, also provided some support for both the chromosome 2q and 7q loci within the families of inferred European ancestry.8

Despite the support for linkage on chromosomes 2q and 7q, the candidate genomic intervals remain broad, each spanning approximately 40 Mb and containing approximately 200 known genes. Systematic screening and association studies of several positional candidate genes on chromosomes 2q and 7q have been conducted by the IMGSAC,25–29 but these studies have not led to the identification of confirmed autism susceptibility variants. Owing to recent technological advances in high-density SNP genotyping and bioinformatic resources, we focused our efforts on a gene-based high-density SNP association study of the autism susceptibility loci on chromosomes 2q and 7q implicated by IMGSAC linkage studies. SNP genotype data were also used to investigate copy number variation within these regions. The genetic architecture of ASDs is likely to be extremely complex, with disease risk determined both by common variants of modest effect and by rare variants with a range of effect sizes. Focusing high-density association screens on linkage regions will prioritize genes containing penetrant rare variants, which association analysis would not readily identify. However, we might expect genes containing such variants also to contain more common variants of lesser effect, and thus to remain natural candidates to follow up through association studies.

Genotyping was conducted in two stages, based on HapMap Phase I and Phase II data, respectively. In total, 3002 SNPs were genotyped in each region, directly testing 173 genes on chromosome 2 and 270 genes on chromosome 7. The study sample consisted of 126 and 127 affected individuals and their parents, selected from 293 IMGSAC multiplex families based on identity-by-descent (IBD) sharing on chromosomes 2q and 7q, respectively, as well as 188 gender-matched controls. This study design, in which the same probands are used for family-based and case–control analysis, should be more robust against the respective weaknesses of the case–control and transmission/disequilibrium test (TDT) approaches (such as population structure and segregation distortion, respectively), and extracts the maximum information from our sample.30 Moreover, by selecting families showing excess allele sharing in the region of interest, we are likely to increase the frequency of the disease-associated alleles in the case sample, thereby increasing the power of association studies.31 Power calculations were performed over a range of risk allele frequencies and odds ratios (OR), confirming that selecting families for increased IBD sharing outperforms selecting families at random, given fixed genotyping resources (see Supplementary Information).

Our study thus represents a deep exploration of SNP and copy number variation within genic regions of the two autism linkage loci on chromosomes 2q and 7q and pinpoints several genes that need further investigation.

Materials and methods

Study populations

The chromosome 2 primary sample included 126 independent autism families, comprising 371 individuals (119 parent–parent–child trios and 7 single parent–child pairs). The chromosome 7 primary sample included 127 independent autism families (117 parent–parent–child trios and 10 single parent–child pairs). All families were Caucasian (Table 1). The assessment methods and diagnostic criteria used by the IMGSAC have been described in detail previously.17 Diagnosis was based on the Autism Diagnostic Interview-Revised (ADI-R), the Autism Diagnostic Observation Schedule (ADOS) and clinical evaluation. Karyotypes were obtained on all affected individuals when possible, and gross karyotypic abnormalities were excluded in at least one affected individual per family in 93% of families and in both affected individuals in 83% of families.

Trios for the primary sample were selected from the 293 multiplex families in the IMGSAC multiplex collection (using one affected sib per family) based on IBD sharing on chromosomes 2q and 7q, respectively. Calculation of IBD states was based on microsatellite marker data available from our genome scan18 and fine-mapping studies (unpublished data). Ranked Z-scores were calculated for each family using Merlin32 at the linkage peak position (D2S2302–D2S2310 and D7S2430–D7S684 for chromosomes 2 and 7, respectively).
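The ranking step can be illustrated with a minimal Python sketch. It assumes that per-family IBD-sharing Z-scores at the linkage peak have already been computed (for example, exported from Merlin) and simply keeps the highest-ranked families; the function name `select_top_families`, the tie-breaking rule and the Z-score values are hypothetical and are not study data.

```python
from typing import Dict, List


def select_top_families(z_scores: Dict[str, float], n: int) -> List[str]:
    """Return the n family IDs with the highest IBD-sharing Z-scores.

    Ties are broken by family ID so that the selection is reproducible
    (a hypothetical choice made for this illustration).
    """
    ranked = sorted(z_scores.items(), key=lambda item: (-item[1], item[0]))
    return [family_id for family_id, _ in ranked[:n]]


# Hypothetical per-family Z-scores at a linkage peak (not study data)
example_z = {"F01": 1.8, "F02": -0.4, "F03": 2.6, "F04": 0.9, "F05": 1.8}
print(select_top_families(example_z, 3))  # ['F03', 'F01', 'F05']
```

In practice one affected sib per selected family would then be taken forward as the proband, as described above.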

Two main sample collections were used for replication (Table 1): (1) the ‘IMGSAC replication’ (IMGSAC-R) sample: 260 parent–affected child trios or pairs and 34 single cases; and (2) the ‘Northern Dutch’ (ND) sample: 96 singleton families from the north of the Netherlands, including 82 parent–parent–child trios and 14 parent–child pairs. Both replication sample collections fulfilled diagnostic criteria for Case ‘Type 1’ or ‘Type 2’ as defined by IMGSAC17 (meet ADI-R criteria or one point below threshold on one behavioral domain, meet ADOS/ADOS-G criteria for autism or PDD, performance IQ > 35). An extended Northern Dutch sample (ND-all; Table 1) was available, including 108 cases that did not meet stringent criteria for one of the following reasons: (1) met ADI-R criteria but failed to meet ADOS criteria or did not undergo ADOS evaluation, (2) met ADI-R and ADOS criteria but had an IQ score < 35, (3) did not meet full criteria for ASD on the ADI-R.

Table 1 Description of samples

Sample | Autism: total affected | Autism: sex (M/F) | Autism: family type | Autism: country of origin | Controls: total | Controls: sex (M/F) | Controls: country
IMGSAC chr. 2 | 126 | 103:23 | PPC 119, PC 7 | 73 UK, 25 USA, 16 Netherlands, 8 Germany, 3 France, 1 Greece | 188 | 154:34 | UK
IMGSAC chr. 7 | 127 | 101:26 | PPC 117, PC 10 | 66 UK, 28 USA, 13 Netherlands, 9 France, 7 Germany, 3 Denmark, 1 Greece | 188 | 148:40 | UK
IMGSAC replication | 294 | 236:58 | PPC 213, PC 47, C 34 | 129 UK, 85 Italy, 32 Germany, 31 Netherlands, 10 Denmark, 7 France | 180 | 144:36 | 133 UK, 47 Italy
ND | 96 | 85:11 | PPC 82, PC 14 | North of the Netherlands | – | – | –
ND-all | 204 | 175:29 | PPC 165, PC 39 | North of the Netherlands | – | – | –

Abbreviations: C, single case; F, female; IMGSAC, International Molecular Genetic Study of Autism Consortium; M, male; ND, Northern Dutch; PC, parent–child pairs; PPC, parent–parent–child trios.

The most significant SNPs from the chromosome 2 locus were also tested in a collection of 358 multiplex families (‘Mount Sinai’ sample), which have been previously described.23,33 Similarly, three SNPs from two of the most strongly associated genes in the case–control and family-based analysis on chromosome 7 were genotyped in 62 Caucasian families selected for IBD sharing from a sample of 222 families showing linkage to the same region of chromosome 7 (‘University of Washington’ sample).19

Controls used in the primary experiment included 188 DNA samples from UK random blood donors from the ECACC HRC panels,34 sex-matched with the autism case sample. The additional set of 180 controls genotyped in the replication phase included 92 DNAs from ECACC HRC panels, 41 random donors from the UK and 47 random donors from Italy.

The study was reviewed by the relevant local ethics committees.

Genotyping

Single nucleotide polymorphisms for the primary analysis were genotyped using the GoldenGate assay (Illumina, San Diego, CA, USA) on an Illumina BeadStation according to the manufacturer’s instructions. BeadArrays were scanned using the BeadArray Reader at 532 and 647 nm. The BeadStudio genotyping module (version 3.2.23) was used to generate genotypes.

Genotyping was conducted in two parallel stages for both chromosomal loci. A total of 3072 SNPs were genotyped in each stage using two custom 1536-plex Illumina arrays, one for each chromosome. The regions of interest ranged from 94.246 to 136.661 Mb on chromosome 7 and from 152.305 to 191.605 Mb on chromosome 2 (NCBI Build 36). These intervals were defined using the approximate 1-LOD drop of the linkage peaks on the two chromosomes, based on IMGSAC microsatellite marker data.18

In the first stage of this study, we evaluated the patterns of linkage disequilibrium (LD) and the distribution of haplotype blocks in the CEU genotype data from HapMap project release 13 (HapMap Phase I data). Genic regions were defined on NCBI Build 34 by merging all RefSeq and UCSC Known Genes, including all exonic, intronic and 3′ UTR sequences, as well as 5 kb upstream of the 5′ end. A total of 1496 tag SNPs on both chromosomes 2q and 7q were identified using HaploView35 and the Gabriel algorithm for block definition, from LD blocks overlapping all genic regions.

In the second stage of genotyping, we took advantage of the higher-density HapMap Phase II data to better represent genetic variation in regions of lower LD not previously captured by the HapMap Phase I data. We also used the latest genome annotation (NCBI Build 36) to investigate novel genes and ensure comprehensive coverage of all intragenic and putative regulatory regions on both chromosomes. We identified ‘non-genic’ evolutionarily conserved regions from PhastCons elements.36 We downloaded SNP genotype data for the CEU population from HapMap release 22, and selected all SNPs in all genic regions and in the top 5% of non-genic PhastCons elements. We also selected all nonsynonymous SNPs with minor allele frequency (MAF) ≥ 0.05. We then used the Tagger program from HaploView35 (version 4) to select a second set of 1516 tag SNPs for each chromosomal region. Parameters used for Tagger were r² ≥ 0.75 (chromosome 2) and r² ≥ 0.63 (chromosome 7), minimum MAF of 5%, aggressive tagging, and forced inclusion of SNPs already genotyped in stage 1. We estimated that our two sets of SNPs were able to tag 96 and 85% of intragenic HapMap SNP variation (MAF > 0.05) with r² > 0.8 on chromosomes 2 and 7, respectively.
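The pairwise idea behind these tagging parameters can be sketched in Python as a greedy selection: tags are added until every SNP has r² at or above the threshold with at least one tag, starting from any force-included SNPs. This is a simplified illustration, not Tagger itself: r² is approximated here by the squared correlation of unphased genotype dosages rather than haplotype-based r², and the ‘aggressive’ multimarker tagging used in the study is not modeled. All function names and genotypes are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation of two genotype dosage vectors (0/1/2).

    Used here as a stand-in for haplotype-based r^2, which needs phased data.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation undefined
        return 0.0
    return sxy * sxy / (sxx * syy)


def greedy_tags(genotypes: Dict[str, Sequence[int]], threshold: float,
                forced: Sequence[str] = ()) -> List[str]:
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag."""
    snps = list(genotypes)
    # Each SNP always tags itself and any SNP at or above the r^2 threshold.
    covers = {
        s: {s} | {t for t in snps
                  if r_squared(genotypes[s], genotypes[t]) >= threshold}
        for s in snps
    }
    tags: List[str] = list(forced)
    untagged: Set[str] = set(snps)
    for s in tags:  # force-included SNPs (e.g. already genotyped in stage 1)
        untagged -= covers[s]
    while untagged:
        # Take the SNP covering the most untagged SNPs; earlier SNPs win ties.
        best = max(snps, key=lambda s: (len(covers[s] & untagged), -snps.index(s)))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical dosages for four SNPs in six individuals
geno = {
    "A": [0, 1, 2, 0, 1, 2],
    "B": [0, 1, 2, 0, 1, 2],  # identical to A
    "C": [2, 1, 0, 2, 1, 0],  # mirror image of A, so r^2 = 1
    "D": [0, 0, 1, 1, 2, 2],  # r^2 with A is 0.25
}
print(greedy_tags(geno, 0.8))                # ['A', 'D']
print(greedy_tags(geno, 0.8, forced=["C"]))  # ['C', 'D']
```

Because SNPs A, B and C are in perfect LD, a single tag suffices for all three, whereas D needs its own tag at this threshold.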

Genotypes for 212 SNPs (99 on chromosome 2 and 113 on chromosome 7), previously generated by the AGP using the Affymetrix 10K version 2 SNP array,8 were available on the IMGSAC family sample and were also included in the family-based association analysis.

A total of 50 genome-wide unlinked SNPs were genotyped for detection of population stratification,37 and 10 chromosome X SNPs were also included to estimate levels of mistyping. In addition, for regions of high LD, where tagging SNPs captured most of the genetic variation, extra SNPs were chosen in case of genotyping failure.

Replication SNP genotyping

Single nucleotide polymorphisms for replication were genotyped using a combination of the MassExtend iPLEX Gold (Sequenom, San Diego, CA, USA) and TaqMan platforms. A 100% genotyping concordance was observed for two replicate DNA samples genotyped in each experiment. Twenty-five genome-wide SNPs were also genotyped in the IMGSAC-R sample to test for population stratification.

Statistical analysis

Association analysis. We evaluated evidence of association using both ‘frequentist’ and Bayesian statistical approaches.

Primary association analysis of the 5880 SNPs (including the 212 SNPs available from the AGP linkage study8) successfully genotyped in the IMGSAC data set at the two loci was carried out using the PLINK package.38 To extend the amount of information captured by single-marker tests, an additional set of two-marker haplotype tags was devised using the ‘aggressive’ option of the Tagger program39 implemented in HaploView.35 In total, 3526 tests (2959 single-marker tests and 567 haplotype tags) were performed for the chromosome 2 study, and 3380 tests (2921 single-marker tests and 459 haplotype tags) for the chromosome 7 study.

Standard TDT from PLINK was used for family-based analysis, and the Cochran–Armitage trend test (1 degree of freedom) for the case–control analysis. Haplotype-based tests were calculated using PLINK.
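For readers unfamiliar with the trend test, the following Python sketch computes one common form of the 1-df Cochran–Armitage statistic for a single SNP, with genotype scores 0, 1 and 2 (copies of the risk allele). It is an illustration under that assumption rather than PLINK's code, which may differ in small-sample details; the function name and the genotype counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage_trend(cases: Sequence[int], controls: Sequence[int],
                           weights: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """1-df Cochran-Armitage trend test on genotype counts.

    cases/controls hold counts of individuals carrying 0, 1 and 2 copies of
    the risk allele; returns (chi-square statistic, two-sided P-value).
    """
    r_total, s_total = sum(cases), sum(controls)
    n_total = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    t = sum(w * (r * s_total - s * r_total)
            for w, r, s in zip(weights, cases, controls))
    k = len(weights)
    var = sum(weights[i] ** 2 * col[i] * (n_total - col[i]) for i in range(k))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= r_total * s_total / n_total
    chi2 = t * t / var
    # For 1 df, chi2 = Z^2, so the two-sided P is erfc(|Z| / sqrt(2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (0, 1, 2 risk alleles) for 100 cases and 100 controls
chi2, p = cochran_armitage_trend([20, 50, 30], [30, 50, 20])
print(round(chi2, 3), round(p, 4))  # 4.0 0.0455
```

Unlike a 2-df genotype test, the trend test concentrates power on an additive (per-allele) effect, which is why it is the usual single-marker case–control test in this setting.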

Bayesian logistic regression analysis was performed using the GENEBPM algorithm,40,41 again using both a case–control and a family-based approach (see Supplementary Methods). The logistic regression model allowed for additive and dominance effects of unobserved causal variants and a main effect of gender, as well as for parent-of-origin effects in the family-based analysis. GENEBPM analyses were performed using a sliding window of five SNPs across each chromosomal region. For comparison with frequentist single-SNP analyses, the GENEBPM algorithm was also applied to each SNP in turn (that is, single-SNP ‘haplotypes’).

Replication analysis

Association analysis of the IMGSAC-R and ND replication data sets was carried out using the UNPHASED package,42 given the higher proportion of families with missing parents (24%) (Table 1). UNPHASED implements maximum-likelihood-based association analysis for nuclear families and unrelated subjects, allowing for missing genotypes and uncertain haplotype phase. In the presence of missing data, it has only a minor loss of robustness to population stratification and is more powerful than the standard TDT.42

Analysis of the combined primary and replication cohorts was also carried out using UNPHASED, again using both a case–control approach and a family-based approach. Only the IMGSAC and IMGSAC-R data sets were combined for the population-based meta-analysis, because appropriate controls were not available for the ND population.

Copy number variation

We used transmission patterns of SNP genotypes within parent–offspring families to detect Mendelian errors consistent with the presence of a deletion. In addition, the clustering of all SNP genotypes was visually examined to identify abnormal clustering patterns or outlying samples that might point to CNVs associated with the autism phenotype. Sequencing was carried out to confirm the presence of microdeletions, CNVs or secondary SNPs.
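The reasoning behind using Mendelian errors to flag deletions can be illustrated with a Python sketch. A hemizygous deletion carried by a child and one parent makes both appear homozygous, so the child can look homozygous for an allele that parent does not carry (an apparent non-transmission). The sketch assumes unphased biallelic calls and groups such errors when they lie close together; the grouping rule and its defaults (`max_gap`, `min_errors`) and all genotypes are hypothetical choices, not the criteria used in this study.

```python
from typing import List, Sequence, Tuple

Genotype = str  # unphased biallelic call such as "AA", "AB" or "BB"


def mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's alleles can be one from each parent."""
    a, b = child[0], child[1]
    return (a in father and b in mother) or (b in father and a in mother)


def deletion_like_error(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """Child homozygous for an allele that exactly one parent lacks.

    A hemizygous deletion shared by the child and one parent makes both look
    homozygous, producing this apparent non-transmission pattern.
    """
    if mendelian_consistent(father, mother, child) or child[0] != child[1]:
        return False
    allele = child[0]
    return (allele not in father) != (allele not in mother)


def candidate_deletions(trio: Sequence[Tuple[Genotype, Genotype, Genotype]],
                        max_gap: int = 3, min_errors: int = 2) -> List[Tuple[int, int]]:
    """Group deletion-like errors lying within max_gap SNPs of each other and
    return (first, last) SNP indices of groups with at least min_errors."""
    hits = [i for i, (f, m, c) in enumerate(trio) if deletion_like_error(f, m, c)]
    groups: List[List[int]] = []
    for i in hits:
        if groups and i - groups[-1][-1] <= max_gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return [(g[0], g[-1]) for g in groups if len(g) >= min_errors]


# Hypothetical (father, mother, child) calls at ten consecutive SNPs
trio_calls = [
    ("AB", "AB", "AB"), ("AA", "BB", "AA"), ("AB", "AA", "AA"),
    ("BB", "AA", "BB"), ("AB", "AB", "AA"), ("AB", "AB", "AB"),
    ("AA", "AA", "AA"), ("AB", "BB", "BB"), ("BB", "BB", "BB"),
    ("AA", "BB", "BB"),
]
print(candidate_deletions(trio_calls))  # [(1, 3)]
```

Grouping matters because SNPs inside a real deletion are often uninformative, so errors are interleaved with consistent calls; an isolated error (index 9 here) is more likely to be a genotyping error.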

After exclusion of whole-genome amplified samples, data from both GoldenGate arrays were combined for each region, no-calls were deleted, and the data were analyzed with QuantiSNP version 1.0.43 CNV validation and screening were carried out by multiplex PCR and quantitative multiplex PCR of short fluorescent fragments (QMPSF).44 Positive results were confirmed in a second independent QMPSF assay.

The distal breakpoint of the deletion detected in family 15-0084 was better defined by quantitative PCR (qPCR) of DOCK4 exons 52, 37, 31, 14 and 7, with GAPDH as a reference.
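The text does not state how qPCR measurements were converted to copy number. One widely used option is the comparative Ct (2^−ΔΔCt) method, sketched below in Python as an assumption: it presumes near-100% amplification efficiency for both the DOCK4 exon assay and the GAPDH reference, and a calibrator DNA known to carry two copies of the target. The function name and all Ct values are hypothetical.

```python
from typing import Dict, Tuple


def relative_copy_number(ct_target: float, ct_ref: float,
                         cal_ct_target: float, cal_ct_ref: float,
                         calibrator_copies: int = 2) -> float:
    """Comparative Ct estimate of target copy number.

    Assumes ~100% amplification efficiency for both assays and a calibrator
    DNA known to carry calibrator_copies copies of the target.
    """
    delta_sample = ct_target - ct_ref          # normalize to the reference gene
    delta_calibrator = cal_ct_target - cal_ct_ref
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-delta_delta)


# Hypothetical Ct values: exon -> (target Ct, GAPDH Ct)
proband: Dict[str, Tuple[float, float]] = {"exon 7": (25.0, 20.0), "exon 37": (26.0, 20.0)}
calibrator: Dict[str, Tuple[float, float]] = {"exon 7": (25.0, 20.0), "exon 37": (25.0, 20.0)}
for exon, (ct_t, ct_r) in proband.items():
    cal_t, cal_r = calibrator[exon]
    print(exon, relative_copy_number(ct_t, ct_r, cal_t, cal_r))
# exon 7 2.0
# exon 37 1.0
```

Under these assumptions, an exon reading close to one copy falls inside a heterozygous deletion, so stepping along exons brackets the breakpoint.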

Additional information is available as Supplementary Methods.

Results

Genotyping

A total of 6004 SNPs (3002 in each chromosome region) were genotyped using the Illumina GoldenGate technology. After quality control procedures, we excluded 336 markers for one or more of the following reasons: MAF < 0.05, more than 1 Mendelian error, genotyping rate < 90%, poor clustering, and deviation from Hardy–Weinberg equilibrium (P < 0.001) in the control population.
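The numeric exclusion criteria can be expressed as a simple per-SNP filter, sketched here in Python. The Hardy–Weinberg check uses a 1-df chi-square goodness-of-fit test for illustration; exact tests are also commonly used, and the text does not say which was applied. Poor clustering was judged visually and is not modeled. The record type, field names and counts are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SnpQcInput:
    name: str
    control_genotypes: Tuple[int, int, int]  # (AA, AB, BB) counts in controls
    maf: float
    mendel_errors: int
    call_rate: float  # fraction of samples with a genotype call


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:  # monomorphic: nothing to test
        return 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))


def qc_failures(snp: SnpQcInput) -> List[str]:
    """Return the exclusion reasons that apply; an empty list means the SNP passes."""
    reasons = []
    if snp.maf < 0.05:
        reasons.append("MAF < 0.05")
    if snp.mendel_errors > 1:
        reasons.append("more than 1 Mendelian error")
    if snp.call_rate < 0.90:
        reasons.append("genotyping rate < 90%")
    if hwe_p_value(*snp.control_genotypes) < 0.001:
        reasons.append("HWE P < 0.001 in controls")
    return reasons


print(qc_failures(SnpQcInput("snp1", (36, 48, 16), 0.40, 0, 0.99)))  # []
print(qc_failures(SnpQcInput("snp2", (50, 0, 50), 0.50, 2, 0.85)))
# ['more than 1 Mendelian error', 'genotyping rate < 90%', 'HWE P < 0.001 in controls']
```

Restricting the HWE test to controls avoids discarding SNPs whose departure from equilibrium in cases could itself reflect association.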

For the 5668 (94%) SNPs that passed quality control, the genotyping efficiency exceeded 99.7%, with an estimated error rate, from duplicate SNPs and from heterozygote calls of X chromosome SNPs in males, of the order of 2–5 × 10−4. In summary, 2860 SNPs from the chromosome 2q23.3–q32.3 region were successfully genotyped in 559 DNA samples, including 126 affected individuals, 245 parents and 188 gender-matched controls from the ECACC collection; 2808 SNPs from the chromosome 7q21.3–q33 region were successfully genotyped in 559 DNA samples, including 127 affected individuals, 244 parents and 188 ECACC gender-matched controls. In addition, our family-based analysis included genotypes from 212 SNPs (99 on chromosome 2 and 113 on chromosome 7), which were generated by the AGP using the Affymetrix 10K version 2 SNP array.8
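The two error-rate estimates mentioned above rest on simple proportions, sketched here in Python. Males are hemizygous at non-pseudoautosomal X SNPs, so any heterozygous call there must be an error (although this only detects errors that produce a heterozygote); duplicate genotyping of the same sample gives a discordance rate. The function names, the use of `None` for no-calls and the data are hypothetical.

```python
from typing import Optional, Sequence


def male_x_het_rate(calls: Sequence[Optional[str]]) -> float:
    """Fraction of called genotypes that are heterozygous at non-pseudoautosomal
    X SNPs in males; any heterozygous call there must be an error."""
    called = [g for g in calls if g is not None]  # None marks a no-call
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)


def duplicate_discordance(first: Sequence[Optional[str]],
                          second: Sequence[Optional[str]]) -> float:
    """Fraction of discordant calls between two genotypings of the same sample,
    counting only positions called in both; allele order is ignored."""
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    mismatches = sum(1 for a, b in pairs if sorted(a) != sorted(b))
    return mismatches / len(pairs)


# Hypothetical data: 5000 male X calls containing 2 heterozygotes
male_calls = ["AA"] * 2500 + ["BB"] * 2498 + ["AB"] * 2
print(male_x_het_rate(male_calls))  # 0.0004
# One discordant call out of three positions called in both runs
print(duplicate_discordance(["AA", "AB", None, "BB"], ["AA", "BA", "AB", "AB"]))
```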

There was no significant difference in the pattern of LD between our sample and the HapMap CEU sample, indicating that the LD structure in the HapMap CEU data can be readily applied to our autism sample. SNPs were selected to capture efficiently the large majority of the currently known variation in all intragenic regions and highly conserved non-genic elements (see Supplementary Methods).

Population stratification

The presence of stratification in a population-based association study that is not suitably accounted for in case–control analysis can lead to an increase in the false-positive error rate. Furthermore, haplotype analyses in family-based association studies are not robust to population stratification if random mating is assumed among parents in the haplotype estimation step.

We tested for population structure in our primary IMGSAC sample using the Structure45,46 software and 50 unlinked genome-wide SNPs. Comparing the fit of the admixture model for K = 1, 2 and 3 strata, we found strongest support for a model of no stratification (K = 1) in both of the following groups of individuals: (1) probands, controls and HapMap CEU founders; and (2) parents and HapMap CEU founders. Similarly, no evidence of stratification was detected in the combined IMGSAC primary and IMGSAC-R sample, using 25 unlinked genome-wide SNP markers. These results indicate that no strong population stratification is present in our IMGSAC primary and IMGSAC-R samples.

Association analysis

The results of the case–control (Cochran–Armitage trend test) and family-based analysis (TDT) are shown in Figure 1 and summarized in Table 2.

Chromosome 2 association results

Three SNPs in the NOSTRIN gene provided the strongest association in the case–control analysis (rs7583629, P = 3.2 × 10−5; rs829957, P = 9.0 × 10−5; rs482435, P = 1.4 × 10−4), followed by rs1020626 (P = 3.8 × 10−4) in the FAM130A2 gene.

For the TDT analysis, the strongest results came from SNPs in the ZNF533 gene (rs11885327, P = 8.0 × 10−4; rs1964081, P = 1.4 × 10−3) and an SNP in the UPP2 gene (rs6709528, P = 8.0 × 10−4).

Single-marker logistic regression analysis provided a similar ranking of results. In the case–control analysis, the most strongly associated SNP, rs7583629 in NOSTRIN, provided a log10 Bayes factor (logBF) of 2.9, whereas in the family-based analysis the top signal was for rs1139 in the ZNF533 gene (logBF = 1.7). GENEBPM multimarker analysis using 5-SNP sliding windows (Supplementary Figure S1) showed increased evidence in favor of association for the NOSTRIN locus (logBF = 3.2) in the case–control analysis, but did not identify additional interesting signals. Family-based multimarker analysis revealed an additional association signal with a haplotype spanning 75 kb in the METTL8 gene (logBF = 2.3).

Chromosome 7 association results

The strongest signal for the case–control (trend test) analysis was from IMMP2L (rs12537269, P = 1.2 × 10−4; rs1528039, P = 6.3 × 10−4) and just upstream of SMO (rs6962740, P = 3.4 × 10−4). The TDT implicated Plexin A4 (PLXNA4; rs4731863, P = 1.0 × 10−4) and cut-like homeobox 1 isoform b (CUX1; rs875659, P = 2.0 × 10−4).

Single-marker logistic regression analysis provided a similar ranking of results in the case–control analysis, with rs12537269 (logBF = 2.9) in IMMP2L showing the most significance. In the family-based analysis, the most significant result was seen for rs4730037 in LHFPL3 (logBF = 2.1), closely followed by rs4731863 in PLXNA4 (logBF = 2.0). Moreover, GENEBPM revealed a parent-of-origin effect at the IMMP2L locus, with increased risk for causal variants inherited from the father compared with those inherited from the mother. For this reason, we investigated SNPs in IMMP2L by parent-specific TDT, which revealed a P-value of 0.01 for rs2030781, with a transmitted/untransmitted allele ratio of 31:14 for paternal transmissions (Table 2).
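The parent-specific result can be checked with the standard TDT statistic, χ² = (T − U)²/(T + U) with 1 degree of freedom, applied to paternal transmissions only. This Python sketch is an illustrative recalculation, not the software used in the study; applied to the reported 31:14 ratio it gives a P-value close to the reported 0.01.

```python
import math
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float, float]:
    """McNemar-type TDT on transmissions from heterozygous parents.

    Returns (chi-square with 1 df, two-sided P-value, transmission ratio T/U).
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # 1-df chi-square tail
    return chi2, p_value, b / c


# Paternal transmissions reported for rs2030781 (31 transmitted : 14 untransmitted)
chi2, p, ratio = tdt(31, 14)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}, T/U = {ratio:.2f}")
# chi2 = 6.42, P = 0.011, T/U = 2.21
```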

GENEBPM multimarker analysis using 5-SNP sliding windows showed increased evidence of association for the IMMP2L locus in the case–control analysis (logBF = 2.9) and for PLXNA4 (logBF = 2.9) in the family-based analysis, but did not identify additional interesting signals (Supplementary Figure S1).

Figure 1 Graphical representation of chromosome 2 and 7 association results. −log10 P-values are plotted against the chromosome position. (a) P-values obtained for single markers (Cochran–Armitage trend test) and 2-SNP haplotype case–control association (PLINK). (b) P-values for single-marker TDT and 2-SNP haplotype TDT.

Table 2 Summary of primary association results

P-values are followed by the corresponding LogBF in parentheses; where two results are listed, the first is family-based and the second case–control.

SNP/haplotype | Chr. | Position | Gene | Risk allele | P-value (LogBF)
rs1427395 | 2 | 153442168 | PhastCons(a) | T | 0.0016 (0.78); 0.0133 (0.66)
rs3769357 | 2 | 157101520 | GPD2 | A | 0.0031 (1.04)
rs6437129,rs6709528_CC | 2 | 158669000 | UPP2 | CC | 9.72E-04 (1.00)
rs6437133 | 2 | 158672533 | UPP2 | C | 0.0045 (0.65)
rs6709528 | 2 | 158678671 | UPP2 | T | 8.00E-04 (1.09)
rs12620556 | 2 | 158905017 | LOC130940 | A | 0.0041 (0.45); 0.0153 (0.55)
rs764660 | 2 | 165921543 | SCN2A | C | 0.0047 (0.68)
rs1020626 | 2 | 166106880 | FAM130A2 | T | 3.80E-04 (1.92)
rs10930170 | 2 | 166107713 | FAM130A2 | G | 0.0020 (1.40)
rs829957 | 2 | 169367080 | NOSTRIN | T | 0.0116 (0.59); 9.03E-05 (2.47)
rs6433093 | 2 | 169367190 | NOSTRIN | A | 5.59E-04 (1.94)
rs7583629 | 2 | 169381125 | NOSTRIN | A | 0.0027 (1.09); 3.22E-05 (2.92)
rs482435 | 2 | 169384291 | NOSTRIN | C | 0.0084 (0.78); 1.39E-04 (2.54)
rs2098802 | 2 | 170760429 | MYO3B | G | 0.0042 (1.02)
rs6738892 | 2 | 170768975 | MYO3B | A | 0.0035 (1.13)
rs13007575 | 2 | 174386625 | PhastCons(a) | A | 0.0077 (0.81); 0.0026 (1.19)
rs6717587 | 2 | 175865296 | PhastCons(a) | A | 0.0015 (1.40)
rs1434087 | 2 | 178912043 | OSBPL6 | T | 0.0010 (1.56)
rs7590028 | 2 | 180257688 | ZNF533 | T | 0.0010 (1.80)
rs11885327 | 2 | 180276318 | ZNF533 | C | 8.00E-04 (1.56); 0.0271 (0.56)
rs11885327,rs1964081_TG | 2 | 180287992 | ZNF533 | TG | 0.0019 (1.54)
rs2008230,rs1964081_GG | 2 | 180288449 | ZNF533 | GG | 0.0030 (1.04)
rs881737,rs1964081_GG | 2 | 180294093 | ZNF533 | GG | 0.0016 (1.75)
rs1964081 | 2 | 180299666 | ZNF533 | A | 0.0014 (1.61)
rs2126424,rs1139_CG | 2 | 180312034 | ZNF533 | CG | 0.0073 (1.18); 0.0038 (1.11)
rs1139 | 2 | 180318326 | ZNF533 | G | 0.0067 (1.73); 0.0049 (1.26)
rs415994 | 2 | 183266932 | 5′ of DNAJC10 | C | 0.0027 (1.54)
rs3755248 | 2 | 188078477 | TFPI | T | 0.0036 (0.90)
rs7573488 | 2 | 188106325 | TFPI | G | 0.0046 (0.91)
rs3811608 | 2 | 191043302 | FLJ20160 | T | 0.0023 (0.81)
rs6757698 | 2 | 191071741 | FLJ20160 | C | 0.0012 (0.31)
rs12538145 | 7 | 95636377 | SLC25A13 | C | 0.0039 (0.52); 0.0168 (0.55)
rs2307355 | 7 | 99531488 | MCM7 | A | 0.0040 (0.88)
rs11768465 | 7 | 100036322 | FBXO24 | C | 0.0184 (0.57); 0.0031 (1.21)
rs875659 | 7 | 101696376 | CUX1 | C | 2.00E-04 (1.82)
rs3819479 | 7 | 103184318 | RELN | T | 0.0028 (1.02)
rs6976167 | 7 | 103848209 | LHFPL3 | T | 0.0026 (1.16)
rs12666599 | 7 | 103905157 | LHFPL3 | T | 0.0038 (0.89)
rs4730037 | 7 | 104129973 | LHFPL3 | C | 0.0032 (2.07)
rs176481 | 7 | 105515161 | SYPL1 | T | 0.0385; 8.19E-04 (1.92)
rs9690688 | 7 | 107507398 | LAMB4 | T | 0.0047 (0.75)
rs6951925 | 7 | 108588860 | NT_007933.689(b) | G | 0.0029 (1.20)
rs1464895 | 7 | 110111977 | IMMP2L | A | 0.0049 (0.95)
rs2030781 | 7 | 110149994 | IMMP2L | C | 0.011(c) (1.57); 0.0037 (1.15)
rs12537269 | 7 | 110184783 | IMMP2L | A | 1.20E-04 (2.85)
rs10500002 | 7 | 110229091 | IMMP2L | T | 0.0012 (1.58)
rs1528039 | 7 | 110230008 | IMMP2L | C | 6.28E-04 (1.77)
rs12531640 | 7 | 110266771 | IMMP2L | T | 0.0021 (1.27)
rs2217262 | 7 | 111583613 | DOCK4 | A | 0.0143 (0.41); 0.0042 (1.02)

Analysis of the LD landscape across the AUTS1 region, using both HapMap (CEU) data and data from the 127 probands used in our primary sample, indicated that the six associated SNPs in IMMP2L (Table 2) all lie within a single block of LD and are thus likely to be indexing the same effect. In contrast, the modest association seen in the first intron of the neighboring DOCK4 gene was in a separate block of LD.
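For readers less familiar with LD blocks, the pairwise statistics on which block definitions are built can be sketched as follows. This is a minimal Python illustration, assuming phased two-locus haplotype counts are already available; the counts are hypothetical, and the block-partitioning rule applied by HaploView is not reproduced here.

```python
# Pairwise LD statistics (D' and r^2) for two biallelic SNPs,
# computed from phased two-locus haplotype counts (hypothetical data).


def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple:
    """Return (D', r^2) from counts of the four haplotypes AB, Ab, aB, ab."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B         # raw disequilibrium coefficient D
    # D' scales D by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Hypothetical SNP pairs: one in strong LD, one in linkage equilibrium
strong = ld_stats(90, 5, 5, 100)
independent = ld_stats(25, 25, 25, 25)
print(f"strong LD:   D'={strong[0]:.2f}, r2={strong[1]:.2f}")
print(f"independent: D'={independent[0]:.2f}, r2={independent[1]:.2f}")
```

SNPs with high pairwise D' or r² across a stretch of sequence tend to be grouped into one block, which is why several associated markers in such a block are expected to tag a single underlying signal.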

Replication

We attempted replication of 56 SNPs (28 on each chromosome) that attained the most significant association results in the primary case–control and TDT analyses (Table 3; Supplementary Table S1). The replication population consisted of the IMGSAC-R and the ND collections, including 390 affected individuals (see Table 1 and Materials and methods for a description of the samples). Family-based analysis of the replication sample showed significant overtransmission of the common allele of SNP rs2217262 in the DOCK4 gene (P = 9.2 × 10⁻⁴, OR = 2.28, confidence interval 1.37–3.77) (Table 3; Supplementary Table S1). This result remains significant after Bonferroni correction for multiple testing (28 SNPs tested on chromosome 7, P = 0.026). The trend toward association of rs2217262 (P = 0.029) was also seen in the extended ND sample, which included additional subjects fulfilling broader diagnostic criteria (ND-all, 204 affected subjects; Table 1).
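The Bonferroni adjustment quoted above can be checked with a few lines of Python. This sketch uses only the nominal P-value and the number of chromosome 7 SNPs given in the text; the helper name `bonferroni` is ours.

```python
def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value: nominal P times number of tests, capped at 1."""
    return min(1.0, p * n_tests)


p_nominal = 9.2e-4   # rs2217262, family-based analysis of IMGSAC-R + ND
n_chr7_snps = 28     # chromosome 7 SNPs taken forward to replication

p_adjusted = bonferroni(p_nominal, n_chr7_snps)
print(f"Bonferroni-adjusted P = {p_adjusted:.3f}")  # Bonferroni-adjusted P = 0.026
print(p_nominal < 0.05 / n_chr7_snps)               # True
```

Equivalently, the nominal P-value can be compared against a per-test threshold of 0.05/28 ≈ 0.0018.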

The remaining SNPs did not show significant replication after correction for multiple testing, and no parent-of-origin effects were seen for rs2030781.

Finally, the 56 SNPs selected for replication were investigated in the combined primary and replication data sets; only 7 SNPs attained uncorrected significance of P < 0.001 (Table 4). The DOCK4 SNP rs2217262 reached a nominal significance of P = 5.23 × 10⁻⁵ in the family-based analysis of all cohorts (IMGSAC primary, IMGSAC-R and ND). In the case–control analysis of the combined IMGSAC collections (421 cases and 368 controls), rs12537269 in IMMP2L achieved the most significant result (P = 7.3 × 10⁻⁵). Additional loci retaining association evidence in the case–control meta-analysis were ZNF533 on chromosome 2, and TSPAN12, FEZF1 and SLC13A1 on chromosome 7.
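As an illustration of how an allelic case–control odds ratio and its 95% confidence interval, of the kind reported in Table 4, are commonly derived from a 2 × 2 table of allele counts, a minimal Python sketch follows. It assumes a simple Pearson chi-square test with Woolf's logit interval; the allele counts are hypothetical, and this is not the likelihood-based UNPHASED procedure used in the study.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio, Woolf 95% CI and Pearson chi-square P (1 df) for a 2x2 allele table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return odds_ratio, ci, p_value


# Hypothetical allele counts: 1000 case and 1000 control chromosomes
or_, (lo, hi), p = allelic_test(300, 700, 220, 780)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.1e}")
```

The interval is computed on the log-odds scale and exponentiated, which is why it is asymmetric around the OR, as in Table 4.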

Several SNPs in the most interesting genes from the primary analysis were also tested in two additional family collections, which had previously shown evidence of linkage to the chromosome 2q and 7q loci19,23,33 (Supplementary Table S1). Five SNPs in NOSTRIN, ZNF533 and OSBPL6 were tested in a sample of 358 multiplex families (‘Mount Sinai’ cohort),23,33 but no significant results were obtained. Three of the 28 AUTS1 replication SNPs, in IMMP2L and CUX1, were genotyped in 62 Caucasian families selected for IBD sharing from 222 families showing linkage to the same region of chromosome 7 (‘University of Washington’ sample),19 again with no evidence for association.

Table 2 Continued

SNP/haplotype | Chr. | Position | Gene | Risk allele | Family-based P-value (LogBF); Case–control P-value (LogBF)
rs989613 | 7 | 113 233 792 | NT_007933.632b | G | 0.0049 (0.53)
rs7807053 | 7 | 120 137 743 | KCND2 | A | 0.0022 (1.18)
rs41620 | 7 | 120 213 054 | 3′ of TSPAN12 | A | 0.0046 (1.07)
rs2525720 | 7 | 120 392 266 | ING3 | A | 0.0049 (1.25)
rs538558 | 7 | 121 724 673 | 3′ of FEZF1 | A | 0.0065 (0.98)
rs11978485 | 7 | 122 480 367 | 3′ of SLC13A1 | G | 0.0295 (0.98); 0.0039 (1.23)
rs6962740 | 7 | 128 614 047 | 5′ of SMO | G | 3.39E-04 (2.14)
rs4110091 | 7 | 128 719 985 | AHCYL2 | T | 0.0012 (1.56)
rs2030974 | 7 | 129 693 119 | 5′ of CPA2 | C | 0.0197 (0.56); 0.0032 (1.17)
rs2171493 | 7 | 129 693 383 | 5′ of CPA2 | C | 0.0412 (0.39); 0.0038 (1.18)
rs13226219 | 7 | 129 806 727 | 5′ of CPA1 | T | 0.0032 (1.26)
rs1863009 | 7 | 130 649 715 | AK054623 | T | 9.20E-04 (1.60)
rs7787173 | 7 | 131 107 683 | NT_007933.1017b | A | 6.00E-04 (1.12)
rs4731863 | 7 | 131 674 323 | PLXNA4 | T | 1.00E-04 (2.02); 0.0321 (0.45)

Only SNPs showing P < 0.005 in either family-based or case–control analysis are reported. P-values > 0.05 are not shown. P-values < 0.001 are in bold. The reported risk allele is consistent in the two approaches.
aPhastCons, highly conserved region.
bPredicted genes, reference sequence annotation changed from Build 34.
cParent-specific TDT.

Copy number variation

A Mendelian error in one family for SNP rs7585982 pinpointed a potentially interesting deletion in the UPP2 gene on chromosome 2. The deletion boundaries were defined by sequence analysis of additional SNPs flanking rs7585982. Using long-range PCR followed by sequencing, we refined the deletion to 5897 bp of the UPP2 gene (158 681 612–158 687 508 bp; UCSC Build 36); it removes two coding exons (exons 6 and 7) and is predicted to cause a frameshift leading to a premature termination codon (Supplementary Figure S2A). This deletion was not present in the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation/), suggesting that it could be an autism-specific CNV. We screened the same sample used for the SNP association experiment (126 cases and 188 controls) for the presence of this deletion using multiplex PCR (Supplementary Figure S2B). The frequency of the deletion was not significantly different between cases and controls (1.6 and 3.2%, respectively, P = 0.2). To investigate whether the deletion segregates with the ASD phenotype, we also screened 265 sib-pair families from the IMGSAC collection, including relatives of the 126 cases. Of these, we found 30 families with a parent carrying the deleted allele, and in only 13 families was it transmitted to affected children (in 5 families to both affected siblings and in 8 families to a single affected individual). These results suggest that the UPP2 deletion is not involved in autism susceptibility. The coding sequence of UPP2 was also sequenced in 47 unrelated subjects, including 12 probands carrying the deletion of exons 6 and 7; no novel coding variants were identified, except for one silent change in exon 4 in a single individual.
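The deletion size and the carrier frequencies quoted above can be cross-checked with a short Python sketch. The carrier counts (2 of 126 cases, 6 of 188 controls) are back-calculated from the reported percentages and are an assumption. The text does not state which test produced P = 0.2, so the two-sided Fisher exact test shown here is only one reasonable choice, and its P-value need not coincide with the published one.

```python
import math

# UPP2 deletion boundaries from the text (UCSC Build 36, inclusive coordinates)
del_start, del_end = 158_681_612, 158_687_508
del_length = del_end - del_start + 1
print(del_length)  # 5897

# Carrier counts back-calculated from the reported frequencies (assumption)
case_carriers, n_cases = 2, 126
ctrl_carriers, n_controls = 6, 188
print(f"{100 * case_carriers / n_cases:.1f}% vs {100 * ctrl_carriers / n_controls:.1f}%")


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # hypergeometric probability of x carriers in row 1, all margins fixed
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


p = fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                           ctrl_carriers, n_controls - ctrl_carriers)
print(f"Fisher exact two-sided P = {p:.2f}")
```

With so few carriers, any reasonable test gives a clearly non-significant result, consistent with the conclusion drawn in the text.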

By combining data from both SNP arrays for each candidate region, a sufficient SNP density was achieved to carry out copy number analysis on these samples using QuantiSNP.43 We detected 17 CNVs in seven regions of chromosome 7 and 6 CNVs in five regions of chromosome 2 (Supplementary Table S2). In the chromosome 7 analysis, an ~800 kb duplication was detected in family 13-3023 that was transmitted from father to proband (Supplementary Figure S3). This duplication includes two genes: IMMP2L and DOCK4. Another duplication overlapping EMID2 and RABL5 was detected in three families, in which it was transmitted from mother to proband, whereas a smaller duplication containing only EMID2 was detected in a father, but not transmitted, and in one control. A third CNV, in EXOC4, was detected as a nontransmitted loss in a father and as a gain (four copies) in a control.
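QuantiSNP calls CNVs with an objective Bayes hidden Markov model applied to SNP genotyping intensity data. The toy Python sketch below conveys only the general idea: a Viterbi decoding of a three-state (loss/normal/gain) HMM over log R ratio (LRR) values. The state means, noise level, transition probability and data are all hypothetical, and QuantiSNP's actual model (which also uses B allele frequency) is not reproduced.

```python
import math

STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.45, "normal": 0.0, "gain": 0.3}  # hypothetical LRR mean per state
SD = 0.15      # hypothetical LRR noise, shared by all states
STAY = 0.99    # hypothetical probability of staying in the current state


def log_emission(x, state):
    """Log Gaussian density of an LRR value under the given copy-number state."""
    mu = MEANS[state]
    return -0.5 * ((x - mu) / SD) ** 2 - math.log(SD * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    switch = (1 - STAY) / (len(STATES) - 1)
    return math.log(STAY if prev == cur else switch)


def viterbi(lrr):
    """Most probable copy-number state for each probe (Viterbi decoding)."""
    score = {s: math.log(1 / len(STATES)) + log_emission(lrr[0], s) for s in STATES}
    backpointers = []
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # best predecessor state for s at this probe
            best = max(STATES, key=lambda prev: score[prev] + log_transition(prev, s))
            new_score[s] = score[best] + log_transition(best, s) + log_emission(x, s)
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)
    state = max(STATES, key=score.get)  # best final state
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical LRR trace with a run of four probes at reduced intensity
lrr = [0.02, -0.05, 0.01, -0.48, -0.41, -0.50, -0.44, 0.03, -0.02, 0.04]
print(viterbi(lrr))
```

The high self-transition probability penalizes state changes, so isolated noisy probes are absorbed while a consistent run of low LRR values is called as a single deletion segment.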

On chromosome 2, five duplications and one deletion were detected in parents and a single control, but were never transmitted to an affected child (Supplementary Table S2).

Table 3 Family-based analysis of replication samples using UNPHASED

SNP | Chr. | Gene | Alleles | Risk allele | IMGSAC-R (294 affected subjects): P-value, Ca-Freq, Co-Freq | ND (96 affected subjects): P-value, Ca-Freq, Co-Freq | IMGSAC-R + ND (390 affected subjects): P-value, Ca-Freq, Co-Freq
rs1427395 | 2 | PhastConsa | A/T | T | 0.3634, 0.564, 0.532 | 0.0216, 0.500, 0.387 | 0.0505, 0.547, 0.496
rs6437133 | 2 | UPP2 | C/T | C | 0.1630, 0.543, 0.500 | 0.0395, 0.482*, 0.599* | 0.7106, 0.529, 0.519
rs12620556 | 2 | LOC130940 | A/G | A | 0.8454, 0.899, 0.905 | 0.0235, 0.905*, 0.965* | 0.2247, 0.901, 0.920
rs13007575 | 2 | PhastConsa | A/G | A | 0.1337, 0.921, 0.946 | 0.0063, 0.958, 0.886 | 0.8726, 0.931, 0.929
rs1434087 | 2 | OSBPL6 | C/T | T | 0.0399, 0.928, 0.890 | 0.7597, 0.916, 0.925 | 0.0988, 0.925, 0.898
rs11768465 | 7 | FBX024 | C/T | C | 0.3915, 0.785, 0.765 | 0.0184, 0.761*, 0.863* | 0.6561, 0.779, 0.790
rs1464895 | 7 | IMMP2L | A/G | A | 0.4854, 0.161, 0.145 | 0.0042, 0.120*, 0.235* | 0.3043, 0.150, 0.170
rs12537269 | 7 | IMMP2L | A/G | A | 0.0485, 0.262, 0.210 | 0.7737, 0.255, 0.243 | 0.0667, 0.260, 0.220
rs2217262 | 7 | DOCK4 | A/C | A | 0.0272, 0.955, 0.924 | 0.0055, 0.979, 0.916 | 9.21E-04, 0.962, 0.921
rs2171493 | 7 | 5′ of CPA2 | A/C | C | 0.0230, 0.242*, 0.301* | 0.8506, 0.216, 0.224 | 0.0459, 0.235*, 0.282*
rs4731863 | 7 | PLXNA4 | A/T | T | 0.1591, 0.907, 0.931 | 0.0987, 0.891, 0.938 | 0.0391, 0.903*, 0.934*

Abbreviations: Ca-Freq, frequency in affected offspring; Co-Freq, frequency in untransmitted parental alleles; IMGSAC-R, International Molecular Genetic Study of Autism Consortium-replication; ND, Northern Dutch; SNP, single nucleotide polymorphism. Only nominal P-values < 0.05 are shown. Allele frequencies are reported for the risk allele detected in the primary association analysis. Flip-flop of the associated allele is flagged by an asterisk.
aPhastCons, highly conserved region.



Most of the identified CNVs are well represented in the DGV, suggesting that they do not have a major role in autism susceptibility. However, the duplication involving IMMP2L and DOCK4 warranted further analysis, as it involved two adjacent genes showing possible SNP association with autism. We therefore developed a QMPSF assay able to simultaneously test for CNVs in exons 2, 3 and 6 of IMMP2L, exon 4 of LRRN3 and the last exon of DOCK4 (exon 52). We validated the duplication in family 13-3023, identified by QuantiSNP, and verified that it was transmitted from the father to the affected son but not to the other affected sib or to an unaffected sib. Screening of 475 UK controls and 285 IMGSAC multiplex families with 487 affected individuals was then carried out using the QMPSF assay, to check whether CNVs in these genic regions segregated with the autism phenotype in families and/or had a higher frequency in cases than in controls. We identified six additional deletions of different lengths, some of which were transmitted. One deletion disrupted exons 2 and 3 of IMMP2L and the last exon of DOCK4, and was transmitted from the mother to both affected sons, as well as to a daughter who did not have an ASD. Both the carrier mother and daughter were reported to have dyslexia. qPCR indicated that the distal breakpoint of the deletion is located between exons 31 and 14 of DOCK4 (Supplementary Figure S4). Two smaller deletions were transmitted from a parent to only one of the affected children; one was found only in a father and was not transmitted; and the other two were found in controls. The relative lengths and positions of the CNVs identified are depicted in Figure 2.
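QMPSF compares the fluorescent peak height of each target amplicon with that of a control amplicon co-amplified in the same multiplex reaction, relative to a reference individual with two copies. A minimal Python sketch of the resulting dosage quotient follows; the peak heights and the 0.7/1.3 cut-offs are hypothetical illustrations and do not describe any of the families reported here.

```python
CONTROL_SAMPLE = 900.0      # hypothetical control-amplicon peak height, test sample
CONTROL_REFERENCE = 1000.0  # hypothetical control-amplicon peak height, two-copy reference


def dosage_quotient(target_sample, target_reference,
                    control_sample=CONTROL_SAMPLE, control_reference=CONTROL_REFERENCE):
    """Target peak normalized to the control amplicon, relative to the reference."""
    return (target_sample / control_sample) / (target_reference / control_reference)


def classify(dq, deletion_cutoff=0.7, duplication_cutoff=1.3):
    # Illustrative cut-offs (assumed): about 0.5 is expected for one copy, 1.5 for three
    if dq < deletion_cutoff:
        return "deletion"
    if dq > duplication_cutoff:
        return "duplication"
    return "normal"


# Hypothetical (sample, reference) peak heights for three of the assayed fragments
peaks = {
    "IMMP2L exon 2": (405.0, 900.0),
    "LRRN3 exon 4": (720.0, 800.0),
    "DOCK4 exon 52": (1080.0, 800.0),
}
for fragment, (sample, reference) in peaks.items():
    dq = dosage_quotient(sample, reference)
    print(f"{fragment}: DQ = {dq:.2f} -> {classify(dq)}")
```

Normalizing to a co-amplified control corrects for differences in overall PCR yield between samples, so that only a change in template copy number shifts the quotient.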

Discussion

Several linkage studies have suggested that chromosomes 2q and 7q may harbor one or more genes contributing to the risk of developing an ASD. Here, we have presented a comprehensive high-density SNP genotyping, association and CNV study covering the 2q23.3–q32.3 and 7q21.3–q33 chromosome regions. We have tested more than 3000 SNPs in each region, covering all known genes as well as highly conserved non-genic sequences.

The complementary case–control and family-based approach taken in our study allowed us to extract the maximum information from our sample, taking into consideration the advantages and disadvantages of the two approaches. Case–control studies are more powerful than family-based approaches but are sensitive to population stratification. Structure analysis using 50 genome-wide SNPs did not reveal strong population stratification, although we cannot exclude the presence of undetected low levels. Family-based approaches are more robust to confounding by population stratification and, in addition, enable testing for parent-of-origin effects.

Table 4 Combined analysis of primary and replication samples

Chr | SNP | Gene location | Risk allele | Family-based analysis, all samples combineda: P-value (Ca-Co Freq) | OR (CI) | Case–control analysis, IMGSAC samples combinedb: P-value (Ca-Co Freq) | OR (CI)
2 | rs7590028 | ZNF533 intronic | T | 0.3227 (0.52, 0.50) | | 6.56E-04 (0.54, 0.45) | 1.41 (1.16–1.72)
7 | rs2030781 | IMMP2L intronic | C | 0.08613 (0.25, 0.22) | | 4.63E-04 (0.27, 0.19) | 1.53 (1.20–1.95)
7 | rs12537269 | IMMP2L intronic | A | 0.01047 (0.27, 0.22) | | 7.26E-05 (0.27, 0.19) | 1.62 (1.27–2.06)
7 | rs2217262 | DOCK4 intronic | A | 5.23E-05 (0.96, 0.92) | 2.37 (1.53–3.68) | 1.75E-03 (0.96, 0.92) | 2.08 (1.31–3.32)
7 | rs41620 | 3′ of TSPAN12 | A | 0.08796 (0.77, 0.74) | | 8.14E-04 (0.78, 0.71) | 1.48 (1.18–1.86)
7 | rs538558 | 3′ of FEZF1 | A | 0.4688 (0.36, 0.34) | | 5.77E-04 (0.37, 0.28) | 1.45 (1.17–1.80)
7 | rs11978485 | 3′ of SLC13A1 | G | 0.04972 (0.82, 0.79) | | 2.89E-04 (0.84, 0.76) | 1.59 (1.24–2.04)

Abbreviations: Ca-Co Freq, risk allele frequency in affected offspring and in untransmitted parental alleles (family-based) or in controls (case–control); IMGSAC, International Molecular Genetic Study of Autism Consortium; OR (CI), odds ratio and 95% confidence interval; SNP, single nucleotide polymorphism. Results generated by UNPHASED. Only SNPs with nominal P < 0.001 are shown. P-values < 0.001 are in bold.
aIMGSAC primary sample, IMGSAC-R, ND (515–516 affected individuals).
bIMGSAC primary sample, IMGSAC-R (420–421 cases, 368 controls).




Although the strongest signals identified by the two approaches did not coincide, comparison of the results allowed us to pinpoint the most interesting loci supported by both methods, albeit with different strengths. In addition, the consistency of the results obtained by frequentist and Bayesian approaches suggested that our strongest signals are independent of the analysis method.

Primary association analysis of the chromosome 2 region identified the most interesting results in NOSTRIN, UPP2 and ZNF533. NOSTRIN encodes the nitric oxide synthase trafficker. Interestingly, the nitric oxide signaling pathway has recently been shown to be overrepresented among genes disrupted by CNVs in schizophrenia.47 However, the NOSTRIN association was stronger in the case–control analysis, with only minor support from the TDT, and it was not confirmed in the replication sample or in the combined meta-analysis, suggesting that it might represent a false-positive result.

Similarly, the ZNF533 association was not replicated; however, rs7590028 remained one of the strongest signals in the case–control combined analysis of the IMGSAC samples. ZNF533 encodes a protein containing four matrin-type zinc fingers and is highly conserved in evolution. Given its putative nuclear location, it is thought to act as a repressor of transcription, although no specific targets are currently known. ZNF533 is widely expressed in adult tissues, including brain. Expression of all isoforms in fetal brain was confirmed by reverse transcriptase–PCR (data not shown). Deletions including ZNF533 have been described in several patients with a neurological phenotype including mental retardation,48,49 and other zinc-finger genes have also been implicated in mental retardation cases.50–52 The zinc-finger gene ZNF804A was recently identified as the strongest result in a genome-wide association study of schizophrenia and bipolar disorder,53 suggesting that zinc-finger genes may act as transcriptional regulators in a wide range of human cognitive processes.

On chromosome 7, the most significant association result from the primary cohort was in the IMMP2L gene. Although SNPs in this gene failed to replicate in independent samples, the IMMP2L intronic SNP rs12537269 achieved the strongest result in the case–control meta-analysis of the IMGSAC samples (P = 7.3 × 10⁻⁵). This gene encodes an inner mitochondrial membrane protease-like protein and is a plausible candidate for autism, because it was previously reported to be disrupted in an individual with Tourette syndrome, a complex neuropsychiatric disorder showing phenotypic overlap with ASDs.54

Moreover, IMMP2L contains a neuronal leucine-rich repeat gene (LRRN3) nested within its large third intron. The expression profile of LRRN3 also makes it an interesting candidate gene for autism, as it is most highly expressed in fetal brain. Studies in Drosophila demonstrate that many members of the LRR family play an essential role in target recognition, axonal pathfinding and cell differentiation during neural development,55 and murine studies suggest that these LRR proteins could have similar functions in mammalian neural development.56

Figure 2 Summary of IMMP2L and DOCK4 copy number variants (CNVs). Fragments tested by QMPSF are shown as red bars at the top. CNVs from the Database of Genomic Variants (DGV) are shown as orange bars. Deletions and duplications identified in affected individuals and in parents or controls are depicted at the bottom. Dashed and continuous lines indicate the maximum and minimum length of the CNVs, respectively. The distal breakpoint of the deletion in pedigree 15-0084 was defined by qPCR. The distal breakpoint of the duplication in pedigree 13-3023 was not defined precisely.

The only SNP that achieved significant replication after Bonferroni correction for multiple testing is rs2217262 in the neighboring gene DOCK4, which is also a good autism candidate. This gene encodes a protein that activates the Rac GTPase and is often deleted during tumor progression.57 A recent study in rats indicates that DOCK4 is predominantly expressed in the hippocampus as well as in the lung.58 This study further demonstrated that, in cultured hippocampal neurons, DOCK4 is upregulated at the same time as dendrites start growing, and that knockdown of this gene by RNA interference results in impaired dendritic morphogenesis.

The association result for rs2217262 indicates that the common allele in the population is associated with increased risk for autism, or that the minor allele is a ‘protective’ variant. It has been shown that, in the presence of missing data, SNPs with a low MAF may show a bias in the TDT, resulting in artificial overtransmission of the common allele.59 This problem is not likely to apply to rs2217262, as this association was also supported by case–control analysis.
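The transmission disequilibrium test referred to above compares how often heterozygous parents transmit versus do not transmit a given allele to affected offspring. A minimal Python sketch of the classical allelic TDT follows, with hypothetical transmission counts; UNPHASED, used in this study, fits a likelihood-based model instead, so this only illustrates the principle.

```python
import math


def tdt(b: int, c: int):
    """Classical allelic TDT.

    b: heterozygous parents transmitting the tested allele to an affected child
    c: heterozygous parents transmitting the other allele
    Returns (chi-square with 1 df, P-value, transmission OR, 95% CI).
    """
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = b / c
    se = math.sqrt(1 / b + 1 / c)              # SE of ln(OR)
    ci = (odds_ratio * math.exp(-1.96 * se), odds_ratio * math.exp(1.96 * se))
    return chi2, p_value, odds_ratio, ci


# Hypothetical counts: the allele is transmitted 60 times and not transmitted 30 times
chi2, p, or_, (lo, hi) = tdt(60, 30)
print(f"chi2 = {chi2:.1f}, P = {p:.4f}, OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because only heterozygous parents are informative, any process that systematically drops some of them (for example, missing genotypes at a low-MAF SNP) can distort the b/c balance, which is the bias described in the paragraph above.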

Although only the rs2217262 association was confirmed by replication analysis, suggesting that the other results may represent false positives, this polymorphism (with a MAF of only about 5%) would not alone account for the linkage signal seen at AUTS1 in the IMGSAC sample. It is thus possible that multiple loci contribute to the overall linkage seen for this region, and that the other significant SNPs from the primary analysis may in reality be true signals but with lower ORs, which our replication study was underpowered to detect. We recognize that several limitations may have affected our replication sample. The primary sample was composed of trios selected from multiplex families based on IBD sharing, and was therefore more likely to be enriched for susceptibility alleles. By contrast, the replication population was a more heterogeneous sample, not preselected on linkage, and was mostly composed of singleton families. Power calculations suggested that our replication sample (IMGSAC-R and ND) should give us sufficient power to replicate the most significant primary results. However, the well-known ‘winner’s curse’ also suggests that the effect sizes from the initial study may have been overestimated, thus requiring a much larger sample for replication. We did not detect the presence of structure in the combined IMGSAC primary and IMGSAC-R samples, but heterogeneity may be present among the different samples used in this study (ND, Mount Sinai and University of Washington). This could also have contributed to the lack of replication, as could gene–environment interactions, if environmental exposures differ between population samples.
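The ‘winner’s curse’ invoked above can be illustrated with a small simulation in Python: when only discovery results that pass a significance threshold are followed up, their average effect estimate overshoots the true effect. The true log odds ratio, standard error, number of simulated studies and threshold below are all hypothetical.

```python
import random
import statistics


def winners_curse(true_log_or=0.1, se=0.1, n_studies=20_000, z_crit=1.96, seed=1):
    """Mean log-OR estimate over all simulated discovery studies and over the
    subset that reached significance (z > z_crit)."""
    rng = random.Random(seed)
    # each discovery study yields an estimate drawn around the true effect
    estimates = [rng.gauss(true_log_or, se) for _ in range(n_studies)]
    significant = [b for b in estimates if b / se > z_crit]
    return statistics.mean(estimates), statistics.mean(significant), len(significant)


mean_all, mean_sig, n_sig = winners_curse()
print(f"true log-OR 0.100; mean over all studies {mean_all:.3f}; "
      f"mean over {n_sig} significant studies {mean_sig:.3f}")
```

Under these settings every retained estimate must exceed 0.196 on the log-OR scale, so the average effect among the "winners" is well above the true value of 0.1, and a replication sample powered on that inflated estimate would be too small.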

De novo and/or inherited CNVs are emerging as important causes of ASDs and other complex disorders.8,11–13 Hence, we exploited our dense SNP genotyping data to mine for structural variants. The most interesting discovery is the occurrence of deletions and duplications in four independent families at the IMMP2L/DOCK4 locus, given the coincident SNP association also seen for these genes. A maternal deletion was transmitted to both affected sons and the unaffected daughter in family 15-0084. In all other instances (two deletions and one duplication), the second affected sib did not inherit the CNV. Interestingly, the maternally segregating deletion extends to the 3′ end of the DOCK4 gene, whereas the non-segregating deletions and those identified in controls and in the DGV were limited to IMMP2L. Taken together, these data tentatively suggest that a copy number loss of DOCK4 may influence susceptibility to ASDs, whereas duplications may not be damaging. The effect of DOCK4 deletions might be less penetrant in women, because the mother and the unaffected daughter also carried the deletion. Larger studies will be needed to confirm this hypothesis.

The predominantly gene-based nature of our study is a possible limitation, as we may have missed susceptibility alleles in intergenic regions. Recent findings from the ENCODE Consortium emphasize the importance of examining noncoding sequence, as several functional elements in the genome appear to lie in these regions.60 We attempted to minimize this limitation by including several SNPs in non-genic evolutionarily conserved elements.

Our study also suggests that no common variants of large effect size are present within genic regions at AUTS1 and AUTS5, and highlights the importance of very large sample sizes for identifying robust associations and rare CNVs with sufficient power for statistical significance. Evidence from recent genome-wide association studies of various disorders clearly shows that effect sizes for loci contributing to complex traits are generally lower than those predicted a few years ago.61 Several whole-genome association and CNV studies of autism are currently in progress by large consortia, and it will be interesting to see whether any of the genes highlighted by this study are also identified by these extensive studies.

It is possible that rare variants, both point mutations and CNVs, may account for a larger fraction of the overall genetic risk in complex psychiatric disorders than previously assumed. The present study was not designed to assess the contribution of rare sequence variants, and our results do not preclude that the chromosome 2q and 7q linkage regions harbor rare variation showing allelic heterogeneity across families, which may require resequencing to uncover.

The inconclusive findings of this study reflect the status of the field of autism genetics and suggest that classical approaches such as linkage and association analyses alone may not be sufficient to deal with the genetic and phenotypic heterogeneity seen in autism. One notable recent study used homozygosity mapping to uncover a number of large homozygous deletions in consanguineous pedigrees, highlighting the utility of this approach for heterogeneous disorders like autism.10 Another successful study found linkage to 15q13.3–q14 in a subset of families with IQ ≥ 70, suggesting that the use of informative subphenotypes to define homogeneous sets of ASD families could be very important in detecting susceptibility loci involved in autism.62

Finally, another report indicated that the level of somatic CNV differences between MZ twins may be higher than expected.63 If confirmed, this finding could provide a powerful tool for identifying autism susceptibility loci in MZ twins with discordant phenotypes. We believe that a combination of these (and other) novel approaches, together with traditional methods, will be required to uncover all the genes and biological pathways leading to autism.

In summary, the present high-density SNP association and CNV screen has provided evidence that variants in the IMMP2L/DOCK4 locus on chromosome 7 and in ZNF533 on chromosome 2 may increase susceptibility to ASDs. Association of the common allele of SNP rs2217262 in DOCK4 was supported by independent replication, whereas the associations in IMMP2L and ZNF533 are not sufficiently significant in the context of multiple testing and warrant further study.

Conflict of interest

The authors declare no conflict of interest.

Acknowledgments

We thank all the families who have participated in the study and the professionals who made this study possible. We also thank John Broxholme for bioinformatics support, and Joseph Trakalo and Chris Allan at the WTCHG core genomics facility for Illumina and Sequenom genotyping, respectively. We especially thank Professor Giovanni Romeo at the Medical Genetics Unit, S Orsola-Malpighi Hospital, University of Bologna for his generous provision of laboratory space and equipment to EM, EB and CT. The CPEA (Collaborative Program of Excellence in Autism) thanks Jeffery Munson, Raphael Bernier and Annette Estes. This work was funded by the Nancy Lurie Marks Family Foundation; the Simons Foundation; the EC Sixth FP AUTISM MOLGEN; Telethon-Italy; the Korczak Foundation for Autism and Related Disorders; and the Netherlands Organization for Scientific Research (NWO). The IMGSAC was funded by the UK Medical Research Council, Wellcome Trust, BIOMED 2 (CT-97-2759), EC Fifth Framework (QLG2-CT-1999-0094), Deutsche Forschungsgemeinschaft, Fondation France Telecom, Conseil Regional Midi-Pyrenees, Danish Medical Research Council, Sofiefonden, Beatrice Surovell Haskells Fund for Child Mental Health Research of Copenhagen, Danish Natural Science Research Council (9802210) and National Institutes of Health (U19 HD35482, MO1 RR06022, K05 MH01196, K02 MH01389). AJ Bailey is the Cheryl and Reece Scott Professor of Psychiatry. AP Monaco is a Wellcome Trust principal research fellow.

References

1 Chakrabarti S, Fombonne E. Pervasive developmental disorders inpreschool children: confirmation of high prevalence. Am JPsychiatry 2005; 162: 1133–1141.

2 Fombonne E. Epidemiology of autistic disorder and otherpervasive developmental disorders. J Clin Psychiatry 2005;66(Suppl 10): 3–8.

3 Baird G, Simonoff E, Pickles A, Chandler S, Loucas T, Meldrum Det al. Prevalence of disorders of the autism spectrum in apopulation cohort of children in South Thames: the Special Needsand Autism Project (SNAP). Lancet 2006; 368: 210–215.

4 Bailey A, Le Couteur A, Gottesman I, Bolton P, Simonoff E, YuzdaE et al. Autism as a strongly genetic disorder: evidence from aBritish twin study. Psychol Med 1995; 25: 63–77.

5 Bolton P, Macdonald H, Pickles A, Rios P, Goode S, Crowson Met al. A case–control family history study of autism. J ChildPsychol Psychiatr 1994; 35: 877–900.

6 Vorstman JA, Staal WG, van Daalen E, van Engeland H,Hochstenbach PF, Franke L. Identification of novel autismcandidate regions through analysis of reported cytogeneticabnormalities associated with autism. Mol Psychiatry 2006; 11:1, 18–28.

7 Jamain S, Quach H, Betancur C, Rastam M, Colineaux C, GillbergIC et al. Mutations of the X-linked genes encoding neuroliginsNLGN3 and NLGN4 are associated with autism. Nat Genet 2003;34: 27–29.

8 Szatmari P, Paterson AD, Zwaigenbaum L, Roberts W, Brian J, LiuXQ et al. Mapping autism risk loci using genetic linkage andchromosomal rearrangements. Nat Genet 2007; 39: 319–328.

9 Durand CM, Betancur C, Boeckers TM, Bockmann J, Chaste P,Fauchereau F et al. Mutations in the gene encoding the synapticscaffolding protein SHANK3 are associated with autism spectrumdisorders. Nat Genet 2007; 39: 25–27.

10 Morrow EM, Yoo SY, Flavell SW, Kim TK, Lin Y, Hill RS et al.Identifying autism loci and genes by tracing recent sharedancestry. Science 2008; 321: 218–223.

11 Christian SL, Brune CW, Sudi J, Kumar RA, Liu S, Karamohamed Set al. Novel submicroscopic chromosomal abnormalitiesdetected in autism spectrum disorder. Biol Psychiatry 2008; 63:1111–1117.

12 Marshall CR, Noor A, Vincent JB, Lionel AC, Feuk L, Skaug J et al.Structural variation of chromosomes in autism spectrum disorder.Am J Hum Genet 2008; 82: 477–488.

13 Sebat J, Lakshmi B, Malhotra D, Troge J, Lese-Martin C, Walsh Tet al. Strong association of de novo copy number mutations withautism. Science 2007; 316: 445–449.

14 Ullmann R, Turner G, Kirchhoff M, Chen W, Tonge B, Rosenberg Cet al. Array CGH identifies reciprocal 16p13.1 duplications anddeletions that predispose to autism and/or mental retardation.Hum Mutat 2007; 28: 674–682.

15 Abrahams BS, Geschwind DH. Advances in autism genetics:on the threshold of a new neurobiology. Nat Rev Genet 2008; 9:341–355.

16 IMGSAC. A full genome screen for autism with evidence forlinkage to a region on chromosome 7q. Hum Molec Genet 1998; 7:571–578.

17 IMGSAC. A genomewide screen for autism: strong evidence forlinkage to chromosomes 2q, 7q, and 16p. Am J Hum Genet 2001;69: 570–581.

18 Lamb JA, Barnby G, Bonora E, Sykes N, Bacchelli E, Blasi F et al.Analysis of IMGSAC autism susceptibility loci: evidence for sex


966


limited and parent of origin specific effects. J Med Genet 2005; 42:132–137.

19 Schellenberg GD, Dawson G, Sung YJ, Estes A, Munson J,Rosenthal E et al. Evidence for multiple loci from a genome scanof autism kindreds. Mol Psychiatry 2006; 11: 1049–1060, 979.

20 Pericak-Vance MA, Wolpert CM, Menold MM, Bass MP, HauserER, Donnelly SL et al. Chromosome 7 and autistic disorder (AD).Am J Hum Genet 1998; 63: A16.

21 Trikalinos TA, Karvouni A, Zintzaras E, Ylisaukko-oja T, PeltonenL, Jarvela I et al. A heterogeneity-based genome search meta-analysis for autism-spectrum disorders. Mol Psychiatry 2006; 11:29–36.

22 Badner JA, Gershon ES. Regional meta-analysis of published datasupports linkage of autism with markers on chromosome 7. MolPsychiatry 2002; 7: 56–66.

23 Buxbaum JD, Silverman JM, Smith CJ, Kilifarski M, Reichert J,Hollander E et al. Evidence for a susceptibility gene for autism onchromosome 2 and for genetic heterogeneity. Am J Hum Genet2001; 68: 1514–1520.

24 Shao Y, Raiford KL, Wolpert CM, Cope HA, Ravan SA, Ashley-Koch AA et al. Phenotypic homogeneity provides increasedsupport for linkage on chromosome 2 in autistic disorder. Am JHum Genet 2002; 70: 1058–1061.

25 Bonora E, Bacchelli E, Levy ER, Blasi F, Marlow A, Monaco APet al. Mutation screening and imprinting analysis of fourcandidate genes for autism in the 7q32 region. Mol Psychiatry2002; 7: 289–301.

26 Bonora E, Beyer KS, Lamb JA, Parr JR, Klauck SM, Benner A et al.Analysis of reelin as a candidate gene for autism. Mol Psychiatry2003; 8: 885–892.

27 Bonora E, Lamb JA, Barnby G, Sykes N, Moberly T, Beyer KS et al.Mutation screening and association analysis of six candidategenes for autism on chromosome 7q. Eur J Hum Genet 2005; 13:198–207.

28 Bacchelli E, Blasi F, Biondolillo M, Lamb JA, Bonora E, Barnby Get al. Screening of nine candidate genes for autism on chromosome2q reveals rare nonsynonymous variants in the cAMP-GEFII gene.Mol Psychiatry 2003; 8: 916–924.

29 Blasi F, Bacchelli E, Carone S, Toma C, Monaco AP, Bailey AJ et al.SLC25A12 and CMYA3 gene variants are not associated withautism in the IMGSAC multiplex family sample. Eur J Hum Genet2006; 14: 123–126.

30 Ackerman H, Usen S, Jallow M, Sisay-Joof F, Pinder M,Kwiatkowski DP. A comparison of case–control and family-basedassociation methods: the example of sickle-cell and malaria. AnnHum Genet 2005; 69: 559–565.

31 Fingerlin TE, Boehnke M, Abecasis GR. Increasing the power andefficiency of disease-marker case–control association studiesthrough use of allele-sharing information. Am J Hum Genet2004; 74: 432–443.

32 Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin—rapidanalysis of dense genetic maps using sparse gene flow trees. NatGenet 2002; 30: 97–101.

33 Ramoz N, Cai G, Reichert JG, Silverman JM, Buxbaum JD. An analysis of candidate autism loci on chromosome 2q24–q33: evidence for association to the STK39 gene. Am J Med Genet B Neuropsychiatr Genet 2008; 147B: 1152–1158.

34 http://www.hpacultures.org.uk/collections/ecacc.jsp.

35 Barrett JC, Fry B, Maller J, Daly MJ. HaploView: analysis and visualization of LD and haplotype maps. Bioinformatics 2005; 21: 263–265.

36 Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 2005; 15: 1034–1050.

37 Seldin MF, Shigeta R, Villoslada P, Selmi C, Tuomilehto J, Silva G et al. European population substructure: clustering of northern and southern populations. PLoS Genet 2006; 2: e143.

38 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.

39 de Bakker PI, Yelensky R, Pe’er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet 2005; 37: 1217–1223.

40 Morris AP. Direct analysis of unphased SNP genotype data in population-based association studies via Bayesian partition modelling of haplotypes. Genet Epidemiol 2005; 29: 91–107.

41 Morris AP. A flexible Bayesian framework for modeling haplotype association with disease, allowing for dominance effects of the underlying causative variants. Am J Hum Genet 2006; 79: 679–694.

42 Dudbridge F. Likelihood-based association analysis for nuclear families and unrelated subjects with missing genotype data. Hum Hered 2008; 66: 87–98.

43 Colella S, Yau C, Taylor JM, Mirza G, Butler H, Clouston P et al. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res 2007; 35: 2013–2025.

44 Saugier-Veber P, Goldenberg A, Drouin-Garraud V, de La Rochebrochard C, Layet V, Drouot N et al. Simple detection of genomic microdeletions and microduplications using QMPSF in patients with idiopathic mental retardation. Eur J Hum Genet 2006; 14: 1009–1017.

45 Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000; 155: 945–959.

46 Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 2003; 164: 1567–1587.

47 Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 2008; 320: 539–543.

48 Mencarelli MA, Caselli R, Pescucci C, Hayek G, Zappella M, Renieri A et al. Clinical and molecular characterization of a patient with a 2q31.2–32.3 deletion identified by array-CGH. Am J Med Genet A 2007; 143A: 858–865.

49 Monfort S, Rosello M, Orellana C, Oltra S, Blesa D, Kok K et al. Detection of known and novel genomic rearrangements by array based comparative genomic hybridisation: deletion of ZNF533 and duplication of CHARGE syndrome genes. J Med Genet 2008; 45: 432–437.

50 Shoichet SA, Hoffmann K, Menzel C, Trautmann U, Moser B, Hoeltzenbein M et al. Mutations in the ZNF41 gene are associated with cognitive deficits: identification of a new candidate for X-linked mental retardation. Am J Hum Genet 2003; 73: 1341–1354.

51 Kleefstra T, Yntema HG, Oudakker AR, Banning MJ, Kalscheuer VM, Chelly J et al. Zinc finger 81 (ZNF81) mutations associated with X-linked mental retardation. J Med Genet 2004; 41: 394–399.

52 Lugtenberg D, Yntema HG, Banning MJ, Oudakker AR, Firth HV, Willatt L et al. ZNF674: a new Kruppel-associated box-containing zinc-finger gene involved in nonsyndromic X-linked mental retardation. Am J Hum Genet 2006; 78: 265–278.

53 O’Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V et al. Identification of loci associated with schizophrenia by genome-wide association and follow-up. Nat Genet 2008.

54 Petek E, Windpassinger C, Vincent JB, Cheung J, Boright AP, Scherer SW et al. Disruption of a novel gene (IMMP2L) by a breakpoint in 7q31 associated with Tourette syndrome. Am J Hum Genet 2001; 68: 848–858.

55 Battye R, Stevens A, Perry RL, Jacobs JR. Repellent signaling by Slit requires the leucine-rich repeats. J Neurosci 2001; 21: 4290–4298.

56 Fukamachi K, Matsuoka Y, Ohno H, Hamaguchi T, Tsuda H. Neuronal leucine-rich repeat protein-3 amplifies MAPK activation by epidermal growth factor through a carboxyl-terminal region containing endocytosis motifs. J Biol Chem 2002; 277: 43549–43552.

57 Yajnik V, Paulding C, Sordella R, McClatchey AI, Saito M, Wahrer DC et al. DOCK4, a GTPase activator, is disrupted during tumorigenesis. Cell 2003; 112: 673–684.

58 Ueda S, Fujimoto S, Hiramoto K, Negishi M, Katoh H. Dock4 regulates dendritic development in hippocampal neurons. J Neurosci Res 2008; 86: 3052–3061.

59 Mitchell AA, Cutler DJ, Chakravarti A. Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am J Hum Genet 2003; 72: 598–610.

60 Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447: 799–816.

61 WTCCC. Genome-wide association study of 14 000 cases of seven common diseases and 3000 shared controls. Nature 2007; 447: 661–678.

62 Liu XQ, Paterson AD, Szatmari P, Autism Genome Project Consortium. Genome-wide linkage analyses of quantitative and categorical autism subphenotypes. Biol Psychiatry 2008; 64: 561–570.

63 Bruder CE, Piotrowski A, Gijsbers AA, Andersson R, Erickson S, de Stahl TD et al. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am J Hum Genet 2008; 82: 763–771.

This work is licensed under the Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)