
Finding the Molecular Basis of Quantitative Genetic Variation

Richard Mott

Wellcome Trust Centre for Human Genetics

Oxford UK

Genetic Traits

• Quantitative (height, weight)

• Dichotomous (affected/unaffected)

• Factorial (blood group)

• Mendelian - controlled by single gene (cystic fibrosis)

• Complex – controlled by multiple genes*environment (diabetes, asthma)

Molecular Basis of Quantitative Traits

QTL: Quantitative Trait Locus

chromosome

genes

Molecular Basis of Quantitative Traits

QTG: Quantitative Trait Gene

QTN: Quantitative Trait Nucleotide

chromosome

SNP: Single Nucleotide Polymorphism

Association Studies

• Compare unrelated individuals from a population

• Phenotypes:

– Cases vs Controls

– Quantitative measure

• Genotypes: state of genome at multiple variable locations (Single Nucleotide Polymorphism = SNP) in each individual

• Seek correlation between genotype and phenotype

Problems with Association Studies

• Population stratification

• Linkage Disequilibrium

• Allele Frequencies

• Multiple loci

• Small Effect Sizes

• Very few Successes

Population Stratification

• If the sampling population comprises genetically distinct sub-populations with different disease prevalences, then any variant that distinguishes the sub-populations is likely to show disease association, even if it has no causal role
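The confounding effect of stratification can be illustrated with a small simulation. This is only a sketch under assumed numbers: the sub-population sizes, allele frequencies and prevalences are hypothetical, and the variant is simulated with no effect on disease at all.

```python
import random

def simulate_subpop(n, allele_freq, prevalence, rng):
    """Return (carrier, affected) pairs; carrier status never influences disease."""
    people = []
    for _ in range(n):
        carrier = rng.random() < allele_freq    # carries the marker allele?
        affected = rng.random() < prevalence    # drawn independently of carrier
        people.append((carrier, affected))
    return people

def odds_ratio(people):
    """Odds ratio of disease for carriers versus non-carriers (2x2 table)."""
    carrier_cases = sum(1 for c, d in people if c and d)
    carrier_controls = sum(1 for c, d in people if c and not d)
    other_cases = sum(1 for c, d in people if not c and d)
    other_controls = sum(1 for c, d in people if not c and not d)
    return (carrier_cases * other_controls) / (carrier_controls * other_cases)

rng = random.Random(1)
# Hypothetical sub-populations: the allele is common where the disease is common.
pop1 = simulate_subpop(20000, allele_freq=0.8, prevalence=0.20, rng=rng)
pop2 = simulate_subpop(20000, allele_freq=0.2, prevalence=0.05, rng=rng)

or_pop1 = odds_ratio(pop1)            # close to 1: no association within pop1
or_pop2 = odds_ratio(pop2)            # close to 1: no association within pop2
or_pooled = odds_ratio(pop1 + pop2)   # well above 1: spurious association
print(or_pop1, or_pop2, or_pooled)
```

Within each sub-population the odds ratio stays near 1, while the pooled sample shows a clear association that comes only from the mixture of sub-populations.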

Admixture Mapping

• Population is homogeneous but each individual’s genome is a mosaic of segments from different populations

• May be used to map disease loci

– multiple sclerosis susceptibility

– Reich et al 2005, Nature Genetics

Linkage Disequilibrium

Mouse

Effects of Linkage Disequilibrium

• Correlation between nearby SNPs

• SNPs near to QTN will show association

– Risk of false positive interpretation

– But need only genotype “tagging” SNPs

– ~1 million tagging SNPs will be in LD with ~50% of common variants in the human genome
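The correlation between nearby SNPs is often summarised by the squared correlation r² between alleles on the same haplotype. The slides do not say which LD measure is intended, so the choice of r², the function name `ld_r2` and the toy haplotypes below are assumptions made for illustration.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, alleles coded 0/1 at SNP A and SNP B.
    Both SNPs must be polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                          # freq of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n                          # freq of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n    # freq of 1-1 haplotype
    d = p_ab - p_a * p_b                                             # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# A tagging SNP in perfect LD with the causal variant: the alleles always co-occur.
r2_perfect = ld_r2([(0, 0)] * 6 + [(1, 1)] * 4)

# Two independent SNPs: every haplotype combination is equally frequent.
r2_none = ld_r2([(0, 0)] * 25 + [(0, 1)] * 25 + [(1, 0)] * 25 + [(1, 1)] * 25)

print(r2_perfect, r2_none)
```

A SNP with r² near 1 to the QTN carries almost the same association signal as the QTN itself, which is both why tagging works and why an associated SNP need not be causal.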

The Common-Disease Common-Variant Hypothesis

• Says disease-predisposing variants

– will exist at relatively high frequency (i.e. >1%) in the population

– are ancient alleles occurring on specific haplotypes

– are detectable in a case-control study using tagging SNPs

• Alternative hypothesis says

– disease-predisposing alleles are sporadic new mutations, perhaps around the same genes, on different haplotypes

– families with a history of the same disease owe their condition to different mutation events

– these alleles are theoretically detectable with family-based strategies, which do not assume a common origin for the disease alleles, but are harder to detect with case-control studies (Pritchard, 2001)

Power Depends on

• Disease-predisposing allele’s

– Effect Size (Odds Ratio)

– Allele frequency

• Sample Size: #cases, #controls

• Number of tagging SNPs

• To detect an allele with odds ratio of 1.25 and with allele frequency > 1%, at 5% Bonferroni genome-wide significance and 80% power, we require

– ~6000 cases, 6000 controls

– ~0.5 million tagging SNPs, one of which must be in perfect LD with the causative variant

– [Hirschhorn and Daly 2005]
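The dependence of power on effect size, allele frequency, sample size and number of tests can be sketched with a normal approximation to a two-sample test of allele frequencies. This is an illustrative approximation, not the calculation used in the cited paper: the multiplicative odds-ratio model, the function names and the control allele frequency of 0.2 are all assumptions.

```python
from statistics import NormalDist

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def allelic_test_power(odds_ratio, control_freq, n_cases, n_controls, alpha):
    """Approximate power of a two-sided z-test comparing allele frequencies.

    Assumes a multiplicative (per-allele) odds ratio and 2 alleles per person.
    """
    case_freq = odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))
    se = (case_freq * (1 - case_freq) / (2 * n_cases)
          + control_freq * (1 - control_freq) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical value
    z_effect = abs(case_freq - control_freq) / se    # expected test statistic
    return NormalDist().cdf(z_effect - z_crit)

# 5% genome-wide significance spread over 0.5 million tagging SNPs.
threshold = bonferroni_threshold(0.05, 500_000)

# Hypothetical control allele frequency of 0.2, odds ratio 1.25.
power_6000 = allelic_test_power(1.25, 0.2, 6000, 6000, threshold)
power_2000 = allelic_test_power(1.25, 0.2, 2000, 2000, threshold)
print(threshold, power_6000, power_2000)
```

Under these assumptions the per-SNP threshold is about 1e-7, and power falls sharply when the sample is cut from 6000 to 2000 cases and controls. Rarer alleles lower the power further, because the allele-frequency difference shrinks.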

WTCCC: Wellcome Trust Case-Control Consortium

• 2000 cases from each of

– Type I Diabetes

– Type II Diabetes

– rheumatoid arthritis

– susceptibility to TB

– bipolar depression

– …. and others …

• 3000 common controls

• 0.675 million SNPs

• ~10 billion genotypes

• Data expected mid 2006

Mouse Models

Map in Human or Animal Models?

H:

• Disease studied directly

• Population and environment stratification

• Very many SNPs (1,000,000?) required

• Hard to detect trait loci – very large sample sizes required to detect loci of small effect (5,000-10,000)

• Potentially very high mapping resolution – single gene

• Very Expensive

Animal model:

• Animal Model required

• Population and environment controlled

• Fewer SNPs required (~100-10,000)

• Easy to detect QTL with ~500 animals

• Poorer mapping resolution – 1Mb (10 genes)

• Relatively inexpensive

QTL Mapping in Mice using Inbred Line Crosses

• Genetically Homozygous – genome is fixed, breed true.

• Standard Inbred Strains available

• Haplotype diversity is controlled far more than in human association studies

• QTL detection is very easy

• QTL fine mapping is hard

Sizes of Mapped Behavioural QTL in rodents (% of total phenotypic variance)

[Figure: histogram. x-axis: Effect size (% var); y-axis: Number]

Physiological QTL

[Figure: histogram. x-axis: Effect size (% var); y-axis: Number]

Effect sizes of cloned genes

[Figure: histogram. x-axis: Effect size (% var); y-axis: Number]

QTL detection: F2 Intercross

[Figure: inbred strains A and B are crossed (A x B)]

QTL mapping: F2 Intercross

[Figure: A x B gives the F1, and F1 animals are intercrossed to give the F2. Values shown at the QTL: +1 and -1; 0 in the F1; 0, 0, +2 and -2 in the F2]

Genotype a skeleton of markers across genome (20 cM)

[Figure: example F2 genotypes at each marker, each one of AA, AB, BA or BB]

Single Marker Association

• Test of association between genotype and trait at each marker position.

• ANOVA

• F2 crosses are

– good for detecting QTL

– bad for fine-mapping

– typical mapping resolution 1/3 chromosome – 20-30 cM
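A single-marker test in an F2 can be sketched as a one-way ANOVA of the trait on the three marker genotype classes. The simulation below is a minimal illustration under assumed values: the allele effect, noise level and sample size are hypothetical, and the marker is assumed to sit exactly at the QTL.

```python
import random

def simulate_f2(n_animals, allele_effect, noise_sd, rng):
    """Trait values for F2 animals, grouped by marker genotype (AA, AB, BB)."""
    groups = {"AA": [], "AB": [], "BB": []}
    for _ in range(n_animals):
        n_a = sum(rng.random() < 0.5 for _ in range(2))   # A alleles inherited from two F1 parents
        genotype = ("BB", "AB", "AA")[n_a]
        genetic_value = allele_effect * (2 * n_a - 2)     # AA: +2, AB: 0, BB: -2 (times effect)
        groups[genotype].append(genetic_value + rng.gauss(0, noise_sd))
    return list(groups.values())

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; groups is a list of lists of trait values."""
    values = [y for g in groups for y in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

rng = random.Random(7)
f_qtl = one_way_anova_f(simulate_f2(500, allele_effect=1.0, noise_sd=2.0, rng=rng))
f_null = one_way_anova_f(simulate_f2(500, allele_effect=0.0, noise_sd=2.0, rng=rng))
print(f_qtl, f_null)
```

With a real effect the F statistic is far above its null expectation of about 1. In a genome scan the same test is repeated at every marker in the skeleton map.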

Increasing mapping resolution

• Increase number of recombinants:

– more animals

– more generations in cross

Heterogeneous Stocks

• cross 8 inbred strains for >10 generations





0.25 cM

Mosaic Crosses

[Figure: generations labelled founders, G3, GN, F20. Stage labels: mixing, chopping up, inbreeding. Population labels: F2, diallel; HS, AI, outbreds; RI (RIHS, CC)]

Analysis of mosaic crosses

[Figure: a chromosome with its markers and the alleles, coded 1 or 2, observed at each marker]

• Want to predict ancestral strain from genotype

• We know the alleles in the founder strains

• Single marker association lacks power, can’t distinguish all strains

• Multipoint analysis – combine data from neighbouring markers

Analysis of mosaic crosses

• Hidden Markov model HAPPY

• Hidden states = ancestral strains

• Observed states = genotypes

• Unknown phase of genotypes - analyse both chromosomes simultaneously

• Output is probability that a locus is descended from a pair of strains

• Mott et al 2000 PNAS
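The multipoint idea can be sketched with a forward-backward pass over a simple hidden Markov model. This is not the HAPPY implementation: it is a simplified haploid version (one chromosome, known phase) with hypothetical founder alleles, a constant switch probability between adjacent markers and a fixed genotyping error rate, all chosen only for illustration.

```python
def founder_posteriors(observed, founder_alleles, switch_prob=0.1, error=0.01):
    """Posterior probability of each founder strain at each marker.

    observed: alleles along one chromosome.
    founder_alleles: dict mapping strain name -> alleles at the same markers.
    No rescaling is done, so this is only suitable for short marker lists.
    """
    strains = list(founder_alleles)
    n_strains, n_markers = len(strains), len(observed)

    def emit(s, m):   # probability of the observed allele if strain s is the ancestor
        return 1 - error if founder_alleles[s][m] == observed[m] else error

    def trans(s, t):  # probability of moving from strain s to strain t between markers
        return 1 - switch_prob if s == t else switch_prob / (n_strains - 1)

    # Forward pass: joint probability of data up to marker m and state at m.
    fwd = [{s: emit(s, 0) / n_strains for s in strains}]
    for m in range(1, n_markers):
        fwd.append({t: emit(t, m) * sum(fwd[m - 1][s] * trans(s, t) for s in strains)
                    for t in strains})

    # Backward pass: probability of data after marker m given state at m.
    bwd = [None] * n_markers
    bwd[-1] = {s: 1.0 for s in strains}
    for m in range(n_markers - 2, -1, -1):
        bwd[m] = {s: sum(trans(s, t) * emit(t, m + 1) * bwd[m + 1][t] for t in strains)
                  for s in strains}

    posteriors = []
    for m in range(n_markers):
        unnorm = {s: fwd[m][s] * bwd[m][s] for s in strains}
        total = sum(unnorm.values())
        posteriors.append({s: v / total for s, v in unnorm.items()})
    return posteriors

# Hypothetical founders; the observed chromosome copies A, then B.
founders = {
    "A": [1, 1, 2, 1, 1, 2, 1, 1],
    "B": [2, 2, 1, 1, 2, 1, 2, 1],
    "C": [1, 2, 2, 2, 1, 1, 1, 2],
}
observed = [1, 1, 2, 1, 2, 1, 2, 1]
post = founder_posteriors(observed, founders)
print(post[1], post[6])
```

At the second marker strains A and C carry the same allele, so a single-marker analysis could not separate them, but the neighbouring markers make A the clear ancestor. The diploid case in the slides would use pairs of strains as hidden states.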

Testing for a QTL

• $p_{iL}(s,t)$ = Prob(animal $i$ is descended from strains $s,t$ at locus $L$)

• $p_{iL}(s,t)$ calculated using

– genotype data

– founder strains’ alleles

• Phenotype is modelled as

$y_i = \sum_{s,t} p_{iL}(s,t)\,T(s,t) + \mathrm{Covariates}_i + e_i$

• Test for no QTL at locus L

– H0: T(s,t) are all same

– ANOVA

– partial F test
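The test can be sketched as a comparison of two least-squares fits: the full model regresses the phenotype on the descent probabilities, and the null model (all T equal) reduces to a common mean because the probabilities for each animal sum to 1. The sketch below omits covariates, indexes strain pairs by a single index k, and uses simulated probabilities and effects. These simplifications and all names are assumptions, not the method's actual implementation.

```python
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def qtl_f_test(probs, y):
    """Partial F test of H0: all effects equal. probs[i][k] sums to 1 over k."""
    n, k = len(y), len(probs[0])
    # Full model: solve the normal equations (P'P) T = P'y for the effects T.
    ptp = [[sum(p[a] * p[b] for p in probs) for b in range(k)] for a in range(k)]
    pty = [sum(p[a] * yi for p, yi in zip(probs, y)) for a in range(k)]
    effects = solve_linear(ptp, pty)
    rss_full = sum((yi - sum(pa * ta for pa, ta in zip(p, effects))) ** 2
                   for p, yi in zip(probs, y))
    # Null model: a single common mean.
    mean_y = sum(y) / n
    rss_null = sum((yi - mean_y) ** 2 for yi in y)
    f_stat = ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))
    return f_stat, effects

def simulate(n, true_effects, noise_sd, rng):
    """Hypothetical data: random descent probabilities and phenotypes from the model."""
    probs, y = [], []
    for _ in range(n):
        raw = [rng.random() for _ in true_effects]
        p = [r / sum(raw) for r in raw]
        probs.append(p)
        y.append(sum(pk * tk for pk, tk in zip(p, true_effects)) + rng.gauss(0, noise_sd))
    return probs, y

rng = random.Random(3)
f_qtl, effects = qtl_f_test(*simulate(300, [0.0, 2.0, -2.0], 0.5, rng))
f_null, _ = qtl_f_test(*simulate(300, [0.0, 0.0, 0.0], 0.5, rng))
print(f_qtl, f_null, effects)
```

The F statistic has k - 1 and n - k degrees of freedom, where k is the number of strain pairs. Covariates would be added as extra columns in both the full and the null model.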

Example: Open Field Activity

• Mouse Model for Anxiety

OFA Tracking

Talbot et al 1999, Mott et al 2000

[Figure labels: multipoint, singlepoint, significance threshold]

Relation Between Marker and Genetic Effect

[Figure: a QTL with Marker 1 and Marker 2 on the chromosome; labels: "Observable effect" and "No effect observable"]

How Much Mapping Resolution do we need?

[Figure: cumulative distribution. x-axis: #Genes per Mb in mouse genome; y-axis: Cumulative Probability]

Mapping Resolution in Mouse QTL experiments

• F2

– ~25-50 Mb [250-300 genes]

• HS

– 1-5 Mb [10-50 genes]

• Need More Resolution

Other Outbred Populations

• Commercially available outbreds may contain more historical recombination

• Potentially finer mapping resolution

• How to exploit it ?

MF1 Outbred Mice

Analysis of MF1

[Figure: Single Marker Analysis; axis labels not recoverable]

Unknown progenitors

• Sometime in the 1970s…. LACA x CF → MF1

MF1 resemble HS

• Sequencing revealed very few new variants in MF1 compared to HS strains

• Variants present in HS strains also present in MF1

MF1 as a mosaic of inbred strains

Mapping with 30 generation HS

Mapping with MF1 mice

Yalcin et al 2004 Nature Genetics

Acknowledgements

• Jonathan Flint

• Binnaz Yalcin

• William Valdar

• Leah Solberg

Further Reading

• Mouse

– Flint et al, Nature Reviews Genetics 2005

• Human

– Hirschhorn and Daly, Nature Reviews Genetics 2005

– Zondervan and Cardon, Nature Reviews Genetics 2004
