Genetics for Imagers: How Geneticists Model Quantitative Phenotypes
Nelson Freimer
UCLA Center for Neurobehavioral Genetics
What makes a genetic association significant?
Outline
• The problem of achieving validated findings in psychiatric genetics
• Approaches to genetic mapping and statistical significance
- linkage analysis (+ examples)
- association analysis (+ examples)
Psychiatric genetics: The brains of the family
10 July 2008 | Nature 454, 154-157 (2008)
Does the difficulty in finding the genes responsible for mental illness reflect the complexity of the genetics or the poor definitions of psychiatric disorders?
“The studies so far are statistically underpowered. We need bigger studies.” - Jonathan Flint
“Geneticists know nothing about psychiatric disease.” - Daniel Weinberger
WHAT IS THE PROBLEM?
• Psychiatric disorders are highly heritable
• No psychiatric susceptibility genes known
• Studies so far are underpowered
  – Phenotypes are of uncertain validity
  – Samples are too small and markers too few
  – Signal-to-noise ratio is too low (etiological heterogeneity: genetic and non-genetic)
“We are just too ignorant of the underlying neurobiology to make guesses about candidate genes.” - Steven Hyman
This is why geneticists have turned to genome-wide mapping
Genome-wide mapping and allelic architecture
Allelic architecture and genetic mapping approaches
[Figure: grid of effect size (small to large) against disease gene allele frequency (rare, <1%, to common, >5%). Regions are labeled LINKAGE, ASSOCIATION (family-based or case-control), COPY NUMBER VARIANTS, and NOT FOUND TO DATE.]
[Figure: a founder chromosome carrying the disease gene within an IBD region, and the IBD region shared by present-day affected individuals. IBD = Identical By Descent.]
The Principle of Genetic Linkage
If genes are located on different chromosomes, they show independent assortment.
compute this probability.
However, genes on the same chromosome, especially if they are close to each other, tend to be passed on to offspring in the same configuration as on the parental chromosomes.
Genetic markers: SNPs
Detecting Genetic Linkage: Linkage Analysis vs Association Analysis
• Linkage Analysis
  – Using pedigree samples, search for regions of the genome where affected individuals share alleles more often than you would expect by chance
• Association Analysis
  – Compare allele frequency distributions in cases and controls
• For quantitative traits, similar principles can be applied
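The case-control comparison of allele frequencies can be sketched as a chi-square test on a 2x2 table of allele counts. This is a minimal illustration, not the analysis used in any study cited here; the function name `allelic_chi_square` and the counts are made up for the example.

```python
import math

def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    case_counts and control_counts are (allele_1, allele_2) tuples.
    Returns (chi_square, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: allele G seen 60 times in 200 case chromosomes
# and 40 times in 200 control chromosomes.
chi2, p = allelic_chi_square((60, 140), (40, 160))
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

A nominal p value from one such test says little by itself; the thresholds discussed in the following slides address how stringent it has to be.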
[Figure: genotypes (G,T and T,T) at one marker, shown within a pedigree for Linkage Analysis and in unrelated individuals for Association Analysis.]
When are two genetic loci significantly linked?
Stringent significance thresholds based on…
• Low prior probability of linkage between any two loci
  – Considered when there were few markers
• Multiple tests involved in genotyping studies
  – Considered after there were many markers
• Both considerations yielded ~ the same threshold: LOD score (log base 10 of the likelihood ratio) > ~3 (i.e. p < 10^-4)
• Prior probability of linkage between a given locus and a random genome location: 0.02
• To obtain posterior probability of linkage of >0.95 (i.e. <0.05 false positive linkages), apply Bayes' theorem:
• Solving for the likelihood ratio Pr(Data | Linkage) / Pr(Data | No Linkage)…
  – ratio must be >1,000, i.e. LOD >3
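The Bayes step can be written out in odds form. This is a sketch of the arithmetic behind the bullets above; the symbols π (prior probability of linkage) and Λ (likelihood ratio) are notation introduced here, not taken from the slide.

```latex
\[
\frac{\Pr(\text{Linkage}\mid\text{Data})}{\Pr(\text{No Linkage}\mid\text{Data})}
  = \Lambda \cdot \frac{\pi}{1-\pi},
\qquad
\Lambda = \frac{\Pr(\text{Data}\mid\text{Linkage})}{\Pr(\text{Data}\mid\text{No Linkage})}
\]
\[
\Lambda > \frac{0.95}{0.05}\cdot\frac{1-0.02}{0.02}
  = 19 \times 49 = 931 \approx 10^{3}
\quad\Longrightarrow\quad
\mathrm{LOD} = \log_{10}\Lambda \gtrsim 3
\]
```

With π = 0.02 and a target posterior of 0.95, the required ratio comes out just under 1,000, which is rounded to the conventional LOD of 3.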
Controlling for multiple testing in linkage
• With complete genome marker sets, the prior probability that some marker is linked is 1
• ~500 fully informative, independent markers cover linkage in all regions of the genome
• To control, at the 0.05 level, the global hypothesis of no linkage anywhere in the genome: 0.05/500 = 10^-4 for each test, i.e. LOD >3
Significance thresholds for linkage (Lander and Kruglyak, 1996)
• Suggestive linkage: a LOD score or p value expected to occur once by chance in a whole genome scan.
  LOD > 2.2, p < 7.4 x 10^-4
• Significant linkage: a LOD score or p value expected to occur by chance 0.05 times in a whole genome scan.
  LOD > 3.6, p < 2.2 x 10^-5
• Highly significant linkage: a LOD score or p value expected to occur by chance 0.001 times in a whole genome scan.
  LOD > 5.4, p < 3 x 10^-7
• Confirmed linkage: a significant linkage observed in one study is confirmed by finding a LOD score or p value expected to occur 0.01 times by chance in a specific search of the candidate region.
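The LOD and p value pairs quoted on these slides can be related with a short calculation. The sketch below assumes that 2·ln(10)·LOD is asymptotically chi-square with 1 degree of freedom and that the test is one-sided; the slides do not state this conversion, and the function name `lod_to_p` is illustrative.

```python
import math

def lod_to_p(lod):
    """One-sided pointwise p value for a LOD score.

    Assumption: 2*ln(10)*LOD is asymptotically chi-square with 1 df,
    and the test is one-sided, hence the factor 0.5.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

# Bonferroni-style per-test level for ~500 independent markers.
per_test_alpha = 0.05 / 500

for lod in (2.2, 3.0, 3.6, 5.4):
    print(f"LOD {lod:.1f} -> p ~ {lod_to_p(lod):.1e}")
print(f"0.05 / 500 = {per_test_alpha:.0e}")
```

Under these assumptions the conversion gives values close to the quoted thresholds, and LOD 3 lands at about the same per-test level as 0.05/500.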
An example of linkage to a quantitative neurobehavioral trait
Monoamine Neurotransmitters
• Norepinephrine and epinephrine: attention, blood pressure
• Histamine: gastric acid release, immune response
• Dopamine: reward
• Serotonin: appetite, mood, gastrointestinal motility
From David Krantz
Catecholamine Synthesis and Degradation
Genome-wide linkage analysis of HVA in a vervet monkey pedigree
Vervet research colony pedigree
Heritability of Monoamine Metabolites in vervet monkeys
[Figure: bar chart of proportion of variance (axis 0 to 0.8) for the monoamine metabolites 5-HIAA, HVA, and MHPG, with h2 (genetic) and c2 (maternal) components.]
HVA level in Vervets on Chromosome 10
Linkage analysis in extended pedigrees may be powerful for structural MRI phenotypes
Brain MRIs in the VRC
357 Vervets scanned
Mobile Siemens Symphony 1.5 Tesla scanner
Genetic association analysis
Linkage analysis is not very powerful for mapping complex traits (with many alleles of small effect)
Disease gene discovery methods
[Figure: grid of effect size (small to large) against disease gene allele frequency (rare, <1%, to common, >5%), with regions labeled LINKAGE, ASSOCIATION (family-based or case-control), COPY NUMBER VARIANTS, and NOT FOUND TO DATE; alongside it, genotypes (G,T and T,T) at one marker illustrating Linkage Analysis and Association Analysis.]
Significance thresholds for association
Consider a simple Bayesian argument:
- Prior probability that a random gene is associated with the trait: ~1/30,000, assuming 30,000 genes/genome
- Likelihood ratio should be > 550,000 for association to be significant (posterior probability >0.95)
- With χ2 test, p < 2.6 x 10^-7
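The same argument can be checked numerically. The sketch below takes the prior and posterior from the slide and, as an assumption not stated there, converts the likelihood ratio to a p value by treating 2·ln(LR) as chi-square with 1 degree of freedom. Function names are illustrative.

```python
import math

def required_likelihood_ratio(prior, posterior=0.95):
    """Likelihood ratio needed to raise `prior` to `posterior` (Bayes, odds form)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds

def likelihood_ratio_to_p(likelihood_ratio):
    """p value, assuming 2*ln(LR) is chi-square with 1 df (an assumption)."""
    chi2 = 2.0 * math.log(likelihood_ratio)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

lr_assoc = required_likelihood_ratio(1.0 / 30000)  # about 570,000
p_assoc = likelihood_ratio_to_p(lr_assoc)          # roughly 2.6e-7
print(f"required LR ~ {lr_assoc:,.0f}, p ~ {p_assoc:.1e}")
```

The computed ratio is slightly above the rounded figure on the slide and of the same order; the resulting p value is close to the quoted 2.6 x 10^-7.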
A more complete evaluation of significance
Posterior odds = Prior odds x Power (for true association) / Significance
• Strength of evidence depends on likely number of true associations and power to detect them
• These depend on effect sizes and sample sizes
• Less well-powered studies need more stringent thresholds to control false-positive rate
See Wacholder et al., J. National Cancer Institute 2004
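The posterior-odds relation above can be turned into the false positive report probability (FPRP) described by Wacholder et al. A minimal sketch follows; the prior of 0.001 and power of 0.5 are illustrative assumptions, not values from the slides.

```python
def posterior_odds(prior, power, alpha):
    """Posterior odds of a true association after a result significant at `alpha`."""
    prior_odds = prior / (1.0 - prior)
    return prior_odds * power / alpha

def false_positive_report_probability(prior, power, alpha):
    """Probability that a significant finding is a false positive."""
    return 1.0 / (1.0 + posterior_odds(prior, power, alpha))

# Hypothetical candidate-gene setting: prior 0.001, power 0.5.
fprp_nominal = false_positive_report_probability(0.001, 0.5, 0.05)
fprp_gwas = false_positive_report_probability(0.001, 0.5, 5e-8)
print(f"alpha = 0.05 : FPRP = {fprp_nominal:.3f}")
print(f"alpha = 5e-8 : FPRP = {fprp_gwas:.1e}")
```

With these assumed inputs, a result at the nominal 0.05 level is almost certainly a false positive, while the same power at a genome-wide threshold gives a very small FPRP. Lower power shrinks the posterior odds further, which is why underpowered studies need stricter thresholds.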
Genome wide association thresholds
• Controlling for multiple testing, e.g. Bonferroni: 0.05 / (No. of SNPs x No. of traits)
  E.g. for a single trait with 10^6 SNPs, p < 5 x 10^-8
• However, more complicated…
  – SNPs are not all independent (LD)
  – LD varies across genome and populations
  – traits are not all independent
• False discovery rate (FDR) increasingly used (proportion of false positives among all positives)
  …if 1 out of 20 hits is false, not so bad
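The two approaches can be compared on a toy example. The sketch below uses the Benjamini-Hochberg step-up procedure as one common way to control FDR; the slide does not name a specific FDR method, and the p values are made up. The procedure assumes independent or positively dependent tests, which is only approximately true for SNPs in LD.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at FDR level q (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p value is under its own threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

gwas_level = bonferroni_threshold(0.05, 10**6)  # single trait, 10^6 SNPs

# Hypothetical p values for ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf_hits = [i for i, p in enumerate(p_values)
             if p <= bonferroni_threshold(0.05, len(p_values))]
fdr_hits = benjamini_hochberg(p_values, q=0.05)
print(gwas_level, bonf_hits, fdr_hits)
```

In this toy set Bonferroni keeps one test and the FDR procedure keeps two, which illustrates the trade-off: more discoveries in exchange for accepting a controlled fraction of false ones.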
Evaluating association in neurobehavioral genetics studies
Serotonin Transporter Promoter Polymorphism Association Studies as of 2002
(counts of studies by reported P value)

Phenotype                  P<.05   P>.05
Schizophrenia                2       7
BP/mood disorder             8      13
OCD                          2       2
Personality traits          12      10
Drug response                3       0
Suicide                      4       1
Anorexia                     0       2
Late Onset Alzheimer's       2       2
Smoking related              4       1
Alcohol related              5       2
Autism                       2       2
Fibromyalgia                 1       0
Panic disorder               0       3
Association of Anxiety-Related Traits with a Polymorphism in the Serotonin Transporter Gene Regulatory Region
Lesch et al. Science. 1996;274(5292):1527-31.
• Two samples (N = 221, N = 284)
• Association with P ~ 0.02
In large samples: No association of 5HTTLPR with temperament
Example from Northern Finland Birth Cohort, N ~ 4000
Influence of Life Stress on Depression: Moderation by a Polymorphism in the 5-HTT Gene
Caspi et al. Science 301: 386-389 (2003)
Interaction Between the Serotonin Transporter Gene (5-HTTLPR), Stressful Life Events, and Risk of Depression: A Meta-analysis
Risch et al. JAMA. 2009;301(23):2462-2471.
Logistic Regression Analyses of Risk of Depression for 14 Studies
Genomewide association analysis
Progress in identifying gene variants for common traits
[Figure: timeline (2000 to 2007) of loci associated with common traits, including KCNJ11, NOD2, CTLA4, PTPN22, TCF7L2, FTO, IL23R, and ORMDL3 among many others. Traits shown: cholesterol, obesity, myocardial infarction, QT interval, atrial fibrillation, type 2 diabetes, prostate cancer, breast cancer, colon cancer, height, age-related macular degeneration, Crohn's disease, type 1 diabetes, systemic lupus erythematosus, asthma, restless leg syndrome, gallstone disease, multiple sclerosis, rheumatoid arthritis, glaucoma.]
Slide from David Altshuler
HDL Association at 16q22.1
HDL Association near LIPC
A success story in neuropsychiatry
Genome-wide association in narcolepsy in Japan (222 cases vs 389 controls)
[Figure: Manhattan plot of -log10(P value), axis 2 to 8, across chromosomes 1 to 22, with the HLA region labeled.]
From Emmanuel Mignot
Narcolepsy is strongly associated with the T-cell receptor alpha locus
J. Hallmayer et al., Nature Genetics 41, 708-711 (2009)
~2000 cases in GWAS + ~2000 cases in replication
Analysis of rs1154155 Genotypes in Three Replication Cohorts and Combined

Ethnicity          AA Case/Ctrl  AC Case/Ctrl  CC Case/Ctrl  OR_AC              OR_CC               OR_C
African Americans  90/117        23/20         0/1           1.50 (0.74,3.04)   0.00 (0.00,22.90)   1.31 (0.68,2.52)
Asians             86/161        296/318       167/120       1.74 (1.27,2.39)   2.61 (1.81,3.76)    1.54 (1.30,1.83)
Caucasians         201/259       132/83        10/6          2.05 (1.45,2.89)   2.15 (0.70,6.77)    1.80 (1.35,2.41)
Replication (MH)                                             1.83 (1.48,2.27)   2.50 (1.80,3.48)    1.59 (1.38,1.83) *
All Samples (MH)                                             1.94 (1.68,2.25)   2.55 (1.92,3.38)    1.69 (1.52,1.88) **

* χ2 = 42.9, P = 5.9 x 10^-11
** χ2 = 94.2, P = 2.8 x 10^-22
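The per-cohort odds ratios in the table can be reproduced from the genotype counts. The sketch below uses the Asian cohort and a Woolf (log-scale) confidence interval; the interval method is an assumption, since the slide does not say how the intervals were computed, and the pooled Mantel-Haenszel (MH) rows are not reproduced here.

```python
import math

def odds_ratio_ci(case_exposed, ctrl_exposed, case_ref, ctrl_ref, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    Not usable when any count is zero (e.g. the African American CC cell).
    """
    odds_ratio = (case_exposed / ctrl_exposed) / (case_ref / ctrl_ref)
    se = math.sqrt(1 / case_exposed + 1 / ctrl_exposed
                   + 1 / case_ref + 1 / ctrl_ref)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high

# Asian replication cohort: (cases, controls) per genotype, from the table.
aa, ac, cc = (86, 161), (296, 318), (167, 120)

or_ac = odds_ratio_ci(ac[0], ac[1], aa[0], aa[1])  # AC vs AA
or_cc = odds_ratio_ci(cc[0], cc[1], aa[0], aa[1])  # CC vs AA

# Allelic comparison: count C and A alleles (two per homozygote).
case_c, ctrl_c = ac[0] + 2 * cc[0], ac[1] + 2 * cc[1]
case_a, ctrl_a = ac[0] + 2 * aa[0], ac[1] + 2 * aa[1]
or_c = odds_ratio_ci(case_c, ctrl_c, case_a, ctrl_a)

for name, (est, low, high) in (("OR_AC", or_ac), ("OR_CC", or_cc), ("OR_C", or_c)):
    print(f"{name} = {est:.2f} ({low:.2f}, {high:.2f})")
```

The point estimates match the table to two decimals; the Woolf intervals are close to, but not identical with, the tabulated ones, which suggests a different interval method was used in the original analysis.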
Strong genome-wide evidence
Known genes and environment explain little of trait variance
Sequencing: the currently unexplored middle of the allelic spectrum
Whole genome sequencing is coming soon…
But we don’t have very good models for it yet
Summary
• The allelic spectrum of complex traits determines the appropriate genetic mapping approach
• Genetic linkage and association studies require stringent statistical thresholds
• Findings from single candidate gene studies have a very low probability of being true positives
• Genome-wide linkage and association studies are beginning to bear fruit for neurobehavioral traits
• Whole-genome sequencing is just around the corner