BSc Course: "Experimental design“ Genome-wide Association Studies Sven Bergmann Department of Medical Genetics University of Lausanne Rue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerland work: ++41-21-692-5452 cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann
BSc Course: "Experimental design"
Genome-wide Association Studies
Sven Bergmann
Department of Medical Genetics
University of Lausanne
Rue de Bugnon 27 - DGM 328
CH-1005 Lausanne, Switzerland
work: ++41-21-692-5452
cell: ++41-78-663-4980
http://serverdgm.unil.ch/bergmann
Overview
• Population stratification
• Associations: Basics
• Whole genome associations
• Genotype imputation
• Uncertain genotypes
• New Methods
CoLaus = Cohort Lausanne
6,189 individuals
Phenotypes: 159 measurements, 144 questions
Genotypes: 500,000 SNPs
Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…
ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…
Genetic variation in SNPs (Single Nucleotide Polymorphisms)
Analysis of Genotypes only
Principal Component Analysis reveals the SNP vectors that explain the largest variation in the data
Example: 2 PCs for 3D data (http://ordination.okstate.edu/PCA.htm)
1. Raw data points: {a, …, z}
2. Normalized data points: zero mean (& unit std)!
3. Identification of axes with the most variance: most variance is along PCA1; the direction of most variance perpendicular to PCA1 defines PCA2
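The PCA steps (normalize, then find the orthogonal axes of largest variance) can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `pca` and the simulated 3D points are hypothetical and not taken from the course material. For genotype data, rows would be individuals and columns would be SNPs.

```python
import numpy as np

def pca(data, n_components=2):
    """Return (components, scores, variances) for data with one sample per row."""
    x = np.asarray(data, dtype=float)
    # Normalize each column to zero mean and unit standard deviation.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Covariance matrix of the normalized columns.
    cov = np.cov(z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T      # each row is one principal axis
    scores = z @ components.T             # coordinates of the samples on those axes
    return components, scores, eigvals[order]

rng = np.random.default_rng(0)
# Hypothetical 3D data: 26 points (a..z) spread mostly along one direction.
latent = rng.normal(size=(26, 1))
points = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.3 * rng.normal(size=(26, 3))
components, scores, variances = pca(points)
```

The first row of `components` is the direction of most variance (PCA1); the second row is perpendicular to it and carries the next largest share (PCA2).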
Ethnic groups cluster according to geographic distances
[Figure: PCA of POPRES cohort; axes PC1 and PC2]
Phenotypic variation:
[Figure: distribution of a trait; x-axis from -6 to 6, y-axis from 0 to 1.2]
What is association?
[Figure: chromosome with SNPs and a trait variant]
Genetic variation yields phenotypic variation
Population with ‘ ’ allele Population with ‘ ’ allele
Distributions of “trait”
Quantifying Significance
T-test
t-value (significance) can be translated into p-value (probability)
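To make the step from t-value to p-value concrete, here is a minimal sketch using only the Python standard library. It assumes Welch's form of the t-statistic and a normal approximation for the p-value, which is reasonable only for large samples; an exact test would use Student's t-distribution. The trait values and function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference in means of two samples."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

def approx_two_sided_p(t):
    """Two-sided p-value under the normal approximation to the t-distribution."""
    return erfc(abs(t) / sqrt(2.0))

# Hypothetical trait values in two groups carrying different alleles.
allele_1 = [4.8, 5.1, 5.4, 4.9, 5.2, 5.0, 5.3, 4.7]
allele_2 = [5.6, 5.9, 5.5, 6.1, 5.8, 5.7, 6.0, 5.4]
t_value = welch_t(allele_1, allele_2)
p_value = approx_two_sided_p(t_value)
```

A large absolute t-value corresponds to a small p-value, i.e. the two trait distributions are unlikely to share the same mean.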
Association using regression
[Figure: phenotype plotted against genotype and against coded genotype]
Regression analysis
$Y = \alpha + X\beta + \varepsilon$
with “response” $Y$, “feature(s)” $X$, “intercept” $\alpha$, “coefficients” $\beta$ and “residuals” $\varepsilon$
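Regression formalism
$f(y_i) = \beta\, g_i + \varepsilon_i$
• $f$: (monotonic) transformation
• $y_i$: phenotype (response variable) of individual i
• $\beta$: effect size (regression coefficient), tested via p(β=0)
• $g_i$: coded genotype (feature) of individual i
• $\varepsilon_i$: error (residual)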
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
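The goal stated above can be illustrated with a small least-squares fit for a single SNP. This is a sketch under stated assumptions: genotypes are coded 0/1/2, the phenotype is used untransformed, an intercept is fitted, and the p-value for β=0 uses a normal approximation (adequate for large cohorts, not for ten individuals). The function name and the data are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean

def simple_regression(genotypes, phenotypes):
    """Least-squares fit of phenotype = intercept + beta * genotype."""
    n = len(genotypes)
    g_bar, y_bar = mean(genotypes), mean(phenotypes)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                      # effect size
    intercept = y_bar - beta * g_bar
    residuals = [y - (intercept + beta * g) for g, y in zip(genotypes, phenotypes)]
    # Standard error of beta from the residual variance (n - 2 degrees of freedom).
    se = sqrt(sum(r * r for r in residuals) / (n - 2) / sxx)
    # Two-sided p-value for beta = 0, normal approximation.
    p = erfc(abs(beta / se) / sqrt(2.0))
    return beta, intercept, se, p

# Hypothetical coded genotypes and phenotypes for ten individuals.
genotypes = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
phenotypes = [1.0, 1.2, 1.4, 1.7, 2.1, 1.9, 0.9, 1.6, 2.2, 1.5]
beta, intercept, se, p_value = simple_regression(genotypes, phenotypes)
```

Here `beta` is the estimated change in phenotype per additional copy of the coded allele.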
Standard approach: Evaluate significance for association of each SNP independently:
Whole Genome Associations
[Figure: Manhattan plot; significance vs. chromosome & position]
[Figure: Quantile-quantile plot; observed significance vs. expected significance]
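For the quantile-quantile plot, the expected significance can be derived from the fact that p-values are uniformly distributed under the null hypothesis. The sketch below assumes the common plotting convention k / (n + 1) for the k-th smallest of n p-values; the function name and the p-values are hypothetical.

```python
from math import log10

def qq_points(p_values):
    """Pair expected and observed -log10(p) values for a quantile-quantile plot."""
    n = len(p_values)
    observed = sorted(-log10(p) for p in p_values)
    # Plotting position k / (n + 1) for the k-th smallest p-value under the null.
    expected = sorted(-log10(k / (n + 1)) for k in range(1, n + 1))
    return list(zip(expected, observed))

# Hypothetical p-values from five independent SNP tests.
points = qq_points([0.8, 0.03, 0.45, 0.0002, 0.6])
```

Points far above the diagonal (observed much larger than expected) indicate stronger associations than chance alone would produce.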
GWA screens include a large number of statistical tests!
• Huge burden of correcting for multiple testing!
• Can detect only highly significant associations (p < α / #(tests) ~ 10^-7)
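The threshold p < α / #(tests) is the Bonferroni correction. A minimal sketch, assuming α = 0.05 and the 500,000 SNPs mentioned for the cohort; the SNP names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at most alpha."""
    return alpha / n_tests

def significant_snps(p_values, n_tests, alpha=0.05):
    """Names of SNPs whose p-value passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [snp for snp, p in p_values.items() if p < threshold]

threshold = bonferroni_threshold(0.05, 500_000)   # roughly 1e-7
hits = significant_snps({"snp_a": 3e-9, "snp_b": 1e-5, "snp_c": 0.2}, n_tests=500_000)
```

A p-value of 1e-5 would look convincing for a single test, but it does not survive the correction for half a million tests.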
GWAS: >20 publications in 2006/2007
Massive!
Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene
Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka, Zoltán Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F. Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth, Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik, James Scott, Unnur Thorsteinsdottir, Stefania Bandinelli, John Chambers, Kari Stefansson, Gérard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann
Current insights from GWAS:
• Well-powered (meta-)studies with (tens of) thousands of samples have identified a few (dozen) candidate loci with highly significant associations
• Many of these associations have been replicated in independent studies
• Each locus explains but a tiny (<1%) fraction of the phenotypic variance
• All significant loci together explain only a small fraction (<10%) of the variance
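A quick calculation shows why a single locus typically explains so little. The sketch assumes an additive model with genotypes coded 0/1/2 and Hardy-Weinberg equilibrium, so that the genotype variance is 2p(1-p) for allele frequency p. The effect size and allele frequency below are hypothetical and chosen only for illustration.

```python
def variance_explained(beta, allele_freq, phenotype_variance=1.0):
    """Fraction of phenotypic variance explained by one additive SNP."""
    # Genotype variance under Hardy-Weinberg equilibrium for 0/1/2 coding.
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return beta ** 2 * genotype_variance / phenotype_variance

# Hypothetical SNP: effect of 0.05 phenotype standard deviations per allele,
# allele frequency 0.3, phenotype standardized to unit variance.
fraction = variance_explained(beta=0.05, allele_freq=0.3)   # about 0.1%
```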
David Goldstein:
“~93,000 SNPs would be required to explain 80% of the population variation in height.”
Common Genetic Variation and Human Traits, NEJM 360;17
The “Missing variance” (Non-)Problem
Why should a simplistic (additive) model using incomplete or approximate features possibly explain anything close to the genetic variance of a complex trait?
… and it doesn’t have to, as long as Genome-wide Association Studies are meant as an undirected approach to elucidate new candidate loci that impact the trait!
So what do we miss?
1. Other variants like Copy Number Variations or epigenetics may play an important role
2. Interactions between genetic variants (GxG) or with the environment (GxE)
3. Many causal variants may be rare and/or poorly tagged by the measured SNPs
4. Many causal variants may have very small effect sizes
5. Overestimation of heritabilities from twin studies?
[Figure: intensity of allele A vs. intensity of allele G]
Genotypes are called with varying uncertainty
Some genotypes are missing altogether …
… but are imputed with different uncertainties
… using Linkage Disequilibrium!
Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles.
[Figure: LD between markers 1, 2, 3, …, n along a chromosome]
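The correlation between alleles at two markers can be quantified with the standard LD measures D and r². The sketch below assumes two biallelic markers and phased haplotypes given as (allele at marker 1, allele at marker 2) pairs coded 0/1; the haplotype counts are hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) for two biallelic markers from 0/1-coded haplotype pairs."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at marker 1
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at marker 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # deviation from independence
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))   # squared allelic correlation
    return d, r2

# Hypothetical sample of ten haplotypes: mostly transmitted together.
haplotypes = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
d, r2 = linkage_disequilibrium(haplotypes)
```

An r² close to 1 means one marker is a good proxy for the other, which is what imputation exploits; the function assumes both markers are polymorphic in the sample.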
Conclusion
• Genotypic markers are always measured or inferred with some degree of uncertainty
• Association methods should take into account this uncertainty
Two easy ways of dealing with uncertain genotypes
1. Genotype calling: Choose the most likely genotype and continue as if it were true (p11=10%, p12=20%, p22=70% => G=2)
2. Mean genotype: Use the weighted average genotype (p11=10%, p12=20%, p22=70% => G=1.6)
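Both approaches take only a line or two of code. The sketch assumes the genotypes 11, 12 and 22 are coded as 0, 1 and 2, which reproduces the values in the example above; the function names are hypothetical.

```python
def called_genotype(probs):
    """Genotype calling: the most likely coded genotype (0, 1 or 2)."""
    return max(range(len(probs)), key=lambda g: probs[g])

def mean_genotype(probs):
    """Mean genotype: probability-weighted average of the coded genotypes."""
    return sum(g * p for g, p in enumerate(probs))

# Probabilities (p11, p12, p22) from the example above.
probs = (0.10, 0.20, 0.70)
call = called_genotype(probs)      # most likely genotype
dosage = mean_genotype(probs)      # weighted average genotype
```

Genotype calling discards the 30% probability of the other genotypes, whereas the mean genotype carries part of that uncertainty into the association test.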
How could our models become more predictive?
1. Improve measurements:
- measure more variants (e.g. by UHS)
- measure other variants (e.g. CNVs)
- measure “molecular phenotypes”
2. Improve models:
- proper integration of uncertainties
- include interactions
- multi-layer models
Towards a layered Systems Model
We need intermediate (molecular) phenotypes to better understand organismal phenotypes