Identifying genetic variants for complex disorders
Peristera Paschou, PhD
Associate Professor of Population Genetics
Dept. of Molecular Biology and Genetics
Democritus University of Thrace
Marianthi Georgitsi, PhD
Assistant Professor of Medical Biology-Genetics
Dept. of Medicine
Aristotle University of Thessaloniki
The average genome (2x3 billion bp) contains:
– 3-4 million single nucleotide variations, compared to the reference sequence (Single Nucleotide Polymorphisms – SNPs)
– ~0.5 million small insertions or deletions ‘indels’ (1-100bp)
– ~5,000 larger insertions or deletions (>100bp)
We are quite similar, but we are different…
– Variation across all (~23,000) genes (the ‘exome’): ~18,000 variants
– The function of a large portion of the genome is not known.
Claude Monet, 1891. Haystack at Sunset near Giverny
Looking for genes in 23 pairs of chromosomes…
SNPs can be used to create dense marker maps
[Figure labels: microsatellites, SNPs]
Recent genome-wide association studies use millions of SNPs.
Clinical phenotype
Indications for a genetic basis?
Study design: families, sib pairs, single patients
Sample and data collection
Analysis: linkage analysis, association studies
Define candidate regions
Physical mapping / gene identification
Positional cloning…
Linkage analysis
Large families, sib-pair studies
1989: The cystic fibrosis gene is identified
One of the first successful positional cloning studies
Linkage analysis
• What is the probability that the disease-causing mutation and the polymorphism we are studying are linked (at a specific genetic distance) vs non-linked?
gametes
Recombination and linkage
• Two loci are linked when little or no recombination occurs between them.
• Recombination: exchange of DNA segments between homologous chromosomes during meiosis and gamete formation.
Recombination and linkage
• Two loci on different chromosomes cannot be linked.
• Linkage increases as physical distance decreases.
• A distance of 1 cM between two genetic loci means that 1% of the gametes produced will be recombinant.
LOD score (logarithm of the odds)
Logarithm (base 10) of the odds of linkage: the ratio of the likelihood of obtaining the test data if the two loci are indeed linked to the likelihood of observing the same data purely by chance (i.e. if the two loci are not linked).
LOD = 3.0 means odds of 1000:1 in favour of linkage
Corresponds to an error rate of 5%
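The LOD definition can be made concrete with a small calculation. This is a minimal sketch, assuming the simplest textbook setting of phase-known meioses that can each be scored as recombinant or non-recombinant; the function names `lod_score` and `max_lod` and the example counts are illustrative and do not come from the slides. Under that assumption, LOD(θ) = log10[ θ^R (1 − θ)^(N − R) / 0.5^N ], where R is the number of recombinants among N meioses.

```python
import math


def lod_score(recombinants, non_recombinants, theta):
    """LOD score at recombination fraction theta (assumes phase-known meioses)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + non_recombinants
    # Likelihood of the data if the loci are linked at recombination fraction theta.
    # Python evaluates 0.0 ** 0 as 1.0, so theta = 0 with no recombinants is handled.
    linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood of the same data under free recombination (theta = 0.5, not linked).
    unlinked = 0.5 ** n
    if linked == 0.0:
        return float("-inf")
    return math.log10(linked / unlinked)


def max_lod(recombinants, non_recombinants, steps=50):
    """Scan theta over an evenly spaced grid on [0, 0.5] and return (theta, LOD) at the maximum."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, non_recombinants, t))
    return best, lod_score(recombinants, non_recombinants, best)


# Hypothetical family data: 1 recombinant among 10 informative meioses.
theta_hat, z_max = max_lod(recombinants=1, non_recombinants=9)
print(f"theta = {theta_hat:.2f}, LOD = {z_max:.2f}")
```

With these made-up counts the maximum falls at θ = 0.10 with a LOD of about 1.6, below the threshold of 3.0. Ten fully informative meioses with no recombinants would give a LOD of about 3.0 at θ = 0.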
Linkage analysis
– Great power for gene identification for Mendelian disorders
– Limited success when used for multifactorial disorders
[Figure: example pedigrees annotated with marker genotypes (alleles 1 and 2)]
trios
Population studies
Association studies
Genetic Association Studies
Aim: To uncover associations between genetic data (i.e. alleles or genotypes) of
commonly occurring genetic variants and a trait or medical phenotype (i.e. disease)
under study, using statistical analyses and a sufficiently large sample (typically
cases versus controls), in order to provide statistical support that these variants
contribute to trait/disease risk.
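One common way to test a single variant in a case-control sample is a chi-square test on allele counts. The sketch below is one possible illustration, not the analysis used in any study cited in these slides; the function name `allelic_chi_square` and all counts are hypothetical. It relies on the fact that, for one degree of freedom, the chi-square upper-tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio). All four counts must be positive.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in cases and controls.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 1,000 cases and 1,000 controls, i.e. 2,000 alleles per group.
chi2, p, odds = allelic_chi_square(case_alt=600, case_ref=1400,
                                   control_alt=500, control_ref=1500)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, OR = {odds:.2f}")
```

Counting alleles rather than individuals assumes Hardy-Weinberg equilibrium in the combined sample; genotype-based tests avoid that assumption at the cost of an extra degree of freedom.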
Examples of complex diseases
Type II Diabetes Mellitus
Obesity
Cardiovascular diseases
Cancer (non-hereditary)
Osteoarthritis
Autoimmune disorders
Alzheimer’s disease
….
Schizophrenia
Autism
Bipolar Disorder
Obsessive Compulsive Disorder
Learning disabilities (Dyslexia)
….
13/04/2016
Linkage disequilibrium (LD)
The non-random association of alleles at different loci
Essential tool for genetic association studies
Linkage Disequilibrium: Example
SNP1: A 50%, C 50%
SNP2: A 50%, G 50%
Expected haplotype frequencies if there is no LD:
SNP1  SNP2  Expected frequency
A     A     0.5 × 0.5 = 0.25
A     G     0.5 × 0.5 = 0.25
C     A     0.5 × 0.5 = 0.25
C     G     0.5 × 0.5 = 0.25
If total LD exists, only 2 haplotypes will be observed, e.g.:
A G
C A
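The example above can be checked numerically with the standard LD statistics D, D′ and r². These measures are not named on the slide and are brought in here as the usual way to quantify LD; the function name `ld_statistics` is illustrative. D is the observed haplotype frequency minus the frequency expected from the allele frequencies alone.

```python
def ld_statistics(hap_freqs, allele1, allele2):
    """Return (D, D', r^2) for two biallelic SNPs.

    hap_freqs maps (allele_at_snp1, allele_at_snp2) -> haplotype frequency.
    allele1 and allele2 pick the allele at each SNP used to define D.
    """
    if abs(sum(hap_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")

    # Allele frequencies are the marginal sums of the haplotype frequencies.
    p_a = sum(f for (a, _), f in hap_freqs.items() if a == allele1)
    p_b = sum(f for (_, b), f in hap_freqs.items() if b == allele2)
    p_ab = hap_freqs.get((allele1, allele2), 0.0)

    # D: observed haplotype frequency minus the product expected without LD.
    d = p_ab - p_a * p_b

    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two SNPs.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# No LD: all four haplotypes at 0.5 x 0.5 = 0.25.
no_ld = {("A", "A"): 0.25, ("A", "G"): 0.25, ("C", "A"): 0.25, ("C", "G"): 0.25}
# Total LD: only the haplotypes A-G and C-A are observed.
total_ld = {("A", "G"): 0.5, ("C", "A"): 0.5}

print(ld_statistics(no_ld, "A", "A"))
print(ld_statistics(total_ld, "A", "A"))
```

For the equilibrium table all three statistics are 0; for the two-haplotype case D is -0.25 and both D′ and r² equal 1.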
LD around a mutation
[Figure: marker haplotypes (alleles 1 and 2) around a mutation over successive generations (0, k, g); the stretch of ancestral alleles flanking the mutation becomes shorter in later generations]
[Figure: an ancestral chromosome carrying a mutation and the extant chromosomes derived from it. Legend: ancestral DNA sequence; novel DNA sequence due to recombination. Successive builds mark the mutation, a nearby polymorphism, and SNPs. Source: Jobling, Hurles & Tyler-Smith, Human Evolutionary Genetics]
Genome-wide Association Studies (GWAS)
Ideally…
Identify all SNPs (e.g. 15,000,000)
Collect a very large sample (e.g. 1,000 patients and 1,000 control individuals)
Genotype all individuals for all SNPs
30 billion genotypes!
Cost in 2002: 50 cents/genotype.
$15 billion for each disorder!
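The arithmetic behind these figures can be reproduced directly. The numbers are the ones quoted on the slide; the function name `genotyping_cost` is only illustrative.

```python
def genotyping_cost(n_snps, n_individuals, cost_per_genotype_usd):
    """Return (number of genotypes, total cost in USD) for a full genotyping study."""
    n_genotypes = n_snps * n_individuals
    return n_genotypes, n_genotypes * cost_per_genotype_usd


# 15 million SNPs, 1,000 patients plus 1,000 controls, 50 cents per genotype (2002).
n_genotypes, total_usd = genotyping_cost(15_000_000, 1_000 + 1_000, 0.50)
print(f"{n_genotypes:,} genotypes, ${total_usd:,.0f}")
```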
Genome structure allows the selection of tagging SNPs
Hirschhorn & Daly, Nat Rev Genet 2005
Candidate gene or GWAS signal
300,000 SNPs suffice for a rough scan of the human genome
Paschou et al. Genome Research 2007
De Bakker et al. Nature Genetics 2005
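The idea of tagging can be sketched as a greedy selection over pairwise r² values. This is a generic illustration and not the method of either paper cited above; the function name `greedy_tag_snps`, the SNP labels, the r² values and the 0.8 threshold are all assumptions made for the example.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs so that every SNP is itself a tag or has
    r^2 >= threshold with at least one tag.

    snps: list of SNP identifiers.
    r2:   dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2
          (pairs that are missing are treated as r^2 = 0).
    """
    def captured_by(tag):
        # A SNP always captures itself, plus every SNP in strong LD with it.
        return {s for s in snps
                if s == tag or r2.get(frozenset((s, tag)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs
        # (ties go to the SNP listed first).
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical region with two LD blocks: snp1-snp3 and snp4-snp5.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.90,
    frozenset(("snp4", "snp5")): 0.82,
}
print(greedy_tag_snps(snps, r2))
```

In this made-up region two tag SNPs (`snp1` and `snp4`) stand in for all five. Greedy selection is simple but is not guaranteed to find the smallest possible tag set.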
Genetic Association Studies versus Linkage studies

Genetic association studies
– Appropriate for complex phenotypes
– Increased genetic (locus) heterogeneity (i.e. multigenic variance): many genes, many variants
– Common variants (“common disease – common variants” concept): modest disease risk per variant
– Inadequate power to detect associations
– Large numbers of cases and controls (healthy individuals), or family-based associations (trios, i.e. affected child and both parents), or extreme phenotypes
– Mostly SNPs (tagging SNPs): a single SNP or a few SNPs within a chromosomal region that capture (i.e. “tag”) most of the common DNA variation in that region, owing to Linkage Disequilibrium (LD)
– Associated SNPs are most often not coding (they can be in LD with the causal variant or have a regulatory effect)

Linkage studies
– Appropriate for Mendelian traits
– Reduced genetic heterogeneity (one or a few genes)
– Typically rare variants (“rare disease – rare variants” concept) in all affected individuals
– Large, multigenerational pedigrees: detection power (parametric, non-parametric linkage analyses)
– Typically microsatellite markers, SNPs, or long stretches of chromosomal homozygosity (for recessive traits only)

Note: LD is defined as the co-inheritance (non-random association), within a population, of alleles at the studied genetic markers (SNPs) that are unlikely to be separated by homologous recombination (“linked” markers).
Technology reduces the cost
2007
– Candidate genes: $1 per SNP, 1–100 SNPs
– Genome genotyping (GWAS): $250, 1 million SNPs
Genome-wide association studies (GWAS)
Patients Controls
Genotyping, e.g. 600,000 SNPs
Comparison of alleles – Statistical analysis
Thousands of samples – The more the better!
September 2012: 1,416 studies, 7,688 associated SNPs
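Why thousands of samples are needed can be illustrated with a rough power calculation. This is a sketch under stated assumptions, not a substitute for a dedicated power tool: it uses a normal approximation to a two-sided comparison of allele frequencies, and the function names, the control allele frequency of 0.3, the odds ratio of 1.2 and the significance level of 5e-8 (a commonly used genome-wide threshold) are all values chosen for the example rather than taken from the slides.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha):
    """Approximate power of a two-sided test comparing allele frequencies (normal approximation)."""
    p1 = case_allele_frequency(control_freq, odds_ratio)
    p0 = control_freq
    # Each individual contributes two alleles.
    alleles_cases, alleles_controls = 2 * n_cases, 2 * n_controls
    se = math.sqrt(p1 * (1 - p1) / alleles_cases + p0 * (1 - p0) / alleles_controls)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(p1 - p0) / se
    # Probability that the test statistic falls beyond either critical value.
    return NormalDist().cdf(z_effect - z_crit) + NormalDist().cdf(-z_effect - z_crit)


# Assumed scenario: risk allele frequency 0.3 in controls, odds ratio 1.2, alpha = 5e-8.
for n in (1_000, 10_000):
    power = allelic_test_power(n, n, control_freq=0.3, odds_ratio=1.2, alpha=5e-8)
    print(f"{n:>6} cases + {n:>6} controls: power = {power:.3f}")
```

Under these assumed values the power is well below 5% with 1,000 cases and 1,000 controls and above 95% with 10,000 of each, which is consistent with the modest per-variant risk typical of complex disorders.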