Causes of regulatory variation in the human genome Manolis Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust Genome Campus Cambridge, UK [email protected]
Transcript
Slide 1
Causes of regulatory variation in the human genome Manolis
Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust
Genome Campus Cambridge, UK [email protected]
Slide 2
Human Genome: ~25,000 genes
1-1.5% of human DNA is coding
Is the remaining 98.5% junk?
Slide 3
Gene expression as a phenotype (Stranger and Dermitzakis 2006)
Altered patterns of gene expression are associated with disease,
e.g., Type 1 diabetes, Burkitt's lymphoma.
Widespread intraspecific variation.
Heritable genetic variation for transcript levels.
Familial aggregation of expression profiles (Cheung et al. 2003).
In humans, ~30% of surveyed loci exhibited a genetic component for
expression differences (Monks et al. 2004; Schadt et al. 2003).
Much of the influential variation is located cis- to the coding locus.
In humans, mouse, and maize, 35%-50% of the genetic basis for
intraspecific differences in transcription level is cis- to the
coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger
et al. 2005; Cheung et al. 2005).
Slide 4
Why study gene expression?
Describe and dissect regulatory variation
Annotate regulatory elements in the human genome
Support disease studies to interpret statistical signals
Distribution of molecular effects in the genome
Natural selection
Slide 5
Outline
Gene expression variation: recent studies
Analysis of gene expression with HapMap phase II SNPs
Update on CNV-expression associations
Natural selection and cis regulatory effects
Slide 6
Nature of regulatory variation (Stranger and Dermitzakis, Human
Genomics 2005)
[Diagram labels: DNA, REG, GENE; i) Pre-mRNA, ii) mRNA, iii) Protein,
iv) DNA; Expression]
Slide 7
Effects of Copy Number Variation on gene expression
Slide 8
Gene expression association mapping (Stranger et al. PLoS Genet 2005)
[Figure: genotype classes AA, AG, GG]
Slide 9
Whole-genome gene expression
~48,000 transcripts: 24,000 RefSeq, 24,000 other transcripts
270 HapMap individuals:
CEU: 30 trios, 90 total
CHB: 45 unrelated
JPT: 45 unrelated
YRI: 30 trios, 90 total
2 IVTs for each person, 2 replicate hybridizations for each IVT
Quantile normalization of all replicates of each individual.
Median normalization across all individuals of a population.
[Diagram: Cell line, RNA, IVT1 and IVT2, rep1 to rep4]
Illumina Human 6 x 2 gene GEX arrays
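The two normalization steps named on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes rank-based quantile normalization against the mean sorted profile, an additive median shift on log-scale values, and averaging of replicates into one profile per individual. The function names and the toy values are hypothetical.

```python
from statistics import mean, median


def quantile_normalize(replicates):
    """Map each replicate onto a shared reference distribution.

    `replicates` is a list of equal-length lists (one value per probe).
    Ties are broken by probe order, which is a simplification.
    """
    n_probes = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    # Reference value for rank k: mean of the k-th smallest value of each replicate.
    reference = [mean(rep[k] for rep in sorted_reps) for k in range(n_probes)]

    normalized = []
    for rep in replicates:
        ranked_probes = sorted(range(n_probes), key=lambda i: rep[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(ranked_probes):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


def median_normalize(profiles):
    """Shift each individual's profile so that all profiles share one median."""
    medians = [median(p) for p in profiles]
    target = median(medians)
    return [[x - m + target for x in p] for p, m in zip(profiles, medians)]


# Toy data: 4 replicate hybridizations (2 IVTs x 2 replicates) of one individual.
replicates = [
    [7.1, 9.4, 6.2, 8.0],
    [7.5, 9.9, 6.0, 8.3],
    [6.9, 9.1, 6.4, 7.8],
    [7.3, 9.6, 6.1, 8.1],
]
normalized = quantile_normalize(replicates)
# One profile per individual: average of the normalized replicates (an assumption).
individual = [mean(values) for values in zip(*normalized)]
```

After quantile normalization every replicate has the same set of values, only assigned to different probes; `median_normalize` would then be applied to the list of per-individual profiles of one population.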
Copy Number Variation dataset
Genome Structural Variation Consortium (Redon et al., Nature, Nov 22,
2006)
Array-CGH using a whole genome tile path array
Median clone size ~170 kb
All 270 HapMap individuals
Quantitative values (log2 ratios) representing diploid genome copy
number, not genotypes.
1117 CNVs called from log2 ratios
Calls based on standard deviation of log2 ratios
Many CNVs experimentally verified
26,563 clones covering 93.7% of the euchromatic genome
Slide 12
Linear regression for SNPs, CNV and expression
[Figure axis label: Clone signal (log2 ratio)]
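The regression named on this slide can be illustrated with a small ordinary least squares fit. This is a sketch under stated assumptions: SNP genotypes are coded additively as allele counts (AA=0, AG=1, GG=2), the clone log2 ratio is used directly as a quantitative predictor, and the helper `regress` and all values are hypothetical toy data rather than results from the study.

```python
def regress(x, y):
    """Ordinary least squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and response must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


# SNP: genotypes coded as the count of one allele (assumed additive coding).
ALLELE_COUNT = {"AA": 0, "AG": 1, "GG": 2}
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG"]
expression = [6.1, 6.9, 8.2, 7.1, 5.8, 7.9]  # toy log2 expression values
snp_fit = regress([ALLELE_COUNT[g] for g in genotypes], expression)

# CNV: the clone's log2 ratio is a quantitative value, not a genotype call.
log2_ratios = [-0.42, 0.03, 0.38, 0.01, -0.35, 0.41]
cnv_fit = regress(log2_ratios, expression)
```

Because the CNV data are log2 ratios rather than discrete genotypes, the same regression applies to both variant types without a separate genotype-calling step.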
Slide 13
SNP cis-analysis: SNPs within 1 Mb of probe midpoint
[Diagram: gene probe with SNPs up to 1 Mb on either side; 2 Mb window]
Slide 14
CNV cis-analysis: clone midpoint within 2 Mb of probe midpoint
[Diagram: gene probe with clones up to 2 Mb on either side; 4 Mb window]
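The SNP and CNV cis windows on the two slides above amount to a distance filter around the probe midpoint. The sketch below assumes all coordinates are on the same chromosome and treats the boundary as inclusive; the function names and coordinates are hypothetical.

```python
SNP_CIS_WINDOW = 1_000_000  # SNP position within 1 Mb of the probe midpoint
CNV_CIS_WINDOW = 2_000_000  # clone midpoint within 2 Mb of the probe midpoint


def probe_midpoint(start, end):
    """Midpoint of a probe given its start and end coordinates (bp)."""
    return (start + end) // 2


def cis_variants(probe_mid, variant_positions, window):
    """Return the positions that lie within `window` bp of the probe midpoint."""
    return [pos for pos in variant_positions if abs(pos - probe_mid) <= window]


# Hypothetical probe and variant coordinates on one chromosome.
mid = probe_midpoint(5_000_000, 5_000_050)
snps = [3_900_000, 4_200_000, 5_000_010, 5_990_000, 6_100_000]
clone_midpoints = [2_900_000, 3_100_000, 6_950_000, 7_200_000]

cis_snps = cis_variants(mid, snps, SNP_CIS_WINDOW)
cis_clones = cis_variants(mid, clone_midpoints, CNV_CIS_WINDOW)
```

The wider window for clones is consistent with their size (median ~170 kb in the CNV dataset), since a clone midpoint can sit far from the part of the clone closest to the gene.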
Slide 15
Permutation (Doerge and Churchill 1996)
[Diagram: GENOTYPES matrix (rows g11 ... g1n, g21 ... g2n, g31 ... g3n,
through gi1 ... gin) permuted against GENE EXPRESSION (Exp1, Exp2, Exp3,
through Expi)]
- 10,000 permutations; each time keep the lowest p-value
- Null distribution of 10,000 extreme p-values
- Compare observed p-values to the tails of the null
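The permutation scheme above can be sketched for a single gene as follows. Two simplifications are assumptions of this sketch rather than details from the slides: the largest squared correlation is kept in place of the lowest p-value (for simple regressions on the same sample size the two give the same ranking), and expression values are shuffled relative to genotypes, which is equivalent to shuffling the genotypes as whole individuals. Function names and data are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def best_statistic(genotype_rows, expression):
    """Most extreme association across all variants tested for this gene."""
    return max(r_squared(row, expression) for row in genotype_rows)


def permutation_p_value(genotype_rows, expression, n_permutations=10_000, seed=0):
    """Empirical p-value of the best observed association against the null
    distribution of the best association in each permuted data set."""
    rng = random.Random(seed)
    observed = best_statistic(genotype_rows, expression)
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks the genotype-expression pairing
        if best_statistic(genotype_rows, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


# Toy data: 30 individuals, one associated variant and one unrelated variant.
toy_rng = random.Random(1)
causal = [i % 3 for i in range(30)]
other = [toy_rng.choice([0, 1, 2]) for _ in range(30)]
expression = [7.0 + 0.8 * g + toy_rng.gauss(0, 0.3) for g in causal]
p_adj = permutation_p_value([causal, other], expression, n_permutations=500)
```

Keeping only the most extreme statistic per permutation accounts for the number of variants tested for the gene, and shuffling whole individuals leaves the correlation between nearby variants intact.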
Slide 16
Stranger et al. Science 2007 CNV vs. SNP associations
Slide 17
Slide 18
CNVs and SNPs mostly capture different effects
Relative impact on gene expression: 82% SNPs, 18% CNVs
Only 13% of genes with a CNV association also had a SNP association
in the same population; these are biased toward large effect size.
CNV and SNP variation are highly correlated (p-value 0.001).
Slide 19
Custom vs. genome-wide arrays (Stranger et al. 2005 PLoS Genet;
Stranger et al. 2007 Science)
2 batches of 60 CEU individuals grown independently at two different
labs
RNA extraction and labelling by different labs and people
Run on custom and genome-wide (gw) Illumina arrays
97% of associations at the 0.05 permutation threshold from the custom
array analysis were also detected in the gw analysis
Slide 20
HapMap phase II analysis
~4 million SNP genotypes made publicly available for the 270 HapMap
individuals.
Density: 1 SNP per 700 bp
Includes ~50% of expected common SNPs in these populations.
2.2 million SNPs analyzed (MAF > 0.05)
Slide 21
Phase I vs. Phase II
cis- significant genes (0.001)

        phase I HapMap   both   phase II HapMap
CEU     286              258    299
CHB     317              269    318
JPT     337              297    341
YRI     356              310    394

Percentages listed on the slide: 90%, 85%, 87%, 86%, 85%, 87%, 79%
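The percentages on this slide appear to be the share of each analysis's significant genes that is also found in the other analysis. The sketch below recomputes those shares from the counts, assuming each population's three numbers are, in order, phase I, both, and phase II; that reading of the flattened slide text, and the function name, are assumptions.

```python
# (phase I genes, genes found in both, phase II genes) per population,
# read from the slide under the column order assumed above.
counts = {
    "CEU": (286, 258, 299),
    "CHB": (317, 269, 318),
    "JPT": (337, 297, 341),
    "YRI": (356, 310, 394),
}


def overlap_percentages(phase1, both, phase2):
    """Share of phase I genes and of phase II genes that are in both sets."""
    return 100 * both / phase1, 100 * both / phase2


for population, (phase1, both, phase2) in counts.items():
    of_phase1, of_phase2 = overlap_percentages(phase1, both, phase2)
    print(f"{population}: {of_phase1:.1f}% of phase I, {of_phase2:.1f}% of phase II")
```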
Slide 22
Slide 23
Population sharing of cis- associations
Slide 24
Associated SNP position relative to TSS
Slide 25
Distribution of regulatory elements around the TSS (ENCODE, Nature
2007)
Slide 26
Direction of allelic effect: same SNP-gene combination across
populations
[Figure: log2 expression in Population 1 and Population 2; panels
labelled AGREEMENT and OPPOSITE]
Conditional permutations
Permute data within each population separately, then perform the
test (x 4)
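Conditional permutation can be sketched as shuffling expression values only among individuals of the same population before pooling, so that between-population differences in mean expression and allele frequency are present in the null as well. The sketch assumes the "x 4" refers to the four HapMap populations; the function names and toy values are hypothetical, and the pooled vectors would be passed to whatever association test is in use.

```python
import random

POPULATIONS = ["CEU", "CHB", "JPT", "YRI"]


def permute_within_populations(expression_by_pop, rng):
    """Shuffle expression values separately inside each population."""
    permuted = {}
    for population, values in expression_by_pop.items():
        shuffled = list(values)
        rng.shuffle(shuffled)
        permuted[population] = shuffled
    return permuted


def pooled(values_by_pop, order):
    """Concatenate per-population values in a fixed population order."""
    return [value for population in order for value in values_by_pop[population]]


# Toy data: allele counts and log2 expression for three individuals per population.
genotypes = {
    "CEU": [0, 1, 2], "CHB": [0, 0, 1], "JPT": [1, 0, 0], "YRI": [2, 2, 1],
}
expression = {
    "CEU": [6.0, 6.5, 7.1], "CHB": [8.0, 8.1, 8.6],
    "JPT": [8.4, 7.9, 8.0], "YRI": [5.2, 5.1, 4.8],
}

rng = random.Random(0)
permuted = permute_within_populations(expression, rng)
pooled_genotypes = pooled(genotypes, POPULATIONS)
pooled_permuted_expression = pooled(permuted, POPULATIONS)
```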
Slide 30
Multi-population analysis
Slide 31
Figure 2A
[Figure labels: Number of populations sharing association in cis
(single population analysis); Proportion of single-population cis
associated genes detected in multi-population analysis]
Slide 32
SGPP2
Slide 33
Trans- phase II HapMap association
Biological hypotheses: functional categories
Regulatory SNPs identified from cis- analysis (rSNPs; 52%)
Non-synonymous SNPs (nsSNPs; 39%)
Splice site SNPs (7%)
miRNA SNPs (1%)
[Diagram labels: DNA, REG, GENE]
Genome-wide associations: ~25,000 SNPs per population x 14,072 genes

       regulatory SNPs (cis 0.001)   ns SNPs             splice SNPs       miRNA SNPs
       ratio   p-value               ratio   p-value     ratio   p-value   ratio   p-value
CEU    6.05    3.23E-24              0.15    1.22E-21    0.49    0.07      0       1
CHB    3.69    7.90E-10              0.24    1.91E-09    0.76    0.71      0       1
JPT    3.15    2.06E-07              0.31    8.82E-07    0.71    0.55      0       1

Enrichment of regulatory SNPs and deficit of nsSNPs in trans-
associations
3-6x more likely that a cis regulatory effect explains a trans
regulatory effect
Slide 36
Multi-pop CNV analysis
Combined 4 populations: 193 genes at 0.001 (48 overlap with the 99
from single population analysis)
Combined 3 populations: 173 genes at 0.001 (42 overlap with the 99
from single population analysis)
Slide 37
CNV trans effects
Biological pathway
Variable expression
Slide 38
Trans-position
Slide 39
Trans effects - CEU
Slide 40
Trans effects - YRI
Slide 41
Gene expression and natural selection
[Figure axis labels: TSS, -log p-value]
With Sridhar Kudaravalli and Jonathan Pritchard (unpublished)
Slide 42
Gene expression and natural selection With Sridhar Kudaravalli
and Jonathan Pritchard (unpublished)
Slide 43
Co-segregating regulatory variants can drive differential
isoform expression
Slide 44
SUMMARY
Cis- and trans-acting genetic variation influencing mRNA levels.
CNV effects detected are largely not captured by SNPs.
Structural variation (copy number polymorphism) influences
transcript level variation.
Many detected associations are shared across human populations:
replication of effects.
Signal concentrated symmetrically within 100 kb of the promoter.
Trans-acting effects of CNVs: interpretation.
Primary effects of trans associations are largely cis regulatory
effects.
Cis regulatory effects under positive selection.
Slide 45
Acknowledgements
Barbara Stranger, Alexandra Nica, Antigone Dimas, Christine Bird,
Matthew Forrest, Catherine Ingle, Claude Beazley, Panos Deloukas,
Matt Hurles
Cambridge University: Mark Dunning, Natalie Thorne, Simon Tavaré
Illumina: Jill Orwick, Mark Gibbs
Stanford: Daphne Koller
Genome Structural Variation Consortium: Richard Redon, Nigel Carter,
Charles Lee, Chris Tyler-Smith, Stephen Scherer
The HapMap Consortium
Wellcome Trust for funding