Inference of cis and trans regulatory variation in the human genome Manolis Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust Genome Campus.
Post on 17-Dec-2015
Inference of cis and trans regulatory variation in the human genome
Manolis Dermitzakis
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Cambridge, UK
md4@sanger.ac.uk
Gene expression
• Altered patterns of gene expression are associated with disease.
– e.g., Type 1 diabetes, Burkitt's lymphoma.
• Widespread intraspecific variation.
• Heritable genetic variation for transcript levels.
– Familial aggregation of expression profiles (Cheung et al. 2003).
– In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
• Much of the influential variation is located cis- to the coding locus.
– In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005).
Nature of regulatory variation
Stranger and Dermitzakis, Human Genomics 2005
[Figure: DNA with a regulatory element (REG) and a gene (GENE); labelled stages i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA; label: Expression]
Effects of Copy Number Variation on gene expression
[Figure: REG/GENE diagrams in four panels: Additional gene copy; Increase of distance from regulatory element; New regulatory element; Gene interruption]
Gene expression association mapping
Stranger et al. PLoS Genet 2005
[Figure: histograms of expression levels by genotype (AA, AG, GG); x-axis: Expression Levels; y-axis: Frequency]
Whole-genome gene expression
~48,000 transcripts
– 24,000 RefSeq
– 24,000 other transcripts
270 HapMap individuals:
– CEU: 30 trios, 90 total
– CHB: 45 unrelated
– JPT: 45 unrelated
– YRI: 30 trios, 90 total
2 IVTs each person
2 replicate hybridizations each IVT
Quantile normalization of all replicates of each individual.
Median normalization across all individuals of a population.
[Diagram: Cell line, then RNA, then IVT1 and IVT2, then rep1, rep2, rep3, rep4]
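The two normalization steps named on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names, the tie handling, the choice to shift (rather than scale) profiles in the median step, and the example signals are all assumptions.

```python
from statistics import mean, median

def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one per replicate hybridization).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position (a simplification).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the values found at each rank across all arrays.
    rank_means = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices, lowest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def median_normalize(profiles):
    """Shift each profile so its median equals the median of all profile medians.

    Assumption: values are on a log scale, so shifting is appropriate.
    """
    medians = [median(p) for p in profiles]
    target = median(medians)
    return [[v - m + target for v in p] for p, m in zip(profiles, medians)]

# Hypothetical log-scale signals: 4 replicate hybridizations x 5 transcripts.
replicates = [
    [5.0, 2.0, 3.0, 4.0, 1.0],
    [4.0, 1.0, 4.5, 2.0, 3.0],
    [3.0, 4.0, 6.0, 8.0, 7.0],
    [2.5, 1.5, 3.5, 5.5, 4.5],
]
normalized = quantile_normalize(replicates)

# Hypothetical per-individual profiles for the median step.
individuals = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 5.0]]
centered = median_normalize(individuals)
```

After quantile normalization every replicate has the same distribution of values, and only the ranking of transcripts within each replicate differs.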
Within- and between- individual variation
[Figure: two scatter plots of array signal on log-scale axes]
Panel 1: 1400410090_A vs. 1400410090_C; 2 YRI individuals
– r2 (all genes) = 0.964
– Detected genes (0.98 in both samples: 11,529); r2 (detected) = 0.964
Panel 2: 1400410090_A vs. 1400410096_A; 2 replicates, single YRI individual
– r2 (all genes) = 0.990
– Detected genes (0.98 in both samples: 12,076); r2 (detected) = 0.994
HapMap SNPs
60 CEU, 45 CHB, 44 JPT, 60 YRI
Phase I HapMap; MAF > 0.05
– CEU: 762,447 SNPs
– CHB: 695,601
– JPT: 689,295
– YRI: 799,242
~1 SNP per 5 kb
14,925 genes
Copy Number Variation dataset
• Genome Structural Variation Consortium
– Redon et al. Nature in press
• Array-CGH using a whole genome tile path array
– Median clone size ~170 kb
– All 270 HapMap individuals
– 26,563 clones; 93.7% euchromatic genome
• Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
• 1117 CNVs called from log2 ratios
– Calls based on standard deviation of log2 ratios
– Many CNVs experimentally verified
Association analysis
Additive association model: linear regression, e.g. CC = 0, CT = 1, TT = 2.
[Figure: expression level (y-axis, 8.0 to 9.5) against genotype CC, CT, TT coded 0, 1, 2 (x-axis), with a fitted regression line]
Reported for each test:
- slope of line
- p-value
- r2
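The additive model can be illustrated with a small least-squares fit. This is a sketch under stated assumptions: the function name and the example values are hypothetical, and the code returns the t statistic of the slope rather than the p-value itself, which would be the two-sided tail probability of a t distribution with n - 2 degrees of freedom (available, for instance, from `scipy.stats.linregress`).

```python
from math import sqrt

GENOTYPE_CODE = {"CC": 0, "CT": 1, "TT": 2}  # additive coding from the slide

def additive_regression(genotypes, expression):
    """Least-squares fit of expression on allele count (0, 1, 2).

    Returns (slope, intercept, r2, t_stat). Requires some residual variance,
    otherwise the t statistic is undefined.
    """
    x = [GENOTYPE_CODE[g] for g in genotypes]
    n = len(x)
    mx = sum(x) / n
    my = sum(expression) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in expression)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    residual_ss = syy - slope * sxy
    # Slope divided by its standard error; n - 2 degrees of freedom.
    t_stat = slope / sqrt(residual_ss / ((n - 2) * sxx))
    return slope, intercept, r2, t_stat

# Hypothetical expression values for six individuals.
genotypes = ["CC", "CC", "CT", "CT", "TT", "TT"]
expression = [8.0, 8.2, 8.6, 8.8, 9.2, 9.4]
slope, intercept, r2, t_stat = additive_regression(genotypes, expression)
```

Here the slope is the estimated change in expression per additional T allele, and r2 is the fraction of expression variance explained by the genotype.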
Multiple-test correction
1. Bonferroni (whole-genome, cis-)
2. False Discovery Rate, FDR (whole-genome, cis-)
3. Permutations (whole-genome, cis-)
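The first two corrections can be sketched in a few lines. The slide does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is swapped in as one common choice, and the p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag p-values that pass a Bonferroni-corrected threshold alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, fdr=0.05):
    """Flag p-values declared significant by the Benjamini-Hochberg procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# Hypothetical p-values from eight tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

On these values Bonferroni keeps only the smallest p-value, while the FDR procedure also keeps the second, which illustrates why FDR control is less conservative.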
Permutation design
[Diagram: a GENOTYPES matrix is permuted against the GENE EXPRESSION vectors]
GENOTYPES                        GENE EXPRESSION
g11 g12 g13 g14 … g1n            Exp1
g21 g22 g23 g24 … g2n            Exp2
g31 g32 g33 g34 … g3n            Exp3
…                                …
gi1 gi2 gi3 gi4 … gin            Expi
- 10,000 permutations; each time keep the lowest p-value
- Null distribution of 10,000 extreme p-values
- Compare observed p-values to the tails of the null
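A minimal sketch of this permutation scheme for a single gene follows. Several details are assumptions: the slide only says "permute", so shuffling expression values relative to genotypes is a guess at the design; the code keeps the largest r2 across SNPs, which for a fixed sample size corresponds to the lowest regression p-value; and all names and data are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)

def permutation_pvalue(snp_genotypes, expression, n_perm=10_000, seed=0):
    """Empirical p-value of the best SNP association for one gene.

    snp_genotypes: one genotype vector (coded 0/1/2) per SNP tested.
    Each permutation shuffles expression across individuals and keeps
    the most extreme statistic over all SNPs.
    """
    rng = random.Random(seed)
    observed = max(r_squared(g, expression) for g in snp_genotypes)
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        best = max(r_squared(g, shuffled) for g in snp_genotypes)
        if best >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (exceed + 1) / (n_perm + 1)

# Hypothetical data: 3 SNPs, 12 individuals; the first SNP tracks expression.
snps = [
    [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2, 1, 2, 0, 0, 1, 2],
]
expr = [8.0, 8.1, 7.9, 8.2, 8.6, 8.5, 8.7, 8.4, 9.1, 9.0, 9.2, 8.9]
p_perm = permutation_pvalue(snps, expr, n_perm=2000)
```

Because every SNP is scored against the same shuffled expression vector, the correlation structure among SNPs is preserved in the null distribution, which is the usual motivation for preferring permutations over Bonferroni in dense SNP data.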
Significant expression – cis-SNP associations
• CEU genes 323
• CHB genes 348
• JPT genes 370
• YRI genes 411
• 888 non-redundant genes
• 67 genes in all 4 populations (8%)
• 331 genes in at least 2 populations (37%)
Permutation threshold 0.001; SNP-probe distance < 1 Mb
~6% of genes exhibit a significant cis- association
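The cis- window stated above amounts to a distance filter applied before testing. The sketch below assumes each probe is represented by a single coordinate and each SNP by a (name, chromosome, position) tuple; these representations and all example values are hypothetical.

```python
CIS_WINDOW_BP = 1_000_000  # SNP-probe distance threshold from the slide

def cis_snps(probe_chrom, probe_pos, snps, window=CIS_WINDOW_BP):
    """Return names of SNPs on the probe's chromosome within `window` bp of it."""
    return [
        name
        for name, chrom, pos in snps
        if chrom == probe_chrom and abs(pos - probe_pos) < window
    ]

# Hypothetical SNPs around a probe at chromosome 1, position 1,600,000.
snp_table = [
    ("snp_a", "1", 1_500_000),
    ("snp_b", "1", 2_600_000),  # exactly 1 Mb away, so excluded by the strict "<"
    ("snp_c", "2", 1_500_000),  # different chromosome
    ("snp_d", "1", 900_000),
]
selected = cis_snps("1", 1_600_000, snp_table)
```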
Significant expression – cis-CNV clones associations
• CEU genes 40
• CHB genes 32
• JPT genes 40
• YRI genes 42
• 99 non-redundant genes
• 7 genes associated in all 4 populations (7%)
• 34 genes in at least 2 populations (34%)
Permutation threshold 0.001; clone-probe distance < 2 Mb
Table 2: Population overlap of SNP-associated genes, clone-associated genes, and genes with both SNP and clone associations.

Populations          CNV    SNP
CEU-CHB-JPT-YRI        7     67
CEU-CHB-JPT            4     48
CEU-CHB-YRI            0     11
CEU-JPT-YRI            0     12
CHB-JPT-YRI            3     28
CEU-CHB                3     18
CEU-JPT                0     15
CEU-YRI                6     36
CHB-JPT                5     51
CHB-YRI                3     18
JPT-YRI                3     27
CEU only              20    116
CHB only               7    107
JPT only              18    122
YRI only              20    212
SUM                   99    888

                                               CNV    SNP
Gene associations in at least 2 populations     34    331
Percentage of total                           0.34   0.37
Gene associations in single populations         65    557
Percentage of total                           0.66   0.63

Note: clones in CNVs with freq > 1; permutation threshold 10^-3
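The summary rows of Table 2 follow directly from the overlap counts, as the short check below shows for the SNP column. The counts are copied from the table; the function and variable names are hypothetical.

```python
# SNP column of Table 2: genes significant in exactly this set of populations.
snp_overlap = {
    "CEU-CHB-JPT-YRI": 67,
    "CEU-CHB-JPT": 48, "CEU-CHB-YRI": 11, "CEU-JPT-YRI": 12, "CHB-JPT-YRI": 28,
    "CEU-CHB": 18, "CEU-JPT": 15, "CEU-YRI": 36,
    "CHB-JPT": 51, "CHB-YRI": 18, "JPT-YRI": 27,
    "CEU": 116, "CHB": 107, "JPT": 122, "YRI": 212,
}

def summarize(overlap):
    """Return per-population gene counts, the non-redundant total, and the
    number of genes shared by at least two populations."""
    per_population = {}
    for combo, n_genes in overlap.items():
        for pop in combo.split("-"):
            per_population[pop] = per_population.get(pop, 0) + n_genes
    total = sum(overlap.values())
    shared = sum(n for combo, n in overlap.items() if "-" in combo)
    return per_population, total, shared

per_population, total, shared = summarize(snp_overlap)
fraction_shared = round(shared / total, 2)
```

The per-population sums reproduce the gene counts reported on the cis-SNP slide (CEU 323, CHB 348, JPT 370, YRI 411), and the shared fraction reproduces the 0.37 in the table.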
Some genes
• UGT2B7, UGT2B11, UGT2B17
• GSTM1
ABC1, ABHD6, ACY1L2, ADAT1, ARNT, ARSA, ASAHL, ATP13A, B7, BBS2, BLK, C14orf130, C14orf4, C14orf52, C1orf16, C20orf22, C21orf107, C7orf13, C7orf29, C7orf31, C8orf13, C9orf95, CARD8, CAT, CD151, CD79B, CDKN1A, CDKN2B, CGI-111, CGI-62, CGI-96, CHCHD2, CHI3L2, CHRNE, CNN2, CP110, CPEB4, CPNE1, CRIPT, CSTB, CTNS, CTSH, CTSK, DCLRE1B, DCTD, DERP6, dJ383J4.3, DKFZp434N035, DKFZP566H073, DKFZP566J2046, DKFZP586D0919, DKFZp761A132, DNAJD1, DOM3Z, DPYSL4, DSCR5, DTNB, ECHDC3, EGFL5, EIF2B2, ENTPD1, ERMAP, FCGR2A, FDX1, FKBP1A, FLJ10252, FLJ10904, FLJ12994, FLJ12998, FLJ13576, FLJ14009, FLJ14753, FLJ20444, FLJ20635, FLJ21347, FLJ21616, FLJ22374, FLJ22573, FLJ22635, FLJ23235, FLJ34443, FLJ35827, FLJ36888, FLJ37970, FLJ40432, FLJ46603, FLJ90036, FUT10, GAA, GSTM1, GSTM2, GSTT1, H17, HABP4, HIBCH, HLA-C, HLA-DQA1, HLA-DQA2, hmm1412, hmm23621, hmm26268, hmm31752, hmm31999, hmm3577, hmm3587, hmm5445, hmm665, hmm8232, HNLF, Hs.119946, Hs.124623, Hs.135624, Hs.153573, Hs.158943, Hs.164463, Hs.169006, Hs.171169, Hs.212658, Hs.245997, Hs.26039, Hs.264076, Hs.311977, Hs.333841, Hs.379903, Hs.396207, Hs.400876, Hs.40696, Hs.431200, Hs.43687, Hs.453941, Hs.460359, Hs.465789, Hs.466924, Hs.467281, Hs.482037, Hs.485895, Hs.490095, Hs.495422, Hs.506072, Hs.517172, Hs.519979, Hs.5855, Hs.6637, HSRTSBETA, IFIT5, IL16, IL21R, IMAGE3451454, IMMT, IPP, IREB2, IRF5, KIAA0265, KIAA0483, KIAA0643, KIAA0748, KIAA1463, KIAA1627, LCMT1, LOC113386, LOC132001, LOC132321, LOC135043, LOC151963, LOC282956, LOC283710, LOC283970, LOC284184, LOC284293, LOC285407, LOC286353, LOC339231, LOC339803, LOC339804, LOC340435, LOC347981, LOC348094, LOC348180, LOC374758, LOC375097, LOC375399, LOC378075, LOC388918, LOC389362, LOC389763, LOC399987, LOC400410, LOC400566, LOC400642, LOC400684, LOC400933, LOC401075, LOC401135, LOC401284, LOC51240, LOC90637, LOC90693, MAN1A2, MCMDC1, MGC10120, MGC12458, MGC13186, MGC19764, MGC20235, MGC20481, MGC20781, MGC22773, MGC24665, MGC2752, MGC3794, MGC9084, MMRP19, MRPL21, MRPL43, 
MTERF, MYOM2, NDUFA10, NDUFS5, NMNAT3, NUDT2, OAS1, PACSIN2, PASK, PBX4, PCTAIRE2BP, PEX5, PEX6, PGS1, PHACS, PHC2, PHEMX, PIP5K1C, PIP5K2A, PKHD1L1, POLR2J, PP3856, PP784, PPA2, PPFIA1, PPIL3, PTER, QRSL1, R29124_1, RABEP1, RAPGEFL1, RDH5, RPAP1, RPL13, RPL36AL, RPL8, RPLP2, RPS16, RPS6KB2, SARS2, SERPINB10, SF1, SH3GLB2, SHMT1, SIAT4C, SIVA, SKIV2L, SNAP29, SNX11, SOD2, SPG7, SQSTM1, ST7L, STAT6, STK25, SYNGR1, SYNGR3, TAP2, TAPBP-R, TBC1D4, TCL6, TEF, TGM5, THAP5, THAP6, THOC3, TIMM10, TINP1, TMEM8, TMPIT, TRAPPC4, TRIM4, TSGA10, TSGA2, TUBB, UBE2G1, UGT2B11, UGT2B17, UGT2B7, UROS, USMG5, VPS28, WARS2, WBSCR27, WWOX, XRRA1, ZNF266, ZNF384, ZNF493, ZNF587, ZNF79, ZNF85, ZRANB1,
[Figure: -log10(p-value) against distance for significant associations, one panel per population (CEU, CHB, JPT, YRI); one set of panels spans 0 to 1,000,000 and the other 0 to 2,000,000; legend: pos/neg (-, +)]
[Figure: histograms of Adjusted R^2 (y-axis: Frequency), one panel per population (CEU, CHB, JPT, YRI), for SNPs and for CNVs; panel labels C, D, E, F]
Effects of Copy Number Variation on gene expression
[Figure: REG/GENE diagrams in four panels: Additional gene copy; Increase of distance from regulatory element; New regulatory element; Gene interruption]
Expected direction of effect, as annotated on the panels in slide order: POSITIVE; POSITIVE OR NEGATIVE; POSITIVE OR NEGATIVE; NEGATIVE
What is the overlap between SNP and CNV effects?
Do SNPs capture the CNV effects through Linkage Disequilibrium?
CNVs and SNPs mostly capture different effects
• Relative impact on gene expression: 82% SNPs, 18% CNVs
• Only 13% of genes with CNV association also had a SNP association in the same population
– biased toward large effect size.
– CNV and SNP variation are highly correlated (p-value 0.001).
• Lack of overlapping effects is not due to CNVs in regions of segmental duplications (few HapMap SNPs).
– Percentage of associated clones overlapping SDs does not differ from all clones overlapping SDs (p-value: 0.016).
– Also, the probability that a CNV signal is captured by SNPs does not depend on whether the CNV is in a SD (17.3%) or outside of SDs (15.9%).
Overlap of genes across populations as detected using phaseI and phaseII HapMap (10^-3 threshold)

                             SNP phaseII                 SNP phaseI
Populations                  Genes   Percent total       Genes   Percent total
CEU-CHB-JPT-YRI                 71   0.077                  67   0.075
CEU-CHB-JPT                     43   0.047                  48   0.054
CEU-CHB-YRI                     15   0.016                  11   0.012
CEU-JPT-YRI                     10   0.011                  12   0.014
CHB-JPT-YRI                     33   0.036                  28   0.032
CEU-CHB                         23   0.025                  18   0.020
CEU-JPT                         14   0.015                  15   0.017
CEU-YRI                         43   0.047                  36   0.041
CHB-JPT                         47   0.051                  51   0.057
CHB-YRI                         24   0.026                  18   0.020
JPT-YRI                         30   0.033                  27   0.030
CEU only                       116   0.126                 116   0.131
CHB only                        97   0.106                 107   0.120
JPT only                       123   0.134                 122   0.137
YRI only                       228   0.249                 212   0.239
SUM (non-redundant genes)      917                         888

                                               phaseII   phaseI
Gene associations in at least 2 populations        353      331
Percentage of total                               0.38     0.37
Gene associations in single populations            564      557
Percentage of total                               0.62     0.63

Note: 770 genes overlap between the non-redundant associated genes using phaseI and the non-redundant associated genes using phaseII.
Phase II HapMap (2.2 million SNPs)
Direction of allelic effect
[Figure: histograms of expression levels by genotype (x-axis: Expression Levels; y-axis: Frequency) in two populations, POP1 and POP2]
– AGREEMENT: genotypes are ordered AA, AG, GG in both populations.
– OPPOSITE: genotypes are ordered AA, AG, GG in one population and GG, AG, AA in the other.
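One way to read the AGREEMENT and OPPOSITE labels is as a comparison of the sign of the regression slope for the same SNP-gene pair in two populations. The sketch below takes that reading as an assumption; the genotype coding (number of G alleles), the function names, and the data are hypothetical.

```python
def slope(genotypes, expression):
    """Least-squares slope of expression on allele count (0, 1, 2)."""
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, expression))
    return sxy / sxx

def compare_direction(pop1, pop2):
    """Classify two (genotypes, expression) datasets by the sign of their slopes."""
    product = slope(*pop1) * slope(*pop2)
    if product > 0:
        return "AGREEMENT"
    if product < 0:
        return "OPPOSITE"
    return "UNDETERMINED"  # at least one slope is exactly zero

# Hypothetical data; genotypes count G alleles (AA = 0, AG = 1, GG = 2).
pop1 = ([0, 0, 1, 1, 2, 2], [1.0, 1.5, 3.0, 3.5, 5.5, 6.0])
pop2_same = ([0, 1, 1, 2, 2, 2], [0.5, 2.5, 3.0, 5.0, 6.0, 5.5])
pop2_flipped = ([0, 1, 1, 2, 2, 2], [6.0, 3.5, 3.0, 1.0, 0.5, 1.5])

same_call = compare_direction(pop1, pop2_same)
flipped_call = compare_direction(pop1, pop2_flipped)
```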
Trans effects
[Figure: DNA with REG and GENE; SNP classes tested: rSNPs, nsSNPs, spliceSNPs, mirnaSNPs]
Genome-wide associations
Dissect regulatory networks
Trans analysis: number of genes with significant associations
       pv 10^-4   pv 10^-3   pv 10^-2   pv 0.05
CEU          16         45        251      1107
YRI           9         23        164       743
CHB          17         38        216       900
JPT          16         40        200       876

Trans analysis: number of significant SNP-gene associations
       pv 10^-4   pv 10^-3   pv 10^-2   pv 0.05
CEU          93        193        660      2726
YRI          16         41        253      1130
CHB          76        118        400      1777
JPT          43        104        461      1913

Statistics of RS categories used as input for Trans analysis (percentages)
       cis 10^-3      ns   splice   miRNA
CEU        53.07   39.76     7.05    0.12
YRI        51.27   41.12     7.47    0.13
CHB        54.42   38.88     6.57    0.13
JPT        54.53   38.76     6.59    0.12

Statistics of significant RS categories for threshold 10^-2 (percentages)
       cis 10^-3      ns   splice   miRNA
CEU        80.84   16.02     3.14    0
YRI        54.69   39.84     5.47    0
CHB        72.39   22.64     4.98    0
JPT        73      22.89     3.89    0.22

Statistics of significant RS categories for threshold 10^-3 (percentages)
       cis 10^-3      ns   splice   miRNA
CEU        87.24    9.18     3.57    0
YRI        58.54   39.02     2.44    0
CHB        81.51   13.45     5.04    0
JPT        79.05   16.19     4.76    0
Regulatory variants have the highest impact on regulatory networks
Conclusions
- Large number of genes with significant expression variation within and between human population samples, and strong association between individual genes and specific SNPs and CNVs.
- Little overlap between SNP and CNV signals.
- Replication of significant signals across populations.
- Promising approach for identification of functionally variable regulatory regions.
- Cis regulatory variation is mostly responsible for genome-wide regulatory variation.
Acknowledgements
Barbara Stranger, Matthew Forrest, Catherine Ingle, Antigone Dimas, Christine Bird, Alexandra Nica, Claude Beazley, Panos Deloukas
Cambridge University: Mark Dunning, Simon Tavaré
Cornell University: Andy Clark
Illumina: Jill Orwick, Mark Gibbs
Genome Structural Variation Consortium: Matt Hurles, Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer
The HapMap Consortium
Wellcome Trust for funding
Wellcome Trust Advanced Courses
Working with the HapMap
2-5 April 2007
Closing date for applications: 10 January 2007
Wellcome Trust Genome Campus, Hinxton, Cambridge
Full information and application details at: www.wellcome.ac.uk/advancedcourses
This 4-day residential workshop will provide a comprehensive overview of the International HapMap Project, including practical experience of working with the HapMap data to map phenotypic traits to locations in the human genome. Theoretical lectures will be combined with hands-on practical sessions and an introduction to relevant databases and tools.
Course instructors: Paul de Bakker (MIT), Manolis Dermitzakis (Sanger Institute), Mike Feolo (NIH/NCBI), Jonathan Marchini (Oxford University), Gil McVean (Oxford University), Steve Sherry (NIH/NCBI), Albert Vernon Smith (CSHL), Barbara Stranger (Sanger Institute), Eleftheria Zeggini (Wellcome Trust Center for Human Genetics)
Speakers: Lon Cardon (Wellcome Trust Center for Human Genetics), Panos Deloukas (Sanger Institute), John Todd (Cambridge University)