Inference of cis and trans regulatory variation in the human genome Manolis Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust Genome Campus.

Post on 17-Dec-2015


Inference of cis and trans regulatory variation in the human genome

Manolis Dermitzakis
The Wellcome Trust Sanger Institute
Wellcome Trust Genome Campus
Cambridge, UK
md4@sanger.ac.uk

Gene expression

• Altered patterns of gene expression can lead to disease.
– e.g., Type 1 diabetes, Burkitt’s lymphomas.
• Widespread intraspecific variation.
• Heritable genetic variation for transcript levels.
– Familial aggregation of expression profiles (Cheung et al. 2003).
– In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
• Much of the influential variation is located cis- to the coding locus.
– In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005).

Nature of regulatory variation

[Figure: DNA with regulatory element (REG) and GENE; levels labelled i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA. Source: Stranger and Dermitzakis, Human Genomics 2005]

Effects of Copy Number Variation on gene expression

[Figure: four scenarios drawn with REG and GENE elements: additional gene copy; increase of distance from regulatory element; new regulatory element; gene interruption.]

Gene expression association mapping

[Figure: histogram of expression levels (x-axis, -1.5 to 7.5) against frequency (y-axis, 0 to 100), split by genotype class AA, AG, GG; labelled "Phenotypic variation space". Source: Stranger et al. PLoS Genet 2005]

illumina Human 6 x 2 gene GEX arrays

Beads in Wells

Whole-genome gene expression

~48,000 transcripts
– 24,000 RefSeq
– 24,000 other transcripts

270 HapMap individuals:
– CEU: 30 trios, 90 total
– CHB: 45 unrelated
– JPT: 45 unrelated
– YRI: 30 trios, 90 total

2 IVTs each person
2 replicate hybridizations each IVT

Quantile normalization of all replicates of each individual.
Median normalization across all individuals of a population.
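The two normalization steps named above can be sketched as follows. This is a minimal illustration and not the pipeline used for the data: it assumes each replicate is a list of log-scale signal values in the same probe order, the function names and toy numbers are hypothetical, and "median normalization" is read here as shifting each individual's profile to a shared population median.

```python
from statistics import mean, median

def quantile_normalize(replicates):
    """Give every replicate the same distribution: the mean of the sorted values."""
    n_probes = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    # Reference distribution: mean across replicates at each rank.
    reference = [mean(rep[r] for rep in sorted_reps) for r in range(n_probes)]
    normalized = []
    for rep in replicates:
        # Rank the probes within this replicate (ties broken by probe order).
        order = sorted(range(n_probes), key=lambda i: rep[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized

def median_normalize(individuals):
    """Shift each individual's profile so that all share one median (assumed log scale)."""
    target = median(median(ind) for ind in individuals)
    return [[v - median(ind) + target for v in ind] for ind in individuals]

# Toy example: two replicates of one individual, three probes each.
reps = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
qn = quantile_normalize(reps)
```

After quantile normalization the replicates of an individual share one value distribution, so they can be averaged before the population-level median step.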

[Diagram: Cell line → RNA → IVT1, IVT2 → four replicate hybridizations (rep1, rep2, rep3, rep4)]

Within- and between- individual variation

[Figure: log-scale scatter plot, Signal 1400410090_A against Signal 1400410090_C.]
2 YRI individuals
r2 (all genes) = 0.964
Detected genes (0.98 in both samples): 11,529
r2 (detected) = 0.964

[Figure: log-scale scatter plot, Signal 1400410090_A against Signal 1400410096_A.]
2 replicates; single YRI individual
r2 (all genes) = 0.990
Detected genes (0.98 in both samples): 12,076
r2 (detected) = 0.994

HapMap SNPs

Individuals: 60 CEU, 45 CHB, 44 JPT, 60 YRI

Phase I HapMap; MAF > 0.05

CEU: 762,447 SNPs
CHB: 695,601
JPT: 689,295
YRI: 799,242

~1 SNP per 5 kb

14,925 genes

Copy Number Variation dataset

• Genome Structural Variation Consortium
– Redon et al. Nature in press
• Array-CGH using a whole genome tile path array
– Median clone size ~170 kb
– All 270 HapMap individuals
– 26,563 clones; 93.7% of the euchromatic genome
• Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
• 1117 CNVs called from log2 ratios
– Calls based on standard deviation of log2 ratios
– Many CNVs experimentally verified

SNP cis-analysis: SNPs within 1Mb of probe midpoint

[Diagram: gene with its probe; SNPs are tested within a 1Mb window on either side of the probe midpoint.]
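The cis window can be expressed as a simple distance filter. The sketch below is illustrative only: it assumes positions are base-pair coordinates on the same chromosome as the probe, and the function name and coordinates are hypothetical. The strict inequality follows the "distance < 1Mb" wording used with the results.

```python
def cis_positions(probe_midpoint, positions, window=1_000_000):
    """Return the positions lying strictly within `window` bp of the probe midpoint."""
    return [pos for pos in positions if abs(pos - probe_midpoint) < window]

# Toy example: a probe centred at 5 Mb and four SNP positions.
selected = cis_positions(5_000_000, [3_900_000, 4_000_001, 5_500_000, 6_000_001])
```

Passing `window=2_000_000` with clone midpoints gives the corresponding filter for the CNV analysis.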

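Association analysis

Additive association model: linear regression, e.g. CC = 0, CT = 1, TT = 2.

[Figure: expression level (y-axis, 8.0 to 9.5) against genotype (x-axis: CC, CT, TT coded as 0, 1, 2), with fitted regression line.]

Reported for each test:
– slope of line
– p-value
– r2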
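The additive coding and the fitted line can be sketched in a few lines. This is a minimal illustration with hypothetical names and toy data, assuming genotypes are given as two-letter strings and that the coded value is the count of one chosen allele. It returns the slope and r2; the p-value of the slope is left out because it needs a t distribution, which a statistics package such as SciPy (`scipy.stats.linregress`) would supply.

```python
def additive_regression(genotypes, expression, counted_allele="T"):
    """Least-squares fit of expression on allele count (0, 1, 2).

    Returns (slope, intercept, r2).
    """
    x = [g.count(counted_allele) for g in genotypes]  # CC -> 0, CT -> 1, TT -> 2
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(expression) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in expression)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2

# Toy data: expression rises by 0.5 per copy of the T allele.
genos = ["CC", "CT", "TT", "CC", "CT", "TT"]
expr = [8.0, 8.5, 9.0, 8.0, 8.5, 9.0]
slope, intercept, r2 = additive_regression(genos, expr)
```

The sign of the slope depends on which allele is counted, so comparisons between populations need the same allele coded in each.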

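CNV cis-analysis: clone midpoint within 2Mb of probe midpoint

[Diagram: gene with its probe; clones are tested when their midpoint lies within 2Mb on either side of the probe midpoint.]

Linear regression for CNV and expression, using the clone signal (log2 ratio).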

Multiple-test correction

1. Bonferroni (whole-genome, cis-)
2. False Discovery Rate, FDR (whole-genome, cis-)
3. permutations (whole-genome, cis-)
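For the first two corrections, a short sketch may help. The slide does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption chosen because it is the most common one, and the function names and p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject a test when its p-value is at most alpha divided by the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected

# Toy p-values for five tests.
pvals = [0.001, 0.012, 0.025, 0.045, 0.5]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

On the same p-values Bonferroni is the stricter of the two, which is why the choice between whole-genome and cis- test sets matters for the number of tests `m`.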

Permutation design

GENOTYPES
g11 g12 g13 g14 … g1n
g21 g22 g23 g24 … g2n
g31 g32 g33 g34 … g3n
…
gi1 gi2 gi3 gi4 … gin

GENE EXPRESSION (permute)
Exp1
Exp2
Exp3
…
Expi

– 10,000 permutations; each time keep lowest p-value
– Null distribution of 10,000 extreme p-values
– Compare observed p-values to the tails of the null
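The permutation scheme above can be sketched for a single gene as follows. This is a hedged illustration, not the authors' code: the function names and toy data are hypothetical, and the largest r2 is kept in place of the lowest p-value. With a fixed sample size and no missing genotypes the regression p-value decreases monotonically as r2 increases, so the two choices rank the tests identically.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def extreme_null(snp_genotypes, expression, n_perm=10_000, seed=0):
    """Shuffle expression against genotypes; keep the best statistic per permutation."""
    rng = random.Random(seed)
    shuffled = list(expression)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(max(r_squared(snp, shuffled) for snp in snp_genotypes))
    return sorted(null)

def empirical_p(observed_r2, null):
    """Fraction of the null at least as extreme as the observed value."""
    hits = sum(1 for v in null if v >= observed_r2)
    return (hits + 1) / (len(null) + 1)

# Toy data: two SNPs (coded 0/1/2) and expression for eight individuals.
snps = [[0, 0, 1, 1, 2, 2, 0, 1], [1, 0, 2, 1, 0, 1, 2, 0]]
expr = [8.0, 8.1, 8.5, 8.6, 9.0, 9.1, 7.9, 8.4]
null = extreme_null(snps, expr, n_perm=500)
p = empirical_p(r_squared(snps[0], expr), null)
```

Shuffling whole expression vectors keeps the correlation structure among SNPs intact, which is what lets the extreme-value null account for the many correlated tests in a window.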

Significant expression – cis-SNP associations

• CEU genes 323

• CHB genes 348

• JPT genes 370

• YRI genes 411

• 888 non-redundant genes

• 67 genes in all 4 populations (8%)

• 333 genes in at least 2 populations (37%)

permutation threshold 0.001; SNP-probe distance < 1Mb

~6% of genes exhibit a significant cis- association
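Counts such as "non-redundant genes", "genes in all 4 populations", and "genes in at least 2 populations" follow from set operations on the per-population gene lists. The sketch below uses hypothetical placeholder gene names (G1 to G5), not results from the study.

```python
from collections import Counter

def overlap_summary(genes_by_population):
    """Count how many populations each associated gene appears in."""
    counts = Counter(
        gene for genes in genes_by_population.values() for gene in set(genes)
    )
    n_pops = len(genes_by_population)
    return {
        "non_redundant": len(counts),
        "in_all": sum(1 for c in counts.values() if c == n_pops),
        "in_at_least_2": sum(1 for c in counts.values() if c >= 2),
    }

# Placeholder gene sets for the four populations.
toy = {
    "CEU": {"G1", "G2", "G3"},
    "CHB": {"G1", "G2"},
    "JPT": {"G1", "G4"},
    "YRI": {"G1", "G5"},
}
summary = overlap_summary(toy)
```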

Significant expression – cis-CNV clones associations

• CEU genes 40

• CHB genes 32

• JPT genes 40

• YRI genes 42

• 99 non-redundant genes

• 7 genes associated in all 4 populations (7%)

• 34 genes in at least 2 populations (34%)

permutation threshold 0.001; clone-probe distance < 2Mb

Table 2: Population overlap of SNP-associated genes, clone-associated genes, and genes with both SNP and clone associations.

Populations                                    CNV    SNP
CEU-CHB-JPT-YRI                                  7     67
CEU-CHB-JPT                                      4     48
CEU-CHB-YRI                                      0     11
CEU-JPT-YRI                                      0     12
CHB-JPT-YRI                                      3     28
CEU-CHB                                          3     18
CEU-JPT                                          0     15
CEU-YRI                                          6     36
CHB-JPT                                          5     51
CHB-YRI                                          3     18
JPT-YRI                                          3     27
CEU only                                        20    116
CHB only                                         7    107
JPT only                                        18    122
YRI only                                        20    212
SUM                                             99    888
Gene associations in at least 2 populations     34    331
  percentage of total                         0.34   0.37
Gene associations in single populations         65    557
  percentage of total                         0.66   0.63

Note: clones in CNVs with freq > 1; permutation threshold 10^-3

Some genes

• UGT2B7, 11, 17

• GSTM1

ABC1, ABHD6, ACY1L2, ADAT1, ARNT, ARSA, ASAHL, ATP13A, B7, BBS2, BLK, C14orf130, C14orf4, C14orf52, C1orf16, C20orf22, C21orf107, C7orf13, C7orf29, C7orf31, C8orf13, C9orf95, CARD8, CAT, CD151, CD79B, CDKN1A, CDKN2B, CGI-111, CGI-62, CGI-96, CHCHD2, CHI3L2, CHRNE, CNN2, CP110, CPEB4, CPNE1, CRIPT, CSTB, CTNS, CTSH, CTSK, DCLRE1B, DCTD, DERP6, dJ383J4.3, DKFZp434N035, DKFZP566H073, DKFZP566J2046, DKFZP586D0919, DKFZp761A132, DNAJD1, DOM3Z, DPYSL4, DSCR5, DTNB, ECHDC3, EGFL5, EIF2B2, ENTPD1, ERMAP, FCGR2A, FDX1, FKBP1A, FLJ10252, FLJ10904, FLJ12994, FLJ12998, FLJ13576, FLJ14009, FLJ14753, FLJ20444, FLJ20635, FLJ21347, FLJ21616, FLJ22374, FLJ22573, FLJ22635, FLJ23235, FLJ34443, FLJ35827, FLJ36888, FLJ37970, FLJ40432, FLJ46603, FLJ90036, FUT10, GAA, GSTM1, GSTM2, GSTT1, H17, HABP4, HIBCH, HLA-C, HLA-DQA1, HLA-DQA2, hmm1412, hmm23621, hmm26268, hmm31752, hmm31999, hmm3577, hmm3587, hmm5445, hmm665, hmm8232, HNLF, Hs.119946, Hs.124623, Hs.135624, Hs.153573, Hs.158943, Hs.164463, Hs.169006, Hs.171169, Hs.212658, Hs.245997, Hs.26039, Hs.264076, Hs.311977, Hs.333841, Hs.379903, Hs.396207, Hs.400876, Hs.40696, Hs.431200, Hs.43687, Hs.453941, Hs.460359, Hs.465789, Hs.466924, Hs.467281, Hs.482037, Hs.485895, Hs.490095, Hs.495422, Hs.506072, Hs.517172, Hs.519979, Hs.5855, Hs.6637, HSRTSBETA, IFIT5, IL16, IL21R, IMAGE3451454, IMMT, IPP, IREB2, IRF5, KIAA0265, KIAA0483, KIAA0643, KIAA0748, KIAA1463, KIAA1627, LCMT1, LOC113386, LOC132001, LOC132321, LOC135043, LOC151963, LOC282956, LOC283710, LOC283970, LOC284184, LOC284293, LOC285407, LOC286353, LOC339231, LOC339803, LOC339804, LOC340435, LOC347981, LOC348094, LOC348180, LOC374758, LOC375097, LOC375399, LOC378075, LOC388918, LOC389362, LOC389763, LOC399987, LOC400410, LOC400566, LOC400642, LOC400684, LOC400933, LOC401075, LOC401135, LOC401284, LOC51240, LOC90637, LOC90693, MAN1A2, MCMDC1, MGC10120, MGC12458, MGC13186, MGC19764, MGC20235, MGC20481, MGC20781, MGC22773, MGC24665, MGC2752, MGC3794, MGC9084, MMRP19, MRPL21, MRPL43, 
MTERF, MYOM2, NDUFA10, NDUFS5, NMNAT3, NUDT2, OAS1, PACSIN2, PASK, PBX4, PCTAIRE2BP, PEX5, PEX6, PGS1, PHACS, PHC2, PHEMX, PIP5K1C, PIP5K2A, PKHD1L1, POLR2J, PP3856, PP784, PPA2, PPFIA1, PPIL3, PTER, QRSL1, R29124_1, RABEP1, RAPGEFL1, RDH5, RPAP1, RPL13, RPL36AL, RPL8, RPLP2, RPS16, RPS6KB2, SARS2, SERPINB10, SF1, SH3GLB2, SHMT1, SIAT4C, SIVA, SKIV2L, SNAP29, SNX11, SOD2, SPG7, SQSTM1, ST7L, STAT6, STK25, SYNGR1, SYNGR3, TAP2, TAPBP-R, TBC1D4, TCL6, TEF, TGM5, THAP5, THAP6, THOC3, TIMM10, TINP1, TMEM8, TMPIT, TRAPPC4, TRIM4, TSGA10, TSGA2, TUBB, UBE2G1, UGT2B11, UGT2B17, UGT2B7, UROS, USMG5, VPS28, WARS2, WBSCR27, WWOX, XRRA1, ZNF266, ZNF384, ZNF493, ZNF587, ZNF79, ZNF85, ZRANB1,

Genomic location of associations

[Figure, SNP panels (CEU, CHB, JPT, YRI): -log10(p value) (y-axis, 0 to 40) against distance (x-axis, 0 to 1,000,000).]

[Figure, CNV panels (CEU, CHB, JPT, YRI): -log10(p value) (y-axis, 5 to 20) against distance (x-axis, 0 to 2,000,000); points marked by positive or negative slope.]

[Figure, panels C to F: histograms of Adjusted R^2 (x-axis) against frequency (y-axis) for each population (CEU, CHB, JPT, YRI), for SNPs and for CNVs.]

Effects of Copy Number Variation on gene expression

[Figure: the four CNV scenarios, each labelled with the expected direction of the association:]
– Additional gene copy: POSITIVE
– Increase of distance from regulatory element: POSITIVE OR NEGATIVE
– New regulatory element: POSITIVE OR NEGATIVE
– Gene interruption: NEGATIVE

Negative or positive slope in CNV associations: 80% positive, 20% negative

What is the overlap between SNP and CNV effects?

Do SNPs capture the CNV effects through Linkage Disequilibrium?

[Diagram: LD between CNV and SNP. Haplotypes carrying SNP allele A or G differ in the number of copies of Gene X; two copies are labelled 2x expression and one copy 1x expression.]

CNVs and SNPs mostly capture different effects

• Relative impact on gene expression: 82% SNPs, 18% CNVs
• Only 13% of genes with CNV association also had a SNP association in the same population
– biased toward large effect size.
– CNV and SNP variation are highly correlated (p-value 0.001).
• Lack of overlapping effects is not due to CNVs in regions of segmental duplications (few HapMap SNPs).
– Percentage of associated clones overlapping SDs does not differ from all clones overlapping SDs (p-value: 0.016).
– Also, the probability that a CNV signal is captured by SNPs does not depend on whether the CNV is in an SD (17.3%) or outside of SDs (15.9%).

Phase II HapMap (2.2 million SNPs)

Overlap of genes across populations as detected using phase I and phase II HapMap (10^-3 threshold)

                                               SNP phase II          SNP phase I
Populations                                    Genes   Fraction      Genes   Fraction
CEU-CHB-JPT-YRI                                   71      0.077         67      0.075
CEU-CHB-JPT                                       43      0.047         48      0.054
CEU-CHB-YRI                                       15      0.016         11      0.012
CEU-JPT-YRI                                       10      0.011         12      0.014
CHB-JPT-YRI                                       33      0.036         28      0.032
CEU-CHB                                           23      0.025         18      0.020
CEU-JPT                                           14      0.015         15      0.017
CEU-YRI                                           43      0.047         36      0.041
CHB-JPT                                           47      0.051         51      0.057
CHB-YRI                                           24      0.026         18      0.020
JPT-YRI                                           30      0.033         27      0.030
CEU only                                         116      0.126        116      0.131
CHB only                                          97      0.106        107      0.120
JPT only                                         123      0.134        122      0.137
YRI only                                         228      0.249        212      0.239
SUM (non-redundant genes)                        917                   888
Gene associations in at least 2 populations      353       0.38        331       0.37
Gene associations in single populations          564       0.62        557       0.63

Genes = number of genes; Fraction = share of the non-redundant total.

Note: 770 genes overlap between the non-redundant associated genes using phase I and the non-redundant associated genes using phase II.

Direction of allelic effect

[Figure: histograms of expression levels (x-axis, -1.5 to 7.5) against frequency (y-axis, 0 to 100) by genotype in two populations, POP1 and POP2. AGREEMENT: genotype classes ordered AA, AG, GG in both populations. OPPOSITE: ordered AA, AG, GG in one population and GG, AG, AA in the other.]

Direction of allelic effects: 95% have the same direction
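Agreement in direction can be checked by comparing the sign of the regression slope for the same SNP-gene pair in two populations. The sketch below is illustrative, with hypothetical names and made-up slopes, and assumes both slopes were computed with the same allele counted.

```python
def same_direction_fraction(slopes_pop1, slopes_pop2):
    """Fraction of paired slopes that share a sign (both positive or both negative)."""
    pairs = list(zip(slopes_pop1, slopes_pop2))
    agree = sum(1 for a, b in pairs if a * b > 0)
    return agree / len(pairs)

# Made-up slopes for four SNP-gene pairs in two populations.
fraction = same_direction_fraction([0.5, -0.2, 0.3, 0.1], [0.4, -0.1, -0.3, 0.2])
```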

Trans effects

[Diagram: DNA with REG and GENE; candidate SNP classes: rSNPs, nsSNPs, spliceSNPs, mirnaSNPs.]

Genome-wide associations

Dissect regulatory networks

Trans analysis: number of genes with significant associations

Population   pv 10^-4   pv 10^-3   pv 10^-2   pv 0.05
CEU                16         45        251      1107
YRI                 9         23        164       743
CHB                17         38        216       900
JPT                16         40        200       876

Trans analysis: number of significant SNP-gene associations

Population   pv 10^-4   pv 10^-3   pv 10^-2   pv 0.05
CEU                93        193        660      2726
YRI                16         41        253      1130
CHB                76        118        400      1777
JPT                43        104        461      1913

Statistics of RS categories used as input for trans analysis (percentages)

Population   cis 10^-3      ns   splice   miRNA
CEU              53.07   39.76     7.05    0.12
YRI              51.27   41.12     7.47    0.13
CHB              54.42   38.88     6.57    0.13
JPT              54.53   38.76     6.59    0.12

Statistics of significant RS categories for threshold 10^-2 (percentages)

Population   cis 10^-3      ns   splice   miRNA
CEU              80.84   16.02     3.14       0
YRI              54.69   39.84     5.47       0
CHB              72.39   22.64     4.98       0
JPT                 73   22.89     3.89    0.22

Statistics of significant RS categories for threshold 10^-3 (percentages)

Population   cis 10^-3      ns   splice   miRNA
CEU              87.24    9.18     3.57       0
YRI              58.54   39.02     2.44       0
CHB              81.51   13.45     5.04       0
JPT              79.05   16.19     4.76       0

Regulatory variants have the highest impact on regulatory networks

Conclusions

- Large number of genes with significant expression variation within and between human population samples, and strong association between individual genes and specific SNPs and CNVs.
- Little overlap between SNP and CNV signals.
- Replication of significant signals across populations.
- Promising approach for identification of functionally variable regulatory regions.
- Cis regulatory variation is mostly responsible for genome-wide regulatory variation.

Pre-publication data release: www.sanger.ac.uk/genevar/

Acknowledgements

Barbara Stranger, Matthew Forrest, Catherine Ingle, Antigone Dimas, Christine Bird, Alexandra Nica, Claude Beazley, Panos Deloukas

Cambridge University: Mark Dunning, Simon Tavaré

Cornell University: Andy Clark

illumina: Jill Orwick, Mark Gibbs

Genome Structural Variation Consortium: Matt Hurles, Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer

The HapMap Consortium

Wellcome Trust for funding

Wellcome Trust Advanced Courses

Working with the HapMap

2-5 April 2007
Closing date for applications: 10 January 2007

Full information and application details at: www.wellcome.ac.uk/advancedcourses

Wellcome Trust Genome Campus, Hinxton, Cambridge

This 4-day residential workshop will provide a comprehensive overview of the International HapMap Project, including practical experience of working with the HapMap data to map phenotypic traits to locations in the human genome. Theoretical lectures will be combined with hands-on practical sessions and introduction to relevant databases and tools.

Course instructors: Paul de Bakker (MIT), Manolis Dermitzakis (Sanger Institute), Mike Feolo (NIH/NCBI), Jonathan Marchini (Oxford University), Gil McVean (Oxford University), Steve Sherry (NIH/NCBI), Albert Vernon Smith (CSHL), Barbara Stranger (Sanger Institute), Eleftheria Zeggini (Wellcome Trust Center for Human Genetics)

Speakers: Lon Cardon (Wellcome Trust Center for Human Genetics), Panos Deloukas (Sanger Institute), John Todd (Cambridge University)
