The tangled genome Gil McVean
Page 2: The tangled genome Gil McVean. The real heroes.

The real heroes

PanMap – Genome sequencing of 10 Western Chimpanzees

• Patterns of small insertion and deletion are quite different and reveal details of DNA repair pathways

• Patterns of recombination in humans and chimpanzees are highly diverged at the fine-scale, but largely conserved at broad scales

• There are a surprising number (6+ now ‘confirmed’) of trans-specific polymorphisms, probably maintained through host-pathogen interactions

A tangle of sequence

Difficulties of working with an incomplete reference

Using de novo assembly to find variants

Entire population

Sample 1

Sample 2

Chromosome 1

Using Cortex leads to a high quality set of variants

Diversity in Western Chimpanzees

• Similar diversity to humans of European origin (0.06%-0.08%)

• Excess of common variants

• 1% of variants shared with humans
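
A diversity figure like the one above can be illustrated with a minimal sketch, assuming the quoted 0.06%-0.08% refers to per-site pairwise nucleotide diversity (the slide does not say which estimator was used). The function name and the toy haplotypes are hypothetical and are not PanMap data.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site across aligned haplotypes."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching sites for every pair of haplotypes.
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, for illustration only).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "ACGTTCGTAC"]
pi = nucleotide_diversity(toy)
print(f"pairwise diversity per site: {pi:.3f}")
```

On real data the same quantity is normally computed from allele frequencies at variant sites rather than from full pairwise comparison, which scales poorly with sample size.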

Non-slippage indels are strongly biased to deletions

13:1 bias toward deletions.

Unexpected peak at 4bp.

Indels as indicators of DNA repair processes

[Figure: two panels, insertions and deletions. Axes: indel size and longest word agreement, each with ticks from 5 to 25.]

TGACGAACTTATACTGCTTGAATA

TGACGAAC

ATTGAATA

TGAC--ATACTGAATATGACTTAT

Losing GAAC

A tangle of trees

Myers et al. 2005

The zinc-finger protein PRDM9 determines hotspot location

Myers et al. 2010

PRDM9 Zinc fingers are radically different between humans and chimps

Perhaps the most diverged gene between humans and chimpanzees

Repeatedly hit by adaptive evolution across mammals

Only known ‘speciation gene’ in mammals

Polymorphic in humans – leads to variation in hotspots and genome instability

Questions

• We know from previous work in a few regions that hotspot locations tend not to be shared between humans and chimpanzees

• Calculations suggested that only 40% of human hotspots were driven by PRDM9 binding

• But...
– Is there any hotspot sharing?
– Do we see conservation of recombination rates at any scale?
– What features determine hotspot location in chimpanzees?

The first genome-wide fine-scale map of recombination for a non-reference organism

Auton et al. 2012

Chimpanzee recombination is dominated by hotspots in a manner similar to humans

But the hotspots are not in the same locations

Fine-scale profiles around genes are similar

As is rate variation around CpG islands

Substantial PRDM9 diversity, but overlap in predicted binding sequences

No signal for predicted binding sequences

Similarities at 1Mb scale

Human and chimp recombination rates are correlated at the chromosomal scale

Human and chimp recombination rates are only correlated at broad scales
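
The contrast between fine and broad scales can be illustrated by averaging two rate profiles into progressively larger bins and correlating them at each bin size. This is a minimal sketch on made-up numbers, not the analysis behind the slide: the function names, the bin sizes, and the toy profiles (a shared background level with hotspots placed at different positions in each species) are all assumptions.

```python
import math


def bin_means(values, bin_size):
    """Average consecutive non-overlapping bins; a trailing partial bin is dropped."""
    n_bins = len(values) // bin_size
    return [
        sum(values[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlation_by_scale(rates_a, rates_b, bin_sizes):
    """Correlate two rate profiles after averaging them at each bin size."""
    return {
        size: pearson(bin_means(rates_a, size), bin_means(rates_b, size))
        for size in bin_sizes
    }


# Toy profiles (hypothetical): eight blocks of four windows share a background
# level, but the hotspot sits in a different window in each species.
levels = [1, 2, 3, 4, 5, 6, 7, 8]
species_a = [lvl + (10 if i == 0 else 0) for lvl in levels for i in range(4)]
species_b = [lvl + (10 if i == 2 else 0) for lvl in levels for i in range(4)]
by_scale = correlation_by_scale(species_a, species_b, [1, 4])
print(by_scale)
```

In this toy case the per-window correlation is close to zero, while the four-window averages are almost perfectly correlated, because binning averages away the species-specific hotspot positions and leaves the shared background.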

Lower correlation in structural rearrangements

• All, bar one, of the inverted regions are pericentric, so change in position with respect to the centromere does not contribute

• Change in proximity to telomere is important

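A natural experiment: chromosomal fusion

[Figure: tree over time (t) from the common ancestor (C.A.), which carries chromosomes 2a and 2b, to chimp (2a and 2b) and human (chromosome 2).]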

Fusion region shows 3-fold decrease in recombination rate

A tangle of histories

Distribution of the sickle allele and of malaria

How many variants are shared through descent?

SNPs shared by humans and chimpanzees (33,906 autosomal and 527 X chromosome)

• Human polymorphism: 9.4 million autosomal and 261,000 X chromosome SNPs from 1000 Genomes Pilot 1 YRI (59 individuals)

• Chimpanzee polymorphism: 3.8 million autosomal and 102,000 X chromosome SNPs from PanMap Pan troglodytes verus (10 individuals)

• Filtering: reduce artifactual sharing due to known or cryptic paralogs by filtering out SNPs with low 50 bp mappability, with high read depth, or not found in 1000 Genomes Phase 1

• Human-chimpanzee shared haplotypes: at least two shared SNPs in 4kb with the same LD (to reduce recurrent mutation)
– 130 regions with shared haplotypes outside the MHC
– 7 resequenced using Sanger sequencing
– 8 with more than two pairs in LD

• Human-chimpanzee shared coding SNPs (to identify potentially functional coding variants)
– 135 shared non-synonymous SNPs, 1 shared premature stop SNP and 200 shared synonymous SNPs outside the MHC
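
The positional part of the shared-haplotype criterion (at least two shared SNPs within 4kb) can be sketched as follows. This is only an illustration under stated assumptions: positions are taken to be in a common coordinate system, the function name and toy coordinates are hypothetical, windows are chosen greedily from left to right, and neither the LD requirement nor the mappability and read-depth filters are modelled.

```python
def shared_snp_clusters(human_positions, chimp_positions, window=4000, min_shared=2):
    """Greedily group shared SNP positions into non-overlapping windows.

    A cluster is reported when at least `min_shared` shared SNPs lie within
    `window` bp of the first SNP in the cluster.
    """
    # A SNP counts as shared if the same position is polymorphic in both species.
    shared = sorted(set(human_positions) & set(chimp_positions))
    clusters = []
    i = 0
    while i < len(shared):
        j = i
        # Extend the window while the next shared SNP is within `window` bp.
        while j + 1 < len(shared) and shared[j + 1] - shared[i] <= window:
            j += 1
        if j - i + 1 >= min_shared:
            clusters.append(shared[i:j + 1])
            i = j + 1
        else:
            i += 1
    return clusters


# Toy coordinates (hypothetical, for illustration only).
human = [100, 1500, 3000, 10000, 20000, 21000, 50000]
chimp = [100, 3000, 9000, 20000, 21000, 60000]
clusters = shared_snp_clusters(human, chimp)
print(clusters)
```

Requiring several shared SNPs close together is what makes independent recurrent mutation an unlikely explanation, since each additional shared site would need its own coincidental mutation in both lineages.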

Outside of the MHC, six clear-cut cases of trans-species polymorphisms

All non-coding and putatively regulatory

FREM3/GYPE, MTRR, IGFBP7

In intron of IGFBP7

[Figure: genome browser view of the IGFBP7 intron at 4kb and 20kb scales. Tracks: IGFBP7 gene structure; human-chimpanzee shared SNPs; average pairwise differences; primate phastCons score; TFBS conserved in human/mouse/rat; TFBS identified by ChIP-seq; chromatin state segmentation by HMM; DNaseI hypersensitive sites; open chromatin by FAIRE. Annotations: regulatory region in HUVEC; regulatory region in NHEK and HMEC; weak and strong enhancers. Transcription factors labelled: RelA, CUTL1, SRF, Bach1, STAT3, GATA-2, ISGF-3.]

• In total, 130 regions with shared human-chimpanzee haplotypes. Six clear-cut cases of ancient balanced polymorphisms.

• None are protein-coding. Eleven occur in non-coding genes (e.g., 7 in lincRNAs). Eleven compelling cases of regulatory regions.

• What do these regions have in common?

SNPs shared by humans and chimpanzees

• Shared haplotypes: closest gene within 20 kb of a human-chimp shared haplotype (n=26, p=2×10^-5, FDR=0.03)

• Shared coding SNPs: genes with a human-chimp shared coding SNP (n=99, p=0.017, FDR=0.20)

Enrichment of membrane glycoproteins -> host-pathogen interactions

[Figure: two panels, each labelled Glycoproteins.]
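
An enrichment p-value of this kind is commonly obtained from a hypergeometric (one-sided Fisher) test. The slide does not say which test or tool produced the values above, so the following is only a sketch of that standard approach; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the annotation."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy counts (hypothetical): 10 genes, 4 annotated, 3 sampled, 2 annotated hits.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
print(f"enrichment p-value: {p_value:.4f}")
```

When many annotation categories are tested at once, the raw p-values are usually adjusted for multiple testing, which is what the FDR figures alongside the p-values reflect.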

Project Participants

• University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Isaac Turner, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI)

• Biomedical Primate Research Centre: Ronald Bontrop

• University of Chicago: Adi Fledel-Alon, Ryan Hernandez (UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI)

• Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust

Where next?

Remarkable structural and sequence diversity in chimp PRDM9

Variation greater than in human populations

Little correlation in fine-scale structure around DNA repeat elements

No activating motif discovered in chimp

CCTCCCT
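
Testing whether a motif such as the CCTCCCT sequence shown on this slide is associated with hotspots starts with locating its occurrences on both strands. The sketch below shows only that scanning step; the function names and the toy sequence are hypothetical, and comparing motif density inside and outside hotspots is left out.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif(seq, motif="CCTCCCT"):
    """Return sorted (0-based start, strand) pairs for every motif occurrence.

    Overlapping matches are reported. A palindromic motif would be reported
    once per strand; CCTCCCT is not palindromic.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence (hypothetical) with one forward and one reverse-strand match.
toy_seq = "AACCTCCCTGGAGGGAGGTT"
motif_hits = find_motif(toy_seq)
print(motif_hits)
```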