The tangled genome
Gil McVean
The real heroes
PanMap – Genome sequencing of 10 Western Chimpanzees
• Patterns of small insertion and deletion are quite different and reveal details of DNA repair pathways
• Patterns of recombination in humans and chimpanzees are highly diverged at the fine-scale, but largely conserved at broad scales
• There are a surprising number (6+ now 'confirmed') of trans-specific polymorphisms, probably maintained through host-pathogen interactions
A tangle of sequence
Difficulties of working with an incomplete reference
Using de novo assembly to find variants
[Figure: labels Entire population, Sample 1, Sample 2, Chromosome 1]
Using Cortex leads to a high quality set of variants
Diversity in Western Chimpanzees
• Similar diversity to humans of European origin (0.06%-0.08%)
• Excess of common variants
• 1% of variants shared with humans
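A diversity figure such as 0.06%-0.08% is commonly reported as the average number of pairwise differences per site. The sketch below shows that calculation on toy data, assuming aligned haplotype strings of equal length. The function name and the toy sequences are hypothetical, and the slides do not say that the PanMap estimate was computed this way.

```python
from itertools import combinations


def pairwise_diversity(haplotypes):
    """Average number of differences per site over all pairs of haplotypes."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching sites for every pair, then normalise by pairs x sites.
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy data (hypothetical): four aligned haplotypes of 10 sites each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
pi = pairwise_diversity(toy)
print(f"pairwise diversity = {pi:.3f}")
```

On real data the denominator would be the number of callable sites, not the raw sequence length, which is one reason variant calling quality matters for this estimate.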
Non-slippage indels are strongly biased to deletions
13:1 bias toward deletions.
Unexpected peak at 4bp
Indels as indicators of DNA repair processes
[Figure: two panels (Insertions, Deletions); axes scaled 5 to 25; x-axis: Indel size]
Longest word agreement
TGACGAACTTATACTGCTTGAATA
TGACGAAC
ATTGAATA
TGAC--ATACTGAATATGACTTAT
Losing GAAC
A tangle of trees
Myers et al. 2005
The zinc-finger protein PRDM9 determines hotspot location
Myers et al. 2010
PRDM9 Zinc fingers are radically different between humans and chimps
Perhaps the most diverged gene between humans and chimpanzees
Repeatedly hit by adaptive evolution across mammals
Only known ‘speciation gene’ in mammals
Polymorphic in humans – leads to variation in hotspots and genome instability
Questions
• We know from previous work in a few regions that hotspot locations tend not to be shared between humans and chimpanzees
• Calculations suggested that only 40% of human hotspots were driven by PRDM9 binding
• But...
– Is there any hotspot sharing?
– Do we see conservation of recombination rates at any scale?
– What features determine hotspot location in chimpanzees?
The first genome-wide fine-scale map of recombination for a non-reference organism
Auton et al. 2012
Chimpanzee recombination is dominated by hotspots in a manner similar to humans
But the hotspots are not in the same locations
Fine-scale profiles around genes are similar
As is rate variation around CpG islands
Substantial PRDM9 diversity, but overlap in predicted binding sequences
No signal for predicted binding sequences
Similarities at 1Mb scale
Human and chimp recombination rates are correlated at the chromosomal scale
Human and chimp recombination rates are only correlated at broad scales
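The contrast between fine and broad scales can be illustrated by averaging rates into windows of increasing size and correlating the two species at each size. The sketch below uses toy numbers, not PanMap data: both species share a broad trend but have hotspots in different positions. The function names and the Pearson correlation are assumptions for illustration, since the slides do not specify the statistic used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bin_means(rates, bin_size):
    """Mean rate in consecutive non-overlapping windows of bin_size entries."""
    return [
        sum(rates[i:i + bin_size]) / bin_size
        for i in range(0, len(rates) - bin_size + 1, bin_size)
    ]


def correlation_by_scale(human, chimp, bin_sizes):
    """Correlation between the two maps after averaging at each window size."""
    return {
        b: pearson(bin_means(human, b), bin_means(chimp, b))
        for b in bin_sizes
    }


# Toy rates (hypothetical): same broad trend, hotspots in different places.
human = [5, 1, 5, 1, 10, 6, 10, 6, 7, 3, 7, 3]
chimp = [1, 5, 1, 5, 6, 10, 6, 10, 3, 7, 3, 7]
result = correlation_by_scale(human, chimp, [1, 2])
for size, r in result.items():
    print(f"window of {size} bins: r = {r:.3f}")
```

In this toy case the correlation is close to zero at the finest scale and close to one after averaging, which mirrors the qualitative pattern described on the slides.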
Lower correlation in structural rearrangements
• All, bar one, of the inverted regions are pericentric, so change in position with respect to the centromere does not contribute
• Change in proximity to telomere is important
[Figure: labels chimp, human, C.A., 2a, 2b, 2, t]
A natural experiment: chromosomal fusion
Fusion region shows 3-fold decrease in recombination rate
A tangle of histories
Distribution of sickle allele
Distribution of malaria
How many variants are shared through descent?
SNPs shared by humans and chimpanzees (33,906 autosomal and 527 X chromosome)
Human polymorphism: 9.4 million autosomal and 261,000 X chromosome SNPs from 1000 Genomes Pilot 1 YRI (59 individuals)
Chimpanzee polymorphism: 3.8 million autosomal and 102,000 X chromosome SNPs from PanMap Pan troglodytes verus (10 individuals)
Human-chimpanzee shared haplotypes
At least two shared SNPs in 4kb with the same LD
reduce recurrent mutation
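The first part of the shared-haplotype criterion (at least two shared SNPs within 4kb) can be sketched as a simple windowing pass over the intersection of the two SNP sets. This is a minimal illustration, assuming both sets are keyed by (chromosome, position) in a common coordinate system. The function name and toy positions are hypothetical, and the check that the shared SNPs are in the same LD in both species is not implemented here.

```python
def shared_snp_clusters(human_snps, chimp_snps, max_span=4000, min_snps=2):
    """Group shared SNPs into clusters spanning at most max_span bases.

    Assumption: SNPs are (chrom, pos) tuples in shared coordinates.
    The LD comparison required by the full criterion is omitted.
    """
    shared = sorted(set(human_snps) & set(chimp_snps))
    clusters, current = [], []
    for chrom, pos in shared:
        # Close the current cluster on a chromosome change or when the
        # span from its first SNP would exceed max_span.
        if current and (chrom != current[0][0] or pos - current[0][1] > max_span):
            if len(current) >= min_snps:
                clusters.append(current)
            current = []
        current.append((chrom, pos))
    if len(current) >= min_snps:
        clusters.append(current)
    return clusters


# Toy positions (hypothetical), not real PanMap or 1000 Genomes calls.
human = [("chr4", 1000), ("chr4", 2500), ("chr4", 9000), ("chr5", 100), ("chr5", 5000)]
chimp = [("chr4", 1000), ("chr4", 2500), ("chr4", 9000), ("chr5", 100), ("chr7", 42)]
clusters = shared_snp_clusters(human, chimp)
print(clusters)
```

Requiring two or more nearby shared SNPs is what reduces the contribution of recurrent mutation: a single site can mutate independently in both species, but a matching pair of linked sites is much less likely to arise twice.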
Human-chimpanzee shared coding SNPs
identify potentially functional coding variants
reduce artifactual sharing due to known or cryptic paralogs by filtering out SNPs with low 50 bp mappability, with high read depth, or not found in 1000 Genomes Phase 1
130 regions with shared haplotypes
outside the MHC
135 shared non-synonymous SNPs
1 shared premature stop SNP
200 shared synonymous SNPs
outside the MHC
7 resequenced using Sanger sequencing
8 with more than two pairs in LD
Outside of the MHC, six clear-cut cases of trans-species polymorphisms
All non-coding and putatively regulatory
FREM3/GYPE MTRR IGFBP7
In intron of IGFBP7
TFBS conserved in human/mouse/rat
[Figure: genome browser view of the IGFBP7 region (scale bars 4kb and 20kb). Tracks: Chromatin state segmentation by HMM; DNaseI hypersensitive sites; Open chromatin by FAIRE; Human-Chimpanzee shared SNPs; Average pairwise differences; Primate phastCons score; TFBS identified by ChIP-seq; IGFBP7 gene structure. Annotations: regulatory region in HUVEC; regulatory region in NHEK and HMEC; weak and strong enhancers. TFBS labels: RelA, CUTL1, SRF, Bach1, STAT3, GATA-2, ISGF-3]
• In total, 130 regions with shared human-chimpanzee haplotypes. Six clear-cut cases of ancient balanced polymorphisms.
• None are protein-coding. Eleven occur in non-coding genes (e.g., 7 in lincRNAs). Eleven compelling cases of regulatory regions.
• What do these regions have in common?
SNPs shared by humans and chimpanzees
Shared haplotypes
Shared coding SNPs
Closest gene within 20 kb of a human-chimp shared haplotype (n=26, p=2x10^-5, FDR=0.03)
Genes with a human-chimp shared coding SNP (n=99, p=0.017, FDR=0.20)
Enrichment of membrane glycoproteins -> host-pathogen interactions
Project Participants
• University of Oxford: Adam Auton, Rory Bowden, Peter Humburg, Zam Iqbal, Gerton Lunter, Julian Maller, Simon Myers, Susanne Pfeifer, Isaac Turner, Oliver Venn, Peter Donnelly (PI), Gil McVean (PI)
• Biomedical Primate Research Centre: Ronald Bontrop
• University of Chicago: Adi Fledel-Alon, Ryan Hernandez (UCSF), Ellen Leffler, Cord Melton, Laure Segurel, Molly Przeworski (PI)
• Funders: Howard Hughes Medical Institute, National Institutes of Health, Royal Society, Wellcome Trust
Where next?
Remarkable structural and sequence diversity in chimp PRDM9
Variation greater than in human populations
Little correlation in fine-scale structure around DNA repeat elements
No activating motif discovered in chimp
CCTCCCT