www.sciencemag.org/content/354/6311/477/suppl/DC1

Supplementary Materials for
Chimpanzee genomic diversity reveals ancient admixture with bonobos

Marc de Manuel,* Martin Kuhlwilm,* Peter Frandsen,* Vitor C. Sousa, Tariq Desai, Javier Prado-Martinez, Jessica Hernandez-Rodriguez, Isabelle Dupanloup, Oscar Lao, Pille Hallast, Joshua M. Schmidt, José María Heredia-Genestar, Andrea Benazzo, Guido Barbujani, Benjamin M. Peter, Lukas F. K. Kuderna, Ferran Casals, Samuel Angedakin, Mimi Arandjelovic, Christophe Boesch, Hjalmar Kühl, Linda Vigilant, Kevin Langergraber, John Novembre, Marta Gut, Ivo Gut, Arcadi Navarro, Frands Carlsen, Aida M. Andrés, Hans R. Siegismund, Aylwyn Scally, Laurent Excoffier, Chris Tyler-Smith, Sergi Castellano, Yali Xue, Christina Hvilsom,† Tomas Marques-Bonet†

*These authors contributed equally to this work.
†Corresponding author. Email: [email protected] (C.H.); [email protected] (T.M.-B.)

Published 28 October 2016, Science 354, 477 (2016)
DOI: 10.1126/science.aag2602

This PDF file includes:
Materials and Methods
Figs. S1 to S58
Tables S1 to S19
References

Other Supplementary Material for this manuscript includes the following (available at www.sciencemag.org/content/354/6311/477/suppl/DC1):
Data S1


de Manuel, Kuhlwilm, Frandsen et al. 2016 Supplementary Material

Table of Contents

Materials and Methods

1. Data generation
1.1 Novel sequencing
1.2. Mapping and SNP calling
1.3. Phasing of the genotype data
1.4. Ancestral allele calls

2. Genetic diversity in chimpanzees and bonobos
2.1. Kinship between the sampled individuals
2.2. Y chromosome analysis
2.3. Heterozygosity
2.4. Haplotype sharing
2.5. Linkage disequilibrium
2.6. ψ range-expansion statistic

3. Population structure and historical effective population sizes
3.1. Principal component analysis
3.2. sNMF, fineSTRUCTURE and ADMIXTURE
3.3. Pairwise Fst in the chimpanzee subspecies
3.4. PSMC, MSMC2, ancestral effective population size and gene flow

4. Gene flow within the Pan clade
4.1 Inference of migration with TreeMix
4.2 Population-wise and individual D-statistics
4.3 Frequency-stratified D-statistics (Dj and Djx)
4.4 D-statistics in the X chromosome
4.5 Comparison to published data
4.6 Divergence in windows of 50kb
4.7 Putatively introgressed regions in the chimpanzee genomes
4.8 Estimating the age of introgressed segments using ARGweaver
4.9 Simulations support genome-wide observations
4.10 Alternative scenarios

5. Demographic modelling and inference based on the Site Frequency Spectrum
5.1 Likelihood inference of demographic models based on the Site Frequency Spectrum
5.2 Estimates for models with western chimpanzees
5.3 Estimates for models with Nigeria-Cameroon chimpanzees
5.4 Assessing the fit to the observed SFS

Figures

Tables

References


Materials and Methods

1. Data generation

1.1 Novel sequencing

A total of 65 chimpanzees and 10 bonobos (25 chimpanzee and 10 bonobo individuals previously published (8) and 40 novel chimpanzee genomes) were analysed in this study. The distribution of individuals across chimpanzee subspecies is as follows: 12 western chimpanzees (Pan troglodytes verus), 10 Nigeria-Cameroon chimpanzees (Pan troglodytes ellioti), 20 central chimpanzees (Pan troglodytes troglodytes), and 23 eastern chimpanzees (Pan troglodytes schweinfurthii). Details of the geographical distribution of the chimpanzee samples collected and sequenced can be found in Fig. S1, Table S1, and Data file S1. Blood samples were collected in sanctuaries across Africa and from wild-born chimpanzees in European zoos. Conscious of the sparse origin information and the uncertainties inherent to such samples, we collected all possible information about the confiscation or capture site in order to make the best-qualified judgment of the geographical place of origin. The level of detail in our reported sites of origin (Data file S1) thus reflects our confidence, i.e. the more geographically detailed the information on origin, the greater our confidence in its precision.

All samples analysed here were wild-born (except for the western chimpanzees Donald and Clint, which are the donors of the chimpanzee reference genome assembly), and DNA was extracted from blood in each case. All blood samples were taken during routine health checks, and Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) permits were obtained. Sequencing was carried out on Illumina sequencing machines using standard library preparation protocols.

Additionally, we sequenced a panel of six chimpanzee genomes (highlighted in Table S1) that were produced specifically to test our findings on chimpanzee population structure (Suppl. Mat. 3). We also captured and sequenced chromosome 21 from four fecal samples from Loango (central chimpanzees, Gabon) and Ngogo (eastern chimpanzees, Uganda). One microgram of DNA from each sample was sheared to an average fragment length of 300 bp using Covaris' S2 focused ultrasonicator in a volume of 130 µl. DNA libraries were prepared following the protocol of Meyer and Kircher (2010) (26) using 50 µl of sheared DNA. Sample-specific indices were introduced into both library adapters by amplification with 5′-tailed indexing primers (27). Following purification with carboxylated magnetic beads (26), 2 µg of amplified library were subjected to one round of hybridization capture using chromosome 21 baits derived from the human reference genome sequence (28). Amplified libraries were then pooled and sequenced for 125 cycles from both ends using one lane of Illumina's HiSeq 2500 and a recipe for double-indexed sequencing (27). Base calling was performed using IBIS. When possible, full-length molecule sequences were reconstructed from overlapping forward and reverse reads using LeeHom (29).

1.2. Mapping and SNP calling

All newly sequenced genomes listed in Table S1 and data from Prado-Martinez et al. (2013) (8) were mapped to the chimpanzee genome reference sequence CHIMP2.1.4 (http://www.ensembl.org/Pan_troglodytes). Alignment was carried out using BWA-MEM v0.7.5a-r405 (30) with default parameters. After marking and removing PCR duplicates using PICARD v1.91 (http://picard.sourceforge.net), single nucleotide polymorphisms (SNPs) were called using FREEBAYES v0.9.14 (31) with the following parameters: --standard-filters --no-population-priors -p 2 --report-genotype-likelihood-max --standard-gls --prob-contamination 0.05.

We then constructed a mappability mask to identify positions in the reference genome where variants could not be confidently called. We began by applying the mappability module in GEM (32), which uses a k-mer approach to identify genomic regions that are duplicated and therefore likely to be problematic for SNP calling. We retrieved duplications with up to 4 mismatches. We then identified the regions of the genome where all samples had between 3-fold and 100-fold depth of aligned read coverage (callable sites), and excluded any site outside the intersection of all such regions. Conservatively, we excluded polymorphic sites that were not biallelic or that had a quality below 30. Finally, to avoid false-positive heterozygous calls due to mapping errors, we excluded any site where more than 80% of the samples carried non-reference alleles in a heterozygous state. After excluding these filtered positions, the total callable genome length was 1,721,192,217 sites. Table S2 shows summary statistics of the SNP calling. The increased sample size in comparison to the previous dataset (8) allowed us to detect 32% more variable sites (from 16,452,732 to 22,081,627 SNVs).
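The per-site filters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Site` record and the function names are hypothetical, genotypes are assumed to be diploid VCF-style strings, and the GEM mappability step is not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    """Hypothetical per-site record; field names are illustrative only."""
    depths: List[int]      # aligned read depth per sample
    n_alt_alleles: int     # 0 for an invariant site, 1 for a biallelic SNP
    qual: float            # variant quality score
    genotypes: List[str]   # diploid genotypes per sample, e.g. "0/1"


def is_callable(site, min_depth=3, max_depth=100):
    # A site is callable only if every sample falls inside the depth range.
    return all(min_depth <= d <= max_depth for d in site.depths)


def passes_filters(site, min_qual=30.0, max_het_fraction=0.8):
    if not is_callable(site):
        return False
    if site.n_alt_alleles == 0:
        return True  # invariant callable site, nothing else to check
    if site.n_alt_alleles != 1 or site.qual < min_qual:
        return False  # not biallelic, or low quality
    hets = sum(1 for g in site.genotypes if g in ("0/1", "1/0"))
    # Drop sites where more than 80% of samples are heterozygous.
    return hets / len(site.genotypes) <= max_het_fraction
```

Counting the sites for which `passes_filters` returns `True` would give the callable genome length under these assumptions.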

Additionally, all reads were aligned to the human genome reference sequence hg19 (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/) and variants were called with the same parameters described above.

We also mapped the six new low-coverage genomes and performed a new joint variant calling against panTro4 with the original samples and the new “test” individuals (Table S1). The parameters used to call SNPs and select high-quality genotypes were identical to those described above. Note that all analyses are based on the original set of 59 chimpanzee genomes unless otherwise specified at the beginning of each section. In addition, the western individual Donald, a second-generation central-western hybrid, was excluded from the analyses unless otherwise specified.

1.3. Phasing of the genotype data

Genotype data from all chimpanzee subspecies were phased using SHAPEIT (33). Only high-quality SNPs were used for haplotype phasing (biallelic, polymorphic and without missing genotypes). This filtering resulted in 8,426,604 SNPs. SHAPEIT was run with the following parameters: 100 states, a window size of 0.5 Mbp and an effective population size of 15,000; the only available fine-scale chimpanzee recombination map (34) was provided to estimate haplotypes. Since the chimpanzee genetic maps were generated using a different version of the chimpanzee reference genome (panTro2), we used liftOver (UCSC) (35) to convert the coordinates to panTro4. Given that no chain is available to convert panTro2 directly to panTro4, we performed an intermediate conversion from panTro2 to panTro3 and then mapped the panTro3 coordinates to panTro4.

SHAPEIT provides a way to assess the accuracy of the estimated phased haplotypes. An internal sampling algorithm was run 100 times to output a set of haplotypes for each individual. In this way, we estimated phasing uncertainty by counting the heterozygous sites resolved differently by SHAPEIT across the replicates. The uncertainty score can vary from 0 (homozygous sites, and heterozygous sites that are resolved identically in every replicate) to 0.5 (SHAPEIT assigns the alleles of a heterozygous site to each chromosome with equal probability).

We summarised the results of the phasing uncertainty by averaging the uncertainty score across all sites from individuals of the same subspecies. Phasing uncertainty was highest in central chimpanzees, with an average score of 0.0088, followed by Nigeria-Cameroon chimpanzees (0.0062), eastern chimpanzees (0.0047) and western chimpanzees (0.0032). Phasing uncertainty was very low overall, although periodically interrupted by regions with higher scores (probably regions with a high recombination rate). Nevertheless, it is important to note that phasing uncertainty is not a direct indicator of phasing performance but a sign of consistency between phasing runs.
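One way to read the uncertainty score is as the fraction of replicates that resolve a heterozygous site in the minority orientation, which is bounded by 0 and 0.5 as described above. The Python sketch below follows that reading. It is an assumption about the definition, not SHAPEIT's own implementation, and the function names and the `"0|1"` string encoding are illustrative.

```python
def site_uncertainty(phased_calls):
    """Uncertainty for one site in one individual.

    phased_calls: phased genotypes across replicates, e.g. ["0|1", "1|0", ...].
    Returns the share of replicates in the minority orientation (0 to 0.5).
    Homozygous sites have no orientation and score 0.
    """
    first = sum(1 for call in phased_calls if call == "0|1")
    second = sum(1 for call in phased_calls if call == "1|0")
    if first + second == 0:
        return 0.0
    return min(first, second) / len(phased_calls)


def mean_uncertainty(sites):
    """Average the per-site scores, e.g. over all sites of one subspecies."""
    return sum(site_uncertainty(calls) for calls in sites) / len(sites)
```

With 100 replicates, a site phased as `0|1` in 97 runs and `1|0` in 3 runs would score 0.03 under this definition.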

1.4. Ancestral allele calls

The ancestral state of each SNP was determined following two different approaches. In the panTro4-mapped data, we obtained the ancestral allele from the 6-primate EPO alignment (36, 37), while in the hg19-mapped data the human allele was defined as the outgroup and therefore used as the ancestral state. The inferences from the EPO pipeline may generate a bias in the data, most probably due to the lower reliability of the ancestral calls when the reference genome sequences in the alignment carry different alleles (ftp.ensembl.org/pub/release-65/fasta/ancestral_alleles/pan_troglodytes_ancestor_CHIMP2.1.4_e65.README). The EPO pipeline reconstructs the ancestral allele based on an alignment of reference genomes, possibly resulting in fewer ancestral inferences at sites where a given reference sequence carries the derived allele. Given that the chimpanzee reference genome sequence was assembled from a western chimpanzee individual (38), the total number of derived sites shared by non-western chimpanzees and bonobos may be overrepresented. Indeed, we show in Suppl. Mat. 4.2 that using the EPO ancestral calls leads to an amplification of the genetic distance between western chimpanzees and the rest of the Pan populations. In order to avoid this bias, all analyses investigating admixture were performed using the hg19-mapped data with the human allele as ancestral, and also using the panTro4-mapped data with the EPO ancestral state, to check the consistency of our results.

2. Genetic diversity in chimpanzees and bonobos

2.1. Kinship between the sampled individuals

Kinship among the sampled individuals was inspected with KING (39). Genotype data segregating in each subspecies were retrieved from the original VCF file and filtered by a minor allele frequency of 0.05 using PLINK (40). Kinship coefficients were estimated with the --kinship command from KING 1.4, which reflects the proportion of SNPs with identical state (IBS0, identity by state zero) between individuals. Negative coefficients indicate unrelated individuals, while positive values may point to genealogical links between individuals. As expected given our sample selection criteria, no relatedness was found in any pairwise comparison between individuals (Fig. S2).


2.2. Y chromosome analysis

All male samples (11 chimpanzees and two bonobos from (8) and 10 newly sequenced chimpanzees) were used for the analysis of the male-specific region of the Y chromosome (MSY). The distribution of male samples across chimpanzee subspecies is: five central, six eastern, four Nigeria-Cameroon and six western (Table S1).

The SAMtools (41) mpileup v1.3 multisample option was used to call all confident sites, single nucleotide variants and indels with minimum base quality 20 and minimum mapping quality 20. Regions included in the calling have been previously defined as X-degenerate and human-orthologous (panTro4 chrY:16914621-21568628; chrY:21664388-23510221; chrY:23539251-23799038; chrY:24570160-26342871); additionally, we included single-copy spacer regions from palindromes P6, P7 and P8 (chrY:24161503-24207672; chrY:23518416-23531050; chrY:21614822-21618184), a total of approximately 8.59 Mb (42). Variant calling and filtering were done separately for two datasets: one containing only male chimpanzees (n=21) and one containing all male Pan samples (bonobos and chimpanzees, n=23).

VCFtools v0.1.14 (43) and in-house Perl scripts were used to remove all indels, sites with a quality score below 30, and all heterozygous calls, unless ≥ 90% of high-quality reads supported a single allele, in which case the major allele was kept in a homozygous state. We further excluded regions according to the mappability mask (Suppl. Mat. 1.2) and sites with a mean depth of ≥ 1 in female samples. Finally, we allowed no missing data and removed sites in the top and bottom 1% of the total filtered depth distribution, corresponding to <158 and >320 (7.5× to 15.2×) for the chimpanzee-only data and to <176 and >355 (7.6× to 15.4×) for the all-Pan data. A total of 5,849,678 bp (including 22,955 SNVs) remained after filtering for the chimpanzee-only dataset and 5,720,892 bp (including 46,475 SNVs) for the all-Pan dataset.
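Two of the filters above lend themselves to a short sketch: resolving heterozygous calls on the haploid MSY, and trimming the tails of the depth distribution. The Python below is an illustration under stated assumptions, not the in-house Perl scripts. The function names are hypothetical, and the tail trimming simply drops the lowest and highest 1% of sites by rank, which may differ in detail from how the published cutoffs were derived.

```python
def resolve_haploid_call(ref_reads, alt_reads, min_major_fraction=0.9):
    """Return "ref" or "alt" if one allele has >= 90% of reads, else None."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    if ref_reads / total >= min_major_fraction:
        return "ref"
    if alt_reads / total >= min_major_fraction:
        return "alt"
    return None  # ambiguous heterozygous call, site is dropped


def depth_bounds(depths, tail=0.01):
    """Depth cutoffs that exclude the lowest and highest `tail` share of sites."""
    ordered = sorted(depths)
    k = int(len(ordered) * tail)
    return ordered[k], ordered[-k - 1]


def filter_by_depth(depths, tail=0.01):
    low, high = depth_bounds(depths, tail)
    return [d for d in depths if low <= d <= high]
```

For example, a site with 19 reads supporting one allele and 1 supporting the other would be kept as a homozygous call for the major allele, while a 12-to-8 split would be discarded.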

In order to check the quality of the final calls, we took advantage of the fact that the two western chimpanzees Clint and Donald are included in the current dataset. The majority of the chimpanzee MSY reference sequence was constructed by sequencing Clint, although some regions also originate from Donald (42). Within both call sets (chimpanzee-only and all-Pan), only eight positions in Clint differ from the panTro4 assembly. In all cases the alternative allele was supported by 10 to 25 filtered reads in Clint, and Donald furthermore matched the reference allele, suggesting that these specific regions in the reference assembly originate from Donald.

Nucleotide diversity and its standard deviation were calculated using Arlequin v3.5.1.2 (44). The MSY of the Pan genus is characterized by high diversity. Nucleotide diversity in bonobos and chimpanzees (excluding western chimpanzees) is higher than in other great-ape species (45) (Table S3), and nucleotide diversity in central chimpanzees is at least three times higher than in any other chimpanzee subspecies. Markedly, the MSY shows very low levels of nucleotide diversity in the western subspecies (>7× lower than in other subspecies).

We used PHYLIP v3.69 to build maximum parsimony phylogenetic trees using both the chimpanzee-only and all-Pan datasets (46). Three independent trees were constructed using randomisation of the input order, each 10 times. Output trees from these runs were used to build a consensus tree with the consensus program included in the PHYLIP package. The chimpanzee-only tree was rooted using calls combined from the bonobos Desmond and Bono. To root the all-Pan tree, human alleles were extracted from the Blastz axt alignments of the panTro4 and hg19 reference sequences. FigTree v1.4.0 was used to visualise the tree (tree.bio.ed.ac.uk/software/figtree/).

The Bayesian Markov chain Monte Carlo phylogenetics software BEAST v1.8.1 (47) was used to estimate the time to the most recent common ancestor (TMRCA) for nodes of interest as described (45). In brief, due to the absence of mutation rate estimates for chimpanzees, we used the human rate of 3.07 (95% CI: 2.76–3.40) × 10⁻⁸ mutations/nucleotide/generation (48). This was scaled according to the generation time of 25 years for chimpanzees (49) and bonobos (assumed). Three independent MCMC runs were performed with 20,000,000 generations, logging every 1000 steps; the first 2,000,000 generations were discarded as burn-in. The three runs were combined using LogCombiner v1.8.1. The GTR substitution model was identified as the best fit to our data according to the corrected Akaike Information Criterion (AICc) as implemented in MEGA6 (50). We used a strict clock and a constant-sized coalescent tree prior for the chimpanzee-only dataset, and a Yule speciation prior for dating the Pan MSY root, and applied a normal distribution as a prior based on the 95% CI of the substitution rate. Only the variant sites were used in the runs; the composition of invariant sites was specified in the BEAST xml file. TMRCAs were estimated in a single run including all samples, assigning samples to specific nodes according to the MP tree in Fig. S3.

2.3. Heterozygosity

To assess genetic diversity within and between chimpanzee populations, we analysed the distribution of heterozygosity values (heterozygous calls per bp) along the genome of each individual (Fig. S4 and Table S4). Individual heterozygosity values were then combined into subspecies and population estimates. Donald, a second-generation hybrid individual, was excluded when estimating the heterozygosity of the western subspecies.

Western chimpanzees and bonobos showed the lowest heterozygosity (0.63 per 1000 bp), followed by Nigeria-Cameroon chimpanzees (0.94), eastern chimpanzees (1.28) and central chimpanzees (1.47). These values are consistent with previously reported estimates (8, 19, 20, 51).
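As a minimal sketch of the arithmetic, heterozygosity per 1000 bp is the number of heterozygous calls divided by the number of callable base pairs, scaled by 1000. The Python below computes it per window and per individual. The function names and the `(het_calls, callable_bp)` window layout are hypothetical, and combining individuals by a simple mean is an assumption, since the text does not state how individual values were combined.

```python
def heterozygosity_per_kb(het_calls, callable_bp):
    """Heterozygous calls per 1000 callable base pairs."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return 1000.0 * het_calls / callable_bp


def summarise_individual(windows):
    """windows: list of (het_calls, callable_bp) tuples along the genome.

    Returns the per-window values (skipping windows with no callable sites)
    and the genome-wide value for the individual.
    """
    per_window = [heterozygosity_per_kb(h, c) for h, c in windows if c > 0]
    total_het = sum(h for h, _ in windows)
    total_bp = sum(c for _, c in windows)
    return per_window, heterozygosity_per_kb(total_het, total_bp)


def group_estimate(individual_values):
    """Assumed combination rule: arithmetic mean across individuals."""
    return sum(individual_values) / len(individual_values)
```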

2.4. Haplotype sharing

We performed a haplotype-sharing analysis to inspect differentiation among chimpanzee subspecies. We scanned the computationally phased chimpanzee genomes (Suppl. Mat. 1.3) in windows of lengths from 1 Kb to 200 Kb with a custom Python script. We applied a filter excluding windows with a callable fraction below 40%. The definition of callable sites is provided in Suppl. Mat. 1.2. To account for differences in the number of individuals sampled for each subspecies, we randomly selected 8 individuals from each subspecies in 40 replicates. The number of different haplotypes was averaged over all window lengths and replicates, as was the number of private haplotypes (haplotypes for a particular window that are found exclusively within individuals of the same subspecies). To ascertain that our analysis was not biased by phasing uncertainty, we repeated the genotype phasing with BEAGLE v4 (52). We found very consistent haplotype-sharing profiles (data not shown), indicating that our results are not biased by the phasing methodology.
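The counting step for a single window can be sketched as follows. This is not the custom script used in the study. It assumes haplotypes are represented as strings of alleles over the SNPs in the window, and the function names are illustrative.

```python
import random


def subsample(individuals, n=8, rng=None):
    """Randomly pick n individuals without replacement (one replicate)."""
    rng = rng or random.Random()
    return rng.sample(individuals, n)


def haplotype_counts(haps_by_group):
    """Count distinct and private haplotypes per group for one window.

    haps_by_group: dict mapping a subspecies name to its list of haplotype
    strings in the window (two per diploid individual).
    Returns a dict: group -> (n_distinct, n_private).
    """
    distinct = {group: set(haps) for group, haps in haps_by_group.items()}
    counts = {}
    for group, own in distinct.items():
        # Haplotypes seen in any other group are not private.
        others = set().union(*(s for g, s in distinct.items() if g != group))
        counts[group] = (len(own), len(own - others))
    return counts
```

Averaging these counts over windows and over the 40 subsampling replicates would give profiles comparable in spirit to those described above.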


In agreement with previous observations (53), central chimpanzees harbour the highest amount of the species diversity, displaying higher values at short windows and a rapid increase of private haplotypes in comparison to the rest of common chimpanzees (Fig. S5). Interestingly, eastern and Nigeria-Cameroon populations reflected very similar trends despite their differences in heterozygosity (see Suppl. Mat. 2.3 and Fig. S4). This is consistent with a recent divergence between central and eastern chimpanzees around 120,000 years ago (Suppl. Mat. 3.4 and Suppl. Mat. 5.2), so that the eastern subspecies has undergone few generations since the split from central chimpanzees, thus reducing the amount of private diversity that has accumulated in the eastern population.

2.5. Linkage disequilibrium

We explored the patterns of linkage disequilibrium (LD) in all chimpanzee subspecies and bonobos. In order to avoid biases due to sample size, we randomly selected 6 individuals of each population across 40 replicates. Before looking into LD, we applied several filters to the original VCF using VCFtools v0.1.12 (43). We kept those sites segregating within each population with allele frequencies below 0.15 and above 0.85, as well as a maximum of 20% missing genotypes. LD patterns were explored using PLINK v.2 (40) as the mean r² correlation coefficient between pairs of SNPs. Calculations of LD as a function of distance along chromosomes were done with a pairwise calculation of 2,000 SNPs and a maximum distance of 200 Kb, and averaged over 1 Kb windows.

We found that central chimpanzees show the fastest LD decay, reflecting larger historical effective population sizes, followed by eastern, Nigeria-Cameroon and western chimpanzees (Fig. S6). The progressively higher LD in eastern, Nigeria-Cameroon and western chimpanzees is consistent with the notion of recent bottlenecks associated with the splits of these populations from the source population. Whereas patterns of linkage disequilibrium in human populations can be used to trace an origin to Africa (54, 55), similar approaches are not easily translated to species with more intricate and complex population histories (56). However, LD between nearby autosomal markers often reveals features of population history: populations with large effective sizes show faster decay rates, and bottlenecked populations show elevated LD (57).
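To make the LD decay measure concrete, the sketch below computes r² between pairs of SNPs and averages it in 1 Kb distance bins up to 200 Kb. It is an illustration only: it assumes phased 0/1 haplotypes and uses the textbook haplotype-based r², which is not necessarily the estimator PLINK applies to genotype data, and the function names are hypothetical.

```python
from collections import defaultdict


def r_squared(a, b):
    """Haplotype-based r^2 between two biallelic SNPs (lists of 0/1 alleles)."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x & y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return None  # at least one SNP is monomorphic in this sample
    return (pab - pa * pb) ** 2 / denom


def ld_decay(positions, haplotypes, max_dist=200_000, bin_size=1_000):
    """Mean r^2 per distance bin; positions must be sorted in ascending order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break
            r2 = r_squared(haplotypes[i], haplotypes[j])
            if r2 is None:
                continue
            sums[dist // bin_size] += r2
            counts[dist // bin_size] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}
```

A population with a faster LD decay would show mean r² values that drop towards zero at shorter distances in the returned dictionary.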

2.6. ψ range-expansion statistic

We applied a method developed to infer the origin of a range expansion (58). The method is based on the observation that during an expansion alleles tend to become either lost or common, which increases the frequency of shared alleles. The frequencies of shared alleles are compared to infer the order in which populations were colonised and the origin of the expansion, using a statistic devised for pairs of populations, denoted ψ, that increases linearly with the difference in distance from the origin between the two populations.

For every analysis, we grouped all chimpanzee samples or split them into western chimpanzees and the three other chimpanzee subspecies. These groupings were chosen because western chimpanzees are more distantly related to all other subspecies and little geographic information was available for the Nigeria-Cameroon and western chimpanzee samples, making analyses for just these subspecies infeasible. As the bonobo samples had only a single estimated location, no spatial analyses could be carried out, and they were therefore excluded.

This spatial method requires knowledge of sampling locations. Conscious of the sparse origin information and the uncertainties when using sanctuary and zoo individuals, we used prior information to estimate a single location per sample and a paired measure of certainty. Specifically, to measure the certainty quantitatively, we used the standard deviation of a bivariate normal distribution without correlation between latitude and longitude. This can be interpreted as a standard error on the location estimate or as specifying a prior distribution on the location. The uncertainty was propagated by drawing locations randomly from the bivariate normal distributions for each replicate run. For all analyses, we performed 100 replicate runs, each of which consisted of a burn-in of 100,000 iterations, and recorded 90 iterations each at a thinning proportion of 0.1%. We then collated the samples from all MCMC chains and produced contour plots of the posterior means of each parameter over space. The allowed habitat was chosen by overlapping a convex hull around all sampling locations with the landmass of Africa, and adding a 200 km buffer to avoid placing samples too close to the boundary.

The range expansion analysis was performed using the implementation from Peter and Slatkin (2013) (58), which, besides the origin, also estimates the strength of the observed founder effect (Fig. S7). All three analyses point towards a most likely species origin in the western DRC, although no significant signal for an expansion was found in the western chimpanzees, possibly due to the low sample size. This is consistent with the analyses in Suppl. Mat. 3, which find no discernible structure or change in coalescence rate in western chimpanzees. The estimated founder effect is stronger when including western chimpanzees, with diversity decreasing by approximately 1% every 33.0 km, and weaker when excluding them, with diversity decreasing by 1% every 56.4 km. This could indicate a faster eastwards expansion. Considering only eastern chimpanzees gives consistent results, with an inferred founder effect of a 1% decrease in diversity every 60.1 km, and a predicted origin to the east of the current range.

3. Population structure and historical effective population sizes

3.1. Principal component analysis

In order to detect population structure within and between the chimpanzee subspecies sampled, we used principal component analysis (PCA). We performed analyses including all chimpanzees, as well as separate analyses for each chimpanzee subspecies. In each case, all variants segregating in the subspecies analysed were extracted from the VCF file and used as initial input. Moreover, in the within-subspecies analyses, the resulting variant sites were pruned to remove polymorphisms in linkage disequilibrium (LD) with an r² coefficient above 0.5. This filter was not applied in the PCA of all chimpanzee samples to avoid the exclusion of fixed sites between subspecies.

The PCA of over 22 million single-nucleotide polymorphisms across all samples showed a hierarchical structure consistent with the accepted chimpanzee taxonomy (8) (Fig. S8), where PC1 separates western chimpanzees from the rest of the common chimpanzees, and PC2 highlights the genetic differentiation of Nigeria-Cameroon chimpanzees (59). PC3 clearly separates central and eastern chimpanzees, the most genetically similar chimpanzee subspecies (60). We also identified several individuals showing some degree of genetic distinction in comparison to the other individuals of their subspecies: Donald (a second-generation western-central hybrid), as well as Tobi, Julie and Banyo among the Nigeria-Cameroon chimpanzees. These latter three are known to be distinct from the rest of the Nigeria-Cameroon chimpanzees (59). Prado-Martinez et al. (2013) (8) showed that this differentiation is not the result of recent admixture and may represent genetically distinct populations. We hypothesise that these populations could be similar to the ones recently described in Mitchell et al. (2015) (61), suggesting different origins for these samples (central Cameroon or eastern Nigeria/western Cameroon).

We further explored substructure within chimpanzee subspecies (Fig. S9). One striking observation is how closely the population structure of central and eastern chimpanzees matches their geography. In order to formally assess the similarity between spatial maps and genetic variation in chimpanzees, we carried out a Procrustes analysis. Procrustes is a classical multivariate statistical technique in shape analysis that, given two sets of coordinates, estimates the transformations that minimise the distance between the two sets, providing a similarity score between the transformed maps. The statistical significance of the transformation can be estimated by a permutation test. Similarity between PCA coordinates and geographic coordinates was assessed using the protest function in the vegan R package (62).
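The Procrustes comparison described above can be illustrated with a short Python sketch. This is not the vegan protest implementation used in the analysis: it is a minimal NumPy version of a symmetric Procrustes fit, assuming one row per individual in both coordinate sets, with a row-permutation test for significance. The function names and the default number of permutations are hypothetical choices made for illustration.

```python
import numpy as np


def procrustes_correlation(a, b):
    """Symmetric Procrustes correlation between two coordinate sets.

    Both matrices are centred and scaled to unit Frobenius norm. The sum of
    the singular values of a.T @ b is then the correlation-like statistic
    (1 - statistic**2 is the residual sum of squares after optimal rotation).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    singular_values = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(singular_values.sum())


def procrustes_permutation_test(geo, pcs, n_perm=999, seed=0):
    """Return the observed statistic and a permutation p-value.

    geo: geographic coordinates (n x 2), pcs: principal components (n x 2).
    Rows of pcs are shuffled to break the pairing between individuals.
    """
    geo = np.asarray(geo, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    rng = np.random.default_rng(seed)
    observed = procrustes_correlation(geo, pcs)
    hits = 0
    for _ in range(n_perm):
        shuffled = pcs[rng.permutation(len(pcs))]
        if procrustes_correlation(geo, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value
```

With this convention the smallest attainable p-value is 1/(n_perm + 1), so a reported bound such as P < 1×10⁻⁴ would require at least 10,000 permutations.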

A Procrustes-transformed PCA (P < 1×10⁻⁴) focusing only on the eastern subspecies reveals a notable resemblance to a geographic map of East Africa (Fig. S10). In central chimpanzees, the Procrustes-transformed PCA (P < 1×10⁻⁴) also separates samples by country and produces a shape in close correspondence to a map of central Africa (Fig. S10). In contrast, western chimpanzees do not show clear patterns of geographic and genetic differentiation (Fig. S9), probably due to their low genetic diversity compared to central and eastern chimpanzees. We could not assess the phylogeography of Nigeria-Cameroon chimpanzees due to the lack of information about their sampling locations (Table S1). These results point to a close correspondence between genetic and geographical distances in central and eastern chimpanzees, and suggest that geographical origin can be inferred accurately from genetic information.

3.2. sNMF, fineSTRUCTURE and ADMIXTURE

Population structure within chimpanzee subspecies was further explored with sNMF (63). To carry out this analysis, 1 million SNPs segregating in the 59 chimpanzee individuals were randomly sampled from the original VCF (--thin option in PLINK). The resulting set of sites was filtered by a minor allele frequency of 0.05. We further reduced the number of sites to 168,078 by applying an LD pruning filter using PLINK (40) (--indep-pairwise 50 5 0.5). sNMF was run 20 times at all K values from 1 to 15. Among those runs with a difference to the lowest observed cross validation (CV) error of less than 0.1 units, we reported the replicate with the highest biological meaning, i.e. runs that resolved substructure among different sampling areas rather than identifying clusters within sampling areas (Fig. S11). In order to provide further support for our findings, we also explored population structure with other algorithms such as FRAPPE (64) and ADMIXTURE (65), as well as fineSTRUCTURE (66). These results were in good agreement with the sNMF analysis, suggesting that the inferred population structure is well supported by the data. Replicates showing the lowest cross-entropy values are shown in Fig. S12.

The population clustering inferred by sNMF is in good agreement with the PCA (Suppl. Mat. 3.1). The first division predicted (K=2) clearly separates the two main monophyletic chimpanzee clades, composed of Nigeria-Cameroon and western chimpanzees on the one hand and eastern and central chimpanzees on the other. It is interesting to note that Nigeria-Cameroon individuals contain relatively high amounts of central chimpanzee ancestry (~30%) in contrast to western chimpanzees, which reflects the known gene flow between Nigeria-Cameroon and the eastern and central chimpanzee lineages (8). This central component vanishes from Nigeria-Cameroon individuals at higher K values, although it persists in Julie, Tobi and Banyo. Based on the cross-entropy of all values of K, the best division of the data is explained by K=4, where the four chimpanzee subspecies are accurately divided into different groups (Fig. S12). Subsequent K values identify geographic substructure within the eastern and central chimpanzee subspecies remarkably well. K=10 is especially striking, since it depicts an almost perfect clustering of individuals according to their inferred regional clusters (Table S1).

Chimpanzee population structure was also explored with fineSTRUCTURE. This method takes advantage of dense genotype data to infer subtle population structure on the basis of haplotype similarity (67). When using ChromoPainter (66), we first ran 10 EM iterations to infer the “global mutation” and “switch rate” parameters. These inferred values were averaged across all chromosomes, weighted by the number of SNPs. The final ChromoPainter analysis was performed using these weighted averages. fineSTRUCTURE was run with the chimpanzee fine-scale recombination map (34) and the output from ChromoPainter. Our linked fineSTRUCTURE analysis estimates K=32 as the best clustering of the data (Fig. S13), in almost perfect agreement with sample locations except for one misplaced individual from Uganda (Harriet). This individual does not show similar trends in the previous analyses, suggesting that phasing uncertainty or chromosome painting might be acting as a confounding factor in the clustering of this individual. Moreover, multiple subtle differences between individuals that were only weakly detected in PCA and sNMF are strongly supported by fineSTRUCTURE (for example, Donald in western chimpanzees and Julie, Banyo and Tobi in Nigeria-Cameroon, or Guinea individuals from the western chimpanzee subspecies). We note that the best “K” is estimated differently in sNMF and fineSTRUCTURE, and that the best K in fineSTRUCTURE should not be interpreted as the number of groups in natural populations. Although fineSTRUCTURE detects population substructure between Guinea individuals and the rest of the western locations, PCA and sNMF analyses do not clearly reflect this pattern.

An independent variant calling was performed with the read sequences from fecal samples in Loango (central chimpanzees, Gabon) and Ngogo (eastern chimpanzees, Uganda), and the low-coverage testing genomes (Table S1). Reads from fecal samples were mapped to the chimpanzee reference sequence of chromosome 21. Furthermore, alignments from whole-genome sequences of chimpanzees were subset for reads mapped to chromosome 21. A joint variant calling of both datasets was performed following the same procedure explained in Suppl. Mat. 1.2. After calling variants, sites were filtered by a minimum depth of 4 in order to obtain high-confidence genotypes. Additionally, we only kept sites with 80% missing data in the whole data set and where at least one of the fecal-captured samples showed a high-quality genotype. This stricter filtering (in comparison to the one described in Suppl. Mat. 1.2) was applied to avoid the inclusion of genotypes putatively wrongly called due to the enrichment process required for fecal samples. This procedure resulted in 72,711 SNPs, of which 34,664 were segregating in eastern chimpanzees and 45,786 in central chimpanzees. sNMF analyses and PCA were performed as described for the original dataset. In order to integrate all the evidence supporting regional substructure in the central and eastern subspecies, we combined sNMF and PCA analyses in Fig. S10.

We finally estimated the individual ancestries for bonobos and chimpanzees together with ADMIXTURE (65). The analysis was done on two separate data sets, one including the data from the 59 chimpanzees exclusively, and one that includes the 59 chimpanzees along with the 10 bonobo individuals. After excluding sex chromosomes, both sets were subjected to the same standard filters using PLINK v. 1.07 (40), excluding sites with low minor allele frequency, including all sites that are polymorphic in only one individual (--maf 0.05), as well as all sites with missing data (--geno 0.0). Both sets were further LD pruned (--indep-pairwise 50 5 0.1), leaving 372,250 sites for analysis in the chimpanzee data set and 475,886 sites when including bonobo samples. We ran ADMIXTURE v. 1.2 with an EM optimization algorithm over a range of K values (2-10), terminating when the log-likelihood increased by less than 10⁻⁵ between iterations. For each value of K, runs were replicated 100 times with random seeds to check for convergence between independent runs, and standard ADMIXTURE cross-validation (CV) procedures were used to identify the most likely value of K.
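The effect of the two PLINK site filters named above can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not PLINK itself: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with None for a missing call, and the function name is hypothetical. The thresholds mirror --maf 0.05 and --geno 0.0.

```python
def passes_site_filters(genotypes, min_maf=0.05, max_missing=0.0):
    """Return True if a site survives the MAF and missingness filters.

    genotypes: one entry per individual, the alternate-allele count
    (0, 1 or 2), or None when the genotype is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    # Fraction of individuals without a genotype call at this site.
    missing_rate = 1.0 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False
    # Minor allele frequency among called chromosomes (two per individual).
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return maf >= min_maf
```

With 59 diploid individuals (118 chromosomes), a variant seen once has a frequency of 1/118 ≈ 0.008 and is removed, which is why a threshold of 0.05 also excludes sites polymorphic in only one individual.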

The shared ancestry between chimpanzees inferred from the ADMIXTURE analysis on chimpanzees exclusively only showed converging results for values of K ranging from two to five, with general results similar to the previous clustering methods. When including the bonobo samples, we obtained similar results, with the addition of robust evidence for shared ancestry between chimpanzees and bonobos (Fig. S14). Based on the CV values for this data set, K = 5 is clearly the most likely clustering of the data. At K = 5, in addition to a shared ancestry with the northernmost eastern subspecies, Marlin and Doris (central) appear to harbour 3-5 % bonobo ancestry.

3.3. Pairwise Fst in the chimpanzee subspecies

We calculated Fst between each pair of individuals in the sample, i.e. the proportion of variation that is within vs. between individuals. This is calculated from the pairwise number of differences between individuals (πBetween) and the number of heterozygous positions within an individual (πWithin). Individual Fst is then: Fst = [πBetweenAB - (πWithinA + πWithinB)/2] / πBetweenAB. This provides a measure of the drift experienced by each genome. Such an estimate of Fst is sensitive both to the divergence time between individuals, which increases πBetween, and to the long-term Ne of the populations these individuals belong to. For instance, a reduction of Ne in the population of one of the two individuals can both lower πWithin and increase πBetween. All positions considered needed to be consistent across all individuals (see below). Therefore, we excluded all positions that did not have a called genotype in each individual, i.e. N=69 in the chimpanzee vs. bonobo comparisons.
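The individual Fst defined above can be written as a short Python function. This is a sketch under assumptions that the text does not spell out: genotypes are coded as alternate-allele counts (0, 1, 2) at sites called in both individuals, and πBetween is taken as the expected number of differences between one randomly drawn chromosome from each individual. The function name is hypothetical.

```python
def individual_fst(geno_a, geno_b):
    """Pairwise Fst between two diploid individuals.

    geno_a, geno_b: alternate-allele counts (0, 1 or 2) at the same sites,
    restricted to sites with a called genotype in both individuals.
    """
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must cover the same sites")

    # pi_within: number of heterozygous positions in each individual.
    pi_within_a = sum(1 for g in geno_a if g == 1)
    pi_within_b = sum(1 for g in geno_b if g == 1)

    # pi_between: expected mismatches between one chromosome drawn from A
    # and one drawn from B (an assumption about how differences are counted).
    pi_between = 0.0
    for ga, gb in zip(geno_a, geno_b):
        pa, pb = ga / 2, gb / 2
        pi_between += pa * (1 - pb) + pb * (1 - pa)

    if pi_between == 0:
        raise ValueError("no differences between individuals; Fst undefined")
    return (pi_between - (pi_within_a + pi_within_b) / 2) / pi_between
```

For example, with genotypes [0, 2, 1, 0] and [2, 2, 1, 1], πWithin is 1 and 2, πBetween is 2, and the function returns (2 - 1.5)/2 = 0.25.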

Considering the tree based on divergence, πBetween (Fig. S15), we find that each chimpanzee subspecies is monophyletic. The depth of the branches within chimpanzee subspecies clades is proportional to the number of differences, both homozygous and heterozygous, between individuals for positions polymorphic in that subspecies. As the number of polymorphic positions within a subspecies is reflective of Ne, the length of within-subspecies branches is expected to follow the relative order of Ne. Indeed, we find that internal branch lengths follow the pattern central > eastern > Nigeria-Cameroon > western. Considering the NJ tree based on individual Fst distances (Fig. S16), the topology of subspecies follows the known phylogenetic relationships, but it is important to note that mid-point rooting of the tree would result in western chimpanzees being an outgroup to the other three subspecies. This is reflective of the significant effect of drift that has occurred in this subspecies. Overall, we see a pattern where the total branch lengths follow the order western > Nigeria-Cameroon > eastern > central, which reflects genetic drift.

To investigate substructure, we turn again to Fig. S15 and Fig. S16, which suggest some level of substructure within subspecies (as observed in Suppl. Mat. 3.1 and Suppl. Mat. 3.2). These match the groupings inferred by PCA for central and eastern chimpanzees. The geographic origin of the samples, although not recorded with uniform specificity, suggests that internal groupings follow geography. Central chimpanzees exhibit three main groupings, belonging to Equatorial Guinea, Gabon East and Gabon West. Eastern chimpanzees show a differentiation between DRC-South individuals, Tanzania individuals and those individuals from the rest of the DRC-North, Uganda and Rwanda. For western chimpanzees, in contrast to other methods, we find that Berta is an independent diverging lineage in the western chimpanzee clade, which may suggest that greater substructure exists in this subspecies. There also appears to be substructure within the bonobos, with a group consisting of Chipita, Kosana and Nataliea, another consisting of Kombote, Desmond, Catherine, Bono, Hermien and Dzeeta, and finally Hortense as an outlying individual.

3.4. PSMC, MSMC2, ancestral effective population size and gene flow

The Pair-wise Sequential Markovian Coalescent (PSMC) model is a method which estimates ancestral effective population size (11). It is a specialisation to a diploid genome of a Markovian approximation to the full sequential coalescent with recombination. The method implements a hidden Markov model (HMM), inferring recombination break points and pair-wise coalescent times of non-recombining segments along the length of the genome using the local density of segregating sites. It uses these coalescent times to infer rates of coalescence at discrete time intervals in the past and thereby determines an effective population size Ne. We converted the sequence data from VCF format into the required input type using scripts available at https://github.com/aylwyn.aostools. At the same time we incorporated the mappability mask described above (Suppl. Mat. 1.2) in order to minimise spurious variant calls. We used discrete temporal binning parameters of -p “4+25*2+4+6”, while the number of iterations was set to 25, the maximum time to most recent common ancestor (TMRCA) was set to 15 (in units of 2N0) and the ratio of theta to rho was 5. These values were previously found to be appropriate for great apes (11). Output was scaled to real time using the autosomal mutation rate and generation time described in the figure legend (Fig. S17).
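The time-binning pattern passed to -p can be unpacked with a few lines of Python. This sketch follows the usual PSMC convention that a term "a*b" means a free parameters each spanning b atomic time intervals, and a bare term "c" means one parameter spanning c intervals; the function name is hypothetical and the code only interprets the pattern string, it does not run PSMC.

```python
def parse_psmc_pattern(pattern):
    """Expand a PSMC -p pattern into interval counts per free parameter.

    Returns a list with one entry per free population-size parameter,
    giving the number of atomic time intervals that parameter spans.
    """
    spans = []
    for term in pattern.split("+"):
        if "*" in term:
            repeats, width = (int(part) for part in term.split("*"))
            spans.extend([width] * repeats)
        else:
            spans.append(int(term))
    return spans


spans = parse_psmc_pattern("4+25*2+4+6")
n_parameters = len(spans)   # number of free population-size parameters
n_intervals = sum(spans)    # number of atomic time intervals
```

For the pattern used here this gives 28 free parameters over 64 atomic intervals: the first parameter covers 4 intervals, the next 25 cover 2 each, and the last two cover 4 and 6.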

In the autosomal PSMC analysis (Fig. S17) we see a decline in ancestral Ne from almost the earliest population estimates, lasting to about 2 Mya in the ancestors of the chimpanzees and about 600 Kya in those of the bonobos. The ancestral population sizes of the chimpanzee subspecies track each other closely until approximately 1 Mya, after which the trajectories begin to disperse. On the other hand, the ancestral population trajectory of the bonobos seems to depart from the path of the chimpanzees between 3-4 Mya. At about 800 Kya the ancestors of both the Nigeria-Cameroon and western chimpanzees undergo a significant decrease in Ne, while the ancestors of the central and eastern chimpanzees continue to grow in population size, before starting to decline again at about 300 Kya.

Most of the populations exhibit significant humps in their ancestral Ne curves at some time points after 200 Kya. The bonobo ancestral Ne has a local maximum at around 150 Kya, and the western, central and eastern populations show similar increases over the following 100 thousand years. These humps, as with earlier apparent population expansions, might reflect a genuine increase in population size, although we caution that, since Ne in this context is a measure of the rate of coalescence, they could also be a result of population substructure (11).

The second multiple sequential Markovian coalescent (MSMC2) model (https://github.com/stschiff/msmc2) is a method which extends PSMC to run on more than two haplotypes. In contrast to MSMC (68), MSMC2 effectively reduces to PSMC when run on a pair of haplotypes (or a diploid genome). When run on more than two haplotypes it constructs a systematic average of each pair-wise analysis of haplotypes, unless otherwise indicated. We use MSMC2 to infer ancestral gene flow between species and subspecies. Li & Durbin (2011) (11) argued that PSMC can informatively be run on a pseudo-diploid genome constructed from a pair of haplotypes selected from two different populations. The inferred “effective population size” is then an inverse measure of gene flow between ancestral populations, since it estimates the rate of coalescence between lineages in distinct populations. In other words, if two populations diverged at some time T in the past with no subsequent gene flow between them, there will be no coalescence between lineages in the pair of populations at any more recent time, and we expect PSMC to infer an effectively infinite Ne at all times closer to the present than T. On the other hand, at times earlier than T, the inferred historical Ne should correspond to that obtained from a pair of haplotypes selected from the same population, for example within a single individual. In general, this signal will be less clear in histories featuring periods of complex demography, such as when divergence between two populations occurs via an intermediate third population. To limit uncertainty due to phasing, we restrict ourselves to male X chromosomes. We used the capability of MSMC2 to run on multiple haplotypes to apply the above analysis to all the male X chromosomes in pairs of subspecies. We prepared input files as per the requirements of MSMC (which can be found at https://github.com/stschiff/msmc), using Python scripts based on those found at https://github.com/stschiff/msmc-tools.
Default time discretisation parameters were used throughout. Pair-wise cross-population comparisons were handled with the -P flag. If we had, for example, two haplotypes from one subspecies and four from another, the ancestral gene flow between subspecies would have been estimated by running MSMC2 with -P 0,0,1,1,1,1. In order to test the ability of MSMC2 to infer historic gene flow, we also ran it on the output of simulated histories using the coalescent simulator scrm (69). The scrm commands used to simulate the two population histories inferred by fastsimcoal2 (Suppl. Mat. 5.2) are as follows (N0=10000):

Model with bonobos, eastern, central and Nigeria-Cameroon chimpanzees:

scrm 16 1 -t 9600 -r 7680 20000000 -I 4 4 4 4 4 0.0 -n 1 0.0742 -n 2 0.3181 -n 3 0.3092 -n 4 0.0386 -m 1 2 0 -m 1 3 0 -m 1 4 0 -m 2 1 0 -m 2 3 1.8181960943074 -m 2 4 0 -m 3 1 0 -m 3 2 2.02290154800773 -m 3 4 0 -m 4 1 0 -m 4 2 0 -m 4 3 0 -en 0.001 1 1.83290809268 -en 0.001 2 1.161030985567 -en 0.001 3 3.66914400056 -en 0.001 4 1.23640124358 -em 0.020875 1 2 0 -em 0.020875 1 3 0 -em 0.020875 1 4 0 -em 0.020875 2 1 0 -em 0.020875 2 3 1.8181960943074 -em 0.020875 2 4 1.12888460726286 -em 0.020875 3 1 0 -em 0.020875 3 2 2.02290154800773 -em 0.020875 3 4 0.514005225416364 -em 0.020875 4 1 0 -em 0.020875 4 2 0.61034918826118 -em 0.020875 4 3 2.77081002950074 -em 0.042025 1 2 0 -em 0.042025 1 3 0.0447270935214584 -em 0.042025 1 4 0.00204350937063846 -em 0.042025 2 1 0 -em 0.042025 2 3 1.8181960943074 -em 0.042025 2 4 1.12888460726286 -em 0.042025 3 1 0.0340892941439601 -em 0.042025 3 2 2.02290154800773 -em 0.042025 3 4 0.514005225416364 -em 0.042025 4 1 0.00878072013784504 -em 0.042025 4 2 0.61034918826118 -em 0.042025 4 3 2.77081002950074 -en 0.104325 2 0.0402577179646081 -en 0.104325 3 0.192594746352967 -en 0.106325 3 8.73162876459514 -ej 0.106325 2 3 -em 0.106325 1 2 0 -em 0.106325 1 3 0.0177338314347154 -em 0.106325 1 4 0.00204350937063846 -em 0.106325 2 1 0 -em 0.106325 2 3 0 -em 0.106325 2 4 0 -em 0.106325 3 1 0.00723425109237692 -em 0.106325 3 2 0 -em 0.106325 3 4 0.193855714034029 -em 0.106325 4 1 0.00878072013784504 -em 0.106325 4 2 0 -em 0.106325 4 3 0.00771007640703268 -en 0.41955 1 0.158405393915496 -en 0.42155 1 0.299481445247702 -en 0.473075 4 0.0306317427630759 -en 0.475075 4 2.79429564470655 -en 0.480625 4 0.0872103733618782 -em 0.480625 1 2 0 -em 0.480625 1 3 0.0177338314347154 -em 0.480625 1 4 0.00204350937063846 -em 0.480625 2 1 0 -em 0.480625 2 3 0 -em 0.480625 2 4 0 -em 0.480625 3 1 0.00723425109237692 -em 0.480625 3 2 0 -em 0.480625 3 4 0.193855714034029 -em 0.480625 4 1 0.00878072013784504 -em 0.480625 4 2 0 -em 0.480625 4 3 0.00771007640703268 -en 
0.482625 3 1.66920782430592 -ej 0.482625 4 3 -em 0.482625 1 2 0 -em 0.482625 1 3 0.241282075772286 -em 0.482625 1 4 0 -em 0.482625 2 1 0 -em 0.482625 2 3 0 -em 0.482625 2 4 0 -em 0.482625 3 1 0.0101771164248256 -em 0.482625 3 2 0 -em 0.482625 3 4 0 -em 0.482625 4 1 0 -em 0.482625 4 2 0 -em 0.482625 4 3 0 -en 1.5988 3 0.00336130452736601 -en 1.6008 3 1.47105091660349 -ej 1.6008 1 3 -em 1.6008 1 2 0 -em 1.6008 1 3 0 -em 1.6008 1 4 0 -em 1.6008 2 1 0 -em 1.6008 2 3 0 -em 1.6008 2 4 0 -em 1.6008 3 1 0 -em 1.6008 3 2 0 -em 1.6008 3 4 0 -em 1.6008 4 1 0 -em 1.6008 4 2 0 -em 1.6008 4 3 0

Model with bonobos, eastern, central and western chimpanzees:

scrm 16 1 -t 9600 -r 7680 20000000 -I 4 4 4 4 4 0.0 -n 1 0.1636 -n 2 0.2168 -n 3 0.30865 -n 4 0.081 -m 1 2 0 -m 1 3 0 -m 1 4 0 -m 2 1 0 -m 2 3 1.26647232666457 -m 2 4 0 -m 3 1 0 -m 3 2 2.26397610393913 -m 3 4 0 -m 4 1 0 -m 4 2 0 -m 4 3 0 -en 0.001125 1 1.57097089776 -en 0.001125 2 1.404710323488 -en 0.001125 3 4.3158739382 -en 0.001125 4 1.0742217327 -em 0.11655 1 2 0 -em 0.11655 1 3 0 -em 0.11655 1 4 0 -em 0.11655 2 1 0 -em 0.11655 2 3 1.26647232666457 -em 0.11655 2 4 0.79048180986804 -em 0.11655 3 1 0 -em 0.11655 3 2 2.26397610393913 -em 0.11655 3 4 0.00440335360192708 -em 0.11655 4 1 0 -em 0.11655 4 2 0.00666829030333492 -em 0.11655 4 3 0.108325487900296 -em 0.1205 1 2 0 -em 0.1205 1 3 0.0404882853109968 -em 0.1205 1 4 0.0060307968152018 -em 0.1205 2 1 0 -em 0.1205 2 3 1.26647232666457 -em 0.1205 2 4 0.79048180986804 -em 0.1205 3 1 0.0688517134577088 -em 0.1205 3 2 2.26397610393913 -em 0.1205 3 4 0.00440335360192708 -em 0.1205 4 1 0.0115754432490942 -em 0.1205 4 2 0.00666829030333492 -em 0.1205 4 3 0.108325487900296 -en 0.168375 2 0.172841376512123 -en 0.168375 3 0.157675036183995 -en 0.1706 3 8.35342687509545 -ej 0.1706 2 3 -em 0.1706 1 2 0 -em 0.1706 1 3 0.077739053851208 -em 0.1706 1 4 0.0060307968152018 -em 0.1706 2 1 0 -em 0.1706 2 3 0 -em 0.1706 2 4 0 -em 0.1706 3 1 0.0101745515682769 -em 0.1706 3 2 0 -em 0.1706 3 4 0.203210242128393 -em 0.1706 4 1 0.0115754432490942 -em 0.1706 4 2 0 -em 0.1706 4 3 1.30532712910343 -en 0.21195 4 0.143861004427068 -en 0.214175 4 0.229328737963366 -en 0.51205 4 0.163429675293312 -em 0.51205 1 2 0 -em 0.51205 1 3 0.077739053851208 -em 0.51205 1 4 0.0060307968152018 -em 0.51205 2 1 0 -em 0.51205 2 3 0 -em 0.51205

2 4 0 -em 0.51205 3 1 0.0101745515682769 -em 0.51205 3 2 0 -em 0.51205 3 4 0.203210242128393 -em 0.51205 4 1 0.0115754432490942 -em 0.51205 4 2 0 -em 0.51205 4 3 1.30532712910343 -en 0.5143 3 1.69172360590542 -ej 0.5143 4 3 -em 0.5143 1 2 0 -em 0.5143 1 3 0.00187056444537476 -em 0.5143 1 4 0 -em 0.5143 2 1 0 -em 0.5143 2 3 0 -em 0.5143 2 4 0 -em 0.5143 3 1 0.00144484023958745 -em 0.5143 3 2 0 -em 0.5143 3 4 0 -em 0.5143 4 1 0 -em 0.5143 4 2 0 -em 0.5143 4 3 0 -en 0.57125 1 0.00125055964958021 -en 0.573475 1 2.47905585034123 -en 1.343775 3 0.00527072359478981 -em 1.343775 1 2 0 -em 1.343775 1 3 0.00187056444537476 -em 1.343775 1 4 0 -em 1.343775 2 1 0 -em 1.343775 2 3 0 -em 1.343775 2 4 0 -em 1.343775 3 1 0.00144484023958745 -em 1.343775 3 2 0 -em 1.343775 3 4 0 -em 1.343775 4 1 0 -em 1.343775 4 2 0 -em 1.343775 4 3 0 -en 1.346025 3 1.5950192271541 -ej 1.346025 1 3 -em 1.346025 1 2 0 -em 1.346025 1 3 0 -em 1.346025 1 4 0 -em 1.346025 2 1 0 -em 1.346025 2 3 0 -em 1.346025 2 4 0 -em 1.346025 3 1 0 -em 1.346025 3 2 0 -em 1.346025 3 4 0 -em 1.346025 4 1 0 -em 1.346025 4 2 0 -em 1.346025 4 3 0
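The scaled arguments in the scrm commands above can be converted back to per-generation and per-year units with a small Python sketch. It relies on the standard ms/scrm scaling (rates multiplied by 4N0 and by the locus length, times measured in units of 4N0 generations) together with N0 = 10000 and the 20,000,000 bp locus length given in the commands. Applying the 25-year generation time to the simulated times is an assumption made for illustration, and the function names are hypothetical.

```python
N0 = 10_000                 # reference population size stated with the commands
LOCUS_LENGTH = 20_000_000   # simulated locus length in bp (third argument of -r)
GENERATION_YEARS = 25       # assumed generation time for converting to years


def per_bp_rate(scaled_rate, n0=N0, length=LOCUS_LENGTH):
    """Convert a locus-wide scaled rate (4 * N0 * rate * length) to a per-bp rate."""
    return scaled_rate / (4 * n0 * length)


def scaled_time_to_years(scaled_time, n0=N0, generation_years=GENERATION_YEARS):
    """Convert a time in units of 4 * N0 generations to years."""
    return scaled_time * 4 * n0 * generation_years


mutation_rate = per_bp_rate(9600)           # from -t 9600
recombination_rate = per_bp_rate(7680)      # from -r 7680 20000000
oldest_join = scaled_time_to_years(1.6008)  # from -ej 1.6008 1 3
```

Under these assumptions -t 9600 corresponds to a mutation rate of 1.2e-08 per base pair per generation, -r 7680 to a recombination rate of 9.6e-09 per base pair per generation, and the join at time 1.6008 in the first command to roughly 1.6 million years.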

Since 2/3 of an X chromosome's history is spent in females, male-mutation bias causes X chromosomes to have a lower mutation rate than autosomal loci which, on average, divide their histories evenly between males and females. For a given ratio α of male-to-female mutation rates, we determine the X chromosome mutation rate µX using the expression µX = µA (2/3)(2 + α)/(1 + α), where µA is the autosomal mutation rate (70, 71). Since α could be determined from an estimate of the Y chromosome mutation rate µY, an equivalent expression for µX in terms of µA and µY can be derived. However, here we used the expression stated above with the values α = 7.8 and µA = 1.2e-08 per base pair per generation as derived in (72). These determine an X chromosome mutation rate µX = 0.89e-08 per base pair per generation. We used a generation time of 25 years (49). These values for the generation time and autosomal mutation rate were also used to scale the output of the autosomal PSMC analysis.
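The arithmetic behind the X chromosome mutation rate can be checked with a short Python sketch that evaluates the expression given above for the stated values of α and µA. The function name is hypothetical.

```python
def x_mutation_rate(mu_autosomal, alpha):
    """X chromosome mutation rate from the autosomal rate and the
    male-to-female mutation rate ratio alpha:
    mu_X = mu_A * (2/3) * (2 + alpha) / (1 + alpha).
    """
    return mu_autosomal * (2.0 / 3.0) * (2.0 + alpha) / (1.0 + alpha)


# Values stated in the text: alpha = 7.8, mu_A = 1.2e-08 per bp per generation.
mu_x = x_mutation_rate(1.2e-8, 7.8)
ratio_x_to_autosome = mu_x / 1.2e-8
```

With these inputs the ratio µX/µA is about 0.74, which gives µX ≈ 0.89e-08 per base pair per generation. When α = 1 (no male bias) the expression reduces to µX = µA, as expected.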

Using MSMC2 we obtain a direct signal of the divergence between the ancestors of the bonobos and those of the chimpanzees, which suggests that gene flow between the ancestors of the species declined rapidly from around 3.5 Mya and had completely ceased by 1.5 Mya (Fig. S18). As expected, we see strong consistency in the gene flow pattern between the ancestors of the bonobos and each of the chimpanzee subspecies. Looking exclusively at analyses run on the real data, it appears that gene flow between the ancestors of central and eastern chimpanzees ceases at ~150 Kya, while gene flow between central and Nigeria-Cameroon chimpanzees, and between central and western chimpanzees, stops at around the same time, ~500 Kya. This is also the time at which gene flow between the ancestors of the eastern chimpanzees and both the Nigeria-Cameroon and western chimpanzees stops. Overall, the observed curves suggest that ~500 Kya the chimpanzees split into two populations, one of which split at around 100 Kya to become the central and eastern subspecies, while the other split at around 250 Kya to become the western and Nigeria-Cameroon subspecies. These estimates are in very good agreement with those inferred by fastsimcoal2 (Suppl. Mat. 5.2), and we note that they differ substantially from those estimated by PSMC. Given that our SFS-based demographic modelling could not be performed in models with 5 populations, we estimate by MSMC2 that gene flow between western and Nigeria-Cameroon ancestors stopped at ~250 Kya (Fig. S23 and Figure 3A). Subsequent to these major divergence events, we do not detect gene flow in any of the cross-population comparisons. However, this is also the case in Figs. S19-23, where we know that the “becn” and “becw” curves are derived from simulated histories which feature post-separation gene flow between chimpanzees and bonobos. Therefore, the hypothesised degree of gene flow in other analyses is most likely undetectable using PSMC/MSMC2.

4. Gene flow within the Pan clade

4.1 Inference of migration with TreeMix

Migration between the Pan lineages has been analysed previously (8), but due to a smaller sample size and differences in variant calling the results did not provide conclusive evidence for gene flow between bonobos and chimpanzees. For our analysis of migration events within the Pan lineage, we analysed 68 individuals of chimpanzees and bonobos, grouped into subspecies and species. We constructed a maximum likelihood (ML) tree using TREEMIX v. 1.12 (24), accounting for LD by grouping sites in blocks of 1,000 SNPs (-k 1000). For this analysis the inferred ancestor was not used (Suppl. Mat. 1.4), and bonobo was set as root. A round of rearrangements was performed after all populations were added to the tree (-global). Standard errors (-se) and bootstrap replicates (-bootstrap) were used to evaluate the confidence in the inferred tree topology and the weight of migration events. After constructing an ML tree for the Pan lineage, migration events were added (-m) and iterated 50 times for each value of 'm' (1-5) to check for convergence in terms of the likelihood of the model as well as the variance explained with each added migration event. The inferred ML trees and corresponding residuals were visualised with the built-in R plotting functions in TREEMIX v. 1.12 (24).

The inferred topology of species and subspecies in the Pan lineage without migration is in accordance with previous findings (Fig. S24A). In the residuals of the model, positive values (in units of standard error) indicate pairs of populations that are too far apart when fitting the model topology to our data. A model with no migration explains 99.87 % of the variance, while the residuals indicate strong candidates for migration by high standard errors, particularly between the Nigeria-Cameroon and eastern subspecies (Fig. S24B). When adding one migration, this is also the event that is inferred, with a migration weight of 0.41 ± 0.007 standard errors (P < 2.22×10⁻³⁰⁸) (Fig. S24C). This is consistent with previous findings of migration between these subspecies (8), which has been suggested to be a historic migration event at a time when the distribution areas of these two subspecies might have overlapped. After adding one migration event, the model explains 99.97 % of the variance in our data, yet the residuals indicate at least two more candidates for improving the model by adding migration events, either between the western and the eastern subspecies or between the central chimpanzees and bonobos (Fig. S24D).

In our analysis, we find that the best model fit includes migration from bonobos to central chimpanzees with a migration weight of 0.38 ± 0.05 standard errors (P < 2.89×10⁻¹⁵), explaining 99.99 % of the variance in the data (Fig. S24E). No further improvement over the model with two migrations could be observed when adding additional migration events (m = 3-5) (Fig. S24F): both the likelihood and the percentage of explained variance dropped slightly or remained equal to those of the model including two migration events, and the iterations showed convergence issues (Fig. S25).

4.2 Population-wise and individual D-statistics

To perform a model-free test of unbalanced allele sharing between chimpanzees and bonobos, we performed a D-statistic test, following Patterson et al. (2012) (73), on a set of 68 individuals within the Pan lineage. In the four-population test of Patterson et al. (2012) (73), populations are denoted W, X, Y, Z, with lower-case letters denoting the corresponding allele frequencies, and the D-statistic (D) is defined as num/den, where num = (w - x)(y - z) and den = (w + x - 2wx)(y + z - 2yz). The standard error (SE) is related to the Z-score by SE = D/Z, and the sign of the Z-score indicates the direction of allele sharing. A positive Z-score denotes an excess of allele sharing between W and Y or X and Z, while a negative score shows higher sharing between W and Z or X and Y (73). D-statistics are insensitive to many demographic events (e.g. genetic drift, effective population size changes), but we caution that they may incorrectly infer gene flow in situations with ancestral subdivision (74).
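The D-statistic defined above can be sketched in Python. The sketch sums the numerator and denominator over sites before taking the ratio, and estimates the standard error with a simple unweighted delete-one-block jackknife; the unweighted jackknife and the function names are illustrative assumptions rather than the exact procedure used in the analysis.

```python
def d_statistic(sites):
    """D = sum(num) / sum(den) over sites.

    sites: iterable of (w, x, y, z) allele frequencies in populations
    W, X, Y and Z at each site.
    """
    num = 0.0
    den = 0.0
    for w, x, y, z in sites:
        num += (w - x) * (y - z)
        den += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    if den == 0:
        raise ValueError("denominator is zero; no informative sites")
    return num / den


def d_with_jackknife(blocks):
    """Return (D, SE, Z) using a delete-one-block jackknife.

    blocks: list of lists of (w, x, y, z) tuples, one list per genomic block.
    """
    n = len(blocks)
    full = d_statistic([site for block in blocks for site in block])
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_statistic([s for j, block in enumerate(blocks) if j != i for s in block])
        for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    se = variance ** 0.5
    z = full / se if se > 0 else float("inf")
    return full, se, z
```

A site with frequencies (1, 0, 1, 0) contributes +1 to the numerator (W and Y share an allele), while (0, 1, 1, 0) contributes -1 (X and Y share an allele), so positive D corresponds to excess sharing between W and Y, in line with the sign convention described above.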

To test whether the D-statistics might be biased depending on alignment to either the chimpanzee reference genome (pantro4) or to the human reference genome (hg19), reads were mapped to the human reference genome (hg19) (Suppl. Mat. 1.2). In the human mapped data we applied an additional set of strict filters (DP > 7, allele balance between 0.3 and 0.7 in heterozygous calls and non-missing data) to avoid potential issues related to human contamination in our samples. This strict filtering resulted in 3,784,850 genotypes of very high quality in all 68 genomes.

We computed a D-statistic test for asymmetries in allele sharing between any chimpanzee subspecies and bonobos using the form D(Chimpanzee population1, Chimpanzee population2; Bonobos, EPO-AA/Human). We find that the ancestral allele inferred from 6-primate EPO alignment introduces a bias to our results, most likely underestimating the amount of derived alleles in western chimpanzees (Suppl. Mat. 1.4, Table S5). However, we note that the observed asymmetry in allele sharing between non-western chimpanzees and bonobos compared to western chimpanzees is also very well supported when using the human allele as an outgroup. In order to avoid such a bias, we used the human allele in all subsequent analyses.

For the non-western individuals, we tested each individual from the eastern, central and Nigeria-Cameroon subspecies (denoted X) in relation to a grouping of western chimpanzees, bonobos, and the ancestral allele (Human). We excluded the individual Donald from the western subspecies group to avoid any potential bias from the ancestry this individual shares with the central subspecies. We find a strong signal of imbalanced allele sharing between bonobos and all individuals of the eastern and central subspecies (Fig. S26). In order to provide further evidence for this asymmetry, we computed all possible comparisons of the form D(Chimpanzee1,Chimpanzee2 ; Bonobo1,Human). Our results are consistent across the whole space of comparisons, so that all central and eastern chimpanzees share more alleles with any bonobo than any western chimpanzee does (Fig. S27). These values are similar to estimates in the Homo clade (14, 15), but we note that bidirectional gene flow, multiple admixture events and deeper divergence times between chimpanzees and bonobos may influence the extent of the D-statistics. Moreover, D-statistic values depend on demographic parameters in a complex way (75) and should not be interpreted as direct estimates of admixture proportions.

In contrast to eastern and central chimpanzees, Nigeria-Cameroon individuals show a different pattern in the number of alleles shared with bonobos in comparison to western chimpanzees, with some individuals not significantly deviating from zero. Although D-statistics do not provide insight into the directionality of gene flow, these results would support a scenario of indirect spread of bonobo-like alleles from central and eastern chimpanzees to the Nigeria-Cameroon subspecies. Assuming this hypothesis is a true feature of the chimpanzee demographic history, only those Nigeria-Cameroon groups with a history of intense admixture with the central and eastern subspecies may carry a substantial number of bonobo-like alleles. In Suppl. Mat. 4.1 and Suppl. Mat. 5.2 we show that corridors of migration between central-eastern and Nigeria-Cameroon chimpanzees have existed in the past, supporting the possibility of subsequent spread of these alleles into Nigeria-Cameroon chimpanzees. Unfortunately, we cannot formally test this hypothesis given the scarce information on population structure within Nigeria-Cameroon chimpanzees. Alternatively, if the amount of introgression into Nigeria-Cameroon chimpanzees was small, even small amounts of human contamination may skew the D-statistics (Suppl. Mat. 4.10) in these individuals.

4.3 Frequency-stratified D-statistics (Dj and Djx)

We calculated frequency-stratified D-statistics (Dj), as described in the Supplementary Information Section 16a in (15), and in (17), to examine the excess of shared derived alleles between non-western chimpanzee subspecies and bonobos over the western subspecies. We computed D-statistics binning the data by derived allele frequency intervals in bonobos, using 10 bins of equal size. We observed an increase particularly at sites that are at high frequency in bonobos (Fig. S28) and shared with central and eastern chimpanzees, which is expected under a scenario of gene flow from bonobos to these chimpanzee subspecies. In contrast, Nigeria-Cameroon chimpanzees show a different trend, where low frequency sites in bonobos tend to be more shared with Nigeria-Cameroon chimpanzees. Even though such a pattern might suggest gene flow from Nigeria-Cameroon chimpanzees into bonobos, we explored the consistency of this observation by computing Dj in the chimpanzee mapping using the EPO inferences of ancestral alleles (Fig. S29). While our observations are largely consistent in central and eastern chimpanzees, Nigeria-Cameroon chimpanzees show an opposite trend, with a pattern similar to that found in central and eastern chimpanzees. Since gene flow from Nigeria-Cameroon chimpanzees to bonobos is also not supported by the rest of the analyses, this signal might be an artefact.
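The binning step can be sketched as follows. This is a minimal illustration that reuses the num/den definition of Suppl. Mat. 4.2; the function name, the tuple layout (w, x, y, z with y the bonobo derived allele frequency) and the treatment of bin edges are assumptions of the example rather than details taken from (15) or (17).

```python
def stratified_d(sites, n_bins=10):
    """Compute D separately for sites binned by bonobo derived allele frequency.

    sites: iterable of (w, x, y, z) frequencies, where y is the bonobo
    derived allele frequency. Sites with y == 0 are skipped. Returns a list
    of length n_bins holding D per bin, or None for bins without
    informative sites.
    """
    nums = [0.0] * n_bins
    dens = [0.0] * n_bins
    for w, x, y, z in sites:
        if y <= 0:
            continue  # derived allele not observed in bonobos
        # Bin index for equal-width bins; y == 1.0 falls into the last bin.
        b = min(int(y * n_bins), n_bins - 1)
        nums[b] += (w - x) * (y - z)
        dens[b] += (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return [n / d if d > 0 else None for n, d in zip(nums, dens)]

# Toy data: two low-frequency sites that cancel out, one high-frequency site.
toy_sites = [(1, 0, 0.05, 0), (0, 1, 0.05, 0), (1, 0, 0.95, 0)]
dj = stratified_d(toy_sites)
```

In the toy data the lowest bin yields D = 0 and the highest bin D = 1, mimicking an excess of sharing restricted to alleles at high frequency in bonobos.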

We also computed D-statistics stratified by both the bonobo and chimpanzee derived allele frequencies (Djx) (Fig. S30). For the calculation of Djx, the derived allele frequency (DAF) was required to be above 0 in only one of two chimpanzee subspecies compared, and chimpanzee frequencies were stratified in three intervals. We reduced the number of intervals dividing the chimpanzee frequency (3 in chimpanzees in comparison to 10 in bonobos) to avoid an abundance of combinations that resulted in zero or very few observations in the data. In Fig. S30, we show each pairwise comparison between chimpanzee subspecies. Each of the subplots comparing chimpanzee subspecies should be interpreted as follows: Combinations of the DAF are represented in a cell, with the bonobo allele frequency on the X axis and the allele frequency in chimpanzees on the Y axis; while the colour of each cell describes the value of the D-statistic. For instance, in the heatmap comparing eastern and

western chimpanzees (em), the cell with coordinates em11 (1st row and 1st column) shows the D-statistic for the subset of the data involving: 1) 0 < DAF < 0.1 in bonobos, 2) DAF > 0.25 in either eastern or western chimpanzees. The colour of em11 reflects how the D-statistic deviates from 0 as a gradient, where yellow colours reveal a higher share of alleles between western chimpanzees and bonobos than between eastern chimpanzees and bonobos, and blue colours vice versa. Hence, this cell would represent an excess of shared derived alleles at low frequency in bonobos and at high frequency in one chimpanzee population.

Unlike similar estimates in the Homo clade (22), elevated sharing of derived alleles is not only found in sites at low frequency in the recipient subspecies, but also at intermediate frequencies, although at lower numbers. This would be consistent with older admixture events between bonobos and chimpanzees than between modern and archaic humans, where the long time after the gene flow would influence the frequency of introgressed alleles. Over time, drift will lead to a loss or an increase (and eventually fixation) of introgressed alleles, while periods of population bottlenecks might further influence the introgressed allele frequencies. Importantly, the amount of sites at low frequency is 2-10-fold higher than sites at medium frequency, i.e. the majority of shared derived sites is still at low frequency. The observation of shared sites at high frequency in chimpanzees in populations with small effective population sizes (first row of the heatmaps) could also be interpreted as signals of gene flow from chimpanzee to bonobos, especially for those sites at high frequency in chimpanzees and low or intermediate in bonobos. Here, we cannot distinguish between these two scenarios, hence we consider the Djx statistics for high frequencies in chimpanzees to be inconclusive (see simulations in section Suppl. Mat. 4.9 below). However, the number of observed sites at high frequencies is low in all comparisons (<1,000 sites, compared to several 10,000 sites at low frequencies). The comparisons of low and intermediate frequencies provide further evidence of substantial sharing of derived alleles between non-western chimpanzees and bonobos.

4.4 D-statistics in the X chromosome

To explore if the excess of allele sharing between non-western chimpanzees and bonobos is also found in the X chromosome, we applied filters similar to those described in Suppl. Mat. 4.2 (DP > 5 and allele balance between 0.3 and 0.7 in heterozygous positions). Additionally, we only kept sites segregating in the 46 female individuals sampled in this study, and any position in the genome where at least 1 male individual carries a heterozygous genotype was excluded. This strict set of filters resulted in 694,628 SNPs. We explored every configuration of the form D(X,Western,Bonobos,Human), with results suggesting that non-western chimpanzees do not share more alleles with bonobos than western chimpanzees in the X chromosome (Fig. S31). This is in good agreement with similar findings in humans (22), where it has been shown that the X chromosome of non-African populations carries substantially less Neandertal ancestry than the autosomes.

4.5 Comparison to published data

The first sequencing of the bonobo genome (18) explored the possibility of gene flow between chimpanzees and bonobos using divergence and site-pattern based statistics. These analyses did not support any admixture event between the Pan species. In contrast, our

findings clearly suggest that the non-western subspecies, especially central and eastern chimpanzees, are genetically closer to bonobos than western chimpanzees. To identify the source of these discrepancies, we analysed the low coverage (1-2-fold) resequencing data from Prüfer et al. (http://www.ebi.ac.uk/ena/data/view/ERP000602; 3 bonobos and 2 western, 7 central and 6 eastern chimpanzees). In addition, we also randomly selected from our data set 5 bonobos (Kosana, Dzeeta, Hermien, Desmond and Natalie) and 2 individuals from each chimpanzee subspecies (Koto, Negrita, Akwaya Jean, Doris, Kidongo, Bwambale, Jimmie and SeppToni), downsampling the coverage to 2-fold to match the average depth in (18).

We tested if low coverage data or variant calling methodologies have an impact on the power to detect admixture. First, we mapped all the data mentioned above to the human reference genome (hg19) using BWA-MEM v0.7.5a-r406 with default parameters. We also removed read duplicates using biobambam (https://github.com/gt1/biobambam) in the alignments of individuals from the present study. In SI10 of (18) it is described how variants were called by randomly sampling reads for every position in the genome with sufficient quality (>30 base and mapping quality) and falling outside of repeat masked regions or CpG islands. We replicated this approach both in the resequencing data from (18) and in our downsampled genomes. We find that calling variants under these circumstances indeed does not result in substantial differences in allele sharing among chimpanzee subspecies and bonobos, in agreement with the conclusions reached in (18) (Fig. S32). The same is seen in the majority of our downsampled samples, although we have shown that high coverage genomes of the same individuals support a different conclusion (Suppl. Mat. 4.2). Therefore, the use of low coverage genomes, variant calling methodologies or a combination of both factors certainly diminishes the power to detect gene flow. We also called variants in our data with Freebayes, using the same methodology described in Suppl. Mat. 1.2. We find that the results from D-statistics are very consistent across variant calling methodologies (Fig. S32), strongly suggesting that low coverage was the strongest limitation to detect admixture in (18).

To further explore the impact of coverage on the D-statistic computation, we performed an assisted variant calling in the low coverage genomes. In order to guide the discovery of SNPs, we used a set of 3,784,850 genotypes of very high quality in our high-coverage genomes (Suppl. Mat. 4.2). We extracted all reads mapping to these ~3.8 million sites in the low coverage genomes, obtaining the number of observations for each nucleotide at each position and individual. Only alleles present in our high confidence data were considered to perform the final genotype calling. By restricting the calling to positions with well supported polymorphisms, we substantially reduced the presence of spurious SNPs wrongly inferred due to sequencing errors, contamination and any other issues related to the inherent stochasticity of the original procedure in (18). Indeed, we find that under this scenario, most of the central and eastern chimpanzees share more alleles with bonobos than western chimpanzees in all possible pairwise comparisons of the form D(Central or Eastern, Western, Bonobo, hg19) (Fig. S32C). Nevertheless, we found that some comparisons resulted in extreme values of the D-statistic. We investigated this by plotting the combined coverage (sum of coverages for the 3 individuals involved in the computation of the D-statistic) against the D-statistic, colouring by the number of SNPs used in each comparison (Fig. S32C). The

results strongly suggest that the uncertainty on the D-statistics is highly influenced by coverage, with values stabilizing around 0.04 with increasing coverage, but exhibiting a large variance at low coverage (Fig. S32).

Finally, differences in joint variant calling and individual variant calling, as well as filtering repeat masked region and copy number variant regions (35) did not influence the high-coverage D-statistics, for example D(central, western; bonobo, human) is 0.0432 (Z-score=5.170) when using the procedure described in Suppl. Mat. 4.2, and 0.0403 (Z-score=4.475) after individual variant calling and masking these repeats and CpG sites.

4.6 Divergence in windows of 50kb

The larger proportion of shared derived alleles between non-western chimpanzee populations and bonobos (Suppl. Mat. 4.2) could either be caused by introgression from bonobos to chimpanzees, from chimpanzees to bonobos or by bi-directional gene flow. The lack of an outgroup to the bonobo lineage complicates the study of gene flow from chimpanzees, since we cannot contrast the results of our analyses to a putatively un-admixed population. Nonetheless, the different pattern on genome-wide D-statistics (Fig. S26 and Table S5) and stratified D-statistics (Fig. S28 and Fig. S30) between western chimpanzees and other chimpanzee subspecies, suggests that western chimpanzees have received substantially less genetic material from bonobos, if any. Under the assumption that introgression from bonobo to non-western chimpanzees occurred, non-western chimpanzee genomes should carry introgressed genetic material from bonobos. Bonobo-like haplotypes in these genomes should show an unusually low divergence to bonobo, coupled with an unusually high divergence to western chimpanzees. Since the rate of introgression is likely to be low, introgressed alleles would be at low/intermediate frequency in the population, thus more often in heterozygous state in the individual genome.

In order to test these expectations, we scanned the genome of chimpanzees and bonobos with 50kb windows following an approach recently used to demonstrate gene flow from modern humans into eastern Neanderthals (17). We used the data mapped to the human reference genome (hg19) and the ancestral state was inferred from the human allele. In order to use an equal amount of samples from each chimpanzee subspecies, we randomly selected 10 individuals from each subspecies. Each window was required to have at least 50% callable sites, being defined as positions in the genome where all individuals analysed have sufficient coverage and quality (Suppl. Mat. 1.2). This resulted in a total of 40,285 windows. In each window, we calculated the divergence to bonobos in those high-quality genotypes with a derived allele frequency ≥0.9 in bonobos as follows: 1) in the absence of reliable information about the phase of the chimpanzee alleles, we choose the alleles within each chimpanzee genome that give the minimum divergence to bonobos. That is, only homozygous ancestral positions count as a difference, while heterozygous sites carrying the “bonobo-like” derived allele do not contribute to the total number of differences; and 2) the total number of differences is divided by the number of high-quality genotypes in each window (callable sites). By using the minimum divergence we facilitated the detection of segments introgressed from bonobos into the chimpanzee genomes.
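The per-window minimum divergence can be sketched as below. This is a minimal illustration under stated assumptions: genotypes are coded as the number of derived alleles (0, 1 or 2), the function and parameter names are hypothetical, and the number of callable sites per window is taken as given.

```python
WINDOW_SIZE = 50_000          # window length in bp, as in the text
MIN_CALLABLE_FRACTION = 0.5   # at least 50% callable sites per window

def min_divergence_to_bonobo(genotypes, bonobo_daf, callable_sites,
                             daf_threshold=0.9):
    """Minimum divergence of one chimpanzee to bonobos in one window.

    genotypes: derived allele counts (0, 1, 2) of the chimpanzee per site.
    bonobo_daf: bonobo derived allele frequency at the same sites.
    callable_sites: number of high-quality genotypes in the window.
    Returns None if the window does not pass the callability filter.
    """
    if callable_sites < MIN_CALLABLE_FRACTION * WINDOW_SIZE:
        return None
    # Only homozygous ancestral genotypes count as a difference; heterozygous
    # sites carrying the bonobo-like derived allele contribute nothing.
    differences = sum(
        1
        for g, daf in zip(genotypes, bonobo_daf)
        if daf >= daf_threshold and g == 0
    )
    return differences / callable_sites

# Toy window: only the first site is a difference (bonobo DAF >= 0.9, genotype 0).
toy_genotypes = [0, 1, 2, 0]
toy_bonobo_daf = [0.95, 1.0, 0.9, 0.5]
divergence = min_divergence_to_bonobo(toy_genotypes, toy_bonobo_daf, 30_000)
```

Counting only homozygous ancestral genotypes is what makes this a minimum: whenever one chromosome carries the bonobo-like allele, that chromosome is the one compared to bonobos.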

We first compared the distribution of minimum divergences in each chimpanzee subspecies. We find that western chimpanzees have a consistently higher divergence to

bonobo than the other subspecies. While all non-western subspecies show distributions skewed towards lower divergence, central chimpanzees seem to be the population with lowest divergence to bonobo. This is in good agreement with the genome-wide and stratified D-statistics (Fig. S26, Table S5, Fig. S28 and Fig. S30).

We tested whether there are regions in the chimpanzee genomes meeting the two criteria mentioned above: low divergence to bonobo and high divergence to a putatively unadmixed subspecies (western chimpanzees). To do this, we plotted the windows in each chimpanzee subspecies according to their minimum divergence to bonobos against their divergence to the rest of subspecies (in a pair-wise fashion). Divergence estimates of individuals of the same subspecies were averaged to represent a single value for each population. We computed divergence between chimpanzee subspecies as follows: Considering a pair of individuals, one from each population: 1) Homozygous sites for the derived allele in one population and homozygous for the ancestral allele in the other population, count as a difference. Heterozygous sites count as half a difference. 2) The total number of differences is divided by the number of high-quality genotypes in the window. 3) Total divergence of an individual to an external subspecies is the result of averaging all possible pairwise comparisons of that individual to the external subspecies individuals.
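The three steps for divergence between chimpanzee subspecies can be sketched as follows. Genotypes are again assumed to be coded as derived allele counts (0, 1, 2), and scoring each site as half the absolute difference in derived allele counts is one reading of the rule above (a homozygous difference counts 1, a heterozygous site against a homozygous one counts 0.5); the scoring of two heterozygous genotypes as 0 is an assumption of this sketch.

```python
def pairwise_divergence(genotypes_a, genotypes_b, callable_sites):
    """Divergence between two individuals in one window.

    Opposite homozygotes count as 1 difference, a heterozygote against a
    homozygote as 0.5 (assumed scoring: |a - b| / 2 per site).
    """
    differences = sum(abs(a - b) / 2 for a, b in zip(genotypes_a, genotypes_b))
    return differences / callable_sites

def divergence_to_subspecies(individual, external_individuals, callable_sites):
    """Average divergence of one individual to all individuals of another subspecies."""
    values = [
        pairwise_divergence(individual, other, callable_sites)
        for other in external_individuals
    ]
    return sum(values) / len(values)

# Toy window with 10 callable sites, 4 of which are variable.
focal = [2, 1, 0, 0]
external = [[0, 0, 0, 0], [2, 1, 0, 2]]
mean_divergence = divergence_to_subspecies(focal, external, callable_sites=10)
```

Averaging these per-individual values within a subspecies then gives the single population value plotted for each window.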

Indeed, we find that windows least divergent to bonobos in the genomes of non-western chimpanzees show an elevated divergence to western chimpanzees (Fig. S36), although this difference is less pronounced in Nigeria-Cameroon chimpanzees. Despite the non-significant difference between Nigeria-Cameroon and western chimpanzees, discordance between the two lines can be observed. We note that increasing the sample size for this analysis could help to elucidate the significance of this difference. Similar disparities can be observed when comparing central and Nigeria-Cameroon chimpanzees, although to a much lesser extent. This analysis demonstrates a clustering of shared derived alleles, although very different demographic scenarios might potentially cause similar patterns.

Finally, if the windows least divergent to bonobos in the non-western chimpanzee genomes are due to introgression from bonobos, then we expect these windows to show elevated heterozygosity. The introduction of a bonobo haplotype would lead chimpanzee-specific and bonobo-specific derived alleles both to be in heterozygous state, thus increasing the heterozygosity of these regions compared to other regions of the same divergence. We find this to be the case for the three leftmost bins of the distribution, where windows tend to show higher genetic diversity (Fig. S37). Although higher heterozygosity in the three leftmost bins can also be observed in western chimpanzees, altogether these results support past bonobo introgression mainly or exclusively into the non-western chimpanzee subspecies.

4.7 Putatively introgressed regions in the chimpanzee genomes

In order to identify putatively introgressed regions in the different chimpanzee individuals, we performed a screen for such segments following a strategy recently applied to Neandertals (17). Briefly, we calculated the derived allele frequency in bonobos at sites that are heterozygous in a given chimpanzee, and homozygous ancestral in an outgroup chimpanzee population. That way, we were able to retrieve haplotypes that resemble bonobos on one chimpanzee chromosome, while the other chromosome of the same individual is similar to a different chimpanzee population (a chimpanzee haplotype). We used western

chimpanzee as outgroup for non-western chimpanzees and central chimpanzees as outgroup when performing the analysis on western chimpanzees. We used a locally weighted polynomial regression to fit the derived allele frequencies along the chimpanzee genomes, in local windows of 20 sites in which the fitted curve stayed over 0.25. We further selected segments that were longer than 5Kb, in which derived alleles were not further apart than 5Kb, and did not contain incompatible sites (fixed differences between all chimpanzee populations and bonobos).
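The final segment selection can be sketched as below. This simplified example starts from positions that already passed the regression step (the locally weighted regression itself is not reproduced), and the function name, argument names and coordinate handling are assumptions for illustration only.

```python
def call_segments(positions, incompatible=(), max_gap=5000, min_length=5000):
    """Group candidate sites into segments and apply the length filters.

    positions: coordinates of derived-allele sites where the fitted curve
    stayed above the frequency cutoff (assumed to be precomputed).
    incompatible: coordinates of fixed differences between all chimpanzee
    populations and bonobos; segments containing one are discarded.
    """
    segments = []
    start = previous = None
    for pos in sorted(positions):
        if start is None:
            start = previous = pos
        elif pos - previous > max_gap:
            # Gap too large: close the current segment and open a new one.
            segments.append((start, previous))
            start = previous = pos
        else:
            previous = pos
    if start is not None:
        segments.append((start, previous))
    return [
        (s, e)
        for s, e in segments
        if e - s > min_length and not any(s <= p <= e for p in incompatible)
    ]

toy_positions = [100, 3000, 7000, 20000, 21000, 40000, 44000, 47000]
all_segments = call_segments(toy_positions)
filtered_segments = call_segments(toy_positions, incompatible=[45000])
```

In the toy data the middle cluster is dropped for being shorter than 5 kb, and the last cluster is dropped once an incompatible site falls inside it.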

This approach makes use of the differentiation of chimpanzee and bonobo-like haplotypes in the same individual, based on outgroup populations as “reference panels” for each haplotype. We note that this is also a conservative approach, because it will only reveal introgressed segments in heterozygosity in a given individual, while some segments might be homozygous in a given individual.

We observe almost an order of magnitude more such segments in the genomes of central chimpanzees compared to western chimpanzees (Table S6 and Fig. 4A), and fewer in eastern and Nigeria-Cameroon chimpanzees. We estimated that ~2.4% of the callable genome is covered by putatively introgressed segments across the central chimpanzee population. However, we do not detect significant differences in the length distribution of these segments. This may be because this strategy selects haplotypes with introgressed features. If there was secondary contact between non-western and western chimpanzees, the small number of such segments would show similar features. Furthermore, haplotypes might be generally longer in western chimpanzees due to smaller numbers of heterozygous sites. Since the demographic model infers an old age of introgression (likely > 200 kya, Suppl. Mat. 5), recombination would have reduced the length of such introgressed segments over time also in non-western chimpanzee populations. The segments with putatively introgressed features amount to less than 0.25% of the individual genomes in each subspecies, consistent with a scenario of low gene flow, compared to more than 1% in modern humans. However, segments in homozygosity are not assessed here and might harbour more introgressed DNA.

We estimated the chromosome-wise fraction of introgressed segments (Table S7). Interestingly, some chromosomes (2B, 9, 19 and 22) seem to be depleted in bonobo introgression across different chimpanzee subspecies. It is possible that these chromosomes carry alleles that are constitutive for chimpanzees and tolerate less bonobo introgression. Then, these chromosomes would be more often rejected in hybridisation. This phenomenon of “introgression deserts” has been described for introgression from archaic human populations into modern humans (22, 23, 76). On the other hand, different chromosomes (4, 6, 10 and 21) are enriched for introgressed segments.

Across all subspecies, these regions span ~4% of the whole chimpanzee genome. Among the regions found in central chimpanzees, 37% are unique to this subspecies, i.e. they do not overlap with introgressed regions in any other subspecies, compared to 33% in eastern and 31% in Nigeria-Cameroon chimpanzees. This enrichment is significant when comparing central and eastern (P = 0.008; G-test, Benjamini-Hochberg corrected for multiple testing) and central and Nigeria-Cameroon (P = 0.0003) chimpanzees, but not between eastern and Nigeria-Cameroon chimpanzees (P = 0.17). This implies that central chimpanzees are enriched for bonobo-like haplotypes that are unique to this population, consistent with a

second pulse of introgression into this subspecies. However, the longest regions after merging those in close proximity in the genome are only slightly longer than 50kb (Table S8).

We performed a liftover to human genome coordinates (hg19), and assessed the genomic features of these regions. It has been demonstrated in previous analyses of Neandertal introgression into modern humans (22) and human introgression into Neandertals (17) that introgressed segments are depleted in regions of the genome under strong purifying selection. This can be estimated by analysing the amount of background selection in these segments (21). We calculated the average background selection across the segments suggested to be introgressed from bonobo into chimpanzee subspecies, and found their distribution of B scores to be significantly higher than those of 100 sets of random genome regions (P < 2.2 × 10^-16, Wilcoxon rank test). This strong enrichment in neutral regions may indicate that bonobo alleles were often not tolerated on a chimpanzee background in regions under purifying selection, but rather persisted in more neutrally evolving regions of the genome. These observations are very similar to what has been observed on the Homo branch, and support a scenario of gene flow from bonobo to chimpanzee populations. We performed a Gene Ontology enrichment test using the software FUNC (77) to discover potentially enriched categories among the genes contained in these regions (Table S9). We find the most significant enrichment for “positive regulation of actin filament polymerization”, “axon guidance” and “regulation of systemic arterial blood pressure by baroreceptor feedback”, suggesting diverse functional roles of such introgressed segments.

Biases from the use of a reference genome and inferred ancestor from western chimpanzee (Suppl. Mat. 1.4) seem unlikely to have an influence on this analysis, because missing derived mutations in western chimpanzees would not cause an occurrence of heterozygous segments in non-western genomes (and these would not be at low frequency in chimpanzees while at high frequency in bonobos). However, to demonstrate that such a bias indeed does not apply here, we performed the same screen on genotypes from data mapped to the human genome. The results are largely consistent, showing several thousand putatively introgressed haplotypes in central chimpanzees and fewer in eastern and Nigeria-Cameroon chimpanzees, while these are almost absent in western chimpanzees (Fig. S46). Also, the proportion of unique segments in central chimpanzees is significantly higher than in eastern and Nigeria-Cameroon chimpanzees (P < 0.05, G-test, Benjamini-Hochberg corrected for multiple testing). We conclude that when using this method, the use of different reference genomes does not have a systematic effect on the results.

4.8 Estimating the age of introgressed segments using ARGweaver

It has been shown previously (17) that the program ARGweaver (25), a Bayesian method for sampling ancestral recombination graphs (ARGs), can be used to explore signatures of introgression. This program obtains local trees along the genome, which can be scanned for long genomic blocks at which an introgressed lineage is inferred to coalesce within the introgressing population. This way, modern human “African” lineages could be detected in the Altai Neanderthal genome, suggesting an age of this introgression event earlier than 230 Kya (17).

Here, we applied ARGweaver genome-wide to three bonobo genomes (Kosana, Dzeeta, Hermien) and one chimpanzee genome from each of the four subspecies (Central:

Doris; Eastern: Bwambale; Nigeria-Cameroon: Koto; Western: SeppToni). We used only this subset of genomes due to computational constraints. Segregating sites from the alignment to the chimpanzee genome (pantro4) were used, with the inferred ancestor as outgroup. Parameters were consistent with those used for human introgression (17): a maximum coalescence time of 100,000 generations, 20 discrete time steps, 5,000 MCMC iterations, and a compression rate of 10 were used, but with a constant recombination rate of 1.6 × 10^-8 and a mutation rate of 1.25 × 10^-8 per bp per generation. The integration over phase was used for all diploid genomes. The genome was divided into chunks of 5 Mb with 1 Mb overlap, and ARGs were sampled every 40th iteration, starting from the 2,000th iteration.
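The chunking and thinning scheme described above can be sketched as follows. The helper name, the 0-based half-open coordinates and the choice to include iteration 5,000 among the sampled iterations are assumptions of this example; the sketch does not call ARGweaver itself.

```python
def overlapping_chunks(chrom_length, size=5_000_000, overlap=1_000_000):
    """Split a chromosome into chunks of `size` bp sharing `overlap` bp.

    Returns half-open (start, end) intervals; the last chunk is truncated
    at the chromosome end.
    """
    step = size - overlap
    chunks = []
    start = 0
    while start < chrom_length:
        end = min(start + size, chrom_length)
        chunks.append((start, end))
        if end == chrom_length:
            break
        start += step
    return chunks

# MCMC iterations retained: every 40th, starting at iteration 2,000,
# up to and including iteration 5,000 (inclusive end is an assumption).
sampled_iterations = list(range(2_000, 5_000 + 1, 40))

toy_chunks = overlapping_chunks(12_000_000)
```

Under these assumptions each chunk contributes 76 sampled ARGs, and consecutive chunks share their last and first megabase.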

For each tree in the ARGweaver output, bonobo lineages in the chimpanzee individuals were determined by the time to the most recent common ancestor (TMRCA) in each local tree, as reported previously (17). The smallest TMRCA between the target lineage and all bonobos was required to be smaller than the smallest TMRCA between the target lineage and other chimpanzee lineages, as well as the largest TMRCA between the target lineage and all bonobos (inset in Figure 3B). Such discordant trees are more closely related to bonobo than to other chimpanzees, and fall within the range of bonobo variation. We note that this is a limitation under the assumption of ancient gene flow (substantially more ancient than the coalescence within bonobos), since ancient bonobo lineages might not be present in present-day bonobos. In this case, introgressed segments may fall outside the bonobo subtree, possibly breaking local trees into smaller pieces and reducing the power to detect these regions.

We calculated all pairwise comparisons between chimpanzee individuals, i.e. central vs. western, eastern vs. western, Nigeria-Cameroon vs. western, central vs. eastern, central vs. Nigeria-Cameroon and eastern vs. Nigeria-Cameroon. Adjacent local trees within 100 bp were joined, the divergence between the two chimpanzees was required to exceed 32,000 generations to exclude shallow trees, and more than 95% of MCMC replicates had to support a local genealogy. Finally, segments containing fewer than 1 segregating site per 1,000 sites were excluded. This provides a set of segments with discordant topologies, called with high confidence across MCMC replicates. We note that even at a low divergence time between bonobos and chimpanzees of 40,000 generations (1 Mya), the expected length of shared sequences due to ancestral polymorphism would be 1,562 bp (3). Hence, the probability of a haplotype of only 20,000 bp having persisted from ancestral polymorphism would be 1.8 × 10⁻⁹. The age distributions of segments longer than 50,000 bp are shown in Figure 3B. In each subspecies, including western chimpanzees, discordant segments are inferred, which might be the result of incomplete lineage sorting (17), limited power to infer the age of short segments, or small amounts of gene flow from bonobos into western chimpanzees. However, the genome of the central chimpanzee harbours 4.4 times as many such segments, and these tend to be longer in central and eastern than in western and Nigeria-Cameroon chimpanzees (Fig. 4B, Fig. S47, Table S10). The number and size of segments shared with bonobos are highest in central chimpanzees, smaller in eastern and Nigeria-Cameroon chimpanzees, and smallest in western chimpanzees. This is in agreement with a scenario of gene flow into the ancestors of central and eastern chimpanzees, subsequent gene flow into the ancestors of central chimpanzees, and possibly to a lower extent into Nigeria-Cameroon chimpanzees.
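The expected length of 1,562 bp quoted above is numerically consistent with the relation L = 1/(r × t), where r is the recombination rate per bp per generation and t is the divergence time in generations. That this is the relation used in (3) is an assumption; the sketch below only checks the arithmetic with the values given in this section, and the function name is hypothetical.

```python
def expected_shared_length_bp(recombination_rate, generations):
    """Expected length (bp) of a tract unbroken by recombination,
    assuming the simple relation L = 1 / (r * t)."""
    return 1.0 / (recombination_rate * generations)


# Values from this section: r = 1.6e-8 per bp per generation, t = 40,000 generations.
length_bp = expected_shared_length_bp(1.6e-8, 40_000)
print(round(length_bp, 1))
```

The tail probability for a 20,000 bp haplotype depends on the length distribution assumed in the cited work and is not recomputed here.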

All non-western chimpanzees show a larger number of haplotypes inferred at 200-350 Kya and 350-500 Kya. This is consistent with an introgression event into the ancestors of central and eastern chimpanzees before 200 Kya and after their split from western and Nigeria-Cameroon chimpanzees less than 500 Kya. The exact date cannot be estimated, because an introgression event closer to 200 Kya would introduce a distribution of young haplotypes, as previously described (17). Additionally, the central chimpanzee carries a slightly larger fraction of young bonobo haplotypes (< 200 Kya) than the other chimpanzee subspecies (Table S10). This lends further support to a scenario of a more recent introgression event from bonobos into this population, as inferred by the demographic model (Suppl. Mat. 5.2), which is also supported by the gene flow estimates from genome-wide summary statistics (Suppl. Mat. 4.3) and by an excess of introgressed segments specific to this subspecies (Suppl. Mat. 4.7).

A remarkable difference from human introgression into Neanderthals is the near-absence of very young segments (< 100 Kya). This suggests that gene flow at very recent times was probably small, while most of the introgression may be dated further back in time, in strong agreement with gene flow estimates from the demographic model as well as the observation that shared derived alleles segregate not only at low but also at intermediate frequencies. Since this is substantially more ancient than gene flow from Neanderthals into modern humans (~65 Kya) and modern humans into Neanderthals (~120 Kya, while the Neanderthal individual died >50 Kya), the length of introgressed haplotypes is shorter in chimpanzees, as expected. Although it is not possible to distinguish sustained gene flow from several single pulses of gene flow, the wide distribution of the excess of haplotype ages in non-western chimpanzees might suggest a longer time of low-level interaction between chimpanzees and bonobos.

In order to test whether the use of the ancestral allele may introduce a bias into the observations presented here (Suppl. Mat. 1.4), we used segregating sites from the alignment to the human genome (hg19), and the human reference allele as outgroup for the same bonobos and different chimpanzees (western: Mike; eastern: Padda; central: Doris; Nigeria-Cameroon: Koto). We used 2,000 MCMC replicates, sampling every 50th iteration, starting from the 200th iteration, with the same parameters as above. Using a divergent reference genome and fewer replicates as well as higher human contamination (Suppl. Mat. 4.10) might increase the noise in this comparison. However, the overall distribution of haplotype ages resembles the distribution obtained from mapping to pantro4 (Fig. S48 and Fig. 4B). The central chimpanzee carries an excess of discordant segments of ~4-fold over western chimpanzee, and these are significantly longer in central and eastern chimpanzees than in western and Nigeria-Cameroon chimpanzees (P < 0.01, Wilcoxon rank test; Benjamini-Hochberg corrected), but not between central and eastern or between western and Nigeria-Cameroon chimpanzees (P > 0.05). We conclude that the results presented here are not caused by the use of the reference genome and outgroup used by ARGweaver.
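The segment-length comparison above reports Benjamini-Hochberg corrected P values. As an illustration of that correction only, the adjustment can be sketched as follows; the P values are made up, the function name is hypothetical, and the statistical software actually used is not stated in this section.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up P values for four comparisons
adjusted = benjamini_hochberg(raw)
print(adjusted)
```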

4.9 Simulations support genome-wide observations

We tested whether the observations above are expected under the topology of the demographic scenario inferred using the joint site frequency spectrum (Suppl. Mat. 5). To do so, we used the software ms (79) to simulate windows of 50 Kb under four main models: 1) no gene flow, 2) gene flow from bonobos to chimpanzees, 3) gene flow from chimpanzees to bonobos and 4) gene flow from bonobos to chimpanzees and from chimpanzees to bonobos.

Simulation parameters were chosen to be consistent with the SFS-based estimates for divergence times, effective population sizes and rates of gene flow. Only the major events of gene flow between chimpanzee subspecies were kept, as well as the major population bottlenecks inferred by the population modelling, since bottlenecks with a substantial reduction of Ne can lead to a considerable number of coalescence events. The mutation rate was set to 0.5 × 10⁻⁹ mutations per bp per year, and the generation time to 25 years. We assumed that admixture occurred in one-generation pulses, 240,000 years ago for the bonobo gene flow into the ancestor of central-eastern chimpanzees (1%) and Nigeria-Cameroon chimpanzees (0.5%), and 350,000 years ago for gene flow from the central-eastern ancestor into bonobos (1%). We note that recent introgression from bonobos into central chimpanzees was not explored by these simulations. We also simulated admixture at half the amount for each event.

For each of these scenarios, we simulated 40,000 sequence windows of 50 Kb. The number of simulated chromosomes was 10 for each of the populations: bonobos, central, eastern, Nigeria-Cameroon and western chimpanzees. The divergence of populations was set to 1,800,000 years ago for bonobos and common chimpanzees, 700,000 years ago for western-Nigeria-Cameroon and central-eastern ancestors, 250,000 years ago for western and Nigeria-Cameroon chimpanzees, and 200,000 years ago for central and eastern chimpanzees. The mutation rate was allowed to fluctuate by 15% between window replicates to account for different local mutation rates in the genome.

The full model including all gene flow events was simulated with this setup:

ms 50 1 -seeds X X X -t 2.4 (+/- 15%) -r 2.6 50000.0 -I 5 10 10 10 10 10 -n 1 10.0 -n 2 15.0 -n 3 49.0 -n 4 8.0 -n 5 15.0 -em 0.05 2 3 0.210526315789 -em 0.05 3 2 0.126315789474 -em 1.6 5 2 1.0 -em 1.6 2 5 0.2 -em 1.65 5 2 0 -em 1.65 2 5 0 -em 1.0 2 3 0 -em 1.0 3 2 0 -en 1.975 3 1.8 -en 1.975 2 1.5 -ej 2.0 2 3 -en 2.0 3 160.0 -em 2.3 5 3 2.0 -em 2.3 3 5 0.8 -em 2.35 5 3 0 -em 2.35 3 5 0 -em 2.38 3 1 0.0 -em 2.38025 3 1 0 -em 2.38 5 1 0.0 -em 2.38025 5 1 0 -en 2.475 4 1.17 -ej 2.5 4 5 -en 2.5 5 38.0 -em 3.25 5 3 1.0 -em 3.25 3 5 0.5 -em 3.35 5 3 0 -em 3.35 3 5 0 -em 3.5 1 3 0.0 -em 3.50025 1 3 0 -en 5.0 1 3.1 -en 5.025 1 20.2 -en 6.975 5 3.5 -ej 7.0 5 3 -en 7.0 3 17.5 -em 10.0 1 3 0.03 -en 17.5 3 12.0 -em 17.9 1 3 0 -en 17.975 3 1.0 -ej 18.0 1 3 -en 18.0 3 9.0

where N0 was set to 1,000. The other models were created by successively removing gene flow events.

We also simulated ancient gene flow 1,000,000 years ago, from the ancestors of chimpanzees into the ancestors of bonobos, in agreement with the inferences from our population modelling (Suppl. Mat. 5.2). We note that such admixture is difficult to contrast given that chimpanzee subspecies had not yet diverged and no outgroup to bonobos is available. In our population modelling, scenarios with old divergence coupled with high gene flow are almost indistinguishable from those with more recent divergence and less gene flow. However, we tested the effect of ancient gene flow between the ancestors of chimpanzees and bonobos on our divergence statistics by removing such an admixture event. We find that besides an overall shift towards higher minimum divergences in all chimpanzee populations, there is little to no difference between both simulations.
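The time arguments of the ms command in this section are expressed in units of 4N0 generations, the time scale used by ms. With N0 = 1,000 and a generation time of 25 years, one ms time unit corresponds to 100,000 years, so the divergence times stated in the text map directly onto the -ej arguments (for example, 1,800,000 years for bonobos and chimpanzees corresponds to -ej 18.0 1 3). The helper below is a minimal sketch of this conversion; its name is hypothetical.

```python
N0 = 1_000                   # reference population size stated in this section
GENERATION_TIME_YEARS = 25   # generation time stated in this section


def years_to_ms_time(years, n0=N0, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in years to ms units of 4*N0 generations."""
    return years / (4 * n0 * generation_time)


# Divergence times given in the text and their ms equivalents.
for label, years in [
    ("bonobo vs. chimpanzee", 1_800_000),
    ("western/Nigeria-Cameroon vs. central/eastern ancestors", 700_000),
    ("western vs. Nigeria-Cameroon", 250_000),
    ("central vs. eastern", 200_000),
]:
    print(label, years_to_ms_time(years))

one_generation = years_to_ms_time(GENERATION_TIME_YEARS)
```

A one-generation pulse spans 0.00025 ms time units under these settings, which matches the spacing of paired events such as 3.5 and 3.50025 in the command.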

We calculated the statistics described in the Suppl. Mat. 4.3 (Dj and Djx) to test if our observations are expected under simulations with gene flow between chimpanzees and bonobos. We observe on average elevated Dj when comparing western to non-western chimpanzees in models with gene flow from bonobos to chimpanzees (Fig. S33). Central-eastern ancestor introgression into bonobos causes a higher sharing of derived alleles at sites at low and moderate frequency in bonobos. The same intensity of gene flow (0.5 %) for this direction of migration results in less pronounced Dj. As expected, models that do not consider migration events between the two species produce Dj=0. Only models including gene flow from bonobos into chimpanzees produce a shift of Dj towards non-western chimpanzees (Fig. S28).
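The exact definitions of Dj and Djx are given in Suppl. Mat. 4.3 and are not reproduced here. As a rough orientation only, the sketch below computes a standard ABBA-BABA D-statistic separately for each bonobo derived-allele frequency bin. The input layout, the function name, and the sign convention (positive values mean that the second chimpanzee shares more derived alleles with bonobos than the first) are assumptions made for illustration.

```python
from collections import defaultdict


def stratified_d(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA) per frequency bin.

    `sites` is an iterable of (p1, p2, freq_bin) tuples for sites where
    bonobos carry the derived allele: p1 and p2 are 0 (ancestral) or
    1 (derived) in the two chimpanzees being compared, and freq_bin is a
    hypothetical label for the bonobo derived-allele frequency class.
    """
    abba = defaultdict(int)
    baba = defaultdict(int)
    for p1, p2, freq_bin in sites:
        if p1 == 0 and p2 == 1:
            abba[freq_bin] += 1   # derived allele shared by P2 and bonobos
        elif p1 == 1 and p2 == 0:
            baba[freq_bin] += 1   # derived allele shared by P1 and bonobos
    result = {}
    for freq_bin in set(abba) | set(baba):
        total = abba[freq_bin] + baba[freq_bin]
        result[freq_bin] = (abba[freq_bin] - baba[freq_bin]) / total
    return result


# Made-up sites: bin 1 has two ABBA and one BABA site, bin 2 has one of each.
toy_sites = [(0, 1, 1), (0, 1, 1), (1, 0, 1), (1, 0, 2), (0, 1, 2), (1, 1, 1), (0, 0, 1)]
d_by_bin = stratified_d(toy_sites)
print(d_by_bin)
```

Sites where both or neither chimpanzee carry the derived allele are uninformative and are ignored, so a model without gene flow is expected to give values near zero in every bin.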

In the Djx analysis we find that introgressed sites from bonobos to chimpanzees are not only segregating at low frequencies, but also at intermediate frequencies in admixed chimpanzee populations (Fig. S30). Simulations without gene flow show a lack of shared derived sites at low frequencies in both chimpanzees and bonobos (Fig. S34), which is expected considering the divergence between these species. Periods of high genetic drift would lead to either the fixation or the loss of sites that were already segregating in the ancestral Pan population and do not result from introgression events. Only simulations with gene flow are scenarios where a considerable amount of sites segregating at low frequencies are found to be shared between both species, and the share of these alleles with bonobos is strongly increased in the non-western subspecies (Fig. S34). Therefore, we find that only models considering gene flow between chimpanzees and bonobos explain the data. However, we note that there are discrepancies between our simulations and the real observations. Notably, we observe that the correlation between effective population sizes and Djx at high frequencies in chimpanzees is substantially more pronounced in our simulations (Fig. S34, first row in each subplot), and in contrast to the real data, it comprises all comparisons with high allele frequencies in chimpanzees across the frequencies in bonobos. This might be due to an underestimation of population bottlenecks in the chimpanzee populations, or because older gene flow than simulated might have led to some alleles segregating to medium and high frequencies.

Signatures of gene flow between chimpanzees and bonobos have been detected before, although such evidence was attributed to high genetic drift in the western subspecies (6, 19). In order to test if intense genetic drift in the form of strong population bottlenecks could reproduce the patterns observed in the real data, we performed simulations in models without gene flow between any chimpanzee subspecies and bonobos. We introduced a bottleneck in the western subspecies 227,500 years ago (coinciding with the split from Nigeria-Cameroon chimpanzees) by largely reducing the effective population size predating the split event by 100 generations (-en 2.275 4 0.015 in ms). Such a strong population bottleneck certainly leads to a large number of coalescence events and higher genetic drift in the western subspecies. We find that the observed Dj patterns cannot be explained by this scenario (Fig. S35), and we note that D-statistics are known to be robust to changes in effective population sizes and most demographic events happening after the split of the populations being tested (75).

For each of the four possible scenarios we computed the divergence between chimpanzee subspecies and the minimum divergence to bonobos (derived allele frequency ≥0.9). We observe that a model with no gene flow cannot explain the divergence and heterozygosity patterns found in the chimpanzee genomes. We note that the distribution of minimum divergence to bonobos in the simulated data is narrower than the real data (X-axis in Fig. S38-S43). Therefore, windows with extreme divergence values to bonobos (both highly divergent and extremely low divergent segments) were not reproduced by our simulations, which may be a consequence of a greater variation in mutation rate along the real genome. Additionally, the differences in divergence between chimpanzee subspecies found in the real windows most divergent to bonobo were not fully captured by our simulations. We hypothesize that this might be caused by an underestimation of population bottlenecks, especially in the western subspecies, which agrees with our conjectures in the simulations of Dj and Djx. The genetic drift resulting from strong population bottlenecks could produce an effect similar to our observations in the real data, where fixation of alleles would result in increased divergences in both dimensions (minimum divergence to bonobos and divergence to the other chimpanzee subspecies). This idea would be in good agreement to the correlation we observe with effective population size for windows with high divergence to bonobos, which is clearly portrayed by windows least divergent to bonobos (Fig. S36 and Fig. S38-S43). Additionally, we find that only simulations allowing gene flow from bonobos into chimpanzees mimic the increased heterozygosity in the windows least divergent to bonobos (Fig. S44).

Additionally, we tested whether introgression from an unknown archaic species, which split from the Pan clade 2.5 Mya, into western chimpanzees could reproduce our observations in the real data. We find that such an archaic introgression event introduces an excess of haplotypes very divergent to both bonobos and the non-western subspecies (Fig. S45). We also tested whether such an introgression event would reproduce the Dj statistics observed in the real data. We find that it does not create a shift across a distribution of frequencies (Fig. S45).

We also simulated 100 Mb of genomic data using ms in order to demonstrate that ARGweaver should in principle detect the proposed gene flow event. Computational constraints prohibit the use of full genome simulations, but it has been shown in previous work (17) that an excess of young haplotypes could only be detected as the result of gene flow, not of incomplete lineage sorting or gene flow from an unknown population (archaic gene flow). We analysed the simulated data as described above (Suppl. Mat. 4.8), and find that under a model without gene flow, few discordant haplotypes are expected to occur (typically fewer than ten), while under a model with gene flow we observe an increase in such haplotypes: for example, in the case of central vs. western chimpanzee, 35 haplotypes compared to four in a simulation without gene flow (Table S11). Finally, simulated gene flow from a population outside the Pan clade does not cause the emergence of discordant haplotypes. This is expected because the discordant, putatively introgressed haplotypes fall close to the variation of bonobos and outside the variation of two chimpanzees, while such archaic introgressed segments would fall outside the variation of both subclades.

Simulations of ten chromosomes of 50 Mbp with the given parameters also demonstrated that without gene flow almost no heterozygous regions with the feature of introgression (Suppl. Mat. 4.7) are expected. Only if gene flow is introduced do such regions, which harbour a bonobo-like and a chimpanzee-like haplotype, occur at a level similar to what is observed in the real data (Fig. S46B-C). The obtained values are around twice as high, hence it seems reasonable that the simulated 1% of introgression into the ancestor of central and eastern chimpanzees is an overestimate and that the real amount of introgression was smaller. This would be in agreement with the small amounts of shared derived sites (Suppl. Mat. 4.2, 4.3) and the model-based inferences (Suppl. Mat. 5).

We conclude that in simulations we cannot reproduce all features of the data simultaneously without gene flow from bonobos into chimpanzees.

4.10 Alternative scenarios

Contamination. Contamination of sequencing libraries by humans handling the samples would introduce mutations that are shared between the contaminated individual and modern humans. This would introduce a bias into D-statistics which use human reference alleles as an outgroup, driven by sites at low coverage with human sequence fragments. Indeed, individual D-statistics show more variation when allowing low-quality sites or using low-coverage data (Suppl. Mat. 4.5). In comparison to low-coverage data from Neandertals, the impact of human contamination would be much higher. The reason is that here the divergence between the contaminant and the tested populations is higher than the divergence between the tested populations (~8 Mya vs. ~1.5 Mya), which is opposite to the scenario in Neandertals (~200 Kya vs. ~700 Kya). That means that at each site the probability of observing a mutation shared with the outgroup is increased for each individual, which results in increased noise and could possibly confound or even mask the effect of introgression. However, D-statistics calculated from high-quality sites, i.e. sites that are not at low coverage, show an excess of shared alleles between non-western chimpanzees and bonobos compared to western chimpanzees, hence the signal is not caused by such low-quality sites (Suppl. Mat. 4.2). Additionally, we find that low-coverage genomes independently sequenced in (18) also show the introgression signal when performing an assisted variant calling using prior information from our population-wise SNPs, strongly suggesting that our observations are not due to human contamination. Genotypes which are potentially the result of contamination are heterozygous sites with single human reads at low coverage where >90% of all other individuals carry the alternative allele. We investigated the number of such sites, revealing 1-38,098 (mean 3,231) sites as imbalanced and potentially contaminated across the samples.
However, these samples were treated and sequenced in a similar way, hence certain chimpanzee populations do not systematically contain more such human contamination than others. Importantly, all these sites were excluded from all analyses of gene flow, but otherwise they would be a source of noise. This would especially be the case if statistics are calculated without filtering or in the situation of low coverage where such filtering is not possible. Even after strict filtering, it seems that the eastern chimpanzee Padda does show the lowest values in D-statistics (Fig. S26), and across all non-western chimpanzees those samples with several thousand sites indicating contamination seem to be slightly shifted towards zero. This demonstrates that human contamination does influence D-statistics. Furthermore, we apply several methods which use clustering of shared derived alleles (ARGweaver, windowed summary statistics, screening for introgressed segments, see Suppl. Mat. 4.6, 4.7, 4.8), a feature that cannot be explained by a random occurrence of contaminant alleles. Rather, despite the noise introduced by human alleles, we are able to detect these signals of introgression. Finally, the SFS-based demographic model used individuals that showed very low amounts of contamination (< 1,000 sites) except for two western chimpanzee individuals with slightly higher contamination (Jimmy and Alice with 3,300 and 1,700 sites, respectively).

Reference bias. The use of the chimpanzee reference genome, which has been assembled from a western chimpanzee individual, and of the EPO-pipeline to infer the ancestral state of alleles, which uses the same reference genome, may introduce a bias resulting in increased allele sharing between non-western Pan groups (Suppl. Mat. 1.4). However, using the human reference genome for mapping and the human reference sequence as outgroup, our analyses deliver results largely consistent with data using the chimpanzee reference genome. Indeed, D-statistics seem to be inflated by using the chimpanzee reference genome (Suppl. Mat. 4.2), while the signal is also observed when using the human reference genome (Suppl. Mat. 4.2, 4.3). Furthermore, ARGweaver shows the same order of magnitude of differences when using the human reference (Suppl. Mat. 4.8), and the same is observed in a screen for heterozygous introgressed segments (Suppl. Mat. 4.7). Finally, the demographic model uses the unpolarised site frequency spectra of the populations, hence minimizing putative biases (Suppl. Mat. 5).

Incomplete lineage sorting (ILS). ILS, i.e. shared ancestry since the split from the common ancestor of bonobos and chimpanzees, could be a possible explanation for the observed excess of allele sharing. D-statistics might be confounded by non-random mating in the ancestral population (74, 75), a situation where subdivision in the ancestral population can lead to asymmetries in the frequencies of discordant gene trees. In the Pan clade, this scenario would seem unlikely, since the subdivision between lineages would have had to be maintained over long periods of time (from the split of chimpanzees and bonobos 1.5-2.1 Mya to the split of Nigeria-Cameroon and western chimpanzees 250-300 Kya), and to have survived two major population splits (central-eastern ancestors and Nigeria-Cameroon-western ancestors 400-650 Kya). Generally, however, shared alleles from ILS would very likely be fixed after a divergence time of more than 1 Mya as the result of drift (80), but we observe that the shared alleles segregate at low and medium frequencies in the non-western chimpanzee populations (Suppl. Mat. 4.3), while they are ancestral in western chimpanzees. The screen for introgressed segments (Suppl. Mat. 4.7) demonstrates the presence of bonobo-like genomic segments at low frequencies in non-western chimpanzees, but it would not be expected that a haplotype of 20 kb is shared after a population split of at least 1 Mya, which has a probability of 1.8 × 10⁻⁹ (78). It seems highly unlikely that haplotypes shared by ILS would be maintained at such a length in the genome, and segregate at low frequency. Using ARGweaver, we observe >600 haplotypes longer than 50 Kb for non-western chimpanzees (Suppl. Mat. 4.8), which is far beyond these theoretical expectations for ILS. Additionally, these segments are inferred to be of young shared ancestry between bonobos and non-western chimpanzees. Furthermore, in the case of modern and archaic humans, ancestral subdivision
has been rejected based on analyses of the joint site frequency spectra (81), a very similar approach to our population modelling (Suppl. Mat. 5). As discussed below, models without gene flow between chimpanzees and bonobos cannot reproduce the joint site frequency spectrum, and show statistically lower likelihoods. We conclude that non-random mating in the ancestral species of chimpanzees and bonobos is an unlikely scenario given the evidence we discuss in this study.

Different population histories in chimpanzee populations. We performed extensive simulations on different models of chimpanzee population history, accounting for population sizes and bottlenecks, and conclude that these alternative histories are not sufficient to explain the frequency-stratified D-statistics or the window-based summary statistics (Suppl. Mat. 4.9). Furthermore, in the demographic model, the joint site frequency spectrum cannot fully be reproduced without gene flow (Suppl. Mat. 5). We acknowledge that a theoretically infinite parameter space of possible demographic histories exists, each of which might explain certain aspects of the genomic data, but several different lines of evidence strongly favour a model including gene flow between bonobos and chimpanzees. Importantly, none of the models we considered and discussed above could explain simultaneously the features of the data presented here, and no simulation could cause all features of the data without gene flow.

Superarchaic admixture. One such model would include admixture into western chimpanzees from a population outside the Pan clade. Such an event has been found to have taken place for the Denisovan, given the remarkable observation that this individual carries significantly fewer sites that are fixed derived in present-day Africans compared to the Neandertal individual (Figure S16.a1 in (15)). In simulations, it has been shown that this pattern could not be explained by alternative scenarios other than such archaic gene flow.
In our data, we do not find this excess of fixed derived sites (Figure 2B, first row), but instead an increase of shared derived sites with bonobo frequency, making it unlikely that the allele sharing patterns are caused by such archaic gene flow from an unknown population that branched off earlier. Our simulations, including such an early diverging population, do support this observation for frequency-stratified D-statistics, and also demonstrate that such an event would not cause the patterns in window-based summary statistics (Suppl. Mat. 4.9). Archaic gene flow into western chimpanzee individuals would not make non-western chimpanzee individual haplotypes more closely related to bonobos. The analysis based on window-based statistics (Suppl. Mat. 4.6), the screen for introgressed segments (Suppl. Mat. 4.7), and the ARGweaver analysis (Suppl. Mat. 4.8) all indicate an excess of haplotypes and sites with such a closer relationship of non-western individuals to bonobos. The alternative explanation that these alleles and haplotypes would be old derived alleles (>1 Mya) shared between bonobos and non-western chimpanzees due to ILS is unlikely, as they would be expected to become fixed or reach high frequencies in both populations, but instead tend to be almost fixed ancestral in western chimpanzees. Furthermore, these alleles and haplotypes occur at low and medium frequency in non-western chimpanzees (Suppl. Mat. 4.3). Finally, it has been demonstrated before (17) that archaic gene flow in one group does not cause an excess of “young” haplotypes in the other group as inferred by ARGweaver, and that the two categories of haplotypes do not significantly overlap. We explicitly tested this in simulations, and find that superarchaic admixture does not cause the emergence of such “young”
haplotypes under the demographic scenario of the Pan clade (Suppl. Mat. 4.8). Hence, we rule out such a scenario of archaic introgression as a main explanation for our observations. However, we cannot completely rule out that events other than the gene flow presented here might have happened additionally, since such events are not mutually exclusive.

Positive selection. Another possibility might be positive selection on bonobo-like haplotypes in certain chimpanzee populations, possibly from standing variation in the ancestral population. Although it appears implausible that such a strong selection signal would maintain hundreds of shared haplotypes across the genome, such an unexpected difference to patterns in humans (82) would be theoretically possible. However, if selection drove these signals, we would expect a high frequency of selected alleles in the non-western populations, since positive selection would increase the fixation rates of these alleles. On the contrary, we find bonobo-like alleles and haplotypes at low and medium frequencies, dispersed across the individual genomes (Suppl. Mat. 4.3, 4.7). Finally, the screen for introgressed segments shows that these segments on average have a significantly lower background selection (P < 2.2 × 10⁻¹⁶, Wilcoxon rank test) than random regions across the genome (Suppl. Mat. 4.7). This implies that these segments are most likely the result of introgression, which was more often tolerated in neutral regions of the genome than in functional regions, while selection would be expected to play a role rather in functional regions. Furthermore, all the demographic modelling based on the site frequency spectrum was done by filtering the sites to minimize the effects of positive and background selection, and we still find evidence for gene flow between bonobos and common chimpanzees (Suppl. Mat. 5).

5. Demographic modelling and inference based on the Site Frequency Spectrum

5.1 Likelihood inference of demographic models based on the Site Frequency Spectrum. The parameters of alternative demographic scenarios were inferred using the Site Frequency Spectrum (SFS) (82, 83) by approximating the likelihood of a given model with coalescent simulations (82). All computations were done with an extension of the fastsimcoal2 simulation software (12). Coalescent simulations are performed under specific parameter values θ of a given model to estimate the expected entries of the SFS, $\hat{p}_i$, and the likelihood is then obtained as

$$L_{full} = \Pr(X \mid \theta) \propto P_0^{L-S} (1 - P_0)^{S} \prod_{i=1}^{n-1} \hat{p}_i^{m_i} \qquad (1)$$

where $X = \{m_1, \ldots, m_{n-1}\}$ is the observed (multidimensional) SFS, n is the total number of entries in the SFS given by the product over the d sampled populations

$$n = \prod_{i=1}^{d} (n_i + 1)$$


where $n_i$ is the number of gene copies in the i-th sampled population, S is the number of polymorphic sites, L is the length of the observed sequence data, and $P_0$ is the probability of no mutation on the gene trees, obtained as $P_0 = e^{-\mu T}$ assuming a Poisson distribution of mutations occurring at rate µ, where T is the expected tree length. Note that if the data contain linked SNPs, then this is a composite likelihood estimator. The likelihood function in equation (1) thus accounts for the number of monomorphic sites and all the entries of the SFS. An alternative likelihood function can be obtained by focusing only on the entries of the SFS for polymorphic sites, discarding information on the number of monomorphic sites, i.e.

$$L_{SFS} \propto \prod_{i=1}^{n-1} \hat{p}_i^{m_i} \qquad (2)$$

As described before (12), the likelihood is maximised using a conditional maximization algorithm (ECM) (84), which is an extension of the EM algorithm where each parameter of the model is maximised in turn, while keeping the other parameters fixed at their last estimated value. We start with initial random parameter values, and perform a series of ECM optimisation cycles until estimated values stabilise or until we have reached a predefined number of ECM cycles (65, unless specified otherwise). We used a strategy where we begin by optimising the full likelihood Lfull (eq. 1) for a given number of cycles (i.e. 25) and then optimise LSFS (eq. 2) for the remaining (40) cycles. This strategy aims at maximising the fit between the expected and the observed SFS. At the end of the run, a rescaling factor is computed as $RF = S_{obs} / S_{exp}$, where $S_{obs}$ and $S_{exp}$ are the observed and expected numbers of polymorphic sites, respectively. $S_{exp}$ is obtained as $S_{exp} = \mu \hat{T}$, with $\hat{T}$ being the expected tree length for the maximum-likelihood parameters. The final maximum-likelihood parameters are then rescaled by RF in order to produce a number of polymorphic sites equal to that observed: the effective population sizes (N) and times of events (T), including the age of samples, are multiplied by RF, whereas migration rates (m) are divided by RF, such that Nm parameters are left unchanged.

For our analyses we selected five individuals from each subspecies with the highest depth of coverage (Table S12). To minimize the confounding effects of natural selection on our demographic estimates, we focused on non-functional regions of the genome by discarding genic regions (as defined by Ensembl version 82, September 2015 (85)) and CpG islands (as defined on the UCSC platform (86)). We built blocks of 1 Mb of coverage in all selected individuals by concatenating segments where sites could be called in all individuals (i.e., no missing genotypes), given the quality criteria defined previously. We identified 1,084 such blocks on the autosomes. Furthermore, we focused on regions which are probably evolving “neutrally” by filtering sites based on the Genomic Evolutionary Rate Profiling (GERP) scores (86), which quantify the level of evolutionary constraints at a given locus (88). GERP (or RS) scores larger than 2 indicate a substitution deficit, which is expected for sites under selective constraints, whereas GERP scores smaller than -2 could be indicative of an accelerated rate of evolution. Thus, to avoid any potential biases that sites under selective constraints or an accelerated rate of evolution could introduce in the demographic inferences, we selected sites with GERP scores between -2 and 2. To obtain GERP scores at each site, we converted the panTro4 genomic coordinates into the hg19 system with the liftOver tool (35) and extracted the scores using the hg19 coordinates. We kept a total of 750,377,037 sites, 6,206,394 of which carried biallelic SNPs. To avoid any potential biases due to the misspecification of the ancestral state at each locus, we considered the minor allele frequency spectrum (folded site frequency spectrum), where the minor allele was defined using its frequency in the total set of samples.
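The folding step can be illustrated with a minimal sketch. It assumes that each biallelic site has already been reduced to a tuple of alternate-allele counts, one per population; the function names and the toy data are illustrative assumptions, not part of the original pipeline (the SFS in this study was generated with Arlequin). The minor allele is defined from the pooled sample, and the per-population counts are flipped when the alternate allele is the major one.

```python
from collections import Counter

def fold_site(alt_counts, sample_sizes):
    """Return per-population minor-allele counts for one biallelic site.

    alt_counts[i]   : copies of the alternate allele in population i
    sample_sizes[i] : number of gene copies sampled in population i
    The minor allele is defined by its frequency in the pooled sample.
    Ties (pooled frequency of exactly 0.5) are left unchanged here,
    which is a simplifying assumption of this sketch.
    """
    total_alt = sum(alt_counts)
    total = sum(sample_sizes)
    if 2 * total_alt > total:
        # The alternate allele is the major allele overall: count the other one.
        return tuple(n - a for a, n in zip(alt_counts, sample_sizes))
    return tuple(alt_counts)

def folded_joint_sfs(sites, sample_sizes):
    """Tally folded joint SFS entries over an iterable of per-site count tuples."""
    sfs = Counter()
    for alt_counts in sites:
        sfs[fold_site(alt_counts, sample_sizes)] += 1
    return sfs

# Toy example: 4 populations, 10 gene copies each (5 diploid individuals).
sizes = (10, 10, 10, 10)
sites = [(1, 0, 0, 0), (10, 10, 9, 10), (0, 2, 3, 0)]
sfs = folded_joint_sfs(sites, sizes)
```

In this toy input the second site carries 39 alternate copies out of 40, so it is counted as a singleton of the other allele in the third population.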

We used two sets of four populations for our demographic inferences: (i) bonobo, eastern, central, western; and (ii) bonobo, eastern, central, Nigeria-Cameroon. The two datasets are called “western” and “Nigeria-Cameroon”, respectively. The rationale for considering models with only four and not five populations (i.e. a four-dimensional SFS) is that (i) the joint multidimensional SFS has a tractable size, a total of 11^4 = 14,641 entries (instead of 11^5 = 161,051 entries for models with five populations), and (ii) the models still have a reasonable number of parameters. Arlequin software v. 3.5.2.2 (44) was used to generate the multidimensional SFS. We inferred confidence intervals for the demographic parameters using a non-parametric block bootstrap approach by re-sampling, with replacement, the 1,084 autosomal blocks of 1 Mb of coverage in all selected individuals. The bootstrap datasets were very similar to the original dataset, both in terms of the cumulative length (defined as the number of sites with GERP scores between -2 and 2), ranging from 760,590,870 to 767,538,060, and the numbers of biallelic SNPs, ranging from 6,264,663 to 6,420,687. To quantify the level of population substructure, we performed an AMOVA analysis including the individual level, treating the measured FIS coefficients as evidence for deviations from Hardy-Weinberg equilibrium due to population substructure (Wahlund effect). These FIS coefficients were used in our models to incorporate the effects of population substructure (see details below).
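As a small worked check of the SFS dimensions quoted above, and a sketch of the block resampling step, the following assumes five diploid individuals (10 gene copies) per population and represents each 1 Mb block only by an integer identifier. The helper names are hypothetical; in practice each resampled block would contribute its own SNP counts to the bootstrap SFS.

```python
import math
import random

def sfs_entries(gene_copies):
    """Number of entries in the joint SFS: the product of (n_i + 1)."""
    return math.prod(n + 1 for n in gene_copies)

def block_bootstrap(blocks, rng):
    """Resample blocks with replacement, keeping the number of blocks fixed."""
    return [rng.choice(blocks) for _ in blocks]

# Five diploid individuals per population give 10 gene copies each.
four_pop = sfs_entries([10, 10, 10, 10])       # 11**4
five_pop = sfs_entries([10, 10, 10, 10, 10])   # 11**5

# One bootstrap replicate of the 1,084 autosomal blocks (identifiers only).
rng = random.Random(1)
block_ids = list(range(1084))
replicate = block_bootstrap(block_ids, rng)
```

Because blocks are drawn with replacement, the cumulative length and SNP count of each replicate vary slightly around the original values, which is consistent with the ranges reported above.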

We aimed to test whether or not there was gene flow between bonobos and chimpanzee subspecies, and to quantify the historical levels of gene flow, by explicitly accounting for demographic factors (e.g. bottlenecks and times of split). Thus, these models account for potential confounding factors in other analyses, such as the differential drift levels across populations and incomplete lineage sorting. Since other analyses to detect gene flow and introgressed haplotypes relied on western as an outgroup (Suppl. Mat. 4), we focused on the models with bonobos, central, eastern and western chimpanzees (“western” dataset). We inferred the population split times of a fixed population tree topology, assuming an earlier split between bonobo and chimpanzees, followed by the split of western, and finally the split of central and eastern, i.e. a population tree (bonobo,((central,eastern),western)). We posited that the origin of chimpanzees would be a subspecies with a larger historical effective size (higher genetic diversity), and that populations derived from the origin would show evidence of stronger historical drift (smaller effective sizes) and/or of founder events (bottlenecks). We thus considered models with potential bottlenecks associated with each population split event, mimicking potential founder effects. Also, we allowed for the effective sizes to change, assuming that each branch of the population tree had a specific effective size. Because branches leading to western and bonobo were expected to be much longer than the branches of eastern and central, we allowed for the possibility of a resize and a second potential bottleneck that could happen any time along the bonobo and western branches. We also considered that all bonobo and chimpanzee subspecies experienced a recent population decline, allowing for a population resize at a fixed time of 50 generations ago.

To investigate whether the data supported gene flow with bonobo we evaluated the likelihood of two similar models, only differing by their migration rates with bonobo: (i) no gene flow with bonobo; (ii) continuous gene flow between bonobo and ancestral populations of chimpanzees, and more recently between bonobo and central chimpanzees. Note that under the model (ii) we can predict some bonobo alleles to be shared with other chimpanzee subspecies (and vice-versa), either due to old gene flow into the ancestral chimpanzee populations or due to diffusion of bonobo genes from central chimpanzees into the other populations via gene flow.

To investigate the migration patterns among the chimpanzee subspecies, we considered models with continuous gene flow, assuming that migration could occur until present only between central and eastern subspecies, and that migration between central/eastern and western, as well as between bonobo and central stopped at some point (Fig. S49). We therefore also dated the times when migration stopped between bonobos and common chimpanzees, and between western and the other chimpanzee subspecies (dotted lines). All divergence times were estimated assuming a constant mutation rate of 1.20e-8 per generation per site (71) and a generation time of 25 years (49).
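The rescaling by RF described in Section 5.1 and the conversion of times from generations to years can be sketched as follows. This is only an illustration under stated assumptions: the parameter names and numerical values are hypothetical, and only the 25-year generation time is taken from the text.

```python
GENERATION_YEARS = 25  # generation time in years, as assumed in the text

def rescale(sizes, times, mig_rates, s_obs, s_exp):
    """Rescale maximum-likelihood estimates by RF = S_obs / S_exp.

    Effective sizes and event times are multiplied by RF, whereas
    migration rates are divided by RF, so every product N*m is unchanged.
    """
    rf = s_obs / s_exp
    return ({k: v * rf for k, v in sizes.items()},
            {k: v * rf for k, v in times.items()},
            {k: v / rf for k, v in mig_rates.items()})

def generations_to_kya(generations):
    """Convert a time in generations to thousands of years ago."""
    return generations * GENERATION_YEARS / 1000.0

# Hypothetical values, for illustration only.
sizes, times, migs = rescale({"N_a": 40000.0}, {"T_a": 4000.0},
                             {"m_ab": 1e-6}, s_obs=1.25e6, s_exp=1.0e6)
t_kya = generations_to_kya(times["T_a"])
```

With these toy numbers RF is 1.25, so a time of 4,000 generations becomes 5,000 generations (125 kya), and the product of size and migration rate stays at 0.04 before and after rescaling.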

To account for the population sub-structure within bonobos and the chimpanzee subspecies, we assumed that deviations from Hardy-Weinberg equilibrium within each sub-species were due to the Wahlund effect. This was quantified through FIS coefficients, which were fixed to the values estimated for each sub-population with our filtered dataset. We modelled this excess of homozygotes by considering that with a probability equal to FIS any pair of lineages from a given population could coalesce in the previous generation.

We estimated the set of parameters that maximise the likelihood for each model by specifying the search ranges defined in Table S13. Each ECM optimisation run comprised 65 cycles. For the original dataset we performed 100 optimisation runs starting from different initial conditions and selected the run leading to the highest likelihood to get parameter estimates. The expected SFS used for the computation of the likelihood of a given set of parameters was obtained by performing 500,000 coalescent simulations.
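Given an expected SFS estimated from simulations, the two likelihood functions labelled (1) and (2) in Section 5.1 can be evaluated in log10 units as in the sketch below. This is not the fastsimcoal2 implementation: the SFS is held in plain dictionaries keyed by entry, the function names are hypothetical, and the small floor applied to expected probabilities of zero is an assumption added so that the logarithm is always defined. Constant terms are omitted because both equations are stated up to proportionality.

```python
import math

def log10_lik_sfs(observed, expected_probs, floor=1e-12):
    """log10 of eq. (2): product over polymorphic entries of p_i ** m_i."""
    total = 0.0
    for entry, m_i in observed.items():
        if m_i > 0:
            # Floor guards against entries never seen in the simulations
            # (an assumption of this sketch).
            p_i = max(expected_probs.get(entry, 0.0), floor)
            total += m_i * math.log10(p_i)
    return total

def log10_lik_full(observed, expected_probs, seq_length, p0):
    """log10 of eq. (1): adds the monomorphic and polymorphic site terms."""
    s = sum(observed.values())  # number of polymorphic sites S
    return ((seq_length - s) * math.log10(p0)
            + s * math.log10(1.0 - p0)
            + log10_lik_sfs(observed, expected_probs))

# Toy two-population example with hypothetical counts and probabilities.
observed = {(1, 0): 3, (0, 1): 1}
expected = {(1, 0): 0.75, (0, 1): 0.25}
l_sfs = log10_lik_sfs(observed, expected)
l_full = log10_lik_full(observed, expected, seq_length=100, p0=0.9)
```

Working in log10 units matches the way likelihood differences between models are reported in Sections 5.2 and 5.3.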

We estimated confidence intervals for the model having the maximum likelihood by estimating parameters from 100 bootstrap datasets. Since we only performed two runs per bootstrap dataset due to computational constraints, and to avoid local maxima, we set the initial parameter values to those values that maximised the likelihood with the original dataset, rather than starting the optimisation procedure from random values. The 95% confidence intervals for each parameter were computed based on the percentile method (interval [Q0.025,Q0.975], where Qa is the a percentile of the bootstrap distribution) (89), as implemented in the R boot package.
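The percentile interval can be sketched in a few lines. The analysis itself used the R boot package; the Python version below is a stand-in, and its linear-interpolation quantile rule is an assumption that may differ slightly from the rule used by that package. The bootstrap estimates are toy values.

```python
def percentile(sorted_vals, q):
    """Quantile (0 <= q <= 1) of a pre-sorted list, by linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def percentile_ci(boot_estimates, level=0.95):
    """Percentile bootstrap interval [Q_(alpha/2), Q_(1 - alpha/2)]."""
    vals = sorted(boot_estimates)
    alpha = (1 - level) / 2
    return percentile(vals, alpha), percentile(vals, 1 - alpha)

# Toy bootstrap distribution: 101 evenly spaced estimates of one parameter.
boot = list(range(101))
ci = percentile_ci(boot)
```

With 100 bootstrap datasets the 2.5% and 97.5% quantiles fall between order statistics, so the choice of interpolation rule has a visible effect on the reported bounds.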

5.2 Estimates for models with western chimpanzees. The model without gene flow with bonobo showed a much lower likelihood than models with direct gene flow between bonobo and chimpanzees (Table S14, Fig. S50). The likelihood of the model allowing gene flow between bonobos and ancestors of central chimpanzees is 7,987 log10 units higher than that of the model without gene flow for models with western chimpanzees (Table S14, Fig. S50). Because our data contain linked sites, the composite likelihoods cannot be strictly compared via the Akaike information criterion (AIC) without computationally prohibitive simulations. However, the distributions of the likelihood values do not overlap (Fig. S50), clearly suggesting that the difference in the log likelihoods among models cannot be explained by the noise of our approximation to the SFS, and hence that the model with continuous gene flow with bonobo (Fig. S51) is much better supported by the data. Furthermore, the model with gene flow is able to reproduce many of the features of the observed SFS (see details below, Suppl. Mat. 5.4). The point estimates for the parameters obtained under the two models, as well as the 95% confidence intervals (95% CI) obtained under the best model, are shown in Table S15. As expected, for models including gene flow with bonobo, the split times of bonobo and the ancestor of chimpanzees were older than in models without such gene flow.

We considered models with continuous migration between populations, and hence the levels of migration were quantified with the scaled migration rates (2Nm), which can be seen as the average number of immigrants that enter a population each generation. We infer that gene flow between the ancestors of central chimpanzees and bonobos is old (125 kya, 95% CI: 82-146, TMigStopBonobo in Table S15), and relatively low (95% CI 2Nm<0.22, Table S15). When comparing the gene flow levels with bonobos for the more recent time periods, we find larger point estimates for migration involving the ancestors of central and eastern chimpanzees (2Nm ~0.05-0.06), i.e. migration events older than the eastern split (period 2, Table S16). For this period of time, the confidence intervals for the scaled migration rates are relatively narrow, suggesting similar levels of gene flow from bonobos into the ancestors of central and eastern chimpanzees (2Nm 0.06, 95% CI 0.04-0.10), and vice-versa (2Nm 0.05, 95% CI 0.03-0.08). We note that this is not the case for the more recent migration between bonobos and central chimpanzees, where the confidence intervals for the scaled migration rates are very wide, suggesting that with our SFS dataset there is a high uncertainty regarding gene flow during this time period. Setting these migration rates to zero considerably affects the fit to the observed SFS, suggesting that including migration rates with bonobo is required to better fit the observed SFS (Fig. S53). These results are in agreement with the ARGweaver results, suggesting that the major period of admixture involved gene flow with the ancestors of central and eastern chimpanzees, but a more recent contact between central chimpanzees and bonobos cannot be fully discarded.

5.3 Estimates for models with Nigeria-Cameroon chimpanzees. We further tested whether the evidence for bonobo gene flow was robust to using Nigeria-Cameroon instead of western as an outgroup to central and eastern chimpanzees. For models with Nigeria-Cameroon the higher likelihood is again found for the model with bonobo gene flow (Table S17, Fig. S52), which is 8,817 log10 likelihood units higher than models without gene flow, and fits the observed SFS reasonably well (see details below in Suppl. Mat. 5.4). The parameter estimates are shown in Tables S18 and S19. The confidence intervals obtained under the best model are relatively wide, but in this case we find higher point estimates for gene flow with bonobo for the most recent period, with 2Nm values of 0.24 (95% CI 0.07-0.35) for migration from bonobo into central chimpanzees, and 0.14 (95% CI 0.07-0.29) in the other direction (Table S19). Still, these migration events are relatively old, as when using Nigeria-Cameroon we infer migration with bonobos to have stopped 55-96 kya.

We note that there are some differences among the parameter values across the datasets using western or Nigeria-Cameroon chimpanzees (Tables S15 and S18). These can be explained by differences in migration patterns and by the fact that different parameter combinations can attain similar likelihoods. For instance, the amount of drift experienced by a large population that went through a bottleneck could be explained by a combination of parameters suggesting a large ancestral size and a very small size during the bottleneck period (strong bottleneck), or by a long-term small ancestral size without a reduction in size during the bottleneck period. These scenarios can thus lead to similar likelihoods, and distinguishing them depends on the intensity of the bottleneck and the amount of data available. We find a similar case when comparing the inferred effective sizes of the ancestral population (Nanc) and the bottleneck intensity associated with the split of common chimps (NBotlSplitCChimp), which are inferred to differ between the datasets with western and Nigeria-Cameroon chimpanzees. The same amount of drift can also be explained by older split times with larger effective sizes, or more recent split times and smaller effective sizes. Again, this seems to be the case here, as for the models with western chimpanzees we infer slightly larger effective sizes and older split times than for models with Nigeria-Cameroon.

Obtaining a clear picture of the complex demography of bonobos and common chimpanzees would require models with the five populations, which is outside of the scope of this study. Nevertheless, taken together, our results suggest a scenario where bonobo would have split from the ancestral chimpanzee population 1.47-2.11 Mya, followed by limited and continuous gene flow between bonobo and the ancestors of all common chimpanzees. This was followed by the split of western/Nigeria-Cameroon 398-644 kya, and finally the split between central and eastern chimpanzees 85-182 kya. We found that western and Nigeria-Cameroon chimpanzees exhibit signatures of stronger drift (lower effective sizes, and signals of bottlenecks associated with population split events), suggesting that these populations likely went through periods of small effective sizes, such as expected after founder events. Our estimates also suggest a moderate bottleneck associated with the split of eastern chimpanzees. In contrast, we consistently found less evidence for a bottleneck associated with the split event in central chimpanzees, and central chimpanzees are also estimated to have the highest effective population sizes (NCent 95% CI 35,664-50,515, and 26,420-40,800 for models with western and Nigeria-Cameroon chimpanzees, respectively), suggesting that this was the population that maintained more diversity and experienced less drift. Our estimates suggest that western chimpanzees became isolated from central and eastern chimpanzees ~112 kya (95% CI 50-123), with very limited migration between central and western chimpanzees (upper 95% CI 2Nm<0.21). In contrast, we infer that Nigeria-Cameroon chimpanzees likely exchanged migrants with central and eastern chimpanzees until very recently, approximately 15 kya (95% CI 4-22), with higher migration rates (2Nm 95% CI 0.09-2.35).

The main aim of our modelling approach was to test for gene flow between bonobo and common chimpanzees. In both datasets, we found that models with gene flow between bonobo and central chimpanzees greatly improve the fit to the actual data, reaching much higher likelihoods. Even though the levels of gene flow with bonobos are very low, this suggests that there was historical contact between the two species. Furthermore, we infer that gene flow occurred in both directions between central chimpanzees and bonobo (2NmBC 0.08-0.35, 0.00-0.17, and 2NmCB 0.07-0.32, 0.00-0.27 for models with Nigeria-Cameroon and western, respectively). All these migration patterns are consistent with the present-day geographic distribution of the species and subspecies, and with the other lines of evidence presented above (Suppl. Mat. 4). Finally, we infer gene flow between bonobos and the ancestors of chimpanzees, indicating that the speciation process might have occurred in the face of gene flow.

5.4 Assessing the fit to the observed SFS. To visualise how well our model could reproduce the observed data, we compared the marginal distribution of the observed and expected minor allele SFS (Fig. S55 and Fig. S56). Overall, we have a very good fit between the expected and the observed marginal SFS, suggesting that our model and the corresponding parameter estimates capture relevant aspects of the data. We also looked in more detail at the joint (4 dimensional, 4D) SFS to find the entries that could not be well explained by our model. Overall, even the entries with the worst fit are relatively well predicted (Fig. S57 and Fig. S58). The singletons and doubletons (e.g. entries (0,0,0,1), (0,1,0,0), (2,0,0,0)) were in general the entries showing the worst relative fit. Furthermore, even though these were rare entries, the shared singletons between bonobo and eastern, central and western/Nigeria-Cameroon chimpanzees (entries (1,1,0,0), (1,0,1,0), (1,0,0,1)) were difficult to fit, with more than 60% fewer SNPs predicted under the best models for western and Nigeria-Cameroon chimpanzees than observed (Fig. S57). Such a high proportion of shared rare variants can be seen as an indication of gene flow between the two subspecies.
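The two comparisons described in this section, marginal spectra and entry-by-entry fit of the joint SFS, can be sketched as below. The joint SFS is held in a dictionary keyed by count tuples, the relative-fit measure (expected minus observed, divided by observed) is an assumption of this sketch rather than the exact statistic plotted in the figures, and all counts are hypothetical.

```python
from collections import defaultdict

def marginal_sfs(joint, pop_index):
    """Sum a joint SFS (dict: count tuple -> value) down to one population."""
    marg = defaultdict(float)
    for entry, value in joint.items():
        marg[entry[pop_index]] += value
    return dict(marg)

def relative_fit(observed, expected):
    """(expected - observed) / observed for every entry with observed SNPs."""
    return {e: (expected.get(e, 0.0) - obs) / obs
            for e, obs in observed.items() if obs > 0}

def worst_entries(observed, expected, k=3):
    """Entries with the largest absolute relative deviation, worst first."""
    rel = relative_fit(observed, expected)
    return sorted(rel, key=lambda e: abs(rel[e]), reverse=True)[:k]

# Hypothetical counts for three entries of a 4D SFS.
observed = {(1, 0, 0, 0): 1000, (1, 1, 0, 0): 50, (0, 1, 0, 0): 2000}
expected = {(1, 0, 0, 0): 900, (1, 1, 0, 0): 20, (0, 1, 0, 0): 2100}
rel = relative_fit(observed, expected)
worst = worst_entries(observed, expected, k=1)
marg0 = marginal_sfs(observed, 0)
```

In this toy case the shared singleton entry (1,1,0,0) is under-predicted by 60% and is ranked as the worst-fitting entry, which mirrors the kind of deviation discussed above.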


Figures

Fig. S1

Distribution range of chimpanzee (Pan troglodytes) subspecies across equatorial Africa. Dots represent sampling locations (S1) coloured by subspecies. Blue: P. t. verus. Red: P. t. ellioti. Green: P. t. troglodytes. Orange: P. t. schweinfurthii. Pink: Pan paniscus. Because of the sample origin and the uncertainties related to using confiscated chimpanzees, we have used regional clusters representing the region of the samples with the same origin label (arrows in the map); e.g. all Democratic Republic of Congo (DRC)-South individuals were grouped to a single coordinate placed within the eastern range in southern DRC (Table S1).


Fig. S2

Kinship among samples. The pair-wise kinship coefficient is built on the basis of the proportion of identity-by-state 0 (IBS0) SNPs. Coloured points correspond to different pairwise comparisons between chimpanzees of the same subspecies. None of the individual pairs shows a kinship coefficient higher than 0 threshold for 3rd degree relationships.

[Plot: kinship coefficient (y-axis, -0.5 to 0.0) against proportion of IBS0 SNPs (x-axis, 0.010 to 0.035). Legend (subspecies): Pan_troglodytes_ellioti, Pan_troglodytes_schweinfurthii, Pan_troglodytes_troglodytes, Pan_troglodytes_verus.]


Fig. S3

Time to most recent common ancestor listed in thousand years ago (KYA), with 95% HPD interval given adjacent to nodes for Pan root and bonobo internal.

[Tree: tip labels are Basho, SeppToni, Washu, Koby, Alfred, Yogui, Vaillant, Gamin, Koto, Bosco, Mike, Desmond, Vincent, Tongo, Bono, Padda, Donal, Athanga, Damian, Akwaya_Jean, Brigitte, Clint, Bwambale. Clade labels: Central, Eastern, Nigeria-Cameroon, Western (Chimpanzees); Bonobos. Scale bar: 400 mutations. Node ages shown: 1,735 (1,535 - 1,963) and 396 (350 - 449).]


Fig. S4

Averaged heterozygosity (per bp) along the genomes of each individual from the four subspecies of chimpanzees and bonobos. Donald, the western-central hybrid, was excluded from the analysis.

[Plot: heterozygosity per individual (x-axis, 0.00050 to 0.00150), one row per individual sample. Legend (population): Bonobo, Nigeria-Cameroon, Eastern, Central, Western.]


Fig. S5

Haplotype sharing among chimpanzee subspecies. Central chimpanzees carry the highest amount of diversity only found within their lineage, followed by eastern and Nigeria-Cameroon, which show similar values, and then western chimpanzees. Colours represent chimpanzee subpecies. Green; central chimpanzee. Orange; eastern chimpanzee. Red; Nigeria-Cameroon chimpanzee. Blue; western chimpanzee.

[Plot: number of private haplotypes (y-axis, 0 to 15) against window length in Kb (x-axis, 25 to 75). Legend: Central, Eastern, NC, Western.]


Fig. S6

Decay of linkage disequilibrium (LD) in the Pan populations. Note that X-axis is in log10 scale. Colours represent: Purple; bonobo. Green; central chimpanzee. Orange; eastern chimpanzee. Red; Nigeria-Cameroon chimpanzee. Blue; western chimpanzee.

[Plot: mean LD (r2) (y-axis, 0.2 to 0.6) against inter-SNP distance in Kb (x-axis, 1 to 150). Legend: Bonobo, Central, Eastern, NC, Western.]


Fig. S7

Inferred origin of expansion. Panels show the inferred origin based on all data (A), only western chimpanzees (B) and without western chimpanzees (C), respectively. Yellow colors indicate closer fit to origin location, with the 𝑋 marking the most likely origin. Below, the p-value for equilibrium isolation-by-distance and d1, the distance over which 1% of diversity is lost, are given along with the r2-value for the best-fitting origin. Circle colors denote heterozygosity from low (dark) to high (white).


Fig. S8

PCA plot for all chimpanzee SNP data (22,081,627 SNPs). The four taxonomically recognized chimpanzee subspecies are clearly separated. Colours represent: Green; central chimpanzee. Orange; eastern chimpanzee. Red; Nigeria-Cameroon chimpanzee. Blue; western chimpanzee.

[Plots: PC1 (0.21) against PC2 (0.09), and PC2 (0.09) against PC3 (0.06). Legend (subspecies): Western, NC, Central, Eastern.]


Fig. S9

PCA plots for all chimpanzee SNP data within each subspecies. Samples without known origin are coloured grey (Table S1).


Fig. S10

Population clustering (sNMF), Procrustes-transformed PCA and maps with regional cluster coordinates for eastern and central chimpanzee samples. Low-coverage samples are represented by squares, while fecal samples are shown as triangles. Samples with subspecies origin only are coloured grey (Table S1).


Fig. S11

 

sNMF individual ancestry in a range of K-values (2-10). Sampled regions (top label) and individual samples (bottom label) reflect region-specific genetic structure at different levels of clustering.

[Plot: sNMF ancestry bar plots for K2 to K10. Region labels (top): DRC-North, Uganda, Rwanda, DRC-South, Tanzania, Gabon-East, Gabon-West, Equatorial_Guinea, Nigeria-Cameroon, West-Africa. Bottom labels are individual sample names.]


Fig. S12

 

Cross-entropy of the best sNMF runs. K=4 shows the lowest cross-entropy value.

[Plot: cross-entropy (y-axis, 0.6 to 0.9) against K (x-axis, 2 to 14).]


Fig. S13

Population genetic structure inferred with fineSTRUCTURE at the most likely level of clustering (K=32). Colour legend (to the right) shows geographical region of the sampled individuals (bottom label).

[Plot: fineSTRUCTURE clustering; bottom labels are individual sample IDs. Legend (origins): DRC-N, Uganda, Rwanda, DRC-S, Tanzania, Equatorial-Guinea, Gabon-W, Gabon-E, Donald, West-Africa, Ivory-Coast, Liberia, Guinea, Nigeria-Cameroon, Julie-Tobi-Banyo.]


Fig. S14

Shared ancestry across the Pan lineage. Proportions of shared ancestry between individuals of the chimpanzee subspecies and bonobos. Individual names are shown vertically below; subspecies origin is shown horizontally. Only K values for which convergence was reached across 100 iterations with random seeds are shown. Individuals are ordered from left to right according to a west-to-east geographical sample origin (north to south for eastern individuals).


Fig. S15

A neighbour-joining (NJ) tree based on individual pairwise divergence of autosomal loci indicates that each subspecies is monophyletic and that the topology is consistent with the known species tree (i.e. the pairs western:Nigeria-Cameroon and central:eastern each share a most recent common ancestor). Interestingly, W_Donald_CB is an outlier, agreeing with previous analysis of this hybrid individual. Origin: CB (captive born), D (Democratic Republic of the Congo: DRC), D-C (DRC central), D-N (DRC North), D-S (DRC South), EQ (Equatorial Guinea), G (Gabon), G-E (Gabon East), G-W (Gabon West), GU (Guinea), IC (Ivory Coast), L (Liberia), NA (Unknown), R (Rwanda), T-G (Tanzania-Gombe), U-W (Uganda West), WC (western Cameroon), Z (Zambia).


Fig. S16

An NJ tree based on individual FST, rooted using the known subspecies phylogeny. The longest branches are found for western individuals, indicating a high degree of genetic drift (most likely low Ne) in the western chimpanzee lineage. Again, W_Donald_CB is an outlier, agreeing with previous analysis of this hybrid individual. The naming at terminal nodes follows that given for Fig. S15.


Fig. S17

Combined PSMC on individual autosomal sequence data. Generation time is set to 25 years and mutation rate to 1.2e-8.
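The generation time and mutation rate in the caption are what turn PSMC's scaled output into years and effective population size. A minimal sketch of that conversion, assuming the usual PSMC parameterisation (time in units of 2N0 generations, population size relative to N0, and theta0 estimated per bin of `bin_size` bp). The function name, the 100 bp bin size and the example values are assumptions for illustration, not taken from the supplement:

```python
def scale_psmc(t_scaled, lambda_k, theta0, mu=1.2e-8, gen_time=25.0, bin_size=100):
    """Convert one PSMC time interval to (years, effective population size).

    t_scaled : interval boundary in units of 2*N0 generations
    lambda_k : relative population size in that interval
    theta0   : scaled mutation rate per bin reported by PSMC
    """
    # theta0 = 4 * N0 * mu * bin_size, so N0 follows directly.
    n0 = theta0 / (4.0 * mu * bin_size)
    # Scaled time -> generations -> years.
    years = 2.0 * n0 * t_scaled * gen_time
    # Relative size -> effective population size.
    ne = n0 * lambda_k
    return years, ne
```

With the hypothetical value theta0 = 0.048, N0 is 10,000, so an interval at t_scaled = 0.5 with lambda_k = 2.0 corresponds to about 250,000 years and an Ne of about 20,000.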


Fig. S18

Estimate of bonobo-chimpanzee divergence time using MSMC2 on male X chromosomes. Generation time is set to 25 years and mutation rate to 0.89e-8.


Fig. S19

Bonobo – central MSMC2 comparison on male X chromosomes. Blue curves portray results from analyses of real sequence data, while the red and green curves are from analyses of data simulated under the histories with Nigeria-Cameroon chimpanzees (“becn”) and western chimpanzees (“becw”), respectively. Generation time is set to 25 years and mutation rate to 0.89e-8.


Fig. S20

Bonobo – eastern MSMC2 comparison on male X chromosomes. Blue curves portray results from analyses of real sequence data, while the red and green curves are from analyses of data simulated under the histories with Nigeria-Cameroon chimpanzees (“becn”) and western chimpanzees (“becw”), respectively. Generation time is set to 25 years and mutation rate to 0.89e-8.


Fig. S21

Bonobo – Nigeria-Cameroon MSMC2 comparison on male X chromosomes. Blue curves portray results from analyses of real sequence data, while the red curve is from the analysis of data simulated under the history with Nigeria-Cameroon chimpanzees (“becn”). Generation time is set to 25 years and mutation rate to 0.89e-8.


Fig. S22

Bonobo – western MSMC2 comparison on male X chromosomes. Blue curves portray results from analyses of real sequence data, while the green curve is from the analysis of data simulated under the history with the western chimpanzees (“becw”). Generation time is set to 25 years and mutation rate to 0.89e-8.


Fig. S23

Western – Nigeria-Cameroon MSMC2 comparison on male X chromosomes. Blue curve illustrates gene flow between western chimpanzee and Nigeria-Cameroon ancestors. Generation time is set to 25 years and mutation rate to 0.89e-8.


Fig. S24

Migration events within the Pan lineage estimated by means of TreeMix. Maximum likelihood (ML) trees and corresponding model residuals for significant migration events. Pp: bonobo (n=10), Ptv: western (n=11), Pte: Nigerian-Cameroon (n=10), Ptt: central (n=18), Pts: eastern (n=19). A) ML tree with no migrations, explaining 99.87 % of the variance. B) Residuals of the model fit showing high positive standard error (SE) values between the Nigerian-Cameroon and eastern chimpanzee subspecies, indicating the most likely candidate for a migration event to improve the model. C) ML tree with a significant migration event from the eastern to the Nigerian-Cameroon subspecies with weight 0.41 ± 0.007 standard errors (P < 2.22 × 10^-308) and the model explaining 99.97 % of the variance in the data. D) Residuals of the model fit with one migration event. E) ML tree after adding a significant migration from bonobo to the central chimpanzee subspecies with weight 0.38 ± 0.05 standard errors (P < 2.89 × 10^-15) and a full model explaining 99.99 % of the variance in the data. F) Residuals for the model with two migrations.


Fig. S25

Explained variance and likelihood of the model including migrations. The explained variance (left) and likelihood (right) for each of 50 replicates of ML trees with the number of inferred migrations in the range 1-5. Both variables show an increasingly better fit of the model with one and two migration events than with no migration, but no further improvement with more migrations.


Fig. S26

Test of allele sharing differences between bonobos and all individuals of the eastern, central, and Nigerian-Cameroon subspecies. Donald was excluded from the western subspecies grouping to avoid possible bias from recent admixture in this individual. Error bars represent 2 standard errors.

[Plot: D−statistic (x-axis, −0.08 to 0.08) for each individual X (y-axis) in the configuration D(X,Western;Bonobo,Human), coloured by subspecies (Nigeria−Cameroon, Eastern, Central). Individuals: B025_Marlin, B024_Negrita, B023_Blanquita, B022_Tibe, B021_Yogui, A990_Noemie, A960_Clara, A959_Julie, A958_Doris, A957_Vaillant, 13656_Brigitte, 12420_Gamin, 12348_Luky, 12320_Lara, 12311_Ula, 11528_Alfred, 11352_Mirinda, 10964_Cindy, N019_Maya, N017_Bihati, N015_Cleo, N013_Tongo, B014_Coco, B013_Athanga, B012_Washu, B011_Frederike, B007_Cindy, B002_Padda, A912_Nakuu, A911_Kidongo, A910_Bwambale, 9729_Harriet, 100040_Andromeda, 100037_Vincent, N018_Trixie, B010_Ikuru, A996_Diana, Tobi, Taweh, Paquita, Koto, Kopongo, Julie, Damian, Basho, Banyo, Akwaya_Jean.]
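A D-statistic with error bars of this kind can be computed from counts of ABBA and BABA sites. The sketch below is illustrative only: it assumes the genome has already been split into blocks with per-block ABBA and BABA counts, and it uses an equally weighted delete-one-block jackknife for the standard error. The supplement text shown here does not state how its standard errors were obtained, so the jackknife, the function names and the sign convention (positive when ABBA exceeds BABA) are all assumptions.

```python
import math

def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA), summed over per-block counts."""
    total_abba, total_baba = sum(abba), sum(baba)
    return (total_abba - total_baba) / (total_abba + total_baba)

def jackknife_se(abba, baba):
    """Delete-one-block jackknife standard error of D (equal block weights)."""
    n = len(abba)
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_statistic(abba[:i] + abba[i + 1:], baba[:i] + baba[i + 1:])
        for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    var = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    return math.sqrt(var)
```

An interval of D ± 2 × `jackknife_se(...)` then corresponds to error bars of 2 standard errors.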


Fig. S27

All possible configurations of the test for admixture D(Chimpanzee1,Chimpanzee2,Bonobo1,Human). Each dot represents one comparison, with lines showing 1 standard error.

[Plot: D-statistic (x-axis, −0.06 to 0.06) for each pair of chimpanzee subspecies (y-axis: Eastern−Central, NC−Central, NC−Eastern, Western−Central, Western−Eastern, Western−NC) in the configuration D(X,Y,Bonobo,Human).]


Fig. S28

Frequency-stratified summary statistics (Dj). Dj statistics were calculated using all sites at a given allele frequency in bonobos. The human allele was used to infer the ancestral allele. Labels at the top and bottom of the panels show the populations compared and reflect the configuration that was used to compute the statistics (e.g. in the second subplot negative values reveal a higher share of alleles between central chimpanzees and bonobos than between Nigeria-Cameroon chimpanzees and bonobos). NC denotes Nigeria-Cameroon chimpanzees.
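One way to picture a frequency-stratified statistic of this kind is to group sites by the number of derived alleles observed in bonobos and compute an ABBA-BABA contrast within each group. The sketch below is a hedged illustration, not the supplement's implementation: it assumes each site is given as (bonobo derived allele count, derived frequency in the first chimpanzee population, derived frequency in the second), that the outgroup carries the ancestral allele, and that positive values mean the first population shares more derived alleles with bonobos. That sign convention, like the function name, is an assumption and may differ from the one used in the figure.

```python
from collections import defaultdict

def stratified_d(sites):
    """Return {bonobo_derived_count: D_j} from (j, p1, p2) site tuples.

    Within a stratum the bonobo frequency is constant, so it cancels and
    D_j = sum(p1 - p2) / sum(p1 + p2 - 2 * p1 * p2).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for j, p1, p2 in sites:
        num[j] += p1 - p2                    # BABA-like minus ABBA-like weight
        den[j] += p1 + p2 - 2.0 * p1 * p2    # total informative weight
    # Strata with no informative sites are left out.
    return {j: num[j] / den[j] for j in sorted(num) if den[j] > 0}
```

Plotting the returned values against j divided by the number of sampled bonobo chromosomes gives one curve per population pair.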


Fig. S29

Frequency-stratified summary statistics (Dj). Dj statistics were calculated using all sites at a given allele frequency in bonobos. The ancestral allele was retrieved from the 6-primate EPO alignment. Labels at the top and bottom of the panels show the populations compared and reflect the configuration that was used to compute the statistics. NC denotes Nigeria-Cameroon chimpanzees.

[Plot: six panels of Dj (y-axis, −0.2 to 0.2) against bonobo derived allele frequency (x-axis, 0.25 to 1). Panel labels: Central/Western, Central/NC, Central/Eastern, Eastern/NC, Eastern/Western, NC/Western.]


Fig. S30

Djx, frequency-stratified summary statistics. Statistics were calculated using all sites at a given allele frequency in bonobos and chimpanzees, using the human reference as ancestral allele. D statistics are coloured by intensity of the excess of shared alleles from bonobo with one chimpanzee population over the other chimpanzee population.

[Plot: six heat-map panels (Eastern/NigCam, Central/NigCam, Central/Eastern, Western/Eastern, Western/NigCam, Western/Central). x-axis: bonobo frequency (0.1 to 1); y-axis: chimpanzee frequency (bins <0.25, 0.25 to 0.75, 0.75 to 1); colour scale ticks between the two population names: 1.0, 0.5, 0.0, 0.5, 1.0.]


Fig. S31

Test of allele sharing asymmetries on the X chromosome between non-western chimpanzees and bonobos compared to western chimpanzees. Only female individuals were used. Error bars represent 2 standard errors.

[Plot: D−statistic (x-axis, −0.1 to 0.1) for each female individual X (y-axis) in the configuration D(X,Western;Bonobo,Human), coloured by subspecies (Central, Eastern, Nigeria-Cameroon). Individuals: 12311_Ula, 10964_Cindy, 11352_Mirinda, 12320_Lara, 12348_Luky, A958_Doris, A959_Julie, A960_Clara, A990_Noemie, B007_Cindy, B023_Blanquita, B024_Negrita, B025_Marlin, B014_Coco, N015_Cleo, N019_Maya, 100040_Andromeda, 9729_Harriet, A911_Kidongo, A912_Nakuu, A996_Diana, B010_Ikuru, B011_Frederike, N017_Bihati, N018_Trixie, Tobi, Banyo, Julie, Taweh, Kopongo, Paquita, B022_Tibe.]


Fig. S32

Comparison of the D-statistic performance depending on variant calling methodologies and depth of coverage. A) D-statistic based on genotypes obtained through the allele sampling-based variant calling in sequences from (18). B) D-statistic for different coverage and variant calling procedures in our data. Low coverage diminishes the power to detect admixture. C) D-statistic at different levels of coverage for the assisted variant discovery in low coverage genomes. Each dot represents an individual comparison of the form D(Central or Eastern, Western; Bonobo, hg19). Note that all low coverage genomes (from (18) and from the present study) were analysed jointly, so that most comparisons include individuals from different studies. Stable values around 0.04 are obtained only at a combined coverage >6 across the three Pan samples.

[Plot, three panels, all in the configuration D(X,Western;Bonobo,Human). A) D-statistic (x-axis, −0.04 to 0.04) per individual X (Agnagui, Bayokele, Becky, Botsomi, Cindy, FanTuek, Gao, Golfi, Katie, Kazakuhire, Kidogo, Marcelle, Nakuu, Sally); method: Allele sampling. B) D-statistic (x-axis, −0.04 to 0.04) per individual X (A910_Bwambale, A911_Kidongo, A958_Doris, Akwaya_Jean, B024_Negrita, Koto); methods: Allele sampling − low coverage, Freebayes − low coverage, Freebayes − high coverage. C) D-statistic (y-axis, −0.05 to 0.10) against combined coverage (x-axis, 3 to 9); legend: Number of SNPs (1e+06 to 3e+06).]


Fig. S33

Dj, frequency stratified summary statistics on simulated data. Statistics were calculated using all sites at a given frequency in the bonobo population. Top labels describe the populations compared, following the configuration D(First population, Second population; Bonobo, Ancestor). A) Simulated data for 0.5% gene flow from bonobos to central-eastern ancestor, 0.5% gene flow from the central-eastern chimpanzee ancestor to bonobos, 0.5% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.25% gene flow from bonobos to Nigeria-Cameroon (NC) chimpanzees. B) Simulated data for 0.5% gene flow from central-eastern chimpanzee ancestor to bonobos. C) Simulated data for 0.5% gene flow from bonobos to central-eastern ancestor and 0.25% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data with no gene flow between chimpanzees and bonobos.

[Plot: four rows (A to D) of six panels (Central−Eastern, Central−NC, Central−Western, Eastern−NC, Eastern−Western, NC−Western). x-axis: bonobo derived allele frequency (0.25 to 1.00); y-axis: D statistic (−0.2 to 0.2).]


Fig. S34

Djx, frequency stratified summary statistics on simulated data. Statistics were calculated using all sites at a given frequency in the bonobo and chimpanzee populations. Computations involving fewer than 100 occurrences in the simulated data were excluded from this analysis (gray cells in the plot). A) Simulated data for 0.5% gene flow from bonobos to central-eastern ancestor, 0.5% gene flow from the central-eastern chimpanzee ancestor to bonobos, 0.5% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.25% gene flow from bonobos to Nigeria-Cameroon (NC) chimpanzees. B) Simulated data for 0.5% gene flow from bonobos to central-eastern ancestor and 0.25% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 0.5% gene flow from central-eastern chimpanzee ancestor to bonobos. D) Simulated data with no gene flow between chimpanzees and bonobos.


Fig. S35

Dj, frequency stratified summary statistics on simulated data without gene flow from bonobos to chimpanzees. Statistics were calculated using all sites at a given frequency in the bonobo population. High genetic drift in the western subspecies was simulated by a large reduction of the effective population size 100 generations before the split of western and Nigeria-Cameroon (NC) chimpanzees (275 Kya).

[Plot: six panels (Central−Eastern, Central−NC, Central−Western, Eastern−NC, Eastern−Western, NC−Western). x-axis: Frequency (2.5 to 10.0); y-axis: Dj (−0.4 to 0.4).]


Fig. S36

Windows of 50Kb across the chimpanzee genomes binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency. Blue curves represent the western chimpanzees, red the Nigeria-Cameroon, green the western and orange the eastern. X-axis is the minimum divergence to bonobos in log10 scale, while the y-axis represents the divergence to the other chimpanzee population in each pairwise comparison. Both axes are constant in all plots, thus labels are grouped and shown only in the top and bottom of the panel. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Plot: six panels (Central/Western, Central/Eastern, Eastern/Western, NC/Western, Eastern/NC, Central/NC). x-axis: Minimum divergence to bonobos (log10), ticks −3.86 to −2.57; y-axis: Maximum divergence between chimpanzee subspecies (log10), ticks between −2.80 and −2.45 depending on the panel.]
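The binning and confidence intervals described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions: each window is a pair (minimum divergence to bonobos, divergence between the two chimpanzee populations), bins are of equal width on the log10 scale, the per-bin summary is the mean, and the 95% interval is a percentile bootstrap over windows within a bin. The function names, the bin layout and the choice of the mean are hypothetical. Only the 500 replicates and the 95% level come from the caption.

```python
import math
import random

def bin_windows(windows, n_bins=10):
    """Group (min_div_to_bonobo, chimp_div) windows into equal-width log10 bins."""
    logs = [math.log10(x) for x, _ in windows]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0  # guard against identical x values
    bins = [[] for _ in range(n_bins)]
    for lx, (_, y) in zip(logs, windows):
        idx = min(int((lx - lo) / width), n_bins - 1)  # top edge goes in last bin
        bins[idx].append(y)
    return bins

def bootstrap_ci(values, n_rep=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each replicate.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_rep))
    lower = means[int(alpha / 2 * n_rep)]
    upper = means[int((1 - alpha / 2) * n_rep) - 1]
    return lower, upper
```

Calling `bootstrap_ci` on each non-empty bin returned by `bin_windows` gives one interval per point on the curve.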


Fig. S37

Heterozygosity per bp binned by the minimum divergence to bonobos in windows of chimpanzee genomes. The y-axis shows the average number of heterozygous sites per bp across the windows of a bin. Confidence intervals (95%) are shown from 500 bootstrap replicates. Compare to Fig. S44, where non-western chimpanzees do not show an increase in heterozygosity in windows with low divergence to bonobos.

[Plot: heterozygosity per bp (y-axis, 0.0006 to 0.0015) against minimum divergence to bonobos (log10) (x-axis, −3.5 to −2.5).]


Fig. S38

All comparisons between central and western chimpanzees for the observed and simulated windows of 50kb. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Real data. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. E) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Plot: five panels (A to E), each labelled Central/Western. x-axis: Minimum divergence to bonobos (log10); y-axis: Maximum divergence between chimpanzee subspecies (log10).]


Fig. S39

All comparisons between eastern and western chimpanzees for observed and simulated windows of 50kb. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Real data. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. E) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Plot: five panels (A to E), each labelled Eastern/Western. x-axis: Minimum divergence to bonobos (log10); y-axis: Maximum divergence between chimpanzee subspecies (log10).]


Fig. S40

All comparisons between Nigeria-Cameroon and western chimpanzees for the observed and simulated windows of 50kb. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Real data. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. E) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Plot: five panels (A to E), each labelled NC/Western. x-axis: Minimum divergence to bonobos (log10); y-axis: Maximum divergence between chimpanzee subspecies (log10).]


Fig. S41

All comparisons between central and eastern chimpanzees for the observed and simulated windows of 50kb. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Real data. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. E) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Plot: five panels (A to E), each labelled Central/Eastern. x-axis: Minimum divergence to bonobos (log10); y-axis: Maximum divergence between chimpanzee subspecies (log10).]


Fig. S42

All comparisons between eastern and Nigeria-Cameroon chimpanzees for the observed and simulated windows of 50 kb. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Real data. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. E) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Figure panels A–E. x-axis: minimum divergence to bonobos (log10). y-axis: maximum divergence between chimpanzee subspecies (log10). Series: Eastern, NC (Nigeria-Cameroon).]
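
The binning and bootstrap steps described in the caption of Fig. S42 can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the equal-width bins, and the use of the per-bin mean as the summarised statistic are assumptions made for the example. Only the 500 replicates and the 95% interval are taken from the caption.

```python
import random
from statistics import mean


def assign_bin(x, lo, hi, n_bins):
    """Return the index of the equal-width bin in [lo, hi] that contains x."""
    if not lo <= x <= hi:
        raise ValueError("value outside binning range")
    width = (hi - lo) / n_bins
    # The upper edge is folded into the last bin.
    return min(int((x - lo) / width), n_bins - 1)


def bootstrap_ci(values, n_boot=500, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample the windows with replacement and record the mean each time.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_boot)]
    upper = means[int((1.0 - tail) * n_boot) - 1]
    return lower, upper


# Hypothetical log10 divergence values for the windows falling in one bin.
windows_in_bin = [-2.70, -2.65, -2.68, -2.61, -2.72, -2.66, -2.63, -2.69]
ci_low, ci_high = bootstrap_ci(windows_in_bin)
```

Resampling whole windows keeps the sketch simple; a real analysis might resample larger genomic blocks to account for linkage between neighbouring windows.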

Fig. S43

All comparisons between Nigeria-Cameroon and central chimpanzees for the observed and simulated windows of 50 kb. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Real data. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. D) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. E) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Figure panels A–E. x-axis: minimum divergence to bonobos (log10). y-axis: maximum divergence between chimpanzee subspecies (log10). Series: Central, NC (Nigeria-Cameroon).]

Fig. S44

Heterozygosity across all models. Windows are binned by their minimum divergence to bonobos using derived alleles at ≥90% frequency in bonobos. Both axes are constant in all the subplots in the panel (x-axis described in the label at the bottom, y-axis described in the label at the left). A) Simulated data for 1% gene flow from bonobos to central-eastern ancestor, 1% gene flow from the central-eastern chimpanzee ancestor to bonobos, 1% gene flow from bonobos to the central-eastern chimpanzee ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. B) Simulated data for 1% gene flow from bonobos to central-eastern ancestor and 0.5% gene flow from bonobos to Nigeria-Cameroon chimpanzees. C) Simulated data for 1% gene flow from central-eastern chimpanzee ancestor to bonobos. D) Simulated data with no gene flow between chimpanzees and bonobos. Confidence intervals (95%) are shown from 500 bootstrap replicates.

[Figure panels A–D. x-axis: minimum divergence to bonobos (log10). y-axis: heterozygosity.]

Fig. S45

Simulations with superarchaic introgression into western chimpanzees. A) All comparisons between chimpanzee subspecies for simulated data with superarchaic gene flow of 2% into the western subspecies. The archaic population was modelled outside the Pan clade, diverging from the ancestors of chimpanzees and bonobos 2.5 Mya. Confidence intervals (95%) are shown from 500 bootstrap replicates. B) Stratified D statistics. Statistics were calculated using all sites at a given frequency in the bonobo population.

[Figure panel A, six subplots (Central/Western, Central/NC, Central/Eastern, Eastern/NC, Eastern/Western, NC/Western). x-axis: minimum divergence to bonobos (log10). y-axis: divergence between chimpanzee subspecies (log10). Figure panel B, six subplots (Central−Eastern, Central−NC, Central−Western, Eastern−NC, Eastern−Western, NC−Western). x-axis: bonobo frequency. y-axis: D statistic.]

Fig. S46

Percentage and numbers of putatively introgressed segments in heterozygosity in different chimpanzee subspecies. A) Values when using the human reference genome (hg19) for mapping and as an outgroup. The light bar represents the overall percentage of introgressed segments in each subspecies (within the callable fraction of the genome), the dark bar inside represents the percentage of such segments that are unique to each subspecies. B) Values on simulated data with 1% gene flow from bonobo into the ancestor of central and eastern chimpanzee and 0.5% gene flow into Nigeria-Cameroon chimpanzee. C) Values on simulated data without gene flow from bonobo into chimpanzee.

Fig. S47

Cumulative length of discordant segments in Kb, as estimated by ARGweaver. The distribution of cumulative lengths of all discordant segments >10 Kb across pairwise comparisons is shown for all four chimpanzee subspecies.

Fig. S48

The age distribution of “bonobo” haplotypes in chimpanzee populations as estimated by ARGweaver when using the human reference genome (hg19) for mapping and as an outgroup. Regions that coalesce within the bonobo subtree before coalescing with the other chimpanzee population, longer than 25 Kbp, with divergence to the other chimpanzee of at least 32,000 generations are shown. Error bars represent 95% confidence across replicates.

Fig. S49

Schematic representation of the models tested to infer the divergence times, effective sizes and migration rates among bonobo and chimpanzee subspecies. a) Model without gene flow between bonobo and chimpanzees. b) Model with continuous gene flow between bonobo and central chimpanzees, which stops at some point in the past, and between bonobo and the ancestral populations of chimpanzees. In all models we considered a fixed population tree topology, inferring the population split times and effective sizes. Population split events were assumed to be potentially associated with founder events, here represented by bottlenecks. We also considered that periods with different populations could have different migration rates, and hence each arrow corresponds to a migration rate parameter, for a total of eight migration rates for model (a), and 14 for model (b). Furthermore, in all models we assumed that bonobo and all chimpanzee subspecies experienced recent population declines. Therefore, these models explicitly account for the effects of differential drift across populations, population divergence and gene flow.

Fig. S50

Comparison of the log10 likelihood for the models with bonobo, eastern, central and western chimpanzees. Log-likelihood values were computed based only on the polymorphic sites (L_SFS in equation 2, Likelihood inference of demographic models based on the Site Frequency Spectrum). Distributions were obtained from 100 expected SFS, each approximated with 5 × 10^6 coalescent simulations with the parameters that maximized the likelihood under each model. These distributions reflect the noise resulting from approximating the likelihood with coalescent simulations.

Fig. S51

Schematic representation of the more complex model with the corresponding parameter tags, showing the time estimates for the model with western, and the migration rate estimates (forward in time) across the different time periods. The 95% confidence intervals are shown. The times of events are not to scale in the population tree.

Fig. S52

Comparison of the log10 likelihood for the models with bonobo, eastern, central and Nigeria-Cameroon chimpanzees. Log-likelihood values were computed based only on the polymorphic sites (L_SFS in equation 2, Likelihood inference of demographic models based on the Site Frequency Spectrum). Distributions were obtained from 100 expected SFS, each approximated with 10^6 coalescent simulations with the parameters that maximized the likelihood under each model. These distributions reflect the noise resulting from approximating the likelihood with coalescent simulations.

Fig. S53

Influence of migration rates on the SFS-based composite likelihood for models with western chimpanzees. We considered a series of nested models in which a given pair of migration rates was set to zero, assuming that the larger the difference in likelihood between the nested and the full model, the more likely that pair of migration rates was different from zero. We considered the full model with the parameters that maximized the likelihood and a series of nested models where migration rates were set to zero between bonobos and: (i) central chimpanzees (no mig B-C); (ii) ancestors of eastern-central chimpanzees (no mig B-aEC); (iii) ancestors of common chimpanzees (no mig B-cChimps). For each nested model we fixed a given pair of migration rates to zero, keeping all the other parameters as in the full model. The boxplots show the distribution of the difference between the likelihoods of the nested models and the mean likelihood of the full model, based on 100 expected SFS computed with 10^6 coalescent simulations. Nested models with likelihood distributions similar to the full model indicate that the corresponding migration rates are likely not different from zero. The nested models always have lower likelihoods than the full model, suggesting that all pairs of migration rates are different from zero. The pair of migration rates with the highest impact on the likelihood corresponds to gene flow between bonobos and the ancestors of central and eastern chimpanzees: setting them to zero leads to a very poor fit, suggesting that these migration rates are required for a good fit to the observed SFS.
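
The quantity plotted in the boxplots of Fig. S53 can be sketched in a few lines. This is an illustrative sketch only: the function name and the toy log10 likelihood values are hypothetical, and in practice each list would hold one value per expected SFS replicate.

```python
from statistics import mean


def nested_minus_full(nested_logliks, full_logliks):
    """Difference between each nested-model log10 likelihood and the
    mean log10 likelihood of the full model."""
    full_mean = mean(full_logliks)
    return [ll - full_mean for ll in nested_logliks]


# Toy values: more negative differences mean a worse fit once the
# corresponding pair of migration rates is fixed to zero.
full = [-100.0, -102.0]
no_mig = [-105.0, -110.0]
diffs = nested_minus_full(no_mig, full)
```

Differences close to zero would indicate that removing the migration pair costs little likelihood; strongly negative differences indicate that the pair is needed to fit the observed SFS.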

Fig. S54

Schematic representation of the more complex model with the corresponding parameter tags, showing the time estimates for the model with Nigeria-Cameroon, and the migration rate estimates (forward in time) across the different periods of time. The 95% confidence intervals are shown within square brackets. The times of events are not to scale in the population tree.

Fig. S55

Comparison of the marginal observed and marginal expected 1D SFS for the best model with direct gene flow between bonobo and western chimpanzees. Each plot corresponds to the marginal minor allele SFS of a given population. The x-axis shows the minor allele frequencies (allele counts) and the y-axis shows the number of SNPs with a given frequency (in log10 scale). The expected SFS was obtained as the average of 100 simulated SFSs (each approximated with 10^6 coalescent simulations), according to the maximum-likelihood parameter estimates.

Fig. S56

Comparison of the marginal observed and marginal expected SFS for the best model with direct gene flow between bonobo and Nigeria-Cameroon chimpanzees. Each plot corresponds to the marginal SFS of a given population. The x-axis shows the minor allele frequencies (allele counts) and the y-axis shows the number of SNPs with a given frequency (in log10 scale). The expected SFS was obtained as the average of 100 simulated SFSs (each approximated with 10^6 coalescent simulations), according to the maximum-likelihood parameter estimates.

Fig. S57

Comparison of the multidimensional joint observed and expected SFS for the 30 entries showing the worst fit (out of the 14,641 entries) in the multidimensional joint SFS with bonobo, eastern, central and western chimpanzees. Each column corresponds to one entry of the SFS, coded as b,e,c,w (from bottom to top) as the frequency of the minor allele in bonobo (b), eastern (e), central (c), and western (w). a) Comparison of the expected and observed counts (y-axis) for each entry. b) Comparison of the relative fit, defined as the relative number of SNP counts for a given entry (Relative fit = #expSNPs / #obsSNPs, y-axis). Expected SFS were obtained as the average of 100 simulated SFSs (approximated with 10^6 coalescent simulations), according to the parameter estimates obtained under the best model. Error bars correspond to the 0.01 and 0.99 quantiles of the 100 simulated SFSs. We defined an arbitrary threshold to identify the entries with the worst fit, by ranking entries in terms of the difference between the expected and observed SFS log10 likelihoods (i.e. $|m_i \log_{10}(p_i) - m_i \log_{10}(m_i/L)| > 1000$, where $m_i$ is the observed count at the i-th entry, $p_i$ is the expected SFS at the i-th entry and $L$ is the total number of polymorphic sites).
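
The ranking criterion and the relative fit from the caption of Fig. S57 can be written out as a short sketch. It assumes that the expected SFS is supplied as per-entry probabilities p_i, so that the expected SNP count of an entry is p_i times L; the function names and the toy numbers are hypothetical.

```python
from math import log10


def loglik_gap(m_i, p_i, total_sites):
    """|m_i*log10(p_i) - m_i*log10(m_i/L)| for one SFS entry.

    m_i: observed SNP count in the entry.
    p_i: expected probability of the entry (assumed input format).
    total_sites: L, the total number of polymorphic sites.
    """
    if m_i == 0:
        return 0.0  # an empty entry contributes nothing to either term
    return abs(m_i * log10(p_i) - m_i * log10(m_i / total_sites))


def relative_fit(m_i, p_i, total_sites):
    """Expected over observed SNP count (#expSNPs / #obsSNPs)."""
    return (p_i * total_sites) / m_i


def worst_entries(observed, expected, total_sites, threshold=1000.0):
    """Indices of entries whose gap exceeds the threshold, worst first."""
    gaps = [(loglik_gap(m, p, total_sites), i)
            for i, (m, p) in enumerate(zip(observed, expected))]
    return [i for gap, i in sorted(gaps, reverse=True) if gap > threshold]


# Toy example with two entries and L = 1,000,000 polymorphic sites.
flagged = worst_entries([1000, 5000], [0.001, 0.05], 1_000_000)
```

In the toy example the first entry is fitted exactly (gap of zero), while the second is over-predicted tenfold and is the only one flagged.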

Fig. S58

Comparison of the multidimensional joint observed and expected SFS for the 30 entries showing the worst fit (out of the 14,641 entries) in the multidimensional joint SFS with bonobo, eastern, central and Nigeria-Cameroon chimpanzees. Each column corresponds to one entry of the SFS, coded as b,e,c,n (from bottom to top) as the frequency of the minor allele in bonobo (b), eastern (e), central (c), and Nigeria-Cameroon (n). a) Comparison of the expected and observed counts (y-axis) for each entry. b) Comparison of the relative fit, defined as the relative number of SNP counts for a given entry (Relative fit = #expSNPs / #obsSNPs, y-axis). Expected SFS were obtained as the average of 100 simulated SFSs (approximated with 10^6 coalescent simulations), according to the parameter estimates obtained under the best model. Error bars correspond to the 0.01 and 0.99 quantiles of the 100 simulated SFSs.

Tables

Table S1 Samples sequenced in this study (number 2 in the column “Study”) and chimpanzees from previous publications (number 1). Subspecies are abbreviated as: Pte – Pan troglodytes ellioti (Nigeria-Cameroon chimpanzee), Ptt – Pan troglodytes troglodytes (central chimpanzee), Pts – Pan troglodytes schweinfurthii (eastern chimpanzee) and Ptv – Pan troglodytes verus (western chimpanzee). Individuals from neighbouring sample locations are grouped in regional clusters. Samples marked with (*) were sequenced a posteriori to test our findings in the population structure section (Suppl. Mat. 2.6). Further details about the samples can be found in the Data file S1.

| Subspecies | Study | Sex | Name | Location | Regional cluster |
|---|---|---|---|---|---|
| Ptv | 2 | F | Berta | Ivory Coast | Ivory Coast |
| Ptv | 2 | F | Annie | Guinea | Guinea |
| Ptv | 2 | M | Mike | Guinea | Guinea |
| Ptv | 2 | M | SeppToni | Liberia | Liberia |
| Ptv | 2 | F | Linda | Liberia | Liberia |
| Ptv | 2 | F | Cindy | Ivory Coast | Ivory Coast |
| Ptv | 2 | F | Alice | Ivory Coast | Ivory Coast |
| Ptv | 1 | M | Bosco | NA | - |
| Ptv | 1 | F | Jimmie | NA | - |
| Ptv | 1 | M | Koby | NA | - |
| Ptv | 1 | M | Clint | NA | - |
| Ptv | 1 | M | Donald | NA | - |
| Ptt | 2 | F | Marlin | NA | - |
| Ptt | 2 | F | Negrita | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | F | Blanquita | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | F | Tibe | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | M | Yogui | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | F | Noemie | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | F | Cindy | NA | - |
| Ptt | 2 | F | Mirinda | NA | - |
| Ptt | 2 | F | Ula | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | F | Lara | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | F | Luky | Equatorial Guinea | Equatorial Guinea |
| Ptt | 2 | M | Gamin | NA | - |
| Ptt | 2 | M | Brigitta | NA | - |
| Ptt* | 2 | F | Makokou | Ogooué Ivindo | Gabon-East |
| Ptt* | 2 | F | Judy | Ogooué Maritime | Gabon-West |
| Ptt | 2 | M | Alfred | NA | - |
| Ptt | 1 | M | Vaillant | Haut-Ogooue | Gabon-East |
| Ptt | 1 | F | Doris | Rabi | Gabon-West |
| Ptt | 1 | F | Julie | Haut-Ogooue | Gabon-East |
| Ptt | 1 | F | Clara | Gabon | - |
| Pts | 2 | F | Diana | DRC-South | DRC-South |
| Pts | 2 | M | Padda | Walikale | DRC-East/Rwanda |
| Pts | 2 | F | Cindy | Masindi | Uganda |
| Pts | 2 | F | Ikuru | Arua | DRC-North |
| Pts | 2 | M | Tongo | Tongo | DRC-East/Rwanda |
| Pts | 2 | F | Cleo | DRC | - |
| Pts | 2 | F | Bihati | Goma | - |
| Pts | 2 | F | Trixie | DRC-Rwanda | DRC-East/Rwanda |
| Pts | 2 | F | Frederike | Rwanda | DRC-East/Rwanda |
| Pts | 2 | F | Maya | Mbuji Mayi | DRC-South |
| Pts | 2 | F | Coco | Zambia | - |
| Pts | 2 | M | Athanga | Beni, North-Kivu | DRC-North |
| Pts | 2 | M | Washu | Lubumbashi | DRC-South |
| Pts* | 2 | M | Mgbadolite | Gbadolite | DRC-North |
| Pts* | 2 | M | Amahiriwe | DRC (North) | DRC-North |
| Pts* | 2 | M | Roxy | DRC | DRC-South |
| Pts* | 2 | F | Roy | DRC (North) | DRC-North |
| Pts | 1 | F | Andromeda | Gombe | Tanzania |
| Pts | 1 | M | Vincent | Gombe | Tanzania |
| Pts | 1 | F | Kidongo | DRC | - |
| Pts | 1 | F | Nakuu | DRC | - |
| Pts | 1 | F | Harriet | Uganda (West) | Uganda |
| Pts | 1 | M | Bwambale | Busumba | Uganda |
| Pte | 1 | M | Akwaya_Jean | Western Cameroon | Western Cameroon |
| Pte | 1 | F | Banyo | Western Cameroon | Western Cameroon |
| Pte | 1 | M | Basho | Western Cameroon | Western Cameroon |
| Pte | 1 | M | Damian | Western Cameroon | Western Cameroon |
| Pte | 1 | F | Julie | Western Cameroon | Western Cameroon |
| Pte | 1 | F | Kopongo | Western Cameroon | Western Cameroon |
| Pte | 1 | M | Koto | Western Cameroon | Western Cameroon |
| Pte | 1 | F | Paquita | Western Cameroon | Western Cameroon |
| Pte | 1 | F | Taweh | Western Cameroon | Western Cameroon |
| Pte | 1 | F | Tobi | Western Cameroon | Western Cameroon |

Table S2 Summary statistics for autosomal SNP sites. All sites in the all-sample filtered VCF file (sites which segregate across all samples) are included here. a) SNP sites at which all samples in the indicated population match the reference genome. b) Variants fixed within the indicated population or group (column heading) which differ from the reference genome. c) Sites segregating within the indicated population. d) Mean number of heterozygous positions per sample divided by the callable genome length. e) Number of segregating sites divided by the callable genome length. f) Watterson’s theta estimator (S divided by the harmonic number of samples). g) Effective population size, estimated as Ne = Θw / 4μ, where μ = 1.2 × 10^-8 per bp per generation.

| Population | Nigeria-Cameroon | Central | Western | Eastern | All |
|---|---|---|---|---|---|
| Nº Samples | 10 | 18 | 12 | 19 | 59 |
| Coverage | 17.25 | 23.08 | 26.5 | 30.86 | 24.42 |
| Fixed sites equal to panTro4 a) | 14,755,481 | 7,805,779 | 17,624,496 | 11,826,066 | 0 |
| Fixed sites alternative to panTro4 b) | 599,513 | 613,160 | 808 | 731,641 | 0 |
| Polymorphic sites | 6,725,844 | 13,662,304 | 4,455,322 | 9,523,403 | 22,081,627 |
| Mean heterozygosity d) | 2,358,527 | 3,677,246 | 1,566,364 | 3,189,656 | 2,697,949 |
| Het per bp | 0.00096 | 0.00143 | 0.00060 | 0.00124 | 0.00106 |
| S e) | 0.00391 | 0.00794 | 0.00259 | 0.00553 | 0.01283 |
| Θw f) | 0.00133 | 0.00227 | 0.00083 | 0.00156 | 0.00355 |
| Ne g) | 27,795 | 47,314 | 17,378 | 32,492 | 73,942 |
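
The definitions in footnotes f) and g) of Table S2 can be checked with a short sketch. It follows the caption literally, dividing S by the harmonic number of the number of samples; note that the textbook Watterson estimator instead sums 1/i for i from 1 to n−1 over n sampled chromosomes. The function names are hypothetical, and the mutation rate is the per-bp, per-generation value quoted in the caption.

```python
def harmonic_number(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def watterson_theta(seg_sites_per_bp, n_samples):
    """Theta_w as described in the caption: S divided by H_n,
    with n the number of samples."""
    return seg_sites_per_bp / harmonic_number(n_samples)


def effective_size(theta_w, mu=1.2e-8):
    """Ne = theta_w / (4 * mu), with mu per bp per generation."""
    return theta_w / (4.0 * mu)


# Nigeria-Cameroon column: S = 0.00391 per bp, 10 samples.
theta_nc = watterson_theta(0.00391, 10)
ne_nc = effective_size(theta_nc)
```

For the Nigeria-Cameroon column this gives a theta of about 0.0013 and an Ne of about 27,800, close to the tabulated 0.00133 and 27,795; small differences are expected because the inputs used here are the rounded values from the table.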

Table S3 Summary of male-specific regions of the Y chromosome (MSY) genetic diversity (π - nucleotide diversity; SD - standard deviation).

| Species/subspecies | Nº Samples | Polymorphic sites | Total bp | π (× 10^-3) | SD (× 10^-3) |
|---|---|---|---|---|---|
| Pan troglodytes dataset | | | | | |
| Central | 5 | 8,001 | 5,849,678 | 0.790 | 0.16680 |
| Eastern | 6 | 3,356 | 5,849,678 | 0.214 | 0.01810 |
| Nigeria-Cameroon | 4 | 2,957 | 5,849,678 | 0.254 | 0.02130 |
| Western | 6 | 450 | 5,849,678 | 0.028 | 0.00030 |
| Total | 21 | 22,955 | 5,849,678 | | |
| All-Pan dataset | | | | | |
| Bonobo | 2 | 5,810 | 5,720,892 | 1.016 | 0.12700 |
| Central | 5 | 7,761 | 5,720,892 | 0.784 | 0.07930 |
| Eastern | 6 | 3,248 | 5,720,892 | 0.211 | 0.00850 |
| Nigeria-Cameroon | 4 | 2,886 | 5,720,892 | 0.253 | 0.01030 |
| Western | 6 | 429 | 5,720,892 | 0.027 | 0.00015 |
| Total | 23 | 46,475 | 5,720,892 | | |

Table S4 Individual heterozygosity per Kbp. Genetic diversity for each sample was estimated dividing the genome-wide total count of heterozygous genotypes by the number of high confidence bases in the genome (1,721,192,217). Low coverage genome sequences produced to validate our observed distribution of geographical diversity were excluded from this analysis.

| Sample | Heterozygous count | Heterozygosity |
|---|---|---|
| Pan_troglodytes_schweinfurthii-A996_Diana | 2,172,830 | 1.26 |
| Pan_troglodytes_schweinfurthii-B010_Ikuru | 2,096,197 | 1.22 |
| Pan_troglodytes_schweinfurthii-N018_Trixie | 2,296,043 | 1.33 |
| Pan_troglodytes_ellioti-Akwaya_Jean | 1,694,268 | 0.98 |
| Pan_troglodytes_ellioti-Banyo | 1,300,439 | 0.76 |
| Pan_troglodytes_ellioti-Basho | 1,593,331 | 0.93 |
| Pan_troglodytes_ellioti-Damian | 1,676,825 | 0.97 |
| Pan_troglodytes_ellioti-Julie | 1,961,715 | 1.14 |
| Pan_troglodytes_ellioti-Kopongo | 1,512,834 | 0.88 |
| Pan_troglodytes_ellioti-Koto | 1,754,622 | 1.02 |
| Pan_troglodytes_ellioti-Paquita | 1,578,966 | 0.92 |
| Pan_troglodytes_ellioti-Taweh | 1,759,205 | 1.02 |
| Pan_troglodytes_ellioti-Tobi | 1,686,695 | 0.98 |
| Pan_troglodytes_schweinfurthii-100037_Vincent | 1,968,088 | 1.14 |
| Pan_troglodytes_schweinfurthii-100040_Andromeda | 1,903,480 | 1.11 |
| Pan_troglodytes_schweinfurthii-9729_Harriet | 1,877,404 | 1.09 |
| Pan_troglodytes_schweinfurthii-A910_Bwambale | 2,220,045 | 1.29 |
| Pan_troglodytes_schweinfurthii-A911_Kidongo | 2,122,474 | 1.23 |
| Pan_troglodytes_schweinfurthii-A912_Nakuu | 2,065,163 | 1.20 |
| Pan_troglodytes_schweinfurthii-B002_Padda | 2,012,831 | 1.17 |
| Pan_troglodytes_schweinfurthii-B007_Cindy | 1,739,934 | 1.01 |
| Pan_troglodytes_schweinfurthii-B011_Frederike | 2,239,268 | 1.30 |
| Pan_troglodytes_schweinfurthii-B012_Washu | 2,221,732 | 1.29 |
| Pan_troglodytes_schweinfurthii-B013_Athanga | 2,120,903 | 1.23 |
| Pan_troglodytes_schweinfurthii-B014_Coco | 2,200,678 | 1.28 |
| Pan_troglodytes_schweinfurthii-N013_Tongo | 2,207,522 | 1.28 |
| Pan_troglodytes_schweinfurthii-N015_Cleo | 2,274,346 | 1.32 |
| Pan_troglodytes_schweinfurthii-N017_Bihati | 1,792,697 | 1.04 |
| Pan_troglodytes_schweinfurthii-N019_Maya | 2,204,646 | 1.28 |
| Pan_troglodytes_troglodytes-10964_Cindy | 2,574,369 | 1.50 |
| Pan_troglodytes_troglodytes-11352_Mirinda | 2,602,385 | 1.51 |
| Pan_troglodytes_troglodytes-11528_Alfred | 2,601,123 | 1.51 |
| Pan_troglodytes_troglodytes-12311_Ula | 2,482,123 | 1.44 |
| Pan_troglodytes_troglodytes-12320_Lara | 2,535,174 | 1.47 |
| Pan_troglodytes_troglodytes-12348_Luky | 2,493,485 | 1.45 |
| Pan_troglodytes_troglodytes-12420_Gamin | 2,305,636 | 1.34 |
| Pan_troglodytes_troglodytes-13656_Brigitte | 2,543,340 | 1.48 |
| Pan_troglodytes_troglodytes-A957_Vaillant | 2,604,821 | 1.51 |
| Pan_troglodytes_troglodytes-A958_Doris | 2,370,460 | 1.38 |
| Pan_troglodytes_troglodytes-A959_Julie | 2,364,976 | 1.37 |
| Pan_troglodytes_troglodytes-A960_Clara | 2,258,833 | 1.31 |
| Pan_troglodytes_troglodytes-A990_Noemie | 2,218,673 | 1.29 |
| Pan_troglodytes_troglodytes-B021_Yogui | 2,527,752 | 1.47 |
| Pan_troglodytes_troglodytes-B022_Tibe | 2,488,089 | 1.45 |
| Pan_troglodytes_troglodytes-B023_Blanquita | 2,541,301 | 1.48 |
| Pan_troglodytes_troglodytes-B024_Negrita | 2,549,457 | 1.48 |
| Pan_troglodytes_troglodytes-B025_Marlin | 2,297,838 | 1.34 |
| Pan_troglodytes_verus-9668_Bosco | 991,297 | 0.58 |
| Pan_troglodytes_verus-9730_Donald | 1,646,466 | 0.96 |
| Pan_troglodytes_verus-A956_Jimmie | 1,006,068 | 0.58 |
| Pan_troglodytes_verus-A991_Berta | 797,536 | 0.46 |
| Pan_troglodytes_verus-A992_Annie | 915,718 | 0.53 |
| Pan_troglodytes_verus-A993_Mike | 889,689 | 0.52 |
| Pan_troglodytes_verus-B005_SeppToni | 1,079,046 | 0.63 |
| Pan_troglodytes_verus-B006_Linda | 1,070,960 | 0.62 |
| Pan_troglodytes_verus-Clint | 990,108 | 0.58 |
| Pan_troglodytes_verus-N014_Cindy | 1,069,613 | 0.62 |
| Pan_troglodytes_verus-N016_Alice | 1,024,124 | 0.60 |
| Pan_troglodytes_verus-X00100_Koby | 977,227 | 0.57 |
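
The per-sample values in Table S4 follow directly from the caption: the heterozygous genotype count divided by the number of high-confidence bases, scaled to 1 Kbp. A minimal sketch (the function and constant names are hypothetical):

```python
CALLABLE_BASES = 1_721_192_217  # high-confidence bases, as given in the caption


def het_per_kbp(het_count, callable_bases=CALLABLE_BASES):
    """Heterozygous genotypes per 1,000 callable bases."""
    return het_count / callable_bases * 1000.0


# Two rows from the table: A996_Diana and A991_Berta.
diana = round(het_per_kbp(2_172_830), 2)
berta = round(het_per_kbp(797_536), 2)
```

Rounded to two decimals, these reproduce the tabulated 1.26 and 0.46.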

Table S5 Population-wide test for gene flow between the different chimpanzee subspecies and bonobos (Pp). Western subspecies: Ptv; Nigeria-Cameroon: Pte; central: Ptt; eastern: Pts. As outgroup, we used the ancestral allele or human, denoted AA and Human. Donald was excluded from the grouping of the western subspecies. Positive Z-scores indicate gene flow between W and Y or X and Z. Negative Z-scores indicate gene flow between W and Z or X and Y.

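Mapped to PanTro4

| Y | Z | W | X | D-statistic | Z score | SE |
|---|---|---|---|---|---|---|
| Ptt | Pte | Pp | AA | 0.042 | 13.753 | 0.003 |
| Ptt | Pts | Pp | AA | 0.009 | 4.6231 | 0.002 |
| Ptt | Ptv | Pp | AA | 0.136 | 35.283 | 0.004 |
| Pts | Pte | Pp | AA | 0.034 | 10.467 | 0.003 |
| Pte | Ptv | Pp | AA | 0.125 | 28.608 | 0.004 |
| Pts | Ptv | Pp | AA | 0.131 | 32.515 | 0.004 |

Mapped to Hg19

| Y | Z | W | X | D-statistic | Z score | SE |
|---|---|---|---|---|---|---|
| Ptt | Pte | Pp | Human | 0.020 | 5.143 | 0.005 |
| Ptt | Pts | Pp | Human | 0.008 | 2.017 | 0.004 |
| Ptt | Ptv | Pp | Human | 0.037 | 5.218 | 0.007 |
| Pts | Pte | Pp | Human | 0.020 | 3.388 | 0.006 |
| Pte | Ptv | Pp | Human | 0.013 | 1.634 | 0.008 |
| Pts | Ptv | Pp | Human | 0.030 | 3.929 | 0.007 |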
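
The three numeric columns of Table S5 are related by Z = D / SE (up to rounding of the tabulated values). The sketch below shows how such a triple could be obtained from ABBA and BABA site-pattern counts. It is an assumption-laden illustration: this section does not state how the standard error was estimated, so the delete-one block jackknife used here is only a common choice, and the function names and toy counts are hypothetical. Which site pattern counts as ABBA or BABA depends on the ordering of the four populations and is left to the caller.

```python
from math import sqrt


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA) from site-pattern counts."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / total


def jackknife_d(blocks):
    """Delete-one block jackknife over (abba, baba) counts per genomic block.

    Returns (D on all blocks, standard error, Z score = D / SE).
    Blocks are weighted equally, which is a simplification.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    tot_abba = sum(a for a, _ in blocks)
    tot_baba = sum(b for _, b in blocks)
    d_all = d_statistic(tot_abba, tot_baba)
    # D recomputed with each block left out in turn.
    leave_one_out = [d_statistic(tot_abba - a, tot_baba - b) for a, b in blocks]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = sqrt(variance)
    z = d_all / se if se > 0 else float("nan")
    return d_all, se, z


# Toy counts for four genomic blocks.
d, se, z = jackknife_d([(60, 40), (55, 45), (58, 42), (62, 38)])
```

With real data the blocks would be large contiguous genomic regions, so that linkage within a block does not inflate the significance.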

Table S6 Numbers of putatively introgressed segments and fraction of the autosome in brackets. “Population-specific” segments do not overlap with putatively introgressed segments in another population. “Total” segments are allowed to occur in different populations.

| Population | Population-specific | Total |
|---|---|---|
| Central | 1,856 (0.51%) | 5,048 (1.41%) |
| Eastern | 1,315 (0.36%) | 4,015 (1.12%) |
| Nigeria-Cameroon | 1,856 (0.25%) | 3,007 (0.85%) |
| Western | 266 (0.08%) | 645 (0.20%) |
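
The distinction between "Population-specific" and "Total" segments in Table S6 can be illustrated with a small interval-overlap sketch. It assumes segments are stored as (chromosome, start, end) tuples with inclusive coordinates; the function names and toy segments are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) segments share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def population_specific(segments_by_pop):
    """For each population, keep the segments that overlap no segment
    called in any other population."""
    result = {}
    for pop, segs in segments_by_pop.items():
        others = [s for other, ss in segments_by_pop.items()
                  if other != pop for s in ss]
        result[pop] = [s for s in segs
                       if not any(overlaps(s, o) for o in others)]
    return result


# Toy example: the chr4 segments of Central and Eastern overlap.
toy = {
    "Central": [("chr4", 100, 200), ("chr6", 500, 600)],
    "Eastern": [("chr4", 150, 250)],
    "Western": [("chr21", 10, 20)],
}
specific = population_specific(toy)
```

In the toy example Central keeps only its chr6 segment and Eastern keeps none, while every input segment would still count towards the "Total" column.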

Table S7 Chromosomes with the highest and lowest fractions of putatively introgressed segments in each chimpanzee population.

| Population | Highest introgression | Lowest introgression |
|---|---|---|
| Central | chr21 (2.6%), chr6 (2.1%), chr4 (2.0%) | chr2B (0.7%), chr22 (0.7%), chr9 (1%) |
| Eastern | chr21 (1.9%), chr4 (1.7%), chr10 (1.5%) | chr2B (0.5%), chr19 (0.5%), chr22 (0.6%) |
| Nigeria-Cameroon | chr4 (1.3%), chr6 (1.2%), chr10 (1.2%) | chr2B (0.4%), chr22 (0.5%), chr9 (0.5%) |
| Western | chr21 (0.4%), chr6 (0.4%), chr10 (0.3%) | chr19 (0.1%), chr15 (0.1%), chr22 (0.1%) |


Table S8 Putatively introgressed (“bonobo-like”) segments in heterozygosity in central chimpanzees longer than 25 kb, and length in bp. Neighbouring regions within 20 kb were joined.

Region   Length
14:83307123-83370204   63,082
16:45160002-45220095   60,094
17:34408733-34467475   58,743
1:122178166-122236501   58,336
1:80236614-80291741   55,128
4:183063944-183117544   53,601
1:95397283-95450680   53,398
21:14459753-14512882   53,130
17:75724092-75777171   53,080
3:187448917-187499087   50,171
5:48025606-48075351   49,746
10:98697622-98745083   47,462
6:161692223-161739354   47,132
5:97615805-97662928   47,124
6:10327481-10372771   45,291
12:50914618-50959847   45,230
4:58322343-58367352   45,010
3:4250191-4295127   44,937
6:121624785-121668626   43,842
5:30363406-30406410   43,005
4:24860080-24902885   42,806
6:23394779-23437556   42,778
1:157952405-157992975   40,571
18:2554357-2594452   40,096
10:56107876-56147896   40,021
9:116204462-116244234   39,773
2B:228296759-228335975   39,217
15:95889194-95928252   39,059
11:39347493-39386106   38,614
6:40781352-40819790   38,439
1:137642086-137680068   37,983
2A:79869857-79907809   37,953
8:119169688-119207554   37,867
16:49983931-50021632   37,702
20:7121278-7158282   37,005
4:86877613-86914567   36,955
10:98618362-98655232   36,871
5:49651701-49688228   36,528
10:63630410-63666750   36,341
16:19019585-19055406   35,822
5:36864720-36900258   35,539
5:87154071-87189586   35,516

14:23859365-23894733   35,369
18:21865364-21900628   35,265
14:21450606-21485361   34,756
6:54859362-54894087   34,726
8:49313460-49347661   34,202
1:119522494-119556694   34,201
13:86131015-86165187   34,173
18:74650480-74684591   34,112
20:19826464-19860403   33,940
5:114110973-114144740   33,768
12:82981918-83015318   33,401
4:37805932-37839181   33,250
4:48253399-48286402   33,004
16:60543153-60576102   32,950
7:31450506-31483311   32,806
19:61735133-61767929   32,797
1:102409404-102442103   32,700
6:115715918-115748241   32,324
9:100828068-100860307   32,240
16:53819112-53851337   32,226
4:189669189-189701366   32,178
11:89094399-89126399   32,001
13:19865867-19897825   31,959
7:154401599-154433555   31,957
6:79047019-79078897   31,879
6:141843060-141874544   31,485
17:16633736-16665111   31,376
3:79405803-79437138   31,336
22:37243507-37274749   31,243
10:107984598-108015761   31,164
2A:81479716-81510793   31,078
5:59275101-59306140   31,040
3:77428825-77459862   31,038
12:50338306-50369228   30,923
5:119480209-119510962   30,754
2B:243927114-243957817   30,704
3:106024478-106055044   30,567
3:158466412-158496896   30,485
12:41984834-42015221   30,388
11:84785276-84815141   29,866
12:36824143-36853901   29,759
12:28016737-28046338   29,602
5:20300327-20329902   29,576


8:23113906-23143442   29,537
3:196599679-196628931   29,253
7:89774524-89803634   29,111
16:57440636-57469715   29,080
18:48570041-48599033   28,993
5:72451187-72480097   28,911
12:82604897-82633743   28,847
21:13362467-13391275   28,809
2A:38752051-38780805   28,755
2B:173822121-173850844   28,724
6:166842468-166871114   28,647
9:97270738-97299365   28,628
2B:175203658-175232250   28,593
19:9895506-9924056   28,551
8:124678427-124706910   28,484
7:143326093-143354378   28,286
12:105381982-105410224   28,243
17:16413187-16441338   28,152
3:182996761-183024886   28,126
13:98028965-98057079   28,115
5:146260877-146288923   28,047
19:10089576-10117584   28,009
11:43180292-43208273   27,982
8:32716969-32744885   27,917
1:172453816-172481683   27,868
10:119869215-119897046   27,832
8:23378655-23406366   27,712
14:29275986-29303663   27,678
6:33079923-33107562   27,640
18:25397678-25425271   27,594
20:54469970-54497531   27,562
18:65912258-65939773   27,516
7:81636420-81663895   27,476
15:68352907-68380294   27,388
4:118613834-118641100   27,267
4:69782103-69809367   27,265
3:105618379-105645535   27,157
4:91466948-91494103   27,156
1:98320294-98347442   27,149
2A:20610586-20637637   27,052
7:34711078-34738086   27,009
6:99840082-99867054   26,973
7:11659089-11685983   26,895
12:77031512-77058365   26,854
7:6998518-7025242   26,725
4:106931935-106958607   26,673

15:33465103-33491714   26,612
2A:107049201-107075703   26,503
3:141233034-141259454   26,421
13:78664157-78690469   26,313
20:25777590-25803783   26,194
6:40033070-40059192   26,123
4:149560096-149586210   26,115
11:120408233-120434332   26,100
10:27409363-27435332   25,970
7:48064600-48090532   25,933
4:62148467-62174295   25,829
11:107773268-107799088   25,821
6:100741653-100767470   25,818
3:159610384-159636168   25,785
20:52628040-52653742   25,703
18:14742931-14768612   25,682
2A:100769917-100795528   25,612
19:41607249-41632813   25,565
11:57773608-57799116   25,509
6:118884580-118910001   25,422
5:123903704-123929075   25,372
6:143329433-143354536   25,104
10:85433368-85458461   25,094
8:66727022-66752032   25,011
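The joining rule in the caption of Table S8 and the way the Length column relates to the coordinates can be sketched in Python. This is a minimal illustration, assuming that "within 20 kb" means a gap of at most 20,000 bp between the end of one segment and the start of the next, and that coordinates are inclusive so that length = end - start + 1 (which matches the tabulated lengths). The function names and the tuple layout are hypothetical.

```python
def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def region_length(region):
    """Length in bp, treating both coordinates as inclusive."""
    _, start, end = parse_region(region)
    return end - start + 1


def join_neighbours(segments, max_gap=20_000):
    """Join segments on the same chromosome separated by at most max_gap bp.

    segments: iterable of (chrom, start, end) tuples (assumed layout).
    Returns a sorted list of joined (chrom, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(segments):
        same_chrom = bool(merged) and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= max_gap:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For the first row of the table, `region_length("14:83307123-83370204")` gives 63082, in agreement with the listed length of 63,082 bp.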


Table S9 Significantly enriched gene ontology (GO) categories (P < 0.05; FWER < 0.05) in putatively introgressed windows, and genes found therein.

GO term   p-value   Gene names

positive regulation of actin filament polymerization   0.007   ACTR3C; FER; FMN1; KIRREL; NCKAP1L; PRKCE

axon guidance   0.01   ABLIM1; ALCAM; ANK2; ANK3; BOC; CACNB4; CNTN1; CNTN4; COL4A1; COL5A2; CSNK2A1; DCC; DPYSL2; ENAH; EPHA5; EXT1; LAMA2; LAMC1; NRXN3; NTRK1; PLXNA4; PTPRM; RELN; ROBO2; SEMA3E; SLIT3; SPTA1; SRGAP1; TENM2; TTC8; UNC5C; USP33

regulation of systemic arterial blood pressure by baroreceptor feedback   0.012   ADRA1A; ASIC2; CHRNA7; NAV2

regulation of small GTPase mediated signal transduction   0.015   ADRA1A; ARAP2; ARHGAP15; ARHGAP18; ARHGAP24; ARHGEF28; DLC1; IQGAP2; KALRN; KIAA1244; MCF2L2; NRG1; NTRK1; PSD3; RASA1; RASGEF1B; RELN; RHOBTB1; RHOJ; SRGAP1; SRGAP3; STARD13; TIAM1

regulation of cell shape   0.017   AKAP2; ARHGAP15; DLC1; EPB41L3; LIMD1; PALM2-AKAP2; RASA1; RHOJ; SEMA3E; SPTA1

ventral spinal cord development   0.019   ; DAB1; MDGA2; RELN

antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-dependent   0.02   HLA-G; PSMA8; PSMB8; PSMB9; TAP1; TAP2

protein phosphorylation   0.034   ADRA1A; ALK; ATF6; BMP2K; BUB1; CAMK1D; CAMKK1; CDK8; CHRNA7; CSNK2A1; DAB1; DCC; DCLK1; DDR2; DNAJC10; DOK5; EIF4G3; EPHA5; EPHA6; ERBB4; FAM129A; FCER1A; FER; FIP1L1; FKTN; FNDC1; GHR; HUNK; IL26; INSRR; ITK; ITLN1; KALRN; KIRREL; MAK; MAP3K7; MAP3K8; MAP3K9; MAPK14; MAST2; MAST4; MUSK; NCKAP1L; NDRG2; NELL1; NPNT; NRG1; NRG3; NTRK1; PAK7; PAQR3; PARK2; PELI2; PHKG2; PRKCE; PRKG1; RARRES2; RELN; RNASEL; ROS1; RSRC1; STK17A; STK32B; STK40; SYK; TGFA; TRIM5; TXK; ZMYND11

mitotic cytokinesis   0.038   ANK3; RASA1; SNX9

serotonin secretion   0.038   FCER1A; P2RX1; SYK

NK T cell differentiation   0.038   AP3B1; ITK; TXK

macrophage activation involved in immune response   0.038   LBP; PRKCE; SYK

leukocyte migration involved in inflammatory response   0.038   LBP; S100A8; SELE

thyroid hormone generation   0.038   CPQ; IYD; TPO

positive regulation of glucose transport   0.038   GIP; ITLN1; RARRES2

positive regulation of cell adhesion mediated by integrin   0.038   CXCL13; NCKAP1L; SYK


positive regulation of smooth muscle contraction   0.038   ADRA1A; CHRM3; NPNT

positive regulation of phagocytosis   0.038   CAMK1D; DOCK2; NCKAP1L

renal tubule development   0.038   MTSS1; NPNT; TTC8

cellular response to alkaloid   0.038   NTRK1; RYR3; SPIDR

nephron epithelium development   0.038   MAGI2; MTSS1; NPNT

apoptotic process   0.038   ADRA1A; ALK; ANXA4; ASIC2; BUB1; C11orf82; CAMK1D; CARD11; CAST; CDH1; CECR2; CSNK2A1; DCC; DLC1; DNAJC10; EGLN3; EPB41L3; ERBB4; FABP1; FHIT; FNDC1; GRID2; ITPR1; KALRN; KNG1; MAP3K7; MAP3K9; MAPK14; MUSK; NCKAP1L; NOS1AP; NRG1; NTRK1; OXR1; P2RX1; PAK7; PARK2; PRAMEF8; PRKCE; PRUNE2; PSMA8; PSMB8; PSMB9; RASA1; ROBO2; S100A8; SARM1; SGPL1; SLIT3; STK17A; STPG1; TGFA; TIAM1; TNS4; TOX3; TP53AIP1; UNC5C; USP28; XAF1; XRCC2; YBX3; ZMYND11; ZNF385B

negative regulation of cell-substrate adhesion   0.044   ANGPT2; MUC22; RASA1; SEMA3E; SPOCK1; TBCD

dicarboxylic acid metabolic process   0.044   GHR; HAL; LIPF; ME3; NR1H4; SLC46A1

regulation of anion transport   0.044   DPYSL2; GRM7; PLA2R1; ROS1; SLC17A3; SYK

maintenance of protein location   0.044   ANK3; EPB41L3; GOPC; MINOS1-NBL1; PEX5L; TLN2

production of molecular mediator involved in inflammatory response   0.047   ALOX5AP; LBP; P2RX1; SYK

translational initiation   0.047   EIF4E3; EIF4G3; RPL26L1; RPSA

axo-dendritic transport   0.047   AP3B1; AP3B2; DST; UGT8

ribonucleoside monophosphate biosynthetic process   0.047   ADK; AK9; DPYD; GMPS

central nervous system neuron axonogenesis   0.047   ARHGEF28; DCC; DCLK1; PLXNA4

Reflex   0.047   ADRA1A; FOXP2; NPNT; PCDH15

neuromuscular junction development   0.047   ANK3; CACNB4; COL4A1; GPHN; MUSK

hindbrain morphogenesis   0.047   DAB1; DLC1; GRID2; LMX1B; RORA


Table S10 Statistics for ARGweaver. Ratio A:B = Ratio of number of discordant segments in each pairwise comparison; raw numbers in Figure 4B. Length test = Pairwise Wilcoxon rank sum test for the length distribution of segments in each comparison; Benjamini-Hochberg corrected P-values. “Young” Ratio A:B = Ratio of number of discordant segments < 200 kya.

Comparison                 Ratio A:B   Length test   “Young” Ratio A:B
Central/Western            4.4         1.8×10^-5     4.6
Eastern/Western            4.0         1.1×10^-9     3.2
Nigeria-Cameroon/Western   2.5         0.77          2.0
Central/Nigeria-Cameroon   2.2         3.5×10^-10    1.7
Eastern/Nigeria-Cameroon   2.0         3.4×10^-18    1.5
Central/Eastern            1.1         2.2×10^-3     1.1
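The Benjamini-Hochberg correction named in the caption of Table S10 can be sketched in a few lines of Python. This is a generic implementation of the standard step-up adjustment, not the authors' code; the function name is an assumption, and any P-values passed to it here would be illustrative because the uncorrected values are not given in the table.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

With six pairwise comparisons, as in the table, each raw P-value is multiplied by 6 divided by its rank, and the adjusted values are then capped so that they never decrease with rank.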


Table S11 Results for ARGweaver on simulations, under model A) no gene flow; B) gene flow from bonobos to non-western chimpanzees; C) gene flow from an unknown population outside the Pan clade into western chimpanzees. The number of discordant segments longer than 10 kb in each individual for each pairwise comparison is shown.

Comparison                 A      B      C
Central/Western            4/2    35/4   4/0
Eastern/Western            4/1    31/2   2/0
Nigeria-Cameroon/Western   8/3    14/1   12/0
Central/Nigeria-Cameroon   9/4    24/9   13/9
Eastern/Nigeria-Cameroon   15/5   20/4   2/0
Central/Eastern            3/10   15/8   3/7


Table S12 List of individuals kept for the demographic analysis based on the site frequency spectrum.

Individual ID   Mean depth of coverage

Bonobo (Pan paniscus)
A914 Hortense   28.24
A917 Dzeeta   32.75
A918 Hermien   29.48
A919 Desmond   31.99
A926 Natalie   27.32

Chimpanzee (Pan troglodytes)

Nigeria-Cameroon (Pan troglodytes ellioti)
Akwaya Jean   22.42
Damian   20.53
Julie   22.10
Koto   24.14
Taweh   20.91

Eastern (Pan troglodytes schweinfurthii)
A910 Bwambale   38.16
A911 Kidongo   40.60
B012 Washu   38.29
N015 Cleo   61.14
N019 Maya   36.15

Central (Pan troglodytes troglodytes)
11528 Alfred   28.99
12320 Lara   29.67
13656 Brigitta   29.21
A957 Vaillant   27.11
A958 Doris   21.77

Western (Pan troglodytes verus)
A956 Jimmie   23.93
B005 SeppToni   37.01
B006 Linda   35.11
N014 Cindy   40.10
N016 Alice   35.43


Table S13 Search ranges specified for each parameter for the maximum likelihood optimization. The search range for all parameters was assumed to be uniform, except for the migration rates, for which the search was done on a log scale. The b indicates parameters with a fixed upper bound for the search range. Parameter tags are described in Fig. S51.

Parameter   Parameter tag   Search range lower   Search range upper

Effective sizes (number of haploid gene copies)

Ancestral sizes:
Ancestral Bonobos-common chimps   Nanc   10,000   60,000
Ancestral common chimps   NancCChimp   100   400,000
Ancestral Eastern-Central   NancCE   100   400,000

Past sizes:
Bonobo   NBon   1,000   100,000 b
Eastern   NEast   1,000   100,000 b
Central   NCent   1,000   200,000 b
Western/Nigeria-Cameroon   NWest/NNigCam   1,000   100,000 b
Ancestral Bonobo   NancB   1,000   100,000 b
Ancestral Western   NancW   1,000   100,000 b

Current sizes:
Recent Bonobo   NrecB   10   10,000 b
Recent Eastern   NrecE   10   10,000 b
Recent Central   NrecC   10   10,000 b
Recent Western   NrecW   10   10,000 b

Bottleneck sizes:
Bottleneck Bonobo   NBotlB   1   5,000 b
Bottleneck split Eastern   NBotlSplitE   1   5,000 b
Bottleneck split Central   NBotlSplitC   1   5,000 b
Bottleneck split Western/Nig.Cam.   NBotlSplitW/Nc   1   5,000 b
Bottleneck Western/Nig.Cam.   NBotlW/NBotlNc   1   5,000 b
Bottleneck split of Common chimps   NBotlSplitCChimp   1   5,000 b

Time of events (generations)
Time of split Bonobo   TBonobo   30,000   150,000
Time of split West./Nig.Cam.   TWestern/TNigCam   6,000   TBonobo b
Time of split Eastern-Central   TEasternCentral   3,000   TWestern/TNigCam b
Time Bottleneck Bonobo   TBotlB   800   TBonobo b
Time bottleneck West./Nig.Cam.   TBotlWestern/NigCam   800   TWestern/TNigCam b
Time migration Bonobo stops   TMigStopBonobo   2,000   TEasternCentral b
Time migration West./Nig.Cam. stops   TMigStopWestern/NigCam   100   TMigStopBonobo b

Scaled migration rates (2Nm)
Eastern to Central   NmEC   0.00   5.00 b
Central to Eastern   NmCE   0.00   5.00 b
Eastern to West/Nig.Cam.   NmEW/NmEN   0.00   1.00/2.5 b
Western/Nig.Cam. to Eastern   NmWE/NmNE   0.00   1.00/2.5 b
Central to Western   NmCW/NmCN   0.00   1.00/2.5 b
West./Nig.Cam. to Central   NmWC/NmNC   0.00   1.00/2.5 b
West./Nig.Cam to anc. East./Cent.   NmWaEC/NmNaEC   0.00   1.00 b
Anc. East.-Cent. to West./Nig.Cam.   NmaECW/NmaECN   0.00   1.00 b
Central to Bonobo   NmCB   0.00   0.50 b
Bonobo to Central   NmBC   0.00   0.50 b
Western/Nig.Cam to Bonobo   NmWB/NmNB   0.00   0.50 b
Bonobo to Western/Nig.Cam.   NmBW/NmBN   0.00   0.50 b
Ancestral East.-Cent. to Bonobo   NmaECB   0.00   0.50 b
Bonobo to ancestral East.-Cent.   NmBaEC   0.00   0.50 b
Ancestral common chimps to Bonobo   NmaCB   0.00   0.50 b
Bonobo to ancestral common chimps   NmBaC   0.00   0.50 b
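The search ranges above are expressed in haploid gene copies and in generations, whereas the point estimates in Table S15 are reported in diploid individuals and in thousands of years, using the generation time of 25 years given in that table's caption. A minimal Python sketch of the two conversions follows; the function names are hypothetical and the halving assumes two gene copies per diploid individual.

```python
GENERATION_TIME_YEARS = 25  # generation time stated in the caption of Table S15


def generations_to_kya(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert a time in generations to thousands of years ago."""
    return generations * generation_time / 1000


def haploid_to_diploid(gene_copies):
    """Convert a size in haploid gene copies to diploid individuals."""
    return gene_copies / 2
```

Under these assumptions, the search range for TBonobo (30,000 to 150,000 generations) corresponds to 750 to 3,750 thousand years, and the upper bound for Nanc (60,000 gene copies) to 30,000 diploid individuals.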


Table S14 Log-likelihood values obtained for the models with western chimpanzees. The models correspond to the schematic representations in Fig. S49. The maximum observed likelihood (Max Obs Lhood) corresponds to the likelihood of a perfect fit to the observed data. The maximum estimated likelihood (Max Estimated Lhood) is the likelihood estimated based on the average of 100 expected SFS (approximated with 5×10^5 coalescent simulations) according to the parameters that maximized the likelihood in the ECM optimization. The smaller the difference (ΔLhood) between the Max Estimated Lhood and Max Obs Lhood the better the fit. The likelihood values are reported in log10 scale.

Model tag    Model                                                     #params   Max Obs Lhood   Max Estimated Lhood   ΔLhood
Mig Bon      Direct migration between bonobo and central chimpanzees   40        -9,586,362      -9,599,345            -12,984
No Mig Bon   No Migration with bonobo                                  33        -9,586,362      -9,607,332            -20,971


Table S15 Point estimates for the parameters of the three models obtained for the models with bonobo, eastern, central and western. The parameter tags are described in Fig. S51. a) All effective sizes are given as number of diploid individuals. b) Times in thousands of years ago obtained assuming a mutation rate of 1.2×10^-8 mutations/generation/site and a generation time of 25 years. c) Forward in time migration rates. Migration rates for bonobo and western chimpanzees defined as NancB*mig and NancW*mig. For periods where these populations have a size NBon and Nwest the migration rates need to be adjusted.

Models: "No gene flow with bonobo" (point estimates) and "Gene flow between bonobo and both central and western" (point estimates and 95% CI).

Parameter   No gene flow: point estimate   Gene flow: point est.   Gene flow: 95% CI lower   Gene flow: 95% CI upper

Ancestral effective sizes (a):
Nanc   23,450   5,738   5,428   13,012
NancCChimp   32,543   12,302   11,430   14,171

Past sizes (a):
NancCE   180,044   74,044   71,607   123,635
NBon   32,280   16,232   15,238   18,747
NancB   43,863   23,968   9,624   32,988
NEast   15,990   17,650   12,291   18,893
NCent   44,050   48,306   35,664   50,515
Nwest   10,395   12,323   9,992   16,340
NancW   40,972   1,895   1,001   2,372

Bottleneck sizes (during 100 generations) (a):
NBotlSplitCChimp   68   1,374   46   2,182
NBotlSplitW   870   1,409   663   2,221
NBotlSplitE   607   508   296   949
NBotlSplitC   2,527   2,160   1,538   2,814
NBotlW   60   1,185   678   2,316
NBotlB   37   17   3   23

Current sizes (last 50 gen) (a):
NrecB   467   2,069   834   3,859
NrecE   2,371   1,738   1,021   4,611
NrecC   4,606   2,444   1,809   4,558
NrecW   1,129   545   348   868

Time of events (thousands of years) (b):
TBonobo   906   1,989   1,663   2,103
TWestern   363   603   544   633
TEasternCentral   207   164   139   182
TMigStopBonobo   -   125   82   146
TMigStopWestern   95   112   50   123
TBotlB   338   555   497   603
TBotlW   306   221   187   246

Migration rates between common chimps (c):
NmCE   1.41   1.32   0.99   1.82
NmEC   4.51   4.92   4.15   4.91
NmWE   0.28   0.66   0.19   0.96
NmEW   0.01   0.00   0.00   0.03
NmWC   0.00   0.02   0.00   0.21
NmCW   0.28   0.00   0.00   0.09
NmaECW   0.01   0.14   0.08   0.19
NmWaEC   0.00   0.93   0.67   0.98

Migration rates with Bonobo (c):
NmBC   -   0.00   0.00   0.16
NmCB   -   0.09   0.00   0.22
NmBaEC   -   0.06   0.04   0.10
NmaECB   -   0.08   0.03   0.10
NmBaC   -   0.01   0.00   0.02
NmaCB   -   0.18   0.00   0.44
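Footnote c of Table S15 notes that the scaled migration rates are defined relative to NancB and NancW and need to be adjusted for periods in which the populations have sizes NBon and Nwest. One way to read this, offered only as a hedged sketch, is that the per-generation migration rate is held constant, so that the scaled rate changes in proportion to the population size. The Python function below implements that reading; its name and arguments are hypothetical, and the interpretation itself is an assumption rather than a procedure described in the source.

```python
def rescale_scaled_migration(scaled_rate, n_reference, n_period):
    """Rescale a migration rate scaled by n_reference to one scaled by n_period.

    Assumes the per-generation migration rate m is unchanged, so the
    scaled rate (proportional to N * m) varies linearly with N.
    """
    if n_reference <= 0:
        raise ValueError("reference size must be positive")
    return scaled_rate * n_period / n_reference
```

As an illustration with the point estimates of the gene flow model, a rate of 0.09 (NmCB) defined relative to NancB = 23,968 would correspond to roughly 0.06 when expressed relative to NBon = 16,232.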


Table S16 Inferred migration rates between Bonobo and central/ancestors of central-eastern chimpanzees at different time periods, for the best model with western chimpanzees. Point estimates and 95% confidence intervals for the forward in time scaled migration rates (2Nm) are shown, with values lower than 10^-2 in scientific notation.

Period 1: eastern split (95% CI: 139-182 kya) until Bonobo migration stops (95% CI: 50-123 kya)
Period 2: western split (95% CI: 544-633 kya) until eastern split (95% CI: 139-182 kya)

Migration direction (2Nm)   Period 1 point estimate   Period 1 95% CI   Period 2 point estimate   Period 2 95% CI
Bonobo into central or central-eastern ancestors   3.4e-3   6.9e-5 to 0.16   0.06   0.04 to 0.10
Central or central-eastern ancestors into Bonobo   0.09   1.3e-4 to 0.22   0.05   0.03 to 0.08


Table S17 Log-likelihood values obtained for the models with Nigeria-Cameroon chimpanzee. The models correspond to the schematic representations in Fig. S49. The maximum observed likelihood (Max Obs Lhood) corresponds to the likelihood of a perfect fit to the observed data. The maximum estimated likelihood (Max Estimated Lhood) is the likelihood estimated based on the average of 100 expected SFS (approximated with 10^6 coalescent simulations) according to the parameters that maximized the likelihood in the ECM optimization. The smaller the difference (ΔLhood) between the Max Estimated Lhood and Max Obs Lhood the better the fit. The likelihood values are reported in log10 scale.

Model tag    Model                                                     #params   Max Obs Lhood   Max Estimated Lhood   ΔLhood
Mig Bon      Direct migration between bonobo and central chimpanzees   40        -10,319,592     -10,332,968           -13,376
No Mig Bon   No Migration with bonobo                                  33        -10,319,592     -10,341,785           -22,193
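The ΔLhood column is the difference between the maximum estimated and the maximum observed log10 likelihood, and the model with the smaller absolute difference fits better. The short Python sketch below recomputes the column from the values in Table S17; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
def delta_lhood(max_estimated, max_observed):
    """Difference between estimated and observed log10 likelihoods."""
    return max_estimated - max_observed


# Values copied from Table S17 (log10 scale)
models = {
    "Mig Bon": {"params": 40, "max_obs": -10_319_592, "max_est": -10_332_968},
    "No Mig Bon": {"params": 33, "max_obs": -10_319_592, "max_est": -10_341_785},
}

deltas = {
    name: delta_lhood(values["max_est"], values["max_obs"])
    for name, values in models.items()
}

# The best-fitting model has the delta closest to zero (largest value)
best_model = max(deltas, key=deltas.get)
```

The recomputed differences are -13,376 for Mig Bon and -22,193 for No Mig Bon, so the model with migration between bonobos and central chimpanzees is the closer fit.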


Table S18 Point estimates for the parameters of the three models obtained for the models with bonobo, eastern, central and Nigeria-Cameroon. The parameter tags are described in Fig. S54. a) All effective sizes given as number of diploid individuals. b) Times in thousands of years ago assuming a mutation rate of 1.2×10^-8 mutations/generation/site and a generation time of 25 years. c) Forward in time migration rates. Migration rates for bonobo and western chimpanzees defined as NancB*mig and NancW*mig. Where these populations have a size NBon and Nwest the migration rates need to be adjusted.

Models: "No gene flow with bonobo" (point estimates) and "Gene flow between bonobo and both central and western" (point estimates and 95% CI).

Parameter   No gene flow: point estimate   Gene flow: point est.   Gene flow: 95% CI lower   Gene flow: 95% CI upper

Ancestral effective sizes (a):
Nanc   25,056   15,299   12,659   17,899
NancCChimp   34,124   19,326   17,205   21,404

Past sizes (a):
NancCE   117,477   157,757   126,103   308,595
NBon   29,197   17,490   15,707   20,155
NancB   15,495   2,947   1,581   3,612
NEast   9,950   10,634   8,625   14,057
NCent   53,722   33,578   26,420   40,800
NNigCam   11,839   13,279   10,123   14,394
NancNigCam   18,771   12,435   5,127   44,186

Bottleneck sizes (during 100 generations) (a):
NBotlSplitCChimp   59   19   2   28
NBotlSplitNc   1,367   252   90   1,934
NBotlSplitE   705   561   278   755
NBotlSplitC   2,019   2,133   1,697   2,739
NBotlNc   77   790   253   2,115
NBotlB   40   947   615   2,292

Current sizes (last 50 gen) (a):
NrecB   474   1,369   687   3,674
NrecE   4,398   2,150   1,225   4,751
NrecC   1,569   2,705   1,686   4,145
NrecNc   753   453   306   523

Time of events (thousands of years) (b):
TBonobo   869   1,562   1,484   1,776
TNigCam   417   429   400   469
TEasternCentral   276   104   90   134
TMigStopBonobo   0   77   55   96
TMigStopNigCam   20   15   4   22
TBotlB   354   439   379   500
TBotlNigCam   382   86   94   415

Migration rates between common chimps (c):
NmCE   1.73   1.17   0.94   1.60
NmEC   4.83   4.34   3.82   4.91
NmNE   0.47   0.57   0.51   0.86
NmEN   0.14   0.45   0.10   0.41
NmNC   0.96   0.76   0.32   0.97
NmCN   1.08   1.90   1.52   2.35
NmaECN   0.03   0.01   0.00   0.23
NmNaEC   0.01   0.02   0.00   0.24

Migration rates with Bonobo (c):
NmBC   -   0.24   0.07   0.35
NmCB   -   0.14   0.07   0.29
NmBaEC   -   0.00   0.00   0.03
NmaECB   -   0.00   0.00   0.00
NmBaC   -   0.03   0.02   0.05
NmaCB   -   0.03   0.01   0.04


Table S19 Inferred migration rates between Bonobo and central/ancestors of central-eastern chimpanzees at different time periods, for the best model with Nigeria-Cameroon chimpanzees. Point estimates and 95% confidence intervals for the forward in time scaled migration rates (2Nm) are shown, with values lower than 10^-3 in scientific notation.

Period 1: central-eastern split (95% CI: 90-134 kya) until Bonobo migration stops (95% CI: 55-96 kya)
Period 2: western split (95% CI: 400-469 kya) until central-eastern split (95% CI: 90-134 kya)

Migration direction (2Nm)   Period 1 point estimate   Period 1 95% CI   Period 2 point estimate   Period 2 95% CI
Bonobo into central or central-eastern ancestors   0.24   0.07 to 0.35   4.1e-5   6.4e-6 to 0.031
Central or central-eastern ancestors into Bonobo   0.14   0.07 to 0.29   3.4e-3   5.0e-5 to 0.017


References and Notes

1. S. McBrearty, N. G. Jablonski, First fossil chimpanzee. Nature 437, 105–108 (2005). Medline doi:10.1038/nature04008

2. C. Hvilsom, F. Carlsen, R. Heller, N. Jaffré, H. R. Siegismund, Contrasting demographic histories of the neighboring bonobo and chimpanzee. Primates 55, 101–112 (2014). Medline doi:10.1007/s10329-013-0373-3

3. I. Lobon, S. Tucci, M. de Manuel, S. Ghirotto, A. Benazzo, J. Prado-Martinez, B. Lorente-Galdos, K. Nam, M. Dabad, J. Hernandez-Rodriguez, D. Comas, A. Navarro, M. H. Schierup, A. M. Andres, G. Barbujani, C. Hvilsom, T. Marques-Bonet, Demographic history of the genus Pan inferred from whole mitochondrial genome reconstructions. Genome Biol. Evol. 8, 2020–2030 (2016). Medline doi:10.1093/gbe/evw124

4. J. L. Caswell, S. Mallick, D. J. Richter, J. Neubauer, C. Schirmer, S. Gnerre, D. Reich, Analysis of chimpanzee history based on genome sequence alignments. PLOS Genet. 4, e1000057 (2008). Medline doi:10.1371/journal.pgen.1000057

5. A. Fischer, K. Prüfer, J. M. Good, M. Halbwax, V. Wiebe, C. André, R. Atencia, L. Mugisha, S. E. Ptak, S. Pääbo, Bonobos fall within the genomic variation of chimpanzees. PLOS ONE 6, e21605 (2011). Medline doi:10.1371/journal.pone.0021605

6. C. Becquet, N. Patterson, A. C. Stone, M. Przeworski, D. Reich, Genetic structure of chimpanzee populations. PLOS Genet. 3, e66 (2007). Medline doi:10.1371/journal.pgen.0030066

7. T. Fünfstück, M. Arandjelovic, D. B. Morgan, C. Sanz, P. Reed, S. H. Olson, K. Cameron, A. Ondzie, M. Peeters, L. Vigilant, The sampling scheme matters: Pan troglodytes troglodytes and P. t. schweinfurthii are characterized by clinal genetic variation rather than a strong subspecies break. Am. J. Phys. Anthropol. 156, 181–191 (2015). Medline doi:10.1002/ajpa.22638

8. J. Prado-Martinez, P. H. Sudmant, J. M. Kidd, H. Li, J. L. Kelley, B. Lorente-Galdos, K. R. Veeramah, A. E. Woerner, T. D. O’Connor, G. Santpere, A. Cagan, C. Theunert, F. Casals, H. Laayouni, K. Munch, A. Hobolth, A. E. Halager, M. Malig, J. Hernandez-Rodriguez, I. Hernando-Herraez, K. Prüfer, M. Pybus, L. Johnstone, M. Lachmann, C. Alkan, D. Twigg, N. Petit, C. Baker, F. Hormozdiari, M. Fernandez-Callejo, M. Dabad, M. L. Wilson, L. Stevison, C. Camprubí, T. Carvalho, A. Ruiz-Herrera, L. Vives, M. Mele, T. Abello, I. Kondova, R. E. Bontrop, A. Pusey, F. Lankester, J. A. Kiyang, R. A. Bergl, E. Lonsdorf, S. Myers, M. Ventura, P. Gagneux, D. Comas, H. Siegismund, J. Blanc, L. Agueda-Calpena, M. Gut, L. Fulton, S. A. Tishkoff, J. C. Mullikin, R. K. Wilson, I. G. Gut, M. K. Gonder, O. A. Ryder, B. H. Hahn, A. Navarro, J. M. Akey, J. Bertranpetit, D. Reich, T. Mailund, M. H. Schierup, C. Hvilsom, A. M. Andrés, J. D. Wall, C. D. Bustamante, M. F. Hammer, E. E. Eichler, T. Marques-Bonet, Great ape genetic diversity and population history. Nature 499, 471–475 (2013). Medline doi:10.1038/nature12228

9. H. Vervaecke, L. Van Elsacker, Hybrids between common chimpanzees (Pan troglodytes) and pygmy chimpanzees (Pan paniscus) in captivity. Mammalia 56, 667–669 (1992).

10. Materials and methods are available as supplementary materials on Science Online.

11. H. Li, R. Durbin, Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011). Medline doi:10.1038/nature10231


12. L. Excoffier, I. Dupanloup, E. Huerta-Sánchez, V. C. Sousa, M. Foll, Robust demographic inference from genomic and SNP data. PLOS Genet. 9, e1003905 (2013). doi:10.1371/journal.pgen.1003905

13. S. K. Wasser, L. Brown, C. Mailand, S. Mondol, W. Clark, C. Laurie, B. S. Weir, Genetic assignment of large seizures of elephant ivory reveals Africa’s major poaching hotspots. Science 349, 84–87 (2015). Medline doi:10.1126/science.aaa2457

14. R. E. Green, J. Krause, A. W. Briggs, T. Maricic, U. Stenzel, M. Kircher, N. Patterson, H. Li, W. Zhai, M. H. Fritz, N. F. Hansen, E. Y. Durand, A. S. Malaspinas, J. D. Jensen, T. Marques-Bonet, C. Alkan, K. Prüfer, M. Meyer, H. A. Burbano, J. M. Good, R. Schultz, A. Aximu-Petri, A. Butthof, B. Höber, B. Höffner, M. Siegemund, A. Weihmann, C. Nusbaum, E. S. Lander, C. Russ, N. Novod, J. Affourtit, M. Egholm, C. Verna, P. Rudan, D. Brajkovic, Z. Kucan, I. Gusic, V. B. Doronichev, L. V. Golovanova, C. Lalueza-Fox, M. de la Rasilla, J. Fortea, A. Rosas, R. W. Schmitz, P. L. Johnson, E. E. Eichler, D. Falush, E. Birney, J. C. Mullikin, M. Slatkin, R. Nielsen, J. Kelso, M. Lachmann, D. Reich, S. Pääbo, A draft sequence of the Neandertal genome. Science 328, 710–722 (2010). Medline doi:10.1126/science.1188021

15. K. Prüfer, F. Racimo, N. Patterson, F. Jay, S. Sankararaman, S. Sawyer, A. Heinze, G. Renaud, P. H. Sudmant, C. de Filippo, H. Li, S. Mallick, M. Dannemann, Q. Fu, M. Kircher, M. Kuhlwilm, M. Lachmann, M. Meyer, M. Ongyerth, M. Siebauer, C. Theunert, A. Tandon, P. Moorjani, J. Pickrell, J. C. Mullikin, S. H. Vohr, R. E. Green, I. Hellmann, P. L. Johnson, H. Blanche, H. Cann, J. O. Kitzman, J. Shendure, E. E. Eichler, E. S. Lein, T. E. Bakken, L. V. Golovanova, V. B. Doronichev, M. V. Shunkov, A. P. Derevianko, B. Viola, M. Slatkin, D. Reich, J. Kelso, S. Pääbo, The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014). Medline doi:10.1038/nature12886

16. Q. Fu, M. Hajdinjak, O. T. Moldovan, S. Constantin, S. Mallick, P. Skoglund, N. Patterson, N. Rohland, I. Lazaridis, B. Nickel, B. Viola, K. Prüfer, M. Meyer, J. Kelso, D. Reich, S. Pääbo, An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015). Medline doi:10.1038/nature14558

17. M. Kuhlwilm, I. Gronau, M. J. Hubisz, C. de Filippo, J. Prado-Martinez, M. Kircher, Q. Fu, H. A. Burbano, C. Lalueza-Fox, M. de la Rasilla, A. Rosas, P. Rudan, D. Brajkovic, Ž. Kucan, I. Gušic, T. Marques-Bonet, A. M. Andrés, B. Viola, S. Pääbo, M. Meyer, A. Siepel, S. Castellano, Ancient gene flow from early modern humans into Eastern Neanderthals. Nature 530, 429–433 (2016). Medline doi:10.1038/nature16544

18. K. Prüfer, K. Munch, I. Hellmann, K. Akagi, J. R. Miller, B. Walenz, S. Koren, G. Sutton, C. Kodira, R. Winer, J. R. Knight, J. C. Mullikin, S. J. Meader, C. P. Ponting, G. Lunter, S. Higashino, A. Hobolth, J. Dutheil, E. Karakoç, C. Alkan, S. Sajjadian, C. R. Catacchio, M. Ventura, T. Marques-Bonet, E. E. Eichler, C. André, R. Atencia, L. Mugisha, J. Junhold, N. Patterson, M. Siebauer, J. M. Good, A. Fischer, S. E. Ptak, M. Lachmann, D. E. Symer, T. Mailund, M. H. Schierup, A. M. Andrés, J. Kelso, S. Pääbo, The bonobo genome compared with the chimpanzee and human genomes. Nature 486, 527–531 (2012). Medline

19. Y.-J. Won, J. Hey, Divergence population genetics of chimpanzees. Mol. Biol. Evol. 22, 297–307 (2005). Medline doi:10.1093/molbev/msi017

20. C. Becquet, M. Przeworski, A new approach to estimate parameters of speciation models with application to apes. Genome Res. 17, 1505–1519 (2007). Medline doi:10.1101/gr.6409707


21. G. McVicker, D. Gordon, C. Davis, P. Green, Widespread genomic signatures of natural selection in hominid evolution. PLOS Genet. 5, e1000471 (2009). Medline doi:10.1371/journal.pgen.1000471

22. S. Sankararaman, S. Mallick, M. Dannemann, K. Prüfer, J. Kelso, S. Pääbo, N. Patterson, D. Reich, The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354–357 (2014). Medline doi:10.1038/nature12961

23. B. Vernot, S. Tucci, J. Kelso, J. G. Schraiber, A. B. Wolf, R. M. Gittelman, M. Dannemann, S. Grote, R. C. McCoy, H. Norton, L. B. Scheinfeldt, D. A. Merriwether, G. Koki, J. S. Friedlaender, J. Wakefield, S. Pääbo, J. M. Akey, Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016). Medline doi:10.1126/science.aad9416

24. J. K. Pickrell, J. K. Pritchard, Inference of population splits and mixtures from genome-wide allele frequency data. PLOS Genet. 8, e1002967 (2012). Medline doi:10.1371/journal.pgen.1002967

25. M. D. Rasmussen, M. J. Hubisz, I. Gronau, A. Siepel, Genome-wide inference of ancestral recombination graphs. PLOS Genet. 10, e1004342 (2014). Medline

26. M. Meyer, M. Kircher, Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 10.1101/pdb.prot5448 (2010). Medline doi:10.1101/pdb.prot5448

27. M. Kircher, S. Sawyer, M. Meyer, Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012). Medline doi:10.1093/nar/gkr771

28. Q. Fu, M. Meyer, X. Gao, U. Stenzel, H. A. Burbano, J. Kelso, S. Pääbo, DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl. Acad. Sci. U.S.A. 110, 2223–2227 (2013). Medline doi:10.1073/pnas.1221359110

29. G. Renaud, U. Stenzel, J. Kelso, leeHom: Adaptor trimming and merging for Illumina sequencing reads. Nucleic Acids Res. 42, e141 (2014). Medline doi:10.1093/nar/gku699

30. H. Li, R. Durbin, Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010). Medline doi:10.1093/bioinformatics/btp698

31. E. Garrison, G. Marth, Haplotype-based variant detection from short-read sequencing. https://arxiv.org/abs/1207.3907 (2012).

32. S. Marco-Sola, M. Sammeth, R. Guigó, P. Ribeca, The GEM mapper: Fast, accurate and versatile alignment by filtration. Nat. Methods 9, 1185–1188 (2012). Medline doi:10.1038/nmeth.2221

33. O. Delaneau, J. Marchini, J.-F. Zagury, A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012). Medline doi:10.1038/nmeth.1785

34. A. Auton, A. Fledel-Alon, S. Pfeifer, O. Venn, L. Ségurel, T. Street, E. M. Leffler, R. Bowden, I. Aneas, J. Broxholme, P. Humburg, Z. Iqbal, G. Lunter, J. Maller, R. D. Hernandez, C. Melton, A. Venkat, M. A. Nobrega, R. Bontrop, S. Myers, P. Donnelly, M. Przeworski, G. McVean, A fine-scale chimpanzee genetic map from population sequencing. Science 336, 193–198 (2012). Medline

35. R. M. Kuhn, D. Haussler, W. J. Kent, The UCSC genome browser and associated tools. Brief. Bioinform. 14, 144–161 (2013). Medline doi:10.1093/bib/bbs038

36. B. Paten, J. Herrero, S. Fitzgerald, K. Beal, P. Flicek, I. Holmes, E. Birney, Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 18, 1829–1843 (2008). Medline doi:10.1101/gr.076521.108

37. B. Paten, J. Herrero, K. Beal, S. Fitzgerald, E. Birney, Enredo and Pecan: Genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 18, 1814–1828 (2008). Medline doi:10.1101/gr.076554.108

38. Chimpanzee Sequencing and Analysis Consortium, Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69–87 (2005). Medline doi:10.1038/nature04072

39. A. Manichaikul, J. C. Mychaleckyj, S. S. Rich, K. Daly, M. Sale, W. M. Chen, Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010). Medline doi:10.1093/bioinformatics/btq559

40. S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M. A. Ferreira, D. Bender, J. Maller, P. Sklar, P. I. de Bakker, M. J. Daly, P. C. Sham, PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007). Medline doi:10.1086/519795

41. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, 1000 Genome Project Data Processing Subgroup, The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). Medline doi:10.1093/bioinformatics/btp352

42. J. F. Hughes, H. Skaletsky, T. Pyntikova, T. A. Graves, S. K. van Daalen, P. J. Minx, R. S. Fulton, S. D. McGrath, D. P. Locke, C. Friedman, B. J. Trask, E. R. Mardis, W. C. Warren, S. Repping, S. Rozen, R. K. Wilson, D. C. Page, Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature 463, 536–539 (2010). Medline doi:10.1038/nature08700

43. P. Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin, 1000 Genomes Project Analysis Group, The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011). Medline doi:10.1093/bioinformatics/btr330

44. L. Excoffier, H. E. L. Lischer, Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010). Medline doi:10.1111/j.1755-0998.2010.02847.x

45. P. Hallast, P. Maisano Delser, C. Batini, D. Zadik, M. Rocchi, W. Schempp, C. Tyler-Smith, M. A. Jobling, Great ape Y Chromosome and mitochondrial DNA phylogenies reflect subspecies structure and patterns of mating and dispersal. Genome Res. 26, 427–439 (2016). Medline doi:10.1101/gr.198754.115

46. J. Felsenstein, Phylip: Phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989).

47. A. J. Drummond, M. A. Suchard, D. Xie, A. Rambaut, Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973 (2012). Medline doi:10.1093/molbev/mss075

48. A. Helgason, A. W. Einarsson, V. B. Guðmundsdóttir, Á. Sigurðsson, E. D. Gunnarsdóttir, A. Jagadeesan, S. S. Ebenesersdóttir, A. Kong, K. Stefánsson, The Y-chromosome point mutation rate in humans. Nat. Genet. 47, 453–457 (2015). Medline doi:10.1038/ng.3171

49. K. E. Langergraber, K. Prüfer, C. Rowney, C. Boesch, C. Crockford, K. Fawcett, E. Inoue, M. Inoue-Murayama, J. C. Mitani, M. N. Muller, M. M. Robbins, G. Schubert, T. S. Stoinski, B. Viola, D. Watts, R. M. Wittig, R. W. Wrangham, K. Zuberbühler, S. Pääbo, L. Vigilant, Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proc. Natl. Acad. Sci. U.S.A. 109, 15716–15721 (2012). Medline doi:10.1073/pnas.1211740109

50. K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013). Medline doi:10.1093/molbev/mst197

51. H. Kaessmann, V. Wiebe, G. Weiss, S. Pääbo, Great ape DNA sequences reveal a reduced diversity and an expansion in humans. Nat. Genet. 27, 155–156 (2001). Medline doi:10.1038/84773

52. B. L. Browning, S. R. Browning, Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194, 459–471 (2013). Medline doi:10.1534/genetics.113.150029

53. A. Fischer, V. Wiebe, S. Pääbo, M. Przeworski, Evidence for a complex demographic history of chimpanzees. Mol. Biol. Evol. 21, 799–808 (2004). Medline doi:10.1093/molbev/msh083

54. D. E. Reich, M. Cargill, S. Bolk, J. Ireland, P. C. Sabeti, D. J. Richter, T. Lavery, R. Kouyoumjian, S. F. Farhadian, R. Ward, E. S. Lander, Linkage disequilibrium in the human genome. Nature 411, 199–204 (2001). Medline doi:10.1038/35075590

55. B. M. Henn, C. R. Gignoux, M. Jobin, J. M. Granka, J. M. Macpherson, J. M. Kidd, L. Rodríguez-Botigué, S. Ramachandran, L. Hon, A. Brisbin, A. A. Lin, P. A. Underhill, D. Comas, K. K. Kidd, P. J. Norman, P. Parham, C. D. Bustamante, J. L. Mountain, M. W. Feldman, Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl. Acad. Sci. U.S.A. 108, 5154–5162 (2011). Medline doi:10.1073/pnas.1017511108

56. L. M. Shannon, R. H. Boyko, M. Castelhano, E. Corey, J. J. Hayward, C. McLean, M. E. White, M. Abi Said, B. A. Anita, N. I. Bondjengo, J. Calero, A. Galov, M. Hedimbi, B. Imam, R. Khalap, D. Lally, A. Masta, K. C. Oliveira, L. Pérez, J. Randall, N. M. Tam, F. J. Trujillo-Cornejo, C. Valeriano, N. B. Sutter, R. J. Todhunter, C. D. Bustamante, A. R. Boyko, Genetic structure in village dogs reveals a Central Asian domestication origin. Proc. Natl. Acad. Sci. U.S.A. 112, 13639–13644 (2015). Medline doi:10.1073/pnas.1516215112

57. M. Slatkin, Linkage disequilibrium—understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet. 9, 477–485 (2008). Medline doi:10.1038/nrg2361

58. B. M. Peter, M. Slatkin, Detecting range expansions from genetic data. Evolution 67, 3274–3289 (2013). Medline doi:10.1111/evo.12202

59. M. K. Gonder, S. Locatelli, L. Ghobrial, M. W. Mitchell, J. T. Kujawski, F. J. Lankester, C. B. Stewart, S. A. Tishkoff, Evidence from Cameroon reveals differences in the genetic structure and histories of chimpanzee populations. Proc. Natl. Acad. Sci. U.S.A. 108, 4766–4771 (2011). Medline doi:10.1073/pnas.1015422108

60. D. Wegmann, L. Excoffier, Bayesian inference of the demographic history of chimpanzees. Mol. Biol. Evol. 27, 1425–1435 (2010). Medline doi:10.1093/molbev/msq028

61. M. W. Mitchell, S. Locatelli, L. Ghobrial, A. A. Pokempner, P. R. Sesink Clee, E. E. Abwe, A. Nicholas, L. Nkembi, N. M. Anthony, B. J. Morgan, R. Fotso, M. Peeters, B. H. Hahn, M. K. Gonder, The population genetics of wild chimpanzees in Cameroon and Nigeria suggests a positive role for selection in the evolution of chimpanzee subspecies. BMC Evol. Biol. 15, 3 (2015). Medline doi:10.1186/s12862-014-0276-y

62. J. Oksanen et al., Package “vegan.” R Package version 2.0–8, 254 (2013).

63. E. Frichot, F. Mathieu, T. Trouillon, G. Bouchard, O. François, Fast and efficient estimation of individual ancestry coefficients. Genetics 196, 973–983 (2014). Medline doi:10.1534/genetics.113.160572

64. H. Tang, J. Peng, P. Wang, N. J. Risch, Estimation of individual admixture: Analytical and study design considerations. Genet. Epidemiol. 28, 289–301 (2005). Medline doi:10.1002/gepi.20064

65. D. H. Alexander, J. Novembre, K. Lange, Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009). Medline doi:10.1101/gr.094052.109

66. D. J. Lawson, G. Hellenthal, S. Myers, D. Falush, Inference of population structure using dense haplotype data. PLOS Genet. 8, e1002453 (2012). Medline doi:10.1371/journal.pgen.1002453

67. S. Leslie, B. Winney, G. Hellenthal, D. Davison, A. Boumertit, T. Day, K. Hutnik, E. C. Royrvik, B. Cunliffe, Wellcome Trust Case Control Consortium 2, International Multiple Sclerosis Genetics Consortium, D. J. Lawson, D. Falush, C. Freeman, M. Pirinen, S. Myers, M. Robinson, P. Donnelly, W. Bodmer, The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015). Medline doi:10.1038/nature14230

68. S. Schiffels, R. Durbin, Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014). Medline doi:10.1038/ng.3015

69. P. R. Staab, S. Zhu, D. Metzler, G. Lunter, scrm: Efficiently simulating long sequences using the approximated coalescent with recombination. Bioinformatics 31, 1680–1682 (2015). Medline doi:10.1093/bioinformatics/btu861

70. R. Burgess, Z. Yang, Estimation of hominoid ancestral population sizes under Bayesian coalescent models incorporating mutation rate variation and sequencing errors. Mol. Biol. Evol. 25, 1979–1994 (2008). Medline doi:10.1093/molbev/msn148

71. T. Miyata, H. Hayashida, K. Kuma, K. Mitsuyasu, T. Yasunaga, Male-driven molecular evolution: A model and nucleotide sequence analysis. Cold Spring Harb. Symp. Quant. Biol. 52, 863–867 (1987). Medline doi:10.1101/SQB.1987.052.01.094

72. O. Venn, I. Turner, I. Mathieson, N. de Groot, R. Bontrop, G. McVean, Strong male bias drives germline mutation in chimpanzees. Science 344, 1272–1275 (2014). Medline doi:10.1126/science.344.6189.1272

73. N. Patterson, P. Moorjani, Y. Luo, S. Mallick, N. Rohland, Y. Zhan, T. Genschoreck, T. Webster, D. Reich, Ancient admixture in human history. Genetics 192, 1065–1093 (2012). Medline doi:10.1534/genetics.112.145037

74. M. Slatkin, J. L. Pollack, Subdivision in an ancestral species creates asymmetry in gene trees. Mol. Biol. Evol. 25, 2241–2246 (2008). Medline doi:10.1093/molbev/msn172

75. E. Y. Durand, N. Patterson, D. Reich, M. Slatkin, Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011). Medline doi:10.1093/molbev/msr048

76. B. Vernot, J. M. Akey, Resurrecting surviving Neandertal lineages from modern human genomes. Science 343, 1017–1021 (2014). Medline doi:10.1126/science.1245938

77. K. Prüfer, B. Muetzel, H. H. Do, G. Weiss, P. Khaitovich, E. Rahm, S. Pääbo, M. Lachmann, W. Enard, FUNC: A package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 8, 41 (2007). Medline doi:10.1186/1471-2105-8-41

78. F. Racimo, S. Sankararaman, R. Nielsen, E. Huerta-Sánchez, Evidence for archaic adaptive introgression in humans. Nat. Rev. Genet. 16, 359–371 (2015). Medline doi:10.1038/nrg3936

79. R. R. Hudson, Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002). Medline doi:10.1093/bioinformatics/18.2.337

80. S. P. Otto, M. C. Whitlock, in eLS (John Wiley & Sons, 2013); http://onlinelibrary.wiley.com/doi/10.1002/9780470015902.a0005464.pub3/full.

81. M. A. Yang, A. S. Malaspinas, E. Y. Durand, M. Slatkin, Ancient structure in Africa unlikely to explain Neanderthal and non-African genetic similarity. Mol. Biol. Evol. 29, 2987–2995 (2012). Medline doi:10.1093/molbev/mss117

82. V. Sousa, J. Hey, Understanding the origin of species with genome-scale data: Modelling gene flow. Nat. Rev. Genet. 14, 404–414 (2013). Medline doi:10.1038/nrg3446

83. R. D. Hernandez, J. L. Kelley, E. Elyashiv, S. C. Melton, A. Auton, G. McVean, G. Sella, 1000 Genomes Project, M. Przeworski, Classic selective sweeps were rare in recent human evolution. Science 331, 920–924 (2011). Medline doi:10.1126/science.1198878

84. R. Nielsen, Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154, 931–942 (2000). Medline

85. A. M. Adams, R. R. Hudson, Maximum-likelihood estimation of demographic parameters using the frequency spectrum of unlinked single-nucleotide polymorphisms. Genetics 168, 1699–1712 (2004). Medline doi:10.1534/genetics.104.030171

86. X. L. Meng, D. B. Rubin, Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80, 267–278 (1993). doi:10.1093/biomet/80.2.267

87. F. Cunningham, M. R. Amode, D. Barrell, K. Beal, K. Billis, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fitzgerald, L. Gil, C. G. Girón, L. Gordon, T. Hourlier, S. E. Hunt, S. H. Janacek, N. Johnson, T. Juettemann, A. K. Kähäri, S. Keenan, F. J. Martin, T. Maurel, W. McLaren, D. N. Murphy, R. Nag, B. Overduin, A. Parker, M. Patricio, E. Perry, M. Pignatelli, H. S. Riat, D. Sheppard, K. Taylor, A. Thormann, A. Vullo, S. P. Wilder, A. Zadissa, B. L. Aken, E. Birney, J. Harrow, R. Kinsella, M. Muffato, M. Ruffier, S. M. Searle, G. Spudich, S. J. Trevanion, A. Yates, D. R. Zerbino, P. Flicek, Ensembl 2015. Nucleic Acids Res. 43, D662–D669 (2015). Medline doi:10.1093/nar/gku1010

88. K. R. Rosenbloom, J. Armstrong, G. P. Barber, J. Casper, H. Clawson, M. Diekhans, T. R. Dreszer, P. A. Fujita, L. Guruvadoo, M. Haeussler, R. A. Harte, S. Heitner, G. Hickey, A. S. Hinrichs, R. Hubley, D. Karolchik, K. Learned, B. T. Lee, C. H. Li, K. H. Miga, N. Nguyen, B. Paten, B. J. Raney, A. F. Smit, M. L. Speir, A. S. Zweig, D. Haussler, R. M. Kuhn, W. J. Kent, The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 43, D670–D681 (2015). Medline doi:10.1093/nar/gku1177

89. E. V. Davydov, D. L. Goode, M. Sirota, G. M. Cooper, A. Sidow, S. Batzoglou, Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLOS Comput. Biol. 6, e1001025 (2010). Medline doi:10.1371/journal.pcbi.1001025

90. A. C. Davison, D. V. Hinkley, Bootstrap Methods and their Application (Cambridge Univ. Press, 1997).