• The 1984 publication of a short mitochondrial DNA sequence from the quagga, a zebra-like equid that has been extinct since the 1880s, initiated the field of ancient DNA (aDNA) research1. Following the concomitant development of PCR and the realization that DNA survived in osseous materials2, the future of aDNA research looked bright. However, the degraded nature of aDNA3 coupled with the sensitivity of PCR to contamination — whether derived from environmental microorganisms or human handling, and thus embedded in the samples, or in the form of laboratory and/or reagent contamination — contributed to a series of publications based on false-positive results. Given that these problems seriously undermined the field’s broader scientific interest and reliability until the mid-2000s, few would have expected that, by the field’s twenty-fifth birthday, the genome of an ancient human4 and draft genomes of the extinct mammoth5 and Neanderthals6 would have been sequenced. Today, many tens of ancient genomes, ranging from microbial pathogens7–13 to vertebrate genomes14–29 (including the quagga19), have been sequenced.

Paleogenomics is driven by high-throughput sequencing (HTS) platforms, some of which generate data from billions of short DNA fragments per run30. In most paleogenomic studies, DNA libraries are generated by ligating the genomic extract to generic adaptors, amplified using PCR and then subjected to HTS using so-called second-generation sequencing platforms. This contrasts with traditional PCR-based approaches, in which loci are individually targeted and sub-amplicon-sized DNA is unexploitable. In addition to enabling whole-genome sequencing, HTS revealed how a diverse range of fossil specimens that were previously ignored owing to an inability to yield PCR amplicons nevertheless contained ultrashort aDNA fragments (~30–50 bp). Combining HTS with extraction methods tailored to the short, damaged aDNA molecules increased the time window for aDNA sequencing by an order of magnitude to at least 1 million years in permafrozen regions17 and 500,000 years in temperate caves31,32. Beyond genomes, the profiling of the epigenetic landscape (that is, epigenomes) of these ancient samples has recently become feasible33,34, conferring the potential to characterize regulatory changes throughout evolutionary timescales. However, there are also difficulties in paleogenomic studies. Indeed, HTS has compounded some of the challenges, including data authentication and contaminant identification, as well as accounting for inflated error rates caused by damaged nucleotides.

In this Review, we discuss key technological developments underpinning the paleogenomic revolution (FIG. 1) and describe post-mortem damage types common to aDNA and how they can be accounted for (and even exploited). Furthermore, we discuss how aDNA targets can be enriched relative to other DNA, how the resulting sequences can be analysed, and recent progress in characterizing ancient epigenomes. Throughout, we highlight current limitations and provide perspectives for future developments. As most advances relate to human calcified tissues (bones and teeth), we principally focus on these. Some of the key findings addressing long-standing debates in our own global population history

1Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5–7, Copenhagen 1350C, Denmark. 2Université de Toulouse, University Paul Sabatier (UPS), Laboratoire AMIS, CNRS UMR 5288, 37 allées Jules Guesde, 31000 Toulouse, France. 3Trace and Environmental DNA Laboratory, Department of Environment and Agriculture, Curtin University, Perth, Western Australia 6102, Australia. Correspondence to L.O. e-mail: [email protected]. doi:10.1038/nrg3935. Published online 9 June 2015.

Osseous materials: Calcified animal tissues, such as bones and teeth.

Reconstructing ancient genomes and epigenomes
Ludovic Orlando1,2, M. Thomas P. Gilbert1,3 and Eske Willerslev1

Research involving ancient DNA (aDNA) has experienced a true technological revolution in recent years through advances in the recovery of aDNA and, particularly, through applications of high-throughput sequencing. Formerly restricted to the analysis of only limited amounts of genetic information, aDNA studies have now progressed to whole-genome sequencing for an increasing number of ancient individuals and extinct species, as well as to epigenomic characterization. Such advances have enabled the sequencing of specimens up to 1 million years old, which, owing to their extensive DNA damage and contamination, were previously not amenable to genetic analyses. In this Review, we discuss these varied technical challenges and solutions for sequencing ancient genomes and epigenomes.


• Second-generation sequencing: High-throughput short-read DNA sequencing platforms that require library construction and thus modification of the DNA before sequencing. Most commonly represented by the Illumina, GS-FLX (454), ABI SOLiD and Ion Torrent series.

Resonance structures: Dynamic, alternative forms of molecular groups, such as nucleotide bases, that result from electron delocalization within the molecule.

are summarized in BOX 1 to illustrate the diversity of information that can be gathered, and recent literature describing other key evolutionary insights revealed by ancient genomics has been reviewed elsewhere35–38.

aDNA damage and tailored extractions
aDNA damage. aDNA damage accumulates over time and was originally characterized using enzymatic reactions to reveal the presence of particular types of DNA damage (such as abasic sites and crosslinks3) or gas chromatography experiments coupled with mass spectrometry39. Later approaches inferred damage types on the basis of mutational patterns in sequence data40–43. Specifically, an excess of C→T mutations, and their significant reduction following treatment with uracil DNA glycosylase41, revealed cytosine deamination to uracil (a thymine analogue) as the most prominent base modification.

HTS data subsequently refined our understanding of such damage, demonstrating that deamination increases towards read termini44, consistent with expectations of faster rates in the overhanging single strands at the fragment termini16,44,45 (FIG. 2). HTS data also revealed that depurination drives post-mortem DNA fragmentation, as genomic positions preceding read starts (corresponding to breaks or abasic sites in aDNA molecules) often consist of purines44. This bias appears towards adenines for younger samples but guanines for older samples, possibly reflecting differences in fragmentation dynamics46 and/or base-specific resonance structures47. Statistical models exploiting nucleotide misincorporation patterns in HTS data sets revealed single-strand breaks in aDNA44,48, most likely at nicks or abasic sites. Finally, whereas indirect detection methods indicated that polymerase-blocking lesions such as interstrand crosslinks could be prominent in aDNA49, direct experimental assays based on HTS data suggested a more minor contribution50. Therefore, their general importance may be context dependent.
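Such terminal misincorporation profiles can be tabulated directly from aligned reads. The following minimal sketch, written in the spirit of mapDamage rather than as a substitute for it, counts reference cytosines read as thymine as a function of distance from the 5ʹ end of each molecule; it assumes pysam is installed, that the BAM is coordinate-sorted, indexed and carries MD tags (samtools calmd can add them), and that the file name is a placeholder.

```python
# Sketch: 5'-end C->T misincorporation profile, in the spirit of mapDamage.
# Assumes an indexed BAM with MD tags; 'ancient.bam' is a placeholder.
import pysam

BAM, NPOS = "ancient.bam", 25                 # profile the first 25 positions
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}
ct = [0] * NPOS                               # reference C read as T, per position
c_total = [0] * NPOS                          # reference C observed, per position

with pysam.AlignmentFile(BAM, "rb") as bam:
    for read in bam.fetch():
        if read.is_unmapped or read.is_secondary or read.is_duplicate:
            continue
        seq = read.query_sequence
        for qpos, rpos, rbase in read.get_aligned_pairs(matches_only=True, with_seq=True):
            # distance from the 5' end of the original molecule
            p = read.query_length - 1 - qpos if read.is_reverse else qpos
            if p >= NPOS:
                continue
            rbase, qbase = rbase.upper(), seq[qpos].upper()
            if read.is_reverse:               # flip to the molecule's own orientation
                rbase, qbase = COMP.get(rbase, "N"), COMP.get(qbase, "N")
            if rbase == "C":
                c_total[p] += 1
                if qbase == "T":
                    ct[p] += 1

for p in range(NPOS):
    freq = ct[p] / c_total[p] if c_total[p] else 0.0
    print(f"5' position {p + 1}: C->T frequency {freq:.4f}")
```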

Targeting ultrashort fragments. Extensive aDNA fragmentation was documented early in the field’s history, with later quantitative PCR assays revealing up to 100-fold decreases in the abundance of PCR templates for each doubling of target size51. As HTS generally allows most aDNA molecules to be sequenced over their full length, the resulting distribution represents a size-decay curve52 that enables direct quantitative comparisons of fragmentation across specimens through space, time and environmental conditions53. Although random DNA fragmentation should decrease molecule numbers exponentially as size increases, aDNA templates often peak at 40–80 bp before this decay is observed. The exact median length observed reflects the overall fragmentation levels experienced after death, which generally increase with the depositional temperature53,54. However, the deviation from the expected exponential decay curve for ultrashort sizes suggests that common extraction protocols do not recover, and thus do not optimally exploit, this fraction of molecules.
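This size-decay behaviour can be summarized with a single number. The sketch below, a rough illustration rather than a published estimator, fits the exponential decline beyond the modal length of an insert-size distribution to obtain a per-nucleotide decay constant; the input lengths are placeholders (in practice they would come from collapsed read pairs), and numpy is assumed.

```python
# Sketch: estimate a per-nucleotide fragmentation ("decay") constant lambda from
# insert lengths, assuming counts beyond the modal length fall off as exp(-lambda * L).
# 'lengths' would normally come from collapsed read pairs; here a toy sample is used.
import numpy as np

def decay_constant(lengths, min_count=10):
    lengths = np.asarray(lengths)
    sizes, counts = np.unique(lengths, return_counts=True)
    mode = sizes[np.argmax(counts)]                 # recovery peaks here (e.g. 40-80 bp)
    keep = (sizes > mode) & (counts >= min_count)   # fit only the declining tail
    # log-linear fit: log(count) = intercept - lambda * size
    slope, intercept = np.polyfit(sizes[keep], np.log(counts[keep]), 1)
    return -slope, mode

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.geometric(0.02, size=50_000) + 30     # toy lengths peaking near ~31 bp
    lam, mode = decay_constant(toy)
    print(f"modal length ~{mode} bp, estimated decay constant ~{lam:.3f} per bp")
```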

This challenge was met by introducing improved silica-based extraction protocols that modify the volume and composition of the DNA-binding buffer31. These methodological improvements increased recovery rates of 35–50-bp molecules by twofold to fivefold, and greatly contributed towards the sequencing of even very

    Figure 1 | Major advances in ancient genomics. The major methodological advances described in this Review are presented with respect to milestones in paleogenomics, including whole-genome sequencing and the characterization of transcriptomes, epigenomes and proteomes. Average genome fold-coverage (×) and sequencing platforms are indicated where applicable. aDNA, ancient DNA; ssDNA, single-stranded DNA.

[Figure 1 timeline (2006–2016): method milestones include high-throughput DNA sequencing, true single-molecule DNA sequencing (Helicos)45, extraction of ultrashort DNA fragments31, ssDNA libraries16,66, selective uracil enrichment68, primer extension capture69, in-solution target enrichment with PCR probes70, microarray-based target enrichment73, whole-chromosome target enrichment77 and whole-genome in-solution capture78; dataset milestones include 13 Mb of mammoth DNA (454)52, the 0.7× mammoth genome (454)5, the 1.3× Neanderthal genome (454 and Illumina)6, the 16× Paleo-Eskimo genome (Illumina)4, the 6.8× Iceman genome (SOLiD)15, the 30× Denisovan genome (Illumina)16, the 52× Neanderthal genome (Illumina)22, the 1.1× 700,000-year-old horse genome (Illumina and Helicos)17, 400,000-year-old mitochondrial genomes31,32, maize kernel transcriptomes113, the mammoth proteome115, ancient methylomes33 and methylome and nucleosome maps34.]

• Pre-digestion: Exposure of ancient calcified materials to a short initial digestion aimed at removing substantial fractions of exogenous contaminants.

454: The initial generation of GS-FLX sequencing platforms based on pyrosequencing, before their acquisition and renaming by Roche.

old (for example, ~400,000-year-old) specimens31,32. Furthermore, light pre-digestion of bone or tooth powder before full extraction on the remaining undigested matter significantly increases the relative proportion of endogenous DNA recovered45,55–58, probably by washing away microbial contaminants57 or fully liberating DNA from the matrix59. Finally, the specific tissue sampled (for example, petrosal bone versus other bones18,25 and cementum versus dentine58,60) and sampling procedures (for example, drilling at low versus high speed60) affect the quality of extracted aDNA.

DNA library construction and amplification
General recommendations. Second-generation sequencing requires template molecule modification through adaptor ligation30. Both library construction and subsequent PCR amplification represent sources of error61,62. The parts of a genome sequenced can be affected by adaptor binding biases and/or the relative efficacy of PCR enzymes to amplify the constructs. Nucleotide misincorporations arising during these amplifications, and where they occur, also introduce errors into the resulting sequences16,44,61. For example, the Phusion polymerase, which was originally part of the Illumina library building procedure, preferentially amplifies short and relatively GC-rich templates62. The same is true for related polymerases, such as Phusion Hot Start I and II, even when high-fidelity buffers are used. This bias is reduced, or even disappears, when other polymerases are used, and Accuprime Pfx, Herculase II Fusion and Pfu Turbo Cx Hotstart currently seem to be better alternatives than the most commonly used polymerases, AmpliTaq Gold and Platinum Taq High-Fidelity62. Increasing PCR cycle number often reduces the molecular complexity of DNA libraries63; thus, polymerases should be carefully selected, PCR amplification cycles minimized and/or independent PCR reactions undertaken in parallel to limit such biases. This has important consequences for authenticating aDNA data and quantifying post-mortem DNA damage, as expected misincorporation models require tailoring to the exact experimental procedure followed45,64.

    Double-stranded DNA libraries. Different DNA library construction methods also show clear differences in efficiency. Early aDNA libraries were based around 454-compatible blunt-end approaches42–44,52 (FIG. 3a).

    Box 1 | Human evolution insights: one of the principal achievements of ancient genomics

An area of great interest in the study of human evolution is clarifying the admixture history and the migration routes followed by our ancestors to create contemporary patterns of genetic variation112. Study of the historical hair of an Aboriginal Australian revealed the existence of a migration from Africa or the Middle East that reached Australia and that took place 20,000–30,000 years earlier than the migration that gave rise to present-day Europeans and Asians14. The 36,200-year-old bone remains from an Upper Paleolithic man from Kostenki, Russia, were also found to be genetically closer to contemporary Europeans than to contemporary Asians, suggesting an earlier date for the separation between these populations27. The 24,000-year-old remains of a child from Mal’ta, south-central Siberia, Russia, showed strong genetic affinities not only with Europeans but also with Native Americans, indicating a mixed population ancestry for the first Americans24. The Solutrean theory, which assumed a European origin across the Atlantic for the Paleo-Indian Clovis culture in North America, could be ruled out because the 12,600-year-old cranial remains of the Anzick individual belonging to this culture show greater genetic affinities to Native Americans than to Europeans25.

The peopling of Europe and the effect of the agricultural revolution have also received great attention15,18,21,27,71,89,103,105,121. The main genetic components present in modern Europeans seem to have already differentiated by 36,200 years ago27, and their later dispersal involved several migration waves71,122. The expansion of the first Neolithic farmers resulted in mixing hunter-gatherer Mesolithic and near-eastern population backgrounds within western Europe ~7,500 years ago18,21,71,121. A later extensive migration took place ~4,500 years ago from the steppes and was associated with the spread of Indo-European languages into Europe71. The ability to gather genome-wide data at population scales from ancient individuals now provides an opportunity for fine-scale reconstruction of population migration and admixture patterns from classical antiquity to modern times.

In some cases, ancient genomes have revealed direct genetic continuity across different archaeological cultures, questioning theories assuming that culture only changes through the migration of peoples and not simply through the spread of ideas. The first example is provided by the Paleo-Eskimos from the New World Arctic, who represent distinct cultural units but were found to represent a single population, first replaced by Inuit

• T/A ligation: A common DNA ligation technology that relies on complementary pairing of thymine and adenine overhangs at the 3ʹ ends of the adaptors and inserts to be ligated, respectively.

Shotgun sequencing: The sequencing of fragmented DNA in the absence of any selection strategy.

However, as adaptor ligation is random, a fraction of the constructs do not contain both of the different adaptors and thus cannot be sequenced using this method. Another possible limitation is adaptor dimer formation during ligation; if amplified and sequenced, these waste sequencing capacity. Illumina introduced T/A ligation to overcome this in their original library construction procedure, in which aDNA fragments have an overhanging adenine added (known as A-tailing) to facilitate ligation to T-tailed adaptors (FIG. 3b). However, this strategy seems to be suboptimal for aDNA, mostly because templates starting with thymines are less efficiently processed during ligation61. Thus, the (often substantial) fraction of templates containing deaminated cytosine residues (thymine analogues) at their termini44 fails to incorporate into libraries61. TruSeq libraries, which also rely on T/A ligation, have also been shown to introduce significant amounts of palindromic artefacts, whereby short sequence segments at read starts are copied towards read ends65.

Single-stranded DNA libraries. A subsequent development was library construction directly on single-stranded DNA (ssDNA) templates16,66. In this method, DNA is denatured using heat into single strands and then ligated to a first adaptor, before extension with Bst polymerase generates the complementary strand. A second adaptor is ligated at the 3ʹ end of the complementary strand, and the full construct is then amplified by PCR (FIG. 3c). Inclusion of biotin in the first adaptor allows minimal DNA loss during purification using streptavidin-coated paramagnetic beads. The development of this method enabled characterization of the Denisovan genome at ~30× coverage using DNA extracts generated from 40 mg of bone material16. Although the method is sometimes beneficial on highly degraded osseous materials31,32 (as both strands and every single-strand break of endogenous DNA molecules have 3ʹ termini that are compatible with their incorporation into libraries), its benefit on less-degraded and non-osseous materials remains unverified.

Enriching for aDNA
aDNA extracts are metagenomic mixtures. The endogenous DNA within most ancient specimens is usually embedded within high levels of environmental microbial DNA. Although there are notable exceptions (including some keratinized materials4,67, particularly dense bones such as the petrosal bone18,25, and intentionally preserved materials from museums or herbaria8), it is unusual for the endogenous DNA content in most calcified remains to account for more than a few percent of the total DNA content. DNA preservation and environmental microbial contamination levels can show extreme variation within a single bone. For example, extracts and libraries constructed from a single 36,000-year-old European human bone yielded 0.1–8.0% of human DNA27, and even greater variation (0.5–27.8%) was seen using the early Native American ‘Anzick’ cranial bone25.
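Endogenous content of this kind is usually reported as the fraction of sequenced reads that align confidently to the host reference. A minimal sketch is shown below, assuming pysam, a BAM that retains unmapped reads and placeholder file names; real screening workflows typically add length filters and duplicate removal first.

```python
# Sketch: crude endogenous-DNA content, computed as the fraction of reads that map
# to the host reference at or above a mapping-quality cutoff. Assumes all trimmed
# reads were aligned (so unmapped reads remain in the BAM); file name is a placeholder.
import pysam

def endogenous_fraction(bam_path, min_mapq=25):
    total = mapped = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_secondary or read.is_supplementary:
                continue                      # count each sequenced fragment once
            total += 1
            if not read.is_unmapped and read.mapping_quality >= min_mapq:
                mapped += 1
    return mapped / total if total else 0.0

print(f"endogenous content: {endogenous_fraction('all_reads_vs_host.bam'):.2%}")
```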

High microbial contaminant DNA levels render shotgun sequencing of genomes uneconomical. Thus, several methods have been developed that improve accessibility to endogenous aDNA. These enrichment strategies are used either during library construction, by preferentially incorporating damaged aDNA fragments68, or after library construction, by separating

Figure 2 | Typical ancient DNA molecules. A diverse range of degradation reactions affect DNA post-mortem and result in extensive fragmentation (preferentially at purine nucleotides) and base modifications. The most common base modification identified in high-throughput sequencing data sets is deamination of cytosines into uracils (red), or thymines (blue) when cytosines were methylated (mC). Such deaminations occur much faster at overhanging ends. Other modifications include abasic sites (green) and single-strand breaks (vertical lines). The chemical structures of three damage by-products (uracils, thymines and abasic sites) are shown. R, purine; Y, pyrimidine.


• endogenous and exogenous fractions through annealing to pre-defined sets of probes (in solution69–72 or on microarrays7,73). Intended capture targets range from whole mitochondrial genomes (~16 kb31,32,69,72,74,75) or ancient commensal and pathogenic bacterial genomes (~4 Mb7,10–13) to large sets of single-nucleotide polymorphisms (SNPs) (~400,000 SNPs71), whole exomes (~30 Mb73,76), chromosomes (~30 Mb77) and even whole nuclear genomes (~3 Gb29,78–80). Other approaches that have been demonstrated, although not used in the most recent relevant studies, include targeted digestion of environmental microbial DNA using restriction enzymes6 and primer extension capture (PEC)69. Before discussing enrichment strategies further, we highlight that currently none is able to recover 100% of the target molecules, and thus they come at a cost of reduced

Figure 3 | Constructing ancient DNA libraries. The three most common types of ancient DNA (aDNA) libraries are shown. 5ʹ-phosphate groups are indicated with black circles, single-strand DNA breaks are shown as vertical lines, biotinylated adaptor groups are shown in red, and streptavidin-coated beads are shown in grey. a | To construct a double-stranded DNA (dsDNA) library, aDNA is first end-repaired. It is then ligated to double-stranded adaptors (blue), and the resultant nicks are filled in to construct library templates devoid of single-strand breaks. b | To construct an A-tailed DNA library, aDNA is end-repaired and then A-tailed (that is, an adenine is added to the 3ʹ ends of the strands) to facilitate subsequent ligation to T-tailed adaptors while disfavouring ligation between adaptor pairs. The adaptors are typically Y-shaped (that is, they are complementary at the T-tailed end but have non-complementary arms at the other end). The use of such adaptors results in aDNA strands being flanked by distinct non-complementary adaptor sequences at each end to enable subsequent unidirectional sequencing through the aDNA fragment. Nicks resulting from ligation are filled in through PCR post-ligation. c | To construct a single-stranded DNA (ssDNA) library, aDNA is first denatured into single strands using heat and then ligated to biotinylated single-stranded adaptors. The original DNA strand is then copied using DNA polymerase extension, and a second adaptor is ligated to enable further PCR amplification and sequencing. Purification steps are performed using streptavidin-coated paramagnetic beads. Part c adapted with permission from REF. 16, American Association for the Advancement of Science.

• Primer extension capture (PEC): An enrichment technology based on the ligation of short 5ʹ-biotinylated oligonucleotides (including a 12-nucleotide-long spacer followed by a primer of 18–25 nucleotides that is designed to match a particular region of interest) to single-stranded target molecules. This is followed by a single round of polymerase-based extension so as to increase the length over which the molecules are hybridized.

Tiled probes: Probes that overlap in their positioning on the target so as to ensure that every target position is covered by more than one different probe.

Chimeric DNA libraries: Recombination between libraries containing different template molecules during library PCR amplification, resulting in hybrid (chimeric) sequences that do not represent true biological sequences.

Double-indexed DNA libraries: DNA libraries in which short (for example, 8 bp long) unique nucleotide indexes are incorporated within both adaptors used during library construction. Indexes are bordered by known sequences that serve to prime index sequencing reactions and also enable library attachment to the surface of the sequencing flow cell.

    library complexity. Therefore, the upper threshold on the maximum sequencing depth attainable from a given library is reduced, and users must consider the end goal of their analyses before determining whether capture is a sensible strategy over direct shotgun sequencing. If the goal is to sequence to high coverage, highly complex libraries showing relatively high endogenous content can be shotgun-sequenced4,6,18,22,26, but enrichment of multiple libraries is advisable in other cases71,72.
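One way to weigh these options is to project how quickly a given library saturates. The sketch below uses a simple Poisson-style model, E[unique] = C·(1 − exp(−N/C)) for C unique molecules and N reads, estimates C from a pilot run by bisection and projects duplicate rates at larger sequencing efforts; it is a rough heuristic under that model, not a replacement for dedicated tools such as preseq, and the read counts are illustrative.

```python
# Sketch: a simple saturation model for deciding how much more sequencing a library
# can support. Under a Poisson sampling model, E[unique] = C * (1 - exp(-N / C)) for
# C unique molecules and N reads; C is estimated by bisection from one sequencing run.
import math

def expected_unique(n_reads, complexity):
    return complexity * (1.0 - math.exp(-n_reads / complexity))

def estimate_complexity(n_reads, n_unique, hi=1e12):
    lo = n_unique                      # complexity cannot be below what was observed
    for _ in range(200):               # bisection on the monotone model
        mid = 0.5 * (lo + hi)
        if expected_unique(n_reads, mid) < n_unique:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: 10 M reads sequenced, 8 M unique after duplicate removal.
C = estimate_complexity(10e6, 8e6)
for n in (20e6, 50e6, 100e6):
    u = expected_unique(n, C)
    print(f"{n/1e6:.0f} M reads -> ~{u/1e6:.1f} M unique ({1 - u/n:.0%} duplicates)")
```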

Damaged template enrichment. One approach selectively targets damaged DNA molecules68 during ssDNA library preparation16,66. After the DNA strand complementary to the original template is generated, constructs are 5ʹ-phosphorylated, which enables ligation to a non-phosphorylated adaptor (FIG. 4a). Following extension with Bst polymerase to fill the nick located 5ʹ of this adaptor, treatment with uracil DNA glycosylase and endonuclease VIII (USER mix) is implemented to first replace deaminated cytosines with abasic sites and then to cleave out these abasic sites81. The new 3ʹ end is then dephosphorylated and used for priming a new extension. Thus, all library strands that originally harboured deaminated cytosines are reconstructed over their full length and are available in the reaction supernatant for further amplification and sequencing. The undamaged DNA template fraction remains attached to streptavidin-coated paramagnetic beads and can be retained for other uses. This method has shown great specificity when applied to samples from Late Pleistocene Neanderthals showing extreme levels of deamination68. Importantly, in all extracts tested, the relative contamination from modern human DNA decreased by ~1.6-fold following selective enrichment, suggesting that undamaged templates resulting from recent manipulations of the specimen could readily be filtered. Furthermore, the endogenous content of one sample increased by 3.7–5-fold, which markedly reduced the genome sequencing cost. Future experiments will no doubt explore the wider potential of this method. For now, users should bear in mind that any endogenous undamaged molecules will not be retained and will thus be lost, making the method only appropriate for the most damaged samples. Additionally, any DNA carrying damage will be enriched, potentially providing access to the genomes of associated ancient microorganisms (although these can show reduced DNA damage levels compared to their human hosts9).

Extension-free target enrichment in solution. Target enrichment approaches based on target–probe hybridization are currently widely used. These require heat denaturation of DNA libraries to enable annealing of library inserts to overlapping tiled probes along target regions. Probes can be economically generated using long-range PCR, if fresh DNA material from closely related species can be extracted70, through PCR amplicon shearing and then ligation to a biotinylated adaptor. This probe library can be amplified (with biotinylated primers) and used in an unlimited number of enrichment reactions. Following annealing at stringencies that can be adapted depending on the phylogenetic distance between targets and probes, streptavidin-coated beads are washed to eliminate library constructs with inserts showing no genetic proximity to the targeted regions, and the final fraction is amplified and sequenced.

This strategy has predominantly been used for sequencing mitochondrial genomes72,74,75,82–85, bacterial plasmids86 and short nuclear loci84. Hybridization is even successful when probes diverge from targets by 10–13%82, which is useful if no close living relative and/or reference genome is available. This can also be exploited to detect probe carry-over post-sequencing if the DNA from a distantly related organism was used for preparing probes (for example, if DNA from a European bison was used when enriching for aDNA from aurochs83). Alternatively, potential probe carry-over can be eliminated before sequencing using dedicated molecular tools. For example, replacement of deoxythymidine triphosphate (dTTP) by deoxyuridine triphosphate (dUTP) in probes enables subsequent digestion with uracil DNA glycosylase before amplification and sequencing83.

Biotinylated probes can also be custom designed and synthesized, which enables specific probe tiling and in silico assessment for secondary structures, homogeneous GC content and annealing temperatures. Different manufacturers can now deliver such probes, with related procedures apparently achieving similar efficiency80. Depending on the overall size of the genomic regions targeted, multiple libraries can, in theory, be enriched as pools to achieve faster hands-on times. However, owing to the probable formation of chimeric DNA libraries during post-capture PCRs, pooling of libraries before capture should ideally be avoided, or if pooling is used then the constituent libraries should at least be double-indexed DNA libraries87 to enable chimaera identification and elimination from subsequent analyses (see the sketch below). Increasing probe tiling densities (11 bp versus 24 bp) did not consistently improve enrichment for ~670 nuclear loci in archaeological maize, suggesting that even relatively reduced probe densities can be used to efficiently recover the full molecular complexity of DNA libraries88.
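The double-indexing logic can be reduced to a simple filter: only reads whose index pair matches an expected sample combination are kept, so that hybrid pairings produced by chimaera formation (or index hopping) are discarded. The sketch below illustrates this with hypothetical index sequences and sample names; production demultiplexers are considerably more elaborate.

```python
# Sketch: double-index filtering. Reads are kept only if their (P7, P5) index pair
# matches one of the expected sample combinations (allowing one mismatch per index);
# unexpected pairings, as produced by library chimaeras or index hopping, are dropped.
# Index sequences and sample names below are hypothetical.

EXPECTED = {
    ("ACGTACGT", "TTGACCTA"): "sample_A",
    ("GTCAGTCA", "CCATGGAT"): "sample_B",
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def assign(i7, i5, max_mm=1):
    for (e7, e5), sample in EXPECTED.items():
        if hamming(i7, e7) <= max_mm and hamming(i5, e5) <= max_mm:
            return sample
    return None            # unexpected combination: candidate chimaera, discard

# Example observations: the last pair mixes sample_A's P7 with sample_B's P5.
for i7, i5 in [("ACGTACGT", "TTGACCTA"), ("GTCAGTCA", "CCATGGAT"), ("ACGTACGT", "CCATGGAT")]:
    print(i7, i5, "->", assign(i7, i5) or "rejected")
```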

In general, custom-synthesized biotinylated probes are most economical when targeting fairly small regions (hundreds of kilobases to a few megabases) owing to probe synthesis costs. However, microarrays can achieve extremely high probe numbers (approximately 1 million each) and, if manufacturers consent, can be chemically treated to cleave the probes from the microarray surface, thus recovering large sets of probes at relatively reasonable costs71,76,77. Synthetic DNA probes are built into biotinylated probe libraries using biotinylated adaptors of minimal size (~20 bp) to limit interference during probe–target annealing. The known adaptor sequence allows further amplification, thereby immortalizing the probe set at low cost. In this way, Fu et al.77 used 8.7 million probes to recover most of the non-repetitive fraction of chromosome 21 from a 40,000-year-old human specimen from Tianyuan cave, China. In addition, they targeted ~3,500 200-bp-long regions around positions



    Figure 4 | Enriching DNA libraries for ancient inserts. a | Selective uracil enrichment is shown. 5ʹ-phosphate groups are indicated with black circles, single-strand DNA breaks are shown as vertical lines, biotinylated adaptor groups are shown in red, and streptavidin-coated beads are shown in grey. A single-stranded DNA (ssDNA) library is built until the polymerase extension step. DNA is then phosphorylated to enable the ligation of the second adaptor. This contrasts with the ssDNA library procedure, in which the ligation occurs between the 5ʹ end of the second adaptor and the 3ʹ end of the newly synthesized strand (FIG. 3c). DNA is then treated with uracil DNA glycosylase and endonuclease VIII (USER mix) to generate and then cleave out abasic sites at cytosines that were deaminated into uracils post-mortem. The 3ʹ-phosphate groups at these new termini are then removed (not shown). The resulting 3ʹ-OH ends now serve to prime an extension with a DNA polymerase, which copies throughout the whole length of the strand complementary to where the damage was. As a result, the supernatant now contains double-stranded DNA (dsDNA) library templates corresponding to the original deaminated strands. Other library templates remain unaffected and can be separated, as they remain bound to streptavidin-coated paramagnetic beads. b | In whole-genome in-solution capture (WISC), ssDNA templates from an ancient DNA (aDNA) library are prepared. The target, endogenous aDNA is shown as thin black lines, whereas the exogenous contaminating DNA is shown as thin green lines; adaptors are shown as thick blue lines. In parallel, a probe DNA library is prepared from fresh modern DNA extracts (thin red lines) and used to generate biotinylated RNA probes through in vitro transcription. T7 adaptors to enable in vitro transcription are shown in thick purple lines. The aDNA library is annealed to the RNA probes, low-complexity DNA and adaptor blockers (the latter two are not shown for simplicity). The library fraction of interest is then recovered following elution from streptavidin-coated paramagnetic beads. Part a adapted with permission from REF. 68, Cold Spring Harbor Laboratory Press. Part b adapted with permission from REF. 78, The American Society of Human Genetics.


• Mate pairs: Pairs of sequences derived from both ends of a DNA library.

Edit distance: The number of sequence mismatch counts between reads and targets.

known to carry allelic variants in archaic and modern humans, thereby enabling direct estimates of archaic hominin ancestry within the Tianyuan specimen. The method was also used to obtain the exome sequence of two Neanderthals from Spain and Croatia76 and, more recently, sequence data from ~400,000 loci within a single reaction71. This target enrichment procedure reduced the genotyping costs by at least 45-fold per ancient specimen71 and enabled genome-wide analyses of ancient individuals at population scales. In this analysis, two 52-nucleotide-long probes were selected to be located on each side of a polymorphic site, and two were centred on the polymorphic site, each representing one of the two possible alleles.
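As a toy illustration of that four-probe layout, the sketch below generates two flanking 52-mers and two allele-centred 52-mers for a single biallelic site; the sequence, coordinates and exact probe offsets are assumptions for illustration, not the published design files.

```python
# Sketch: the four-probe design described above, for one biallelic site.
# Two 52-mers flank the polymorphic position and two are centred on it,
# one per allele. 'ref_seq' and the SNP position are hypothetical inputs;
# a real design would draw the sequence from the reference genome.

def snp_probes(ref_seq, pos, ref_allele, alt_allele, probe_len=52):
    half = probe_len // 2
    left = ref_seq[pos - probe_len:pos]                    # ends just 5' of the SNP
    right = ref_seq[pos + 1:pos + 1 + probe_len]           # starts just 3' of the SNP
    centred = ref_seq[pos - half:pos] + "{}" + ref_seq[pos + 1:pos + half]
    return {
        "flank_left": left,
        "flank_right": right,
        "centre_ref": centred.format(ref_allele),
        "centre_alt": centred.format(alt_allele),
    }

toy_seq = "ACGT" * 40                                      # 160-bp hypothetical context
probes = snp_probes(toy_seq, pos=80, ref_allele="A", alt_allele="G")
for name, seq in probes.items():
    print(name, len(seq), seq)
```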

Solid-phase target enrichment. Direct application of microarrays can also enrich large sets of targets, using approaches originally described for modern DNA89. First used in the aDNA context to characterize exome sequences from a 49,000-year-old Neanderthal specimen73, microarrays have subsequently enabled whole-genome sequencing from bacterial strains responsible for major historical epidemiological outbreaks7,9–13, including the Black Death7. Microarrays also provide interesting alternatives to real-time PCR and shotgun sequencing for parallel screening of >100 pathogens12,90. This is particularly appropriate for identifying ancient pathogens, which often leave no physical skeletal evidence and are generally found only as trace material. Possible drawbacks are poor detection of the most divergent genomic regions and omission of regions with important genomic rearrangements (such as insertions) or unknown additional plasmids that do not segregate in modern strains.

Whole-genome enrichment. There is a growing interest in characterizing the entire genome sequence of ancient individuals at population scales. However, none of the methods presented above is appropriate for pulling down whole human genomes, as this requires synthesizing gigabases of probes. Whole-genome in-solution capture (WISC)78 and a commercial alternative with similar performance79,80 fill this niche, enabling economical whole-genome enrichment. WISC starts with the preparation of a genome-wide RNA probe library from a species with a genome that is closely related to the target genome in the aDNA sample (FIG. 4b). These RNA probes are generated from a genomic DNA library flanked by adaptors containing T7 promoters that enable a relatively inexpensive reaction, in vitro transcription. This in vitro transcription step is carried out in the presence of biotin-16-UTP, so that the resultant RNA probes are biotinylated. The biotinylated RNA probes are annealed to the ssDNA of a heat-denatured aDNA library, while low-complexity DNA and adaptor-blocking RNA oligonucleotides improve stringency and reduce enrichment for highly repetitive regions. Non-hybridized DNA is washed away, whereas the bound, enriched library fraction is finally released following RNase treatment (which precludes probe carry-over) and amplified before sequencing.

WISC-like approaches consistently improve the proportion of sequences that can be mapped to the human reference genome compared to shotgun sequencing (6–159-fold), at least when based on double-stranded DNA libraries29,78–80. As hybridization efficiency increases with target length79, its efficacy may be reduced when analysing libraries built using single-strand methods16,66, which routinely exhibit smaller mean target molecule sizes. The fraction of reads that align to repetitive regions also generally increases with WISC, despite the use of an excess of low-complexity DNA. Unsurprisingly, WISC-enriched libraries show reduced complexity, so that almost every unique insert can be sequenced with minimal sequencing efforts78. As an example, 5–10 million sequencing reads generated using WISC-enriched libraries of a Bronze Age Danish human hair sample and a pre-Columbian Peruvian human bone were found to cover 7,000–21,000 ancestry-informative markers, which proved to be sufficient for inferring the continental groups that are the closest to these ancient individuals78.

Analysing aDNA
From reads to genome alignments. Most available paleogenomes were generated using Illumina technologies, although there are exceptions5,15,17. Analysis of the underlying sequence data mainly relies on computational approaches developed for handling HTS data from modern DNA material, with some additional particularities. Most procedures are implemented within the open-source PALEOMIX package91, in which reads are trimmed of adaptor sequences using AdapterRemoval92 and collapsed when mate pairs are available and overlap significantly, filtered for a minimal size of 25–30 bp and aligned against reference genomes of interest using Burrows–Wheeler Aligner (BWA)93 or Bowtie 2 (REF. 94). Alignments showing low-quality scores and PCR duplicates are further removed using the MarkDuplicates program from Picard tools, and reads are locally realigned around small insertions and deletions (indels) to improve overall genome quality using the IndelRealigner tool from the Genome Analysis Toolkit (GATK)95. PALEOMIX can also quantify DNA damage levels using mapDamage2 (REF. 48) and perform phylogenomic and metagenomic analyses using modules mostly based on inferences deriving from ExaML (Exascale Maximum Likelihood)96 and MetaPhlAn (Metagenomic Phylogenetic Analysis)97, respectively.
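For orientation, the sketch below strings together the kind of trimming, seed-disabled mapping, duplicate removal and damage profiling that PALEOMIX automates, by shelling out to the tools named above. It is a compressed illustration, not the PALEOMIX pipeline itself: invocation styles (for example, for Picard) and flags vary between tool versions and should be checked against each tool's documentation, and all file names are placeholders.

```python
# Sketch of an aDNA read-processing workflow of the kind PALEOMIX automates:
# adaptor trimming and read-pair collapsing, length filtering, seed-disabled BWA
# mapping, duplicate removal and damage profiling. Flags are typical but should be
# verified against each tool's documentation; file names are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

REF = "reference.fa"

# 1. Trim adaptors, collapse overlapping mate pairs, drop reads shorter than 25 bp.
run(["AdapterRemoval", "--file1", "reads_R1.fastq.gz", "--file2", "reads_R2.fastq.gz",
     "--basename", "sample", "--collapse", "--trimns", "--trimqualities",
     "--minlength", "25"])

# 2. Map collapsed reads with BWA aln, disabling seeding (-l 1024) as is common for
#    short, damaged reads, and relaxing the edit distance for divergent references.
run(["bwa", "aln", "-l", "1024", "-n", "0.01", "-o", "2", "-t", "4",
     "-f", "sample.sai", REF, "sample.collapsed"])
run(["bwa", "samse", "-f", "sample.sam", REF, "sample.sai", "sample.collapsed"])

# 3. Sort, remove PCR duplicates and index.
run(["samtools", "sort", "-o", "sample.sorted.bam", "sample.sam"])
run(["picard", "MarkDuplicates", "I=sample.sorted.bam", "O=sample.rmdup.bam",
     "M=sample.dup_metrics.txt", "REMOVE_DUPLICATES=true"])
run(["samtools", "index", "sample.rmdup.bam"])

# 4. Quantify post-mortem damage patterns.
run(["mapDamage", "-i", "sample.rmdup.bam", "-r", REF])
```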

Unlike sequences in other re-sequencing genome projects, in which mismatches relative to the reference genome generally are derived from sequencing errors and polymorphisms, aDNA sequences exhibit substantial fractions of nucleotide misincorporations that result from sequencing damaged bases. As these misincorporations cluster towards read termini, seeding approaches, whereby only the most upstream part of the sequence is used for speeding up identification of possible alignments along the genome, should be avoided98. Parameters controlling acceptance thresholds for read-to-reference edit distance should be adapted to the phylogenetic distance to the reference genome, as overly conservative procedures will under-represent the


Further information: PALEOMIX (https://github.com/MikkelSchubert/paleomix); AdapterRemoval (https://github.com/slindgreen/AdapterRemoval); BWA (http://bio-bwa.sourceforge.net/); Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml); Picard (http://broadinstitute.github.io/picard); GATK (https://www.broadinstitute.org/gatk/); mapDamage (http://ginolhac.github.io/mapDamage/); ExaML (http://sco.h-its.org/exelixis/web/software/examl/index.html); MetaPhlAn (http://huttenhower.sph.harvard.edu/metaphlan).

• Probabilistic aligners: Mapping algorithms that can accommodate non-uniform distributions of sequencing errors along reads, generally leading to improved alignments between reads and reference genomes.

Thermal age: The predicted time that it would have taken an archaeological sample to produce the observed degree of DNA degradation were the sample exposed to a constant temperature of 10 °C since deposition. Thermal age has been proposed to adjust the chronological age of a sample to its thermal history and to help in predicting the likelihood of DNA surviving in archaeological remains.

Haplotypes: The DNA sequences of haploid chromosomes.

Derived alleles: Alleles that are evolutionarily derived in a lineage of interest and that are not represented in an ancestral population or species.

Ancestral alleles: Alleles in the ancestral state before a mutation took place in a descending population, species or lineage.

Nearly fixed: Fixed alleles are those that are derived and present in all individuals in a descendent population or species. Nearly fixed alleles therefore represent those that are present in nearly all individuals (thus close to fixation, for example, showing allelic frequencies of 99% in the population).

most polymorphic regions and under-estimate heterozygosity levels. Conversely, overly permissive procedures will inflate the alignment false-positive rate, resulting in regions with many reads from different organisms, which is a particular challenge for aDNA data, given its complex mixture of endogenous and exogenous reads52,57.

Owing to the accumulation of nucleotide misincorporation towards read ends, probabilistic aligners based on position-scoring matrices have been developed to embed aDNA features from the aligning step. Available aligners include Mapping Iterative Assembler (MIA)69, ANFO Short Read Aligner/Mapper6 and BWA-PSSM (position-specific scoring matrix)99, and these generally show good performance for short reads and/or low-quality data, although some show running times that are compatible only with alignments against relatively small reference genomes (for example, mitochondrial genomes). Importantly, such probabilistic approaches handle platform-specific error profiles in a sound statistical framework.

Authenticating aDNA data. Following read alignment, analyses often focus on authenticating whether sequencing data are ancient. Software such as mapDamage45,100 or pmdtools101 can test the presence of typical nucleotide misincorporation patterns that result from inflated cytosine deamination rates at overhangs. Such patterns can be first obtained by preparing libraries on an aliquot of the DNA extract, while saving the remaining fraction for preparing almost damage-free libraries following USER treatment81. This will limit nucleotide misincorporation effects on downstream analyses. Alternatively, mild USER treatment, which removes most, but not all, of the damage signature, has been proposed to enable sequence authentication and population analyses using the same sample aliquot72.

Nucleotide misincorporation patterns can be exploited to fit statistical models of post-mortem DNA damage and estimate cytosine deamination rates and nick frequencies44,48. Even though deamination rates at overhangs were reported to increase linearly with time across a wide range of archaeological sites and preservation conditions46, this pattern has not been confirmed within archaeological sites in permafrost17 or temperate environments72. Additionally, different remains from the same specimen and/or extracts from the same remain can show variable levels of DNA damage27,32. This suggests complex relationships in which both global conditions, as reflected in the thermal age of a given specimen, and microenvironmental factors (within and between remains) drive the amount of DNA damage ultimately measured. In our opinion, these complex relationships, and the dependency of damage quantification on the library preparation and amplification procedures, preclude the use of strict minimal thresholds of expected DNA damage levels as authentication criteria. Thus, quantitative comparison with the levels observed for samples excavated at the same or similar archaeological sites, and processed with the same experimental tools, is recommended.

Statistical damage models also allow correction of base quality scores depending on their probability of being the result of nucleotide misincorporations at damage sites48, thus limiting their possible effect on downstream analyses. However, we emphasize that for low-coverage data — in which mismatches are observed on a few reads at best and penalized when close to read termini — this procedure can potentially inflate the genetic proximity to the reference genome. SNP calling can also benefit from genotype callers, such as SNPest4,102, that explicitly model post-mortem DNA damage as a possible source of error. Furthermore, nucleotide misincorporation patterns can be used by computational tools to sort the fraction of reads that show evidence of post-mortem damage101, which is useful when there is substantial modern DNA contamination. Although extremely conservative and not cost-effective (as not all aDNA molecules carry post-mortem DNA damage and many true aDNA reads will be discarded), damage-based filtering approaches have shown great success in characterizing whole-mitochondrial sequences from extensively contaminated Neanderthal specimens101 and an ~400,000-year-old hominin32. Finally, comparing analytical outcomes when considering the full population of reads or only the most damaged fraction (and disregarding mutations, such as transitions, that derive from post-mortem damage40–44) can provide evidence that the results are not driven by damage and contamination artefacts103.
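The idea behind such damage-based read sorting can be caricatured with a per-read score. The sketch below is a deliberately simplified stand-in for the likelihood-based PMD score of pmdtools: terminal C→T (and complementary G→A) mismatches are weighted by an exponential decay from the read ends, and reads above a chosen threshold would be treated as plausibly ancient; the sequences and the decay scale are illustrative assumptions.

```python
# Sketch: a deliberately simplified per-read damage score, standing in for the
# likelihood-based PMD score of pmdtools. C->T mismatches are weighted by an
# exponential decay from the 5' end (and G->A from the 3' end, as expected for
# double-stranded libraries). Sequences below are toy examples in read orientation.
import math

def damage_score(read, ref, scale=5.0):
    score = 0.0
    n = len(read)
    for i, (q, r) in enumerate(zip(read, ref)):
        if r == "C" and q == "T":                 # 5'-end deamination signature
            score += math.exp(-i / scale)
        if r == "G" and q == "A":                 # 3'-end complement signature
            score += math.exp(-(n - 1 - i) / scale)
    return score

ref_  = "CCGATTACAGGCATGCAAGG"
old   = "TCGATTACAGGCATGCAAAG"                    # C->T at the start, G->A near the end
fresh = "CCGATTACAGGCATGCAAGG"
for name, seq in (("damaged-looking", old), ("undamaged", fresh)):
    print(name, round(damage_score(seq, ref_), 2))
```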

In addition to revealing nucleotide misincorporation patterns, mapDamage also delivers the base composition of the genomic regions directly flanking DNA inserts and therefore tests depurination as the main driver for DNA fragmentation44,48,100, which can also help authentication. This pattern is substantially affected following USER treatment, which mainly cleaves DNA downstream of unmethylated cytosine residues, therefore resulting in an excess of cytosines at genomic positions just preceding read starts16,72.

Estimating contamination levels. Nucleotide misincorporation and base compositional patterns can be detected in even substantially contaminated samples. This can happen when treating the outer sample surface with bleach before DNA extraction, which can help to remove a fraction of fresh DNA contaminants but also introduces signatures of DNA damage within the remaining contaminants104. This can also happen when a mixture of highly degraded aDNA templates and undamaged DNA contaminants is incorporated into libraries. A suite of tools has thus been developed for further authenticating aDNA data (in particular for human aDNA). The current methods available exploit the sequence information at sites and/or haplotypes with known variation across species and/or populations. For example, modern human contamination in Neanderthal HTS data has been estimated using the relative proportion of derived alleles and ancestral alleles observed at mitochondrial sites showing nearly fixed derived alleles in modern humans6. A similar rationale was used to estimate the possible contribution of different human population backgrounds105


Further information: MIA (http://mia-assembler.sourceforge.net/); ANFO (https://bioinf.eva.mpg.de/anfo/); BWA-PSSM (http://bwa-pssm.binf.ku.dk/); mapDamage (http://ginolhac.github.io/mapDamage/); pmdtools (https://code.google.com/p/pmdtools/); SNPest (https://github.com/slindgreen/SNPest).

• Epialleles: Allelic variants showing identical genetic sequences but different epigenetic marks, such as different methylation patterns.

    or species83 to final mitochondrial consensus sequences. A statistically more powerful contamination estimator for mitochondrial reads that uses linkage information at the read level has been developed74.

As the cellular mitochondrial number is variable across cell types and tissues, contamination estimates based on mitochondrial sequence data do not directly reflect the true contamination levels of the nuclear genome106. Heterozygosity levels observed on male X chromosomes can be used as a nuclear contamination proxy. As males are haploid for most X chromosome loci, base discordance between overlapping reads should result only from sequencing errors and should be distributed randomly along the chromosome. However, if modern human DNA contamination is present, discordance rates should inflate at sites that are polymorphic within contemporary populations14. For archaic hominin specimens, nuclear contamination rates can be calculated from fixed alleles that are derived in modern humans6. For female ancient human samples, the presence of sequences that are known to be unique to the Y chromosome can also reflect the presence of contamination from male-derived sources106. Triallelic sites at autosomes could potentially be used in the future to estimate levels of nuclear contamination with modern human DNA, irrespective of the sample gender.
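To make the X-chromosome rationale concrete, the sketch below implements a much-simplified method-of-moments estimate: at X-linked sites known to be polymorphic in potential contaminant populations, reads carrying the non-consensus allele in excess of the background error rate (measured at nearby monomorphic sites) are attributed to contamination. The published estimators are likelihood-based and handle many complications ignored here; all counts and frequencies below are toy values.

```python
# Sketch: a method-of-moments contamination estimate from the X chromosome of a
# male sample, much simplified relative to published (likelihood-based) estimators.
# At sites known to be polymorphic in potential contaminants, reads carrying the
# non-consensus allele in excess of the background error rate (measured at nearby
# monomorphic sites) are attributed to contamination. Counts below are toy numbers.

def contamination_estimate(poly_sites, err_mismatch, err_total):
    e = err_mismatch / err_total                     # background error rate
    num = den = 0.0
    for non_consensus_reads, depth, contam_freq in poly_sites:
        num += non_consensus_reads - e * depth       # excess non-consensus observations
        den += (contam_freq - e) * depth             # expected per unit of contamination
    return num / den

# (non-consensus reads, depth, frequency of that allele in the putative contaminant
#  population) for each polymorphic X-linked site:
toy_sites = [(3, 100, 0.40), (1, 80, 0.25), (4, 120, 0.55)]
c = contamination_estimate(toy_sites, err_mismatch=50, err_total=20_000)
print(f"estimated contamination: {c:.1%}")
```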

Genome completion and error rates. Reliable contamination estimates can generally be recovered from the data aligning to the X chromosome using even low-depth information, as long as each single genomic position is covered once on average (that is, ~1× coverage). Ultimately, the exact fraction of the genome that is covered depends on the sequencing effort and the sequence length. For aDNA sequence reads of 60 nucleotides, ~87% of the human genome is non-repetitive, and therefore reads of similar size (or shorter) cannot be uniquely aligned to the remaining ~13% of the genome4. For example, the genome of a Paleo-Eskimo Greenlander of the Saqqaq culture was sequenced to ~16× coverage, with ~20% of the genome missing. This achieved ~20× coverage at positions covered at least once, although some variation was observed along the chromosomes, as half of the positions showed a depth of coverage of ≤7×. Using DNA polymerases that reduce size and base compositional biases during library amplification62 can help to limit such variation, although nucleosome protection can also lead to specific patterns of depth-of-coverage variation along the genome (see below).

Overall sequence accuracy of ancient genomes is another parameter that is worth considering, as sequencing errors will have an impact on downstream analyses. Genome-wide error rates are generally estimated using three-way alignments that include the genome of a closely related outgroup (for example, the chimpanzee) and a high-quality genome from a living conspecific individual107. The excess of derived alleles observed in the genome of the ancient individual provides an estimate for its error rate relative to the high-quality modern genome. Unsurprisingly, this rate is highly dependent on DNA damage levels and the molecular tools used before and during sequencing. The best alternative developed so far involves USER treatment followed by paired-end sequencing81, which can generally reduce error rates.
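The following sketch illustrates the three-way-alignment idea with toy aligned sequences: sites at which only the ancient genome differs from the outgroup, in excess of those at which only the modern conspecific genome does, are attributed to errors (including residual damage). The input format and function are hypothetical simplifications, not the published implementation.

# A hedged sketch of the three-way-alignment idea described above: counting
# apparent lineage-specific derived alleles in an ancient genome versus a
# high-quality modern conspecific, with an outgroup used to polarize sites.
# Inputs are simple strings of aligned bases; this is an illustration only.

def excess_error_rate(ancient, modern, outgroup, missing="N-"):
    """Return (ancient-specific - modern-specific changes) per aligned site."""
    ancient_specific = modern_specific = aligned = 0
    for a, m, o in zip(ancient, modern, outgroup):
        if any(x in missing for x in (a, m, o)):
            continue                       # skip gaps / uncalled bases
        aligned += 1
        if a != o and m == o:
            ancient_specific += 1          # derived (or erroneous) only in ancient
        elif m != o and a == o:
            modern_specific += 1           # derived only in the modern genome
    return (ancient_specific - modern_specific) / aligned if aligned else float("nan")

# Toy alignment: two ancient-specific changes and no modern-specific ones
out = "ACGTACGTACGTACGTACGT"
mod = "ACGTACGTACGTACGTACGA"
anc = "ACGTACGTATGTACGTTCGA"
print(excess_error_rate(anc, mod, out))    # 2 excess changes over 20 sites -> 0.1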

Ghost population: An unsampled population that exchanges migrants with other sampled populations and that can be identified based on admixture signatures left in descending populations.

Introgressive block lengths: Population admixture introduces a mosaic of ancestry blocks along the genome, the lengths of which decrease with each subsequent generation owing to recombination. Introgressive block lengths can therefore be exploited to determine the date of admixture events.

found to correlate with known methylation levels at promoters, exons, introns and CpG islands. This was not observed for other CpN dinucleotides, confirming that methylation drives the signal. The authors also used CpG→TpG substitutions at read starts (where deamination rates are maximal) to infer ancient methylation levels for genomic regions overlapping the loci from the Illumina Infinium HumanMethylation450 BeadChip array. The inferred methylation profile from the Saqqaq sample was found to cluster with hair follicle methylation profiles, which is in agreement with the tissue originally used for DNA extraction.
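A minimal sketch of the underlying read-level signal is given below, assuming a library in which deaminated unmethylated cytosines (uracils) are not sequenced (for example, owing to USER/UDG treatment or a uracil-intolerant polymerase): the C→T rate observed at reference CpG positions near the 5′ ends of reads then tracks methylation. Read records and parameters here are hypothetical.

# A minimal sketch of the read-level signal described above: in libraries where
# deaminated unmethylated cytosines (uracils) are not sequenced, post-mortem
# deamination of methylated cytosines (mC -> T) survives, so the C->T rate at
# reference CpG positions near read 5' ends (where deamination is highest)
# tracks methylation. Read records are hypothetical (reference_start, sequence)
# tuples on the forward strand, with no indels.

def cpg_to_tpg_rate(reads, reference, window=5):
    """Fraction of reference CpG cytosines, observed within `window` bases of a
    read's 5' end, that are read as T rather than C."""
    t_obs = c_obs = 0
    cpg_positions = {i for i in range(len(reference) - 1)
                     if reference[i:i + 2] == "CG"}
    for start, seq in reads:
        for offset in range(min(window, len(seq))):
            ref_pos = start + offset
            if ref_pos in cpg_positions:
                base = seq[offset]
                if base == "T":
                    t_obs += 1
                elif base == "C":
                    c_obs += 1
    total = t_obs + c_obs
    return t_obs / total if total else float("nan")

# Toy usage: one CpG at position 3; two of four reads show C->T at their 5' end
ref = "AAACGTTACGTT"
reads = [(3, "CGTTA"), (3, "TGTTA"), (2, "ACGTT"), (3, "TGTTA")]
print(cpg_to_tpg_rate(reads, ref))   # 0.5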

Genome-wide nucleosome maps. Library inserts derived from endogenous aDNA generally show unimodal size distributions that are typically centred around 40–80 bp. However, several aDNA sequence data sets exhibit striking 10-bp periodicity in their size distribution4,21,25,26,34,79. Pedersen et al.34 proposed that this results from nucleosome protection, with DNA fragmentation preferentially occurring at nucleotides facing away from nucleosomes. Assuming that nucleosomes are strongly positioned and phased along DNA scaffolds, and recalling that the turn of the DNA helix is 10 bp long, only 1 nucleotide per 10 bp would be fully exposed to hydrolysis. If this is true, then nucleosome protection should also drive additional patterns. For example, DNA fragmentation should occur preferentially within spacers, which are nucleosome-free regions of ~50 bp separating successive ~150-bp DNA blocks covered by nucleosomes. Fewer endogenous reads should therefore map to spacer regions, leading to depth-of-coverage periodicities of ~200 bp, with peaks of coverage corresponding to nucleosome centres and correlating with both in silico predicted and experimentally derived nucleosome maps. These predicted periodicities were confirmed in the Saqqaq sample data, even following correction for base compositional effects, which can substantially affect depth-of-coverage variation during library amplification62. This finding, together with expected patterns of methylation and depth of coverage within CTCF regions and splicing sites, confirmed the nucleosome protection hypothesis.
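One generic way to look for the predicted ~200-bp depth-of-coverage periodicity is sketched below: the autocorrelation of a mean-centred per-base coverage track is scanned for a peak within a plausible lag range. This is a signal-processing illustration, not the analysis pipeline used in the studies cited above.

# A small illustration of how the predicted ~200-bp depth-of-coverage periodicity
# could be looked for: compute the autocorrelation of a mean-centred per-base
# coverage track and report the lag (within a plausible range) where it peaks.

import math

def autocorrelation_peak(coverage, min_lag=150, max_lag=250):
    """Return the lag in [min_lag, max_lag] maximizing the autocorrelation of the
    mean-centred coverage signal; ~200 bp would be consistent with ~150-bp
    nucleosome-protected blocks separated by ~50-bp spacers."""
    n = len(coverage)
    mean = sum(coverage) / n
    centred = [c - mean for c in coverage]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Toy track with a built-in 200-bp period (peaks ~ nucleosome centres)
coverage = [10 + 5 * math.cos(2 * math.pi * i / 200) for i in range(4000)]
print(autocorrelation_peak(coverage))   # 200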

Nucleosomes might protect DNA from cleavage that occurs during cellular apoptosis and/or post-mortem34. As similar periodicity patterns have been found not only in ancient hair follicles4,34, which have undergone extensive apoptosis, but also in other ancient tissues that are not particularly affected by apoptosis, such as teeth21 and bones25,26,79, we expect that ancient nucleosome maps could, in the future, be reconstructed across a wide range of samples. Recalling that such patterns are also absent from many of the samples analysed so far, further work is needed to understand which factors drive the preservation of signatures of nucleosome protection after death.

Assessing ancient gene expression levels. Post-mortem DNA damage enables the reconstruction of ancient methylome and nucleosome maps. Given the central role of epigenetic states in regulating chromatin accessibility to transcription factors, this information can be tentatively used to infer ancient gene expression levels. Encouragingly, methylation ratios between gene bodies and promoter regions (a proxy for gene expression) showed strong correlation with hair follicle expression levels measured using high-throughput RNA sequencing (RNA-seq)34. However, further work is needed to develop genuine proxies that accurately measure ancient gene expression levels. The epigenome of each cell type is complex, and ancient samples will necessarily span a range of tissues, with unbalanced contributions from different cell types, which will possibly result in variable validity of expression predictions across samples, age, sex and health conditions. As one example, genome hypermethylation is a known response to viral infection

    Box 2 | Reconstructing population histories

    One of the most common first steps in the analysis of genome-wide data from ancient humans is the characterization of their closest relatives among modern populations. Such inferences are generally based on principal component analysis (PCA) or statistical clustering, using software such as Admixture126. A benefit of statistical clustering is that it also enables documentation of contamination levels through determining whether the ancient samples exhibit a genetic contribution that could be derived from the research team4. With shotgun sequencing at low depth of coverage (for example, ≤8×), genotypes cannot be reliably determined, and analyses are generally performed using pseudo-haploid data in which sequence reads from many loci consist of a random sampling of only one of the two constituent alleles, and thus individuals are considered to be homozygous for the unique allele sampled at a given locus. The genomic regions covered across multiple individuals are then also limited, which reduces the number of orthologous loci overlapping known genetic variation in modern populations. In such cases, the ancestry of each ancient individual can be determined using multidimensional scaling (MDS), which exploits pairwise measures of genetic distances in a panel of individuals, calculated by normalizing the sum of all instances where two individuals show different alleles by the total number of loci with no missing data in each pair. This procedure is implemented in the bammds package127. Additionally, Procrustes transformation of individual PCA projections based on the particular vector of single-nucleotide polymorphisms covered in each specimen and the same reference panel can help to visualize the population affinities of a group of ancient individuals within a single analysis103.
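The sketch below illustrates, under simplifying assumptions, the pseudo-haploid calling and pairwise-mismatch distance described above (the kind of matrix on which bammds-style MDS operates); data structures are hypothetical and missing data are encoded as None.

# A hedged sketch of the pseudo-haploid logic described above: at each locus one
# sequenced allele is drawn at random per individual, and pairwise distances are
# computed as the fraction of mismatching loci among loci with no missing data in
# the pair. Data structures are hypothetical; None marks missing data.

import random

def pseudo_haploid_call(reads_at_locus, rng=random):
    """Sample a single allele from the reads covering one locus (or None)."""
    return rng.choice(reads_at_locus) if reads_at_locus else None

def pairwise_distance(calls_a, calls_b):
    """Mismatch fraction over loci called in both individuals."""
    shared = [(a, b) for a, b in zip(calls_a, calls_b)
              if a is not None and b is not None]
    if not shared:
        return float("nan")
    return sum(a != b for a, b in shared) / len(shared)

# Toy usage: two low-coverage ancient individuals over five loci
random.seed(1)
ind1 = [pseudo_haploid_call(r) for r in [["A", "A"], ["G"], [], ["T", "C"], ["C"]]]
ind2 = [pseudo_haploid_call(r) for r in [["A"], ["G", "G"], ["T"], ["C"], ["T"]]]
print(ind1, ind2, pairwise_distance(ind1, ind2))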

However, PCA-based approaches reflect not only population ancestries but also the temporal sampling between ancient and modern individuals128. Thus, at best, MDS, PCA and clustering analyses should be viewed as formulating evolutionary hypotheses, which subsequently require testing using approaches such as model-based inference, as well as coalescence simulations14, D-statistics129 and population f-statistics130. Population f-statistics methods, such as the f3-statistics, have been developed for detecting populations with mixed ancestries and identifying populations that are closest to ancient individuals130. D-statistics has received particular attention because it originally supported the theory that admixture occurred between Neanderthals and non-African modern humans6. D-statistics is based on four-way alignments that include one outgroup (O) and three populations (H1, H2 and H3), of which two (H1 and H2) are more closely related. For example, in the case of Neanderthals, with the following configuration (O = Chimpanzee, H3 = Neanderthals; H2 = Eurasians, H1 = Africans), positive D-statistics indicate an excess of shared polymorphisms and possible admixture between Neanderthals and Eurasians6,16,22. However, this observation is also compatible with gene flow into Africans from a currently unsampled and divergent ghost population129, as well as with population subdivision in Africa, with Neanderthal and Eurasian ancestors leaving Africa from related population backgrounds131. Admixture events can be further dated from the distribution of introgressive block lengths in modern and ancient individuals27,122,132, as recombination reduces their size over time. The resulting date seemed to be too recent to be compatible with a scenario involving population subdivision in Africa, which confirmed admixture with Neanderthals outside Africa.
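As an illustration of the counting logic behind the D-statistics discussed in this box, the following sketch computes D = (nABBA − nBABA)/(nABBA + nBABA) from outgroup-polarized pseudo-haploid calls; it omits the block jackknife used in published analyses to obtain standard errors, and the population labels in the toy example are for orientation only.

# A minimal ABBA/BABA-style illustration of the D-statistic configuration described
# in this box. Genotypes are coded 0 (ancestral, matching the outgroup allele) or
# 1 (derived) for single pseudo-haploid calls in H1, H2 and H3; the outgroup
# defines the ancestral state. Counting logic only (no block jackknife).

def d_statistic(h1, h2, h3):
    """D = (nABBA - nBABA) / (nABBA + nBABA); a positive value indicates that H2
    shares more derived alleles with H3 than H1 does."""
    abba = baba = 0
    for a, b, c in zip(h1, h2, h3):
        if None in (a, b, c):
            continue                      # skip missing data
        if a == 0 and b == 1 and c == 1:
            abba += 1                     # H2 shares the derived allele with H3
        elif a == 1 and b == 0 and c == 1:
            baba += 1                     # H1 shares the derived allele with H3
    total = abba + baba
    return (abba - baba) / total if total else float("nan")

# Toy usage with H1 = Africans, H2 = Eurasians, H3 = Neanderthals (outgroup-polarized)
h1 = [0, 0, 1, 0, 0, 1, 0, 0]
h2 = [0, 1, 0, 1, 0, 1, 1, 0]
h3 = [0, 1, 0, 1, 1, 1, 0, 1]
print(d_statistic(h1, h2, h3))            # 2 ABBA versus 0 BABA -> 1.0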


Further information: ADMIXTURE (http://www.genetics.ucla.edu/software/); bammds (http://dna.ku.dk/~sapfo/bammds.html).

CTCF regions: Genomic regions targeted by CCCTC-binding factor (CTCF) and involved in regulating the three-dimensional structure of chromatin and transcription by mediating long-range interactions between genomic sequences.

Admixture: Interbreeding of individuals from multiple population origins, resulting in the introduction of DNA from one population into the genomes of a second population.

in plants, and methylation assays for ancient plant material can therefore be used to monitor viral exposure in ancient populations109.
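Returning to the expression proxy introduced above (the methylation ratio between gene bodies and promoter regions), the following is a rough sketch with hypothetical coordinates and an arbitrary pseudocount; inferred methylation values would come from CpG→TpG-based estimates such as those described earlier, and the function is illustrative rather than a published method.

# A rough sketch of the expression proxy mentioned above: for each gene, the ratio
# of inferred methylation in the gene body to that in the promoter region, which
# in the Saqqaq analysis correlated with hair-follicle RNA-seq levels. Gene
# coordinates and the pseudocount are illustrative choices only.

def expression_proxy(methylation, promoter_region, gene_body, pseudocount=0.01):
    """methylation: dict position -> inferred methylation level (0..1).
    Returns mean gene-body methylation divided by mean promoter methylation;
    higher ratios are expected for more highly expressed genes."""
    def mean_over(region):
        values = [methylation[p] for p in range(*region) if p in methylation]
        return (sum(values) / len(values)) if values else 0.0
    return (mean_over(gene_body) + pseudocount) / (mean_over(promoter_region) + pseudocount)

# Toy usage: a hypermethylated gene body and a hypomethylated promoter
meth = {p: 0.8 for p in range(1000, 1100)}          # gene-body CpGs
meth.update({p: 0.1 for p in range(900, 950)})      # promoter CpGs
print(expression_proxy(meth, promoter_region=(900, 1000), gene_body=(1000, 1200)))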

Conclusions
Recent technical developments have enhanced our understanding of the properties of aDNA molecules and how we should best proceed to maximize their retrieval. In some environments, this enables genomic characterization throughout much of the past million years17,31,32. Ongoing research and the increasing wealth of sequencing data generated will undoubtedly further improve current approaches in the near future. DNA extraction represents an area with great potential for improvement, especially if tailored to the molecular structures, niches and microenvironmental parameters that best preserve DNA.

The discovery that post-mortem cytosine deamination preferentially occurs at overhangs was important for the development of authentication criteria44. However, other base modifications, including pyrimidine derivatives, have been identified39. Improved characterization of the chemical features of aDNA molecules, as well as their methylation and nucleosome protection patterns, could therefore open new avenues for data authentication. This will also improve our ability to correct sequence analyses for as-yet-unidentified biases and provide opportunities for targeting damaged templates before sequencing. The development of engineered DNA polymerases that can bypass specific DNA lesions introduced post-mortem110 could also facilitate library construction and amplification.

Importantly, although the approaches outlined here improve aDNA retrieval and analyses, the HTS technologies themselves had the greatest impact on the field. Although not originally designed for aDNA, their massive throughput coupled with their ability to sequence short molecules rendered them ideal for aDNA applications. Therefore, it is likely that future HTS platforms that directly sequence DNA bases and their modifications with minimal (if any) library preparation will drive the future of aDNA research. The results of the initial application of true single-molecule DNA sequencing are encouraging, having demonstrated substantial improvement in relative amounts of accessible endogenous sequences17,45,56.

Although most paleogenomic studies have focused on a limited number of individuals, current approaches allow the characterization of genome-wide SNP variation at ancient population scales71,111. Future studies can be expected to investigate genetic variation in large population samples on the high-density SNP or even whole-genome scale, thus improving our understanding of past demographic, adaptive and admixture trajectories in greater detail112.

Besides delivering ancient genomes and epigenomes, new methodological developments have also provided access to ancient transcriptomes113,114 and proteomes17,115,116. Owing to the biochemical processes inherent in animal cell death, animal tissues are unlikely to represent good reservoirs for long-term RNA survival. However, suitable materials do exist in organisms that do not undergo autolysis. One example is the plant seed, a tissue that requires RNA survival for germination and for which ancient RNA survival going back hundreds to thousands of years has been demonstrated113,114. Such materials may contribute to our understanding of how gene expression pathways have been remodelled during domestication. Additionally, a wide range of ancient proteins have been sequenced from Late115 and Middle17 Pleistocene specimens. With half-lives exceeding those of DNA, ancient peptides might be the only way to retrieve genetic information from the early Pleistocene and even earlier time periods. Within a much more recent time range, namely the past few thousand years, studies of proteins have already delivered information that is not obtainable from DNA, such as whether milk products were already consumed in particular ancient societies117. Molecular analyses of dental plaque, which offers a rich reservoir entrapping biomolecules derived not only from the host but also from its diet and the oral microbiome118,119, may also hold great promise, especially now that computational approaches have been developed to compare the diversity of past and present microbiomes57.

    A final question worth considering is whether the technological breakthroughs in ancient genomics may offer pathways towards de-extinction120. Bringing back lost species is of growing interest, and although it is a topic fraught with challenges ranging from the ethical to the technological, for many extinct species a key starting requisite will be a well-characterized reference genome. As new extraction and computational methods expand the age range and quality of specimens from which such data can reliably be obtained, so too will the range of species that could be considered as possible targets for de-extinction attempts.


Figure 5 | Tracking ancient nucleosome and methylation maps. DNA wrapped around nucleosomes can be protected post-mortem and over-represented in high-throughput sequencing (HTS) data. Therefore, depth-of-coverage patterns along the genome can be exploited to position nucleosomes on ancient genomes. Similarly, post-mortem deamination at CpG sites transforms methylated CpG (mCpG) sites into TpG sites but transforms unmethylated CpG sites into UpG sites. With molecular tools preventing the sequencing of UpGs, CpG→TpG mutations in HTS data provide an opportunity to detect ancient mCpGs, with hypomethylated regions showing low CpG→TpG conversion rates and hypermethylated regions showing high CpG→TpG conversion rates. Adapted with permission from REF. 38, American Association for the Advancement of Science.


1. Higuchi, R., Bowman, B., Freiberger, M., Ryder, O. A. & Wilson, A. C. DNA sequences from the quagga, an extinct member of the horse family. Nature 312, 282–284 (1984).

    2. Hagelberg, E., Sykes, B. & Hedges, R. Ancient bone DNA amplified. Nature 342, 485 (1989).

    3. Pääbo, S. Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proc. Natl Acad. Sci. USA 86, 1939–1943 (1989).

4. Rasmussen, M. et al. Ancient human genome sequence of an extinct Paleo-Eskimo. Nature 463, 757–762 (2010). This study takes advantage of the relative absence of environmental microorganisms within ancient hairs to characterize the first high-quality ancient human genome.

    5. Miller, W. et al. Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456, 387–390 (2008).

6. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010). This paper reports the first draft genome of an archaic hominin and many methodological developments that are still commonly used for characterizing and analysing ancient genomes.

7. Bos, K. I. et al. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478, 506–510 (2011). This paper reports the first genome isolated from an ancient pathogenic bacterium, confirming the Black Death as a plague epidemic. It revealed that no derived variant is unique to the medieval strain, suggesting that non-genetic factors enhanced the virulence of the pathogen.

    8. Martin, M. D. et al. Reconstructing genome evolution in historic samples of the Irish potato famine pathogen. Nat. Commun. 4, 2172 (2013).

    9. Schuenemann, V. J. et al. Genome-wide comparison of medieval and modern Mycobacterium leprae. Science 341, 179–183 (2013).

    10. Bos, K. I. et al. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature 514, 494–497 (2014).

11. Devault, A. M. et al. Second-pandemic strain of Vibrio cholerae from the Philadelphia cholera outbreak. N. Engl. J. Med. 370, 334–340 (2014).

    12. Devault, A. M. et al. Ancient pathogen DNA in archaeological samples detected with a microbial detection array. Sci. Rep. 4, 4245 (2014).

    13. Wagner, D. M. et al. Yersinia pestis and the Plague of Justinian 541–543 AD: a genomic analysis. Lancet Infect. Dis. 14, 319–326 (2014).

    14. Rasmussen, M. et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011).

    15. Keller, A. et al. New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. Nat. Commun. 3, 698 (2012).

16. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012). This paper describes a novel method for constructing aDNA libraries using ssDNA templates, which enabled the characterization of the Denisovan genome at a quality rivalling that of modern genomes, starting from only minute amounts of DNA extracts.

17. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013). This study takes advantage of both second-generation (high-throughput, and library- and amplification-dependent) and third-generation (high-throughput, and library- and amplification-independent) sequencing technologies to present the oldest genome sequence hitherto characterized: that of an ~700,000-year-old horse.

    18. Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5, 5257 (2014).

    19. Jónsson, H. et al. Speciation with gene flow in equids despite extensive chromosomal plasticity. Proc. Natl Acad. Sci. USA 111, 18655–18660 (2014).

    20. Malaspinas, A. S. et al. Two ancient human genomes reveal Polynesian ancestry among the indigenous Botocudos of Brazil. Curr. Biol. 24, R1035–R1037 (2014).

    21. Olalde, I. et al. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature 507, 225–228 (2014).

    22. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).

    23. Raghavan, M. et al. The genetic prehistory of the New World Arctic. Science 345, 1255832 (2014).

    24. Raghavan, M. et al. Upper Paleolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).

    25. Rasmussen, M. et al. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 506, 225–229 (2014).

    26. Schubert, M. et al. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc. Natl Acad. Sci. USA 111, E5661–E5669 (2014).

    27. Seguin-Orlando, A. et al. Genomic structure in Europeans dating back at least 36,200 years. Science 346, 1113–1118 (2014).

    28. Ramirez, O. et al. Genome data from a sixteenth century pig illuminate modern breed relationships. Heredity 114, 175–184 (2015).

    29. Schroeder, H. et al. Genome-wide ancestry of 17th century enslaved Africans from the Caribbean. Proc. Natl Acad. Sci. USA 112, 3669–3673 (2015).

    30. Metzker, M. L. Sequencing technologies — the next generation. Nat. Rev. Genet. 11, 31–46 (2010).

    31. Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).

    32. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403–406 (2014).

    33. Gokhman, D. et al. Reconstructing the DNA methylation maps of the Neandertal and the Denisovan. Science 344, 523–527 (2014).

34. Pedersen, J. S. et al. Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome. Genome Res. 24, 454–466 (2014). This study exploits DNA degradation patterns in HTS data sets to characterize, for the first time, genome-wide nucleosome and methylation maps from an ancient human and infer ancient gene expression levels and the age at death of the individual.

    35. Ermini, L., Der Sarkissian, C., Willerslev, E. & Orlando, L. Major transitions in human evolution revisited: a tribute to ancient DNA. J. Hum. Evol. 79, 4–20 (2015).

    36. Shapiro, B. & Hofreiter, M. A paleogenomic perspective on evolution and gene function: new insights from ancient DNA. Science 343, 1236573 (2014).

    37. Orlando, L. & Cooper, A. Using ancient DNA to understand evolutionary and ecological processes. Ann. Rev. Ecol. Evol. Syst. 45, 573–598 (2014).

    38. Orlando, L. & Willerslev, E. An epigenetic window into the past? Science 345, 511–512 (2014).

    39. Höss, M., Jaruga, P., Zastawny, T. H., Dizdaroglu, M. & Pääbo, S. DNA damage and DNA sequence retrieval from ancient tissues. Nucleic Acids Res. 24, 1304–1307 (1996).

    40. Hansen, A. J., Willerslev, E., Wiuf, C., Mourier, T. & Arctander, P. Statistical evidence for miscoding lesions in ancient DNA templates. Mol. Biol. Evol. 18, 262–265 (2001).

    41. Hofreiter, M., Jaenicke, V., Serre, D., von Haeseler, A. & Pääbo, S. DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res. 29, 4793–4799 (2001).

    42. Stiller, M. et al. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA 103, 13578–13584 (2006).

    43. Gilbert, M. T. et al. Recharacterization of ancient DNA miscoding lesions: insights in the era of sequencing-by-synthesis. Nucleic Acids Res. 35, 1–10 (2007).

44. Briggs, A. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl Acad. Sci. USA 104, 14616–14621 (2007). This study characterizes typical nucleotide misincorporation and fragmentation patterns using HTS data from aDNA extracts, which have been subsequently used as essential authentication criteria.

    45. Orlando, L. et al. True single-molecule DNA sequencing of a Pleistocene horse bone. Genome Res. 21, 1705–1719 (2011).

    46. Sawyer, S. et al. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS ONE 7, e34131 (2012).

    47. Overballe-Petersen, S., Orlando, L. & Willerslev, E. Next-generation sequencing offers new insights into DNA degradation. Trends Biotechnol. 30, 364–368 (2012).

    48. Jónsson, H. et al. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).

49. Hansen, A. J. et al. Crosslinks rather than strand breaks determine access to ancient DNA sequences from frozen sediments. Genetics 173, 1175–1179 (2006).

    50. Heyn, P. et al. Road blocks on paleogenomes — polymerase extension profiling reveals the frequency of blocking lesions in ancient DNA. Nucleic Acids Res. 38, e161 (2010).

    51. Poinar, H. N., Kuch, M., McDonald, G., Martin, P. & Pääbo, S. Nuclear gene sequences from a late Pleistocene sloth coprolithe. Curr. Biol. 13, 1150–1152 (2003).

52. Poinar, H. N. et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 393–394 (2006). This study reports the first genetic analysis of ancient specimens based on an HTS technology, paving the way for whole-genome sequencing from ancient specimens.

    53. Allentoft, M. E. et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. Biol. Sci. 279, 4724–4733 (2012).

    54. Smith, C. I., Chamberlain, A. T., Riley, M. S., Stringer, C. & Collins, M. J. The thermal history of human fossils and the likelihood of successful DNA amplification. J. Hum. Evol. 45, 203–217 (2003).

55. Schwarz, C. et al. New insights from old bones: DNA preservation and degradation in permafrost preserved mammoth remains. Nucleic Acids Res. 37, 3215–3229 (2009).

    56. Ginolhac, A. et al. Improving the performance of true single molecule sequencing for ancient DNA. BMC Genomics 13, 177 (2012).

    57. Der Sarkissian, C. et al. Shotgun microbial profiling of fossil remains. Mol. Ecol. 23, 1780–1798 (2014).

    58. Damgaard, P. et al. Improving access to endogenous DNA in ancient bones and teeth. BioRxiv http://dx.doi.org/10.1101/014985 (2015).

    59. Salamon, M., Tuross, N., Arensburg, B. & Weiner, S. Relatively well preserved DNA is present in the crystal aggregates of fossil bones. Proc. Natl Acad. Sci. USA 102, 13783–13788 (2005).

    60. Adler, C. J., Haak, W., Donlon, D. & Cooper, A. Survival and recovery of DNA from ancient teeth and bones. J. Archaeol. Sci. 38, 956–964 (2011).

    61. Seguin-Orlando, A. et al. Ligation bias in illumina next-generation DNA libraries: implications for sequencing ancient genomes. PLoS ONE 8, e78575 (2013).

    62. Dabney, J. & Meyer, M. Length and GC-biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. Biotechniques 87–94 (2012).

    63. Young, A. L. et al. A new strategy for genome assembly using short sequence reads and reduced representation libraries. Genome Res. 20, 249–256 (2010).

    64. Seguin-Orlando, A. et al. Amplification of TruSeq ancient DNA libraries with AccuPrime Pfx: consequences on nucleotide misincorporation and methylation patterns. STAR 1, STAR2015112054892315Y.0000000005 (2015).

    65. Star, B. et al. Palindromic sequence artifacts generated during next generation sequencing library preparation from historic and ancient DNA. PLoS ONE 9, e89676 (2014).

    66. Gansauge, M. T. & Meyer, M. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748 (2013).

    67. Gilbert, M. T. et al. Whole-genome shotgun sequencing of mitochondria from ancient hair shafts. Science 317, 1927–1930 (2007).

    68. Gansauge, M. T. & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res. 24, 1543–1549 (2014).

    69. Briggs, A. et al. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325, 318–321 (2009).

    70. Maricic, T., Whitten, M. & Pääbo, S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5, e14004 (2010).

    71. Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature http://dx.doi.org/10.1038/nature14317 (2015).

    72. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil–DNA–glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. B 370, 20130624 (2015).

    R E V I E W S

    NATURE REVIEWS | GENETICS VOLUME 16 | JULY 2015 | 407

    © 2015 Macmillan Publishers Limited. All rights reserved

73. Burbano, H. A. et al. Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328, 723–725 (2010). This paper reports the first characterization of an ancient exome using target enrichment approaches on microarrays.

    74. Fu, Q. et al. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553–559 (2013).

    75. Vilstrup, J. T. et al. Mitochondrial phylogenomics of modern and ancient equids. PLoS ONE 8, e55950 (2013).

    76. Castellano, S. et al. Patterns of coding variation in the complete exomes of three Neandertals. Proc. Natl Acad. Sci. USA 111, 6666–6671 (2014).

77. Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013). This paper describes a target enrichment procedure exploiting millions of DNA probes cleaved from user-designed DNA microarrays to characterize the almost complete sequence of the non-repetitive fraction of chromosome 21 for an ~40,000-year-old human.

78. Carpenter, M. L. et al. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am. J. Hum. Genet. 93, 852–864 (2013). This paper reports the first whole-genome target enrichment method, which makes use of self-generated RNA probes. The method substantially reduces the operational cost of target enrichment and allows genetic analyses of specimens with only minute amounts of aDNA templates.

    79. Enk, J. M. et al. Ancient whole genome enrichment using baits built from modern DNA. Mol. Biol. Evol. 31, 1292–1295 (2014).

80. Avila-Arcos, M. C. et al. Comparative performance of two whole-genome capture methodologies on ancient DNA Illumina libraries. Methods Ecol. Evol. http://dx.doi.org/10.1111/2041-210X.12353 (2015).

81. Briggs, A. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010). This paper presents an enzymatic procedure based on the treatment of DNA extracts with USER mix, which can considerably reduce the sequencing error rate of ancient genomes by limiting the effect of nucleotide misincorporations at damaged sites.

    82. Mason, V. C., Li, G., Helgen, K. M. & Murphy, W. J. Efficient cross-species capture hybridization and next-generation sequencing of mitochondrial genomes from noninvasively sampled museum specimens. Genome Res. 21, 1695–1704 (2011).

    83. Zhang, H. et al. Morphological and genetic evidence for early Holocene cattle management in northeastern China. Nat. Commun. 4, 2755 (2013).

    84. Fabre, P. H. et al. Rodents of the Caribbean: origin and diversification of hutias unraveled by next-generation museomics. Biol. Lett. http://dx.doi.org/10.1098/rsbl.2014.0266 (2014).

    85. Foote, A. D. et al. Tracking niche variation over millennial timescales in sympatric killer whale lineages. Proc. Biol. Sci. 280, 20131481 (2013).

86. Schuenemann, V. J. et al. Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death. Proc. Natl Acad. Sci. USA 108, E746–E752 (2011).

    87. Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012).

    88. Avila-Arcos, M. C. et al. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA. Sci. Rep. 1, 73 (2011).

    89. Hodges, E. et al. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat. Protoc. 4, 960–974 (2009).

    90. Bos, K. I. et al. Parallel detection of ancient pathogens via array-based DNA capture. Phil. Trans. R. Soc. B 370, 20130375 (2015).

91. Schubert, M. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protoc. 9, 1056–1082 (2014). This paper presents a fully automated pipeline performing all sequence analyses associated with re-sequencing genomic projects, phylogenomic inference and metagenomic profiling. It is applicable to both modern and ancient sequence data sets.

    92. Lindgreen, S. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes 5, 337 (2012).

    93. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

    94. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

    95. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

    96. Kozlov, A. M., Aberer, A. J. & Stamatakis, A. ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics http://dx.doi.org/10.1093/bioinformatics/btv184 (2015).

    97. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).

98. Schubert, M. et al. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics 13, 178 (2012).

99. Kerpedjiev, P., Frellsen, J., Lindgreen, S. & Krogh, A. Adaptable probabilistic mapping of short reads using position specific scoring matrices. BMC Bioinformatics 15, 100 (2014).

    100. Ginolhac, A., Rasmussen, M., Gilbert, M. T., Willerslev, E. & Orlando, L. mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics 27, 2153–2155 (2011).

    101. Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014).

    102. Lindgreen, S., Krogh, A. & Pedersen, J. S. SNPest: a probabilistic graphical model for estimating genotypes. BMC Res. Notes 7, 698 (2014).

    103. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469 (2012).

    104. García-Garcerà, M. et al. Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human paleogenomics. PLoS ONE 6, e24161 (2011).

    105. Sánchez-Quinto, F. et al. Genomic affinities of two 7,000-year-old Iberian hunter-gatherers. Curr. Biol. 22, 1494–1499 (2012).

    106. Green, R. E. et al. The Neandertal genome and ancient DNA authenticity. EMBO J. 28, 2494–2502 (2009).

    107. Reich, D. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010).

    108. Llamas, B. et al. High-resolution analysis of cytosine methylation in ancient DNA. PLoS ONE 7, e30226 (2012).

    109. Smith, O. et al. Genomic methylation patterns in archaeological barley show de-methylation as a time-dependent diagenetic process. Sci. Rep. 4, 5559 (2014).

    110. d’Abbadie, M. et al. Molecular breeding of polymerases for amplification of ancient DNA. Nat. Biotech. 25, 939–943 (2007).

    111. da Fonseca, R. R. et al. The origin and evolution of maize in the American Southwest. Nat. Plants 1, 14003 (2015).

    112. Pickrell, J. K. & Reich, D. Toward a new history and geography of human genes informed by ancient DNA. Trends Genet. 30, 377–389 (2014).

    113. Fordyce, S. L. et al. Deep sequencing of RNA from ancient maize kerne