The 1984 publication of a short mitochondrial DNA sequence from
the quagga, a zebra-like equid that has been extinct since the
1880s, initiated the field of ancient DNA (aDNA) research1.
Following the concomitant development of PCR and the realization that
DNA survived in osseous materials2, the future of aDNA research looked
bright. However, the degraded nature of aDNA3 coupled with the
sensitivity of PCR to contamination (whether derived from
environmental microorganisms or human handling, and thus embedded
in the samples, or in the form of laboratory and/or reagent
contamination) contributed to a series of publications based on
false-positive results. Given that these problems seriously
undermined the field’s broader scientific interest and reliability
until the mid-2000s, few would have expected that, by the field’s
twenty-fifth birthday, the genome of an ancient human4 and draft
genomes of the extinct mammoth5 and Neanderthals6 would have been
sequenced. Today, many tens of ancient genomes, ranging from
microbial pathogens7–13 to vertebrate genomes14–29 (including the
quagga19), have been sequenced.
Paleogenomics is driven by high-throughput sequencing (HTS)
platforms, some of which generate data from billions of short DNA
fragments per run30. In most paleogenomic studies, DNA libraries
are generated by ligating the genomic extract to generic adaptors,
amplified using PCR and then subjected to HTS using so-called
second-generation sequencing platforms. This contrasts with
traditional PCR-based approaches, in which loci are individually
targeted and sub-amplicon-sized DNA is unexploitable. In addition
to enabling whole-genome sequencing, HTS revealed how a diverse
range of fossil specimens that were previously ignored owing to an
inability to yield PCR amplicons nevertheless contained ultrashort
aDNA fragments (~30–50 bp). Combining HTS with extraction methods
tailored to the short, damaged aDNA molecules increased the time
window for aDNA sequencing by an order of magnitude to at least
1 million years in permafrozen regions17 and 500,000 years in
temperate caves31,32. Beyond genomes, the profiling of the
epigenetic landscape (that is, epigenomes) of these ancient samples
has recently become feasible33,34, conferring the potential to
characterize regulatory changes throughout evolutionary timescales.
However, there are also difficulties in paleogenomic studies.
Indeed, HTS has exacerbated some of the challenges, including data
authentication and contaminant identification, as well as
accounting for inflated error rates caused by damaged nucleotides.
In this Review, we discuss key technological developments
underpinning the paleogenomic revolution (FIG. 1) and describe
post-mortem damage types common to aDNA and how they can be
accounted for (and even exploited). Furthermore, we discuss how
aDNA targets can be enriched relative to other DNA, how the
resulting sequences can be analysed, and recent progress in
characterizing ancient epigenomes. Throughout, we highlight current
limitations and provide perspectives for future developments. As
most advances relate to human calcified tissues (bones and teeth),
we principally focus on these. Some of the key findings addressing
long-standing debates in our own global population history
1Centre for GeoGenetics, Natural History Museum of Denmark,
University of Copenhagen, Øster Voldgade 5–7, Copenhagen 1350C,
Denmark.
2Université de Toulouse, University Paul Sabatier (UPS),
Laboratoire AMIS, CNRS UMR 5288, 37 allées Jules Guesde, 31000
Toulouse, France.
3Trace and Environmental DNA Laboratory, Department of Environment
and Agriculture, Curtin University, Perth, Western Australia 6102,
Australia.
Correspondence to L.O. e‑mail: [email protected]
doi:10.1038/nrg3935
Published online 9 June 2015
Osseous materials. Calcified animal tissues, such as bones and
teeth.
Reconstructing ancient genomes and epigenomes
Ludovic Orlando1,2, M. Thomas P. Gilbert1,3 and Eske Willerslev1
Research involving ancient DNA (aDNA) has experienced a true
technological revolution in recent years through advances in the
recovery of aDNA and, particularly, through applications of
high-throughput sequencing. Formerly restricted to the analysis of
only limited amounts of genetic information, aDNA studies have now
progressed to whole-genome sequencing for an increasing number of
ancient individuals and extinct species, as well as to epigenomic
characterization. Such advances have enabled the sequencing of
specimens of up to 1 million years old, which, owing to their
extensive DNA damage and contamination, were previously not
amenable to genetic analyses. In this Review, we discuss these
varied technical challenges and solutions for sequencing ancient
genomes and epigenomes.
APPLICATIONS OF NEXT-GENERATION SEQUENCING
REVIEWS
NATURE REVIEWS | GENETICS VOLUME 16 | JULY 2015 | 395
© 2015 Macmillan Publishers Limited. All rights reserved
Second-generation sequencing. High-throughput short-read DNA
sequencing platforms that require library construction and thus
modification of the DNA before sequencing. Most commonly
represented by the Illumina, GS-FLX (454), ABI SOLiD and Ion
Torrent series.
Resonance structures. Dynamic, alternative forms of molecular
groups, such as nucleotide bases, that result from electron
delocalization within the molecule.
are summarized in BOX 1 to illustrate the diversity of
information that can be gathered, and recent literature describing
other key evolutionary insights revealed by ancient genomics has
been reviewed elsewhere35–38.
aDNA damage and tailored extractions
aDNA damage. aDNA damage accumulates over time and was originally
characterized using enzymatic reactions to reveal the presence of
particular types of DNA damage (such as abasic sites and
crosslinks3) or gas chromatography experiments coupled with mass
spectrometry39. Later approaches inferred damage types on the basis
of mutational patterns in sequence data40–43. Specifically, an
excess of C→T mutations, and their significant reduction following
treatment with uracil DNA glycosylase41, revealed cytosine
deamination to uracil (a thymine analogue) as the most prominent
base modification.
HTS data subsequently refined our understanding of such damage,
demonstrating that deamination increases towards read termini44,
consistent with expectations of faster rates in the overhanging
single strands at the fragment termini16,44,45 (FIG. 2). HTS
data also revealed that depurination drives post-mortem DNA
fragmentation, as genomic positions preceding read starts
(corresponding to breaks or abasic sites in aDNA molecules) often
consist of purines44. This bias is towards adenines in younger
samples but towards guanines in older samples, possibly reflecting
differences in fragmentation dynamics46 and/or base-specific
resonance structures47. Statistical models exploiting nucleotide
misincorporation patterns in HTS data sets revealed single-strand
breaks in aDNA44,48, most likely at nicks or abasic sites. Finally,
whereas indirect detection methods indicated that
polymerase-blocking lesions such as interstrand crosslinks could be
prominent in aDNA49, direct experimental assays based on HTS data
suggested a more minor contribution50. Therefore, their general
importance may be context dependent.
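The rise in C→T misincorporations towards read termini described above can be tabulated directly from aligned reads. The following is a minimal sketch, not the method of any cited study: it assumes gap-free (reference, read) string pairs written 5ʹ to 3ʹ in read orientation, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def ct_rate_by_position(alignments, max_pos=10):
    """Return the C->T mismatch rate at each of the first max_pos read positions.

    alignments: iterable of (reference, read) string pairs. This sketch
    assumes gap-free pairs of equal length, both written 5'->3' in read
    orientation; real pipelines would derive these from alignment records.
    Positions with no reference cytosine yield None.
    """
    ref_c = Counter()   # positions where the reference carries a C
    c_to_t = Counter()  # subset of those where the read shows a T
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if r == "C":
                ref_c[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / ref_c[p] if ref_c[p] else None for p in range(max_pos)]


# Toy data (hypothetical): two of three reads starting on a reference C
# show a T at the first position, mimicking terminal deamination.
toy = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CAGCT", "CAGCT"),
    ("ACCGT", "ACCGT"),
]
rates = ct_rate_by_position(toy, max_pos=4)
```

A rate that is elevated at the first positions and decays towards the read interior is the pattern that the text attributes to deamination in single-stranded overhangs.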
Targeting ultrashort fragments. Extensive aDNA fragmentation
was documented early in the field’s history, with later
quantitative PCR assays revealing up to 100-fold decreases in the
abundance of PCR templates for each doubling of target size51. As
HTS generally allows most aDNA molecules to be sequenced over their
full length, the resulting distribution represents a size-decay
curve52 that enables direct quantitative comparisons of
fragmentation across specimens through space, time and
environmental conditions53. Although random DNA fragmentation
should decrease molecule numbers exponentially as size increases,
aDNA templates often peak at 40–80 bp before this decay is
observed. The exact median length observed reflects the overall
fragmentation levels experienced after death, which generally
increase with the depositional temperature53,54. However, the
deviation from the expected exponential decay curve for ultrashort
sizes suggests that common extraction protocols do not recover, and
thus do not optimally exploit, this fraction of molecules.
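As an illustration of the size-decay curve, the exponential part of a fragment length distribution can be summarized by a single decay constant. The sketch below is one simple way to do this under stated assumptions: it fits a straight line to log(count) against length by ordinary least squares, using only lengths at or above a user-chosen cutoff (for example, the mode of the distribution). The function name, the cutoff parameter and the toy data are hypothetical.

```python
import math
from collections import Counter


def decay_constant(lengths, min_length):
    """Estimate lambda in N(L) ~ exp(-lambda * L) from fragment lengths.

    Only lengths >= min_length are used, so that the under-recovered
    ultrashort fraction below the peak does not distort the fit.
    The estimate is minus the least-squares slope of log(count) on length.
    """
    counts = Counter(length for length in lengths if length >= min_length)
    xs = sorted(counts)
    if len(xs) < 2:
        raise ValueError("need at least two distinct lengths to fit a slope")
    ys = [math.log(counts[x]) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Toy distribution (hypothetical): counts halve every 10 bp above 60 bp,
# with a depleted class at 40 bp that the cutoff excludes from the fit.
toy_lengths = [40] * 50 + [60] * 800 + [70] * 400 + [80] * 200 + [90] * 100
lam = decay_constant(toy_lengths, min_length=60)
```

Comparing such constants between specimens is one way to express the quantitative comparisons of fragmentation mentioned above; the choice of cutoff is a judgment call and affects the estimate.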
This challenge was met by introducing improved silica-based
extraction protocols that modify volume and composition of the
DNA-binding buffer31. These methodological improvements increased
recovery rates of 35–50-bp molecules by twofold to fivefold, and
greatly contributed towards the sequencing of even very
Figure 1 | Major advances in ancient genomics. The major
methodological advances described in this Review are presented with
respect to milestones in paleogenomics, including whole-genome
sequencing and the characterization of transcriptomes, epigenomes
and proteomes. Average genome fold-coverage (×) and sequencing
platforms are indicated where applicable. aDNA, ancient DNA; ssDNA,
single-stranded DNA.
[Figure 1 graphic not reproduced: a timeline spanning 2006–2016 that
plots the date range of methods and the publication dates of aDNA
studies. Items are listed here without their positions on the
timeline.
Methods: high-throughput DNA sequencing; true single-molecule DNA
sequencing (Helicos)45; extraction of ultrashort DNA fragments31;
ssDNA library16,66; selective uracil enrichment68; primer extension
capture69; in-solution target enrichment with PCR probes70;
microarray-based target enrichment73; whole-chromosome target
enrichment77; whole-genome in-solution capture78.
Studies: 13-Mb-long mammoth DNA (454)52; 0.7× mammoth genome
(454)5; 16× Paleo-Eskimo genome (Illumina)4; 1.3× Neanderthal genome
(454 and Illumina)6; 30× Denisovan genome (Illumina)16; 6.8× Iceman
genome (SOLiD)15; 1.1× 700,000-year-old horse genome (Illumina and
Helicos)17; 52× Neanderthal genome (Illumina)22; 400,000-year-old
mitochondrial genomes31,32; maize kernel transcriptomes113; mammoth
proteome115; methylomes33; methylome and nucleosome maps34.]
Pre-digestion. Exposure of ancient calcified materials to a short
initial digestion aimed at removing substantial fractions of
exogenous contaminants.
454. The initial generation of GS-FLX sequencing platforms based
on pyrosequencing, before their acquisition and renaming by
Roche.
old (for example, ~400,000-year-old) specimens31,32.
Furthermore, light pre-digestion of bone or tooth powder before
full extraction on the remaining undigested matter significantly
increases the relative proportion of endogenous DNA
recovered45,55–58, probably by washing away microbial
contaminants57 or fully liberating DNA from the matrix59. Finally,
the specific tissue sampled (for example, petrosal bone versus
other bones18,25 and cementum versus dentine58,60) and sampling
procedures (for example, drilling at low versus high speed60)
affect the quality of extracted aDNA.
DNA library construction and amplification
General recommendations. Second-generation sequencing requires
template molecule modification through adaptor ligation30. Both
library construction and subsequent PCR amplification represent
sources of error61,62. The parts of a genome that are sequenced can
be affected by adaptor binding biases and/or the relative efficacy
with which PCR enzymes amplify the constructs. The type and
position of nucleotide misincorporations introduced during these
amplifications also contribute errors to the resulting
sequences16,44,61. For example, the Phusion polymerase, which was
originally part of the Illumina library building procedure,
preferentially amplifies short and relatively GC-rich templates62.
The same is true for related polymerases, such as Phusion Hot
Start I and II, even when high-fidelity buffers are used. This bias
is reduced, or even disappears, when other polymerases are used,
and Accuprime Pfx, Herculase II Fusion and Pfu Turbo Cx Hotstart
currently seem to be better alternatives than the most commonly
used polymerases, AmpliTaq Gold and Platinum Taq High-Fidelity62.
Increasing PCR cycle number often reduces the molecular complexity
of DNA libraries63; thus, polymerases should be carefully
selected, PCR amplification cycles minimized and/or independent PCR
reactions undertaken in parallel to limit such biases. This has
important consequences for authenticating aDNA data and quantifying
post-mortem DNA damage, as expected misincorporation models require
tailoring to the exact experimental procedure followed45,64.
Double-stranded DNA libraries. Different DNA library
construction methods also show clear differences in efficiency.
Early aDNA libraries were based around 454-compatible blunt-end
approaches42–44,52 (FIG. 3a).
Box 1 | Human evolution insights: one of the principal
achievements of ancient genomics
An area of great interest in the study of human evolution is
clarifying the admixture history and the migration routes followed
by our ancestors to create contemporary patterns of genetic
variation112. Study of the historical hair of an Aboriginal
Australian revealed the existence of a migration from Africa or the
Middle East that reached Australia and that took place
20,000–30,000 years earlier than the migration that gave rise
to present-day Europeans and Asians14. The 36,200-year-old bone
remains from an Upper Paleolithic man from Kostenki, Russia, were
also found to be genetically closer to contemporary Europeans than
to contemporary Asians, suggesting an earlier date for the
separation between these populations27. The 24,000-year-old remains
of a child from Mal’ta, south-central Siberia, Russia, showed
strong genetic affinities not only with Europeans but also with
Native Americans, indicating a mixed population ancestry for the
first Americans24. The Solutrean theory, which assumed a European
origin across the Atlantic for the Paleo-Indian Clovis culture in
North America, could be ruled out because the 12,600-year-old
cranial remains of the Anzick individual belonging to this culture
show greater genetic affinities to Native Americans than to
Europeans25.
The peopling of Europe and the effect of the agricultural
revolution have also received great
attention15,18,21,27,71,89,103,105,121. The main genetic components
present in modern Europeans seem to have already differentiated by
36,200 years ago27, and their later dispersal involved several
migration waves71,122. The expansion of the first Neolithic farmers
resulted in mixing hunter-gatherer Mesolithic and near-eastern
population backgrounds within western Europe ~7,500 years
ago18,21,71,121. A later extensive migration took place
~4,500 years ago from the steppes and was associated with the
spread of Indo-European languages into Europe71. The ability to
gather genome-wide data at population scales from ancient
individuals now provides an opportunity for a fine reconstruction
of population migration and admixture patterns from classical
antiquity to modern times.
In some cases, ancient genomes have revealed direct genetic
continuity across different archaeological cultures, questioning
theories assuming that culture only changes through the migration
of peoples and not simply through the spread of ideas. The first
example is provided by the Paleo-Eskimos from the New World Arctic,
who represent distinct cultural units but were found to represent a
single population, first replaced by Inuit
T/A ligation. A common DNA ligation technology that relies on
complementary pairing of thymine and adenine overhangs at the 3ʹ
ends of the adaptors and inserts to be ligated, respectively.
Shotgun sequencing. The sequencing of fragmented DNA in the
absence of any selection strategy.
However, as adaptor ligation is random, a fraction of the
constructs do not contain both of the different adaptors and thus
cannot be sequenced using this method. Another possible limitation
is adaptor dimer formation during ligation; if amplified and
sequenced, these waste sequencing capacity. Illumina introduced T/A
ligation to overcome this in their original library construction
procedure, in which aDNA fragments have an overhanging adenine
added (known as A-tailing) to facilitate ligation to T-tailed
adaptors (FIG. 3b). However, this strategy seems to be
suboptimal for aDNA, mostly because templates starting with
thymines are less efficiently processed during ligation61. Thus, the
(often substantial) fraction of templates containing deaminated
cytosine residues (thymine analogues) at their termini44 fails to
incorporate into libraries61. TruSeq libraries, which also rely on
T/A ligation, have also been shown to introduce significant
amounts of palindromic artefacts, whereby short sequence segments
at read starts are copied towards read ends65.
Single-stranded DNA libraries. A subsequent development was
library construction directly on single-stranded DNA (ssDNA)
templates16,66. In this method, DNA is denatured using heat into
single strands and then ligated to a first adaptor, before
extension with Bst polymerase generates the complementary strand. A
second adaptor is ligated at the 3ʹ end of the complementary
strand, and the full construct is then amplified by PCR
(FIG. 3c). Inclusion of biotin in the first adaptor allows
minimal DNA loss during purification using streptavidin-coated
paramagnetic beads. The development of this method enabled
characterization of the Denisovan genome at ~30× coverage using DNA
extracts generated from 40 mg of bone material16. Although the
method is sometimes beneficial on highly degraded osseous
materials31,32 (as both strands and every single-strand break of
endogenous DNA molecules have 3ʹ termini that are compatible with
their incorporation into libraries), its benefit on less-degraded
and non-osseous materials remains unverified.
Enriching for aDNA
aDNA extracts are metagenomic mixtures. The endogenous DNA within
most ancient specimens is usually embedded within high levels of
environmental microbial DNA. Although there are notable exceptions
(including some keratinized materials4,67, particularly dense bones
such as the petrosal bone18,25, and intentionally preserved
materials from museums or herbaria8), it is unusual for the
endogenous DNA content in most calcified remains to account for
more than a few percent of the total DNA content. DNA preservation
and environmental microbial contamination levels can show extreme
variation within a single bone. For example, extracts and libraries
constructed from a single 36,000-year-old European human bone
yielded 0.1–8.0% of human DNA27, and even greater variation
(0.5–27.8%) was seen using the early Native American ‘Anzick’
cranial bone25.
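The practical weight of these endogenous fractions can be seen with a back-of-the-envelope calculation of shotgun sequencing effort. The sketch below is only an approximation under simplifying assumptions that are not from the source: every endogenous read maps uniquely, there are no PCR duplicates, and all reads have the same mean length. The function name, the 60-bp read length and the 1× target are illustrative choices.

```python
import math


def reads_required(genome_size, coverage, endogenous_fraction, mean_length):
    """Approximate number of shotgun reads needed for a target coverage.

    Simplifying assumptions of this sketch: all endogenous reads map,
    duplicates are absent and reads share one mean length (in bp).
    """
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    if mean_length <= 0 or genome_size <= 0 or coverage <= 0:
        raise ValueError("sizes, coverage and read length must be positive")
    endogenous_bases_needed = genome_size * coverage
    return endogenous_bases_needed / (endogenous_fraction * mean_length)


# The two extremes reported for a single bone (0.1% and 8.0% human DNA),
# applied to a ~3-Gb genome at 1x coverage with hypothetical 60-bp reads.
low = reads_required(3e9, 1, 0.001, 60)
high = reads_required(3e9, 1, 0.08, 60)
fold_difference = low / high
```

Under these assumptions the required read count scales inversely with the endogenous fraction, so two extracts from the same bone can differ in sequencing cost by the same factor as their endogenous contents.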
High microbial contaminant DNA levels render shotgun sequencing
of genomes uneconomical. Thus, several methods have been developed
that improve accessibility to endogenous aDNA. These enrichment
strategies are used either during library construction, by
preferentially incorporating damaged aDNA fragments68, or after
library construction, by separating
[Figure 2 graphic not reproduced; labelled features: post-mortem
DNA decay, post-mortem base modification, single-strand break,
overhang, abasic site, and CpG and methylated CpG sites deaminated
to UpG and TpG, respectively.]
Figure 2 | Typical ancient DNA molecules. A diverse range of
degradation reactions affect DNA post-mortem and result in
extensive fragmentation (preferentially at purine nucleotides) and
base modifications. The most common base modification identified in
high-throughput sequencing data sets is deamination of cytosines
into uracils (red), or
thymines (blue) when cytosines were methylated (mC). Such
deaminations occur much faster at overhanging ends. Other
modifications include abasic sites (green) and single-strand breaks
(vertical lines). The chemical structures of three damage
by-products (uracils, thymines and abasic sites) are shown.
R, purine; Y, pyrimidine.
endogenous and exogenous fractions through annealing to
pre-defined sets of probes (in solution69–72 or on
microarrays7,73). Intended capture targets range from whole
mitochondrial genomes (~16 kb31,32,69,72,74,75) or ancient
commensal and pathogenic bacterial genomes (~4 Mb7,10–13) to large
sets of single-nucleotide polymorphisms (SNPs) (~400,000 SNPs71),
whole exomes (~30 Mb73,76), chromosomes (~30 Mb77) and even
whole nuclear genomes (~3 Gb29,78–80). Other approaches that have
been demonstrated, although not used in the most recent relevant
studies, include targeted digestion of environmental microbial DNA
using restriction enzymes6 and primer extension capture (PEC)69.
Before discussing enrichment strategies further, we highlight that
currently none is able to recover 100% of the target molecules,
and thus they come at a cost of reduced
[Figure 3 graphic not reproduced; panels: a, dsDNA library; b,
A-tailed library; c, ssDNA library. Step labels in the graphic
include end repair, adaptor ligation, fill in, extension and
A-tailing, heat denaturation, ssDNA adaptor ligation and
extension.]
Figure 3 | Constructing ancient DNA libraries. The three most
common types of ancient DNA (aDNA) libraries are shown.
5ʹ-phosphate groups are indicated with black circles, single-strand
DNA breaks are shown as vertical lines, biotinylated adaptor groups
are shown in red, and streptavidin-coated beads are shown in grey.
a | To construct a double-stranded DNA (dsDNA) library, aDNA is
first end-repaired. It is then ligated to double-stranded adaptors
(blue), and the resultant nicks are filled in to construct library
templates devoid of single-strand breaks. b | To construct an
A-tailed DNA library, aDNA is end-repaired and then A-tailed (that
is, an adenine is added to the 3ʹ ends of the strands) to
facilitate subsequent ligation to T-tailed adaptors while
disfavouring ligation between adaptor pairs. The adaptors are
typically Y‑shaped (that is, they are complementary
at the T-tailed end but have non-complementary arms at the other
end). The use of such adaptors results in aDNA strands being
flanked by distinct non-complementary adaptor sequences at each end
to enable subsequent unidirectional sequencing through the aDNA
fragment. Nicks resulting from ligation are filled-in through PCR
post-ligation. c | To construct a single-stranded DNA (ssDNA)
library, aDNA is first denatured into single strands using heat and
then ligated to biotinylated single-stranded adaptors. The original
DNA strand is then copied using DNA polymerase extension, and a
second adaptor is ligated to enable further PCR amplification and
sequencing. Purification steps are performed using
streptavidin-coated paramagnetic beads. Part c adapted with
permission from REF. 16, American Association for the
Advancement of Science.
Primer extension capture (PEC). An enrichment technology based on
the ligation of short 5ʹ-biotinylated oligonucleotides (including a
12-nucleotide-long spacer followed by a primer of 18–25 nucleotides
that is designed to match a particular region of interest) to
single-stranded target molecules. This is followed by a single
round of polymerase-based extension so as to increase the length
over which the molecules are hybridized.
Tiled probes. Probes that overlap in their positioning on the
target so as to ensure that every target position is covered by
more than one different probe.
Chimeric DNA libraries. Recombination between libraries containing
different template molecules during library PCR amplification,
resulting in hybrid (chimeric) sequences that do not represent true
biological sequences.
Double-indexed DNA libraries. DNA libraries in which short (for
example, 8 bp long) unique nucleotide indexes are incorporated
within both adaptors used during library construction. Indexes are
bordered by known sequences that serve to prime index sequencing
reactions and also enable library attachment to the surface of the
sequencing flow cell.
library complexity. Therefore, the upper threshold on the
maximum sequencing depth attainable from a given library is
reduced, and users must consider the end goal of their analyses
before determining whether capture is a sensible strategy over
direct shotgun sequencing. If the goal is to sequence to high
coverage, highly complex libraries showing relatively high
endogenous content can be shotgun-sequenced4,6,18,22,26, but
enrichment of multiple libraries is advisable in other
cases71,72.
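The link between library complexity and attainable sequencing depth can be illustrated with a standard saturation model. The sketch below is an idealized assumption rather than anything stated in the source: if a library holds C distinct molecules that are all equally likely to be sequenced, the expected number of distinct molecules seen after N reads is C(1 − e^(−N/C)). The function name and the numbers are hypothetical.

```python
import math


def expected_unique(complexity, reads):
    """Expected number of distinct molecules observed after `reads` reads.

    Idealized model (an assumption of this sketch): `complexity` distinct
    molecules, each equally likely to be sampled, with replacement.
    Uses expm1 for numerical accuracy when reads << complexity.
    """
    if complexity <= 0 or reads < 0:
        raise ValueError("complexity must be positive and reads non-negative")
    return -complexity * math.expm1(-reads / complexity)


# Hypothetical comparison: the same sequencing effort applied to a library
# before capture and to one whose complexity was halved by capture losses.
reads = 1_000_000
before = expected_unique(1_000_000, reads)
after = expected_unique(500_000, reads)
```

In this model the number of unique molecules can never exceed C, which is why a loss of complexity during capture lowers the ceiling on depth regardless of how much additional sequencing is performed.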
Damaged template enrichment. One approach selectively targets
damaged DNA molecules68 during ssDNA library preparation16,66.
After the DNA strand complementary to the original template is
generated, constructs are 5ʹ-phosphorylated, which enables ligation
to a non-phosphorylated adaptor (FIG. 4a). Following
extension with Bst polymerase to fill the nick located 5ʹ of this
adaptor, treatment with uracil DNA glycosylase and
endonuclease VIII (USER mix) is implemented to first replace
deaminated cytosines with abasic sites and then to cleave out these
abasic sites81. The new 3ʹ end is then dephosphorylated and used
for priming a new extension. Thus, all library strands that
originally harboured deaminated cytosines are reconstructed over
their full length and are available in the reaction supernatant for
further amplification and sequencing. The undamaged DNA template
fraction remains attached to streptavidin-coated paramagnetic beads
and can be retained for other uses. This method has shown great
specificity when applied to samples from Late Pleistocene
Neanderthals showing extreme levels of deamination68. Importantly,
in all extracts tested, the relative contamination from modern
human DNA decreased by ~1.6-fold following selective enrichment,
suggesting that undamaged templates resulting from recent
manipulations of the specimen could readily be filtered.
Furthermore, the endogenous content of one sample increased by
3.7–5-fold, which markedly reduced the genome sequencing cost.
Future experiments will no doubt explore the wider potential of
this method. For now, users should bear in mind that any endogenous
undamaged molecules will not be retained and will thus be lost,
making the method only appropriate for the most damaged samples.
Additionally, any DNA carrying damage will be enriched, potentially
providing access to the genomes of associated ancient
microorganisms (although these can show reduced DNA damage levels
compared to their human hosts9).
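The trade-off described above, in which enrichment depends on how much more often endogenous molecules carry uracils than other molecules do, can be expressed with a toy calculation. This is a hedged sketch and not the analysis of the cited study: it assumes that a molecule is retained if and only if it carries at least one uracil, and the parameter names and values are hypothetical.

```python
import math


def enriched_fraction(endogenous_fraction, p_damaged_endogenous, p_damaged_other):
    """Endogenous fraction expected after selecting uracil-carrying molecules.

    Toy model (assumption): a molecule is retained if and only if it carries
    at least one uracil. p_damaged_endogenous and p_damaged_other are the
    probabilities of that event for endogenous and for all other molecules.
    """
    kept_endogenous = endogenous_fraction * p_damaged_endogenous
    kept_other = (1 - endogenous_fraction) * p_damaged_other
    total = kept_endogenous + kept_other
    if total == 0:
        raise ValueError("no molecule would be retained with these inputs")
    return kept_endogenous / total


# Hypothetical extract: 1% endogenous DNA, half of which carries a uracil,
# against a background in which 10% of molecules carry one.
start = 0.01
post = enriched_fraction(start, 0.5, 0.1)
fold_enrichment = post / start
```

The same expression shows why the method suits only heavily damaged samples: as the endogenous damage probability approaches that of the background, the fold enrichment approaches one while undamaged endogenous molecules are still lost.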
Extension-free target enrichment in solution. Target enrichment
approaches based on target–probe hybridization are currently
widely used. These require heat denaturation of DNA libraries to
enable annealing of library inserts to overlapping tiled probes
along target regions. Probes can be economically generated using
long-range PCR, if fresh DNA material from closely related species
can be extracted70, through PCR amplicon shearing and then
ligation to a biotinylated adaptor. This probe library can be
amplified (with biotinylated primers) and used in an unlimited
number of enrichment reactions. Following annealing at
stringencies that can be adapted depending on the phylogenetic
distance between targets and probes, streptavidin-coated beads are
washed to eliminate library constructs with inserts showing no
genetic proximity to the targeted regions, and the final fraction
is amplified and sequenced.
This strategy has predominantly been used for sequencing
mitochondrial genomes72,74,75,82–85, bacterial plasmids86 and
short nuclear loci84. Hybridization is even successful when probes
diverge from targets by 10–13%82, which is useful if no close
living relative and/or reference genome is available. This can also
be exploited to detect probe carry-over post-sequencing if the DNA
from a distantly related organism was used for preparing probes
(for example, if DNA from a European bison was used when enriching
for aDNA from aurochs83). Alternatively, potential probe carry-over
can be eliminated before sequencing using dedicated molecular
tools. For example, replacement of deoxythymidine triphosphate
(dTTP) by deoxyuridine triphosphate (dUTP) in probes enables
subsequent digestion with uracil DNA glycosylase before
amplification and sequencing83.
Biotinylated probes can also be custom designed and synthesized,
which enables specific probe tiling and in silico assessment
for secondary structures, homogeneous GC content and annealing
temperatures. Different manufacturers can now deliver such probes,
with related procedures apparently achieving similar efficiency80.
Depending on the overall size of the genomic regions targeted,
multiple libraries can, in theory, be enriched as pools to achieve
faster hands-on times. However, owing to the probable formation of
chimeric DNA libraries during post-capture PCRs, pooling of
libraries before capture should ideally be avoided, or if pooling
is used then the constituent libraries should at least be
double-indexed DNA libraries87 to enable chimaera identification
and elimination from subsequent analyses. Increasing probe tiling
densities (11 bp versus 24 bp) did not consistently improve
enrichment for ~670 nuclear loci in archaeological maize,
suggesting that even relatively reduced probe densities can be used
to efficiently recover the full molecular complexity of DNA
libraries88.
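The logic by which double indexing exposes chimaeras can be sketched in a few lines. The example below is an illustration under assumptions, not a published pipeline: each pooled library is taken to have one expected pair of index sequences, reads whose observed pair is not in that set are set aside as candidate chimaeras (or index errors), and the function name, index sequences and read identifiers are all hypothetical.

```python
def assign_reads(read_indexes, expected_pairs):
    """Assign reads to libraries by index pair and set aside unexpected pairs.

    read_indexes: iterable of (read_id, first_index, second_index) tuples.
    expected_pairs: dict mapping (first_index, second_index) -> library name.
    Returns (assigned, rejected): a dict of library name -> list of read ids,
    and a list of read ids whose index pair matches no pooled library.
    This sketch requires exact matches and tolerates no sequencing errors.
    """
    assigned = {}
    rejected = []
    for read_id, first_index, second_index in read_indexes:
        library = expected_pairs.get((first_index, second_index))
        if library is None:
            rejected.append(read_id)
        else:
            assigned.setdefault(library, []).append(read_id)
    return assigned, rejected


# Hypothetical pool of two double-indexed libraries with 8-bp indexes.
expected = {
    ("AAAAAAAA", "CCCCCCCC"): "lib1",
    ("GGGGGGGG", "TTTTTTTT"): "lib2",
}
reads = [
    ("r1", "AAAAAAAA", "CCCCCCCC"),
    ("r2", "GGGGGGGG", "TTTTTTTT"),
    ("r3", "AAAAAAAA", "TTTTTTTT"),  # first index of lib1, second of lib2
]
assigned, rejected = assign_reads(reads, expected)
```

With a single index per library, read r3 would have been assigned to one of the two libraries without any warning; the second index is what makes the recombinant construct detectable.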
In general, custom-synthesized biotinylated probes are most
economical when targeting fairly small regions (hundreds of
kilobases to a few megabases) owing to probe synthesis costs.
However, microarrays can achieve extremely high probe numbers
(approximately 1 million each) and, if manufacturers consent,
can be chemically treated to cleave the probes from the microarray
surface, thus recovering large sets of probes at relatively
reasonable costs71,76,77. Synthetic DNA probes are built into
biotinylated probe libraries using biotinylated adaptors of
minimal size (~20 bp) to limit interference during probe–target
annealing. The known adaptor sequence allows further amplification,
thereby immortalizing the probe set at low cost. In this way,
Fu et al.77 used 8.7 million probes to recover most of the
non-repetitive fraction of chromosome 21 from a
40,000-year-old human specimen from Tianyuan cave, China. In
addition, they targeted ~3,500 200-bp-long regions around
positions
[Figure 4 graphic not reproduced; panels: a, selective uracil
enrichment; b, WISC. Labels in the graphic include heat
denaturation, ssDNA adaptor ligation, extension, extension and
phosphorylation, adaptor ligation, USER treatment, supernatant,
fresh DNA extract, probe DNA library, in vitro transcription,
biotinylated RNA probes, hybridization, washing and elution,
target-enriched fraction, endogenous fraction and exogenous
fraction.]
Figure 4 | Enriching DNA libraries for ancient inserts. a |
Selective uracil enrichment is shown. 5ʹ-phosphate groups are
indicated with black circles, single-strand DNA breaks are shown as
vertical lines, biotinylated adaptor groups are shown in red, and
streptavidin-coated beads are shown in grey. A single-stranded DNA
(ssDNA) library is built until the polymerase extension step. DNA
is then phosphorylated to enable the ligation of the second
adaptor. This contrasts with the ssDNA library procedure, in which
the ligation occurs between the 5ʹ end of the second adaptor
and the 3ʹ end of the newly synthesized strand (FIG. 3c).
DNA is then treated with uracil DNA glycosylase and
endonuclease VIII (USER mix) to generate and then cleave out
abasic sites at cytosines that were deaminated into uracils
post-mortem. The 3ʹ-phosphate groups at these new termini are then
removed (not shown). The resulting 3ʹ-OH ends now serve to prime an
extension with a DNA polymerase, which copies throughout the whole
length of the strand complementary to where the damage was. As a
result, the supernatant now contains double-stranded DNA (dsDNA)
library templates corresponding to the original deaminated strands.
Other library templates remain unaffected and can be separated, as
they remain bound to streptavidin-coated paramagnetic beads. b | In
whole-genome in-solution capture (WISC), ssDNA templates from an
ancient DNA (aDNA) library are prepared. The target, endogenous
aDNA is shown as thin black lines, whereas the exogenous
contaminating DNA is shown as thin green lines; adaptors are shown
as thick blue lines. In parallel, a probe DNA library is prepared
from fresh modern DNA extracts (thin red lines) and used to
generate biotinylated RNA probes through in vitro
transcription. T7 adaptors to enable in vitro transcription
are shown in thick purple lines. The aDNA library is annealed to
the RNA probes, low-complexity DNA and adaptor blockers (the latter
two are not shown for simplicity). The library fraction of interest
is then recovered following elution from streptavidin-coated
paramagnetic beads. Part a adapted with permission from
REF. 68, Cold Spring Harbor Laboratory Press. Part b adapted
with permission from REF. 78, The American Society of Human
Genetics.
R E V I E W S
NATURE REVIEWS | GENETICS VOLUME 16 | JULY 2015 | 401
© 2015 Macmillan Publishers Limited. All rights reserved
-
Mate pairsPairs of sequences derived from both ends of a DNA
library.
Edit distanceThe number of sequence mismatch counts between
reads and targets.
known to carry allelic variants in archaic and modern humans,
thereby enabling direct estimates of archaic hominin ancestry
within the Tianyuan specimen. The method was also used to obtain
the exome sequence of two Neanderthals from Spain and Croatia76
and, more recently, sequence data from ~400,000 loci within a
sin-gle reaction71. This target enrichment procedure reduced the
genotyping costs by at least 45-fold per ancient speci-men71 and
enabled genome-wide analyses of ancient individuals at population
scales. In this analysis, two 52-nucleotide-long probes were
selected to be located on each side of a polymorphic site, and two
were centred on the polymorphic site, each representing one of the
two possible alleles.
Solid-phase target enrichment. Direct application of microarrays
can also enrich large sets of targets, using approaches originally
described for modern DNA89. First used in the aDNA context to
characterize exome sequences from a 49,000-year-old Neanderthal
specimen73, microarrays have subsequently enabled whole-genome
sequencing from bacterial strains responsible for major historical
epidemiological outbreaks7,9–13, including the Black Death7.
Microarrays also provide interesting alternatives to real-time PCR
and shotgun sequencing for parallel screening of >100
pathogens12,90. This is particularly appropriate for identifying
ancient pathogens, which often leave no physical skeletal evidence
and are generally found only as trace material. Possible drawbacks
are poor detection of the most divergent genomic regions and
omission of regions with important genomic rearrangements (such as
insertions) or unknown additional plasmids that do not segregate in
modern strains.
Whole-genome enrichment. There is a growing interest in
characterizing the entire genome sequence of ancient individuals at
population scales. However, none of the methods presented above is
appropriate for pulling down whole human genomes, as this requires
synthesizing gigabases of probes. Whole-genome in-solution capture
(WISC)78 and a commercial alternative with similar performance79,80
fill this niche, enabling economical whole-genome enrichment. WISC
starts with the preparation of a genome-wide RNA probe library from
a species with a genome that is closely related to the target
genome in the aDNA sample (FIG. 4b). These RNA probes are
generated from a genomic DNA library flanked by adaptors containing
T7 promoters that enable a relatively inexpensive reaction,
in vitro transcription. This in vitro transcription step
is carried out in the presence of biotin-16-UTP, so that the
resultant RNA probes are biotinylated. The biotinylated RNA probes
are annealed to the ssDNA of a heat-denatured aDNA library, while
low-complexity DNA and adaptor-blocking RNA oligonucleotides
improve stringency and reduce enrichment for highly repetitive
regions. Non-hybridized DNA is washed away, whereas the bound,
enriched library fraction is finally released following RNase
treatment (which precludes probe carry-over) and amplified before
sequencing.
WISC-like approaches consistently improve the proportion of
sequences that can be mapped to the human reference genome compared
to shotgun sequencing (6–159-fold), at least when based on
double-stranded DNA libraries29,78–80. As hybridization efficiency
increases with target length79, its efficacy may be reduced when
analysing libraries built using single-strand methods16,66, which
routinely exhibit smaller mean target molecule sizes. The fraction
of reads that align to repetitive regions also generally increases
with WISC, despite the use of an excess of low-complexity DNA.
Unsurprisingly, WISC-enriched libraries show reduced complexity, so
that almost every unique insert can be sequenced with minimal
sequencing efforts78. As an example, 5–10 million sequencing
reads generated using WISC-enriched libraries of a Bronze Age
Danish human hair sample and a pre-Columbian Peruvian human bone
were found to cover 7,000–21,000 ancestry-informative markers,
which proved to be sufficient for inferring the continental groups
that are the closest to these ancient individuals78.
Analysing aDNA
From reads to genome alignments. Most available paleogenomes were
generated using Illumina technologies, although there are
exceptions5,15,17. Analysis of the underlying sequence
data mainly relies on computational approaches developed for
handling HTS data from modern DNA material, with some additional
particularities. Most procedures are implemented within the
open-source PALEOMIX package91, in which reads are trimmed of
adaptor sequences using AdapterRemoval92 and collapsed when mate
pairs are available and overlap significantly, filtered for a
minimal size of 25–30 bp and aligned against reference genomes of
interest using Burrows–Wheeler Aligner (BWA)93 or Bowtie 2
(REF. 94). Alignments showing low-quality scores and PCR
duplicates are further removed using the MarkDuplicates program
from Picard tools, and reads are locally realigned around small
insertions and deletions (indels) to improve overall genome quality
using the IndelRealigner tool from the Genome Analysis Toolkit
(GATK)95. PALEOMIX can also quantify DNA damage levels using
mapDamage2 (REF. 48) and perform phylogenomic and metagenomic
analyses using modules mostly based on inferences deriving from
ExaML (Exascale Maximum Likelihood)96 and MetaPhlAn (Metagenomic
Phylogenetic Analysis)97, respectively.
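
The filtering and duplicate-removal steps described above can be illustrated with a minimal Python sketch. It is not the PALEOMIX or Picard implementation: the `AlignedRead` record, the `min_mapq` default and the duplicate criterion (identical chromosome, start, end and strand) are simplifying assumptions made here for illustration only.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical, simplified alignment record (not a real BAM parser)."""
    name: str
    chrom: str
    start: int    # 0-based leftmost aligned position
    end: int      # exclusive end of the alignment
    strand: str   # "+" or "-"
    mapq: int     # mapping quality reported by the aligner
    length: int   # read length after adaptor trimming/collapsing


def filter_reads(reads, min_length=25, min_mapq=25):
    """Keep reads that are long enough and confidently aligned.

    min_length follows the 25-30 bp range given in the text;
    min_mapq is an assumed threshold.
    """
    return [r for r in reads if r.length >= min_length and r.mapq >= min_mapq]


def remove_duplicates(reads):
    """Keep one read per (chrom, start, end, strand); the highest MAPQ wins."""
    best = {}
    for r in reads:
        key = (r.chrom, r.start, r.end, r.strand)
        if key not in best or r.mapq > best[key].mapq:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.end))
```

Collapsed aDNA reads carry both fragment ends, which is why this sketch keys duplicates on start and end together; production tools apply more elaborate rules.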
Software links:
PALEOMIX: https://github.com/MikkelSchubert/paleomix
AdapterRemoval: https://github.com/slindgreen/AdapterRemoval
BWA: http://bio-bwa.sourceforge.net/
Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Picard: http://broadinstitute.github.io/picard
GATK: https://www.broadinstitute.org/gatk/
mapDamage: http://ginolhac.github.io/mapDamage/
ExaML: http://sco.h-its.org/exelixis/web/software/examl/index.html
MetaPhlAn: http://huttenhower.sph.harvard.edu/metaphlan
Probabilistic aligners: Mapping algorithms that can accommodate
non-uniform distributions of sequencing errors along reads,
generally leading to improved alignments between reads and
reference genomes.
Thermal age: The predicted time that it would have taken an
archaeological sample to produce the observed degree of DNA
degradation were the sample exposed to a constant temperature of
10 °C since deposition. Thermal age has been proposed to adjust the
chronological age of a sample to its thermal history and to help in
predicting the likelihood of DNA surviving in archaeological
remains.
Haplotypes: The DNA sequences of haploid chromosomes.
Derived alleles: Alleles that are evolutionarily derived in a
lineage of interest and that are not represented in an ancestral
population or species.
Ancestral alleles: Alleles in the ancestral state before a
mutation took place in a descending population, species or
lineage.
Nearly fixed: Fixed alleles are those that are derived and present
in all individuals in a descendent population or species. Nearly
fixed alleles therefore represent those that are present in nearly
all individuals (thus close to fixation, for example, showing
allelic frequencies of 99% in the population).
Unlike sequences in other re-sequencing genome projects, in
which mismatches relative to the reference genome generally are
derived from sequencing errors and polymorphisms, aDNA sequences
exhibit substantial fractions of nucleotide misincorporations that
result from sequencing damaged bases. As these misincorporations
cluster towards read termini, seeding approaches, whereby only the
most upstream part of the sequence is used for speeding up
identification of possible alignments along the genome, should be
avoided98. Parameters controlling acceptance thresholds for
read-to-reference edit distance should be adapted to the
phylogenetic distance to the reference genome, as overly
conservative procedures will under-represent the
most polymorphic regions and underestimate heterozygosity
levels. Conversely, overly permissive procedures will inflate the
alignment false-positive rate, resulting in regions with many reads
from different organisms, which is a particular challenge for aDNA
data, given its complex mixture of endogenous and exogenous
reads52,57.
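
To make the trade-off around edit-distance thresholds concrete, the following Python sketch computes a plain Levenshtein distance and accepts an alignment only when the distance stays below a fraction of the read length. The function names and the `max_fraction` parameter are illustrative assumptions; real aligners expose their own, differently defined, options.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accept_alignment(read: str, ref_segment: str, max_fraction: float) -> bool:
    """Accept when the edit distance is at most max_fraction of the read length.

    A small max_fraction is conservative (divergent regions are lost);
    a large one is permissive (more spurious hits from other organisms).
    """
    return edit_distance(read, ref_segment) <= max_fraction * len(read)
```

With a 20-nucleotide read carrying two mismatches, a threshold of 0.05 rejects the alignment whereas 0.15 accepts it, which mirrors the argument that the threshold should grow with the phylogenetic distance to the reference.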
Owing to the accumulation of nucleotide misincorporations
towards read ends, probabilistic aligners based on position-specific
scoring matrices have been developed to account for aDNA features
directly at the alignment step. Available aligners include Mapping Iterative
Assembler (MIA)69, ANFO Short Read Aligner/Mapper6 and BWA-PSSM
(position-specific scoring matrix)99, and these generally show good
performance for short reads and/or low-quality data, although some
show running times that are compatible only with alignments against
relatively small reference genomes (for example, mitochondrial
genomes). Importantly, such probabilistic approaches handle
platform-specific error profiles in a sound statistical
framework.
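
The idea of position-dependent scoring can be sketched in a few lines of Python. This is a toy model, not the algorithm of MIA, ANFO or BWA-PSSM: the geometric decay of damage from the read ends and all numeric defaults (`max_rate`, `decay`, `floor`, `seq_error`) are assumptions chosen only to show that a C→T mismatch near the 5ʹ end is penalized less than the same mismatch in the read interior.

```python
import math


def damage_prob(dist_from_end, max_rate=0.3, decay=0.5, floor=0.01):
    """Assumed deamination probability, decaying geometrically from a read end."""
    return floor + (max_rate - floor) * (decay ** dist_from_end)


def alignment_log_score(read, ref, seq_error=0.001):
    """Log-probability-like score of an ungapped read/reference alignment."""
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    n = len(read)
    score = 0.0
    for i, (r, g) in enumerate(zip(read, ref)):
        p_ct = damage_prob(i)            # C->T, distance from the 5' end
        p_ga = damage_prob(n - 1 - i)    # G->A, distance from the 3' end
        if r == g:
            p = 1.0 - seq_error
            if g == "C":
                p -= p_ct
            elif g == "G":
                p -= p_ga
        elif g == "C" and r == "T":
            p = p_ct + seq_error / 3
        elif g == "G" and r == "A":
            p = p_ga + seq_error / 3
        else:
            p = seq_error / 3
        score += math.log(p)
    return score
```

Higher (less negative) scores indicate alignments that are better explained by the assumed damage and error model.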
Authenticating aDNA data. Following read alignment, analyses
often focus on authenticating whether sequencing data are ancient.
Software such as mapDamage45,100 or pmdtools101 can test for the
presence of typical nucleotide misincorporation patterns that
result from inflated cytosine deamination rates at overhangs. Such
patterns can be first obtained by preparing libraries on an aliquot
of the DNA extract, while saving the remaining fraction for
preparing almost damage-free libraries following USER treatment81.
This will limit nucleotide misincorporation effects on downstream
analyses. Alternatively, mild USER treatment, which removes most,
but not all, of the damage signature, has been proposed to enable
sequence authentication and population analyses using the same
sample aliquot72.
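
The kind of misincorporation profile that such software reports can be approximated with a short Python sketch. It is not mapDamage code: the input format (ungapped read/reference string pairs, written 5ʹ to 3ʹ) and the function name are assumptions made for illustration.

```python
def ct_frequency_by_position(pairs, n_positions=10):
    """C->T frequency at each of the first n_positions from the 5' end.

    pairs: iterable of (read, reference) strings of equal length.
    Returns a list in which entry i is the fraction of reference cytosines
    at position i that are read as thymine, or None without cytosines.
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for read, ref in pairs:
        for i in range(min(n_positions, len(read), len(ref))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else None for t, c in zip(c_to_t, c_total)]
```

For authentic aDNA built into double-stranded libraries, the returned frequencies would be expected to be highest at the first positions and to decline towards the read interior.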
Nucleotide misincorporation patterns can be exploited to fit
statistical models of post-mortem DNA damage and estimate cytosine
deamination rates and nick frequencies44,48. Even though
deamination rates at overhangs were reported to increase linearly
with time across a wide range of archaeological sites and
preserva-tion conditions46, this pattern has not been confirmed
within archaeological sites in permafrost17 or temperate
environments72. Additionally, different remains from the same
specimen and/or extracts from the same remain can show variable
levels of DNA damage27,32. This sug-gests complex relationships in
which both global condi-tions, as reflected in the thermal age of a
given specimen, and microenvironmental factors (within and between
remains) drive the amount of DNA damage ultimately measured. In our
opinion, these complex relationships, and the dependency of damage
quantification on the library preparation and amplification
procedures, pre-clude the use of strict minimal thresholds of
expected DNA damage levels as authentication criteria. Thus,
quantitative comparison with the levels observed for samples
excavated at the same or similar archaeological sites, and
processed with the same experimental tools, is recommended.
Statistical damage models also allow correction of base quality
scores depending on their probability of being the result of
nucleotide misincorporations at damage sites48, thus limiting their
possible effect on downstream analyses. However, we emphasize that
for low-coverage data — in which mismatches are observed on a few
reads at best and penalized when close to read termini — this
procedure can potentially inflate the genetic proximity to the
reference genome. SNP calling can also benefit from genotype
callers, such as SNPest4,102, that explicitly model post-mortem DNA
damage as a possible source of error. Furthermore, nucleotide
misincorporation patterns can be used by computational tools to
sort the fraction of reads that show evidence of post-mortem
damage101, which is useful when there is substantial modern DNA
contamination. Although extremely conservative and not
cost-effective (as not all aDNA molecules carry post-mortem DNA
damage and many true aDNA reads will be discarded), damage-based
filtering approaches have shown great success in characterizing
whole-mitochondrial sequences from extensively contaminated
Neanderthal specimens101 and an ~400,000-year-old hominin32.
Finally, comparing analytical outcomes when considering the full
population of reads or only the most damaged fraction (and
disregarding mutations, such as transitions, that derive from
post-mortem damage40–44) can provide evidence that the results are
not driven by damage and contamination artefacts103.
In addition to revealing nucleotide misincorporation patterns,
mapDamage also delivers the base composition of the genomic regions
directly flanking DNA inserts and therefore tests depurination as
the main driver for DNA fragmentation44,48,100, which can also help
authentication. This pattern is substantially affected following
USER treatment, which mainly cleaves DNA downstream of
unmethylated cytosine residues, therefore resulting in an excess of
cytosines at genomic positions just preceding read starts16,72.
Software links:
MIA: http://mia-assembler.sourceforge.net/
ANFO: https://bioinf.eva.mpg.de/anfo/
BWA-PSSM: http://bwa-pssm.binf.ku.dk/
mapDamage: http://ginolhac.github.io/mapDamage/
pmdtools: https://code.google.com/p/pmdtools/
SNPest: https://github.com/slindgreen/SNPest
Epialleles: Allelic variants showing identical genetic sequences
but different epigenetic marks, such as different methylation
patterns.
Estimating contamination levels. Nucleotide misincorporation
and base compositional patterns can be detected in even
substantially contaminated samples. This can happen when treating
the outer sample surface with bleach before DNA extraction, which
can help to remove a fraction of fresh DNA contaminants but also
introduces signatures of DNA damage within the remaining
contaminants104. This can also happen when a mixture of highly
degraded aDNA templates and undamaged DNA contaminants is
incorporated into libraries. A suite of tools has thus been
developed for further authenticating aDNA data (in particular for
human aDNA). The current methods available exploit the sequence
information at sites and/or haplotypes with known variation across
species and/or populations. For example, modern human contamination
in Neanderthal HTS data has been estimated using the relative
proportion of derived alleles and ancestral alleles observed at
mitochondrial sites showing nearly fixed derived alleles in modern
humans6. A similar rationale was used to estimate the possible
contribution of different human population backgrounds105
or species83 to final mitochondrial consensus sequences. A
statistically more powerful contamination estimator for
mitochondrial reads that uses linkage information at the read level
has been developed74.
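
The allele-counting rationale for mitochondrial contamination can be sketched as a simple proportion with a confidence interval. The Python function below is an illustrative assumption, not one of the published estimators: it treats each read overlapping a diagnostic site as an independent observation and uses a Wilson score interval, which is a choice made here rather than one stated in the text.

```python
import math


def contamination_estimate(contaminant_reads, endogenous_reads, z=1.96):
    """Point estimate and Wilson interval for the contaminant read fraction.

    contaminant_reads: reads carrying the allele diagnostic of the
        contaminating source (e.g. modern human) at the chosen sites.
    endogenous_reads: reads carrying the allele expected for the specimen.
    Returns (estimate, lower, upper).
    """
    n = contaminant_reads + endogenous_reads
    if n == 0:
        raise ValueError("no reads overlap the diagnostic sites")
    p = contaminant_reads / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

Five contaminant-type reads out of 500, for instance, give a point estimate of 1% with an interval that widens quickly as the number of informative reads drops.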
As the cellular mitochondrial number is variable across cell
types and tissues, contamination estimates based on mitochondrial
sequence data do not directly reflect the true contamination levels
of the nuclear genome106. Heterozygosity levels observed on male
X chromosomes can be used as a nuclear contamination proxy.
As males are haploid for most X chromosome loci, base
discordance between overlapping reads should result only from
sequencing errors and should be distributed randomly along the
chromosome. However, if modern human DNA contamination is present,
discordance rates should inflate at sites that are polymorphic
within contemporary populations14. For archaic hominin specimens,
nuclear contamination rates can be calculated from fixed alleles
that are derived in modern humans6. For female ancient human
samples, the presence of sequences that are known to be unique to
the Y chromosome can also reflect the presence of
contamination from male-derived sources106. Triallelic sites at
autosomes could potentially be used in the future to estimate
levels of nuclear contamination with modern human DNA, irrespective
of the sex of the sample.
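
The X-chromosome rationale can be sketched as a comparison of discordance rates. The Python functions below are a simplified illustration, not a published estimator: per-site read counts are assumed to be already tallied, and the excess discordance at known polymorphic sites over neighbouring sites is reported as a raw signal rather than converted into a contamination rate (which would additionally require allele frequencies).

```python
def discordance_rate(site_counts):
    """Fraction of reads disagreeing with the majority base.

    site_counts: iterable of (majority_base_reads, other_base_reads) per site.
    """
    counts = list(site_counts)
    other = sum(o for _, o in counts)
    total = sum(m + o for m, o in counts)
    return other / total if total else 0.0


def excess_discordance(polymorphic_sites, flanking_sites):
    """Discordance at known polymorphic sites minus the background rate.

    Flanking (non-polymorphic) sites capture sequencing error and damage;
    an excess at polymorphic sites in a male sample suggests contamination.
    """
    return discordance_rate(polymorphic_sites) - discordance_rate(flanking_sites)
```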
Genome completion and error rates. Reliable contamination
estimates can generally be recovered from the data aligning to the
X chromosome using even low-depth information, as long as each
single genomic position is covered once on average (that is, ~1×
coverage). Ultimately, the exact fraction of the genome that is
covered depends on the sequencing effort and the sequence length.
For aDNA sequence reads of 60 nucleotides, ~87% of the human
genome is non-repetitive, and therefore reads of similar size (or
shorter) cannot be uniquely aligned to the remaining ~13% of the
genome4. For example, the genome of a Paleo-Eskimo Greenlander of
the Saqqaq culture was sequenced to ~16× coverage, with ~20% of the
genome missing. This achieved ~20× coverage at positions covered at
least once, although some variation was observed along the
chromosomes, as half of the positions showed a depth of coverage of
≤7×. Using DNA polymerases that reduce size and base compositional
biases during library amplification62 can help to limit such
variation, although nucleosome protection can also lead to specific
patterns of depth-of-coverage variation along the genome (see
below).
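
The relationship between the genome-wide average depth and the depth at covered positions in the Saqqaq example is simple arithmetic, shown here as a Python sketch. The second function adds the textbook expectation for the covered fraction under idealized uniform random sampling (a Poisson assumption introduced here, not a claim from the text; real aDNA data deviate from it because of repeats and coverage biases).

```python
import math


def depth_at_covered_positions(mean_depth, fraction_missing):
    """Average depth restricted to positions covered at least once."""
    if not 0 <= fraction_missing < 1:
        raise ValueError("fraction_missing must be in [0, 1)")
    return mean_depth / (1 - fraction_missing)


def expected_fraction_covered(mean_depth):
    """Expected covered fraction if reads were placed uniformly at random."""
    return 1 - math.exp(-mean_depth)
```

With a mean depth of 16× and 20% of the genome missing, the first function returns approximately 20×, matching the figures quoted above.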
Overall sequence accuracy of ancient genomes is another
parameter that is worth considering, as sequencing errors will
have an impact on downstream analyses. Genome-wide error rates are
generally estimated using three-way alignments that include the
genome of a closely related outgroup (for example, the chimpanzee)
and a high-quality genome from a living conspecific individual107.
The excess of derived alleles observed in the genome of the ancient
individual provides an estimate for its error rate relative to the
high-quality modern genome. Unsurprisingly, this rate is highly
dependent on DNA damage levels and the molecular tools used
before
and during sequencing. The best alternative developed so far
involves USER treatment followed by paired-end sequencing81, which
can generally reduce error rates to
Ghost population: An unsampled population that exchanges migrants
with other sampled populations and that can be identified based on
admixture signatures left in descending populations.
Introgressive block lengths: Population admixture introduces a
mosaic of ancestry blocks along the genome, the lengths of which
decrease with each subsequent generation owing to recombination.
Introgressive block lengths can therefore be exploited to determine
the date of admixture events.
found to correlate with known methylation levels at promoters,
exons, introns and CpG islands. This was not observed for other CpN
dinucleotides, confirming that methylation drives the signal. The
authors also used CpG→TpG substitutions at read starts (where
deamination rates are maximal) to infer ancient methylation levels
for genomic regions overlapping the loci from the Illumina Infinium
HumanMethylation450 BeadChip array. The inferred methylation
profile from the Saqqaq sample was found to cluster with hair
follicle methylation profiles, which is in agreement with the
tissue originally used for DNA extraction.
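
A regional methylation proxy based on CpG→TpG substitutions at read starts can be sketched as a simple ratio. The Python function below is an illustrative assumption about how such a tally might be computed, not the authors' published procedure; the input format (reference and read dinucleotides observed at read starts within one genomic region) is hypothetical.

```python
def cpg_to_tpg_rate(observations):
    """Fraction of reference CpGs at read starts that are read as TpG.

    observations: iterable of (reference_dinucleotide, read_dinucleotide)
        pairs collected at read starts within one genomic region.
    Returns None when the region contains no reference CpG observation.
    """
    cpg = 0
    tpg = 0
    for ref_di, read_di in observations:
        if ref_di == "CG":
            cpg += 1
            if read_di == "TG":
                tpg += 1
    return tpg / cpg if cpg else None
```

Comparing this rate across regions (for example, promoters versus gene bodies) gives relative, not absolute, methylation levels.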
Genome-wide nucleosome maps. Library inserts derived from
endogenous aDNA generally show unimodal size distributions that
are typically centred around 40–80 bp. However, several aDNA
sequence data sets exhibit striking 10-bp periodicity in their size
distribution4,21,25,26,34,79. Pedersen et al.34 proposed
that this results from nucleosome protection, with DNA
fragmentation preferentially occurring at nucleotides facing away
from nucleosomes. Assuming that nucleosomes are strongly
positioned and phased along DNA scaffolds, and recalling that the
turn of the DNA helix is 10 bp long, only 1 nucleotide per 10
bp would be fully exposed to hydrolysis. If this is true, then
nucleosome protection should also drive additional patterns. For
example, DNA fragmentation should occur preferentially within
spacers, which are nucleosome-free regions of ~50 bp separating
successive ~150-bp DNA blocks covered by nucleosomes. Fewer
endogenous reads should therefore map to spacer regions, leading
to depth-of-coverage periodicities of ~200 bp, with peaks of
coverage corresponding to nucleosome centres and correlating with
both in silico predicted and experimentally derived nucleosome
maps. These predicted periodicities were confirmed in the Saqqaq
sample data, even following correction for base compositional
effects, which can substantially affect depth-of-coverage variation
during library amplification62. This finding, together with
expected patterns of methylation and depth of coverage within CTCF
regions and splicing sites, confirmed the nucleosome protection
hypothesis.
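
One simple way to look for the 10-bp periodicity in a fragment-size histogram (or the ~200-bp periodicity in depth of coverage) is to remove the smooth envelope and compute the autocorrelation at the lag of interest. The Python sketch below is an assumed, minimal approach for illustration; it is not the analysis used in the cited studies, and the moving-average `window` is an arbitrary choice.

```python
def detrend(values, window=11):
    """Subtract a centred moving average to remove the smooth envelope."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(values[i] - sum(values[lo:hi]) / (hi - lo))
    return out


def autocorrelation(values, lag):
    """Normalized autocorrelation of a series at the given lag."""
    n = len(values)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + lag] - mean)
              for i in range(n - lag))
    return cov / var
```

Applied to counts per fragment length, a clearly positive autocorrelation at lag 10 together with a negative value at lag 5 would be consistent with a 10-bp periodic signal.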
Nucleosomes might protect DNA from cleavage that occurs during
cellular apoptosis and/or post-mortem34. As similar periodicity
patterns have been found not only in ancient hair follicles4,34,
which have undergone extensive apoptosis, but also in other ancient
tissues that are not particularly affected by apoptosis, such as
teeth21 and bones25,26,79, we expect that ancient nucleosome maps
could, in the future, be reconstructed across a wide range of
samples. Recalling that such patterns are also absent from many of
the samples analysed so far, further work is needed to understand
which factors drive the preservation of signatures of nucleosome
protection after death.
Assessing ancient gene expression levels. Post-mortem DNA damage
enables the reconstruction of ancient methylome and nucleosome
maps. Given the central role of epigenetic states in regulating
chromatin acces-sibility to transcription factors, this information
can be tentatively used to infer ancient gene expression levels.
Encouragingly, methylation ratios between gene bod-ies and promoter
regions (a proxy for gene expression) showed strong correlation
with hair follicle expression levels measured using high-throughput
RNA sequenc-ing (RNA-seq)34. However, further work is needed to
develop genuine proxies that accurately measure ancient gene
expression levels. The epigenome of each cell type is complex, and
ancient samples will necessarily span a range of tissues, with
unbalanced contributions from different cell types, which will
possibly result in vari-able validity of expression predictions
across samples, age, sex and health conditions. As one example,
genome hypermethylation is a known response to viral infection
in plants, and methylation assays for ancient plant material
can therefore be used to monitor viral exposure in ancient
populations109.
CTCF regions: Genomic regions targeted by CCCTC-binding factor
(CTCF) and involved in regulating the three-dimensional structure
of chromatin and transcription by mediating long-range interactions
between genomic sequences.
Admixture: Interbreeding of individuals from multiple population
origins, resulting in the introduction of DNA from one population
into the genomes of a second population.
Software links:
Admixture (software): http://www.genetics.ucla.edu/software/
bammds: http://dna.ku.dk/~sapfo/bammds.html
Box 2 | Reconstructing population histories
One of the most common first steps in the analysis of
genome-wide data from ancient humans is the characterization of
their closest relatives among modern populations. Such inferences
are generally based on principal component analysis (PCA) or
statistical clustering, using software such as Admixture126. A
benefit of statistical clustering is that it also enables
documentation of contamination levels through determining whether
the ancient samples exhibit a genetic contribution that could be
derived from the research team4. With shotgun sequencing at low
depth of coverage (for example, ≤8×), genotypes cannot be reliably
determined, and analyses are generally performed using
pseudo-haploid data in which sequence reads from many loci consist
of a random sampling of only one of the two constituent alleles,
and thus individuals are considered to be homozygous for the unique
allele sampled at a given locus. The genomic regions covered across
multiple individuals are then also limited, which reduces the
number of orthologous loci overlapping known genetic variation in
modern populations. In such cases, the ancestry of each ancient
individual can be determined using multidimensional scaling (MDS),
which exploits pairwise measures of genetic distances in a panel of
individuals, calculated by normalizing the sum of all instances
where two individuals show different alleles by the total number of
loci with no missing data in each pair. This procedure is
implemented in the bammds package127. Additionally, Procrustes
transformation of individual PCA projections based on the
particular vector of single-nucleotide polymorphisms covered in
each specimen and the same reference panel can help to visualize
the population affinities of a group of ancient individuals within
a single analysis103.
However, PCA-based approaches reflect not only population
ancestries but also the temporal sampling between ancient and
modern individuals128. Thus, at best, MDS, PCA and clustering
analyses should be viewed as formulating evolutionary hypotheses,
which subsequently require testing using approaches such as
model-based inference, as well as coalescence simulations14,
D-statistics129 and population f-statistics130. Population
f-statistics methods, such as the f3-statistics, have been
developed for detecting populations with mixed ancestries and
identifying populations that are closest to ancient individuals130.
D-statistics has received particular attention because it
originally supported the theory that admixture occurred between
Neanderthals and non-African modern humans6. D-statistics is based
on four-way alignments that include one outgroup (O) and three
populations (H1, H2 and H3), of which two (H1 and H2) are
more closely related. For example, in the case of Neanderthals,
with the following configuration (O = Chimpanzee,
H3 = Neanderthals, H2 = Eurasians, H1 = Africans),
positive D-statistics indicate an excess of shared polymorphisms
and possible admixture between Neanderthals and Eurasians6,16,22.
However, this observation is also compatible with gene flow into
Africans from a currently unsampled and divergent ghost
population129, as well as with population subdivision in Africa,
with Neanderthal and Eurasian ancestors leaving Africa from related
population backgrounds131. Admixture events can be further dated
from the distribution of introgressive block lengths in modern and
ancient individuals27,122,132, as recombination reduces their size
over time. The resulting date seemed to be too recent to be
compatible with a scenario involving population subdivision in
Africa, which confirmed admixture with Neanderthals outside
Africa.
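
Two of the computations described in Box 2 can be sketched in Python: the pairwise distance used as input for MDS on pseudo-haploid data, and the D-statistic in its usual ABBA/BABA form, D = (nABBA − nBABA) / (nABBA + nBABA). Both functions are simplified illustrations, not the bammds or published D-statistic implementations; they assume one sampled allele per individual or population and per site, with `None` marking missing data, and they omit the significance testing that real analyses require.

```python
def pairwise_distance(geno_a, geno_b):
    """Proportion of differing alleles over loci observed in both individuals."""
    shared = [(a, b) for a, b in zip(geno_a, geno_b)
              if a is not None and b is not None]
    if not shared:
        return None
    return sum(a != b for a, b in shared) / len(shared)


def d_statistic(h1, h2, h3, outgroup):
    """D over biallelic sites; the outgroup allele is taken as ancestral.

    ABBA: H2 and H3 share the derived allele, H1 is ancestral.
    BABA: H1 and H3 share the derived allele, H2 is ancestral.
    A positive value indicates excess allele sharing between H2 and H3.
    """
    abba = baba = 0
    for a, b, c, o in zip(h1, h2, h3, outgroup):
        if None in (a, b, c, o):
            continue
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    total = abba + baba
    return (abba - baba) / total if total else None
```

With H1 = Africans, H2 = Eurasians, H3 = Neanderthals and the chimpanzee as outgroup, a positive value from `d_statistic` corresponds to the excess sharing between Neanderthals and Eurasians discussed in Box 2.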
Conclusions
Recent technical developments have enhanced our
understanding of the properties of aDNA molecules and how we should
best proceed to maximize their retrieval. In some environments,
this enables genomic characterization throughout much of the past
million years17,31,32. Ongoing research and the increasing wealth
of sequencing data generated will undoubtedly further improve
current approaches in the near future. DNA extraction represents an
area with great potential for improvement, especially if tailored
to the molecular structures, niches and microenvironmental
parameters that best preserve DNA.
The discovery that post-mortem cytosine deamination
preferentially occurs at overhangs was important for the
development of authentication criteria44. However, other base
modifications, including pyrimidine derivatives, have been
identified39. Improved characterization of the chemical features of
aDNA molecules, as well as their methylation and nucleosome
protection patterns, could therefore open new avenues for data
authentication. This will also improve our ability to correct
sequence analyses from as-yet-unidentified biases and provide
opportunities for targeting damaged templates before sequencing.
The development of engineered DNA polymerases that can bypass
specific DNA lesions introduced post-mortem110 could also
facilitate library construction and amplification.
Importantly, although the approaches outlined here improve aDNA
retrieval and analyses, the HTS
technologies themselves had the greatest impact on the field.
Although not originally designed for aDNA, their massive throughput
coupled with their ability to sequence short molecules rendered
them ideal for aDNA applications. Therefore, it is likely that
future HTS platforms that directly sequence DNA bases and their
modifications with minimal (if any) library preparation will drive
the future of aDNA research. The results of the initial application
of true single-molecule DNA sequencing are encouraging, having
demonstrated substantial improvement in relative amounts of
accessible endogenous sequences17,45,56.
Although most paleogenomic studies have focused on a limited
number of individuals, current approaches allow the
characterization of genome-wide SNP variation at ancient
population scales71,111. Future studies can be expected to
investigate genetic variation in large population samples on the
high-density SNP or even whole-genome scale, thus improving our
understanding of past demographic, adaptive and admixture
trajectories with greater detail112.
Besides delivering ancient genomes and epigenomes, new
methodological developments have also provided access to ancient
transcriptomes113,114 and proteomes17,115,116. Owing to the
biochemical processes inherent in animal cell death, animal tissues
are unlikely to represent good reservoirs for long-term RNA
survival. Materials still exist in other organisms that do not
undergo autolysis. One example is plant seed, a tissue that
requires RNA survival for germination and that has demonstrated
ancient RNA survival going back hundreds to thousands of
years113,114. Such materials may contribute to our understanding of
how gene expression pathways have been remodelled during
domestication. Additionally, a wide range of ancient proteins have
been sequenced from Late115 and Middle17 Pleistocene specimens.
With half-lives exceeding that of DNA, ancient peptides might be
the only way to retrieve genetic information from the early
Pleistocene and even earlier time periods. Within a much more
recent time range, namely the past few thousand years, studies of
proteins have already delivered information that is not obtainable
from DNA, such as whether milk products were already consumed in
particular ancient societies117. Molecular analyses of dental
plaque, which offers a rich reservoir entrapping biomolecules
derived not only from the host but also from its diet and the oral
microbiome118,119, may also hold great promise, especially now
that computational approaches have been developed to compare the
diversity of past and present microbiomes57.
A final question worth considering is whether the technological
breakthroughs in ancient genomics may offer pathways towards
de-extinction120. Bringing back lost species is of growing
interest, and although it is a topic fraught with challenges
ranging from the ethical to the technological, for many extinct
species a key starting requisite will be a well-characterized
reference genome. As new extraction and computational methods
expand the age range and quality of specimens from which such data
can reliably be obtained, so too will the range of species that
could be considered as possible targets for de-extinction
attempts.
[Figure 5 panel labels: depth of coverage; CpG→TpG; mCpG; CpG;
nucleosome; spacer DNA. Nature Reviews | Genetics]
Figure 5 | Tracking ancient nucleosome and methylation maps. DNA
wrapped around nucleosomes can be protected post-mortem and
over-represented in high-throughput sequencing (HTS) data.
Therefore, depth-of-coverage patterns along the genome can be
exploited to locate nucleosomes on ancient genomes. Similarly,
post-mortem deamination at CpG sites transforms methylated CpG
(mCpG) sites into TpG sites but transforms unmethylated CpG sites
into UpG sites. When molecular tools are used that prevent UpGs
from being sequenced, CpG→TpG mutations in HTS data provide an
opportunity to detect ancient mCpGs, with hypomethylated regions
showing low CpG→TpG conversion rates and hypermethylated regions
showing high CpG→TpG conversion rates.
Adapted with permission from REF. 38, American Association for
the Advancement of Science.
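The methylation principle described in the Figure 5 caption can be
illustrated with a minimal sketch. It assumes that per-site read
counts at reference CpG positions are already available from a
library in which UpGs were not sequenced. The function name
cpg_to_tpg_rate, the input layout and the window size are
hypothetical choices made for illustration and are not taken from
any of the cited tools.

```python
from collections import defaultdict


def cpg_to_tpg_rate(sites, window=1000):
    """Summarize CpG->TpG conversion per genomic window.

    sites: iterable of (position, c_reads, t_reads) tuples, one per
           reference CpG site (hypothetical input layout), where
           c_reads and t_reads count reads carrying C or T at the
           cytosine position.
    Returns {window_start: fraction of reads carrying T}.
    """
    c_total = defaultdict(int)
    t_total = defaultdict(int)
    for pos, c_reads, t_reads in sites:
        start = (pos // window) * window  # bin the site into its window
        c_total[start] += c_reads
        t_total[start] += t_reads

    rates = {}
    for start in sorted(c_total):
        total = c_total[start] + t_total[start]
        if total > 0:  # skip windows without any coverage
            rates[start] = t_total[start] / total
    return rates


# Toy data (made up): the second window shows more CpG->TpG changes,
# which under the caption's logic would suggest higher methylation.
toy_sites = [(120, 9, 1), (480, 18, 2), (1310, 6, 4), (1750, 6, 4)]
rates = cpg_to_tpg_rate(toy_sites, window=1000)
```

In this toy input, the window with the higher rate would be
interpreted as relatively hypermethylated. Real analyses would also
need to account for read position, sequencing error and coverage
depth, which this sketch ignores.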
R E V I E W S
406 | JULY 2015 | VOLUME 16 www.nature.com/reviews/genetics
© 2015 Macmillan Publishers Limited. All rights reserved
-
1. Higuchi, R., Bowman, B., Freiberger, M.,
Ryder, O. A. & Wilson, A. C. DNA sequences
from the quagga, an extinct member of the horse family. Nature 312,
282–284 (1984).
2. Hagelberg, E., Sykes, B. & Hedges, R.
Ancient bone DNA amplified. Nature 342, 485 (1989).
3. Pääbo, S. Ancient DNA: extraction, characterization,
molecular cloning, and enzymatic amplification. Proc. Natl
Acad. Sci. USA 86, 1939–1943 (1989).
4. Rasmussen, M. et al. Ancient human genome sequence
of an extinct Paleo-Eskimo. Nature 463, 757–762 (2010). This study
takes advantage of the relative absence of environmental
microorganisms within ancient hairs to characterize the first
high-quality ancient human genome.
5. Miller, W. et al. Sequencing the nuclear genome of
the extinct woolly mammoth. Nature 456, 387–390 (2008).
6. Green, R. E. et al. A draft sequence of the
Neandertal genome. Science 328, 710–722 (2010). This paper reports
the first draft genome of an archaic hominin and many
methodological developments that are still commonly used for
characterizing and analysing ancient genomes.
7. Bos, K. I. et al. A draft genome of Yersinia
pestis from victims of the Black Death. Nature 478, 506–510
(2011). This paper reports the first genome isolated from an ancient
pathogenic bacterium, confirming the Black Death as a plague
epidemic. It revealed that no derived variant is unique to the
medieval strain, suggesting that non-genetic factors enhanced the
virulence of the pathogen.
8. Martin, M. D. et al. Reconstructing genome
evolution in historic samples of the Irish potato famine pathogen.
Nat. Commun. 4, 2172 (2013).
9. Schuenemann, V. J. et al. Genome-wide
comparison of medieval and modern Mycobacterium leprae. Science
341, 179–183 (2013).
10. Bos, K. I. et al. Pre-Columbian mycobacterial
genomes reveal seals as a source of New World human tuberculosis.
Nature 514, 494–497 (2014).
11. Devault, A. M. et al. Second-pandemic strain
of Vibrio cholerae from the Philadelphia cholera outbreak.
N. Engl. J. Med. 370, 334–340 (2014).
12. Devault, A. M. et al. Ancient pathogen DNA in
archaeological samples detected with a microbial detection array.
Sci. Rep. 4, 4245 (2014).
13. Wagner, D. M. et al. Yersinia pestis and the
Plague of Justinian 541–543 AD: a genomic analysis. Lancet Infect.
Dis. 14, 319–326 (2014).
14. Rasmussen, M. et al. An Aboriginal Australian
genome reveals separate human dispersals into Asia. Science 334,
94–98 (2011).
15. Keller, A. et al. New insights into the Tyrolean
Iceman’s origin and phenotype as inferred by whole-genome
sequencing. Nat. Commun. 3, 698 (2012).
16. Meyer, M. et al. A high-coverage genome sequence
from an archaic Denisovan individual. Science 338, 222–226
(2012). This paper describes a novel method for constructing aDNA
libraries using ssDNA templates, which enabled the characterization
of the Denisovan genome at a quality rivalling that of modern
genomes, starting from only minute amounts of DNA extracts.
17. Orlando, L. et al. Recalibrating Equus evolution
using the genome sequence of an early Middle Pleistocene horse.
Nature 499, 74–78 (2013). This study takes advantage of both
second-generation (high-throughput, and library- and
amplification-dependent) and third-generation (high-throughput, and
library- and amplification-independent) sequencing technologies to
present the oldest genome sequence hitherto characterized: that of
an ~700,000-year-old horse.
18. Gamba, C. et al. Genome flux and stasis in a five
millennium transect of European prehistory. Nat. Commun. 5, 5257
(2014).
19. Jónsson, H. et al. Speciation with gene flow in
equids despite extensive chromosomal plasticity. Proc. Natl Acad.
Sci. USA 111, 18655–18660 (2014).
20. Malaspinas, A. S. et al. Two ancient human
genomes reveal Polynesian ancestry among the indigenous Botocudos
of Brazil. Curr. Biol. 24, R1035–R1037 (2014).
21. Olalde, I. et al. Derived immune and ancestral
pigmentation alleles in a 7,000-year-old Mesolithic European.
Nature 507, 225–228 (2014).
22. Prüfer, K. et al. The complete genome sequence of
a Neanderthal from the Altai Mountains. Nature 505, 43–49
(2014).
23. Raghavan, M. et al. The genetic prehistory of the
New World Arctic. Science 345, 1255832 (2014).
24. Raghavan, M. et al. Upper Paleolithic Siberian
genome reveals dual ancestry of Native Americans. Nature 505, 87–91
(2014).
25. Rasmussen, M. et al. The genome of a Late
Pleistocene human from a Clovis burial site in western Montana.
Nature 506, 225–229 (2014).
26. Schubert, M. et al. Prehistoric genomes reveal the
genetic foundation and cost of horse domestication. Proc. Natl
Acad. Sci. USA 111, E5661–E5669 (2014).
27. Seguin-Orlando, A. et al. Genomic structure in
Europeans dating back at least 36,200 years. Science 346,
1113–1118 (2014).
28. Ramirez, O. et al. Genome data from a sixteenth
century pig illuminate modern breed relationships. Heredity 114,
175–184 (2015).
29. Schroeder, H. et al. Genome-wide ancestry of 17th
century enslaved Africans from the Caribbean. Proc. Natl Acad.
Sci. USA 112, 3669–3673 (2015).
30. Metzker, M. L. Sequencing technologies — the next
generation. Nat. Rev. Genet. 11, 31–46 (2010).
31. Dabney, J. et al. Complete mitochondrial genome
sequence of a Middle Pleistocene cave bear reconstructed from
ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110,
15758–15763 (2013).
32. Meyer, M. et al. A mitochondrial genome sequence
of a hominin from Sima de los Huesos. Nature 505, 403–406
(2014).
33. Gokhman, D. et al. Reconstructing the DNA
methylation maps of the Neandertal and the Denisovan. Science 344,
523–527 (2014).
34. Pedersen, J. S. et al. Genome-wide nucleosome
map and cytosine methylation levels of an ancient human genome.
Genome Res. 24, 454–466 (2014). This study exploits DNA degradation
patterns in HTS data sets to characterize, for the first time,
genome-wide nucleosome and methylation maps from an ancient human
and infer ancient gene expression levels and the age at death of
the individual.
35. Ermini, L., Der Sarkissian, C.,
Willerslev, E. & Orlando, L. Major transitions in
human evolution revisited: a tribute to ancient DNA. J. Hum.
Evol. 79, 4–20 (2015).
36. Shapiro, B. & Hofreiter, M. A paleogenomic
perspective on evolution and gene function: new insights from
ancient DNA. Science 343, 1236573 (2014).
37. Orlando, L. & Cooper, A. Using ancient DNA to
understand evolutionary and ecological processes. Ann. Rev. Ecol.
Evol. Syst. 45, 573–598 (2014).
38. Orlando, L. & Willerslev, E. An epigenetic
window into the past? Science 345, 511–512 (2014).
39. Höss, M., Jaruga, P., Zastawny, T. H.,
Dizdaroglu, M. & Pääbo, S. DNA damage and DNA
sequence retrieval from ancient tissues. Nucleic Acids Res. 24,
1304–1307 (1996).
40. Hansen, A. J., Willerslev, E., Wiuf, C.,
Mourier, T. & Arctander, P. Statistical evidence for
miscoding lesions in ancient DNA templates. Mol. Biol. Evol. 18,
262–265 (2001).
41. Hofreiter, M., Jaenicke, V., Serre, D., von
Haeseler, A. & Pääbo, S. DNA sequences from multiple
amplifications reveal artifacts induced by cytosine deamination in
ancient DNA. Nucleic Acids Res. 29, 4793–4799 (2001).
42. Stiller, M. et al. Patterns of nucleotide
misincorporations during enzymatic amplification and direct
large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA
103, 13578–13584 (2006).
43. Gilbert, M. T. et al. Recharacterization of
ancient DNA miscoding lesions: insights in the era of
sequencing-by-synthesis. Nucleic Acids Res. 35, 1–10 (2007).
44. Briggs, A. et al. Patterns of damage in genomic
DNA sequences from a Neandertal. Proc. Natl Acad. Sci. USA 104,
14616–14621 (2007). This study characterizes typical nucleotide
misincorporation and fragmentation patterns using HTS data from
aDNA extracts, which have been subsequently used as essential
authentication criteria.
45. Orlando, L. et al. True single-molecule DNA
sequencing of a Pleistocene horse bone. Genome Res. 21, 1705–1719
(2011).
46. Sawyer, S. et al. Temporal patterns of nucleotide
misincorporations and DNA fragmentation in ancient DNA. PLoS ONE 7,
e34131 (2012).
47. Overballe-Petersen, S., Orlando, L. &
Willerslev, E. Next-generation sequencing offers new insights
into DNA degradation. Trends Biotechnol. 30, 364–368 (2012).
48. Jónsson, H. et al. mapDamage2.0: fast approximate
Bayesian estimates of ancient DNA damage parameters. Bioinformatics
29, 1682–1684 (2013).
49. Hansen, A. J. et al. Crosslinks rather than
strand breaks determine access to ancient DNA sequences from frozen
sediments. Genetics 173, 1175–1179 (2006).
50. Heyn, P. et al. Road blocks on paleogenomes —
polymerase extension profiling reveals the frequency of blocking
lesions in ancient DNA. Nucleic Acids Res. 38, e161 (2010).
51. Poinar, H. N., Kuch, M., McDonald, G.,
Martin, P. & Pääbo, S. Nuclear gene sequences from a
late Pleistocene sloth coprolite. Curr. Biol. 13, 1150–1152
(2003).
52. Poinar, H. N. et al. Metagenomics to
paleogenomics: large-scale sequencing of mammoth DNA. Science 311,
393–394 (2006). This study reports the first genetic analysis of
ancient specimens based on a HTS technology, paving the way for
whole-genome sequencing from ancient specimens.
53. Allentoft, M. E. et al. The half-life of DNA
in bone: measuring decay kinetics in 158 dated fossils.
Proc. Biol. Sci. 279, 4724–4733 (2012).
54. Smith, C. I., Chamberlain, A. T.,
Riley, M. S., Stringer, C. &
Collins, M. J. The thermal history of human fossils and
the likelihood of successful DNA amplification. J. Hum. Evol.
45, 203–217 (2003).
55. Schwarz, C. et al. New insights from old bones:
DNA preservation and degradation in permafrost preserved mammoth
remains. Nucleic Acids Res. 37, 3215–3229 (2009).
56. Ginolhac, A. et al. Improving the performance of
true single molecule sequencing for ancient DNA. BMC Genomics 13,
177 (2012).
57. Der Sarkissian, C. et al. Shotgun microbial
profiling of fossil remains. Mol. Ecol. 23, 1780–1798 (2014).
58. Damgaard, P. et al. Improving access to endogenous
DNA in ancient bones and teeth. BioRxiv
http://dx.doi.org/10.1101/014985 (2015).
59. Salamon, M., Tuross, N., Arensburg, B. &
Weiner, S. Relatively well preserved DNA is present in the
crystal aggregates of fossil bones. Proc. Natl Acad. Sci. USA 102,
13783–13788 (2005).
60. Adler, C. J., Haak, W., Donlon, D. &
Cooper, A. Survival and recovery of DNA from ancient teeth and
bones. J. Archaeol. Sci. 38, 956–964 (2011).
61. Seguin-Orlando, A. et al. Ligation bias in
Illumina next-generation DNA libraries: implications for sequencing
ancient genomes. PLoS ONE 8, e78575 (2013).
62. Dabney, J. & Meyer, M. Length and GC-biases
during sequencing library amplification: a comparison of various
polymerase-buffer systems with ancient and modern DNA sequencing
libraries. Biotechniques 87–94 (2012).
63. Young, A. L. et al. A new strategy for genome
assembly using short sequence reads and reduced representation
libraries. Genome Res. 20, 249–256 (2010).
64. Seguin-Orlando, A. et al. Amplification of TruSeq
ancient DNA libraries with AccuPrime Pfx: consequences on
nucleotide misincorporation and methylation patterns. STAR 1,
STAR2015112054892315Y.0000000005 (2015).
65. Star, B. et al. Palindromic sequence artifacts
generated during next generation sequencing library preparation
from historic and ancient DNA. PLoS ONE 9, e89676 (2014).
66. Gansauge, M. T. & Meyer, M.
Single-stranded DNA library preparation for the sequencing of
ancient or damaged DNA. Nat. Protoc. 8, 737–748 (2013).
67. Gilbert, M. T. et al. Whole-genome shotgun
sequencing of mitochondria from ancient hair shafts. Science 317,
1927–1930 (2007).
68. Gansauge, M. T. & Meyer, M. Selective
enrichment of damaged DNA molecules for ancient genome sequencing.
Genome Res. 24, 1543–1549 (2014).
69. Briggs, A. et al. Targeted retrieval and analysis
of five Neandertal mtDNA genomes. Science 325, 318–321 (2009).
70. Maricic, T., Whitten, M. & Pääbo, S.
Multiplexed DNA sequence capture of mitochondrial genomes using PCR
products. PLoS ONE 5, e14004 (2010).
71. Haak, W. et al. Massive migration from the steppe
was a source for Indo-European languages in Europe. Nature
http://dx.doi.org/10.1038/nature14317 (2015).
72. Rohland, N., Harney, E., Mallick, S.,
Nordenfelt, S. & Reich, D. Partial
uracil–DNA–glycosylase treatment for screening of ancient DNA.
Phil. Trans. R. Soc. B 370, 20130624 (2015).
73. Burbano, H. A. et al. Targeted investigation
of the Neandertal genome by array-based sequence capture. Science
328, 723–725 (2010). This paper reports the first characterization
of an ancient exome using target enrichment approaches on
microarrays.
74. Fu, Q. et al. A revised timescale for human
evolution based on ancient mitochondrial genomes. Curr. Biol. 23,
553–559 (2013).
75. Vilstrup, J. T. et al. Mitochondrial
phylogenomics of modern and ancient equids. PLoS ONE 8, e55950
(2013).
76. Castellano, S. et al. Patterns of coding variation
in the complete exomes of three Neandertals. Proc. Natl Acad. Sci.
USA 111, 6666–6671 (2014).
77. Fu, Q. et al. DNA analysis of an early modern
human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110,
2223–2227 (2013). This paper describes a target enrichment procedure
exploiting millions of DNA probes cleaved from user-designed DNA
microarrays to characterize the almost complete sequence of the
non-repetitive fraction of chromosome 21 for an ~40,000-year-old
human.
78. Carpenter, M. L. et al. Pulling out the 1%:
whole-genome capture for the targeted enrichment of ancient DNA
sequencing libraries. Am. J. Hum. Genet. 93, 852–864
(2013). This paper reports the first whole-genome target enrichment
method, which makes use of self-generated RNA probes. The method
substantially reduces the operational cost of target enrichment and
allows genetic analyses of specimens with only minute amounts of
aDNA templates.
79. Enk, J. M. et al. Ancient whole genome
enrichment using baits built from modern DNA. Mol. Biol. Evol. 31,
1292–1295 (2014).
80. Avila-Arcos, M. C. et al. Comparative performance of
two whole-genome capture methodologies on ancient DNA Illumina
libraries. Methods Ecol. Evol.
http://dx.doi.org/10.1111/2041-210X.12353 (2015).
81. Briggs, A. et al. Removal of deaminated cytosines
and detection of in vivo methylation in ancient DNA. Nucleic
Acids Res. 38, e87 (2010). This paper presents an enzymatic
procedure based on the treatment of DNA extracts with USER mix,
which can considerably reduce the sequencing error rate of ancient
genomes by limiting the effect of nucleotide misincorporations at
damaged sites.
82. Mason, V. C., Li, G., Helgen, K. M.
& Murphy, W. J. Efficient cross-species capture
hybridization and next-generation sequencing of mitochondrial
genomes from noninvasively sampled museum specimens. Genome Res.
21, 1695–1704 (2011).
83. Zhang, H. et al. Morphological and genetic
evidence for early Holocene cattle management in northeastern
China. Nat. Commun. 4, 2755 (2013).
84. Fabre, P. H. et al. Rodents of the Caribbean:
origin and diversification of hutias unraveled by next-generation
museomics. Biol. Lett. http://dx.doi.org/10.1098/rsbl.2014.0266
(2014).
85. Foote, A. D. et al. Tracking niche variation
over millennial timescales in sympatric killer whale lineages.
Proc. Biol. Sci. 280, 20131481 (2013).
86. Schuenemann, V. J. et al. Targeted enrichment
of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis
from victims of the Black Death. Proc. Natl Acad. Sci. USA 108,
E746–E752 (2011).
87. Kircher, M., Sawyer, S. & Meyer, M.
Double indexing overcomes inaccuracies in multiplex sequencing on
the Illumina platform. Nucleic Acids Res. 40, e3 (2012).
88. Avila-Arcos, M. C. et al. Application and
comparison of large-scale solution-based DNA capture-enrichment
methods on ancient DNA. Sci. Rep. 1, 73 (2011).
89. Hodges, E. et al. Hybrid selection of discrete
genomic intervals on custom-designed microarrays for massively
parallel sequencing. Nat. Protoc. 4, 960–974 (2009).
90. Bos, K. I. et al. Parallel detection of
ancient pathogens via array-based DNA capture. Phil. Trans.
R. Soc. B 370, 20130375 (2015).
91. Schubert, M. et al. Characterization of ancient
and modern genomes by SNP detection and phylogenomic and
metagenomic analysis using PALEOMIX. Nat. Protoc. 9, 1056–1082
(2014). This paper presents a fully automated pipeline performing
all sequence analyses associated with re-sequencing genomic
projects, phylogenomic inference and metagenomic profiling. It is
applicable to both modern and ancient sequence data sets.
92. Lindgreen, S. AdapterRemoval: easy cleaning of
next-generation sequencing reads. BMC Res. Notes 5, 337 (2012).
93. Li, H. & Durbin, R. Fast and accurate short
read alignment with Burrows–Wheeler transform. Bioinformatics 25,
1754–1760 (2009).
94. Langmead, B. & Salzberg, S. L. Fast
gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359
(2012).
95. McKenna, A. et al. The genome analysis toolkit: a
MapReduce framework for analyzing next-generation DNA sequencing
data. Genome Res. 20, 1297–1303 (2010).
96. Kozlov, A. M., Aberer, A. J. &
Stamatakis, A. ExaML version 3: a tool for phylogenomic
analyses on supercomputers. Bioinformatics
http://dx.doi.org/10.1093/bioinformatics/btv184 (2015).
97. Segata, N. et al. Metagenomic microbial community
profiling using unique clade-specific marker genes. Nat. Methods 9,
811–814 (2012).
98. Schubert, M. et al. Improving ancient DNA read
mapping against modern reference genomes. BMC Genomics 13, 178
(2012).
99. Kerpedjiev, P., Frellsen, J., Lindgreen, S.
& Krogh, A. Adaptable probabilistic mapping of short reads
using position specific scoring matrices. BMC Bioinformatics 15,
100 (2014).
100. Ginolhac, A., Rasmussen, M., Gilbert, M. T.,
Willerslev, E. & Orlando, L. mapDamage: testing for
damage patterns in ancient DNA sequences. Bioinformatics 27,
2153–2155 (2011).
101. Skoglund, P. et al. Separating endogenous ancient
DNA from modern day contamination in a Siberian Neandertal. Proc.
Natl Acad. Sci. USA 111, 2229–2234 (2014).
102. Lindgreen, S., Krogh, A. &
Pedersen, J. S. SNPest: a probabilistic graphical model
for estimating genotypes. BMC Res. Notes 7, 698 (2014).
103. Skoglund, P. et al. Origins and genetic legacy of
Neolithic farmers and hunter-gatherers in Europe. Science 336,
466–469 (2012).
104. García-Garcerà, M. et al. Fragmentation of
contaminant and endogenous DNA in ancient samples determined by
shotgun sequencing; prospects for human paleogenomics. PLoS ONE 6,
e24161 (2011).
105. Sánchez-Quinto, F. et al. Genomic affinities of
two 7,000-year-old Iberian hunter-gatherers. Curr. Biol. 22,
1494–1499 (2012).
106. Green, R. E. et al. The Neandertal genome
and ancient DNA authenticity. EMBO J. 28, 2494–2502 (2009).
107. Reich, D. et al. Genetic history of an archaic
hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060
(2010).
108. Llamas, B. et al. High-resolution analysis of
cytosine methylation in ancient DNA. PLoS ONE 7, e30226 (2012).
109. Smith, O. et al. Genomic methylation patterns in
archaeological barley show de-methylation as a time-dependent
diagenetic process. Sci. Rep. 4, 5559 (2014).
110. d’Abbadie, M. et al. Molecular breeding of
polymerases for amplification of ancient DNA. Nat. Biotech.
25, 939–943 (2007).
111. da Fonseca, R. R. et al. The origin and
evolution of maize in the American Southwest. Nat. Plants
1, 14003 (2015).
112. Pickrell, J. K. & Reich, D. Toward a new
history and geography of human genes informed by ancient DNA.
Trends Genet. 30, 377–389 (2014).
113. Fordyce, S. L. et al. Deep sequencing of RNA
from ancient maize kerne