A Bayesian method for DNA barcoding Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen, University of Copenhagen

Dec 28, 2015


Gyles Arnold

A Bayesian method for DNA barcoding

Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen

University of Copenhagen


Varieties of barcoding

• Assignment to existing species.

• Identification of new species.

• Assignment to taxonomic levels in general.


Motivation

1. Environmental aDNA samples.

2. Putative Neandertal DNA.

• Often short query sequences.
  – Little information.

• Permissive PCR conditions.
  – Not always from the intended locus.


Given a set of database reference sequences from different species, according to which criteria should we assign new query sequences to taxonomic levels?


True species assignment

• Requires proper population genetic analyses quantifying variability within species.

• Often not possible:
  – small database sample size for each species;
  – short query PCR products.


Phylogenetic alternative

- Purely phylogenetic criteria that ignore population genetic problems.

- Taxonomic annotation of database sequences is used to map phylogenetic groups to taxonomic levels.

- The simpler approach has its own advantages:
  – Less data required.
  – Fewer assumptions.


Ingroup or outgroup?

[Figure: a query sequence placed relative to a monophyletic taxonomic group]


Estimating trees

• Estimation of a single tree is not sufficient because of the uncertainty regarding the phylogeny.

• We suggest instead using a Bayesian approach that quantifies this uncertainty.


Bayesian approach

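• Let Q be the query sequence, X the database data, G a gene tree, and F a desired taxonomic group. Then

$$\Pr(Q \in F \mid X) = \int I(Q, F \text{ monophyletic in } G)\, p(G \mid X)\, dG \approx \frac{1}{k} \sum_{i=1}^{k} I(Q, F \text{ monophyletic in } G_i)$$

where I(·) is the indicator function and G_i is the ith of k gene trees sampled from p(G | X).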
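The Monte Carlo approximation on the right-hand side can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes rooted trees encoded as nested tuples of leaf labels, reads "Q, F monophyletic in G" as "some clade of G has exactly the members of F plus Q as its leaves", and the function names (`clades`, `is_monophyletic_with`, `posterior_probability`) and toy trees are hypothetical. In the pipeline the G_i would be the trees sampled by MrBayes.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of
# child subtrees. Trees are assumed to be rooted.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every subtree (clade) of a rooted tree."""
    result: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        result.append(leaves)
        return leaves

    walk(tree)
    return result


def is_monophyletic_with(query: str, group: FrozenSet[str], tree: Tree) -> bool:
    """Indicator I(Q, F monophyletic in G): F plus Q forms a clade of G."""
    return (group | {query}) in clades(tree)


def posterior_probability(query: str, group: FrozenSet[str],
                          sampled_trees: List[Tree]) -> float:
    """Monte Carlo estimate (1/k) * sum_i I(Q, F monophyletic in G_i)."""
    hits = sum(is_monophyletic_with(query, group, g) for g in sampled_trees)
    return hits / len(sampled_trees)


# Usage with three toy "sampled" trees; F = {A1, A2}.
trees: List[Tree] = [
    ((("Q", "A1"), "A2"), ("B1", "B2")),   # Q nested inside F: counts
    (("Q", ("A1", "A2")), ("B1", "B2")),   # Q sister to F: counts
    ((("A1", "A2"), ("Q", "B1")), "B2"),   # Q groups with B1: does not count
]
F = frozenset({"A1", "A2"})
print(posterior_probability("Q", F, trees))
```

Averaging the indicator over sampled trees, rather than evaluating it on a single best tree, is what carries the phylogenetic uncertainty into the assignment probability.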


Assignment pipeline

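1. Query sequence.

2. NCBI BLAST search of the database (GenBank); retrieval of sequences and their taxonomy annotation gives the homology set.

3. ClustalW: alignment of the homology set.

4. MrBayes: sampled trees from the alignment.

5. Summary statistics computed over the sampled trees.

6. Taxonomy summary.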


Summary statistics

• For each tree:
  – Find the sister group to the query.
  – Find the list of taxonomic levels shared by the sequences in the sister group (the consensus taxonomy).

[Figure: a tree with the query and its sister group marked]
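The per-tree statistics can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: trees are rooted and encoded as nested tuples, the sister group is taken to be all other leaves below the query's parent node, and the consensus taxonomy is read as the longest shared prefix of the root-to-tip lineages of the sister-group sequences. The helper names and the toy lineages are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(node: Tree) -> FrozenSet[str]:
    """All leaf labels below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def sister_group(tree: Tree, query: str) -> Optional[FrozenSet[str]]:
    """Leaves sharing the query's parent node, excluding the query itself."""
    if isinstance(tree, str):
        return None
    for child in tree:
        if child == query:
            # The query hangs directly off this node; everything else below it
            # is the sister group.
            return leaves(tree) - {query}
        found = sister_group(child, query)
        if found is not None:
            return found
    return None


def consensus_taxonomy(names: FrozenSet[str],
                       lineages: Dict[str, List[str]]) -> List[str]:
    """Longest shared prefix of the lineages (higher to lower rank) of the sequences."""
    shared: List[str] = []
    for levels in zip(*(lineages[n] for n in sorted(names))):
        if all(level == levels[0] for level in levels):
            shared.append(levels[0])
        else:
            break
    return shared


# Usage: toy lineages (order, family, genus) for three database sequences.
lineages = {
    "s1": ["Fagales", "Fagaceae", "Fagus"],
    "s2": ["Fagales", "Betulaceae", "Alnus"],
    "s3": ["Rosales", "Rosaceae", "Aruncus"],
}
tree: Tree = (("Q", ("s1", "s2")), "s3")
sisters = sister_group(tree, "Q")
print(sisters, consensus_taxonomy(sisters, lineages))
```

Here the query's sister group is {s1, s2}; they share the order but not the family, so the consensus taxonomy for this tree stops at Fagales.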

• For each name at each taxonomic level:
  – Find the fraction of sampled trees in which the consensus taxonomy includes that name.
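Once each sampled tree has a consensus taxonomy, the fraction for each name is a simple count. The sketch below assumes the consensus taxonomies are already available as lists of names, one list per sampled tree; `taxonomy_support` and `above_threshold` are hypothetical names. The threshold helper further assumes that the ">90%" headings on the environmental-sample slides refer to this kind of support value.

```python
from collections import Counter
from typing import Dict, List


def taxonomy_support(consensus_per_tree: List[List[str]]) -> Dict[str, float]:
    """Fraction of sampled trees whose consensus taxonomy includes each name."""
    counts = Counter()
    for consensus in consensus_per_tree:
        counts.update(set(consensus))  # count each name at most once per tree
    k = len(consensus_per_tree)
    return {name: n / k for name, n in counts.items()}


def above_threshold(support: Dict[str, float], threshold: float) -> List[str]:
    """Names whose support exceeds the threshold, sorted alphabetically."""
    return sorted(name for name, p in support.items() if p > threshold)


# Usage: consensus taxonomies from four hypothetical sampled trees.
consensus = [
    ["Fagales", "Fagaceae"],
    ["Fagales"],
    ["Fagales", "Fagaceae"],
    ["Rosales"],
]
support = taxonomy_support(consensus)
print(support)
print(above_threshold(support, 0.7))
```

Because support is computed per name, a query can be assigned confidently at a higher level (here the order) while remaining uncertain at lower levels.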


Example taxonomy summary


Environmental Samples

• 379 environmental samples (aDNA).

• rbcL and trnL markers.

• Aim: identification of the environmental flora.


Orders >90%

Asterales Brassicales Caryophyllales Coniferales

Dipsacales Ericales Fabales Fagales

Lamiales Lepidoptera Malpighiales Poales

Pottiales Ranunculales Rosales Sapindales

Saxifragales Solanales Zingiberales


Families >90%

Amaranthaceae Asteraceae Betulaceae Brassicaceae

Caprifoliaceae Caryophyllaceae Ericaceae Fabaceae

Fagaceae Juncaceae Musaceae Papaveraceae

Pinaceae Plantaginaceae Poaceae Rosaceae

Rutaceae Salicaceae Saxifragaceae Solanaceae

Taxaceae Theaceae


Genera >90%

Achillea Alnus Aruncus Cerastium

Fagus Musa Picea Pinus

Plantago Poa Saxifraga Symphoricarpos

Taxus


Botanical evaluation

Temperate climate, similar to central Sweden.


Testing putative Neandertal DNA

• Needless to say, we have had several negative examples...

• One positive example:
  – Posterior probability of 91%.


Problems

• No population genetic modelling:
  – Outgroup problem.
  – Species issues are not addressed.
  – Lineage sorting: species are not necessarily reciprocally monophyletic.

• Incomplete database


Advantages

• Phylogenetic uncertainty and the statistical uncertainty of assignment are addressed.

• Posterior probability of assignment.

• Alternative to single tree assignment.

• Can be used on any database.


Conclusions

• Phylogenetic barcoding does not model the coalescent process.

• It is the appropriate method for assignment with little data, or when assigning to higher taxonomic levels.

• The Bayesian approach offers a measure of confidence in the assignment.