A Bayesian method for DNA barcoding

Kasper Munch, Wouter Boomsma, Eske Willerslev, Rasmus Nielsen

University of Copenhagen

Post on 28-Dec-2015



Varieties of barcoding

• Assignment to existing species.

• Identification of new species.

• Assignment to taxonomic levels in general.

Motivation

1. Environmental aDNA samples.

2. Putative Neandertal DNA.

• Often short query sequences.
– Little information.

• Permissive PCR conditions.
– Not always from the intended locus.

Given a set of database reference sequences from different species, according to which criteria should we assign new query sequences to taxonomic levels?

True species assignment

• Requires proper population genetic analyses quantifying variability within species.

• Often not possible...
– Small database sample size for each species.
– Short query PCR products.

Phylogenetic alternative

- Purely phylogenetic criteria which ignore population genetic problems.

- Taxonomic annotation of database sequences is used to map phylogenetic groups to taxonomic levels.

- The simpler approach has its own advantages: less data required and fewer assumptions.

[Figure: a query sequence placed next to a monophyletic taxonomic group. Is the query ingroup or outgroup?]

Estimating trees

• Estimation of a single tree is not sufficient because of the uncertainty regarding the phylogeny.

• We suggest instead using a Bayesian approach, which quantifies this uncertainty.

Bayesian approach

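• Let Q be the query sequence, X the database data, G a gene tree, and F a desired taxonomic group. Then

$$\Pr(Q \in F \mid X) = \int_G I(Q, F \text{ monophyletic in } G)\, p(G \mid X)\, dG \approx \frac{1}{k} \sum_{i=1}^{k} I(Q, F \text{ monophyletic in } G_i)$$

where I(·) is the indicator function and G_i is the ith gene tree sampled from p(G | X).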
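The Monte Carlo average over sampled trees can be illustrated with a small sketch. It is not the authors' implementation: it assumes rooted trees written as nested Python tuples of leaf names, and the names `clades`, `is_monophyletic`, `posterior_probability`, and the toy trees are all hypothetical. In practice the trees would come from an MCMC sample and would need a Newick parser.

```python
from typing import FrozenSet, Iterable, Sequence, Set, Tuple, Union

# Assumption: a rooted tree is a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set below `tree`, recording every clade in `out`."""
    if isinstance(tree, str):
        leaf_set = frozenset([tree])
    else:
        leaf_set = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaf_set)
    return leaf_set


def is_monophyletic(tree: Tree, query: str, group: Iterable[str]) -> bool:
    """Indicator I: do the query and the group together form one clade?"""
    found: Set[FrozenSet[str]] = set()
    clades(tree, found)
    return (frozenset(group) | {query}) in found


def posterior_probability(trees: Sequence[Tree], query: str,
                          group: Iterable[str]) -> float:
    """Fraction of sampled trees in which the indicator equals 1."""
    members = frozenset(group)
    return sum(is_monophyletic(t, query, members) for t in trees) / len(trees)


# Toy sample of k = 4 trees (made up for illustration only).
toy_trees = [
    ((("Q", "a1"), "a2"), ("b1", "b2")),
    (("Q", ("a1", "a2")), ("b1", "b2")),
    (("a1", "a2"), ("Q", ("b1", "b2"))),
    ((("Q", "b1"), "b2"), ("a1", "a2")),
]
estimate = posterior_probability(toy_trees, "Q", {"a1", "a2"})
print(estimate)
```

In the toy sample the query groups with {a1, a2} in two of the four trees, so the estimate is 0.5.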

Assignment pipeline

[Figure: pipeline flowchart. Query sequence and database (GenBank) → NCBI BLAST, retrieval of sequences and taxonomy annotation → homology set → ClustalW → alignment → MrBayes → sampled trees → summary statistics → taxonomy summary.]

Summary statistics

• For each tree:
– Find the sister group to the query.
– Find the list of taxonomic levels shared by the sequences in the sister group (consensus taxonomy).

[Figure: tree with the query and its sister group marked.]

• For each name at each taxonomic level:
– Find the fraction of sampled trees in which the consensus taxonomy includes that name.
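The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: trees are rooted nested tuples of leaf names, each database sequence carries a lineage given as a tuple of taxon names, and the function names (`sister_group`, `consensus_taxonomy`, `taxonomy_summary`) and the toy trees are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple, Union

# Assumption: a rooted tree is a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """All leaf names below a node."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def sister_group(tree: Tree, query: str) -> Optional[List[str]]:
    """Leaves of the query's sibling subtrees, or None if the query is absent."""
    if isinstance(tree, str):
        return None
    if query in tree:  # the query is a direct child of this node
        return [leaf for child in tree if child != query for leaf in leaves(child)]
    for child in tree:
        found = sister_group(child, query)
        if found is not None:
            return found
    return None


def consensus_taxonomy(group: Sequence[str],
                       lineages: Dict[str, Tuple[str, ...]]) -> Set[str]:
    """Taxon names shared by every sequence in the sister group."""
    name_sets = [set(lineages[seq]) for seq in group]
    return set.intersection(*name_sets) if name_sets else set()


def taxonomy_summary(trees: Sequence[Tree], query: str,
                     lineages: Dict[str, Tuple[str, ...]]) -> Dict[str, float]:
    """Fraction of sampled trees whose consensus taxonomy includes each name."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(consensus_taxonomy(sister_group(tree, query) or [], lineages))
    return {name: n / len(trees) for name, n in counts.items()}


# Toy lineages (order, family, genus) and a made-up sample of four trees.
toy_lineages = {
    "Aruncus": ("Rosales", "Rosaceae", "Aruncus"),
    "Rosa": ("Rosales", "Rosaceae", "Rosa"),
    "Fagus": ("Fagales", "Fagaceae", "Fagus"),
    "Alnus": ("Fagales", "Betulaceae", "Alnus"),
}
toy_sample = [
    ((("Q", "Aruncus"), "Rosa"), ("Fagus", "Alnus")),
    (("Q", ("Aruncus", "Rosa")), ("Fagus", "Alnus")),
    ((("Q", "Rosa"), "Aruncus"), ("Fagus", "Alnus")),
    (("Aruncus", "Rosa"), ("Q", ("Fagus", "Alnus"))),
]
summary = taxonomy_summary(toy_sample, "Q", toy_lineages)
print(summary)
```

In this toy sample the order and family of the query are supported by three of four trees (0.75), while no genus exceeds 0.25. This mirrors how support can be high at upper taxonomic levels and low at the genus level.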

Example taxonomy summary

Environmental Samples

• 379 environmental samples (aDNA)

• rbcL and trnL markers.

• The aim is to identify the environmental flora.

Orders assigned with posterior probability >90%

Asterales Brassicales Caryophyllales Coniferales

Dipsacales Ericales Fabales Fagales

Lamiales Lepidoptera Malpighiales Poales

Pottiales Ranunculales Rosales Sapindales

Saxifragales Solanales Zingiberales

Families assigned with posterior probability >90%

Amaranthaceae Asteraceae Betulaceae Brassicaceae

Caprifoliaceae Caryophyllaceae Ericaceae Fabaceae

Fagaceae Juncaceae Musaceae Papaveraceae

Pinaceae Plantaginaceae Poaceae Rosaceae

Rutaceae Salicaceae Saxifragaceae Solanaceae

Taxaceae Theaceae

Genera assigned with posterior probability >90%

Achillea Alnus Aruncus Cerastium

Fagus Musa Picea Pinus

Plantago Poa Saxifraga Symphoricarpos

Taxus

Botanical evaluation

Temperate climate similar to central Sweden.

Testing putative Neandertal DNA

• Needless to say, we have had several negative examples...

• One positive example:
– Posterior probability of 91%.

Problems

• No population genetic modelling:
– Outgroup problem.
– Species issues are not addressed.
– Lineage sorting: species need not be reciprocally monophyletic.

• Incomplete database.

Advantages

• Phylogenetic uncertainty and statistical uncertainty of assignment are addressed.

• Posterior probability of assignment.

• Alternative to single tree assignment.

• Can be used on any database.

Conclusions

• Phylogenetic barcoding does not model the coalescent process.

• It is the appropriate method for assignment with little data, or when assigning to higher taxonomic levels.

• The Bayesian approach offers a measure of confidence in the assignment.
