Tricks for trees: Having reconstructed phylogenies what can we do with them?

DIMACS, June 2006

Mike Steel

Allan Wilson Centre for Molecular Ecology and Evolution
Biomathematics Research Centre
University of Canterbury, Christchurch, New Zealand


Where are phylogenetic trees used?

Evolutionary biology – species relationships, dating divergences, speciation processes, molecular evolution.

Ecology – classifying new species; biodiversity, co-phylogeny, migration of populations.

Epidemiology – systematics, processes, dynamics.

Extras – linguistics, stemmatology, psychology.

Phylogenetic trees

[Definition] A phylogenetic X-tree is a tree T=(V,E) with a set X of labelled leaves, and all other vertices unlabelled and of degree ≥ 3.

If all non-leaf vertices have degree 3 then T is binary.
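As a minimal sketch of this definition, assume an unrooted tree is given as a list of edges. The representation and the function names below are illustrative and do not come from the talk. The code checks the degree condition and whether the tree is binary.

```python
from collections import defaultdict

def degrees(edges):
    """Return a dict mapping each vertex to its degree."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def is_tree(edges):
    """A connected graph with |V| - 1 edges is a tree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if len(edges) != len(adj) - 1:
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(adj)

def is_phylogenetic_x_tree(edges, X):
    """Leaves are exactly X; every other vertex has degree >= 3."""
    if not is_tree(edges):
        return False
    deg = degrees(edges)
    leaves = {v for v, d in deg.items() if d == 1}
    return leaves == set(X) and all(d >= 3 for v, d in deg.items() if v not in leaves)

def is_binary(edges, X):
    """Binary: every non-leaf vertex has degree exactly 3."""
    deg = degrees(edges)
    return is_phylogenetic_x_tree(edges, X) and all(d in (1, 3) for d in deg.values())

# Quartet tree ab|cd with two interior vertices u and v
quartet = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
# Star tree on the same leaves: one interior vertex of degree 4
star = [("a", "u"), ("b", "u"), ("c", "u"), ("d", "u")]

print(is_binary(quartet, "abcd"), is_binary(star, "abcd"))
```

Both example trees are phylogenetic X-trees for X = {a, b, c, d}, but only the first is binary.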

Trees and splits

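Each edge e of T gives a split of X: $e \mapsto A_e | B_e$

$\Sigma(T) = \{A_e | B_e : e \in E\}$

[Figure: a phylogenetic tree with leaves 1 to 6]

Partial order $(P(X), \le)$: $T \le T' \iff \Sigma(T) \subseteq \Sigma(T')$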

Buneman’s Theorem
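The slide only names the theorem. As it is usually stated, a set of splits of X that contains the trivial splits equals Σ(T) for some phylogenetic X-tree T exactly when its splits are pairwise compatible, where two splits are compatible if some side of one is disjoint from some side of the other. The sketch below assumes an edge-list tree representation (an illustrative choice) and computes Σ(T), the compatibility test, and the partial-order comparison.

```python
from collections import defaultdict
from itertools import combinations

def splits(edges, X):
    """Return the set of splits A|B of X induced by the edges of a tree."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    X = frozenset(X)
    result = set()
    for u, v in edges:
        # Collect the vertices reachable from u without crossing edge {u, v}.
        seen, stack = {u}, [u]
        while stack:
            node = stack.pop()
            for w in adj[node]:
                if w not in seen and {node, w} != {u, v}:
                    seen.add(w)
                    stack.append(w)
        A = frozenset(seen) & X
        result.add(frozenset({A, X - A}))
    return result

def compatible(s1, s2):
    """Two splits are compatible if some pair of sides is disjoint."""
    return any(not (a & b) for a in s1 for b in s2)

def pairwise_compatible(split_set):
    """True if every pair of splits in the set is compatible."""
    return all(compatible(s, t) for s, t in combinations(split_set, 2))

quartet = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
star = [("a", "u"), ("b", "u"), ("c", "u"), ("d", "u")]
ab_cd = frozenset({frozenset("ab"), frozenset("cd")})
ac_bd = frozenset({frozenset("ac"), frozenset("bd")})

print(pairwise_compatible(splits(quartet, "abcd")))            # splits of one tree
print(compatible(ab_cd, ac_bd))                                # conflicting quartet splits
print(splits(star, "abcd") <= splits(quartet, "abcd"))         # star <= quartet in the partial order
```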

Quartet trees

• A quartet tree is a binary phylogenetic tree on 4 leaves (say, x,y,w,z) written xy|wz.

• A phylogenetic X-tree T displays xy|wz if there is an edge in T whose deletion separates {x,y} from {w,z}.

[Figure: the quartet tree xy|wz, and a phylogenetic tree with leaves r, s, u, w, x, y, z]
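The "displays" condition can be tested directly from its definition. The sketch below assumes an edge-list representation, and the function names are hypothetical. It deletes each edge in turn and checks whether one pair lies entirely on one side and the other pair entirely on the other.

```python
from collections import defaultdict

def side_of_edge(adj, u, v):
    """Vertices on u's side when the edge {u, v} is deleted."""
    seen, stack = {u}, [u]
    while stack:
        node = stack.pop()
        for w in adj[node]:
            if w not in seen and {node, w} != {u, v}:
                seen.add(w)
                stack.append(w)
    return seen

def displays(edges, xy, wz):
    """True if deleting some edge separates the pair xy from the pair wz."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    xy, wz = set(xy), set(wz)
    for u, v in edges:
        side = side_of_edge(adj, u, v)
        if (xy <= side and not (wz & side)) or (wz <= side and not (xy & side)):
            return True
    return False

# Caterpillar tree on five leaves: cherry (a,b), then c, then cherry (d,e)
T = [("a", "p"), ("b", "p"), ("p", "q"), ("c", "q"),
     ("q", "r"), ("d", "r"), ("e", "r")]

print(displays(T, "ab", "cd"))  # separated by the edge {p, q}
print(displays(T, "ac", "bd"))  # the paths a-c and b-d share an edge
```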

Corresponding notions for rooted trees

Clusters (in place of splits)

Triples in place of quartets

How are trees useful in epidemiology?

Systematics and reconstruction

How are different types/strains of a virus related?

When, where, and how did they arise?

What is their likely future evolution?

What was the ancestral sequence?

How are trees useful in epidemiology?

Processes and dynamics (“Phylodynamics”)

How do viruses change with time in a population? Population size, etc.

What is their rate of mutation, recombination, selection?

Within-host dynamics

How do viruses evolve in a single patient?

How is this related to the progression of the disease?

How much compartmental variation exists?

What do the shapes of these trees tell us about the processes governing their evolution?

E.g. population dynamics, selection

Coalescent prediction

Tree shapes (non-metric)

[Figure: tree shapes on leaves a, b, c, d, e]

George Yule

Why do trees on the same taxa disagree?

1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment error
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

Sampling error that’s hard to deal with

[Figure: tree drawn against a time axis, with labels T1, T2, T3, T4 and "?"]

Example: Deep divergence in the Metazoan phylogeny

[Figure: Metazoan phylogeny with labelled groups Fungi, Choanoflagellates, Arthropods, Nematodes, Deuterostomes and Platyhelminthes. From Huson and Bryant, 2006]

Models

[Figure: states 1, 2, ..., k; two trees on leaves 1, 2, 3, 4 ("vs")]

Finite state Markov process

Models

[Figure: two trees on leaves 1, 2, 3, 4 ("vs")]

• “site saturation”

• subdividing long edges only offers a partial remedy (trade-off).
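To illustrate site saturation numerically, the sketch below assumes the Jukes-Cantor (JC69) model. That model is chosen here for illustration and is not named on the slide. Under JC69, two sequences separated by d expected substitutions per site are expected to differ at a proportion p(d) = 3/4 (1 − e^(−4d/3)) of sites. This approaches 3/4 as d grows, so on long edges very different values of d give almost the same observed p.

```python
import math

def jc69_p_distance(d):
    """Expected proportion of differing sites at JC69 distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """Invert the formula; undefined once p reaches the 3/4 plateau."""
    if p >= 0.75:
        raise ValueError("saturated: p >= 3/4 has no finite JC69 distance")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:>4}: expected p = {jc69_p_distance(d):.4f}")
```

The printed values flatten out near 0.75 for large d, which is the numerical face of saturation.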

Why do trees on the same taxa disagree?

1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

Gene trees vs species trees

Theorem (J. H. Degnan and N. A. Rosenberg, 2006).

For n ≥ 5, for any species tree on n taxa, there are branch lengths and population sizes for which the most likely gene tree is different from the species tree.

J. H. Degnan and N. A. Rosenberg. Discordance of species trees with their most likely gene trees. PLoS Genetics, 2(5), e68, May 2006.

[Figure: two trees on leaves a, b, c]
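For contrast with the theorem, the three-taxon case has a simple closed form under the standard coalescent. This formula is supplied here as background and is not taken from the slide. If the internal branch of the species tree has length T in coalescent units, the gene tree matches the species tree with probability 1 − (2/3)e^(−T), and each of the two discordant topologies has probability (1/3)e^(−T). The function names are illustrative.

```python
import math

def p_matching_gene_tree(T):
    """P(gene tree topology == species tree topology) for three taxa.

    T is the internal branch length in coalescent units.
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-T)

def p_each_discordant_tree(T):
    """Each of the two discordant topologies has the same probability."""
    return (1.0 / 3.0) * math.exp(-T)

for T in (0.0, 0.1, 0.5, 1.0, 3.0):
    print(f"T = {T}: match = {p_matching_gene_tree(T):.3f}, "
          f"each discordant = {p_each_discordant_tree(T):.3f}")
```

With three taxa the matching topology is never less probable than a discordant one, so the discordance described by the theorem needs more taxa.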

Example

[Figure: tree on Orangutan, Gorilla, Chimpanzee and Human, with a branching marked "?". Adapted from the Tree of Life website, University of Arizona]

Distinguishing between signals

Lineage sorting vs sampling error vs HGT

[Figure: three trees on leaves A, B, C, with leaf orders A B C, A B C and A C B]

Why do trees on the same taxa disagree?

1. Model violation
   1. “true model” differs from “assumed model”
   2. “true model = assumed model” but estimation method not appropriate to model
   3. model true but too parameter rich (non-identifiability)
2. Sampling error (and factors that make it worse!)
3. Alignment
4. Evolutionary processes
   1. Lineage sorting
   2. Recombination
   3. Horizontal gene transfer; hybrid taxa
   4. Gene duplication and loss

Given a tree what questions might we want to answer?

• How reliable is a split?
• Where is the root of the tree?
• Relative ranking of vertices?
• Dating?
• How well supported or resolved is some ‘deep divergence’?
• What model best describes the evolution of the sequences (molecular clock? dS/dN ratio constant? etc.)

Statistical approaches:

• Non-parametric bootstrap
• Parametric bootstrap
• Likelihood ratio tests
• Bayesian posterior probabilities
• Tests (KH, SH, SOWH)

Goldman, N., J. P. Anderson, and A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Systematic Biology 49: 652-670.

[Figure: from Steve Thompson, Florida State Uni]

Example

Non-parametric bootstrap
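The slide content here was a figure, so the following is only a sketch of the usual procedure: resample alignment columns with replacement, rebuild a tree from each replicate, and report how often a split of interest reappears. The tree-building step is left as a caller-supplied placeholder. `closest_pair` is a toy stand-in for it, and all names are hypothetical.

```python
import random
from itertools import combinations

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same length as original)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}

def bootstrap_support(alignment, has_split, build_tree, replicates=100, seed=0):
    """Fraction of replicates whose rebuilt tree satisfies has_split.

    build_tree and has_split are placeholders supplied by the caller.
    """
    rng = random.Random(seed)
    hits = sum(bool(has_split(build_tree(bootstrap_alignment(alignment, rng))))
               for _ in range(replicates))
    return hits / replicates

def closest_pair(alignment):
    """Toy stand-in for tree building: the pair of taxa with fewest differences."""
    def hamming(s, t):
        return sum(a != b for a, b in zip(s, t))
    best = min(combinations(sorted(alignment), 2),
               key=lambda pair: hamming(alignment[pair[0]], alignment[pair[1]]))
    return frozenset(best)

alignment = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TTGTCCGA", "D": "TTGACCGA"}
support = bootstrap_support(alignment, lambda pair: pair == frozenset("AB"),
                            closest_pair, replicates=50, seed=1)
print(f"support for grouping A with B: {support:.2f}")
```

In practice `build_tree` would be a real reconstruction method and `has_split` a test on its output. The resampling step is the only part specific to the bootstrap.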

Dealing with incompatibility: Consensus trees

• Strict consensus
• Majority rule consensus
• Semistrict consensus
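These three consensus methods can be sketched on trees represented by their sets of non-trivial splits. The definitions used are the usual ones and are assumed here rather than taken from the slide: strict keeps splits found in every tree, majority rule keeps splits found in more than half of the trees, and semistrict keeps splits found in some tree that are compatible with every split of every tree. Function names are illustrative.

```python
from collections import Counter

def split(side, taxa):
    """Represent the split side | (taxa - side) as an unordered pair."""
    side = frozenset(side)
    return frozenset({side, frozenset(taxa) - side})

def compatible(s1, s2):
    """Two splits are compatible if some pair of sides is disjoint."""
    return any(not (a & b) for a in s1 for b in s2)

def strict_consensus(trees):
    """Splits present in every input tree."""
    return set.intersection(*map(set, trees))

def majority_rule(trees):
    """Splits present in more than half of the input trees."""
    counts = Counter(s for tree in trees for s in set(tree))
    return {s for s, c in counts.items() if c > len(trees) / 2}

def semistrict_consensus(trees):
    """Splits found in some tree and compatible with every split of every tree."""
    every = set().union(*trees)
    return {s for s in every if all(compatible(s, t) for t in every)}

taxa = "abcde"
ab, abc, abd = split("ab", taxa), split("abc", taxa), split("abd", taxa)
trees = [{ab, abc}, {ab, abc}, {ab, abd}]

print(len(strict_consensus(trees)), len(majority_rule(trees)),
      len(semistrict_consensus(trees)))
```

Here abc|de and abd|ce conflict, so the strict and semistrict consensus both keep only ab|cde, while majority rule also keeps abc|de.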

Consensus networks

Take the splits that are in at least x% of the trees and represent them by a graph

Splits graph $G(\Sigma)$ (Dress and Huson): each split is represented by a class of ‘parallel’ edges

Simplest example (n=4).

[Figure: chloroplast JSA tree, including R. nivicola; tips annotated with codes such as (NS), (SS), (N), (A), (C,S)]

[Figure: nuclear ITS tree, including R. nivicola; tips annotated with codes such as (NS), (SS), (N), (A)]

Consensus network (ITS tree + JSA tree)

[Figure: consensus network with groups I, II, III and R. nivicola]

Maximum agreement subtrees

Concept

Computational complexity

Comparing trees

Splits metric (Robinson-Foulds)

Statistical aspects.

Tree rearrangement operations – the graph of trees (rSPR).

Cophylogeny
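The splits metric can be sketched as the size of the symmetric difference of two split sets. This assumes the unnormalised convention (some authors halve the count) and trees given by their non-trivial splits on a common taxon set. Trivial splits are shared by all such trees and so can be left out. Names are illustrative.

```python
def split(side, taxa):
    """Represent the split side | (taxa - side) as an unordered pair."""
    side = frozenset(side)
    return frozenset({side, frozenset(taxa) - side})

def robinson_foulds(splits1, splits2):
    """Number of splits found in exactly one of the two trees."""
    return len(set(splits1) ^ set(splits2))

taxa = "abcde"
T1 = {split("ab", taxa), split("abc", taxa)}   # cherry (a,b), then c, then cherry (d,e)
T2 = {split("ab", taxa), split("abd", taxa)}   # cherry (a,b), then d, then cherry (c,e)
T3 = {split("ac", taxa), split("acd", taxa)}   # cherry (a,c), then d, then cherry (b,e)

print(robinson_foulds(T1, T2), robinson_foulds(T1, T3))
```

T1 and T2 share the split ab|cde and differ in one split each. T1 and T3 share no non-trivial split.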

Co-phylogeny (M. Charleston)

Supertrees

• Compatibility concept
• Compatibility of rooted trees (BUILD)
• Why do we want to do this?
• Extension – higher order taxa, dates
• Methods for handling incompatible trees (MRP; mincut variants; minflip)
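BUILD is only named on the slide. The sketch below follows the usual description of the algorithm for rooted triples. For the current leaf set, join a and b whenever a triple ab|c lies within the set. If this leaves a single connected component the triples are incompatible. Otherwise recurse on each component. The nested-tuple output format and the function name are illustrative choices.

```python
def build(leaves, triples):
    """Return a nested-tuple rooted tree displaying all triples, or None.

    Each triple (a, b, c) encodes ab|c: a and b are closer to each other
    than either is to c.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    # Keep only triples whose three leaves all lie in the current leaf set.
    relevant = [t for t in triples if set(t) <= leaves]
    # Union-find: join a and b for every relevant triple ab|c.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)
    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)
    if len(components) == 1:
        return None  # a single component: the triples are incompatible
    subtrees = []
    for comp in sorted(components.values(), key=sorted):
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)

print(build("abcd", [("a", "b", "c"), ("c", "d", "a")]))
print(build("abc", [("a", "b", "c"), ("b", "c", "a")]))
```

The first call returns a tree in which (a,b) and (c,d) are sister groups. The second returns None because ab|c and bc|a cannot both be displayed.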

Compatibility

Example: Q={12|34, 13|45, 14|26}

[Figure: tree with leaves 1 to 6]

A set Q of quartets is compatible if there is a phylogenetic X-tree T that displays each quartet of Q

Complexity?

Supertrees

• Compatibility concept
• Compatibility of rooted trees (BUILD)
• Why do we want to do this?
• Extension – higher order taxa, dates
• Methods for handling incompatible trees (MRP; mincut variants; minflip)

Phylogenetic networks

• Consensus setting: consensus networks
• Minimizing hybrid/reticulate vertices
• Supernetworks – Z closure, filtering

Networks can represent:

• Reticulate evolution (e.g. hybrid species)
• Phylogenetic uncertainty (i.e. possible alternative trees)

Z-closure

Given T1,…, Tk on overlapping sets of species, let $\Sigma = \Sigma(T_1) \cup \cdots \cup \Sigma(T_k)$, construct $\mathrm{spcl}_2(\Sigma)$, and construct the ‘splits graph’ of the resulting splits that are ‘full’.

[Figure: trees on the leaves a, b, c, d]

Split closure operation (Meacham 1986)

$A_1 | B_1, \; A_2 | B_2 \;\longrightarrow\; A_1 | B_1 \cup B_2, \; A_1 \cup A_2 | B_2$

[Figure: the partial splits A1|B1 and A2|B2 before and after the operation]
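The transcript shows the result of the operation but not when it may be applied. The sketch below assumes the condition as the Z-rule is usually stated: A1∩A2, A2∩B1 and B1∩B2 are all non-empty and A1∩B2 is empty. That condition is an assumption supplied here, and the function name is hypothetical. Partial splits are pairs of frozensets.

```python
def z_rule(split1, split2):
    """Apply the Z-rule to partial splits (A1, B1) and (A2, B2), if it applies.

    Returns the two enlarged splits (A1, B1 | B2) and (A1 | A2, B2), or None
    when no orientation of the sides satisfies the assumed condition.
    """
    for A1, B1 in (split1, split1[::-1]):
        for A2, B2 in (split2, split2[::-1]):
            if A1 & A2 and A2 & B1 and B1 & B2 and not (A1 & B2):
                return (A1, B1 | B2), (A1 | A2, B2)
    return None

fs = frozenset
s1 = (fs("ab"), fs("cd"))   # ab|cd, from a tree on {a, b, c, d}
s2 = (fs("bc"), fs("de"))   # bc|de, from a tree on {b, c, d, e}

print(z_rule(s1, s2))
```

In this example the two partial splits are extended to ab|cde and abc|de, both of which are full splits of {a, b, c, d, e}.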

Reconstructing ancestral sequences

Methods (MP, Likelihood, Bayesian)

Quiz. MP for a balanced tree = majority state?

Information-theoretic considerations
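For the MP method, a minimal sketch of Fitch's first pass on a rooted binary tree is given below. The nested-pair tree representation and the names are illustrative. It returns the set of most-parsimonious root states and the parsimony score, which makes it easy to try the quiz on balanced trees of different sizes.

```python
from collections import Counter

def fitch(tree, states):
    """Fitch's first pass on a rooted binary tree.

    tree is a leaf name or a pair (left, right); states maps leaf -> state.
    Returns (state set at the root, number of changes).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def majority_states(states):
    """The most frequent leaf state(s), for comparison with the MP root set."""
    counts = Counter(states.values())
    top = max(counts.values())
    return {s for s, c in counts.items() if c == top}

balanced = (("w", "x"), ("y", "z"))
states = {"w": 0, "x": 0, "y": 0, "z": 1}

print(fitch(balanced, states), majority_states(states))
```

On this four-leaf example the MP root state and the majority state agree. Larger balanced trees can be fed to the same two functions to explore the quiz.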

Statistics of parsimony (clustering on a tree)