Methods of Tree Reconstruction

Jan 29, 2016

Dan Graur. PowerPoint presentation.

Transcript
Page 1: Methods of Tree Reconstruction

Dan Graur

Methods of Tree Reconstruction

Page 6: Methods of Tree Reconstruction

Molecular phylogenetic approaches:

1. distance-matrix (based on distance measures)

2. character-state (based on character states)

3. maximum likelihood (based on both character states and distances)

Page 7: Methods of Tree Reconstruction

DISTANCE-MATRIX METHODS

In the distance matrix methods, evolutionary distances (usually the number of nucleotide substitutions or amino-acid replacements between two taxonomic units) are computed for all pairs of taxa, and a phylogenetic tree is constructed by using an algorithm based on some functional relationships among the distance values.

Page 8: Methods of Tree Reconstruction

Multiple Alignment

GCGGCTCA TCAGGTAGTT GGTG-G  Spinach
GCGGCCCA TCAGGTAGTT GGTG-G  Rice
GCGTTCCA TC--CTGGTT GGTGTG  Mosquito
GCGTCCCA TCAGCTAGTT GTTG-G  Monkey
GCGGCGCA TTAGCTAGTT GGTG-A  Human
***   ** *    * *** * **

Page 9: Methods of Tree Reconstruction

Compute pairwise distances by correcting for multiple hits at a single site

Number of differences

Number of changes (e.g., number of nucleotide substitutions, number of amino acid replacements)
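
The correction for multiple hits can be sketched in code. The example below is a minimal illustration that assumes the Jukes and Cantor one-parameter model, under which the number of substitutions per site is estimated as d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The function names are hypothetical, and the two sequences are the Monkey and Human rows of the short alignment fragment on the multiple-alignment slide, with gapped columns skipped.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs / len(pairs)

def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model (assumed here)."""
    if p >= 0.75:
        raise ValueError("the correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Monkey and Human rows of the alignment fragment, written without spaces.
monkey = "GCGTCCCATCAGCTAGTTGTTG-G"
human = "GCGGCGCATTAGCTAGTTGGTG-A"

p = p_distance(monkey, human)   # number of differences per site
d = jukes_cantor(p)             # estimated number of changes per site
print(f"p = {p:.4f}, corrected d = {d:.4f}")
```

For any p between 0 and 0.75 the corrected distance is larger than the raw proportion of differences, and the gap widens as p grows.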

Page 10: Methods of Tree Reconstruction

Distance Matrix**

          Spinach   Rice      Mosquito  Monkey    Human
Spinach   0.0       9         106       91        86
Rice                0.0       118       122       122
Mosquito                      0.0       55        51
Monkey                                  0.0       3
Human                                             0.0

**Units: Numbers of nucleotide substitutions per 1,000 nucleotide sites

Page 11: Methods of Tree Reconstruction

Distance Methods:

UPGMA

Neighbor-relations

Neighbor joining

Page 12: Methods of Tree Reconstruction

UPGMA

Unweighted pair-group method with arithmetic means

Page 13: Methods of Tree Reconstruction

UPGMA employs a sequential clustering algorithm, in which local topological relationships are identified in order of decreasing similarity, and the tree is built in a stepwise manner.
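
The stepwise clustering can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name and the data layout (a dictionary keyed by unordered pairs) are assumptions, it returns the topology only (no branch lengths), and the input is the five-species distance matrix from the Distance Matrix slide.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA and return the topology as nested tuples.

    names: list of OTU labels
    dist:  dict mapping frozenset({a, b}) -> distance between a and b
    """
    clusters = {name: 1 for name in names}   # cluster -> number of simple OTUs in it
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)                      # the new composite OTU
        na, nb = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted mean = arithmetic mean over all simple-OTU pairs.
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))

names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
upper = [[9, 106, 91, 86], [118, 122, 122], [55, 51], [3]]  # upper triangle, row by row
dist = {
    frozenset((names[i], names[i + 1 + k])): value
    for i, row in enumerate(upper)
    for k, value in enumerate(row)
}

tree = upgma(names, dist)
print(tree)
```

Monkey and Human are joined first because their distance is the smallest, then Spinach and Rice, then Mosquito joins the Monkey-Human composite OTU.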

Page 14: Methods of Tree Reconstruction

Figure: simple OTUs

Page 15: Methods of Tree Reconstruction

Figure: composite OTU

Page 18: Methods of Tree Reconstruction

UPGMA yields the correct answer only if the distances are ultrametric!

Q: What happens if the distances are only additive?

Q: What happens if the distances are not even additive?
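
Whether a set of distances is ultrametric can be checked with the three-point condition: for every three OTUs, the two largest of the three pairwise distances must be equal. The sketch below is an illustration with a hypothetical function name. It tests the five-species matrix from the Distance Matrix slide and a made-up three-OTU matrix that is ultrametric by construction.

```python
from itertools import combinations

def is_ultrametric(names, d, tol=0.0):
    """True if, for every triple of OTUs, the two largest distances are equal (within tol)."""
    for a, b, c in combinations(names, 3):
        _, middle, largest = sorted(
            [d[frozenset((a, b))], d[frozenset((a, c))], d[frozenset((b, c))]]
        )
        if largest - middle > tol:
            return False
    return True

names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
upper = [[9, 106, 91, 86], [118, 122, 122], [55, 51], [3]]  # upper triangle from the slide
slide_d = {
    frozenset((names[i], names[i + 1 + k])): value
    for i, row in enumerate(upper)
    for k, value in enumerate(row)
}

# Hypothetical toy matrix: A and B are equally distant from C.
toy_names = ["A", "B", "C"]
toy_d = {frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("B", "C")): 6}

print("slide matrix ultrametric:", is_ultrametric(names, slide_d))
print("toy matrix ultrametric:", is_ultrametric(toy_names, toy_d))
```

The slide matrix fails the test. For example, Mosquito is 55 units from Monkey but 51 from Human, so UPGMA is not guaranteed to recover the correct tree from these distances.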

Page 19: Methods of Tree Reconstruction

Neighborliness methods

The neighbors-relation method (Sattath & Tversky)

The neighbor-joining method (Saitou & Nei)

Page 20: Methods of Tree Reconstruction

In an unrooted bifurcating tree, two OTUs are said to be neighbors if they are connected through a single internal node.

Page 21: Methods of Tree Reconstruction

If we combine OTUs A and B into one composite OTU, then the composite OTU (AB) and the simple OTU C become neighbors.

Page 22: Methods of Tree Reconstruction

Figure: tree with OTUs A, B, C, and D

Four-Point Condition

d(A,B) + d(C,D) < d(A,C) + d(B,D) = d(A,D) + d(B,C)
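
The condition can be evaluated directly for any four OTUs: compute the three possible sums and take the pairing with the smallest sum as the pairs of neighbors. The sketch below is illustrative, with a hypothetical function name. It uses the Spinach, Rice, Monkey, and Human distances from the Distance Matrix slide.

```python
def four_point_sums(d, a, b, c, e):
    """Return the three sums of the four-point condition, keyed by the pairing they imply."""
    def dist(x, y):
        return d[frozenset((x, y))]
    return {
        ((a, b), (c, e)): dist(a, b) + dist(c, e),
        ((a, c), (b, e)): dist(a, c) + dist(b, e),
        ((a, e), (b, c)): dist(a, e) + dist(b, c),
    }

# Distances per 1,000 sites, copied from the distance matrix slide.
d = {
    frozenset(("Spinach", "Rice")): 9,
    frozenset(("Spinach", "Monkey")): 91,
    frozenset(("Spinach", "Human")): 86,
    frozenset(("Rice", "Monkey")): 122,
    frozenset(("Rice", "Human")): 122,
    frozenset(("Monkey", "Human")): 3,
}

sums = four_point_sums(d, "Spinach", "Rice", "Monkey", "Human")
neighbors = min(sums, key=sums.get)                 # pairing with the smallest sum
smallest, middle, largest = sorted(sums.values())
print("neighbor pairs:", neighbors)
print("sums:", smallest, middle, largest)
print("two larger sums equal (additive quartet):", middle == largest)
```

Here the smallest sum pairs Spinach with Rice and Monkey with Human. The two larger sums differ slightly, so these four distances are close to additive but not exactly so.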

Page 23: Methods of Tree Reconstruction

The Neighbor Joining Method

Page 24: Methods of Tree Reconstruction

In distance-matrix methods, it is assumed that similarity reflects kinship.

Page 26: Methods of Tree Reconstruction

From Similarity to Relationship

Similarities among OTUs can be due to:

• Ancestry:
– Shared ancestral characters (symplesiomorphies)
– Shared derived characters (synapomorphies)
• Homoplasy:
– Convergent events
– Parallel events
– Reversals

Page 27: Methods of Tree Reconstruction

Parsimony Methods:

Willi Hennig
1913-1976

Page 28: Methods of Tree Reconstruction

William of Occam (ca. 1285-1349)
English philosopher & Franciscan monk

William of Occam was “solemnly” excommunicated by Pope John XXII.

[Entities must not be multiplied beyond necessity]

Page 29: Methods of Tree Reconstruction

MAXIMUM PARSIMONY METHODS

Maximum parsimony involves the identification of a topology that requires the smallest number of evolutionary changes to explain the observed differences among the OTUs under study.

In maximum parsimony methods, we use discrete character states, and the shortest pathway leading to these character states is chosen as the “best” or maximum parsimony tree.

Often two or more trees with the same minimum number of changes are found, so that no unique tree can be inferred. Such trees are said to be equally parsimonious.

Page 30: Methods of Tree Reconstruction

                          Site
____________________________________________
Sequences   1   2   3   4   5   6   7   8   9
____________________________________________
1           A   A   G   A   G   T   T   C   A
2           A   G   C   C   G   T   T   C   T
3           A   G   A   T   A   T   C   C   A
4           A   G   A   G   A   T   C   C   T
                            *       *       *

invariant

Page 31: Methods of Tree Reconstruction

variant

Page 32: Methods of Tree Reconstruction

uninformative

Page 33: Methods of Tree Reconstruction

informative
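
The four labels can be assigned mechanically. The sketch below assumes the standard parsimony definition, which these slides do not spell out: a site is informative when at least two different character states each occur in at least two sequences. The function name is hypothetical, and the data are the four sequences in the site table.

```python
from collections import Counter

def classify_site(column):
    """Classify one alignment column for parsimony analysis."""
    counts = Counter(column)
    if len(counts) == 1:
        return "invariant"
    # Informative: at least two states, each found in at least two sequences.
    if sum(1 for n in counts.values() if n >= 2) >= 2:
        return "informative"
    return "uninformative"   # variant, but requires the same number of changes on every tree

# Rows 1-4 of the site table, written without spaces.
seqs = ["AAGAGTTCA", "AGCCGTTCT", "AGATATCCA", "AGAGATCCT"]

labels = {site: classify_site(col) for site, col in enumerate(zip(*seqs), start=1)}
for site, label in labels.items():
    print(site, label)
```

Under this definition every site that is not invariant is variant, and a variant site is either informative or uninformative.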

Page 38: Methods of Tree Reconstruction

In the case of four OTUs, an informative site can only favor one of the three possible alternative trees.

Thus, the tree supported by the largest number of informative sites is the most parsimonious tree.

Page 39: Methods of Tree Reconstruction

Inferring the maximum parsimony tree:

1. Identify all the informative sites.
2. For each possible tree, calculate the minimum number of substitutions at each informative site.
3. Sum up the number of changes over all the informative sites for each possible tree.
4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.

Page 40: Methods of Tree Reconstruction

Maximum parsimony (Practice):

Data
1. TGCA
2. TACC
3. AGGT
4. AAGT
   ***

Step 1. Identify all the informative sites.

Page 41: Methods of Tree Reconstruction

Maximum parsimony (Practice):

Data
1. TGC
2. TAC
3. AGG
4. AAG

Step 2. For each possible tree, calculate the minimum number of substitutions at each informative site.

Page 42: Methods of Tree Reconstruction

Maximum parsimony (Practice):

Step 3. Sum up the number of changes over all the informative sites for each possible tree.

Figure: the three possible trees, requiring 4, 5, and 6 changes.

Page 43: Methods of Tree Reconstruction

Maximum parsimony (Practice):

Step 4. Choose the tree associated with the smallest number of changes as the maximum parsimony tree.

Figure: the same three trees (4, 5, and 6 changes).
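
The practice exercise can be reproduced with a short script. This is a sketch under stated assumptions: the function name is hypothetical, sequences are numbered from 0 rather than 1, and the minimum number of changes per site is found by brute force (trying every pair of states at the two internal nodes of a four-OTU tree) rather than by any method named in the slides.

```python
from itertools import product

def quartet_site_length(site, pairing):
    """Minimum substitutions at one site on the unrooted tree ((a,b),(c,d)).

    Every assignment of nucleotides to the two internal nodes is tried.
    """
    (a, b), (c, d) = pairing
    return min(
        (site[a] != x) + (site[b] != x) + (x != y) + (site[c] != y) + (site[d] != y)
        for x, y in product("ACGT", repeat=2)   # x joins a and b, y joins c and d
    )

seqs = ["TGC", "TAC", "AGG", "AAG"]             # informative sites only (Step 1)
trees = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

# Steps 2 and 3: per-site minimum, summed over sites, for each possible tree.
lengths = {
    tree: sum(quartet_site_length(col, tree) for col in zip(*seqs))
    for tree in trees
}
# Step 4: the tree with the fewest changes.
best_tree = min(lengths, key=lengths.get)

for tree, length in lengths.items():
    print(tree, length)
print("maximum parsimony tree:", best_tree)
```

The three totals agree with the 4, 5, and 6 changes given in the practice slides, and the tree grouping sequences 1 and 2 against 3 and 4 is the shortest.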

Page 44: Methods of Tree Reconstruction

Problem (exaggerated)

Page 45: Methods of Tree Reconstruction

Fitch’s (1971) method for inferring nucleotides at internal nodes

The set at an internal node is the intersection (∩) of the two sets at its immediate descendant nodes if the intersection is not empty.

The set at an internal node is the union (∪) of the two sets at its immediate descendant nodes if the intersection is empty.

When a union is required to form a nodal set, a nucleotide substitution at this position must be assumed to have occurred.
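
The three rules above translate into a short recursive function. The sketch below is illustrative: the function name and the nested-tuple tree encoding are assumptions, and the example site (T, T, A, A for sequences 1 to 4) is the first informative site of the practice data.

```python
def fitch(tree, states):
    """Apply Fitch's set rules to one site.

    tree:   nested 2-tuples of leaf names, e.g. ((1, 2), (3, 4))
    states: dict mapping leaf name -> nucleotide at this site
    Returns (nodal set at the root of `tree`, number of substitutions counted).
    """
    if not isinstance(tree, tuple):              # a leaf: its set is its own state
        return {states[tree]}, 0
    left_set, left_count = fitch(tree[0], states)
    right_set, right_count = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                   # intersection is not empty
        return common, left_count + right_count
    # Intersection is empty: take the union and count one substitution.
    return left_set | right_set, left_count + right_count + 1

site = {1: "T", 2: "T", 3: "A", 4: "A"}
print(fitch(((1, 2), (3, 4)), site))   # one union is needed
print(fitch(((1, 3), (2, 4)), site))   # two unions are needed
```

The nodal sets list the candidate nucleotides at each internal node. The count is the minimum number of substitutions the tree requires at that site.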

Page 46: Methods of Tree Reconstruction

Fitch’s (1971) method for inferring nucleotides at internal nodes

Figure labels: 4 substitutions; 3 substitutions

Page 47: Methods of Tree Reconstruction

Testing properties of ancestral proteins

The ability to infer in silico the sequence of ancestral proteins, in conjunction with some astounding developments in synthetic biology, allows us to “resurrect” putative ancestral proteins in the laboratory and test their properties. These properties, in turn, can be used to test hypotheses concerning the physical environment that the ancestral organism inhabited (its paleoenvironment).

Page 48: Methods of Tree Reconstruction

Testing properties of ancestral proteins

Gaucher et al. (2003) used EF-Tu (elongation factor thermo-unstable) gene sequences from completely sequenced mesophilic eubacteria to reconstruct candidate ancestral sequences at nodes throughout the bacterial tree. These inferred ancestral proteins were then synthesized in the laboratory, and their activities and thermal stabilities were measured and compared to those of proteins from extant organisms.

Figure: Thermostability curves

The temperature profile of the inferred ancestral protein was 55°C, suggesting that the ancestor of extant mesophiles was a thermophile.

Page 49: Methods of Tree Reconstruction

Ancestral reconstruction is not possible with morphological data.

Page 50: Methods of Tree Reconstruction

________________________________________________
Number of OTUs    Number of possible rooted trees
________________________________________________
 2                1
 3                3
 4                15
 5                105
 6                945
 7                10,395
 8                135,135
 9                2,027,025
10                34,459,425
15                213,458,046,676,875
20                8,200,794,532,637,891,559,375
________________________________________________

The impossibility of exhaustively searching for the maximum-parsimony tree when the number of OTUs is large
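
The counts in the table follow from a closed formula: the number of rooted bifurcating trees for n OTUs is the product of the odd numbers 1 × 3 × 5 × ... × (2n - 3). The formula is not stated on the slide. The sketch below, with a hypothetical function name, evaluates it for the tabulated values of n.

```python
def rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 OTUs: 1 * 3 * 5 * ... * (2n - 3)."""
    count = 1
    for k in range(3, 2 * n - 2, 2):   # odd factors 3, 5, ..., 2n - 3
        count *= k
    return count

for n in (2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20):
    print(f"{n:>2}  {rooted_trees(n):,}")
```

Each additional OTU multiplies the count by the next odd number, which is why the totals grow too quickly for an exhaustive search.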

Page 51: Methods of Tree Reconstruction

Exhaustive = Examine all trees, get the best tree (guaranteed).

Branch-and-Bound = Examine some trees, get the best tree (guaranteed).

Heuristic = Examine some trees, get a tree that may or may not be the best tree.

Page 52: Methods of Tree Reconstruction

Exhaustive

Page 53: Methods of Tree Reconstruction

Branch-and-Bound

Rationale: The length of a tree with n+1 OTUs can either be equal to or larger than the length of a tree with n OTUs.

Reminder: The total number of substitutions in a tree = tree length

Page 54: Methods of Tree Reconstruction

Branch-and-Bound

Obtain a tree by a fast method (e.g., the neighbor-joining method).

Compute the number of substitutions (L) for this tree.

Use L as an upper bound.

Rationale: the maximum parsimony tree must be either equal in length to L or shorter.
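
The search can be sketched for a small data set. This is an illustration under several assumptions rather than a production method: all function names are hypothetical, tree length is counted with Fitch's set rules, OTUs are added one at a time in input order, and the upper bound L comes from an arbitrary starting tree instead of a neighbor-joining tree. The data are the informative sites of the practice exercise, with sequences numbered from 0.

```python
def fitch(tree, column):
    """Return (nodal set, substitutions) for one site on a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], column), fitch(tree[1], column)
    if ls & rs:
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1

def tree_length(tree, seqs):
    """Total number of substitutions over all sites."""
    return sum(fitch(tree, col)[1] for col in zip(*seqs))

def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    yield (tree, leaf)                       # the branch leading to this node
    if isinstance(tree, tuple):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)

def branch_and_bound(seqs, upper_bound):
    """Return (shortest length, list of shortest trees) among trees no longer than the bound."""
    n = len(seqs)
    best = {"length": upper_bound, "trees": []}

    def extend(subtree, next_leaf):
        length = tree_length((0, subtree), seqs)   # OTU 0 serves as the root
        if length > best["length"]:
            return                                  # adding OTUs cannot shorten the tree
        if next_leaf == n:                          # all OTUs placed
            if length < best["length"]:
                best["length"], best["trees"] = length, []
            best["trees"].append((0, subtree))
            return
        for t in insertions(subtree, next_leaf):
            extend(t, next_leaf + 1)

    extend((1, 2), 3)                               # the only tree for three OTUs
    return best["length"], best["trees"]

seqs = ["TGC", "TAC", "AGG", "AAG"]
L = tree_length((0, ((1, 2), 3)), seqs)             # length of an arbitrary starting tree
length, trees = branch_and_bound(seqs, L)
print("upper bound L =", L)
print("shortest length =", length, "trees =", trees)
```

With only four OTUs nothing is pruned before the last step. With more OTUs, a partial tree that already exceeds the bound is abandoned together with every tree that could be built from it.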

Page 55: Methods of Tree Reconstruction

Branch-and-Bound

The magnitude of the search will depend on the data (i.e., luck).

Page 56: Methods of Tree Reconstruction

Heuristic

Page 58: Methods of Tree Reconstruction

Likelihood

• Example: Coin tossing
• Data: 10 tosses: 6 heads + 4 tails
• Hypothesis: Binomial distribution

L = P(data | hypothesis)
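
For the coin example the likelihood can be computed directly. The sketch below assumes the hypothesis is a binomial distribution with heads probability p, so that L = C(10, 6) p^6 (1 - p)^4. The function name and the grid of candidate values are illustrative choices.

```python
from math import comb

def binomial_likelihood(p, heads=6, tosses=10):
    """P(data | hypothesis): probability of `heads` heads in `tosses` tosses given p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

for p in (0.5, 0.6, 0.7):
    print(f"p = {p}: L = {binomial_likelihood(p):.4f}")

# Search a grid of candidate values for the one with the highest likelihood.
best_p = max((i / 100 for i in range(1, 100)), key=binomial_likelihood)
print("maximum likelihood estimate of p:", best_p)
```

The likelihood is highest at p = 0.6, the observed proportion of heads, which makes it the maximum likelihood estimate.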

Page 59: Methods of Tree Reconstruction

LIKELIHOOD IN MOLECULAR PHYLOGENETICS

• The data are the aligned sequences

• The model is the probability of change from one character state to another (e.g., the Jukes & Cantor one-parameter model).

• The parameters to be estimated are: Topology & Branch Lengths

L = P(sequences | tree)
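
The simplest case has two sequences, where the topology is trivial and the only parameter is the branch length d separating them. The sketch below assumes the Jukes and Cantor model with equal base frequencies, under which a site stays the same with probability 1/4 + (3/4)e^(-4d/3) and changes to one specific other nucleotide with probability 1/4 - (1/4)e^(-4d/3). The function name is hypothetical. The counts (18 identical and 5 differing sites) come from the ungapped Monkey and Human columns of the alignment fragment on the multiple-alignment slide.

```python
import math

def jc_log_likelihood(d, n_same, n_diff):
    """log P(two aligned sequences | branch length d) under Jukes-Cantor (assumed model)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e          # site unchanged
    p_diff = 0.25 - 0.25 * e          # site changed to one specific other nucleotide
    # Each site contributes (base frequency 1/4) * (transition probability).
    return n_same * math.log(0.25 * p_same) + n_diff * math.log(0.25 * p_diff)

n_same, n_diff = 18, 5

# Grid search over candidate branch lengths (substitutions per site).
grid = [i / 1000 for i in range(1, 1001)]
best_d = max(grid, key=lambda d: jc_log_likelihood(d, n_same, n_diff))

p = n_diff / (n_same + n_diff)
closed_form = -0.75 * math.log(1 - 4 * p / 3)
print(f"grid estimate of d: {best_d:.3f}")
print(f"closed-form Jukes-Cantor distance: {closed_form:.4f}")
```

The branch length that maximizes the likelihood agrees with the closed-form Jukes and Cantor distance up to the grid spacing. With more than two sequences the likelihood must also be maximized over topologies.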

Page 61: Methods of Tree Reconstruction

Bayesian Phylogenetics

Based on Bayes’ Theorem

Thomas Bayes (1701–1761)

A = a proposition, a hypothesis.
B = the evidence.
P(A) = the prior, the initial degree of belief in A.
P(A|B) = the posterior, the new degree of belief in A given B (the evidence).
P(B|A)/P(B) = the support B provides for A.
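
These quantities combine as P(A|B) = P(A) × P(B|A)/P(B). The sketch below applies this to the coin-tossing data (6 heads in 10 tosses) with two made-up hypotheses and equal priors. The hypotheses, the priors, and the function names are all illustrative assumptions. In a phylogenetic setting the hypotheses would be trees and the evidence the aligned sequences.

```python
from math import comb

def coin_likelihood(p, heads=6, tosses=10):
    """P(B|A): probability of the observed tosses given heads probability p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

def posterior(priors, likelihoods):
    """P(A|B) for each hypothesis A, with P(B) summed over all hypotheses."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)       # P(B)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

hypotheses = {"fair coin (p = 0.5)": 0.5, "biased coin (p = 0.7)": 0.7}  # hypothetical
priors = {h: 0.5 for h in hypotheses}                                    # equal prior belief
likelihoods = {h: coin_likelihood(p) for h, p in hypotheses.items()}

post = posterior(priors, likelihoods)
for h, value in post.items():
    print(f"{h}: prior {priors[h]:.2f} -> posterior {value:.3f}")
```

With equal priors the posterior simply follows the likelihoods. Unequal priors would shift the result toward the hypothesis favored beforehand.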