Phylogenetic Trees Lecture 2. Based on: Durbin et al Section 7.3, 7.4, 7.8


Jan 06, 2016


Melia

Page 1: Phylogenetic Trees Lecture 2


Phylogenetic Trees

Lecture 2

Based on: Durbin et al Section 7.3, 7.4, 7.8

Page 2: Phylogenetic Trees Lecture 2

2

The Four Points Condition

Theorem: A set M of L objects is additive iff any subset of four objects can be labeled i, j, k, l so that:

d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)

We call {{i,j},{k,l}} the “split” of {i,j,k,l}.

The four points condition does not by itself provide an algorithm to construct a tree from a distance matrix, and checking it directly means examining every subset of four objects.
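As a rough illustration of the condition (not an efficient method), the brute-force check below tests every subset of four objects: among the three pairwise sums, the two largest must be equal. The function name `satisfies_four_points`, the list-of-lists matrix layout, and the tolerance `tol` are assumptions made for this sketch. The example matrix is the four-leaf distance matrix D on a, b, c, d used later in the lecture.

```python
from itertools import combinations

def satisfies_four_points(d, tol=1e-9):
    """Check the four points condition on a symmetric distance matrix d."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        # The three ways to split {i, j, k, l} into two pairs.
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        # The two largest sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Rows and columns in the order a, b, c, d.
D = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]
print(satisfies_four_points(D))
```

The check takes time proportional to L^4, which is one reason the constructions on the following slides are preferred.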

The first methods for constructing trees for additive sets used neighbor joining methods:

Page 3: Phylogenetic Trees Lecture 2

3

Constructing additive trees: The neighbor joining problem

Let i, j be neighboring leaves in a tree, let k be their parent, and let m be any other vertex.

The formula

d(k,m) = ½ [d(i,m) + d(j,m) - d(i,j)]

shows that we can compute the distances of k to all other leaves.

This suggests the following method to construct a tree from a distance matrix:

1. Find neighboring leaves i, j in the tree.

2. Replace i, j by their parent k and recursively construct a tree T for the smaller set.

3. Add i, j as children of k in T.

Page 4: Phylogenetic Trees Lecture 2

4

Neighbor Finding

How can we find from distances alone a pair of nodes which are neighboring leaves?

Closest nodes aren’t necessarily neighboring leaves.

[Figure: example tree with leaves A, B, C, D]

Next we show one way to find neighbors from distances.

Page 5: Phylogenetic Trees Lecture 2

5

Neighbor Finding: Saitou & Nei method

For a leaf i, let r_i = Σ_{m is a leaf} d(i, m).

Definition: Let i, j be leaves. Then

D(i, j) = (L - 2) d(i, j) - (r_i + r_j),

where L is the number of leaves in T.

Theorem [Saitou & Nei]: Assume all edge weights are positive. If D(i, j) is minimal (among all pairs of leaves), then i and j are neighboring leaves in the tree.

[Figure: tree with leaves i, j, m, internal vertices k, l, and subtrees T1, T2]

The proof is rather involved!
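A minimal sketch of this selection rule, assuming the distances are given as a symmetric list-of-lists matrix with zero diagonal (the function name `saitou_nei_pair` and the example tree are hypothetical). The example is a four-leaf tree in which the closest pair of leaves is not a pair of neighbors, while the pair minimizing D(i, j) is.

```python
from itertools import combinations

def saitou_nei_pair(d):
    """Return the pair (i, j) minimizing D(i, j) = (L - 2) d(i, j) - (r_i + r_j)."""
    L = len(d)
    r = [sum(row) for row in d]  # r_i = sum of distances from leaf i to all leaves
    return min(
        combinations(range(L), 2),
        key=lambda p: (L - 2) * d[p[0]][p[1]] - (r[p[0]] + r[p[1]]),
    )

# Assumed example: neighbors (A, B) and (C, D); leaf edges 1, 4, 1, 4; internal edge 1.
# Rows and columns in the order A, B, C, D.
d = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]

closest = min(combinations(range(4), 2), key=lambda p: d[p[0]][p[1]])
chosen = saitou_nei_pair(d)
print(closest, chosen)
```

Here `closest` is the pair (A, C), which are not neighbors, whereas `chosen` is one of the two true neighbor pairs.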

Page 6: Phylogenetic Trees Lecture 2

6

Neighbor Joining Algorithm

Set L to contain all leaves.

Iteration:

Choose i, j such that D(i, j) is minimal.

Create a new node k, and set

d(i, k) = ½ (d(i, j) + d(i, m) - d(j, m)) (for some m)

d(j, k) = d(i, j) - d(i, k)

for each node m, d(k, m) = ½ (d(i, m) + d(j, m) - d(i, j))

Remove i, j from L, and add k.

Terminate: when |L| = 2, connect the two remaining nodes.

[Figure: leaves i and j joined at the new node k; m is another node]

Page 7: Phylogenetic Trees Lecture 2

7

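Saitou & Nei’s Idea:

For a leaf i, let r_i = Σ_{m is a leaf} d(i, m).

Let D(i, j) = d(i, j) - (1/(L-2)) (r_i + r_j). The factor “L-2” is crucial!

[Figure: tree with leaves 1 to 5 (L = 5). Leaves 1 and 3 hang from a common vertex by edges a and b; edge c joins that vertex to the vertex carrying leaf 2 (edge d); edge e joins the latter to the parent of leaves 4 and 5 (edges f and g).]

D12 = (a+c+d) - (1/3)(a+b + a+c+d + a+c+e+f + a+c+e+g + d+c+a + d+c+b + d+e+f + d+e+g)

D13 = (a+b) - (1/3)(a+b + a+c+d + a+c+e+f + a+c+e+g + b+a + b+c+d + b+c+e+f + b+c+e+g)

Hence D12 - D13 = (4/3) c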
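The identity D12 - D13 = (4/3) c can be checked numerically. The sketch below assumes arbitrary positive edge lengths for a, ..., g (the specific values are hypothetical) and uses the leaf-to-leaf distances of the five-leaf example tree; exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Hypothetical edge lengths for the five-leaf example tree.
a, b, c, d, e, f, g = 1, 2, 3, 4, 5, 6, 7
L = 5

# Leaf-to-leaf distances read off the tree (keys are pairs i < j).
dist = {
    (1, 2): a + c + d, (1, 3): a + b,
    (1, 4): a + c + e + f, (1, 5): a + c + e + g,
    (2, 3): d + c + b, (2, 4): d + e + f, (2, 5): d + e + g,
    (3, 4): b + c + e + f, (3, 5): b + c + e + g,
    (4, 5): f + g,
}

def leaf_dist(i, j):
    return 0 if i == j else dist[min(i, j), max(i, j)]

# r_i = sum of distances from leaf i to all leaves.
r = {i: sum(leaf_dist(i, m) for m in range(1, L + 1)) for i in range(1, L + 1)}

def D(i, j):
    """D(i, j) = d(i, j) - (r_i + r_j) / (L - 2)."""
    return leaf_dist(i, j) - Fraction(r[i] + r[j], L - 2)

difference = D(1, 2) - D(1, 3)
print(difference == Fraction(4 * c, 3))
```

Since c > 0, the non-neighbor pair (1, 2) always scores strictly higher than the neighbor pair (1, 3), whatever the other edge lengths are.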

Page 8: Phylogenetic Trees Lecture 2

8

Saitou & Nei’s proof

Notations used in the proof:

p(i, j) = the path from vertex i to vertex j; e.g., p(D, C) = (e1, e2, e3) = (D, E, F, C).

For a vertex i and an edge e: N_i(e) = |{k : k is a leaf and e is on p(i, k)}|.

E.g., N_D(e1) = 3, N_D(e2) = 2, N_D(e3) = 1, N_C(e1) = 1.

[Figure: tree with leaves A, B, C, D and internal vertices E, F; the path from D to C uses the edges e1, e2, e3]

Page 9: Phylogenetic Trees Lecture 2

9

Saitou & Nei’s proof: Crucial Observation

Observe that r_i = Σ_{m is a leaf} d(i, m) = Σ_{e ∈ E} lth(e) N_i(e).

For leaves i, j connected by a path (i, l, ..., k, j):

r_i - r_j = Σ_{e ∈ p(i,j)} lth(e) [N_i(e) - N_j(e)]

= (L-2) [d(i, l) - d(k, j)] + Σ_{e ∈ p(l,k)} lth(e) [N_i(e) - N_j(e)]

[Figure: leaves i and j attached at vertices l and k; the rest of T hangs off the path between l and k]

Page 10: Phylogenetic Trees Lecture 2

10

Saitou & Nei’s proof

Proof of Theorem: Assume for contradiction that D(i, j) is minimized for i, j which are not neighboring leaves. Let (i, l, ..., k, j) be the path from i to j. Let T1 and T2 be the subtrees rooted at k and l which do not contain edges from p(i, j) (see figure).

[Figure: path from i to j through l and k, with the subtrees T1 and T2 hanging off the path]

Notation: |T| = #(leaves in T).

Page 11: Phylogenetic Trees Lecture 2

11

Saitou & Nei’s proof

Case 1: i or j has a neighboring leaf. WLOG j has a neighboring leaf m.

A. D(i,j) - D(m,j) = (L-2)(d(i,j) - d(j,m)) - (r_i + r_j) + (r_m + r_j)

= (L-2)(d(i,k) - d(k,m)) + r_m - r_i

B. r_m - r_i ≥ (L-2)(d(k,m) - d(i,l)) + (4-L) d(k,l)

(since for each edge e ∈ p(k,l), N_m(e) ≥ 2 and N_i(e) ≤ L-2)

Substituting B in A:

D(i,j) - D(m,j) ≥ (L-2)(d(i,k) - d(i,l)) + (4-L) d(k,l) = 2d(k,l) > 0,

contradicting the minimality assumption.

[Figure: leaf i attached at l, leaves j and m attached at k, and the subtree T2]

Page 12: Phylogenetic Trees Lecture 2

12

Saitou & Nei’s proof

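Case 2: Not case 1. Then both T1 and T2 contain two neighboring leaves. WLOG |T2| ≥ |T1|. Let n, m be neighboring leaves in T1, with common parent p (see figure). We shall prove that D(m,n) < D(i,j), which will again contradict the minimality assumption.

[Figure: path from i to j through l and k; T1 contains the neighboring leaves m, n with parent p; T2 is the other subtree]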

Page 13: Phylogenetic Trees Lecture 2

13

Saitou & Nei’s proof

[Figure: path from i to j through l and k; T1 contains the neighboring leaves m, n with parent p; T2 is the other subtree]

A. 0 ≤ D(m,n) - D(i,j) = (L-2)(d(m,n) - d(i,j)) + (r_i + r_j) - (r_m + r_n)

B. r_j - r_m < (L-2)(d(j,k) - d(m,p)) + (|T1| - |T2|) d(k,p)

C. r_i - r_n < (L-2)(d(i,k) - d(n,p)) + (|T1| - |T2|) d(l,p)

Adding B and C, noting that d(l,p) > d(k,p):

D. (r_i + r_j) - (r_m + r_n) < (L-2)(d(i,j) - d(n,m)) + 2(|T1| - |T2|) d(l,p)

Substituting D in the right hand side of A:

D(m,n) - D(i,j) < 2(|T1| - |T2|) d(l,p) ≤ 0, as claimed. QED

Page 14: Phylogenetic Trees Lecture 2

14

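A simpler neighbor finding method

Select an arbitrary node r. For each pair of labeled nodes (i, j), let C(i, j) be the length of the part shared by the paths from r to i and from r to j, i.e. the distance from r to the path between i and j:

C(i, j) = ½ [d(i, r) + d(j, r) - d(i, j)]

[Figure: node r joined to the path between i and j; the joining segment has length C(i, j)]

Claim: Let i, j be such that C(i, j) is maximized. Then i and j are neighboring leaves.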

Page 15: Phylogenetic Trees Lecture 2

15

Neighbor Joining Algorithm

Set M to contain all leaves, and select a root r. |M| = L.

If L = 2, return a tree of two vertices.

Iteration:

Choose i, j such that C(i, j) is maximal.

Create a new vertex k, and set

d(i, k) = ½ [d(i, j) + d(i, r) - d(j, r)]

d(j, k) = d(i, j) - d(i, k)

for each node m, d(k, m) = ½ [d(i, m) + d(j, m) - d(i, j)]

Remove i, j, and add k to M.

Recursively construct a tree on the smaller set, then add i, j as children of k, at distances d(i, k) and d(j, k).

[Figure: leaves i and j joined at the new vertex k; m is another node]
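The steps above can be sketched as follows. This is an iterative version of the recursive description, written under several assumptions: the input is an additive distance table given as a dict of dicts keyed by string node names, at least two nodes are present, and new internal vertices get the hypothetical names `k1`, `k2`, and so on. The function name `neighbor_joining_rooted` is also an assumption of this sketch.

```python
from itertools import combinations

def neighbor_joining_rooted(d, r):
    """Build a weighted tree from an additive distance table d (dict of dicts).

    r is the fixed reference node; returns a list of edges (u, v, weight).
    """
    # Work on a copy so the caller's table is left untouched.
    d = {u: dict(row) for u, row in d.items()}
    nodes = set(d)
    edges = []
    counter = 0
    while len(nodes) > 2:
        # Maximize C(i, j) = 1/2 [d(i, r) + d(j, r) - d(i, j)] over pairs without r
        # (the constant factor 1/2 does not change the maximizer).
        i, j = max(
            combinations(sorted(nodes - {r}), 2),
            key=lambda p: d[p[0]][r] + d[p[1]][r] - d[p[0]][p[1]],
        )
        counter += 1
        k = f"k{counter}"
        d_ik = 0.5 * (d[i][j] + d[i][r] - d[j][r])
        d_jk = d[i][j] - d_ik
        edges += [(k, i, d_ik), (k, j, d_jk)]
        # Distances from the new vertex k to every remaining node m.
        d[k] = {}
        for m in nodes - {i, j}:
            d[k][m] = d[m][k] = 0.5 * (d[i][m] + d[j][m] - d[i][j])
        nodes = (nodes - {i, j}) | {k}
    u, v = sorted(nodes)
    edges.append((u, v, d[u][v]))
    return edges

# The four-leaf distance matrix D on a, b, c, d used later in the lecture.
D = {
    "a": {"b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6},
}
edges = neighbor_joining_rooted(D, "a")
for edge in edges:
    print(edge)
```

The sketch recomputes all C(i, j) values in every iteration, which corresponds to the naive implementation rather than a heap-based one.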

Page 16: Phylogenetic Trees Lecture 2

16

Complexity of Neighbor Joining Algorithm

Naive Implementation:

Initialization: Θ(L²) to compute the C(i, j)’s.

Each Iteration:

O(L) to update {C(i, k) : i ∈ L} for the new node k.

O(L²) to find the maximal C(i, j).

Total of O(L³).

[Figure: leaves i and j joined at the new node k; m is another node]

Page 17: Phylogenetic Trees Lecture 2

17

Complexity of Neighbor Joining Algorithm

Using a heap to store the C(i, j)’s:

Initialization: Θ(L²) to compute and heapify the C(i, j)’s.

Each Iteration:

O(1) to find the maximal C(i, j).

O(L log L) to delete {C(m, i), C(m, j)} and add C(m, k) for all vertices m.

Total of O(L² log L).

(implementation details are omitted)

Page 18: Phylogenetic Trees Lecture 2

18

Ultrametric trees as special weighted trees

Definition: An Ultrametric tree is a rooted weighted tree all of whose leaves are at the same depth. Edge weights can be represented by the distances of internal vertices from the leaves.

E.g., the tree produced by UPGMA.

Note: each internal vertex has at least two children

[Figure: ultrametric tree on leaves A, E, D, C, B; internal vertices at heights 8, 5, 3, 3; leaves at height 0]

Page 19: Phylogenetic Trees Lecture 2

19

Ultrametric trees

A more recent (and more efficient) way of constructing and identifying additive trees.

Idea: Reduce the problem to constructing trees by the “heights” of the internal nodes. For leaves i, j, D(i, j) represents the “height” of the least common ancestor of i and j.

[Figure: ultrametric tree on leaves A, E, D, C, B with internal nodes labeled 8, 5, 3, 3]

Page 20: Phylogenetic Trees Lecture 2

20

Ultrametric Trees

Definition: T is an ultrametric tree for a symmetric positive real matrix D (called an ultrametric matrix) if:

1. The leaves of T correspond to the rows and columns of D.

2. Internal nodes have at least two children, and the Least Common Ancestor of i and j is labeled by D(i, j).

3. The labels decrease along paths from the root to the leaves.

    A  B  C  D  E
A   0  8  8  5  3
B      0  3  8  8
C         0  8  8
D            0  5
E               0

[Figure: ultrametric tree on leaves A, E, D, C, B with internal nodes labeled 8, 5, 3, 3]

Page 21: Phylogenetic Trees Lecture 2

21

Centrality of Ultrametric Trees

We will study later the following question: given a symmetric positive real matrix D, is there an ultrametric tree T for D?

But first we show that ultrametric trees can be used to construct trees for additive sets, and to solve other related problems.

Page 22: Phylogenetic Trees Lecture 2

22

Transforming Ultrametric Trees to Weighted Trees

Use the labels to define weights for all internal edges in the natural way. For this, consider the labels of leaves to be 0. We get an additive ultrametric tree whose height is the label of the root.

[Figure: the ultrametric tree on leaves A, E, D, C, B with internal labels 8, 5, 3, 3, and the same tree with each edge weighted by the difference between the labels of its endpoints]

Note that in this tree all leaves are at the same height. This is why it is called ultrametric.

Page 23: Phylogenetic Trees Lecture 2

23

Transforming Weighted Trees to Ultrametric Trees

A weighted Tree T can be transformed to an ultrametric tree T’ as follows:

Step 1: Pick a node k as a root, and “hang” the tree at k.

[Figure: the weighted tree T on leaves a, b, c, d with edge weights 2, 1, 3, 4, 2, drawn unrooted and then hung at k = a]

Page 24: Phylogenetic Trees Lecture 2

24

Transforming Weighted Trees to Ultrametric Trees

Step 2: Let M = max_i d(i, k), where k is the root. M is taken to be the height of T’. Label the root by M, and label each internal node j by M - d(k, j).

[Figure: with k = a and M = 9, the tree hung at a gets the labels 9 (root), 7, and 4]

Page 25: Phylogenetic Trees Lecture 2

25

Transforming Weighted Trees to Ultrametric Trees

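Step 3: “Stretch” the edges of the leaves so that they are all at distance M from the root: the edge to leaf i is lengthened by M - d(k, i).

[Figure: with M = 9, the leaf edges of a, b, c, d are stretched by 9, 6, 0, 2, giving leaf edge weights 9, 7, 4, 4; the internal labels remain 9, 7, 4]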

Page 26: Phylogenetic Trees Lecture 2

26

Re-constructing Weighted Trees from Ultrametric Trees

The weight of an internal edge is the difference between the labels (heights) of its endpoints.

Assume that the distance matrix D = [d(i, j)] of the original unrooted tree is given. The weight of an edge to leaf i is obtained by subtracting M - d(k, i) from its current weight.

For a leaf i whose parent is m: (M - d(k, m)) - (M - d(k, i)) = d(i, m)

[Figure: with M = 9, the leaf edges of weights 9, 7, 4, 4 (leaves a, b, c, d) are reduced by 9, 6, 0, 2, giving 0, 1, 4, 2; the internal edges have weights 2 and 3]

Page 27: Phylogenetic Trees Lecture 2

27

How D’ is constructed from D

D’(i, j) should be the height of the Least Common Ancestor of i and j in T’, the ultrametric tree hung at k.

Let M = max_i d(i, k), and let m be the LCA of i and j. Thus D’(i, j) = M - d(k, m), where d(k, m) is computed by:

d(k, m) = ½ (d(i, k) + d(j, k) - d(i, j))

(In the figure, k = a, i = b, j = c.)

[Figure: the tree hung at k = a with labels 9 and 7; m is the LCA of the leaves i and j]

Note that this can be computed without the additive tree!
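A small sketch of this transformation, assuming the distances are stored as a dict of dicts without diagonal entries (the function name `to_ultrametric_matrix` and the data layout are choices made for this illustration). It uses only the distance table, never the tree.

```python
def to_ultrametric_matrix(d, k):
    """Return (M, D') with D'(i, j) = M - d(k, m), where m is the LCA of i and j."""
    def dist(u, v):
        return 0 if u == v else d[u][v]

    M = max(dist(i, k) for i in d)
    d_prime = {}
    for i in d:
        for j in d:
            if i != j:
                # d(k, m) = 1/2 (d(i, k) + d(j, k) - d(i, j)); it is 0 when i or j is k.
                d_km = 0.5 * (dist(i, k) + dist(j, k) - dist(i, j))
                d_prime[i, j] = M - d_km
    return M, d_prime

# The four-leaf distance matrix D of the running example.
D = {
    "a": {"b": 3, "c": 9, "d": 7},
    "b": {"a": 3, "c": 8, "d": 6},
    "c": {"a": 9, "b": 8, "d": 6},
    "d": {"a": 7, "b": 6, "c": 6},
}
M, D_prime = to_ultrametric_matrix(D, "a")
print(M, D_prime["b", "c"], D_prime["c", "d"])
```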

Page 28: Phylogenetic Trees Lecture 2

28

The transformation of D to D’

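Distance matrix D (tree T):

    a  b  c  d
a      3  9  7
b         8  6
c            6
d

Ultrametric matrix D’ (tree T’, M = 9):

    a  b  c  d
a      9  9  9
b         7  7
c            4
d

[Figure: T is the weighted tree on leaves a, b, c, d with edge weights 2, 1, 3, 4, 2; T’ is the ultrametric tree with internal labels 9, 7, 4]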

Page 29: Phylogenetic Trees Lecture 2

29

Identifying Ultrametric Trees

Definition: A distance matrix D is ultrametric if for each 3 indices i, j, k

D( i, j ) ≤ max {D( i, k ), D( j, k )}.

(i.e., among D(i, j), D(i, k), D(j, k) there is a tie for the maximum value)

Theorem (U): D has an ultrametric tree iff it is ultrametric.

(to be proved later)
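The three-index condition is straightforward to test directly. The sketch below assumes a symmetric list-of-lists matrix; the function name `is_ultrametric` is hypothetical. It is applied to the 5 by 5 matrix on A, ..., E shown earlier and to the additive (but not ultrametric) matrix D on a, b, c, d.

```python
from itertools import permutations

def is_ultrametric(D):
    """Check D(i, j) <= max(D(i, k), D(j, k)) for all distinct i, j, k."""
    n = len(D)
    return all(
        D[i][j] <= max(D[i][k], D[j][k])
        for i, j, k in permutations(range(n), 3)
    )

# Rows and columns in the order A, B, C, D, E.
U = [[0, 8, 8, 5, 3],
     [8, 0, 3, 8, 8],
     [8, 3, 0, 8, 8],
     [5, 8, 8, 0, 5],
     [3, 8, 8, 5, 0]]

# Rows and columns in the order a, b, c, d.
A = [[0, 3, 9, 7],
     [3, 0, 8, 6],
     [9, 8, 0, 6],
     [7, 6, 6, 0]]

print(is_ultrametric(U), is_ultrametric(A))
```

The second matrix fails because, for example, D(a, c) = 9 exceeds both D(a, b) = 3 and D(c, b) = 8.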

Page 30: Phylogenetic Trees Lecture 2

30

Theorem: D is an additive distance matrix if and only if D’ is an ultrametric matrix.

Note that the construction of D’ is independent of the additive tree.

Proof. (⇒) Use the conversion from an additive tree to an ultrametric tree and Theorem (U).

(⇐) Use Theorem (U) and the conversion from an ultrametric tree to an additive tree, and check that the additive tree indeed realizes the distance matrix.

Page 31: Phylogenetic Trees Lecture 2

31

Solving the Additive Tree Problem by the Ultrametric Problem: Outline

We solve the additive tree problem by reducing it to the ultrametric problem as follows:

1. Given an input matrix D = D(i, j) of distances, transform it to a matrix D’ = D’(i, j), where D’(i, j) is the height of the Least Common Ancestor of i and j in the corresponding ultrametric tree T’. (If D’ is not ultrametric, then the input matrix is not additive!)

2. Construct the ultrametric tree, T’, for D’.

3. Reconstruct the additive tree T from T’.

Page 32: Phylogenetic Trees Lecture 2

32

LCA and distances in Ultrametric Tree

Let LCA(i, j) denote the lowest common ancestor of leaves i and j. Let D(i, j) be the height of LCA(i, j), and dist(i,j) be the distance from i to j.

Claim: For any pair of leaves i, j in an ultrametric tree:

D(i, j) = 0.5 dist(i, j).

    A  B  C  D  E
A   0  8  8  5  3
B      0  3  8  8
C         0  8  8
D            0  5
E               0

[Figure: ultrametric tree on leaves A, E, D, C, B with internal nodes labeled 8, 5, 3, 3]

Page 33: Phylogenetic Trees Lecture 2

33

Identifying Ultrametric Distances

Definition: A distance matrix D of dimension L by L is ultrametric iff for each 3 indices i, j, k :

D( i, j ) ≤ max { D( i, k ), D( j, k ) }.

Example: D(i, j) = D(j, k) = 9 ≥ D(i, k) = 6.

    j  k
i   9  6
j      9

Theorem (U): The following conditions are equivalent for an L×L symmetric matrix D:

1. D is ultrametric.

2. There is an ultrametric tree of L leaves such that for each pair of leaves i, j:

D(i, j) = height(LCA(i, j)) = ½ dist(i, j).

Note: D(i, j) ≤ max {D(i, k), D(j, k)} is easier to check than the 4 points condition. Therefore the theorem implies that ultrametric sets are easier to characterize than additive sets.

Page 34: Phylogenetic Trees Lecture 2

34

Properties of ultrametric matrix used in the Proof of the Theorem (U)

Definition: Let D be an L by L matrix, and let S ⊆ {1,...,L}.

D[S] is the submatrix of D consisting of the rows and columns with indices from S.

Claim 1: D is ultrametric iff for every S ⊆ {1,...,L}, D[S] is ultrametric.

Claim 2: If D is ultrametric and max_{i,j} D(i, j) = m, then m appears in every row of D.

Illustration: if D(j, k) = m, then in any row i one of D(i, j), D(i, k) must be m, since m = D(j, k) ≤ max {D(i, j), D(i, k)} ≤ m.

Page 35: Phylogenetic Trees Lecture 2

35

Ultrametric tree ⇒ Ultrametric matrix

There is an ultrametric tree s.t. D(i, j) = ½ dist(i, j) ⇒ D is an ultrametric matrix: by properties of Least Common Ancestors in trees.

[Figure: leaves i, j, k, where LCA(k, j) lies below LCA(k, i) = LCA(j, i)]

D(k, i) = D(j, i) ≥ D(k, j)

Page 36: Phylogenetic Trees Lecture 2

36

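Ultrametric matrix ⇒ Ultrametric tree

Proof of “D is an ultrametric matrix ⇒ D has an ultrametric tree”:

By induction on L, the size of D.

Basis:

L = 1: T is a leaf.

L = 2: T is a tree with two leaves.

[Figure: for L = 1, the matrix (0) on index i and the single leaf i; for L = 2, the matrix with D(i, j) = 9 and the tree whose root is labeled 9 with leaves i and j]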

Page 37: Phylogenetic Trees Lecture 2

37

Induction step

Inductive Hyp.: Assume that it’s true for 1, 2, …, L-1.

Induction step: L > 2. Let m = m_1 be the maximum distance.

Let S_i = {l : D(1, l) = m_i}, so that {S_1, S_2, …, S_k} form a partition of the leaves into k classes. (Note: |S_1| > 0.)

By Claim 1, D[S_i], i = 1, 2, …, k, are all ultrametric, and hence we can construct a tree T_1 for S_1, rooted at m, and trees T_i for S_i with root labeled m_i < m for i = 2, …, k.

(If m_i = 0 then T_i is a leaf.)

Page 38: Phylogenetic Trees Lecture 2

38

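Notice that on any ultrametric tree the path from the root to the leaf “1” must have exactly k+1 nodes, where k is the number of classes.

Each node on this path must be labeled by one of the distinct entries in row 1, and those labels must appear in decreasing order on the path.

     1  2  3  4  5  6  7  8
1    0  4  3  4  6  4  3  6
2       0  4  2  6  1  4  6
..   (remaining rows not shown)

[Figure: the path from the root to leaf 1 with nodes labeled 6, 4, 3; the subtrees T1, T2, T3 for the classes {5, 8}, {2, 4, 6}, {3, 7} attach at the nodes labeled 6, 4, 3 respectively]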

Page 39: Phylogenetic Trees Lecture 2

39

Correctness Proof

By the Inductive Hypothesis, the T_i’s are all ultrametric trees, and

we assemble them along the path from the root to leaf “1” to form the tree T.

To prove that T is an ultrametric tree for D, we need to check that D(i, j) is the label of the LCA of i and j in T.

If i and j are in the same subtree, this holds by induction;

otherwise the LCA is the node at which the higher of the two subtrees attaches to the path, and its label is indeed D(i, j).

QED
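The inductive construction can be sketched as a short recursion. The representation is an assumption of this sketch: a leaf is its integer index, an internal node is a tuple `(height, children)`, and the names `build_ultrametric_tree`, `leaves_of`, and `lca_heights` are hypothetical. The input is assumed to be an ultrametric matrix (it is not validated here). One detail goes beyond the slides: when a class subtree's root carries the same label as the path node, the two are merged so that labels strictly decrease along every path.

```python
def build_ultrametric_tree(D, leaves=None):
    """Return a nested (height, children) tree for the ultrametric matrix D."""
    if leaves is None:
        leaves = list(range(len(D)))
    first, rest = leaves[0], leaves[1:]
    if not rest:
        return first
    # Partition the other leaves by their distance to the first leaf.
    classes = {}
    for l in rest:
        classes.setdefault(D[first][l], []).append(l)
    # Build the root-to-leaf path bottom-up: smallest label nearest the first leaf.
    node = first
    for height in sorted(classes):
        sub = build_ultrametric_tree(D, classes[height])
        children = [node]
        if isinstance(sub, tuple) and sub[0] == height:
            children += sub[1]  # same label: merge instead of stacking
        else:
            children.append(sub)
        node = (height, children)
    return node

def leaves_of(t):
    return [x for c in t[1] for x in leaves_of(c)] if isinstance(t, tuple) else [t]

def lca_heights(t, out=None):
    """Map each pair of leaves to the label of their least common ancestor."""
    out = {} if out is None else out
    if isinstance(t, tuple):
        height, children = t
        groups = [leaves_of(c) for c in children]
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for x in groups[a]:
                    for y in groups[b]:
                        out[frozenset((x, y))] = height
        for c in children:
            lca_heights(c, out)
    return out

# Rows and columns in the order A, B, C, D, E (indices 0 to 4).
U = [[0, 8, 8, 5, 3],
     [8, 0, 3, 8, 8],
     [8, 3, 0, 8, 8],
     [5, 8, 8, 0, 5],
     [3, 8, 8, 5, 0]]
tree = build_ultrametric_tree(U)
heights = lca_heights(tree)
print(tree)
```

The helper `lca_heights` is only there to confirm that the label of the LCA of every pair of leaves equals the corresponding matrix entry.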

Page 40: Phylogenetic Trees Lecture 2

40

Complexity Analysis

Let f(L) be the time complexity for an L×L matrix.

f(1) = f(2) = constant.

For L > 2:

Constructing S1 and S2: O(L). Let |S1| = k, |S2| = L-k.

Constructing T1 and T2: f(k) + f(L-k).

Joining T1 and T2 to T: constant.

Thus we have:

f(L) ≤ max_k [f(k) + f(L-k)] + cL, 0 < k < L.

f(L) = cL² satisfies the above, since ck² + c(L-k)² + cL ≤ cL² whenever 2k(L-k) ≥ L, which holds for 0 < k < L and L ≥ 2.

Need an appropriate data structure!

Page 41: Phylogenetic Trees Lecture 2

41

Recall: identifying Additive Trees via Ultrametric trees

We solve the additive tree problem by reducing it to the ultrametric problem as follows:

1. Given an input matrix D = D(i, j) of distances, transform it to a matrix D’ = D’(i, j), where D’(i, j) is the height of the LCA of i and j in the corresponding ultrametric tree T’.

2. Construct the ultrametric tree, T’, for D’.

3. Reconstruct the additive tree T from T’.

Page 42: Phylogenetic Trees Lecture 2

42

How D’ is constructed from D

D’(i, j) should be the height of the Least Common Ancestor of i and j in T’, the ultrametric tree hung at k. Thus D’(i, j) = M - d(k, m), where d(k, m) is computed by:

d(k, m) = ½ (d(i, k) + d(j, k) - d(i, j))

(For k = a, i = b, j = c: D’(b, c) = 9 - ½ (3 + 9 - 8) = 7.)

[Figure: the tree hung at k = a with labels 9 and 7 and edge weights 2, 1, 3, 4, 2]

Page 43: Phylogenetic Trees Lecture 2

43

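The transformation D → D’ → T’ → T

Distance matrix D (tree T):

    a  b  c  d
a   0  3  9  7
b      0  8  6
c         0  6
d            0

Ultrametric matrix D’ (tree T’, M = 9):

    a  b  c  d
a   0  9  9  9
b      0  7  7
c         0  4
d            0

[Figure: T is the weighted tree on leaves a, b, c, d with edge weights 2, 1, 3, 4, 2; T’ is the ultrametric tree with internal labels 9, 7, 4]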