
Dec 28, 2019


Intro to Phylogenetic Trees
Lecture 6

Sections 7.1, 7.2 in Durbin et al.
Chapter 17 in Gusfield

Slides by Shlomo Moran and by Ydo Wexler. Modifications by Benny Chor

Evolution

The Tree of Life

Source: Alberts et al.

Tree of life: a better picture

Primate evolution

Historical Note
• Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).
• Since then, the focus has been on objective criteria for constructing phylogenetic trees.
  • Thousands of articles in the last decades.
• Important for many aspects of biology:
  • Classification
  • Understanding biological mechanisms

Morphological vs. Molecular
• Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.
• Modern biological methods allow the use of molecular features:
  • Gene sequences
  • Protein sequences
• Analysis is based on homologous sequences (e.g., globins) in different species.

Morphological topology

[Figure: morphological topology]


From sequences to a phylogenetic tree

Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG

Mitochondrial topology

[Figure: mitochondrial tree. Leaves: Donkey, Horse, Indian rhino, White rhino, Grey seal, Harbor seal, Dog, Cat, Blue whale, Fin whale, Sperm whale, Hippopotamus, Sheep, Cow, Alpaca, Pig, Little red flying fox, Ryukyu flying fox, Horseshoe bat, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Asiatic shrew, Long-clawed shrew, Mole, Small Madagascar hedgehog, Aardvark, Elephant, Armadillo, Rabbit, Pika, Tree shrew, Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Squirrel, Dormouse, Cane-rat, Guinea pig, Mouse, Rat, Vole, Hedgehog, Gymnure, Bandicoot, Wallaroo, Opossum, Platypus]

Nuclear topology

[Figure: nuclear tree. Leaves: Round Eared Bat, Flying Fox, Hedgehog, Mole, Pangolin, Whale, Hippo, Cow, Pig, Cat, Dog, Horse, Rhino, Rat, Capybara, Rabbit, Flying Lemur, Tree Shrew, Human, Galago, Sloth, Hyrax, Dugong, Elephant, Aardvark, Elephant Shrew, Opossum, Kangaroo]

Theory of Evolution
• Basic idea: speciation events lead to the creation of different species.
  • Speciation is caused by physical separation into groups where different genetic variants become dominant.
• Any two species share a (possibly distant) common ancestor.


Phylogenetic trees
• Leaves: current-day species
• Nodes: hypothetical most recent common ancestors
• Edge lengths: "time" from one speciation to the next

[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

Types of Trees

A natural model to consider is that of rooted trees.

[Figure: rooted tree; the root is labeled "Common Ancestor"]

Types of Trees

An unrooted tree represents the same phylogeny without the root node.

Depending on the model, data from current-day species does not distinguish between different placements of the root.

[Figure: Tree a, Tree b, Tree c]


Positioning Roots in Unrooted Trees
• We can estimate the position of the root by introducing an outgroup: a set of species that are definitely distant from all the species of interest.

[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant and the outgroup Falcon; the proposed root is marked]

Type of Reconstruction
• Distance-based
  • Input is a matrix of distances between species.
  • A distance can be the fraction of residues on which two sequences disagree, or an alignment score between them, or ...
• Character-based
  • Examine all characters (amino acids or DNA bases).
  • Do not "summarize" sequences or pairs of sequences by a single number.
  • Major methods: parsimony; likelihood.

Two Approaches to Tree Construction

We start with distance-based methods, considering the following question: given a set of species (leaves in a supposed tree) and distances between them, construct a phylogeny which best "fits" the distances.
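As a minimal sketch of the first distance mentioned above (the fraction of residues on which two sequences disagree), the snippet below uses the four aligned fragments from the "From sequences to a phylogenetic tree" slide. The function name `p_distance` and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations

# Aligned fragments copied from the slides.
sequences = {
    "Rat":     "QEPGGLVVPPTDA",
    "Rabbit":  "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat":     "REPGGLVVPPTEG",
}

def p_distance(s, t):
    """Fraction of aligned positions at which s and t disagree (hypothetical helper)."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)

# One entry per unordered pair of species.
distances = {
    (x, y): p_distance(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}

for (x, y), d in distances.items():
    print(f"d({x}, {y}) = {d:.3f}")
```

On such short fragments two different species can come out at distance 0 (Rat and Gorilla here), so real analyses use longer alignments or corrected distances.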

Exact solution: Additive sets

Given a set M of L objects with an L×L distance matrix:
• d(i,i) = 0, and for i ≠ j, d(i,j) > 0.
• d(i,j) = d(j,i).
• For all i, j, k it holds that d(i,k) ≤ d(i,j) + d(j,k).

Can we construct a weighted tree which realizes these distances?


Additive Distances (cont)

We say that the set of distances M over L objects is additive if there is a tree T, with L of its nodes corresponding to the L objects and with positive weights on the edges, such that for all i, j: d(i,j) = dT(i,j), the length of the path from i to j in T.

Note: Sometimes the tree is required to be binary, and then the edge weights are only required to be non-negative.

Distances for three objects are always additive

For L=3, there is always a (unique) tree with one internal node m (by simple linear algebra). Let a, b, c be the lengths of the edges from m to i, j, k respectively. Then

d(i,j) = a + b
d(i,k) = a + c
d(j,k) = b + c

Thus

$$c = d(k,m) = \tfrac{1}{2}\,[d(i,k) + d(j,k) - d(i,j)] \ge 0$$
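The linear system above has a closed-form solution for all three edge lengths. The following is a small sketch of it; the function name `three_point_tree` and the sample distances are illustrative assumptions.

```python
def three_point_tree(d_ij, d_ik, d_jk):
    """Edge lengths (a, b, c) from the internal node m to i, j, k.

    Solves d(i,j) = a + b, d(i,k) = a + c, d(j,k) = b + c.
    """
    a = (d_ij + d_ik - d_jk) / 2
    b = (d_ij + d_jk - d_ik) / 2
    c = (d_ik + d_jk - d_ij) / 2
    return a, b, c

# Hypothetical distances, chosen only for illustration.
a, b, c = three_point_tree(d_ij=5, d_ik=6, d_jk=7)
print(a, b, c)
```

The triangle inequality is what guarantees that a, b and c are all non-negative.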


How about four objects?

Not all distance matrices with 4 objects are additive, even if they satisfy the triangle inequality. E.g., no tree realizes these distances:

     i  j  k  l
i    0  2  2  2
j       0  2  2
k          0  3
l             0

The Four Points Condition

Theorem: A set M of distances is additive iff any subset of four objects can be labeled i, j, k, l so that:

d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)

[Figure: tree on the four leaves i, j, k, l]

Proof: By inspecting the figure, additivity ⇒ 4 points condition...

We call (i,j),(k,l) the "split" of {i,j,k,l}.
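The condition can be checked mechanically: of the three pairwise sums, the two largest must be equal, and the pairing with the smallest sum is the split. The sketch below is one possible way to code this; the helper names `symmetric` and `four_point_split` and the tolerance parameter are assumptions made for illustration.

```python
def symmetric(pairs):
    """Build a dict-of-dicts distance table from {(u, v): distance} (hypothetical helper)."""
    d = {}
    for (u, v), w in pairs.items():
        d.setdefault(u, {u: 0})[v] = w
        d.setdefault(v, {v: 0})[u] = w
    return d

def four_point_split(d, i, j, k, l, tol=1e-9):
    """Return the split of {i,j,k,l}, or None if the 4 points condition fails."""
    sums = {
        ((i, j), (k, l)): d[i][j] + d[k][l],
        ((i, k), (j, l)): d[i][k] + d[j][l],
        ((i, l), (j, k)): d[i][l] + d[j][k],
    }
    ordered = sorted(sums.items(), key=lambda item: item[1])
    (split, _smallest), (_, middle), (_, largest) = ordered
    # The condition requires the two largest sums to coincide.
    return split if abs(largest - middle) <= tol else None

# The four-object example from the slides: no tree realizes it.
bad = symmetric({("i", "j"): 2, ("i", "k"): 2, ("i", "l"): 2,
                 ("j", "k"): 2, ("j", "l"): 2, ("k", "l"): 3})
# An additive example (distances read off a tree with edge weights 5, 4, 6, 20, 25).
good = symmetric({("A", "B"): 15, ("A", "C"): 25, ("A", "D"): 34,
                  ("B", "C"): 30, ("B", "D"): 31, ("C", "D"): 49})

print(four_point_split(bad, "i", "j", "k", "l"))
print(four_point_split(good, "A", "B", "C", "D"))
```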


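4P Condition ⇒ Additivity

Induction on the number of objects, L. For L ≤ 3 the condition is empty and a tree exists. Consider L=4. Denote

B = d(i,k) + d(j,l) = d(i,l) + d(j,k) ≥ d(i,j) + d(k,l) = A

Let y = (B - A)/2 ≥ 0 (length of the internal edge).

Then the tree should look as follows. We want to find the distances a, b, c and f.

[Figure: leaves i and j are attached to the internal vertex m by edges of lengths a and b; leaves k and l are attached to the internal vertex n by edges of lengths c and f; the internal edge (m,n) has length y]

Again, an instance of linear algebra.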

Tree construction for L=4

Construct the tree by the given distances as follows:
1. Construct a tree for {i, j, k}, with internal vertex m.
2. Add vertex n, with d(m,n) = y.
3. Add edge (n,l), with c + f = d(k,l).

[Figure: the tree from the previous slide, built in three stages]

Remains to prove: d(i,l) = dT(i,l) and d(j,l) = dT(j,l).
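The three construction steps can be sketched numerically as follows. The function name `quartet_tree` and the sample distances are assumptions for illustration; the quartet is assumed to be already labeled so that (i,j),(k,l) is its split.

```python
def quartet_tree(d, i, j, k, l):
    """Edge lengths (a, b, c, f, y) for a quartet whose split is (i,j),(k,l)."""
    A = d[i, j] + d[k, l]
    B = d[i, k] + d[j, l]
    y = (B - A) / 2                       # internal edge (m, n)
    # Step 1: tree for {i, j, k} with internal vertex m.
    a = (d[i, j] + d[i, k] - d[j, k]) / 2
    b = (d[i, j] + d[j, k] - d[i, k]) / 2
    k_to_m = (d[i, k] + d[j, k] - d[i, j]) / 2
    # Step 2: n lies on the path from m to k, at distance y from m.
    c = k_to_m - y
    # Step 3: attach l to n so that c + f = d(k, l).
    f = d[k, l] - c
    return a, b, c, f, y

# Hypothetical additive distances, keyed by ordered pairs for brevity.
d = {("A", "C"): 25, ("A", "B"): 15, ("A", "D"): 34,
     ("C", "B"): 30, ("C", "D"): 49, ("B", "D"): 31}

a, b, c, f, y = quartet_tree(d, "A", "C", "B", "D")
print(a, b, c, f, y)
# The two distances that remain to be proved equal to tree distances:
print(a + y + f == d["A", "D"], b + y + f == d["C", "D"])
```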


Proof for L=4

By the 4 points condition and the definition of y:

d(i,l) = d(i,j) + d(k,l) + 2y - d(k,j) = a + y + f = dT(i,l)

(the middle equality holds since d(i,j), d(k,l) and d(k,j) are realized by the tree).

d(j,l) = dT(j,l) is proved similarly.

Splits Approach to Proof: Intuition

[Figure: quartet with leaves i, j, k, l]

Suppose the 4 points condition holds with strict inequality, >, for every four leaves.

This defines a (2,2) partition of every quartet. We can use the 4 points condition to show that all quartets are consistent.

This in turn is used to construct the tree (homework assignment).

Finally, show that the tree distances agree with the original distances using linear algebra.


Linear Algebraic Approach: Induction
• Remove the L-th object from the set.
• By induction, there is a tree, T', for {1,2,…,L-1}.
• For each pair of labeled nodes (i,j) in T', let aij, bij, cij be defined by the following figure:

[Figure: i and j are joined to the vertex mij by paths of lengths aij and bij; L is joined to mij by an edge of length cij]

$$c_{ij} = \tfrac{1}{2}\,[d(i,L) + d(j,L) - d(i,j)]$$

Induction step:
• Pick i and j that minimize cij.
• T is constructed by adding L (and possibly mij) to T', as in the figure. Then d(i,L) = dT(i,L) and d(j,L) = dT(j,L).
• Remains to prove: for each k ≠ i,j: d(k,L) = dT(k,L).

Induction step (cont.)
• Let k ≠ i,j be an arbitrary node in T', and let n be the branching point of k in the path from i to j.
• By the minimality of cij, (i,j),(k,L) is not a split of {i,j,k,L}.
• Assume WLOG that (i,L),(j,k) is a split of {i,j,k,L}.

[Figure: the tree T' with L attached at mij, and k branching off the path from i to j at n]

Induction step (end)

Since (i,L),(j,k) is a split, by the 4 points condition

d(L,k) = d(i,k) + d(L,j) - d(i,j)

d(i,k) = dT(i,k) and d(i,j) = dT(i,j) by induction, and d(L,j) = dT(L,j) by the construction.

Hence d(L,k) = dT(L,k). QED


From Additive Distance to a Tree

By following the proof, the four points condition can be used to construct a tree from a distance matrix, or to decide that there is no such tree (namely, that the distance is not additive).

But this algorithm goes over all quartets, resulting in O(L^4) steps for L species (too slow).

The most popular method for constructing trees for additive sets uses the neighbor joining approach.

Constructing additive trees: The neighbor joining problem

• Let i, j be sisters (neighboring leaves) in a tree, let k be their father, and let m be any other vertex.
• Using the equation

$$d(k,m) = \tfrac{1}{2}\,[d(i,m) + d(j,m) - d(i,j)]$$

  we can compute the distances from k to all other leaves.

This suggests the following method to construct a tree from an additive distance matrix:
1. Find sisters i, j in the tree.
2. Replace i, j by their father, k, and recursively construct a tree T for the smaller set.
3. Add i, j as children of k in T.

Neighbor Finding

How can we find, from distances alone, a pair of sisters (neighboring leaves)? Closest nodes are not necessarily neighboring leaves.

[Figure: tree with leaves A, B, C, D]

Next, we show a way to find neighbors from distances.

Neighbor Finding: Saitou & Nei method

Definition: For a leaf i, let

$$r_i = \sum_{m \text{ is a leaf}} d(i,m)$$

Let i, j be two leaves (out of L leaves in T). Then their divergence is

$$D(i,j) = d(i,j) - \frac{r_i + r_j}{L-2}$$

Theorem (Saitou & Nei): Assume d is additive, with all tree edge weights positive. If D(i,j) is minimal (among all pairs of leaves), then i and j are sister taxa in the tree.

[Figure: leaves i, j, k, l and vertex m, with subtrees T1 and T2]

The proof is rather involved, and will be skipped (no tears pls).
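To make the definition concrete, here is a hedged sketch that evaluates the divergence for every pair of leaves on a small additive example. The four-leaf distances are an assumption chosen for illustration (they come from a tree in which A, C are sisters and B, D are sisters), and the names `dist` and `divergence` are hypothetical.

```python
from itertools import combinations

leaves = "ABCD"
pairs = {("A", "B"): 15, ("A", "C"): 25, ("A", "D"): 34,
         ("B", "C"): 30, ("B", "D"): 31, ("C", "D"): 49}

def dist(u, v):
    """Symmetric lookup with d(u, u) = 0."""
    if u == v:
        return 0
    return pairs[(u, v)] if (u, v) in pairs else pairs[(v, u)]

def divergence(i, j):
    """D(i,j) = d(i,j) - (r_i + r_j) / (L - 2)."""
    L = len(leaves)
    r_i = sum(dist(i, m) for m in leaves)
    r_j = sum(dist(j, m) for m in leaves)
    return dist(i, j) - (r_i + r_j) / (L - 2)

D = {(i, j): divergence(i, j) for i, j in combinations(leaves, 2)}
best = min(D.values())
candidates = [pair for pair, value in D.items() if value == best]
print(D)
print(candidates)
```

Note that the closest pair here is (A, B) at distance 15, yet the minimal divergence is attained only by the two true sister pairs.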


A simpler neighbor finding method

Select an arbitrary (fixed) node r.
• For each pair of labeled nodes (i,j), let C(i,j) be defined by the following expression (also see figure):

$$C(i,j) = \tfrac{1}{2}\,[d(i,r) + d(j,r) - d(i,j)]$$

[Figure: C(i,j) is the length of the path from r to the point where the paths to i and to j separate]

Claim: Let i, j be such that C(i,j) is maximized. Then i and j are neighboring leaves.

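Sisters Identification: Example

[Figure: tree with leaves A, B, C, D and edge weights 5, 4, 6, 20, 25]

$$C(i,j) = \tfrac{1}{2}\,[d(i,r) + d(j,r) - d(i,j)]$$

Select arbitrarily r=A.
C(B,C) = (15+25-30)/2 = 5
C(B,D) = (15+34-31)/2 = 9
C(C,D) = (25+34-49)/2 = 5

C(B,D) is maximal, so B and D are neighboring leaves.

Claim: Let i, j be such that C(i,j) is maximized. Then i and j are neighboring leaves.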
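The example can be replayed in a few lines. This is only a sketch: the distances are the ones appearing in the three C(i,j) computations of the example, and the names `dist` and `branch_depth` are hypothetical.

```python
from itertools import combinations

# Pairwise distances used in the example.
pairs = {("A", "B"): 15, ("A", "C"): 25, ("A", "D"): 34,
         ("B", "C"): 30, ("B", "D"): 31, ("C", "D"): 49}

def dist(u, v):
    """Symmetric lookup with d(u, u) = 0."""
    if u == v:
        return 0
    return pairs[(u, v)] if (u, v) in pairs else pairs[(v, u)]

def branch_depth(i, j, r):
    """C(i,j) = [d(i,r) + d(j,r) - d(i,j)] / 2 for the fixed node r."""
    return (dist(i, r) + dist(j, r) - dist(i, j)) / 2

r = "A"
scores = {(i, j): branch_depth(i, j, r) for i, j in combinations("BCD", 2)}
sisters = max(scores, key=scores.get)
print(scores)
print(sisters)
```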

Neighbor Joining Algorithm
• Set M to contain all leaves, and select a root r. |M| = L.
• If L = 2, return a tree of two vertices.

Iteration:
• Choose i, j such that C(i,j) is maximal.
• Create a new vertex k, and update distances:

$$d(i,k) = \tfrac{1}{2}\,[d(i,j) + d(i,r) - d(j,r)]$$

$$d(j,k) = d(i,j) - d(i,k)$$

  and for each other node m,

$$d(k,m) = \tfrac{1}{2}\,[d(i,m) + d(j,m) - d(i,j)]$$

• Remove i, j, and add k to M.
• Recursively construct a tree on the smaller set.
• When done, add i, j as children of k, at distances d(i,k) and d(j,k).

[Figure: sisters i and j with their father k, and another node m]
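The iteration above can be sketched in Python as follows. This is an illustrative, iterative rendering of the recursive description under the assumption that the input is additive; the function names `symmetric` and `simple_nj`, the internal node names `n0`, `n1`, ..., and the edge-dictionary output format are all assumptions, not the lecture's own implementation. The sample matrix is the five-leaf additive matrix used later in the slides.

```python
from itertools import combinations

def symmetric(pairs):
    """Build a dict-of-dicts distance table from {(u, v): distance}."""
    d = {}
    for (u, v), w in pairs.items():
        d.setdefault(u, {u: 0.0})[v] = w
        d.setdefault(v, {v: 0.0})[u] = w
    return d

def simple_nj(dist, root):
    """Return {frozenset({u, v}): edge length} for an additive distance table."""
    d = {u: dict(row) for u, row in dist.items()}   # work on a copy
    edges, counter = {}, 0
    while len(d) > 2:
        others = [v for v in d if v != root]
        # Sisters maximize C(i,j) = [d(i,r) + d(j,r) - d(i,j)] / 2.
        i, j = max(combinations(others, 2),
                   key=lambda p: d[p[0]][root] + d[p[1]][root] - d[p[0]][p[1]])
        k = f"n{counter}"                           # new father of i and j
        counter += 1
        d_ik = (d[i][j] + d[i][root] - d[j][root]) / 2
        edges[frozenset((i, k))] = d_ik
        edges[frozenset((j, k))] = d[i][j] - d_ik
        # Distances from k to every remaining node m.
        new_row = {m: (d[i][m] + d[j][m] - d[i][j]) / 2
                   for m in d if m not in (i, j)}
        for gone in (i, j):                         # remove i and j from M
            del d[gone]
            for row in d.values():
                row.pop(gone, None)
        d[k] = {k: 0.0}                             # add k to M
        for m, value in new_row.items():
            d[k][m] = value
            d[m][k] = value
    u, v = d                                        # the last two vertices
    edges[frozenset((u, v))] = d[u][v]
    return edges

matrix = symmetric({
    ("A", "B"): 2, ("A", "C"): 7, ("A", "D"): 4, ("A", "E"): 7,
    ("B", "C"): 7, ("B", "D"): 4, ("B", "E"): 7,
    ("C", "D"): 7, ("C", "E"): 6, ("D", "E"): 7,
})
tree = simple_nj(matrix, root="A")
for edge, length in tree.items():
    print(sorted(edge), length)
```

Each pass scans all pairs, which matches the naive cost analysis rather than the heap-based one.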


Complexity of Neighbor Joining Algorithm

Naive implementation:
Initialization: Θ(L^2) to compute the C(i,j)'s.
Each iteration:
• O(L) to update C(i,k) for the new node k and every remaining node i.
• O(L^2) to find the maximal C(i,j).
Total of O(L^3).


Complexity of Neighbor Joining Algorithm

Using a heap to store the C(i,j)'s:
Initialization: Θ(L^2) to compute and heapify the C(i,j)'s.
Each iteration:
• O(1) to find the maximal C(i,j).
• O(L log L) to delete {C(m,i), C(m,j)} and add C(m,k) for all vertices m.
Total of O(L^2 log L).
(Implementation details are omitted.)

Reconstructing Trees from Additive Matrices

     A  B  C  D  E
A    0  2  7  4  7
B       0  7  4  7
C          0  7  6
D             0  7
E                0

[Figure: the corresponding tree, with leaves A, B, C, D, E and edge weights 1, 1, 1, 2, 2, 3, 3]

Given a distance matrix constituting an additive metric, the topology of the corresponding additive tree is unique.

Q: Do we have to test additivity before running NJ?

A: This would be bad news, as this takes O(L^4) time!

Reconstructing Trees from Additive Matrices (cont.)

Q: Do we have to test additivity before running NJ?

A: By Saitou & Nei, if the matrix is additive, NJ will construct the correct tree. The algorithm does not need to be told whether the matrix is additive, and need not know anything else about the matrix!

NJ Algorithm: Example

Let ri be the sum of distances from i to every other node:

$$r_i = \sum_{j=1}^{n} d(i,j)$$

Here, we use the divergence,

$$D(i,j) = d(i,j) - \frac{r_i + r_j}{L-2}$$

• Identify i, j as neighbours if their divergence is minimal.
• Combine i, j into a new node u.
• Update the distance matrix.
• If only 3 nodes are left, finish.

[Figure: tree with leaves i, j, k, l, m, n and edge weights 0.1, 0.1, 0.1, 0.4, 0.4]


Distance Matrix

     A  B  C  D
A    0  2  3  6
B    2  0  3  5
C    3  3  0  6
D    6  5  6  0

r_A = 11, r_B = 10, r_C = 11, r_D = 17

D(A,B) = -8.5
D(A,C) = -8
D(A,D) = -8
D(B,C) = -7.5
D(B,D) = -8.5
D(C,D) = -8

[Figure: A and B are joined into a new node U]

Distance Matrix

     U    C    D
U    0    3    5.5
C    3    0    6
D    5.5  6    0

r_U = 8.5, r_C = 9, r_D = 11.5

X(U,C) = -5.75
X(U,D) = -4.5
X(C,D) = -4.25

[Figure: U (with children A and B) and C are joined into a new node Y]

Distance Matrix

     Y    D
Y    0    5.6
D    5.6  0

[Figure: the resulting tree, with leaves A, B, C, D and internal nodes U, Y, Z]

Reconstructing Trees from non-Additive Matrices


Almost Additive Matrix

A distance matrix d' is almost additive with respect to a tree T, whose additive distances are d and whose edge lengths are l(e), if

$$|d - d'|_\infty = \max_{i,j}\,|d_{i,j} - d'_{i,j}| < \frac{\min_e\, l(e)}{2}$$

Atteson: If d' is almost additive with respect to a tree T, then the output of NJ is a tree T' with the same topology as T.
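A rough way to test this kind of bound numerically is sketched below, assuming the condition is that the largest entrywise deviation from the tree distances stays below half the shortest edge length. The function names, the sample tree distances (edge weights 5, 4, 6, 20, 25, so the shortest edge is 4) and the perturbations are all illustrative assumptions.

```python
from itertools import combinations

def max_deviation(d_obs, d_tree, labels):
    """Largest absolute difference between observed and tree distances."""
    return max(abs(d_obs[u, v] - d_tree[u, v]) for u, v in combinations(labels, 2))

def within_atteson_bound(d_obs, d_tree, labels, min_edge_length):
    """True if every entry deviates by less than half the shortest edge."""
    return max_deviation(d_obs, d_tree, labels) < min_edge_length / 2

labels = "ABCD"
d_tree = {("A", "B"): 15, ("A", "C"): 25, ("A", "D"): 34,
          ("B", "C"): 30, ("B", "D"): 31, ("C", "D"): 49}

# Two hypothetical noisy observations of the same tree.
mild = dict(d_tree)
mild["A", "B"] = 16.5       # off by 1.5 < 4 / 2
severe = dict(d_tree)
severe["A", "B"] = 17.5     # off by 2.5 > 4 / 2

print(within_atteson_bound(mild, d_tree, labels, min_edge_length=4))
print(within_atteson_bound(severe, d_tree, labels, min_edge_length=4))
```

In practice the true tree is unknown, so a check like this is mainly useful in simulations where the generating tree is available.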


[Figure: distance matrix]

[Figure: unrooted tree produced by NJ, with the root marked]

[Figure: NJ output; branch length is proportional to distance]

N-J Method produces an Unrooted, Additive tree

Neighbor-Joining Method: An Example

What is required for the Neighbour joining method? A distance matrix.

0. Distance Matrix

PAM       Spinach   Rice  Mosquito  Monkey  Human
Spinach       0.0   84.9     105.6    90.8   86.3
Rice         84.9    0.0     117.8   122.4  122.6
Mosquito    105.6  117.8       0.0    84.7   80.8
Monkey       90.8  122.4      84.7     0.0    3.3
Human        86.3  122.6      80.8     3.3    0.0

1. First Step

PAM distance 3.3 (Human - Monkey) is the minimum, so we join Human and Monkey into MonHum and calculate the new distances.

[Figure: Monkey and Human joined into Mon-Hum; Spinach, Mosquito and Rice are still unattached]

2. Calculation of New Distances

After we have joined two species in a subtree, we have to compute the distances from every other node to the new subtree. We do this with a simple average of distances:

Dist[Spinach, MonHum] = (Dist[Spinach, Monkey] + Dist[Spinach, Human])/2 = (90.8 + 86.3)/2 = 88.55
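The averaging step of this worked example can be sketched as below. It follows the example as written (join the closest pair, then average), not the C(i,j) or divergence criteria from earlier slides; the function names `closest_pair` and `join_by_average` are hypothetical.

```python
from itertools import combinations

names = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
rows = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]
dist = {a: dict(zip(names, row)) for a, row in zip(names, rows)}

def closest_pair(dist):
    """Pair of distinct labels with the smallest distance."""
    return min(combinations(dist, 2), key=lambda p: dist[p[0]][p[1]])

def join_by_average(dist, x, y, name):
    """Replace x and y by `name`; its distance to a is the average of d(a,x), d(a,y)."""
    merged = {}
    for a, row in dist.items():
        if a in (x, y):
            continue
        merged[a] = {b: v for b, v in row.items() if b not in (x, y)}
        merged[a][name] = (dist[a][x] + dist[a][y]) / 2
    merged[name] = {a: merged[a][name] for a in merged}
    merged[name][name] = 0.0
    return merged

x, y = closest_pair(dist)
step1 = join_by_average(dist, x, y, "MonHum")
print(x, y)
for a in ("Spinach", "Rice", "Mosquito"):
    print(a, round(step1[a]["MonHum"], 2))
```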


3. Next Cycle

PAM       Spinach   Rice  Mosquito  MonHum
Spinach       0.0   84.9     105.6    88.6
Rice         84.9    0.0     117.8   122.5
Mosquito    105.6  117.8       0.0    82.8
MonHum       88.6  122.5      82.8     0.0

[Figure: Mosquito is joined with Mon-Hum into Mos-(Mon-Hum)]

4. Penultimate Cycle

PAM        Spinach   Rice  MosMonHum
Spinach        0.0   84.9       97.1
Rice          84.9    0.0      120.2
MosMonHum     97.1  120.2        0.0

[Figure: Spinach and Rice are joined into Spin-Rice]

5. Last Joining

PAM        SpinRice  MosMonHum
SpinRice        0.0      108.7
MosMonHum     108.7        0.0

[Figure: Spin-Rice and Mos-(Mon-Hum) are joined into (Spin-Rice)-(Mos-(Mon-Hum))]

The result: Unrooted Neighbor-Joining Tree

[Figure: unrooted tree with leaves Human, Monkey, Mosquito, Rice, Spinach]


Dangers of Paralogs

[Figure: gene tree with a gene duplication followed by speciation events; leaves 1A, 2A, 3A, 3B, 2B, 1B]

If we happen to consider genes 1A, 2B, and 3A of species 1, 2, 3, we get a wrong tree that does not represent the phylogeny of the species.

This concludes distance-based reconstruction. We now move to character-based methods.

Character-based methods for constructing phylogenies

In this approach, trees are constructed by comparing the characters of the corresponding species. Characters may be morphological (teeth structures, hip joint) or molecular (homologous DNA sequences). The most popular approaches are maximum parsimony (MP) and maximum likelihood (ML).

In both methods, we will assume independence of characters (no interactions). Each method has a well-defined objective function. The goal is to find the tree or trees that optimize (maximize or minimize) the respective function.

1. Maximum Parsimony

[Figure: a tree with labeled nodes; the edges carrying substitutions are marked 2, 1, 1]

Here, total #substitutions = 4


Example Continued

There are many trees possible. For example:

[Figure: left tree, with edges marked 1, 1, 1]
Total #substitutions = 3

[Figure: right tree, with edges marked 1, 1, 2]
Total #substitutions = 4

The left tree is preferred over the right tree.

Example With One Letter
• Suppose we have five species, such that three have 'C' and two have 'T' at a specified position.
• Minimal tree has one evolutionary change:

[Figure: tree with leaves C, C, C, T, T and internal nodes labeled C and T; a single edge is marked "T ↔ C"]

Extension to Many Letters
• What is the parsimony score of

[Figure: tree with leaves Aardvark, Bison, Chimp, Dog, Elephant]

A: CAGGTA
B: CAGACA
C: CGGGTA
D: TGCACT
E: TGCGTA

Weighted Parsimony Scores

Each change from character a to character b is weighted by a score c(a,b). Traditional parsimony corresponds to c(a,a) = 0 and c(a,b) = 1 for a ≠ b.


Evaluating Weighted Parsimony Scores

Each position is independent and computed by itself. Use dynamic programming on a given tree.
• If k is a node with children i and j, then

S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y))

where S(k,a) is the minimum score of the subtree rooted at k when k has character a.

[Figure: node k with children i and j, annotated with S(k,a), S(i,x) and S(j,y)]

Evaluating Parsimony Scores

Dynamic programming on a given tree.

Initialization:
• For each leaf i, set S(i,a) = 0 if i is labeled by a; otherwise S(i,a) = ∞.

Iteration:
• If k is a node with children i and j, then S(k,a) = min_x (S(i,x) + c(a,x)) + min_y (S(j,y) + c(a,y)).

Termination:
• The cost of the tree is min_x S(r,x), where r is the root.

Comment:

To reconstruct an optimal assignment, we need to keep in each node k and for each character a the two characters x, y that bring about the minimum when k has character a.
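The recurrence can be sketched as a short recursive function. This is a minimal illustration, not the lecture's code: the nested-tuple tree encoding, the names `sankoff`, `unit_cost` and `parsimony_score`, the unit cost function, and in particular the topology (((A,B),C),(D,E)) are assumptions, since the tree in the figures is not recoverable here. The sequences are the five from the "Extension to Many Letters" slide.

```python
import math

ALPHABET = "ACGT"

def unit_cost(a, b):
    """Traditional parsimony: 0 for no change, 1 for any substitution."""
    return 0 if a == b else 1

def sankoff(tree, leaf_state, cost=unit_cost):
    """Minimum cost of one position; a tree is a leaf name or a pair (left, right)."""
    def score(node):
        if isinstance(node, str):                       # initialization at a leaf
            return {a: 0 if leaf_state[node] == a else math.inf for a in ALPHABET}
        left, right = node
        s_i, s_j = score(left), score(right)
        return {                                        # iteration step
            a: min(s_i[x] + cost(a, x) for x in ALPHABET)
               + min(s_j[y] + cost(a, y) for y in ALPHABET)
            for a in ALPHABET
        }
    return min(score(tree).values())                    # termination at the root

def parsimony_score(tree, sequences, cost=unit_cost):
    """Positions are independent, so per-position costs are summed."""
    length = len(next(iter(sequences.values())))
    return sum(
        sankoff(tree, {name: seq[p] for name, seq in sequences.items()}, cost)
        for p in range(length)
    )

sequences = {"A": "CAGGTA", "B": "CAGACA", "C": "CGGGTA",
             "D": "TGCACT", "E": "TGCGTA"}
assumed_tree = ((("A", "B"), "C"), ("D", "E"))          # hypothetical topology

one_letter = sankoff(assumed_tree, {"A": "C", "B": "C", "C": "C", "D": "T", "E": "T"})
total = parsimony_score(assumed_tree, sequences)
print(one_letter, total)
```

Passing a different `cost` function gives the weighted variant; keeping the minimizing x and y at each node, as the comment above notes, would allow an optimal assignment to be traced back.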


Cost of Evaluating Parsimony for binary trees

If there are n nodes, m characters, and k possible values for each character, then the complexity is O(nmk^2).

Of course, we still need to search over possible trees and find the best one. One usually resorts to heuristic search techniques.

2. Perfect Phylogeny

Data on species is given by a Character State Matrix. Cell (p,i) has value j iff character i of object (species) p has state j.
Goal: construct an evolution tree for the species.

            Character
Object   c1  c2  c3  c4  c5
A         1   1   2   0   0
B         2   0   1   2   1
C         3   2   3   3   1
D         0   3   4   1   0
E         1   1   0   0   1


Motivation: Evolution Tree