Phylogenetic trees
[Figure: several drawings of trees on Human, Chimp and Gorilla joined by "=" signs, i.e. different drawings of the same tree]
Trees
A branch = an edge
External node - leaf
Human Chimp Chicken Gorilla
The root
Internal nodes
Terminology
INGROUP: Human, Chimp, Gorilla
OUTGROUP: Chicken
Ingroup / Outgroup:
The maximum parsimony principle.
(The shortest path)
Modified from Inferring Phylogenies (Book), Author: Prof. Joe Felsenstein
Genes: 0 = absence, 1 = presence
species  g1  g2  g3  g4  g5  g6
s1       1   0   0   1   1   0
s2       0   0   1   0   0   0
s3       1   1   0   0   0   0
s4       1   1   0   1   1   1
s5       0   0   1   1   1   0
Evaluate this tree…
[Figure: a tree whose leaves are, left to right, s1 s4 s3 s2 s5]
Gene number 1
Leaf states (s1 s4 s3 s2 s5): 1 1 1 0 0
Gene number 1.
The most parsimonious ancestral character states
Leaf states (s1 s4 s3 s2 s5): 1 1 1 0 0
Gene number 1, Option number 1.
[Figure: internal nodes labeled 1, 0, 1, 1]
Gene number 1, Option number 2.
[Figure: internal nodes labeled 1, 0, 0, 1]
Minimal number of changes for gene 1 (character 1) = 1
Gene number 2
Leaf states (s1 s4 s3 s2 s5): 0 1 1 0 0
Gene number 2, Option number 1.
[Figure: internal nodes labeled 1, 0, 0, 1]
Gene number 2, Option number 2.
[Figure: internal nodes labeled 1, 0, 1, 1]
Gene number 2, Option number 3.
[Figure: internal nodes labeled 0, 0, 0, 0]
Number of changes for gene 2 (character 2) = 2
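The "options" on these slides are alternative assignments of states to the internal nodes. A minimal sketch of that enumeration follows. It assumes the tree is (((s1,s4),s3),(s2,s5)), which is not stated in the text but reproduces the counts on the slides. The node names n1, n2, n3 and root are hypothetical labels introduced for the example.

```python
from itertools import product

# Assumed topology (the figure is not in the text): (((s1,s4),s3),(s2,s5)).
# Each internal node maps to its two children; names n1..n3/root are made up here.
INTERNAL = {
    "n1": ("s1", "s4"),
    "n2": ("n1", "s3"),
    "n3": ("s2", "s5"),
    "root": ("n2", "n3"),
}

def best_assignments(leaf_states):
    """Try every 0/1 assignment to the internal nodes.

    Returns (minimal number of changes, list of assignments reaching it).
    """
    best, winners = None, []
    for combo in product("01", repeat=len(INTERNAL)):
        states = {**leaf_states, **dict(zip(INTERNAL, combo))}
        # A change is a branch whose two ends carry different states.
        changes = sum(
            states[parent] != states[child]
            for parent, children in INTERNAL.items()
            for child in children
        )
        if best is None or changes < best:
            best, winners = changes, [combo]
        elif changes == best:
            winners.append(combo)
    return best, winners

gene1 = {"s1": "1", "s2": "0", "s3": "1", "s4": "1", "s5": "0"}
gene2 = {"s1": "0", "s2": "0", "s3": "1", "s4": "1", "s5": "0"}

score1, options1 = best_assignments(gene1)
score2, options2 = best_assignments(gene2)
print(score1, len(options1))  # 1 2
print(score2, len(options2))  # 2 3
```

Under this assumed topology, gene 1 has two equally good assignments and gene 2 has three, in line with the numbered options above. Enumerating all assignments is only practical for tiny trees.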
Genes: 0 = absence, 1 = presence
species  g1  g2  g3  g4  g5  g6
s1       1   0   0   1   1   0
s2       0   0   1   0   0   0
s3       1   1   0   0   0   0
s4       1   1   0   1   1   1
s5       0   0   1   1   1   0
Total number of changes given the tree
changes  1   2   1   2   2   1
Sum of changes = 9
Can we do better?
YES WE CAN!
The tree evaluated so far: sum of changes = 9
Another tree: sum of changes = 8
The MP (most parsimonious) tree:
[Figure: a tree whose leaves are, left to right, s1 s4 s3 s2 s5]
Sum of changes for this tree topology = 8
Intermediate Summary
MP tree = the tree for which the minimal number of changes is needed to explain the data
We can now search for the best tree under the MP criterion
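For five species the search can be done exhaustively. The sketch below builds every rooted binary tree by inserting leaves one at a time and scores each tree on the presence/absence table. Scoring uses the set-based counting rule that these slides present later as the Fitch algorithm. The topology called `evaluated` is an assumption, chosen because it reproduces the per-gene counts 1 2 1 2 2 1. The helper names are hypothetical.

```python
# Presence/absence matrix from the slides (columns g1..g6).
DATA = {
    "s1": "100110",
    "s2": "001000",
    "s3": "110000",
    "s4": "110111",
    "s5": "001110",
}

def count_changes(tree, column):
    """Return (state set, number of changes) for one gene on a nested-tuple tree."""
    if isinstance(tree, str):                 # leaf: observed state, no change yet
        return {DATA[tree][column]}, 0
    left_set, left_n = count_changes(tree[0], column)
    right_set, right_n = count_changes(tree[1], column)
    common = left_set & right_set
    if common:                                # children can agree: no new change
        return common, left_n + right_n
    return left_set | right_set, left_n + right_n + 1

def tree_score(tree):
    """Sum of changes over all six genes."""
    return sum(count_changes(tree, g)[1] for g in range(6))

def insert_leaf(tree, leaf):
    """Yield every tree made by attaching `leaf` above the root or on a branch below it."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for t in insert_leaf(left, leaf):
            yield (t, right)
        for t in insert_leaf(right, leaf):
            yield (left, t)

def all_rooted_trees(leaves):
    """Build all rooted binary trees by stepwise leaf insertion."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in insert_leaf(tree, leaf)]
    return trees

evaluated = ((("s1", "s4"), "s3"), ("s2", "s5"))   # assumed first tree
trees = all_rooted_trees(sorted(DATA))
best = min(trees, key=tree_score)

print(tree_score(evaluated))         # 9
print(len(trees), tree_score(best))  # 105 8
```

Several of the 105 rooted trees share the best score, because they are different rootings of the same unrooted tree.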
Challenges
Evaluating a big tree “by hand” can be problematic. We want the computer to do it.
Going over all the trees? How many trees are there?
Can we generalize to nucleotides? To amino acids?
Is the parsimony criterion ideal?
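On the question of how many trees there are: for fully resolved (binary) trees on n labeled taxa, the standard counts are (2n-5)!! unrooted trees and (2n-3)!! rooted trees. The short sketch below evaluates these formulas to show how quickly exhaustive search becomes infeasible. The function names are hypothetical.

```python
def double_factorial(k):
    """k!! = k * (k-2) * (k-4) * ... down to 1 (taken as 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Number of unrooted binary trees on n labeled taxa (n >= 3)."""
    return double_factorial(2 * n - 5)

def rooted_trees(n):
    """Number of rooted binary trees on n labeled taxa (n >= 2)."""
    return double_factorial(2 * n - 3)

for n in (3, 4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 taxa there are already 2,027,025 unrooted trees, so larger data sets need heuristic search rather than a scan over all trees.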
MP for nucleotides
Positions:
species  p1  p2  p3  p4  p5  p6
s1       A   A   G   T   A   A
s2       C   A   A   A   A   C
s3       C   A   G   G   A   A
s4       A   A   A   T   A   C
s5       G   C   G   C   C   A
Position number 1
Leaf states (s1 s4 s3 s2 s5): A A C C G
[Figure: the tree with ancestral states marked on the internal nodes]
Number of changes for position 1 = 2
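Moving from genes to nucleotides only changes the alphabet from {0, 1} to {A, C, G, T}. The sketch below finds the minimal number of changes per position by trying every assignment of nucleotides to the internal nodes. It assumes the topology (((s1,s4),s3),(s2,s5)), which is consistent with the 2 changes reported for position 1 but is not given in the text. The internal node names are hypothetical.

```python
from itertools import product

SEQS = {
    "s1": "AAGTAA",
    "s2": "CAAAAC",
    "s3": "CAGGAA",
    "s4": "AAATAC",
    "s5": "GCGCCA",
}

# Assumed topology: (((s1,s4),s3),(s2,s5)); internal node -> its two children.
PARENTS = {
    "n1": ("s1", "s4"),
    "n2": ("n1", "s3"),
    "n3": ("s2", "s5"),
    "root": ("n2", "n3"),
}

def min_changes(position, alphabet="ACGT"):
    """Minimal number of changes at one alignment position (0-based index)."""
    leaves = {name: seq[position] for name, seq in SEQS.items()}
    best = None
    for combo in product(alphabet, repeat=len(PARENTS)):
        states = {**leaves, **dict(zip(PARENTS, combo))}
        changes = sum(
            states[parent] != states[child]
            for parent, children in PARENTS.items()
            for child in children
        )
        best = changes if best is None else min(best, changes)
    return best

per_position = [min_changes(p) for p in range(6)]
print(per_position, sum(per_position))  # [2, 1, 2, 3, 1, 2] 11
```

The same function would work for amino acids by passing a 20-letter alphabet, although enumerating all assignments quickly becomes too slow.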
GACA GGGA CAAG GCGA GAAA
Human Chimp Chicken Gorilla Duck
Find the MP score of the tree for these sequences
Exercise
How to efficiently compute the MP score of a tree
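The Fitch algorithm (1971):
Postorder tree scan. At each internal node, take the intersection of the state sets of its two children. If the intersection is empty, apply a union operator instead. Otherwise, keep the intersection.
[Figure: a tree on Human, Chimp, Chicken, Gorilla, Duck with leaf states A, G, C, C, A and internal-node sets {A,G}, {A,C,G}, {A,C}, {A,C}]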
Total number of changes = number of union operators.
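A minimal sketch of the Fitch pass on a nested-tuple tree. The topology and the assignment of states to species below are assumptions: they are one arrangement that reproduces the four internal sets listed above, since the figure itself is not available.

```python
def fitch(tree, leaf_state):
    """Postorder Fitch pass on a binary tree written as nested tuples.

    Returns (state set of this node, number of unions, internal sets in postorder).
    """
    if isinstance(tree, str):                      # leaf: singleton set, no unions
        return {leaf_state[tree]}, 0, []
    left_set, left_n, left_sets = fitch(tree[0], leaf_state)
    right_set, right_n, right_sets = fitch(tree[1], leaf_state)
    common = left_set & right_set
    if common:                                     # non-empty intersection: keep it
        node_set, extra = common, 0
    else:                                          # empty intersection: union, +1 change
        node_set, extra = left_set | right_set, 1
    return node_set, left_n + right_n + extra, left_sets + right_sets + [node_set]

# Assumed topology and leaf states (hypothetical reading of the figure).
tree = ((("Human", "Chimp"), "Gorilla"), ("Chicken", "Duck"))
states = {"Human": "A", "Chimp": "G", "Gorilla": "C", "Chicken": "C", "Duck": "A"}

root_set, score, internal_sets = fitch(tree, states)
print(score)                               # 3
print([sorted(s) for s in internal_sets])
# [['A', 'G'], ['A', 'C', 'G'], ['A', 'C'], ['A', 'C']]
```

Three of the four internal nodes needed a union, so the score of this example is 3. Each node is visited once, so the pass takes time linear in the number of nodes for a fixed alphabet.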
Rooting the tree
From Wikimedia Commons
Positions:
species  p1  p2  p3  p4  p5  p6
Human    A   A   G   T   A   A
Chimp    A   A   T   T   A   C
Gorilla  A   C   A   T   A   A
Position p1 (Chimp = A, Human = A, Gorilla = A)
Total number of changes = 0
For all 3 possible tree topologies
Position p2 (Chimp = A, Human = A, Gorilla = C)
Total number of changes = 1
For all 3 possible tree topologies
Position p3 (Chimp = T, Human = G, Gorilla = A)
Total number of changes = 2
For all 3 possible tree topologies
Total number of changes is always the same
for all 3 possible tree topologies
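This can be checked directly on the Human, Chimp, Gorilla alignment. The sketch below scores every position on each of the three rooted topologies, using the same intersection/union counting rule as the Fitch algorithm. The function and variable names are hypothetical.

```python
SEQS = {"Human": "AAGTAA", "Chimp": "AATTAC", "Gorilla": "ACATAA"}

def changes(tree, pos):
    """Return (state set, number of changes) at one position of a nested-tuple tree."""
    if isinstance(tree, str):
        return {SEQS[tree][pos]}, 0
    (left_set, left_n), (right_set, right_n) = changes(tree[0], pos), changes(tree[1], pos)
    common = left_set & right_set
    if common:
        return common, left_n + right_n
    return left_set | right_set, left_n + right_n + 1

# The three rooted topologies on three taxa.
TOPOLOGIES = [
    (("Chimp", "Human"), "Gorilla"),
    (("Gorilla", "Chimp"), "Human"),
    (("Human", "Gorilla"), "Chimp"),
]

# table[p] holds the score of position p+1 on each of the three topologies.
table = [[changes(t, p)[1] for t in TOPOLOGIES] for p in range(6)]
for p, row in enumerate(table, start=1):
    print(f"p{p}: {row}")
```

Every row contains three identical numbers, so no position favours one topology over another.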
With 4 taxa
Orangutan is added as a fourth taxon (G = Gorilla, O = Orangutan, H = Human, C = Chimp).
[Figure: the rooted trees obtained by placing the root on each of the branches, numbered 1 to 5, of an unrooted 4-taxon tree]
The position of the root does not affect the MP score.
Conclusion
[Figure: a tree on Chimp, Orangutan, Gorilla and Human drawn with different root positions; character states (A, C, G) are marked on the nodes]
After “bending” the trees, the association of changes and branches does not change!
Rooting does not change MP score
[Figure: a second example on Chimp, Orangutan, Gorilla and Human, with states C and G marked on the nodes, illustrating the same point]
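The claim can be tested exhaustively for one unrooted 4-taxon tree. The sketch below roots the tree on each of its five branches and checks that all five rootings get the same score for every possible assignment of nucleotides to the leaves. The choice of unrooted tree (Chimp and Human on one side, Gorilla and Orangutan on the other) is arbitrary and made only for the example.

```python
from itertools import product

def fitch_score(tree, state):
    """Return (state set, number of unions) for a nested-tuple binary tree."""
    if isinstance(tree, str):
        return {state[tree]}, 0
    (left_set, left_n), (right_set, right_n) = (
        fitch_score(tree[0], state),
        fitch_score(tree[1], state),
    )
    common = left_set & right_set
    if common:
        return common, left_n + right_n
    return left_set | right_set, left_n + right_n + 1

C, H, G, O = "Chimp", "Human", "Gorilla", "Orangutan"

# The five rootings of one unrooted tree, one per branch.
ROOTINGS = [
    ((C, H), (G, O)),    # root on the internal branch
    (C, (H, (G, O))),    # root on the branch leading to Chimp
    (H, (C, (G, O))),    # root on the branch leading to Human
    (G, (O, (C, H))),    # root on the branch leading to Gorilla
    (O, (G, (C, H))),    # root on the branch leading to Orangutan
]

disagreements = 0
for combo in product("ACGT", repeat=4):          # all 256 leaf patterns
    state = dict(zip((C, H, G, O), combo))
    scores = {fitch_score(t, state)[1] for t in ROOTINGS}
    if len(scores) != 1:
        disagreements += 1

print(disagreements)  # 0
```

This is a check on one small tree, not a proof. The general argument is the "bending" one above.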
Back to solving the relationships between human, chimp and gorilla…
Using an outgroup
[Figure: an unrooted tree with three leaves numbered 1, 2, 3]
No MP with 3 species: there is only one unrooted tree, so MP cannot choose between topologies
Tree 1: Human and Chimp | Chicken and Gorilla
Tree 2: Human and Gorilla | Chimp and Chicken
Tree 3: Human and Chicken | Chimp and Gorilla
With 4 taxa, there are 3 different unrooted trees.
One tree gets a better score (fewer changes) than the other trees.
[Figure: the best-scoring unrooted tree on Human, Chimp, Chicken and Gorilla]
We then use external knowledge, namely that chicken is the outgroup, to get a rooted tree.
[Figure: an unrooted tree and a rooted tree; labels appearing in the figure: C, H, O, X, Y]
Can you root the unrooted tree to obtain the tree below?
Exercise
How many rooted trees result from an unrooted tree with n taxa?
Exercise
Assume you have three sequences and the MP score of the unrooted tree is X. You now add another sequence. Can the score of the 4-taxa tree be lower than that of the 3 taxa tree?
Exercise