Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University

Dec 20, 2015

Page 1: Building phylogenetic trees Jurgen Mourik & Richard Vogelaars Utrecht University.

Building phylogenetic trees

Jurgen Mourik & Richard Vogelaars

Utrecht University

Page 2:

Overview

• Background

• Making a tree from pairwise distances;

• Parsimony;
  – <break>;

• Assessing the trees: the bootstrap;

• Simultaneous alignment and phylogeny;

• Application: PHYLIP

Page 3:

Background

• Phylogenetic tree: diagram showing evolutionary lineages of species/genes

• Trees are used:
  – To understand the lineage of various species
  – To understand how various functions evolved
  – To inform multiple alignments

Page 4:

Phylogenetic tree approaches

• Distance:
  – UPGMA
  – Neighbour-joining

• Parsimony:
  – Traditional parsimony
  – Weighted parsimony

Page 5:

Making a tree from pairwise distances

• Given a set of sequences you want to build a tree.

• Compute the distances $d_{ij}$ between each pair i, j of the sequences.

• There are many different distance measures.

• The distance between two clusters is the average distance between pairs of sequences, one from each cluster.

Page 6:

UPGMA

• Unweighted Pair Group Method using arithmetic Averages.

• It works by clustering the sequences, at each stage combining two clusters and at the same time creating a new node in a tree, using a distance measure.

Page 7:

Distance between points

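$$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\; q \in C_j} d_{pq}$$

• $|C_i|$ and $|C_j|$ denote the number of sequences in clusters i and j.

[Figure: clusters i, j and l, with the distances 2, 3 and 4 marked between them.]

Example from the figure: $d_{il} = \frac{1}{1 \cdot 1}(4) = 4$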

Page 8:

Distance between clusters

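• Let $C_k$ be the union of clusters $C_i$ and $C_j$; then $d_{kl}$ is given by

$$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$$

• where $C_l$ is any other cluster.

[Figure: clusters i and j merged into k, with distances 4 and 3 to cluster l.]

Example from the figure: $d_{kl} = \frac{4 \cdot 1 + 3 \cdot 1}{1 + 1} = \frac{7}{2} = 3.5$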

Page 9:

Building the tree: UPGMA

Initialisation:

Assign each sequence i to its own cluster $C_i$.

Define one leaf of T for each sequence, and place it at height zero.

Iteration:

Determine the two clusters i, j for which $d_{ij}$ is minimal.

Define a new cluster k by $C_k = C_i \cup C_j$, and define $d_{kl}$ for all l.

Define a node k with daughter nodes i and j, and place it at height $d_{ij}/2$.

Add k to the current clusters and remove i and j.

Termination:

When only two clusters i, j remain, place the root at height $d_{ij}/2$.
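
The UPGMA steps on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the presenters' code: the function name `upgma`, the dictionary-of-`frozenset` distance format and the nested-tuple tree are assumptions made for the example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster sequences with UPGMA.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) of leaf labels to their distance.
    Returns (tree, height): a nested tuple of labels and the root height.
    """
    # Each cluster is stored as id -> (subtree, size, height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    while len(clusters) > 1:
        # Determine the two clusters i, j for which d_ij is minimal.
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        d_ij = d[pair]
        (tree_i, n_i, _), (tree_j, n_j, _) = clusters.pop(i), clusters.pop(j)
        k = next_id
        next_id += 1
        # Size-weighted average gives d_kl for every other cluster l.
        for other in clusters:
            d_il = d.pop(frozenset((i, other)))
            d_jl = d.pop(frozenset((j, other)))
            d[frozenset((k, other))] = (d_il * n_i + d_jl * n_j) / (n_i + n_j)
        del d[pair]
        # New node k with daughters i and j, placed at height d_ij / 2.
        clusters[k] = ((tree_i, tree_j), n_i + n_j, d_ij / 2)

    (tree, _, height), = clusters.values()
    return tree, height
```

With three clusters at distances 2 (i, j), 4 (i, l) and 3 (j, l), the sketch first joins i and j at height 1 and then places the root at height 3.5 / 2 = 1.75.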

Page 10:

UPGMA: Initialisation

Page 11:

UPGMA: Iteration 1

Page 12:

UPGMA: Iteration 2

Page 13:

UPGMA: Iteration 3

Page 14:

UPGMA: Termination

Page 15:

Properties of UPGMA

• Molecular clock & ultrametric property of distances

• Additivity

Page 16:

Properties of UPGMA: Molecular clock & ultrametric

• The molecular clock assumption: divergence of sequences is assumed to occur at the same rate at all points in the tree.

• If this assumption holds, the distances are said to be ultrametric.

Page 17:

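Properties of UPGMA: Additivity

• Given a tree, its edge lengths are said to be additive if the distance between any pair of leaves is the sum of the lengths of the edges on the path connecting them.

[Figure: nodes i, j and m joined through the node k.]

$$\begin{aligned}
d_{im} &= d_{ik} + d_{km} \\
d_{jm} &= d_{jk} + d_{km} \\
d_{ij} &= d_{ik} + d_{jk} \\
d_{km} &= \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})
\end{aligned}$$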
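
A short derivation of the last relation may help. It assumes only additivity for the three paths through k, that is $d_{im} = d_{ik} + d_{km}$, $d_{jm} = d_{jk} + d_{km}$ and $d_{ij} = d_{ik} + d_{jk}$:

```latex
\begin{align*}
d_{im} + d_{jm} &= (d_{ik} + d_{km}) + (d_{jk} + d_{km}) \\
                &= (d_{ik} + d_{jk}) + 2\,d_{km} \\
                &= d_{ij} + 2\,d_{km}, \\
\text{so}\quad d_{km} &= \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij}).
\end{align*}
```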

Page 18:

Neighbour-joining

• Neighbour-joining constructs a tree by iteratively joining subtrees (like UPGMA).

• Produces an unrooted tree.

• Doesn’t make the molecular clock assumption, so the distances need not be ultrametric.

Page 19:

Distances in Neighbour-joining

• Given a new internal node k, the distance to another node m is given by:

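$$d_{km} = \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})$$

$$d_{ik} = \tfrac{1}{2}\,(d_{ij} + d_{im} - d_{jm})$$

$$d_{jk} = d_{ij} - d_{ik}$$

[Figure: leaves i and j joined at the new node k, which connects to node m.]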

Page 20:

Distances in Neighbour-joining

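• Generalizing this so that the distances to all other leaves are taken into account:

$$d_{ik} = \tfrac{1}{2}\,(d_{ij} + r_i - r_j)$$

• where

$$r_i = \frac{1}{|L| - 2} \sum_{m \in L} d_{im}$$

• and |L| denotes the size of the set L of leaves.

[Figure: leaves i and j joined at the new node k, which connects to node m.]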

Page 21:

Building the tree: Neighbour-joining

Initialisation:

Define T to be the set of leaf nodes, one for each given sequence, and put L = T.

Iteration:

Pick a pair i, j in L for which $D_{ij}$, defined by $D_{ij} = d_{ij} - (r_i + r_j)$, is minimal.

Define a new node k and set $d_{km} = \tfrac{1}{2}\,(d_{im} + d_{jm} - d_{ij})$ for all m in L.

Add k to T with edges of lengths $d_{ik} = \tfrac{1}{2}\,(d_{ij} + r_i - r_j)$ and $d_{jk} = d_{ij} - d_{ik}$, joining k to i and j, respectively.

Remove i and j from L and add k.

Termination:

When L consists of two leaves i and j, add the remaining edge between i and j, with length $d_{ij}$.

Here $r_i = \frac{1}{|L| - 2} \sum_{m \in L} d_{im}$.
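
A rough Python sketch of the neighbour-joining iteration follows, assuming distances are supplied as a dictionary keyed by `frozenset` pairs of labels. The function name, the edge-list output and the generated labels `node0`, `node1`, ... are illustrative choices, not part of the slides.

```python
from itertools import combinations


def neighbour_joining(names, dist):
    """Build an unrooted tree by neighbour-joining.

    names: list of leaf labels (at least two).
    dist:  dict mapping frozenset({a, b}) of labels to their distance.
    Returns a list of edges (node, node, length).
    """
    d = dict(dist)
    active = list(names)          # the set L of current leaves
    edges = []
    counter = 0

    while len(active) > 2:
        n = len(active)
        # r_i = (1 / (|L| - 2)) * sum over m in L of d_im.
        r = {i: sum(d[frozenset((i, m))] for m in active if m != i) / (n - 2)
             for i in active}
        # Pick the pair minimising D_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(active, 2),
                   key=lambda p: d[frozenset(p)] - r[p[0]] - r[p[1]])
        d_ij = d[frozenset((i, j))]
        k = f"node{counter}"      # hypothetical label for the new node
        counter += 1
        # Edge lengths joining k to i and j.
        d_ik = 0.5 * (d_ij + r[i] - r[j])
        edges += [(k, i, d_ik), (k, j, d_ij - d_ik)]
        # Distances from k to every remaining leaf m.
        active = [m for m in active if m not in (i, j)]
        for m in active:
            d[frozenset((k, m))] = 0.5 * (d[frozenset((i, m))]
                                          + d[frozenset((j, m))] - d_ij)
        active.append(k)

    i, j = active
    edges.append((i, j, d[frozenset((i, j))]))
    return edges
```

On additive input the sketch should return the edge lengths of the generating tree; ties in $D_{ij}$ are broken by input order here, which is an arbitrary choice.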

Page 22:

Rooting trees

• Finding a root in an unrooted tree is sometimes accomplished by using an outgroup:
  – A species known to be more distantly related to the remaining species than they are to each other.

• The point where the outgroup joins the rest of the tree is the best candidate for the root position.

[Figure: unrooted tree with nodes i, j, k, l and m and an outgroup; the candidate root lies where the outgroup joins the tree.]

Page 23:

Comments on distance based methods

• If the given data is ultrametric (and these distances represent real distances), then UPGMA will identify the correct tree.

• If the data is additive (and these distances represent real distances), then Neighbour-joining will identify the correct tree.

• Otherwise, the methods may not recover the correct tree, but they may still be reasonable heuristics.

Page 24:

Phylogenetic tree approaches

• Distance:
  – UPGMA
  – Neighbour-joining

• Parsimony:
  – Traditional parsimony
  – Weighted parsimony

Page 25:

Parsimony

• Most widely used tree building algorithm(?).

• Finds the tree that explains the data with a minimal number of changes.

• Instead of building a tree, it assigns a cost to a given tree.

• Two components of the parsimony algorithm can be distinguished:
  – The computation of a cost for a given tree;
  – A search through all trees, to find the overall minimum of this cost.

Page 26:

Parsimony example

• Given the following sequences: AAG,AAA,GGA,AGA.

• Several trees could explain the phylogeny

Page 27:

Traditional Parsimony

• Count the number of substitutions

• At each node keep:
  – a list of minimal cost residues
  – the current cost

• Post-order traversal of the tree

Page 28:

Traditional Parsimony

Initialisation:

Set current cost $C = 0$ and $k = 2n - 1$, the number of the root node.

Recursion: To obtain the set $R_k$:

If k is a leaf node:

Set $R_k = \{x_u^k\}$.

If k is not a leaf node:

Compute $R_i$, $R_j$ for the daughters i, j of k, and set $R_k = R_i \cap R_j$ if this intersection is not empty, or else set $R_k = R_i \cup R_j$ and increment C.

Termination:

Minimal cost of tree = C.
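
As an illustration, the recursion can be written as a short post-order function in Python. The representation is an assumption made for the example: a tree is a nested 2-tuple whose leaves are the sequence strings themselves, and the cost is returned rather than kept in a global counter C.

```python
def fitch_cost(tree, site):
    """Return (R, C) for one alignment column.

    tree: nested 2-tuples whose leaves are sequence strings.
    site: index u of the column being scored.
    R is the set of minimal-cost residues at the root of `tree`,
    C the number of substitutions counted in the subtree.
    """
    if isinstance(tree, str):              # leaf: R_k = {x_u^k}
        return {tree[site]}, 0
    left, right = tree
    r_i, c_i = fitch_cost(left, site)      # post-order: daughters first
    r_j, c_j = fitch_cost(right, site)
    common = r_i & r_j
    if common:                             # non-empty intersection
        return common, c_i + c_j
    return r_i | r_j, c_i + c_j + 1        # union, one extra substitution


def parsimony_cost(tree, length):
    """Sum the traditional parsimony cost over all columns."""
    return sum(fitch_cost(tree, u)[1] for u in range(length))
```

For the example sequences AAG, AAA, GGA and AGA, the tree `(("AAG", "AAA"), ("GGA", "AGA"))` costs 3 substitutions under this sketch, while `(("AAG", "GGA"), ("AAA", "AGA"))` costs 4.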

Page 29:

Weighted Parsimony

• Extension of the traditional parsimony.

• Adds a cost function S(a,b) for each substitution of a by b.

• Post-order traversal of the tree

• Aim is now to minimize the cost.

Page 30:

Weighted Parsimony

Initialisation:

Set $k = 2n - 1$, the number of the root node.

Recursion: Compute $S_k(a)$ for all a as follows:

If k is a leaf node:

Set $S_k(a) = 0$ for $a = x_u^k$, $S_k(a) = \infty$ otherwise.

If k is not a leaf node:

Compute $S_i(a)$, $S_j(a)$ for all a at the daughters i, j and define

$$S_k(a) = \min_b \big(S_i(b) + S(a,b)\big) + \min_b \big(S_j(b) + S(a,b)\big)$$

Termination:

Minimal cost of tree = $\min_a S_{2n-1}(a)$.
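
A minimal Python sketch of this recursion, under the same illustrative conventions (nested 2-tuples with sequence strings as leaves); the function names and the way the cost function `S` is passed in are assumptions for the example.

```python
import math


def sankoff(tree, site, alphabet, S):
    """Return {a: S_k(a)} for the root k of `tree` at one column.

    tree: nested 2-tuples whose leaves are sequence strings.
    S:    function S(a, b) giving the cost of substituting a by b.
    """
    if isinstance(tree, str):
        # Leaf: cost 0 for the observed residue, infinity otherwise.
        return {a: 0 if a == tree[site] else math.inf for a in alphabet}
    s_i = sankoff(tree[0], site, alphabet, S)
    s_j = sankoff(tree[1], site, alphabet, S)
    # S_k(a) = min_b (S_i(b) + S(a, b)) + min_b (S_j(b) + S(a, b))
    return {a: min(s_i[b] + S(a, b) for b in alphabet)
               + min(s_j[b] + S(a, b) for b in alphabet)
            for a in alphabet}


def weighted_parsimony(tree, length, alphabet, S):
    """Minimal weighted cost of the tree, summed over all columns."""
    return sum(min(sankoff(tree, u, alphabet, S).values())
               for u in range(length))
```

With a unit cost (0 for a match, 1 for any substitution) this reduces to the traditional parsimony count, which gives a simple consistency check.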

Page 31:

Break

• Questions so far?

• After the break:
  – Assessing the trees: the bootstrap;
  – Simultaneous alignment and phylogeny;
  – Application: Phylip

Page 32:

Branch and bound

• Parsimony by itself cannot build a tree!

• With simple enumeration methods the number of trees becomes very large very fast.

• How to build the trees?
  – Stochastically
  – Branch and bound
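
To give a sense of how fast the number of trees grows, the standard counts for bifurcating trees on n labelled leaves, (2n-5)!! unrooted and (2n-3)!! rooted, can be computed with a few lines of Python. The function names are illustrative.

```python
def count_unrooted_trees(n):
    """Number of unrooted bifurcating trees on n labelled leaves: (2n-5)!!"""
    count = 1
    for leaves in range(4, n + 1):
        # The new leaf can be attached to any of the 2*leaves - 5 edges
        # of a tree that has one leaf fewer.
        count *= 2 * leaves - 5
    return count


def count_rooted_trees(n):
    """Number of rooted bifurcating trees on n labelled leaves: (2n-3)!!"""
    # Rooting is equivalent to attaching one extra leaf to an unrooted tree.
    return count_unrooted_trees(n + 1)
```

Ten leaves already give 2,027,025 unrooted trees, which is why exhaustive enumeration quickly becomes impractical.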

Page 33:

Branch and bound

• B&B uses the parsimony algorithm.

• It is guaranteed to find the overall best tree.

• It systematically builds trees by increasing the number of leaves.

• Abandons a particular avenue of tree building whenever the current incomplete tree (T*) has cost(T*) > cost(Tmin), the lowest cost found so far for a complete tree.
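
The pruning idea can be sketched in Python as below. This is a simplified illustration and not how PHYLIP implements it: trees are rooted nested tuples with sequence strings as leaves, so some unrooted trees are visited more than once, and the cost is the traditional (Fitch) substitution count. All function names are hypothetical.

```python
def site_cost(tree, u):
    """Fitch count for column u; returns (residue set, substitutions)."""
    if isinstance(tree, str):
        return {tree[u]}, 0
    (r_i, c_i), (r_j, c_j) = site_cost(tree[0], u), site_cost(tree[1], u)
    if r_i & r_j:
        return r_i & r_j, c_i + c_j
    return r_i | r_j, c_i + c_j + 1


def cost(tree, length):
    """Traditional parsimony cost summed over all columns."""
    return sum(site_cost(tree, u)[1] for u in range(length))


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` at one position."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def branch_and_bound(seqs):
    """Return (best_cost, best_tree) over all trees on `seqs`."""
    length = len(seqs[0])
    best = [float("inf"), None]

    def extend(tree, remaining):
        c = cost(tree, length)
        if c > best[0]:
            return                      # abandon: cost(T*) > cost(Tmin)
        if not remaining:
            best[0], best[1] = c, tree  # complete tree, record its cost
            return
        for t in insertions(tree, remaining[0]):
            extend(t, remaining[1:])

    extend(seqs[0], seqs[1:])
    return best[0], best[1]
```

Pruning is safe here because adding a leaf can never lower the parsimony cost of a partial tree. For the sequences AAG, AAA, GGA and AGA the search returns a minimal cost of 3.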

Page 34:

The Bootstrap

• A measure of how much a tree should be trusted.

• Use the bootstrap as a method of assessing the significance of some phylogenetic feature.

Page 35:

The Bootstrap (2)

• The bootstrap works as follows:
  – Given a dataset consisting of an alignment of sequences.
  – Generate an artificial dataset of the same size as the original dataset by picking columns from the alignment at random with replacement.
  – Apply the tree building algorithm to this artificial dataset.
  – Repeat the selection and tree building procedure n times.
  – The frequency with which a chosen phylogenetic feature appears is taken to be a measure of the confidence we can have in this feature.
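
The resampling loop might look like this in Python. It is a sketch under assumptions: the alignment is a dictionary of equal-length strings, and `feature_of` is a hypothetical callback that builds a tree from a replicate and returns the (hashable) feature of interest, for example the tree itself or one clade.

```python
import random
from collections import Counter


def bootstrap_support(alignment, feature_of, n_replicates=100, seed=0):
    """Estimate how often each feature is recovered from resampled columns.

    alignment:  dict name -> aligned sequence (all of equal length).
    feature_of: function mapping such a dict to a hashable feature
                (hypothetical; plug in any tree-building method).
    Returns a dict mapping each observed feature to its frequency.
    """
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    counts = Counter()
    for _ in range(n_replicates):
        # Pick column indices at random with replacement, same size as
        # the original alignment.
        cols = [rng.randrange(length) for _ in range(length)]
        replicate = {name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()}
        counts[feature_of(replicate)] += 1
    return {feature: k / n_replicates for feature, k in counts.items()}
```

The fixed `seed` only makes runs repeatable; the number of replicates is a free choice.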

Page 36:

Simultaneous alignment and phylogeny

• Simultaneously aligning sequences and finding a plausible phylogeny:
  – Sankoff & Cedergren’s gap-substitution algorithm;
  – Hein’s affine cost algorithm.

• Both find an optimal alignment given a tree.

Page 37:

Sankoff & Cedergren’s gap-substitution algorithm

• Is guaranteed to find ancestral sequences, and alignments of them and the leaf sequences.

• It uses a character-substitution model of gaps.

• Together these minimize a tree-based parsimony-type cost.

• The algorithm is a combination of two known methods:
  – Dynamic programming method (Chapter 6);
  – Weighted parsimony algorithm.

Page 38:

Hein’s affine cost algorithm

• It uses affine gap penalties.

• Faster than the Sankoff & Cedergren algorithm.

• The aim is to find sequences z at a given node aligned to both of the sequences x and y at the daughter nodes satisfying:

$$S(x,z) + S(z,y) = S(x,y)$$

• where S is the total cost for a given alignment of two sequences (a mismatch costs 1, a match costs 0).

Page 39:

Hein’s affine cost algorithm

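• In contrast to equation (2.16) (alignment with affine gap scores), the algorithm here searches for the minimal cost path.

• The affine gap cost for a gap of length k is $d + (k-1)e$, where $e \le d$.

$$V^M(i,j) = \min \begin{cases} V^M(i-1,j-1) + S(x_i, y_j) \\ V^X(i-1,j-1) + S(x_i, y_j) \\ V^Y(i-1,j-1) + S(x_i, y_j) \end{cases}$$

$$V^X(i,j) = \min \begin{cases} V^M(i-1,j) + d \\ V^X(i-1,j) + e \end{cases}$$

$$V^Y(i,j) = \min \begin{cases} V^M(i,j-1) + d \\ V^Y(i,j-1) + e \end{cases}$$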
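
A small Python sketch of a minimal-cost alignment with affine gaps, filling three matrices VM, VX and VY. The defaults d = 2, e = 1 and a mismatch cost of 1 follow the values used on these slides; the function name and the boundary handling are assumptions made for the example.

```python
import math


def affine_cost(x, y, d=2, e=1, mismatch=1):
    """Minimal alignment cost of x and y with affine gap cost d + (k-1)e."""
    n, m = len(x), len(y)
    inf = math.inf
    # VM: x_i aligned to y_j; VX: x_i aligned to a gap; VY: y_j to a gap.
    VM = [[inf] * (m + 1) for _ in range(n + 1)]
    VX = [[inf] * (m + 1) for _ in range(n + 1)]
    VY = [[inf] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                s = 0 if x[i - 1] == y[j - 1] else mismatch
                VM[i][j] = s + min(VM[i - 1][j - 1], VX[i - 1][j - 1],
                                   VY[i - 1][j - 1])
            if i > 0:
                # Open a gap (cost d) or extend one (cost e).
                VX[i][j] = min(VM[i - 1][j] + d, VX[i - 1][j] + e)
            if j > 0:
                VY[i][j] = min(VM[i][j - 1] + d, VY[i][j - 1] + e)
    return min(VM[n][m], VX[n][m], VY[n][m])
```

As in the three-state recurrences, a gap in one sequence cannot be followed directly by a gap in the other. For CAC and CTCACA the sketch gives a cost of 5 with d = 2 and e = 1.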

Page 40:

Dynamic programming matrix for two sequences

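[Figure: dynamic programming matrix with axes i and j, each cell holding the values $V^M$, $V^X$ and $V^Y$; gap costs d = 2 and e = 1.]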

Page 41:

Hein’s affine cost algorithm

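• Find the z for which $S(x,z) + S(z,y)$ is minimal, that is, equal to $S(x,y)$.

• From the matrix follow the alignment rows:
  – C - - A C -
  – C A C - - -

• CAC could be a possible z.

[Figure: tree with candidate ancestor CAC(?) above the leaves CAC and CTCACA.]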

Page 42:

Hein’s affine cost algorithm

[Figure: three copies of the tree with leaves CAC and CTCACA, with candidate ancestors CAC(?), CACACA(?) and CACAC(?).]

Which z could serve best as ancestor?

Page 43:

Hein’s affine cost algorithm

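Candidate z = CAC:
  $S(\mathrm{CAC}, \mathrm{CAC}) = 0$
  $S(\mathrm{CAC}, \mathrm{CTCACA}) = d + 2e + 1$

Candidate z = CACACA:
  $S(\mathrm{CAC}, \mathrm{CACACA}) = d + 2e$
  $S(\mathrm{CACACA}, \mathrm{CTCACA}) = 1$

Candidate z = CACAC:
  $S(\mathrm{CAC}, \mathrm{CACAC}) = d + e$
  $S(\mathrm{CACAC}, \mathrm{CTCACA}) = d + 1$

Each pair is compared with $S(\mathrm{CAC}, \mathrm{CTCACA}) = d + 2e + 1$.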

Page 44:

Sequence graph

• Follow a path through the dynamic programming matrix.

• Derive a graph from this matrix.

• Whenever a cell is used by an optimal path a vertex is added to the graph.

Page 45:

Sequence graph

Graph 1

Page 46:

Sequence graph: line arrangement

Graph 1

Graph 2

Page 47:

Sequence graph: replacing the dummy edges

Graph 2

Graph 3

Page 48:

Dynamic programming matrix: TAC – Graph 3

Page 49:

Ancestors

• Possible ancestral sequences for the leaf sequences TAC, CAC and CTCACA given the tree shown.

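• Derived from the sequence graphs.

[Figure: tree with leaves TAC, CAC and CTCACA and the sequence CAC at the internal nodes; the numbers 1 and 5 also appear in the figure.]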

Page 50:

Limitations of Hein’s model

• Hein’s algorithm takes the minimal cost sequences at each node upward.

• This can fail to give the overall optimum.

• Suppose the cost for a gap of length k is:
  – 13+3(k-1)

• Mismatch cost:
  – 4

• Suppose the leaves are G and GTT.

Page 51:

Limitations of Hein’s model

• An eligible ancestor of G and GTT would be either sequence itself, since each gives a total cost of 13+3=16.

• GT would not be eligible, because its total cost is 2*13=26.

• Now suppose we branch to the ancestor of G and GTT and there is a third leaf GT.
  – The total cost for the ineligible GT would be lower than for either G or GTT.
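
The arithmetic behind this example can be written out as follows. The sketch assumes the candidate ancestor z is joined directly to all three leaves G, GTT and GT, with gap cost $g(k) = 13 + 3(k-1)$:

```latex
\begin{align*}
g(1) &= 13, \qquad g(2) = 13 + 3 = 16 \\
z = \mathrm{G}:   &\quad S(z,\mathrm{G}) + S(z,\mathrm{GTT}) + S(z,\mathrm{GT}) = 0 + 16 + 13 = 29 \\
z = \mathrm{GTT}: &\quad S(z,\mathrm{G}) + S(z,\mathrm{GTT}) + S(z,\mathrm{GT}) = 16 + 0 + 13 = 29 \\
z = \mathrm{GT}:  &\quad S(z,\mathrm{G}) + S(z,\mathrm{GTT}) + S(z,\mathrm{GT}) = 13 + 13 + 0 = 26
\end{align*}
```

Under that assumption GT, which was rejected when only G and GTT were considered (26 against 16), gives the lowest total once the third leaf is included.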

Page 52:

Application: PHYLIP (Phylogeny Inference Package)

• Many features, among them:
  – Traditional (unrooted) parsimony
  – Branch and bound to find all most parsimonious trees

Page 53:

Application: PHYLIP

• Test dataset:

  Jurgen    AACGUGGCCAAAU
  Alpha     ACCGCCGCCAAAU
  Beta      AAGGUCGCCAAAC
  Gamma     CAUUUCGUCACAA
  Delta     GGUAUCUCGGCCU
  Epsilon   GAAAUCUCGAUCC
  Richard   GGGCUCUCGGCUC

Page 54:

Demo

Page 55:

Questions?