Probabilistic Approaches to Phylogeny Wouter Van Gool & Thomas Jellema.

Dec 21, 2015

Page 1: Probabilistic Approaches to Phylogeny Wouter Van Gool & Thomas Jellema.

Probabilistic Approaches to Phylogeny

Wouter Van Gool & Thomas Jellema

Page 2:

Probabilistic Approaches to Phylogeny

Contents
• Introduction/Overview (Wouter)
• Probabilistic Models of Evolution (Wouter)
• Calculating the Likelihood (Wouter)
• Pause
• Evolution Demo (Thomas)
• Using the likelihood for inference (Thomas)
• Phylogeny Demo (Thomas)
• Summary/Conclusion (Thomas)
• Questions

Page 3:

8.1 Introduction

Goal:
• Formulate probabilistic models for phylogeny
• Infer trees from sets of sequences

Aim of probability-based phylogeny: rank trees according to
- likelihood P(data | tree)
- posterior probability P(tree | data)

Page 4:

8.1 Introduction

Compute the probability of a set of data given a tree:

P(x* | T, t*)

x*: set of n sequences xj (j = 1…n)

T: tree with n leaves, with sequence j at leaf j

t*: edge lengths of the tree

Page 5:

8.1 Introduction

Example

Page 7:

8.2 Probabilistic Models of Evolution

Given the sequences at the leaves x1…xn:
1. Pick a model of evolution: P(x | y, t), P(x)
2. Enumerate all possible tree topologies with n leaves
3. For each T, maximize P(x* | T, t*) over all possible edge lengths t*
4. Pick the T and t* that have the largest probability
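Step 2 is the expensive part. As a rough illustration, the sketch below counts rooted binary topologies with n labelled leaves using the standard (2n-3)!! formula. The function name is hypothetical and not taken from the slides.

```python
def count_rooted_topologies(n_leaves):
    """Number of rooted binary trees with n labelled leaves: (2n-3)!!."""
    if n_leaves < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3.
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


for n in (3, 4, 10, 20):
    print(n, count_rooted_topologies(n))
```

The count grows faster than exponentially, which suggests why exhaustive enumeration is only feasible for a handful of sequences.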

Page 8:

8.2 Probabilistic Models of Evolution

Simplifying assumptions:
1. Single-base substitutions only, so ungapped alignments only
2. Each base evolves independently, under the same model of evolution, based on a substitution matrix

Page 9:

8.2 Probabilistic Models of Evolution

Substitution matrices for phylogeny

Many important families of substitution matrices are multiplicative: S(t)S(s) = S(t+s)

Substitution matrices used in phylogeny:
• Jukes & Cantor model [1969]
• Kimura DNA model [1980]
• PAM matrix [1978]

Page 10:

8.2 Probabilistic Models of Evolution

Jukes-Cantor Model
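The slide content for this model did not survive in the transcript. As a minimal sketch, assuming the standard form of the Jukes-Cantor model (every base changes to each other base at the same rate alpha), the substitution matrix S(t) can be written out and checked for multiplicativity. The names `jukes_cantor`, `matmul` and `alpha` are illustrative assumptions.

```python
import math


def jukes_cantor(t, alpha=1.0):
    """Return the 4x4 matrix S(t) with entries P(b | a, t), bases in any fixed order."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 * (1.0 + 3.0 * decay)  # probability the base is unchanged
    diff = 0.25 * (1.0 - decay)        # probability of one specific other base
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def matmul(x, y):
    """Plain 4x4 matrix product, to avoid any third-party dependency."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Multiplicativity: S(t) S(s) should equal S(t + s) up to rounding.
product = matmul(jukes_cantor(0.3), jukes_cantor(0.5))
direct = jukes_cantor(0.8)
print(max(abs(product[i][j] - direct[i][j]) for i in range(4) for j in range(4)))
```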

Page 11:

8.2 Probabilistic Models of Evolution

Kimura DNA model
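The matrix shown on this slide is missing from the transcript. The sketch below assumes the usual two-parameter form of the Kimura model, with `alpha` as the transition rate (A<->G, C<->T) and `beta` as the transversion rate. These parameter names and the dictionary layout are assumptions for illustration.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def kimura(t, alpha=1.0, beta=0.5):
    """Return {(a, b): P(b | a, t)} under the two-parameter Kimura model."""
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))          # one specific transversion
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))  # the transition
    r = 1.0 - 2.0 * s - u                                  # base unchanged
    matrix = {}
    for a in BASES:
        for b in BASES:
            if a == b:
                matrix[a, b] = r
            elif (a in PURINES) == (b in PURINES):
                matrix[a, b] = u  # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[a, b] = s
    return matrix


m = kimura(0.3)
print(m["A", "G"], m["A", "C"])  # transition vs. transversion probability
```

With `alpha == beta` the transition and transversion probabilities coincide, and the model reduces to Jukes-Cantor.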

Page 12:

8.2 Probabilistic Models of Evolution

PAM matrix model

Page 14:

8.3 Calculating the likelihood for ungapped alignments

Example: The likelihood of two nucleotide sequences
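The worked example itself is not in the transcript. As a hedged sketch of the two-sequence case: for each site, sum over the unknown root residue a, weighting by its prior probability and by the substitution probabilities along the two edges, then multiply over sites. The sketch assumes a Jukes-Cantor model with uniform base frequencies; `Q`, `p_sub` and `pair_likelihood` are hypothetical names.

```python
import math

BASES = "ACGT"
Q = {a: 0.25 for a in BASES}  # assumed uniform root distribution


def p_sub(b, a, t, alpha=1.0):
    """Jukes-Cantor P(b | a, t); an assumed model, not necessarily the slide's."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


def pair_likelihood(x1, x2, t1, t2):
    """P(x1, x2 | T, t1, t2) for two aligned, ungapped sequences."""
    if len(x1) != len(x2):
        raise ValueError("sequences must have equal length")
    likelihood = 1.0
    for c1, c2 in zip(x1, x2):
        # Sum over the unobserved residue a at the root.
        likelihood *= sum(Q[a] * p_sub(c1, a, t1) * p_sub(c2, a, t2)
                          for a in BASES)
    return likelihood


print(pair_likelihood("AATC", "AATT", 0.1, 0.2))
```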

Page 15:

8.3 Calculating the likelihood for ungapped alignments

Likelihood for the general case

P(x^1, …, x^n | T, t*) = Σ_{x^(n+1), …, x^(2n-1)} P(x^(2n-1)) Π_{i=1}^{2n-2} P(x^i | x^α(i), t_i)

where node α(i) is the ancestor of node i and node 2n-1 is the root.

A fixed set of edge lengths t_1…t_(2n-2) (one per edge) and a topology T are required.

Page 17:

8.3 Calculating the likelihood for ungapped alignments

Felsenstein’s recursive algorithm

Define a table of probabilities F_{k,a} for each site u, all tree nodes k and all input characters a:

F_{k,a} = probability, at site u, of the subtree below node k, assuming the character at node k is a

Page 18:

8.3 Calculating the likelihood for ungapped alignments

Felsenstein’s recursive algorithm
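The recursion shown on this slide is missing from the transcript. The sketch below is one possible rendering of Felsenstein's recursion under stated assumptions: a Jukes-Cantor model with uniform base frequencies, and a hypothetical tree encoding in which a leaf is an integer index into the sequence list and an internal node is a tuple `(left, t_left, right, t_right)`.

```python
import math

BASES = "ACGT"
Q = {a: 0.25 for a in BASES}  # assumed uniform root distribution


def p_sub(b, a, t, alpha=1.0):
    """Jukes-Cantor P(b | a, t), used here as an assumed substitution model."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)


def conditional(node, site, seqs):
    """Return {a: P(leaves below node | residue a at node)} for one site."""
    if isinstance(node, int):  # leaf: probability 1 for the observed residue
        return {a: 1.0 if seqs[node][site] == a else 0.0 for a in BASES}
    left, t_left, right, t_right = node
    f_left = conditional(left, site, seqs)
    f_right = conditional(right, site, seqs)
    # Combine the two child tables, summing over the child residues b and c.
    return {
        a: sum(p_sub(b, a, t_left) * f_left[b] for b in BASES)
        * sum(p_sub(c, a, t_right) * f_right[c] for c in BASES)
        for a in BASES
    }


def tree_likelihood(tree, seqs):
    """Multiply the per-site likelihoods, weighting the root table by Q."""
    total = 1.0
    for u in range(len(seqs[0])):
        f_root = conditional(tree, u, seqs)
        total *= sum(Q[a] * f_root[a] for a in BASES)
    return total


# Hypothetical three-leaf tree: leaves 0 and 1 are siblings, leaf 2 joins at the root.
tree = ((0, 0.1, 1, 0.2), 0.3, 2, 0.4)
print(tree_likelihood(tree, ["AATC", "AATT", "TTAA"]))
```

Each node is visited once per site, so the cost is linear in the number of nodes rather than exponential in the number of ancestral assignments.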

Page 19:

8.3 Calculating the likelihood for ungapped alignments

Likelihood for the general case

Overall algorithm:
• Enumerate each tree topology T
• Enumerate sets of edge lengths t (using some n-dimensional optimisation technique)
• Run Felsenstein’s recursive algorithm for each site u and multiply the likelihoods
• Return the best T and t

Page 20:

8.3 Calculating the likelihood for ungapped alignments

Reversibility and independence of root position

The score of the optimal tree is independent of the root position if and only if:
- the substitution matrix is multiplicative
- the substitution matrix is reversible

A substitution matrix is reversible if for all a, b and t:

P(b | a, t) P(a) = P(a | b, t) P(b)
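A short worked derivation, for two sequences only, may help show why both properties matter. It assumes the usual detailed-balance form of reversibility, P(b | a, t) P(a) = P(a | b, t) P(b), and writes x^1_u and x^2_u for the residues of the two sequences at site u, with edge lengths t_1 and t_2 to the root (this notation is an assumption, not taken from the slides).

```latex
\begin{align*}
\sum_a P(a)\, P(x^1_u \mid a, t_1)\, P(x^2_u \mid a, t_2)
  &= \sum_a P(x^1_u)\, P(a \mid x^1_u, t_1)\, P(x^2_u \mid a, t_2)
     && \text{(reversibility)} \\
  &= P(x^1_u)\, P(x^2_u \mid x^1_u, t_1 + t_2)
     && \text{(multiplicativity)}
\end{align*}
```

The result depends only on the sum t_1 + t_2, so the root can slide along the edge between the two leaves without changing the likelihood.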

Page 23:

Demo

Page 25:

8.4 Using the likelihood for inference

Maximum likelihood:
• The best tree “could be” the tree that maximises the likelihood
• Computationally demanding

Page 26:

8.4 Using the likelihood for inference

Sampling from the posterior distribution:
• We use Bayes’ rule to compute the posterior probability
• This is the probability of a model given the data

Page 27:

8.4 Using the likelihood for inference

Example

Model name   Prior chance of model (%)   Data
Model 1      10                          100% A
Model 2      40                          50% A, 50% B
Model 3      50                          100% B

Page 28:

8.4 Using the likelihood for inference

Sampling from the posterior distribution:
• We use Bayes’ rule to compute the posterior probability
• This is the probability of a model given the data

P(Model 1 | A) = (10% × 100%) / (10% × 100% + 40% × 50% + 50% × 0%) = 10% / 30% ≈ 33%
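The three-model example can be checked numerically. This is a minimal sketch, assuming the priors in the table are percentages and that the observed datum is A; the function and variable names are illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(model | data) = P(data | model) P(model) / P(data)."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    evidence = sum(joint.values())  # P(data), summed over all models
    return {m: joint[m] / evidence for m in joint}


priors = {"Model 1": 0.10, "Model 2": 0.40, "Model 3": 0.50}
# Probability that each model produces the datum A.
likelihood_of_a = {"Model 1": 1.0, "Model 2": 0.5, "Model 3": 0.0}

for model, prob in posterior(priors, likelihood_of_a).items():
    print(model, round(prob, 3))
```

Under these assumptions Model 1 ends up with posterior 1/3 and Model 2 with 2/3, while Model 3 is ruled out because it never produces A.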

Page 29:

8.4 Using the likelihood for inference

Metropolis algorithm
• It samples trees with probabilities given by their posterior distribution.
• It is a sampling procedure that generates a sequence of trees, each from the previous one.

Page 30:

8.4 Using the likelihood for inference

Metropolis algorithm
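The details of this slide are not in the transcript. As a rough sketch of the general Metropolis scheme with a symmetric proposal: propose a new state from the current one, and accept it with probability min(1, P(new) / P(current)). The three "trees" and their unnormalised weights below are made up for illustration and stand in for posterior probabilities.

```python
import random

# Hypothetical unnormalised posterior weights for three candidate trees.
WEIGHTS = {"T1": 1.0, "T2": 2.0, "T3": 7.0}


def propose_other(current, rng):
    """Symmetric proposal: pick one of the other states uniformly."""
    return rng.choice([s for s in WEIGHTS if s != current])


def metropolis(weight, propose, start, n_steps, rng):
    """Generate a chain whose long-run frequencies follow the given weights."""
    current = start
    chain = []
    for _ in range(n_steps):
        candidate = propose(current, rng)
        ratio = weight(candidate) / weight(current)
        # Always accept uphill moves; accept downhill moves with probability ratio.
        if ratio >= 1.0 or rng.random() < ratio:
            current = candidate
        chain.append(current)
    return chain


rng = random.Random(0)
chain = metropolis(lambda s: WEIGHTS[s], propose_other, "T1", 20000, rng)
print({s: chain.count(s) / len(chain) for s in WEIGHTS})
```

Only ratios of weights are needed, so the normalising constant of the posterior never has to be computed.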

Page 31:

8.4 Using the likelihood for inference

A proposal distribution

[Figure: tree with nodes labelled 1 to 8, axes "Time from root" and "Order of traversal"; the labels read 1, 2, 3, 4, 5, 8, 7, 6.]

Pages 32 to 35:

8.4 Using the likelihood for inference

Metropolis algorithm

[Figure, shown on each of these four slides: tree with nodes labelled 1 to 8 in order, axes "Time from root" and "Order of traversal".]

Page 36:

8.4 Using the likelihood for inference

Metropolis algorithm

Page 37:

8.4 Using the likelihood for inference

Other phylogenetic uses of sampling

[Figure: sequences AATC, AATT]

Page 38:

8.4 Using the likelihood for inference

Other phylogenetic uses of sampling

[Figure: sequences AATC, AATT; AATC]

Page 39:

8.4 Using the likelihood for inference

Other phylogenetic uses of sampling

[Figure: sequences AATT, TTAA]

Page 40:

8.4 Using the likelihood for inference

Other phylogenetic uses of sampling

[Figure: sequences AATC, AATT; TCAA, AATC; AAAA; TTAA, TCAA]

Page 41:

8.4 Using the likelihood for inference

Other phylogenetic uses of sampling: inferring the history of populations

Probability density of a coalescence in time =

Probability of a coalescence between any pair

= * =

Page 42:

8.4 Using the likelihood for inference

Inferring the history of populations

When the value of n is large and the value of p is close to 0, the binomial distribution with parameters n and p can be approximated by a Poisson distribution with mean n*p.

n*p = = and x = 1

The probability of a coalescence at the end of the period tk

The total probability of the tree
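The Poisson approximation to the binomial can be checked numerically. This is a small sketch with made-up values of n and p, evaluated at x = 1 as on the slide; the function names are illustrative.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def poisson_pmf(x, mean):
    """Probability of exactly x events for a Poisson distribution."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


n, p = 10000, 0.0002  # hypothetical: many trials, small success probability
print(binomial_pmf(1, n, p), poisson_pmf(1, n * p))
```

The two values agree closely for large n and small p, and the agreement degrades as p grows.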

Page 43:

8.4 Using the likelihood for inference

The bootstrap
• The bootstrap can give an approximation to the posterior.
• It takes too much labour, so it is an unattractive alternative to sampling.
• The bootstrap is probably more useful for non-probabilistic tree-building methods.
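For readers unfamiliar with the bootstrap in this setting, the resampling step can be sketched as follows, assuming the common scheme of drawing alignment columns with replacement. The function name and the toy alignment are illustrative; a tree would then be rebuilt from each replicate with whatever tree-building method is in use.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Resample the columns of an ungapped alignment with replacement."""
    n_sites = len(seqs[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every sequence so columns stay intact.
    return ["".join(s[c] for c in columns) for s in seqs]


alignment = ["AATC", "AATT", "TTAA"]
rng = random.Random(0)
for _ in range(3):
    print(bootstrap_alignment(alignment, rng))
```

Because a tree has to be rebuilt for every replicate, the cost scales with the number of replicates, which is consistent with the slide's remark about labour.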

Page 45:

Demo

Page 47:

Probabilistic Approaches to Phylogeny

Conclusion
• The methods of today can be used to find the most probable tree.
• Most of the methods are computationally demanding.
• More realistic evolutionary models are explained on Thursday.

Page 49:

Probabilistic Approaches to Phylogeny

Questions?