
Maximum Likelihood (ML) Parameter Estimation, with applications to reconstructing phylogenetic trees

Comput. Genomics, lecture 6b

Presentation taken from Nir Friedman’s HU course, available at www.cs.huji.ac.il/~pmai.

Changes made by Dan Geiger, Ydo Wexler, and finally by Benny Chor.


2

The Setting

We have a probabilistic model, M, of some phenomenon. We know exactly the structure of M, but not the values of its probabilistic parameters, θ.

Each “execution” of M produces an observation, x[i], according to the (unknown) distribution induced by M.

Goal: After observing x[1], …, x[n], estimate the model parameters, θ, that generated the observed data.


3

Maximum Likelihood Estimation (MLE)

The likelihood of the observed data, given the model parameters θ, is defined as the conditional probability that the model M, with parameters θ, produces x[1], …, x[n]:

L(θ) = Pr(x[1], …, x[n] | θ, M).

In MLE we seek the model parameters, θ, that maximize the likelihood.


4

Maximum Likelihood Estimation (MLE)

In MLE we seek the model parameters, θ, that maximize the likelihood. The MLE principle is applicable in a wide variety of applications, from speech recognition, through natural language processing, to computational biology.

We will start with the simplest example: estimating the bias of a coin. Then we will apply MLE to inferring phylogenetic trees. (We will later talk about MAP, i.e. Bayesian inference.)


5

Example: Binomial Experiment

When a thumbtack is tossed, it can land in one of two positions: Head (H) or Tail (T).

(Figure: the two landing positions, Head and Tail.)

We denote by θ the (unknown) probability P(H).

Estimation task: Given a sequence of toss samples x[1], x[2], …, x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.


6

Statistical Parameter Fitting (restatement)

Consider instances x[1], x[2], …, x[M] such that: the set of values that x can take is known; each is sampled from the same distribution; and each is sampled independently of the rest.

These are i.i.d. samples (why?).

The task is to find the vector of parameters θ that generated the given data. This parameter vector can then be used to predict future data.


7

The Likelihood Function

How good is a particular θ? It depends on how likely it is to generate the observed data:

L_D(θ) = P(D | θ) = ∏_m P(x[m] | θ)

The likelihood for the sequence H, T, T, H, H is

L_D(θ) = θ · (1 − θ) · (1 − θ) · θ · θ = θ^3 · (1 − θ)^2

(Figure: plot of L_D(θ) for θ in [0, 1].)
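As a quick illustration (my own sketch, not part of the original slides; the θ values below are arbitrary), the product form above can be evaluated directly for the toss sequence H, T, T, H, H:

```python
# Minimal sketch: likelihood of an i.i.d. toss sequence under a two-outcome
# model with P(H) = theta and P(T) = 1 - theta.

def likelihood(theta, tosses):
    """L_D(theta) = product over m of P(x[m] | theta)."""
    L = 1.0
    for x in tosses:
        L *= theta if x == "H" else (1.0 - theta)
    return L

data = ["H", "T", "T", "H", "H"]           # the sequence from the slide
for theta in [0.2, 0.4, 0.6, 0.8]:         # a few candidate parameter values
    print(theta, likelihood(theta, data))  # theta = 0.6 gives the largest of these
```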


8

Sufficient Statistics

To compute the likelihood in the thumbtack example we only require N_H and N_T (the number of heads and the number of tails).

N_H and N_T are sufficient statistics for the binomial distribution:

L_D(θ) = θ^(N_H) · (1 − θ)^(N_T)


9

Sufficient Statistics

A sufficient statistic is a function of the data that summarizes the relevant information for the likelihood.

(Figure: different datasets mapping to the same statistics.)

Formally, s(D) is a sufficient statistic if, for any two datasets D and D′,

s(D) = s(D′)  ⇒  L_D(θ) = L_D′(θ)
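For instance, a small check (my own sketch, reusing the toy likelihood function from above) that two toss sequences with the same counts (N_H, N_T) have identical likelihood functions:

```python
# Sketch: datasets with equal sufficient statistics (N_H, N_T)
# give the same likelihood for every theta.

def likelihood(theta, tosses):
    L = 1.0
    for x in tosses:
        L *= theta if x == "H" else (1.0 - theta)
    return L

D1 = ["H", "T", "T", "H", "H"]   # N_H = 3, N_T = 2
D2 = ["H", "H", "H", "T", "T"]   # same counts, different order
for theta in [k / 10 for k in range(1, 10)]:
    assert abs(likelihood(theta, D1) - likelihood(theta, D2)) < 1e-12
print("identical likelihoods for all tested theta values")
```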


10

Maximum Likelihood Estimation

MLE Principle:

Choose parameters that maximize the likelihood function

This is one of the most commonly used estimators in statistics

Intuitively appealing. One usually maximizes the log-likelihood function, defined as l_D(θ) = ln L_D(θ); since ln is monotone increasing, this has the same maximizer, and products of probabilities become sums.


11

Example: MLE in Binomial Data

The log-likelihood here is

l_D(θ) = N_H log θ + N_T log (1 − θ).

Taking its derivative and equating it to 0,

N_H / θ − N_T / (1 − θ) = 0,

we get

θ̂ = N_H / (N_H + N_T)

(which coincides with what one would expect).

Example: (N_H, N_T) = (3, 2). The MLE estimate is 3/5 = 0.6.

(Figure: plot of L_D(θ) over [0, 1], peaking at θ = 0.6.)
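As a numerical sanity check (my own sketch; the grid resolution is arbitrary), a grid search over θ agrees with the closed-form estimate N_H / (N_H + N_T):

```python
import math

# Sketch: compare a grid-search maximizer of the binomial log-likelihood
# with the closed-form MLE N_H / (N_H + N_T).

def log_likelihood(theta, n_h, n_t):
    return n_h * math.log(theta) + n_t * math.log(1.0 - theta)

n_h, n_t = 3, 2                               # counts from the slide's example
grid = [k / 1000 for k in range(1, 1000)]     # theta values in (0, 1)
best = max(grid, key=lambda t: log_likelihood(t, n_h, n_t))
print(best)                                   # approximately 0.6
print(n_h / (n_h + n_t))                      # 0.6 exactly
```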


12

From Binomial to Multinomial

Now suppose X can take the values 1, 2, …, K (for example, a die has K = 6 sides). We want to learn the parameters θ_1, θ_2, …, θ_K.

Sufficient statistics: N_1, N_2, …, N_K, the number of times each outcome is observed.

Likelihood function:

L_D(θ) = ∏_k θ_k^(N_k)   (product over k = 1, …, K)

MLE (proof @ assignment 3):

θ̂_k = N_k / N,  where N = N_1 + … + N_K


13

Example: Multinomial

Let x_1 x_2 … x_n be a protein sequence. We want to learn the parameters q_1, q_2, …, q_20 corresponding to the frequencies of the 20 amino acids.

N_1, N_2, …, N_20: the number of times each amino acid is observed in the sequence.

Likelihood function:

L_D(q) = ∏_k q_k^(N_k)   (product over the 20 amino acids)

MLE:

q̂_k = N_k / n
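In code, the multinomial MLE is simply normalized counts; here is a sketch with a made-up toy sequence (not real data):

```python
from collections import Counter

# Sketch: MLE of amino-acid frequencies, q_k = N_k / n, for a protein sequence.

seq = "MKVLAAGLLKVMKA"                 # toy example sequence
counts = Counter(seq)                  # N_k for each amino acid that occurs
n = len(seq)
q_hat = {aa: c / n for aa, c in counts.items()}
print(q_hat)                           # e.g. q_hat["K"] is 3/14
```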


14

Inferring Phylogenetic Trees

Let S_1, S_2, …, S_n be n sequences (DNA or AA). Assume for simplicity that they are all of the same length, l.

We want to learn the parameters of a phylogenetic tree that maximize the likelihood.

But wait: we should first specify a model.


15

A Probabilistic Model

Our models will consist of a “regular” tree where, in addition, edges are assigned substitution probabilities.

For simplicity, assume our “DNA” has only two states, say X and Y.

If edge e is assigned probability p_e, this means that the probability of a substitution (X ↔ Y) across e is p_e.


16

A Probabilistic Model (2)



17

A Probabilistic Model (3)

If edge e is assigned probability p_e, this means that the probability of more involved patterns of substitution across e (e.g. XXYXY → YXYXX) is determined, and easily computed: p_e^2 · (1 − p_e)^3 for this pattern.

Q.: What if the pattern on both sides is known, but p_e is not known?

A.: It makes sense to seek the p_e that maximizes the probability of the observation. So far, this is identical to the coin toss example.
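Continuing that analogy, a small sketch (my own illustration, under the slides' symmetric two-state model) of the pattern probability across one edge and the resulting MLE for p_e:

```python
# Sketch: probability of a substitution pattern across a single edge,
# where each site independently substitutes with probability p_e.

def pattern_probability(p_e, parent, child):
    """P(child | parent, p_e) = p_e^(#mismatches) * (1 - p_e)^(#matches)."""
    prob = 1.0
    for a, b in zip(parent, child):
        prob *= p_e if a != b else (1.0 - p_e)
    return prob

parent, child = "XXYXY", "YXYXX"   # the slide's pattern: 2 mismatches, 3 matches
mismatches = sum(a != b for a, b in zip(parent, child))
p_hat = mismatches / len(parent)   # MLE: exactly the coin-toss estimate
print(p_hat)                                        # 0.4
print(pattern_probability(p_hat, parent, child))    # 0.4**2 * 0.6**3
```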


18

A Probabilistic Model (4)

But a single edge is a fairly boring tree… Now we don’t know the states at the internal node(s), nor the edge parameters p_e1, p_e2, p_e3.

(Figure: a three-leaf star tree with leaf sequences XXYXY, YXYXX, and YYYYX, edges labeled p_e1, p_e2, p_e3, and an unknown sequence “?????” at the internal node.)


19

Two Ways to Go

1. Maximize over the states of the internal node(s).
2. Average over the states of the internal node(s).

In both cases, we maximize over the edge parameters.


20

Two Ways to Go

In the second version (averaging, i.e. summing over the states of the internal nodes), we are looking for the “most likely” setting of the tree edges. This is called maximum likelihood (ML) inference of phylogenetic trees.

ML is probably the inference method most widely (some would say wildly) used.


21

Two Ways to Go

In the first version (maximizing over the states of the internal nodes), we are looking for the “most likely” ancestral states. This is called ancestral maximum likelihood (AML).

In some sense AML is “between” MP (having ancestral states) and ML (because the goal is still to maximize likelihood).
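To make the two options concrete, here is a rough sketch (my own illustration; the slides leave the root prior implicit, so a uniform 0.5 prior on the internal node’s state and independent sites are assumed, and the edge probabilities below are arbitrary) that scores the three-leaf star tree both ways for fixed p_e1, p_e2, p_e3; a full method would additionally maximize these scores over the edge parameters:

```python
# Sketch: likelihood of a 3-leaf star tree under the two-state model.
# ML sums over the internal node's state at each site; AML maximizes over it.

leaves = ["XXYXY", "YXYXX", "YYYYX"]   # the leaf sequences from the figure
p_edges = [0.1, 0.2, 0.3]              # arbitrary example values for p_e1, p_e2, p_e3

def site_terms(chars):
    """For one site, return prior(a) * P(leaf chars | internal state a), a in {X, Y}."""
    terms = {}
    for a in "XY":
        t = 0.5                        # assumed uniform prior on the internal state
        for c, p in zip(chars, p_edges):
            t *= p if c != a else (1.0 - p)
        terms[a] = t
    return terms

ml_score, aml_score = 1.0, 1.0
for site in zip(*leaves):              # iterate over the l sites
    terms = site_terms(site)
    ml_score *= sum(terms.values())    # sum over internal states  -> ML
    aml_score *= max(terms.values())   # max over internal states  -> AML

print(ml_score, aml_score)
```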


22

Back to the Probabilistic Model

A reconstruction method is called statistically consistent if the model it reconstructs converges to the “true tree” as the length of the sequences goes to infinity.

(Figure: a four-leaf tree with leaf sequences XXYXY, YXYXY, YYYYX, and XXXYX, and edges labeled p_e1 through p_e5.)


23

Consistency, and Beyond

A reconstruction method is called statistically consistent if the model it reconstructs converges to the “true tree” as the length of the sequences goes to infinity.

We would like a reconstruction method that is
(1) statistically consistent, and
(2) computationally efficient.


24

Status Report

Let us examine the three character-based methods we saw in light of these two criteria: statistical consistency and computational efficiency.

method | consistency          | efficiency
MP     | No (sketch on board) | No (NP-complete, as seen earlier)
AML    | No                   | No (NP-complete)
ML     | Yes!                 | No (NP-complete)