Lecture 8: Hidden Markov Models (HMMs)
Transcript
Page 1:

Lecture 8: Hidden Markov Models (HMMs)

Prepared by Michael Gutkin and Shlomi Haba

Originally presented at Yaakov Stein’s DSPCSP Seminar, spring 2002

Modified by Benny Chor, using also some slides of Nir Friedman (Hebrew Univ.), for the Computational Genomics Course, Tel-Aviv Univ., Dec. 2002

Page 2:


Outline

- Discrete Markov Models
- Hidden Markov Models
- Three major questions:
  Q1. Computing the probability of a given observation.
  A1. The Forward-Backward (Baum-Welch) DP algorithm.
  Q2. Computing the most probable sequence of states, given an observation.
  A2. The Viterbi DP algorithm.
  Q3. Given an observation, learning the best model.
  A3. Expectation Maximization (EM): a heuristic.

Page 3:


Markov Models

A discrete (finite) system:
- N distinct states.
- Begins (at time t=1) in some initial state.
- At each time step (t=1,2,…) the system moves from the current state to the next state (possibly the same as the current state) according to transition probabilities associated with the current state.

This kind of system is called a Discrete Markov Model.

Page 4:


Discrete Markov Model

Example: a Discrete Markov Model with 5 states.
- Each aij represents the probability of moving from state i to state j.
- The aij are given in a matrix A = {aij}.
- The probability of starting in a given state i is π_i; the vector π represents these start probabilities.
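To make the parameterization concrete, here is a small sketch in Python of a 5-state model and how it can be simulated. The numeric values of A and π below are invented for illustration; they are not taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix A = {a_ij}: row i holds the probabilities of moving
# from state i to each state j. Values invented for illustration.
A = np.array([
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.4, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.4, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.4, 0.2],
    [0.1, 0.1, 0.1, 0.2, 0.5],
])
# Start probabilities pi_i: probability of beginning in state i.
pi = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Each row of A, and pi itself, must be a probability distribution.
assert np.allclose(A.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)

def sample_states(A, pi, T):
    """Sample a state sequence of length T from the Markov model (A, pi)."""
    states = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        states.append(rng.choice(len(pi), p=A[states[-1]]))
    return states

print(sample_states(A, pi, T=10))
```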

Page 5:


Types of Models

Ergodic model: strongly connected, i.e., there is a directed path with positive transition probabilities from each state i to each state j (but not necessarily a complete directed graph).

Page 6:


Types of Models (cont.)

Left-to-Right (LR) model: the index of the state is non-decreasing with time.

Page 7:

Discrete Markov Model – Example

States – Rainy: 1, Cloudy: 2, Sunny: 3

Matrix A – the 3×3 transition probability matrix A = {aij} (its numeric values appear only in the slide image).

Problem – given that the weather on day 1 (t=1) is sunny (3), what is the probability of the observation sequence O shown on the slide?

Page 8:


Discrete Markov Model – Example (cont.)

The answer is the product of the transition probabilities along the observed sequence: P(O | M, Q1 = 3) = a_{3,O2} · a_{O2,O3} · … · a_{O(T-1),OT} (the numeric sequence and result appear only in the slide image).
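A small sketch of this computation; the transition values and example sequence below are invented for illustration, since the slide’s actual numbers are not in the transcript:

```python
import numpy as np

# Transition matrix for Rainy(0), Cloudy(1), Sunny(2); values invented.
A = np.array([
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
])

def sequence_probability(A, states):
    """P(O | M, Q1 fixed): product of transition probabilities along the
    observed state sequence (the first state is given, not sampled)."""
    p = 1.0
    for i, j in zip(states, states[1:]):
        p *= A[i, j]
    return p

# Example: sunny, sunny, rainy, rainy, sunny, cloudy, sunny (0-indexed states).
O = [2, 2, 0, 0, 2, 1, 2]
print(sequence_probability(A, O))
```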

Page 9:

Hidden Markov Models (probabilistic finite state automata)

Often we face scenarios where states cannot be directly observed.

We need an extension: Hidden Markov Models.

[Figure: a 4-state HMM with self-transitions a11, a22, a33, a44, forward transitions a12, a23, a34, and output probabilities b11, b12, b13, b14 linking the states to the observed phenomenon.]

aij are the state transition probabilities.

bik are the observation (output) probabilities.

b11 + b12 + b13 + b14 = 1, b21 + b22 + b23 + b24 = 1, etc.
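A minimal sketch of such a model as data: a hypothetical 4-state, 4-symbol left-to-right HMM. The numbers are invented, and the row-sum checks encode exactly the constraints just stated.

```python
import numpy as np

N, K = 4, 4  # number of states, number of observation symbols

# State transition probabilities a_ij (invented left-to-right example).
A = np.array([
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
# Observation (output) probabilities b_ik = P(symbol k | state i).
B = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
pi = np.array([1.0, 0.0, 0.0, 0.0])  # always start in state 1

# Every row is a probability distribution: b_i1 + ... + b_iK = 1, etc.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```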

Page 10:

Example: Dishonest Casino

Actually, what is hidden in this model? (In the standard dishonest-casino example, the casino switches between a fair die and a loaded die: we observe only the sequence of rolls, while the identity of the die in use at each roll, i.e. the state, is hidden.)

Page 11:

Biological Example: CpG Islands

- In the human genome, CpG dinucleotides are relatively rare.
- CpG pairs undergo a process called methylation that modifies the C nucleotide.
- A methylated C can (with relatively high probability) mutate to a T.
- Promoter regions are CpG rich.
- These regions are not methylated, and thus mutate less often. These are called CpG islands.

Page 12:


CpG Islands

We construct two Markov chains: one for CpG-rich regions and one for CpG-poor regions. Using observations from 60K nucleotides, we get two models, + and −.
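A sketch of how the two chains are typically used. The log-odds scoring below is the standard approach from the CpG-island literature and is an assumption here, not something stated on the slide; the transition values are invented, where real ones would be estimated from labeled training sequence.

```python
import numpy as np

NUC = {'A': 0, 'C': 1, 'G': 2, 'T': 3}

# Invented transition matrices for the CpG-rich (+) and CpG-poor (-) chains.
A_plus = np.full((4, 4), 0.25)
A_plus[NUC['C'], NUC['G']] = 0.27   # C->G transitions more likely in + model
A_plus[NUC['C']] /= A_plus[NUC['C']].sum()

A_minus = np.full((4, 4), 0.25)
A_minus[NUC['C'], NUC['G']] = 0.08  # C->G transitions rare in - model
A_minus[NUC['C']] /= A_minus[NUC['C']].sum()

def log_odds(seq):
    """Sum of log(a+_{xy} / a-_{xy}) over consecutive nucleotide pairs;
    positive scores favor the CpG-island (+) model."""
    score = 0.0
    for x, y in zip(seq, seq[1:]):
        i, j = NUC[x], NUC[y]
        score += np.log(A_plus[i, j] / A_minus[i, j])
    return score

print(log_odds("ACGCGCGT"), log_odds("ATATTTGA"))
```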

Page 13:


HMMs – Question I

Given an observation sequence O = (O1 O2 O3 … OT) and a model M = {A, B, π}, how do we efficiently compute P(O|M), the probability that the given model M produces the observation O in a run of length T?

This probability can be viewed as a measure of the quality of the model M. Viewed this way, it enables discrimination/selection among alternative models.

Page 14:

HMM – Question II (Harder)

Given an observation sequence O = (O1 O2 O3 … OT) and a model M = {A, B, π}, how do we efficiently compute the most probable sequence(s) of states, Q?

That is, the sequence of states Q = (Q1 Q2 Q3 … QT) which maximizes P(O,Q|M), the probability that the given model M goes through the specific sequence of states Q and produces the given observation O.

Recall that given a model M, a sequence of observations O, and a sequence of states Q, we can efficiently compute P(O,Q|M) = P(O|Q,M) · P(Q|M) (watching out for numeric underflow).
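A sketch of that computation in log space, the usual guard against the underflow just mentioned. The function name is ours, and the model arrays are as in the earlier sketches:

```python
import numpy as np

def log_joint(A, B, pi, states, obs):
    """log P(O, Q | M) = log P(Q|M) + log P(O|Q,M), computed as a sum of
    logs so that long sequences do not underflow to 0."""
    logp = np.log(pi[states[0]]) + np.log(B[states[0], obs[0]])
    for t in range(1, len(obs)):
        logp += np.log(A[states[t - 1], states[t]])  # transition Q_{t-1} -> Q_t
        logp += np.log(B[states[t], obs[t]])         # emission of O_t from Q_t
    return logp
```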

Page 15:

HMM – Question III (Hardest)

Given an observation sequence O = (O1 O2 O3 … OT) and a class of models, each of the form M = {A, B, π}, which specific model “best” explains the observations?

A solution to Question I enables the efficient computation of P(O|M) (the probability that a specific model M produces the observation O).

Question III can be viewed as a learning problem: we want to use the sequence of observations in order to “train” an HMM and learn the optimal underlying model parameters (transition and output probabilities).

Page 16:

HMM Recognition (Question I)

For a given model M = {A, B, π} and a given state sequence Q1 Q2 Q3 … QT, the probability of an observation sequence O1 O2 O3 … OT is

P(O|Q,M) = b_{Q1O1} · b_{Q2O2} · b_{Q3O3} · … · b_{QTOT}

For a given hidden Markov model M = {A, B, π}, the probability of the state sequence Q1 Q2 Q3 … QT is (the initial probability of Q1 is taken to be π_{Q1})

P(Q|M) = π_{Q1} · a_{Q1Q2} · a_{Q2Q3} · a_{Q3Q4} · … · a_{QT-1QT}

So, for a given hidden Markov model M, the probability of an observation sequence O1 O2 O3 … OT is obtained by summing over all possible state sequences.

Page 17:


HMM – Recognition (cont.)

P(O|M) = Σ_Q P(O|Q,M) · P(Q|M)
       = Σ_Q π_{Q1} · b_{Q1O1} · a_{Q1Q2} · b_{Q2O2} · a_{Q2Q3} · b_{Q3O3} · …

This requires summing over exponentially many paths, but it can be made more efficient.
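To make the definition concrete, here is the naive summation written out directly. This is a sketch (the function name and structure are ours, with model arrays as in the earlier sketches), feasible only for tiny N and T:

```python
import itertools
import numpy as np

def prob_obs_brute_force(A, B, pi, obs):
    """P(O|M) by explicit summation over all N**T state sequences.
    Exponential in T; shown only to make the definition concrete."""
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):
        p = pi[Q[0]] * B[Q[0], obs[0]]
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], obs[t]]
        total += p
    return total
```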

Page 18:


HMM – Recognition (cont.)

Why isn’t this efficient? It costs O(2T · Q^T):

For a given state sequence of length T we have about 2T multiplications:

P(Q|M) = π_{Q1} · a_{Q1Q2} · a_{Q2Q3} · a_{Q3Q4} · … · a_{QT-1QT}

P(O|Q) = b_{Q1O1} · b_{Q2O2} · b_{Q3O3} · … · b_{QTOT}

There are Q^T possible state sequences. So, if Q = 5 and T = 100, the algorithm requires about 2 · 100 · 5^100 ≈ 1.6 · 10^72 computations.

Instead, we can use the forward-backward (F-B) algorithm.

Page 19:


The F-B Algorithm

Some definitions:
1. Legal final state – a state at which a path through the model may end.
2. α_t(i) – a “forward-going” probability: the probability of emitting O1 O2 … Ot and being in state i at time t.
3. β_t(i) – a “backward-going” probability: the probability of emitting Ot+1 … OT, given state i at time t.
4. a(j|i) = aij ; b(O|i) = biO
5. O_1^t – the observations O1 O2 … Ot at times 1, 2, …, t (O1 at t=1, O2 at t=2, etc.)

Page 20:


The F-B Algorithm (cont.)

α_t(i) can be recursively calculated.

Stopping condition (base case): α_1(i) = π_i · b(O1|i).

Moving from state i to state j contributes α_t(i) · a(j|i) · b(Ot+1|j). But we can enter state j from all other states, so

α_{t+1}(j) = [ Σ_i α_t(i) · a(j|i) ] · b(Ot+1|j)

Page 21:


The F-B Algorithm (cont.)

Now we can work sequentially: compute α_1, then α_2, and so on up to α_T.

And at time t=T we get what we wanted:

P(O|M) = Σ_i α_T(i), summing over the legal final states i.

Page 22:


The F-B Algorithm (cont.)

The full algorithm – Run Demo
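The demo itself is not in the transcript; in its place, here is a minimal sketch of the full forward pass just described, under the assumption that every state is a legal final state:

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward pass. alpha[t, i] (0-indexed t) is the slide's alpha_{t+1}(i):
    the probability of emitting O_1..O_{t+1} and being in state i then.
    Returns P(O|M), summing over all states at the last step (i.e. assuming
    every state is a legal final state)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # base case: pi_i * b(O1|i)
    for t in range(1, T):
        # enter state j from all states i, then emit O_{t+1}
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()
```

Each step costs O(N²), so the whole computation is O(T·N²) rather than exponential in T.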

Page 23:


The F-B Algorithm (cont.)

The likelihood measured this way uses every sequence of states of length T; this is known as the “Any Path” method.

Alternatively, we can choose an HMM by the probability generated using the single best possible sequence of states; we’ll refer to this as the “Best Path” method.

Page 24:


Most Probable State Sequence (Question II)

Idea: if we know the value of Qi, then the most probable sequence of states at times i+1, …, T does not depend on observations before time i.

Let V_l(i) be the probability of the best sequence Q1, …, Qi such that Qi = l.
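This observation gives the standard Viterbi recursion, stated here in the slide’s notation (the recursion is not spelled out in the transcript, so this is a reconstruction):

```latex
V_l(1) = \pi_l \, b(O_1 \mid l), \qquad
V_l(i+1) = b(O_{i+1} \mid l) \cdot \max_{k} \left[ V_k(i) \, a(l \mid k) \right]
```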

Page 25:


Viterbi Algorithm

A DP problem.

Grid:
- X – frame index t (time)
- Q – state index i

Constraints:
- Every path must advance in time by one, and only one, time step for each path segment.
- Final grid points on any path must be of the form (T, i_f), where i_f is a legal final state in the model.

Page 26:


Viterbi Algorithm (cont.)

Cost:
- Node (t, i) – the probability of emitting the observation y(t) in state i: b_{i,y(t)}
- Transition from (t-1, i) to (t, j) – the probability of changing state from i to j: a_{ij}
- The total cost associated with a path is given by the product of its costs (type B).
- Initial transition cost: a_{0i} = π_i

Goal: the best path is the one of maximum cost.

Page 27:


Viterbi Algorithm (cont.)

We can use the trick of taking negative logarithms: multiplications of probabilities are expensive and numerically problematic, while sums of numerically stable numbers are simpler. The problem is thus turned into a minimal-cost path search.
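A minimal sketch of Viterbi in negative-log space, as the slide suggests (assuming every state is a legal final state; the function name and array layout are ours):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state sequence via minimal-cost path search over
    negative log probabilities (products become sums)."""
    T, N = len(obs), len(pi)
    with np.errstate(divide="ignore"):           # log(0) -> -inf is fine here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    cost = np.zeros((T, N))                      # cost[t, i]: best -log prob ending in i
    back = np.zeros((T, N), dtype=int)           # backpointers
    cost[0] = -(logpi + logB[:, obs[0]])
    for t in range(1, T):
        # cand[i, j]: cost of being in i at t-1 and moving to j
        cand = cost[t - 1][:, None] - logA
        back[t] = cand.argmin(axis=0)
        cost[t] = cand.min(axis=0) - logB[:, obs[t]]
    # best final state, then trace the backpointers
    q = [int(cost[-1].argmin())]
    for t in range(T - 1, 0, -1):
        q.append(int(back[t, q[-1]]))
    return q[::-1]
```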

Page 28:


Viterbi Algorithm (cont.)

Run Demo

Page 29:


HMM – EM Training

Training uses the Baum-Welch algorithm, which is an EM algorithm:
- Estimate – approximate the result.
- Maximize – and, if needed, re-estimate.

The estimation step is based on the DP algorithms above (F-B and Viterbi).

Page 30:


HMM – EM Training (cont.)

Initialize: begin with an arbitrary model M.

Estimate: evaluate the likelihood P(O|M); along the way, keep track of some tallies, and recalculate the matrices A and B, e.g.

aij = (number of transitions from i to j) / (number of transitions exiting state i)

Maximize: if the re-estimated model M' satisfies P(O|M') − P(O|M) ≥ ε, re-estimate again with M = M' (see the sketch below for one re-estimation pass).

Use several initial models to find a favorable local maximum of P(O|M).
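A hedged sketch of one Baum-Welch re-estimation pass, in a standard single-sequence formulation that assumes every state may be final; the function name and layout are ours, not a transcription of the original demo:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One Baum-Welch (EM) re-estimation of (A, B, pi) from one observation
    sequence, using forward/backward tallies. No rescaling is done here, so
    this sketch will underflow on long sequences."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)

    # Forward and backward passes.
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()                 # P(O|M)

    gamma = alpha * beta / likelihood            # gamma[t, i] = P(Q_t = i | O, M)
    # xi[t, i, j] = P(Q_t = i, Q_{t+1} = j | O, M)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

    new_pi = gamma[0]
    # a_ij = (expected transitions i -> j) / (expected transitions exiting i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # b_ik = (expected emissions of symbol k in state i) / (expected visits to i)
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, new_pi, likelihood
```

Iterating this step until P(O|M') − P(O|M) falls below ε implements the loop above; a practical version rescales alpha and beta (or works in log space) to avoid the underflow noted earlier.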

Page 31:


HMM – Training (cont.)

Why a local maximum? EM hill-climbs the likelihood: each re-estimation step is guaranteed not to decrease P(O|M), so the iteration converges to a local, not necessarily global, maximum. This is why several initial models are tried.

Page 32:


Auxiliary

Physiology Model

Page 33:


Auxiliary cont.

Articulation

Page 34:


Auxiliary cont.

Spectrogram

Peterson-Barney Diagram

Mapping by the formants