Hidden Markov Models, II. Notation, Problems, Algorithms
Steven R. Dunbar
February 24, 2017

Outline: Notation and Process · The Three Problems · Algorithms · Viterbi · Toy Example: The Variable Factory · Example: The Cheating Casino
T = length of the observation sequence.
N = number of states in the model.
M = number of observation symbols.
Q = {q0, q1, . . . , qN−1} = states of the Markov chain.
V = {0, 1, 2, . . . , M − 1} = set of possible observations.
A = (aij) = (P[qj | qi]) = state transition probability matrix.
B = (bi(j)) = (bij) = (P[vj | qi]) = observation probability matrix.
π = {πj} = initial state distribution at time 0.
O = (O0, O1, . . . , OT−1) = the observation sequence.
1. Set t = 0.
2. Choose an initial state i according to the initial state distribution π.
3. Choose Ot according to bi(·), the symbol probability distribution in state i.
4. Set t = t + 1.
5. Choose a new state i according to the probability transition matrix A.
6. Return to step 3 if t < T; otherwise terminate the process.
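The steps above can be sketched directly in code. The two-state, two-symbol model (A, B, π) here is hypothetical, chosen only to make the sketch runnable, and pick() is a small helper for sampling from a discrete distribution:

```python
import random

# A sketch of the generation procedure above; the model values are
# hypothetical placeholders, not numbers from the text.
A  = [[0.7, 0.3],            # A[i][j] = P[q_j | q_i]
      [0.4, 0.6]]
B  = [[0.9, 0.1],            # B[i][k] = P[v_k | q_i]
      [0.2, 0.8]]
pi = [0.6, 0.4]              # initial state distribution

def pick(dist):
    """Draw an index according to the discrete distribution dist."""
    r, acc = random.random(), 0.0
    for k, p in enumerate(dist):
        acc += p
        if r < acc:
            return k
    return len(dist) - 1

def generate(T, A, B, pi):
    O = []
    i = pick(pi)               # step 2: choose an initial state from pi
    for t in range(T):
        O.append(pick(B[i]))   # step 3: emit a symbol from state i
        i = pick(A[i])         # steps 4-5: advance t and move via A
    return O
```

Each call to generate produces a fresh random observation sequence of length T over the symbol set V = {0, 1}.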
Given the model λ = (A, B, π) and a sequence of observations O, find P[O | λ]. That is, determine the likelihood of the observed sequence O, given the model.
Problem 1 is the evaluation problem: given a model and observations, how can we compute the probability that the model produced the observed sequence? We can also view the problem as: how do we “score” or evaluate the model? If we have several competing models, the solution of Problem 1 allows us to choose the model that best matches the observations.
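The evaluation in Problem 1 is solved by the alpha-pass (forward algorithm). A minimal sketch, assuming a hypothetical two-state, two-symbol model whose numbers are placeholders, not from the text:

```python
# alpha[t][i] = P[O_0, ..., O_t, x_t = q_i | lambda]; summing the final
# row gives P[O | lambda]. Model values are hypothetical.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def forward(O, A, B, pi):
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]   # alpha_0(i)
    for t in range(1, len(O)):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
            for j in range(N)])
    return sum(alpha[-1])      # P[O | lambda], summed over final states
```

forward(O, A, B, pi) returns the likelihood of O under λ; comparing this value across competing models is exactly the scoring described above.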
Given the model λ = (A, B, π) and a sequence of observations O, find an optimal state sequence. In other words, we want to uncover the hidden part of the Hidden Markov Model.
Problem 2 is the one in which we attempt to uncover the hidden part of the model, i.e. the state sequence. This is the estimation problem. We use an optimality criterion to discriminate which sequence best matches the observations. Two optimality criteria are common, so the choice of criterion strongly influences the revealed state sequence.
Given an observation sequence O and the dimensions N and M, find the model λ = (A, B, π) that maximizes the probability of O. This can be interpreted as training a model to best fit the observed data. We can also view this as a search in the parameter space represented by A, B and π.
The solution of Problem 3 attempts to optimize the model parameters so as to best describe how the observed sequence comes about. The observed sequence used to solve Problem 3 is called a training sequence, since it is used to train the model. This training problem is the crucial one for most applications of hidden Markov models, since it creates the best models for real phenomena.
The backward algorithm, or beta-pass. This is analogous to the alpha-pass described in the solution to the first problem of HMMs, except that it starts at the end and works back toward the beginning.
It is independent of the forward algorithm, so it can bedone in parallel.
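A sketch of the beta-pass, on a hypothetical two-state, two-symbol model (the numbers are placeholders, not from the text). As a consistency check, P[O | λ] can be recovered from the betas alone as ∑i πi bi(O0) β0(i), which must agree with the alpha-pass value:

```python
# beta[t][i] = P[O_{t+1}, ..., O_{T-1} | x_t = q_i, lambda].
# Model values are hypothetical.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def backward(O, A, B):
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]          # beta_{T-1}(i) = 1
    for t in range(T - 2, -1, -1):                # work back to the start
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                       for j in range(N)) for i in range(N)]
    return beta

# Recover P[O | lambda] from the betas:
O = [0, 1]
beta = backward(O, A, B)
PO = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(2))
```

Because the beta-pass sweeps backward while the alpha-pass sweeps forward, the two recursions share no intermediate values, which is why they can run in parallel.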
For t = 0, 1, . . . , T − 2 and i, j ∈ {0, 1, . . . , N − 1}, define the di-gammas as

γt(i, j) = P[xt = qi, xt+1 = qj | O, λ],

so γt(i, j) is the probability of being in state qi at time t and transitioning to state qj at time t + 1. The di-gammas can be written in terms of α, β, A and B as

γt(i, j) = αt(i) aij bj(Ot+1) βt+1(j) / P[O | λ].

For t = 0, 1, . . . , T − 2, the γt(i) and γt(i, j) are related by

γt(i) = ∑j γt(i, j), summing over j = 0, 1, . . . , N − 1.
Re-estimation is an iterative process. First we initialize λ = (A, B, π) with a best guess, or if no reasonable guess is available, we choose random values such that πi ≈ 1/N, aij ≈ 1/N and bj(k) ≈ 1/M. It is critical that A, B and π be randomized, since exactly uniform values will result in a local maximum from which the model cannot climb. As always, A, B and π must be row stochastic.
1. Initialize λ = (A, B, π) with a best guess.
2. Compute αt(i), βt(i), γt(i, j), and γt(i).
3. Re-estimate the model λ = (A, B, π).
4. If P[O | λ] increases by at least some predetermined threshold and the predetermined maximum number of iterations has not been exceeded, go to step 2; otherwise stop.
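One pass through steps 2-3 can be sketched compactly. The two-state, two-symbol model and the short observation sequence here are hypothetical, chosen only to make the sketch runnable; each re-estimated row is a ratio of expected counts, so the new A, B and π remain row stochastic:

```python
# A compact sketch of one re-estimation pass; all numbers are
# illustrative placeholders, not from the text.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 0, 1, 0, 1]
N, M, T = 2, 2, len(O)

# alpha-pass: alpha[t][i] = P[O_0..O_t, x_t = q_i | lambda]
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
PO = sum(alpha[-1])                                   # P[O | lambda]

# beta-pass: beta[t][i] = P[O_{t+1}..O_{T-1} | x_t = q_i, lambda]
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
               for i in range(N)]

# gammas and di-gammas
gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]
digamma = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / PO
             for j in range(N)] for i in range(N)] for t in range(T - 1)]

# re-estimation: ratios of expected counts
pi_new = gamma[0][:]
A_new = [[sum(digamma[t][i][j] for t in range(T-1)) /
          sum(gamma[t][i] for t in range(T-1)) for j in range(N)]
         for i in range(N)]
B_new = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
          sum(gamma[t][i] for t in range(T)) for k in range(M)]
         for i in range(N)]
```

Repeating this pass with λ replaced by (A_new, B_new, π_new) never decreases P[O | λ], which is what makes the stopping rule in step 4 sensible.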
Born 1935 in Bergamo, Italy.
Italian Jewish family emigrated to the US, August 1939.
Graduated BS/MS/EE from MIT in 1957.
JPL/NASA from 1958-1962.
Ph.D./EE, USC, in 1962.
Professor of EE at UCLA, 1963-1973.
Viterbi Algorithm, March 1966.
1969: co-founds startup Linkabit.
Acquired by Microwave Assoc. Communications, 1980.
Co-founds Qualcomm, 1985-2000.
2004: gives $52 million to the USC School of Engineering.
If in the good state 0 then, independent of the past:
- with probability 0.9 it will be in state 0 during the next period,
- with probability 0.1 it will be in the bad state 1.
Once in state 1, it remains in that state forever.
At each time the state which is individually most likely is (0, 1, 1) (i.e. factory good, not good, not good). Note that here all state transitions are valid, but this need not always be the case.
This is different from the most likely overall path, which was (1, 1, 1) (exhaustively computed last time).
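The distinction between the two criteria can be sketched in code. The transition matrix below is the factory's, but the observation probabilities, initial distribution, and observation sequence are hypothetical fill-ins (the text does not supply them here), so this illustrates only the mechanics: the pointwise-most-likely states come from the marginals αt(i)βt(i), while the most likely single path comes from the Viterbi recursion, and the two need not agree.

```python
# Compare individually most likely states with the Viterbi path.
# A is the factory's transition matrix; B, pi, O are hypothetical.
A  = [[0.9, 0.1], [0.0, 1.0]]
B  = [[0.8, 0.2], [0.3, 0.7]]
pi = [0.5, 0.5]
O  = [0, 1, 1]
N, T = 2, len(O)

# alpha- and beta-passes give the marginals P[x_t = q_i | O, lambda]
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
               for i in range(N)]
pointwise = [max(range(N), key=lambda i: alpha[t][i] * beta[t][i])
             for t in range(T)]

# Viterbi: the single path maximizing the joint probability
delta = [pi[i] * B[i][O[0]] for i in range(N)]
back = []
for t in range(1, T):
    back.append([max(range(N), key=lambda i: delta[i] * A[i][j])
                 for j in range(N)])
    delta = [max(delta[i] * A[i][j] for i in range(N)) * B[j][O[t]]
             for j in range(N)]
path = [max(range(N), key=lambda j: delta[j])]
for b in reversed(back):
    path.append(b[path[-1]])
path.reverse()
```

The Viterbi path is always a valid sequence of transitions, whereas the pointwise choice can string together transitions of probability zero, which is exactly the caveat noted above.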
- The casino uses a fair die most of the time; occasionally the casino secretly switches to a loaded die, and later the casino switches back to the fair die.
- The switch from fair-to-loaded occurs with probability 0.05, and from loaded-to-fair with probability 0.1.
- Assume that the loaded die will come up “six” with probability 0.5 and each of the remaining five numbers with probability 0.1.
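The casino can be written out as an HMM and simulated. The transition and observation probabilities below come from the description above; the initial distribution is an assumption, since the text does not give one:

```python
import random

# State 0 is the fair die, state 1 the loaded die; observation
# symbols are the die faces 1-6. pi is an assumed starting split.
A = [[0.95, 0.05],                      # fair -> loaded with prob. 0.05
     [0.10, 0.90]]                      # loaded -> fair with prob. 0.1
B = [[1/6] * 6,                         # fair die: uniform over faces
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]   # loaded die: "six" with prob. 0.5
pi = [0.5, 0.5]                         # assumed: either die equally likely

def pick(dist):
    """Draw an index from a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for k, p in enumerate(dist):
        acc += p
        if r < acc:
            return k
    return len(dist) - 1

def roll_sequence(T):
    """Simulate T rolls; returns (faces reported 1..6, hidden states)."""
    state, rolls, states = pick(pi), [], []
    for _ in range(T):
        states.append(state)
        rolls.append(pick(B[state]) + 1)   # faces reported as 1..6
        state = pick(A[state])             # maybe switch dice
    return rolls, states
```

The observer sees only the rolls; the sequence of dice in use is the hidden state sequence that Problem 2 asks us to recover.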