Hidden Markov Models, II. Notation, Problems, Algorithms
Steven R. Dunbar
February 24, 2017

Outline: Notation and Process · The Three Problems · Algorithms · Viterbi · Toy Example: The Variable Factory · Example: The Cheating Casino
T = length of the observation sequence.
N = number of states in the model.
M = number of observation symbols.
Q = {q0, q1, . . . , qN−1} = states of the Markov chain.
V = {0, 1, 2, . . . , M − 1} = set of possible observations.
A = (aij) = (P[qj | qi]) = state transition probability matrix.
B = (bi(j)) = (bij) = (P[vj | qi]) = observation probability matrix.
π = {πj} = initial state distribution at time 0.
O = (O0, O1, . . . , OT−1) = the observation sequence.
1. Set t = 0.
2. Choose an initial state i according to the initial state distribution π.
3. Choose Ot according to bi(·), the symbol probability distribution in state i.
4. Set t = t + 1.
5. Choose a new state i according to the probability transition matrix A.
6. Return to step 3 if t < T; otherwise terminate the process.
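The steps above can be sketched directly in code. The two-state, two-symbol model (A, B, π) here is hypothetical, chosen only to make the sketch runnable, and pick() is a small helper for sampling from a discrete distribution:

```python
import random

# A sketch of the generation procedure above; the model values are
# hypothetical placeholders, not numbers from the text.
A  = [[0.7, 0.3],            # A[i][j] = P[q_j | q_i]
      [0.4, 0.6]]
B  = [[0.9, 0.1],            # B[i][k] = P[v_k | q_i]
      [0.2, 0.8]]
pi = [0.6, 0.4]              # initial state distribution

def pick(dist):
    """Draw an index according to the discrete distribution dist."""
    r, acc = random.random(), 0.0
    for k, p in enumerate(dist):
        acc += p
        if r < acc:
            return k
    return len(dist) - 1

def generate(T, A, B, pi):
    O = []
    i = pick(pi)               # step 2: choose an initial state from pi
    for t in range(T):
        O.append(pick(B[i]))   # step 3: emit a symbol from state i
        i = pick(A[i])         # steps 4-5: advance t and move via A
    return O
```

Each call to generate produces a fresh random observation sequence of length T over the symbol set V = {0, 1}.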
Given the model λ = (A, B, π) and a sequence of observations O, find P[O | λ]. That is, determine the likelihood of the observed sequence O, given the model.
Problem 1 is the evaluation problem: given a model and observations, how can we compute the probability that the model produced the observed sequence? We can also view the problem as: how do we “score” or evaluate the model? If we have several competing models, the solution of Problem 1 allows us to choose the model that best matches the observations.
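The evaluation in Problem 1 is solved by the alpha-pass (forward algorithm). A minimal sketch, assuming a hypothetical two-state, two-symbol model whose numbers are placeholders, not from the text:

```python
# alpha[t][i] = P[O_0, ..., O_t, x_t = q_i | lambda]; summing the final
# row gives P[O | lambda]. Model values are hypothetical.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def forward(O, A, B, pi):
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]   # alpha_0(i)
    for t in range(1, len(O)):
        alpha.append([
            sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
            for j in range(N)])
    return sum(alpha[-1])      # P[O | lambda], summed over final states
```

forward(O, A, B, pi) returns the likelihood of O under λ; comparing this value across competing models is exactly the scoring described above.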
Given the model λ = (A, B, π) and a sequence of observations O, find an optimal state sequence. In other words, we want to uncover the hidden part of the Hidden Markov Model.
Problem 2 is the one in which we attempt to uncover the hidden part of the model, i.e. the state sequence. This is the estimation problem. We use an optimality criterion to discriminate which sequence best matches the observations. Two optimality criteria are common, so the choice of criterion strongly influences the revealed state sequence.
Given an observation sequence O and the dimensions N and M, find the model λ = (A, B, π) that maximizes the probability of O. This can be interpreted as training a model to best fit the observed data. We can also view this as a search in the parameter space represented by A, B and π.
The solution of Problem 3 attempts to optimize the model parameters so as to best describe how the observed sequence comes about. The observed sequence used to solve Problem 3 is called a training sequence, since it is used to train the model. This training problem is the crucial one for most applications of hidden Markov models, since it creates the best models for real phenomena.
The backward algorithm, or beta-pass. This is analogous to the alpha-pass described in the solution to the first problem of HMMs, except that it starts at the end and works back toward the beginning.
It is independent of the forward algorithm, so it can bedone in parallel.
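A sketch of the beta-pass, on a hypothetical two-state, two-symbol model (the numbers are placeholders, not from the text). As a consistency check, P[O | λ] can be recovered from the betas alone as ∑i πi bi(O0) β0(i), which must agree with the alpha-pass value:

```python
# beta[t][i] = P[O_{t+1}, ..., O_{T-1} | x_t = q_i, lambda].
# Model values are hypothetical.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

def backward(O, A, B):
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]          # beta_{T-1}(i) = 1
    for t in range(T - 2, -1, -1):                # work back to the start
        beta[t] = [sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                       for j in range(N)) for i in range(N)]
    return beta

# Recover P[O | lambda] from the betas:
O = [0, 1]
beta = backward(O, A, B)
PO = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(2))
```

Because the beta-pass sweeps backward while the alpha-pass sweeps forward, the two recursions share no intermediate values, which is why they can run in parallel.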
For t = 0, 1, . . . , T − 2 and i, j ∈ {0, 1, . . . , N − 1}, define the di-gammas as

γt(i, j) = P[xt = qi, xt+1 = qj | O, λ],

so γt(i, j) is the probability of being in state qi at time t and transitioning to state qj at time t + 1. The di-gammas can be written in terms of α, β, A and B as

γt(i, j) = αt(i) aij bj(Ot+1) βt+1(j) / P[O | λ].

For t = 0, 1, . . . , T − 2, the γt(i) and γt(i, j) are related by

γt(i) = ∑j γt(i, j), summing over j = 0, 1, . . . , N − 1.
Re-estimation is an iterative process. First we initialize λ = (A, B, π) with a best guess, or if no reasonable guess is available, we choose random values such that πi ≈ 1/N, aij ≈ 1/N and bj(k) ≈ 1/M. It is critical that A, B and π be randomized, since exactly uniform values will result in a local maximum from which the model cannot climb. As always, A, B and π must be row stochastic.
1. Initialize λ = (A, B, π) with a best guess.
2. Compute αt(i), βt(i), γt(i, j), and γt(i).
3. Re-estimate the model λ = (A, B, π).
4. If P[O | λ] increases by at least some predetermined threshold and the predetermined maximum number of iterations has not been exceeded, go to step 2; otherwise stop.
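One pass through steps 2-3 can be sketched compactly. The two-state, two-symbol model and the short observation sequence here are hypothetical, chosen only to make the sketch runnable; each re-estimated row is a ratio of expected counts, so the new A, B and π remain row stochastic:

```python
# A compact sketch of one re-estimation pass; all numbers are
# illustrative placeholders, not from the text.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 0, 1, 0, 1]
N, M, T = 2, 2, len(O)

# alpha-pass: alpha[t][i] = P[O_0..O_t, x_t = q_i | lambda]
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
PO = sum(alpha[-1])                                   # P[O | lambda]

# beta-pass: beta[t][i] = P[O_{t+1}..O_{T-1} | x_t = q_i, lambda]
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
               for i in range(N)]

# gammas and di-gammas
gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]
digamma = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / PO
             for j in range(N)] for i in range(N)] for t in range(T - 1)]

# re-estimation: ratios of expected counts
pi_new = gamma[0][:]
A_new = [[sum(digamma[t][i][j] for t in range(T-1)) /
          sum(gamma[t][i] for t in range(T-1)) for j in range(N)]
         for i in range(N)]
B_new = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
          sum(gamma[t][i] for t in range(T)) for k in range(M)]
         for i in range(N)]
```

Repeating this pass with λ replaced by (A_new, B_new, π_new) never decreases P[O | λ], which is what makes the stopping rule in step 4 sensible.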
Born 1935 in Bergamo, Italy.
Italian Jewish family emigrated to the US, August 1939.
Graduated BS/MS/EE from MIT in 1957.
JPL/NASA from 1958-1962.
Ph.D./EE, USC, in 1962.
Professor of EE at UCLA, 1963-1973.
Viterbi Algorithm, March 1966.
1969: co-founds startup Linkabit.
Acquired by Microwave Assoc. Communications, 1980.
Co-founds Qualcomm, 1985-2000.
2004: gives $52 million to the USC School of Engineering.
If in the good state 0 then, independent of the past:
- with probability 0.9 it will be in state 0 during the next period,
- with probability 0.1 it will be in the bad state 1.
Once in state 1, it remains in that state forever.
At each time the state which is individually most likely is (0, 1, 1) (i.e. factory good, not good, not good). Note that here all state transitions are valid, but this need not always be the case.
This is different from the most likely overall path, which was (1, 1, 1) (exhaustively computed last time).
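The distinction between the two criteria can be sketched in code. The transition matrix below is the factory's, but the observation probabilities, initial distribution, and observation sequence are hypothetical fill-ins (the text does not supply them here), so this illustrates only the mechanics: the pointwise-most-likely states come from the marginals αt(i)βt(i), while the most likely single path comes from the Viterbi recursion, and the two need not agree.

```python
# Compare individually most likely states with the Viterbi path.
# A is the factory's transition matrix; B, pi, O are hypothetical.
A  = [[0.9, 0.1], [0.0, 1.0]]
B  = [[0.8, 0.2], [0.3, 0.7]]
pi = [0.5, 0.5]
O  = [0, 1, 1]
N, T = 2, len(O)

# alpha- and beta-passes give the marginals P[x_t = q_i | O, lambda]
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    beta[t] = [sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
               for i in range(N)]
pointwise = [max(range(N), key=lambda i: alpha[t][i] * beta[t][i])
             for t in range(T)]

# Viterbi: the single path maximizing the joint probability
delta = [pi[i] * B[i][O[0]] for i in range(N)]
back = []
for t in range(1, T):
    back.append([max(range(N), key=lambda i: delta[i] * A[i][j])
                 for j in range(N)])
    delta = [max(delta[i] * A[i][j] for i in range(N)) * B[j][O[t]]
             for j in range(N)]
path = [max(range(N), key=lambda j: delta[j])]
for b in reversed(back):
    path.append(b[path[-1]])
path.reverse()
```

The Viterbi path is always a valid sequence of transitions, whereas the pointwise choice can string together transitions of probability zero, which is exactly the caveat noted above.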
- The casino uses a fair die most of the time; occasionally the casino secretly switches to a loaded die, and later the casino switches back to the fair die.
- The switch from fair-to-loaded occurs with probability 0.05, and from loaded-to-fair with probability 0.1.
- Assume that the loaded die will come up “six” with probability 0.5 and each of the remaining five numbers with probability 0.1.
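The casino can be written out as an HMM and simulated. The transition and observation probabilities below come from the description above; the initial distribution is an assumption, since the text does not give one:

```python
import random

# State 0 is the fair die, state 1 the loaded die; observation
# symbols are the die faces 1-6. pi is an assumed starting split.
A = [[0.95, 0.05],                      # fair -> loaded with prob. 0.05
     [0.10, 0.90]]                      # loaded -> fair with prob. 0.1
B = [[1/6] * 6,                         # fair die: uniform over faces
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]   # loaded die: "six" with prob. 0.5
pi = [0.5, 0.5]                         # assumed: either die equally likely

def pick(dist):
    """Draw an index from a discrete probability distribution."""
    r, acc = random.random(), 0.0
    for k, p in enumerate(dist):
        acc += p
        if r < acc:
            return k
    return len(dist) - 1

def roll_sequence(T):
    """Simulate T rolls; returns (faces reported 1..6, hidden states)."""
    state, rolls, states = pick(pi), [], []
    for _ in range(T):
        states.append(state)
        rolls.append(pick(B[state]) + 1)   # faces reported as 1..6
        state = pick(A[state])             # maybe switch dice
    return rolls, states
```

The observer sees only the rolls; the sequence of dice in use is the hidden state sequence that Problem 2 asks us to recover.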