Stats 318: Lecture # 18
Agenda: Hidden Markov Models
- Hidden Markov models
- Forward algorithm for state estimation (filtering)
- Forward-backward algorithm for state estimation (smoothing)
- Viterbi algorithm for most likely explanation
- EM algorithm (Baum-Welch) for parameter estimation
Hidden Markov model
- {H_t} and {Y_t} are discrete-time stochastic processes
- {H_t} is a Markov chain and is not directly observable ("hidden")
- {Y_t} is directly observable, with
  P(Y_t | H_{1:T}) = P(Y_t | H_{1:t}) = P(Y_t | H_t)
- Terminology: transition probabilities P(H_{t+1} | H_t);
  emission probabilities P(Y_t | H_t)
- Homogeneous if the above probabilities are time-independent (assumed henceforth); a concrete sketch follows below
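For concreteness, here is a minimal Python sketch of a homogeneous HMM; the state count, symbol count, and all probabilities below are illustrative choices, not values from the lecture:

```python
import numpy as np

# Illustrative homogeneous HMM: K = 2 hidden states, M = 3 observation
# symbols; pi, A, B are the conventional parameter names.
pi = np.array([0.6, 0.4])            # initial distribution P(H_1)
A = np.array([[0.7, 0.3],            # transitions: A[i, j] = P(H_{t+1} = j | H_t = i)
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],       # emissions: B[i, k] = P(Y_t = k | H_t = i)
              [0.1, 0.3, 0.6]])

# Homogeneity: A and B do not depend on t, so each is a single
# row-stochastic matrix.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```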
Examples
- Speech recognition
- Financial forecasting
- DNA motif discovery
- ...
Speech recognition
[figure-only slide omitted]
Copy number variations
[figure omitted] How many duplications do we have? Do we have deletions?
Inference problems
1. What is the probability/likelihood of an observed sequence y_{1:T}?
2. What is the conditional distribution of the latent variables given y_{1:T}?
3. What is the most likely value of a latent variable? ("decoding problem")
4. Given one or several observed sequences, how do we estimate the model parameters,
   i.e. the transition and emission probabilities (and the distribution of the initial latent variable)?
Probability of an observed sequence
P(y_{1:T}) = \sum_{h_{1:T}} P(y_{1:T}, h_{1:T})
           = \sum_{h_{1:T}} P(h_1) \prod_{t=2}^{T} P(h_t | h_{t-1}) \prod_{t=1}^{T} P(y_t | h_t)    (*)

(*) = \sum_{h_T} P(y_T | h_T) \sum_{h_{1:T-1}} P(h_T | h_{T-1}) P(h_1) \prod_{t=2}^{T-1} P(h_t | h_{t-1}) \prod_{t=1}^{T-1} P(y_t | h_t)
Suggests dynamic programming solution
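To see why: evaluating (*) literally sums over all K^T hidden paths, which is exponential in T. A brute-force sketch under the toy setup above (the function name is my own; pi, A, B are NumPy arrays as before, y a sequence of symbol indices):

```python
from itertools import product

def brute_force_likelihood(pi, A, B, y):
    """Compute P(y_{1:T}) by summing P(y_{1:T}, h_{1:T}) over every
    hidden path h_{1:T}, exactly as in (*).  Cost is O(K^T), so this
    is for intuition on tiny examples only."""
    K, T = len(pi), len(y)
    total = 0.0
    for h in product(range(K), repeat=T):            # all K^T paths h_{1:T}
        p = pi[h[0]] * B[h[0], y[0]]                 # P(h_1) P(y_1 | h_1)
        for t in range(1, T):
            p *= A[h[t - 1], h[t]] * B[h[t], y[t]]   # P(h_t | h_{t-1}) P(y_t | h_t)
        total += p
    return total
```

Dynamic programming collapses this exponential sum into the forward algorithm below.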
Forward algorithm
- Forward probabilities: \alpha_t(h_t) = P(y_{1:t}, h_t)    (y_{1:T} is given throughout)
- Recursion: \alpha_1(h_1) = P(h_1) P(y_1 | h_1), and
  \alpha_{t+1}(h_{t+1}) = \sum_{h_t} P(y_{t+1} | h_{t+1}, h_t, y_{1:t}) P(h_{t+1} | h_t, y_{1:t}) \alpha_t(h_t)
                        = P(y_{t+1} | h_{t+1}) \sum_{h_t} P(h_{t+1} | h_t) \alpha_t(h_t)
One matrix-vector product per time step!
- Likelihood of an observed sequence: P(y_{1:T}) = \sum_{h} \alpha_T(h), as implemented in the sketch below
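A minimal implementation of the recursion, in the same illustrative setup as the earlier sketches (0-based indexing, so alpha[t] holds \alpha_{t+1}):

```python
import numpy as np

def forward(pi, A, B, y):
    """Forward algorithm: alpha[t, h] = P(y_{1:t+1}, H_{t+1} = h)
    (0-based t).  One matrix-vector product per step, so O(T K^2)
    overall instead of the brute-force O(K^T)."""
    T, K = len(y), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, y[0]]                        # alpha_1(h_1) = P(h_1) P(y_1 | h_1)
    for t in range(1, T):
        alpha[t] = B[:, y[t]] * (A.T @ alpha[t - 1])  # the recursion above
    return alpha

# Likelihood: P(y_{1:T}) = sum_h alpha_T(h); on the toy model,
# forward(pi, A, B, [0, 2, 1])[-1].sum() agrees with
# brute_force_likelihood(pi, A, B, [0, 2, 1]).
```

For long sequences the alphas underflow, so in practice each alpha[t] is normalized (tracking the log of the scaling constants) or the recursion is run in log space; the algorithm itself is unchanged.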
Probability of latent variables
Interested in the conditional distribution of the latent variables given y_{1:T}: