Hidden Markov Models in Bioinformatics 14.11 60 min
Definition
Three Key Algorithms
• Summing over Unknown States
• Most Probable Unknown States
• Marginalizing Unknown States
Key Bioinformatic Applications
• Pedigree Analysis
• Isochores in Genomes (CG-rich regions)
• Profile HMM Alignment
• Fast/Slowly Evolving States
• Secondary Structure Elements in Proteins
• Gene Finding
• Statistical Alignment
Hidden Markov Models
[Diagram: a hidden Markov chain with states H1, H2, H3 emitting the observed sequence O1–O10]
(O1,H1), (O2,H2), …, (On,Hn) is a sequence of stochastic variables with two components - one that is observed (Oi) and one that is hidden (Hi).
The marginal distribution of the Hi's is described by a homogeneous Markov chain:
• Let p_{i,j} = P(H_{k+1} = j | H_k = i) be the transition probabilities.
• Let π_i = P(H_1 = i) - often π is the equilibrium distribution of the Markov chain.
• Conditional on the Hk (all k), the Ok are independent.
• The distribution of Ok depends only on the value of Hk and is called the emission function:
e(i, j) = P(O_k = i | H_k = j)
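To make the definition concrete, a toy parameterization can be written down directly. The following two-state DNA-style example is a hypothetical sketch - all numbers and names are illustrative assumptions, not from the slides:

```python
import numpy as np

# Hypothetical two-state HMM over DNA symbols, loosely in the spirit of the
# "CG-rich isochore" application: state 0 = CG-rich, state 1 = CG-poor.
# Observations are indexed A=0, C=1, G=2, T=3.  All numbers are made up.
p = np.array([[0.9, 0.1],            # p[i, j] = P(H_{k+1} = j | H_k = i)
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])            # pi[i] = P(H_1 = i)
e = np.array([[0.1, 0.4, 0.4, 0.1],  # e[j, o] = P(O_k = o | H_k = j)
              [0.3, 0.2, 0.2, 0.3]])

# Each row of p and e is a probability distribution, so rows must sum to one.
assert np.allclose(p.sum(axis=1), 1.0)
assert np.allclose(e.sum(axis=1), 1.0)
```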
What is the probability of the data?
For example:

P(O_1,…,O_5, H_5 = 2) = P(O_5 = i | H_5 = 2) · Σ_j P(O_1,…,O_4, H_4 = j) · p_{j,2}
The probability of the observed sequence is

P(O) = Σ_H P(O | H) · P(H),

which can be hard to calculate directly, since the sum runs over all possible hidden-state sequences H. However, these calculations can be considerably accelerated. Let P_k^j be the joint probability of the observations (O_1,…,O_k) and H_k = j. The following recursion will be obeyed:

i. P_k^j = P(O_k | H_k = j) · Σ_i P_{k-1}^i · p_{i,j}

ii. P_1^j = P(O_1 | H_1 = j) · π_j (initial condition)

iii. P(O) = Σ_j P_n^j
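Recursions i.–iii. are the forward algorithm. A minimal Python sketch, assuming arrays indexed as p[i, j] = p_{i,j}, pi[j] = π_j and e[state, symbol]; the toy two-state numbers are illustrative only:

```python
import numpy as np

def forward(obs, p, pi, e):
    """Return P(O) and the forward table (row k-1 holds P_k^j, 0-based).

    obs -- observation indices O_1..O_n
    p   -- p[i, j] = P(H_{k+1} = j | H_k = i)
    pi  -- pi[j]   = P(H_1 = j)
    e   -- e[j, o] = P(O_k = o | H_k = j)
    """
    n, S = len(obs), len(pi)
    P = np.zeros((n, S))
    P[0] = pi * e[:, obs[0]]                   # ii. initial condition
    for k in range(1, n):
        P[k] = e[:, obs[k]] * (P[k - 1] @ p)   # i.  recursion
    return P[-1].sum(), P                      # iii. P(O) = sum_j P_n^j

# Hypothetical two-state, two-symbol example:
p = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
e = np.array([[0.8, 0.2], [0.3, 0.7]])
prob, P = forward([0, 1, 0], p, pi, e)         # prob is P(O) for O = (0, 1, 0)
```

The table P is filled in O(n·S²) time, versus the S^n terms of the naive sum over all hidden paths.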
What is the most probable "hidden" configuration?
For example:

H_6^1 = max_j { H_5^j · p_{j,1} } · e(O_6, 1)
Let H* be the sequence of hidden states that maximizes the probability of the observed sequence O, i.e. ArgMax_H P(O, H). Let H_k^j be the probability of the most probable path up to k ending in hidden state j. Again, recursions can be found:
i. H_k^j = max_i { H_{k-1}^i · p_{i,j} } · e(O_k, j)

ii. H_1^j = π_j · e(O_1, j)

iii. H_{k-1}^* = { i : H_{k-1}^i · p_{i,j} · e(O_k, j) = H_k^j, where j = H_k^* }
The actual sequence of hidden states can be found recursively by
starting from H_n^* = ArgMax_j H_n^j, e.g.:

H_5^* = { i : H_5^i · p_{i,1} · e(O_6, 1) = H_6^1 } (assuming H_6^* = 1)
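The Viterbi recursion above can be sketched the same way. This is a hypothetical illustration with assumed names; the toy parameters are the same style of made-up two-state example:

```python
import numpy as np

def viterbi(obs, p, pi, e):
    """Return the most probable hidden path H* and its probability.

    Row k-1 of H holds H_k^j, the probability of the most probable
    path up to step k ending in hidden state j (0-based rows)."""
    n, S = len(obs), len(pi)
    H = np.zeros((n, S))
    back = np.zeros((n, S), dtype=int)             # best-predecessor pointers
    H[0] = pi * e[:, obs[0]]                       # ii. H_1^j = pi_j * e(O_1, j)
    for k in range(1, n):
        scores = H[k - 1][:, None] * p             # scores[i, j] = H_{k-1}^i * p_{i,j}
        back[k] = scores.argmax(axis=0)
        H[k] = scores.max(axis=0) * e[:, obs[k]]   # i. recursion
    path = [int(H[-1].argmax())]                   # start from H_n^* = argmax_j H_n^j
    for k in range(n - 1, 0, -1):                  # iii. backtrack via the pointers
        path.append(int(back[k, path[-1]]))
    path.reverse()
    return path, float(H[-1].max())

# Hypothetical two-state, two-symbol example:
p = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
e = np.array([[0.8, 0.2], [0.3, 0.7]])
path, best = viterbi([0, 1, 0], p, pi, e)
```

For long sequences the products underflow, so practical implementations run the same recursion on log probabilities (products become sums; the max is unchanged).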
What is the probability of a specific "hidden" state?

Let Q_k^j be the probability of the observations from k+1 to n given H_k = j. These will also obey recursions:

Q_k^j = Σ_i p_{j,i} · P(O_{k+1} | H_{k+1} = i) · Q_{k+1}^i

The probability of the observations and a specific hidden state can be found as:

P(O, H_k = j) = P_k^j · Q_k^j

so that

P(H_k = j | O) = P_k^j · Q_k^j / P(O), for example P(H_5 = 2 | O) = P_5^2 · Q_5^2 / P(O).
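Combining the backward table Q_k^j with the forward table P_k^j gives the posterior just described. A minimal sketch, assuming arrays p, pi, e indexed as in the definitions (all names and numbers are illustrative assumptions):

```python
import numpy as np

def backward(obs, p, e):
    """Return the backward table (row k-1 holds Q_k^j, 0-based):
    Q_k^j = P(O_{k+1}..O_n | H_k = j)."""
    n, S = len(obs), p.shape[0]
    Q = np.ones((n, S))                        # Q_n^j = 1: no future observations
    for k in range(n - 2, -1, -1):
        # Q_k^j = sum_i p[j, i] * P(O_{k+1} | H_{k+1} = i) * Q_{k+1}^i
        Q[k] = p @ (e[:, obs[k + 1]] * Q[k + 1])
    return Q

def posterior(obs, p, pi, e):
    """Return P(H_k = j | O) for all k, j, via P_k^j * Q_k^j / P(O)."""
    n, S = len(obs), len(pi)
    P = np.zeros((n, S))                       # forward table
    P[0] = pi * e[:, obs[0]]
    for k in range(1, n):
        P[k] = e[:, obs[k]] * (P[k - 1] @ p)
    Q = backward(obs, p, e)
    return P * Q / P[-1].sum()                 # P[-1].sum() = P(O)

# Hypothetical two-state, two-symbol example:
p = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
e = np.array([[0.8, 0.2], [0.3, 0.7]])
post = posterior([0, 1, 0], p, pi, e)
```

With 0-based rows, post[k-1, j] is the slide's P(H_k = j | O); each row is a distribution over hidden states and sums to one.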