Outline of the Statistics part
• Some definitions
• Information measures
• Markov Chains
• Hidden Markov Models
Benos 02-710/MSCBIO2070 1-FEB-2007 3
Some definitions
• Chain rule: P(X,Y,Z) = P(X|Y,Z) P(Y|Z) P(Z)
• If P(X|Y) = P(X) then X, Y are independent: P(X,Y) = P(X) P(Y)
• Marginal probability: P(X) = Σ_Y P(X,Y) = Σ_Y P(X|Y) P(Y)
• Conditional independence: P(X,Y | Z) = P(X | Z) P(Y | Z)
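These identities are easy to check numerically. A minimal sketch in Python, using a made-up 2×2 joint distribution (all numbers illustrative, not from the slides):

```python
# Illustrative joint distribution P(X, Y) over X in {0, 1}, Y in {0, 1}.
P = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def marginal_X(x):
    # P(X) = sum_Y P(X, Y)
    return sum(p for (xi, y), p in P.items() if xi == x)

def marginal_Y(y):
    return sum(p for (x, yi), p in P.items() if yi == y)

def conditional_X_given_Y(x, y):
    # P(X | Y) = P(X, Y) / P(Y)
    return P[(x, y)] / marginal_Y(y)

# Marginalization the other way: P(X=0) = sum_Y P(X=0 | Y) P(Y)
lhs = marginal_X(0)
rhs = sum(conditional_X_given_Y(0, y) * marginal_Y(y) for y in (0, 1))
print(abs(lhs - rhs) < 1e-12)  # the two expressions agree
```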
Bayes' Rule
• P(X), P(Y): prior probabilities
• P(X|Y), P(Y|X): posterior probabilities
• Posterior probabilities are the compromise between data and prior information

P(X|Y) = P(Y|X) P(X) / P(Y) = P(Y|X) P(X) / Σ_x P(Y|x) P(x)
Bayes: application
• Problem:
A rare genetic disease is discovered with population frequency one in 1 million. An extremely good genetic test is 100% sensitive (always correct if you have the disease) and 99.99% specific (false positive rate 0.01%). Would you be willing to take such a test?
Bayes: application (cntd)
• What are the chances of having the disease if the test comes back positive?
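A sketch of the computation via Bayes' rule, with the numbers taken from the problem statement above:

```python
# P(disease | positive) via Bayes' rule.
p_disease = 1e-6          # prior: population frequency
sensitivity = 1.0         # P(positive | disease)
false_positive = 1e-4     # 1 - specificity = 0.01%

# P(positive) = P(pos | disease) P(disease) + P(pos | healthy) P(healthy)
p_positive = sensitivity * p_disease + false_positive * (1 - p_disease)

# Bayes' rule: P(disease | positive) = P(pos | disease) P(disease) / P(positive)
posterior = sensitivity * p_disease / p_positive
print(posterior)  # ~0.0099: even a positive result means <1% chance of disease
```

The rarity of the disease (the prior) overwhelms the excellent specificity of the test, which is the point of the exercise.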
Mutual Information in RNA structure prediction

I(X,Y) = Σ_i Σ_j f_{X,Y}(x_i, y_j) log [ f_{X,Y}(x_i, y_j) / (f_X(x_i) f_Y(y_j)) ]
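As a sketch, the plug-in estimate of this quantity for two alignment columns can be computed directly from the observed frequencies. The columns below are made up; perfectly covarying columns (as in compensatory base pairs of an RNA helix) give maximal mutual information:

```python
from collections import Counter
from math import log2

def mutual_information(col_x, col_y):
    """I(X,Y) = sum_ij f_XY(i,j) * log2( f_XY(i,j) / (f_X(i) * f_Y(j)) )."""
    n = len(col_x)
    f_xy = Counter(zip(col_x, col_y))
    f_x = Counter(col_x)
    f_y = Counter(col_y)
    mi = 0.0
    for (a, b), c in f_xy.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((f_x[a] / n) * (f_y[b] / n)))
    return mi

# Two perfectly covarying columns (G-C and A-U pairs swap together):
col1 = list("GGGCCCAAUU")
col2 = list("CCCGGGUUAA")
print(mutual_information(col1, col2))
```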
Markov chains
• What is a Markov chain?
A Markov chain of order n is a stochastic process over a series of outcomes, in which the probability of outcome x depends on the states of the previous n outcomes.
Markov chains (cntd)
• Markov chain (of first order) and the Chain Rule

P(x⃗) = P(X_L, X_{L-1}, ..., X_1)
     = P(X_L | X_{L-1}, ..., X_1) P(X_{L-1}, X_{L-2}, ..., X_1)
     = P(X_L | X_{L-1}, ..., X_1) P(X_{L-1} | X_{L-2}, ..., X_1) ... P(X_1)
     = P(X_L | X_{L-1}) P(X_{L-1} | X_{L-2}) ... P(X_2 | X_1) P(X_1)
     = P(X_1) Π_{i=2}^{L} P(X_i | X_{i-1})

Chain rule: P(A,B,C) = P(C|A,B) P(B|A) P(A)
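The final product formula can be sketched directly in code. The initial and transition probabilities below are illustrative, not trained values:

```python
# Probability of a DNA sequence under a first-order Markov chain:
# P(x) = P(x1) * prod_{i>=2} P(x_i | x_{i-1}).  All numbers are made up.
init = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
trans = {  # trans[prev][nxt] = P(nxt | prev); each row sums to 1
    "A": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "C": {"A": 0.2, "C": 0.3, "G": 0.2, "T": 0.3},
    "G": {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2},
    "T": {"A": 0.2, "C": 0.3, "G": 0.2, "T": 0.3},
}

def chain_prob(seq):
    p = init[seq[0]]                     # P(x1)
    for prev, nxt in zip(seq, seq[1:]):  # one factor per transition
        p *= trans[prev][nxt]
    return p

print(chain_prob("ACGT"))  # 0.25 * 0.2 * 0.2 * 0.2 = 0.002
```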
Application of Markov chains: CpG islands
• CG is relatively rare in the genome due to high mutation of methyl-CG to TG
• In promoters, CG is usually unmethylated, resulting in a high frequency of CG
• Problem:
Given two sets of sequences from the human genome, one with CpG islands and one without, can we calculate a model that can predict the CpG islands?
Application of Markov chains: CpG islands (cntd)
Train two first-order transition matrices, P(x_2 | x_1, +) for islands and P(x_2 | x_1, -) for non-islands, and score a sequence by the log-odds ratio:

log2 [ P(x⃗ | +) / P(x⃗ | -) ] = Σ_{i=1}^{L} log2 [ P(x_{i+1} | x_i, +) / P(x_{i+1} | x_i, -) ]

i.e. a sum of per-transition scores log2( P(x_2 | x_1, +) / P(x_2 | x_1, -) ).
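A minimal sketch of this log-odds scoring. The transition probabilities below are illustrative, not trained values; a real application would estimate them from labelled "+" and "-" training sets:

```python
from math import log2

plus = {   # P(x_{i+1} | x_i, +): C->G transitions are common inside islands
    "C": {"C": 0.30, "G": 0.30, "A": 0.20, "T": 0.20},
    "G": {"C": 0.30, "G": 0.30, "A": 0.20, "T": 0.20},
    "A": {"C": 0.25, "G": 0.25, "A": 0.25, "T": 0.25},
    "T": {"C": 0.25, "G": 0.25, "A": 0.25, "T": 0.25},
}
minus = {  # P(x_{i+1} | x_i, -): C followed by G is rare outside islands
    "C": {"C": 0.30, "G": 0.05, "A": 0.35, "T": 0.30},
    "G": {"C": 0.25, "G": 0.25, "A": 0.25, "T": 0.25},
    "A": {"C": 0.25, "G": 0.25, "A": 0.25, "T": 0.25},
    "T": {"C": 0.25, "G": 0.25, "A": 0.25, "T": 0.25},
}

def log_odds(seq):
    # sum_i log2( P(x_{i+1} | x_i, +) / P(x_{i+1} | x_i, -) )
    return sum(log2(plus[a][b] / minus[a][b]) for a, b in zip(seq, seq[1:]))

print(log_odds("CGCGCG"))  # positive score: looks like a CpG island
```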
Hidden Markov Models (HMMs)
• What is an HMM?
A Markov process in which the probability of an outcome also depends on a hidden random variable (the state).
• Memory-less: future states are affected only by the current state
• We need:
Ω : alphabet of symbols (outcomes)
Q : set of states (hidden), each of which emits symbols
A = (a_kl) : matrix of state transition probabilities
E = (e_k(b)) = (P(x_i = b | π_i = k)) : matrix of emission probabilities
Example: the dishonest casino

[Figure: two states, a Fair die emitting 1–6 with probability 1/6 each, and a Loaded die emitting 6 with probability 1/2 and 1–5 with probability 1/10 each; self-transition probabilities 0.95 (Fair) and 0.9 (Loaded), switch probabilities 0.05 (Fair→Loaded) and 0.1 (Loaded→Fair)]

Ω = {1, 2, 3, 4, 5, 6}
Q = {F, L}
A : a_FF = 0.95, a_LL = 0.9, a_FL = 0.05, a_LF = 0.1
E : e_F(b) = 1/6 (∀ b ∈ Ω)
    e_L("6") = 1/2
    e_L(b) = 1/10 (if b ≠ 6)
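The generative process can be sketched by sampling from this model. The starting state (Fair) and the seed are arbitrary choices, not part of the slide:

```python
import random

# Sampling from the dishonest-casino HMM: states F (Fair) and L (Loaded).
random.seed(0)
A = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
E = {"F": {s: 1/6 for s in "123456"},
     "L": {**{s: 1/10 for s in "12345"}, "6": 1/2}}

def sample(n, start="F"):
    state, states, symbols = start, [], []
    for _ in range(n):
        states.append(state)
        roll = random.choices(list(E[state]), weights=E[state].values())[0]
        symbols.append(roll)
        state = random.choices(list(A[state]), weights=A[state].values())[0]
    return "".join(states), "".join(symbols)

path, rolls = sample(300)
print(rolls.count("6") / 300)  # fraction of sixes over the whole run
```

Stretches generated in the Loaded state are visibly enriched in sixes, which is exactly what the decoding problem below tries to detect.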
Three main questions on HMMs
1. Evaluation
   GIVEN: HMM M, sequence x
   FIND: P(x | M)
   ALGORITHM: Forward
2. Decoding
   GIVEN: HMM M, sequence x
   FIND: the sequence π of states that maximizes P(π | x, M)
   ALGORITHM: Viterbi, Forward-Backward
3. Learning
   GIVEN: HMM M with unknown probability parameters, sequence x
   FIND: parameters θ = (π, e_ij, a_kl) that maximize P(x | θ, M)
   ALGORITHM: Maximum likelihood (ML), Baum-Welch (EM)
Problem 1: Evaluation
Find the likelihood that a given sequence is generated by a particular model.

E.g. given the following sequence, is it more likely that it comes from a Loaded or a Fair die?
123412316261636461623411221341
Problem 1: Evaluation (cntd)
123412316261636461623411221341

P(Data | F1...F30) = Π_{i=1}^{30} e_F(b_i) × a_{F,F}^{29}
                   = (1/6)^30 × 0.95^29
                   = 4.52×10^-24 × 0.226 = 1.02×10^-24

P(Data | L1...L30) = Π_{i=1}^{30} e_L(b_i) × a_{L,L}^{29}
                   = (1/2)^6 × (1/10)^24 × 0.90^29
                   = 1.56×10^-26 × 0.047 = 7.36×10^-28

What happens in a sliding window?
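The two numbers above are easy to reproduce. As on the slide, the initial-state probability is ignored; only emissions and self-transitions along the all-Fair or all-Loaded path are multiplied:

```python
# Likelihood of the 30-roll sequence under an all-Fair and an all-Loaded path.
seq = "123412316261636461623411221341"

def path_likelihood(seq, emit, stay):
    p = 1.0
    for i, s in enumerate(seq):
        p *= emit(s)       # emission probability of roll s
        if i > 0:
            p *= stay      # self-transition before every roll after the first
    return p

p_fair = path_likelihood(seq, lambda s: 1/6, 0.95)
p_loaded = path_likelihood(seq, lambda s: 1/2 if s == "6" else 1/10, 0.90)
print(p_fair, p_loaded)  # ~1.0e-24 vs ~7.4e-28: the all-Fair path wins
```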
Problem 2: Decoding
Given a position x_i in a sequence, find its most probable state.

E.g. given the following sequence, is it more likely that the 3rd observed "6" comes from a Loaded or a Fair die?
123412316261636461623411221341
The Forward Algorithm – derivation
In order to calculate P(x⃗), the probability of the sequence given the HMM, we need to sum over all possible ways of generating it:

P(x⃗) = Σ_π P(x⃗, π) = Σ_π P(x⃗ | π) P(π)

To avoid summing over an exponential number of paths π, we first define the forward probability:

f_k(i) = P(x_1 ... x_i, π_i = k)
The Forward Algorithm – derivation (cntd)
Then, we write f_k(i) as a function of the previous position's values, f_l(i-1):

f_k(i) = P(x_1, ..., x_i, π_i = k)
       = Σ_{π_1,...,π_{i-1}} P(x_1, ..., x_{i-1}, π_1, ..., π_{i-1}, π_i = k) · e_k(x_i)
       = Σ_l [ Σ_{π_1,...,π_{i-2}} P(x_1, ..., x_{i-1}, π_1, ..., π_{i-2}, π_{i-1} = l) · a_{l,k} ] · e_k(x_i)
       = Σ_l P(x_1, ..., x_{i-1}, π_{i-1} = l) · a_{l,k} · e_k(x_i)
       = e_k(x_i) · Σ_l f_l(i-1) · a_{l,k}

Chain rule: P(A,B,C) = P(C|A,B) P(B|A) P(A)
The Forward Algorithm
We can compute f_k(i) for all k, i, using dynamic programming:

Initialization:  f_0(0) = 1
                 f_k(0) = 0, ∀ k > 0
Iteration:       f_k(i) = e_k(x_i) · Σ_l f_l(i-1) · a_{l,k}
Termination:     P(x⃗) = Σ_k f_k(N) · a_{k,0}
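A sketch of this recursion for the dishonest-casino model. Two simplifying assumptions, not from the slide: there is no explicit end state (so the a_{k,0} factor of the termination step is dropped), and the chain starts in Fair or Loaded with equal probability:

```python
# Forward algorithm for the dishonest-casino HMM (no end state assumed).
A = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
E = {"F": {s: 1/6 for s in "123456"},
     "L": {**{s: 1/10 for s in "12345"}, "6": 1/2}}
init = {"F": 0.5, "L": 0.5}

def forward(seq):
    # Initialization: f_k(1) = init(k) * e_k(x_1)
    f = {k: init[k] * E[k][seq[0]] for k in A}
    # Iteration: f_k(i) = e_k(x_i) * sum_l f_l(i-1) * a_{l,k}
    for x in seq[1:]:
        f = {k: E[k][x] * sum(f[l] * A[l][k] for l in A) for k in A}
    # Termination (no end state): P(x) = sum_k f_k(N)
    return sum(f.values())

print(forward("123412316261636461623411221341"))
```

For a short sequence the result can be checked against brute-force enumeration of all state paths, which is what the dynamic program avoids.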
The Backward Algorithm
The Forward algorithm determines the most likely state k at position i using the preceding observations.

123412316261636461623411221341

What if we started from the end?
The Backward Algorithm – derivation
We define the backward probability:

b_k(i) = P(x_{i+1}, ..., x_N | π_i = k)
       = Σ_{π_{i+1},...,π_N} P(x_{i+1}, ..., x_N, π_{i+1}, ..., π_N | π_i = k)
       = Σ_l Σ_{π_{i+1},...,π_N} P(x_{i+1}, ..., x_N, π_{i+1} = l, π_{i+2}, ..., π_N | π_i = k)
       = Σ_l e_l(x_{i+1}) · a_{k,l} · Σ_{π_{i+2},...,π_N} P(x_{i+2}, ..., x_N, π_{i+2}, ..., π_N | π_{i+1} = l)
       = Σ_l b_l(i+1) · a_{k,l} · e_l(x_{i+1})

Chain rule: P(A,B,C) = P(C|A,B) P(B|A) P(A)
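This recursion can be sketched for the casino model as well. As in the forward sketch, no end state is assumed (b_k(N) = 1) and equal initial probabilities are an added assumption, not part of the slide:

```python
# Backward algorithm for the dishonest-casino HMM (no end state assumed).
A = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
E = {"F": {s: 1/6 for s in "123456"},
     "L": {**{s: 1/10 for s in "12345"}, "6": 1/2}}
init = {"F": 0.5, "L": 0.5}

def backward(seq):
    b = {k: 1.0 for k in A}       # b_k(N) = 1 (no end state)
    for x in reversed(seq[1:]):   # b_k(i) = sum_l e_l(x_{i+1}) a_{k,l} b_l(i+1)
        b = {k: sum(E[l][x] * A[k][l] * b[l] for l in A) for k in A}
    return b                      # b_k(1) for every state k

b1 = backward("16")
# P(x) can be recovered as sum_k init(k) * e_k(x_1) * b_k(1)
print(sum(init[k] * E[k]["1"] * b1[k] for k in A))
```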
The Backward Algorithm
We can compute b_k(i) for all k, i, using dynamic programming.
Solution: add pseudocounts
• Larger pseudocounts ⇒ strong prior belief (need a lot of data to change)
• Smaller pseudocounts ⇒ just smoothing (to avoid zero probabilities)
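A quick illustration of the effect, estimating transition probabilities from counts (the observed counts are made up):

```python
# Pseudocount smoothing of transition-probability estimates.
counts = {"F": {"F": 19, "L": 0}, "L": {"F": 2, "L": 9}}  # observed transitions

def estimate(counts, pseudo):
    probs = {}
    for k, row in counts.items():
        total = sum(row.values()) + pseudo * len(row)
        probs[k] = {l: (c + pseudo) / total for l, c in row.items()}
    return probs

no_smoothing = estimate(counts, 0.0)
smoothed = estimate(counts, 1.0)
# Without smoothing, the never-observed F->L transition gets probability 0,
# which would make any path using it impossible forever.
print(no_smoothing["F"]["L"], smoothed["F"]["L"])
```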
Unsupervised learning – ML
Given x = x1…xN for which the true state path π = π1…πN is unknown.

EXPECTATION MAXIMIZATION (EM) in a nutshell:
0. Initialize the parameters θ of the model M
1. Calculate the expected values of A_{k,l}, E_k(b) based on the training data and current parameters
2. Update θ according to A_{k,l}, E_k(b) as in supervised learning
3. Repeat #1 & #2 until convergence

In HMM training, this is also called the Baum-Welch Algorithm.
The Baum-Welch algorithm
• Initialization: pick arbitrary model parameters
• Recurrence:
  1. Set A and E to pseudocounts
  2. Calculate f_k(i) and b_k(i) for each training sequence j
  3. Add the contribution of sequence j to A and E:

     A_{k,l} = Σ_i f_k(i) · a_{k,l} · e_l(x_{i+1}) · b_l(i+1) / P(x⃗)    (the expected counts of P(π_i = k, π_{i+1} = l | x, θ))
     E_k(b) = Σ_{i : x_i = b} f_k(i) · b_k(i) / P(x⃗)

  4. Calculate new model parameters a_{k,l} and e_k(b)
  5. Calculate the new (log-)likelihood of the model
• Termination: if Δ(log-likelihood) < threshold or N_times > max_times
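As a sketch, the expectation step above can be implemented by combining the forward and backward tables for the casino model. No end state and equal initial probabilities are assumed here (modelling choices, not from the slide):

```python
# One Baum-Welch expectation step: expected transition counts A_{k,l} and
# emission counts E_k(b) from forward/backward values, per the formulas above.
A = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
E = {"F": {s: 1/6 for s in "123456"},
     "L": {**{s: 1/10 for s in "12345"}, "6": 1/2}}
init = {"F": 0.5, "L": 0.5}

def forward_table(seq):
    f = [{k: init[k] * E[k][seq[0]] for k in A}]
    for x in seq[1:]:
        f.append({k: E[k][x] * sum(f[-1][l] * A[l][k] for l in A) for k in A})
    return f

def backward_table(seq):
    b = [{k: 1.0 for k in A}]
    for x in reversed(seq[1:]):  # fill b_k(i) from the end inwards
        b.insert(0, {k: sum(E[l][x] * A[k][l] * b[0][l] for l in A) for k in A})
    return b

def expected_counts(seq):
    f, b = forward_table(seq), backward_table(seq)
    px = sum(f[-1][k] for k in A)  # P(x), no end-state factor
    # A_kl = sum_i f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1) / P(x)
    Acnt = {k: {l: sum(f[i][k] * A[k][l] * E[l][seq[i + 1]] * b[i + 1][l]
                       for i in range(len(seq) - 1)) / px
                for l in A} for k in A}
    # E_k(b) = sum_{i: x_i = b} f_k(i) * b_k(i) / P(x)
    Ecnt = {k: {s: sum(f[i][k] * b[i][k] for i in range(len(seq))
                       if seq[i] == s) / px
                for s in "123456"} for k in A}
    return Acnt, Ecnt

Acnt, Ecnt = expected_counts("123412316261636461623411221341")
# Sanity check: posterior emission counts sum to the sequence length.
print(sum(sum(row.values()) for row in Ecnt.values()))
```

The maximization step then just normalizes each row of these expected counts to obtain the new a_{k,l} and e_k(b).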
The Baum-Welch algorithm (cntd)
• Time complexity: (# iterations) × O(K²N)
• Guaranteed to increase the log-likelihood P(x | θ)
• Not guaranteed to find globally optimal parameters: converges to a local optimum, depending on initial conditions
• Too many parameters / too large a model ⇒ overtraining
Acknowledgements
Some of the slides used in this lecture are adapted or modified from lectures of:
• Serafim Batzoglou, Stanford University
• Eric Xing, Carnegie-Mellon University

Theory and examples from the following books:
• T. Koski, "Hidden Markov Models for Bioinformatics", 2001, Kluwer Academic Publishers
• R. Durbin, S. Eddy, A. Krogh, G. Mitchison, "Biological Sequence Analysis", 1998, Cambridge University Press