Hidden Markov Chain and Bayes Belief Networks
Graphical Models of Probability
Graphical models use directed or undirected graphs over a set of random variables to specify variable dependencies explicitly. They allow less restrictive independence assumptions while limiting the number of parameters that must be estimated.
Bayesian Networks: Directed acyclic graphs that indicate causal structure.
Markov Networks: Undirected graphs that capture general dependencies.
Middleware, CCNT, ZJU, 04/10/23
Hidden Markov Model
Zhejiang Univ
CCNT
Yueshen Xu
Overview
Markov Chain
HMM
Three Core Problems and Algorithms
Application
Markov Chain
Instance
We can regard the weather as three states:
state 1: Rain
state 2: Cloudy
state 3: Sun

Transition probabilities (row = today, column = tomorrow):

Today \ Tomorrow   Rain   Cloudy   Sun
Rain               0.4    0.3      0.3
Cloudy             0.2    0.6      0.2
Sun                0.1    0.1      0.8

We can obtain the transition matrix from long-term observation.
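As a quick illustration (not from the slides), the probability of any concrete weather sequence factors into a product of one-step transition probabilities. A minimal Python sketch:

```python
# Transition matrix from the slide: rows = today's state, columns = tomorrow's.
STATES = ["Rain", "Cloudy", "Sun"]
A = [
    [0.4, 0.3, 0.3],  # Rain   -> Rain, Cloudy, Sun
    [0.2, 0.6, 0.2],  # Cloudy -> Rain, Cloudy, Sun
    [0.1, 0.1, 0.8],  # Sun    -> Rain, Cloudy, Sun
]

def sequence_probability(seq):
    """Probability of the sequence given its first state (Markov property)."""
    p = 1.0
    for today, tomorrow in zip(seq, seq[1:]):
        p *= A[STATES.index(today)][STATES.index(tomorrow)]
    return p

# Sun -> Sun -> Rain -> Cloudy: 0.8 * 0.1 * 0.3 = 0.024
print(sequence_probability(["Sun", "Sun", "Rain", "Cloudy"]))
```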
Definition
one-step transition probability:

P(q_{t+1} = S_j | q_t = S_i, q_{t-1} = S_k, ...) = P(q_{t+1} = S_j | q_t = S_i)

That is to say, the evolution of the stochastic process relies only on the current state and has nothing to do with the states before it. We call this the Markov property, and such a process is regarded as a Markov process.

State Space: S = {S_1, S_2, ..., S_N}
Observation Sequence: q_1, q_2, ..., q_T
Keystone
state transition matrix:

A = [a_ij],  a_ij = P(q_{t+1} = S_j | q_t = S_i)

where a_ij ≥ 0 and Σ_j a_ij = 1 for every i.

initial state probability vector:

π = (π_1, π_2, ..., π_N),  π_i = P(q_1 = S_i)
HMM
An HMM is a doubly stochastic process consisting of two parallel parts:
Markov chain: describes the transition of the states, which is unobservable, by means of the transition probability matrix.
Ordinary stochastic process: describes the stochastic process of the observable events.

Markov Chain (π, A)  →  State Sequence q1, q2, ..., qT  (unobservable)
Stochastic Process (B)  →  Observation Sequence o1, o2, ..., oT  (observable)

Core feature: the states are unobservable; only the observations are observable.
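The two layers can be sketched by sampling: the hidden chain (π, A) generates states, and B generates an observable symbol from each state. The parameters below are illustrative placeholders, not the slides' example:

```python
import random

# Illustrative two-state HMM: the hidden Markov chain (pi, A) drives state
# transitions, while B emits an observable symbol per state.
pi = [1.0, 0.0]                    # always start in state 0
A  = [[0.7, 0.3], [0.4, 0.6]]      # state transition matrix
B  = [[0.9, 0.1], [0.2, 0.8]]      # emission probabilities over SYMBOLS
SYMBOLS = ["a", "b"]

def sample(weights):
    """Draw an index with the given probabilities."""
    return random.choices(range(len(weights)), weights=weights)[0]

def generate(T):
    """Generate a hidden state sequence q1..qT and observations o1..oT."""
    q = sample(pi)
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(SYMBOLS[sample(B[q])])
        q = sample(A[q])
    return states, obs

states, obs = generate(5)
print(states, obs)                 # only `obs` would be visible in practice
```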
Figure: a three-state diagram with states S1, S2, S3. Each arc carries a transition probability and output probabilities for the symbols a and b:

a11 = 0.3, outputs: a 0.8, b 0.2
a12 = 0.5, outputs: a 1.0, b 0.0
a13 = 0.2, outputs: a 0.0, b 1.0
a22 = 0.4, outputs: a 0.3, b 0.7
a23 = 0.6, outputs: a 0.5, b 0.5

Example: what is the probability of producing the sequence "aab" with this stochastic process?
Instance 1:
S1→S1→S2→S3: 0.3×0.8 × 0.5×1.0 × 0.6×0.5 = 0.036
Instance 2:
S1→S2→S2→S3: 0.5×1.0 × 0.4×0.3 × 0.6×0.5 = 0.018
Instance 3:
S1→S1→S1→S3: 0.3×0.8 × 0.3×0.8 × 0.2×1.0 = 0.01152

Therefore, the total probability is: 0.036 + 0.018 + 0.01152 = 0.06552

We only know "aab"; we don't know "S?S?S?". That's the point.
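These hand computations can be checked by brute-force enumeration of every state path. A minimal sketch, assuming (as the worked totals imply) that a valid path starts in S1 and must end in the final state S3:

```python
from itertools import product

# Arc-emitting model from the figure: each arc (i, j) carries a transition
# probability and output probabilities for the symbols "a" and "b".
ARCS = {
    (1, 1): (0.3, {"a": 0.8, "b": 0.2}),
    (1, 2): (0.5, {"a": 1.0, "b": 0.0}),
    (1, 3): (0.2, {"a": 0.0, "b": 1.0}),
    (2, 2): (0.4, {"a": 0.3, "b": 0.7}),
    (2, 3): (0.6, {"a": 0.5, "b": 0.5}),
}

def total_probability(output, start=1, final=3):
    """Sum P(path, output) over all state paths from `start` ending in `final`."""
    total = 0.0
    for path in product([1, 2, 3], repeat=len(output)):
        if path[-1] != final:
            continue
        p, state = 1.0, start
        for nxt, sym in zip(path, output):
            trans, emit = ARCS.get((state, nxt), (0.0, {}))
            p *= trans * emit.get(sym, 0.0)
            state = nxt
        total += p
    return total

print(total_probability("aab"))  # 0.036 + 0.018 + 0.01152 = 0.06552
```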
Description
An HMM can be identified by the parameters below:
N: the number of states
M: the number of observable events for each state
A: the state transition matrix
B: the observable event (emission) probability matrix
π: the initial state probability distribution

We generally record it as λ = (A, B, π).
Three Core Problems
Evaluation: given that the observation sequence O = o1 o2 ... oT and the model λ = (A, B, π) have been preset, how can we calculate P(O | λ)?
Optimization: based on problem 1, how do we choose a state sequence S = q1 q2 ... qT so that the observation sequence O is explained most reasonably?
Training: based on problem 1, how do we adjust the parameters of the model λ = (A, B, π) to maximize P(O | λ)?

In each case we know O, but we don't know Q.
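For the evaluation problem, P(O | λ) is classically computed with the forward algorithm rather than by enumerating all N^T state paths. A minimal sketch for a state-emitting HMM; the parameters here are illustrative placeholders, not from the slides:

```python
def forward(obs, pi, A, B):
    """Forward algorithm: alpha[i] = P(o1..ot, q_t = i); returns P(O | lambda)."""
    n = len(pi)
    # Initialization with the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]   # B[state][symbol], symbols encoded as 0/1
print(forward([0, 1, 0], pi, A, B))  # ≈ 0.10893
```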
Solution
There is no need to expound those algorithms here (the forward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm, respectively), since we should pay attention to the application context.
Naïve Bayes Theorem
The naïve Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions.
Chain rule:
P(C, F1, ..., Fn) = P(C) P(F1 | C) P(F2 | C, F1) ... P(Fn | C, F1, ..., Fn-1)

Conditional independence:
P(Fi | C, Fj) = P(Fi | C)

Therefore:
P(C, F1, ..., Fn) = P(C) ∏_{i=1}^{n} P(Fi | C)

Figure: a network with the class node C as the single parent of the feature nodes F1, F2, ..., Fn.

Naïve Bayes is a simple Bayes net.
Bayes Belief Network: Graph Structure
Directed Acyclic Graph (DAG):
Nodes are random variables (RVs)
Edges indicate causal influences

Figure: Burglary and Earthquake are the parents of Alarm; JohnCalls and MaryCalls are its descendants. The edges encode the parent/descendant relationships.
Bayes Belief Network: Conditional Probability Table
Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents.
Roots (sources) of the DAG that have no parents are given prior probabilities.
Figure: the same network annotated with its CPTs.

P(B) = .001        P(E) = .002

B  E   P(A)
T  T   .95
T  F   .94
F  T   .29
F  F   .001

A  P(J)
T  .90
F  .05

A  P(M)
T  .70
F  .01
Bayes Belief Network: Joint Distributions
A Bayesian network implicitly defines a joint distribution:

P(x1, x2, ..., xn) = ∏_{i=1}^{n} P(xi | Parents(Xi))

Example:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A) P(M | A) P(A | ¬B, ¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062

Therefore an inefficient approach to inference is:
1) Compute the joint distribution using this equation.
2) Compute any desired conditional probability using the joint distribution.
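The factored joint can be evaluated directly from the CPTs. A minimal sketch using the table values from the slide:

```python
# CPTs of the burglary network (values from the slide).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain-rule factorization."""
    p  = P_B if b else 1 - P_B
    p *= P_E if e else 1 - P_E
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(J, M, A, not B, not E) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00062
print(joint(b=False, e=False, a=True, j=True, m=True))
```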
Conditional Independence & D-separation
D-separation:
Let X, Y, and Z be three sets of nodes.
If X and Y are d-separated by Z, then X and Y are conditionally independent given Z.
A is d-separated from B given C if every undirected path between them is blocked.
Path blocking: three cases that expand on the three basic independence structures (chain, fork, and collider).
Application: Simple Document Classification (1)
Step 1: Assume for the moment that there are only two mutually exclusive classes, S and ¬S (e.g., spam and not spam), such that every element (email) is in either one or the other. With the naïve independence assumption over the words w_i of a document D:

P(D | S) = ∏_i p(w_i | S)   and   P(D | ¬S) = ∏_i p(w_i | ¬S)

Step 2: What we are concerned with is:

P(S | D) = P(S) P(D | S) / P(D)   and   P(¬S | D) = P(¬S) P(D | ¬S) / P(D)
Application: Simple Document Classification (2)
Step 3: Dividing one by the other, and then re-factoring, gives:

P(S | D) / P(¬S | D) = [P(S) / P(¬S)] · ∏_i [p(w_i | S) / p(w_i | ¬S)]

Step 4: Taking the logarithm of these ratios to reduce the amount of calculation:

ln [P(S | D) / P(¬S | D)] = ln [P(S) / P(¬S)] + Σ_i ln [p(w_i | S) / p(w_i | ¬S)]

The document is classified as spam if this log-ratio is > 0 and as non-spam if it is < 0. The priors and word likelihoods are obtained by training on known samples.
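Steps 3 and 4 can be sketched end-to-end. The priors and word likelihoods below are made-up stand-ins for what training on known samples would produce:

```python
import math

# Hypothetical training output: class priors and per-word likelihoods.
prior = {"S": 0.5, "not_S": 0.5}
likelihood = {                      # P(word | class), illustrative values
    "free":    {"S": 0.30, "not_S": 0.01},
    "offer":   {"S": 0.20, "not_S": 0.02},
    "meeting": {"S": 0.01, "not_S": 0.10},
}

def log_ratio(words):
    """ln[P(S|D) / P(not S|D)] for a document D containing the given words."""
    score = math.log(prior["S"] / prior["not_S"])
    for w in words:
        if w in likelihood:
            score += math.log(likelihood[w]["S"] / likelihood[w]["not_S"])
    return score

doc = ["free", "offer"]
print("spam" if log_ratio(doc) > 0 else "not spam")  # -> spam
```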
Application: Overall
Medical diagnosis: the Pathfinder system outperforms leading experts in the diagnosis of lymph-node disease.
Microsoft applications: problem diagnosis (printer problems) and recognizing user intents for HCI.
Text categorization and spam filtering.
Student modeling for intelligent tutoring systems.
Biochemical data analysis: predicting mutagenicity.