
Hidden Markov Chain and Bayes Belief Networks (Doctoral Consortium)

Jan 27, 2015


Yueshen Xu

I made it by myself. I hope it's helpful for you.
Transcript
Page 1

Graphical Models of Probability

Graphical models use directed or undirected graphs over a set of random variables to explicitly specify variable dependencies and allow for less restrictive independence assumptions while limiting the number of parameters that must be estimated.

Bayesian Networks: Directed acyclic graphs that indicate causal structure.

Markov Networks: Undirected graphs that capture general dependencies.

Page 2

Hidden Markov Model

Zhejiang Univ

CCNT

Yueshen Xu

Page 3

Overview

Markov Chain
HMM
Three Core Problems and Algorithms
Application

Page 4

Markov Chain

Instance

We can regard the weather as three states:
state 1: Rain
state 2: Cloudy
state 3: Sun

The transition matrix (rows: today; columns: tomorrow):

        Rain   Cloudy   Sun
Rain    0.4    0.3      0.3
Cloudy  0.2    0.6      0.2
Sun     0.1    0.1      0.8

The transition matrix can be obtained through long-term observation.
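As a quick illustration (a minimal sketch, not from the slides; the function and state names are my own), the probability of a concrete forecast sequence is just the product of the corresponding matrix entries:

```python
# Minimal sketch: scoring a weather sequence with the transition matrix above.
states = ["Rain", "Cloudy", "Sun"]
P = [
    [0.4, 0.3, 0.3],  # from Rain
    [0.2, 0.6, 0.2],  # from Cloudy
    [0.1, 0.1, 0.8],  # from Sun
]

def sequence_prob(seq):
    """P(seq[1], ..., seq[-1] | seq[0]) under the first-order Markov assumption."""
    p = 1.0
    for today, tomorrow in zip(seq, seq[1:]):
        p *= P[states.index(today)][states.index(tomorrow)]
    return p

print(sequence_prob(["Sun", "Sun", "Rain"]))  # 0.8 * 0.1 = 0.08
```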

Page 5

Definition

One-step transition probability: $p_{ij} = P(X_{t+1} = s_j \mid X_t = s_i)$

That is to say, the evolution of the stochastic process relies only on the current state and has nothing to do with the states before it: $P(X_{t+1} \mid X_t, X_{t-1}, \ldots, X_1) = P(X_{t+1} \mid X_t)$. We call this the Markov property, and such a process is regarded as a Markov process.

State space: $S = \{s_1, s_2, \ldots, s_N\}$

Observation sequence: $O = o_1, o_2, \ldots, o_T$

Page 6

Keystone

State transition matrix: $A = [a_{ij}]_{N \times N}$, where $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$, $a_{ij} \ge 0$, and $\sum_{j=1}^{N} a_{ij} = 1$.

Initial state probability vector: $\pi = (\pi_1, \pi_2, \ldots, \pi_N)$, where $\pi_i = P(q_1 = s_i)$.

Page 7

HMM

An HMM is a doubly stochastic process consisting of two parallel parts:
Markov chain: describes the transitions of the states, which are unobservable, by means of the transition probability matrix.
Common stochastic process: describes the stochastic process of the observable events.

Markov chain $(\pi, A)$ → state sequence: $q_1, q_2, \ldots, q_T$ (unobservable)
Stochastic process $(B)$ → observation sequence: $o_1, o_2, \ldots, o_T$ (observable)

Core feature: the state sequence is hidden; only the observation sequence can be seen.

Page 8

[Figure: a three-state transition diagram with states S1, S2, S3. Each arc carries a transition probability and emission probabilities for the symbols a and b:]

a11 = 0.3 (emits a: 0.8, b: 0.2)
a12 = 0.5 (emits a: 1.0, b: 0)
a13 = 0.2 (emits a: 0, b: 1.0)
a22 = 0.4 (emits a: 0.3, b: 0.7)
a23 = 0.6 (emits a: 0.5, b: 0.5)

Example: What is the probability that this stochastic process produces the sequence “aab”?

Page 9

[Figure: the same state diagram as on Page 8.]

Instance 1: path S1→S1→S2→S3
$P = 0.3 \times 0.8 \times 0.5 \times 1.0 \times 0.6 \times 0.5 = 0.036$

Page 10

[Figure: the same state diagram as on Page 8.]

Instance 2: path S1→S2→S2→S3
$P = 0.5 \times 1.0 \times 0.4 \times 0.3 \times 0.6 \times 0.5 = 0.018$

Page 11

[Figure: the same state diagram as on Page 8.]

Instance 3: path S1→S1→S1→S3
$P = 0.3 \times 0.8 \times 0.3 \times 0.8 \times 0.2 \times 1.0 = 0.01152$

Therefore, the total probability is $0.036 + 0.018 + 0.01152 = 0.06552$.

We just know “aab”, but don’t know “S?S?S?”. That’s the point.
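As an illustration of this evaluation problem, here is a minimal brute-force sketch (not from the slides; the data layout and function name are my own) that enumerates every state path for the diagram on Page 8 and reproduces the total above:

```python
# Minimal sketch: brute-force P("aab") for the arc-emitting diagram on Page 8.
from itertools import product

# (from_state, to_state) -> (transition probability, {symbol: emission probability})
arcs = {
    (1, 1): (0.3, {"a": 0.8, "b": 0.2}),
    (1, 2): (0.5, {"a": 1.0, "b": 0.0}),
    (1, 3): (0.2, {"a": 0.0, "b": 1.0}),
    (2, 2): (0.4, {"a": 0.3, "b": 0.7}),
    (2, 3): (0.6, {"a": 0.5, "b": 0.5}),
}

def sequence_probability(symbols, start=1, end=3):
    """Sum path probabilities over all state paths from start to end."""
    total = 0.0
    for middle in product([1, 2, 3], repeat=len(symbols) - 1):
        path = (start, *middle, end)
        p = 1.0
        for (i, j), sym in zip(zip(path, path[1:]), symbols):
            if (i, j) not in arcs:  # no such arc: this path is impossible
                p = 0.0
                break
            trans, emit = arcs[(i, j)]
            p *= trans * emit[sym]
        total += p
    return total

print(sequence_probability("aab"))  # 0.036 + 0.018 + 0.01152 = 0.06552
```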

Page 12

Description

An HMM can be identified by the parameters below:

N: the number of states

M: the number of observable events for each state

A: the state transition matrix

B: the observable event probability matrix

$\pi$: the initial state probability vector

We generally record it as $\lambda = (A, B, \pi)$.

Page 13

Three Core Problems

Evaluation: Given that the observation sequence $O = o_1 o_2 \cdots o_T$ and the model $\lambda = (A, B, \pi)$ have been preset, how can we calculate $P(O \mid \lambda)$?

Optimization: Based on problem 1, how do we choose a special state sequence $S = q_1 q_2 \cdots q_T$ so that the observation sequence $O$ is explained most reasonably?

Training: Based on problem 1, how do we adjust the parameters of the model $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?

We know $O$, but don’t know $Q$.

Page 14

Solution

There is no need to expound those algorithms, since we should pay attention to the application context.

Evaluation: dynamic programming (the Forward and Backward algorithms)
Optimization: the Viterbi algorithm
Training: iterative re-estimation (Baum-Welch and maximum likelihood estimation)

You can think over and deduce these methods after the workshop.
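For reference, here is a minimal Viterbi sketch (an illustration, not the slides' code). It assumes a standard discrete HMM $\lambda = (A, B, \pi)$; the emission matrix B and prior pi in the example are hypothetical, with A taken from the weather chain on Page 4:

```python
# Minimal Viterbi sketch for a discrete HMM lambda = (A, B, pi).
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the most likely hidden state path for an observation index sequence."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))           # best path probability ending in state j at time t
    psi = np.zeros((T, N), dtype=int)  # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    # backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path))

A = np.array([[0.4, 0.3, 0.3], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]])  # weather chain
B = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])  # hypothetical, e.g. P(umbrella | state)
pi = np.array([1 / 3, 1 / 3, 1 / 3])                # hypothetical uniform prior
print(viterbi(A, B, pi, [0, 0, 1]))  # indices into Rain/Cloudy/Sun
```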

Page 15

Application Context

Just think it over: given the features of an HMM, which kind of problem can it describe and model?

Two stochastic sequences, where one relies on the other, or the two are related.
One can be “seen”, but the other cannot.
Just think about the three core problems...

I think we can draw a conclusion, just as: use one sequence to deduce and predict the other, or “find out who is behind”.

The “Iceberg” Problem

Page 16

Application Context (1): Voice Recognition

Statistical description:
I. The characteristic pattern of the voice, obtained by sampling: $T = t_1, t_2, \ldots, t_n$
II. The word sequence $W(n)$: $W_1, W_2, \ldots, W_n$
III. Therefore, what we are concerned with is $P(W(n) \mid T)$

Formal description: what we have to solve is
$k = \arg\max_{n} \{ P(W(n) \mid T) \}$

Page 17

Application Context (1): Voice Recognition

[Figure: recognition framework. Speech database → feature extraction (waveform → features) → Baum-Welch re-estimation of the HMMs; iterate until converged, then end.]

Page 18

Application Context (2): Text Information Extraction

Figure out the HMM model $\lambda = (A, B, \pi)$:
Q1: What is the state, and what is the observation event?
A1: The state is what you want to extract; the observation event is a text block or each word, etc.
Q2: How do we figure out the parameters, such as $a_{ij}$?
A2: Through training samples.

Page 19

Application Context (2): Text Information Extraction

[Figure: extraction framework. Training samples → document partitioning → HMM; a new document → partitioning → state list → extracted sequence. Example states: country, state, city, street; or title, author, email, abstract.]

Page 20

Application Context (3): Other Fields

Face recognition
POS tagging
Web data extraction
Bioinformatics
Network intrusion detection
Handwriting recognition
Document categorization
Multiple sequence alignment
...

Which field are you interested in?

Page 21

Page 22

Bayes Belief Network

Yueshen Xu, too

Page 23

Overview

Bayes Theorem
Naïve Bayes Theorem
Bayes Belief Network
Application

Page 24

Bayes Theorem

The basic Bayes formula: the basis of everything here, but vital.

$P(B_i \mid A) = \dfrac{P(A \mid B_i)\,P(B_i)}{P(A)} = \dfrac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j)\,P(B_j)}, \quad i = 1, 2, \ldots, n$

$P(B_i)$ is the prior probability and $P(B_i \mid A)$ is the posterior probability; the denominator is the complete probability formula. The formula inverts the condition: from $P(A \mid B_i)$ to $P(B_i \mid A)$.
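A tiny worked example of the formula above (the numbers are hypothetical, chosen only for illustration):

```python
# Minimal sketch: Bayes' formula for evidence A and two hypotheses
# B1 = "disease", B2 = "no disease" (all numbers assumed).
p_b = [0.01, 0.99]          # priors P(B1), P(B2)
p_a_given_b = [0.95, 0.05]  # likelihoods P(A | B1), P(A | B2)

# complete probability formula: P(A) = sum_j P(A | Bj) P(Bj)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))

# posterior P(B1 | A) = P(A | B1) P(B1) / P(A)
print(p_a_given_b[0] * p_b[0] / p_a)  # ~0.161: the prior matters a lot
```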

Page 25

The naïve Bayes model is a simple probabilistic model based on applying Bayes' theorem with strong independence assumptions.

Naïve Bayes Theorem

Chain rule:
$P(C, F_1, F_2, \ldots, F_n) = P(C)\,P(F_1 \mid C)\,P(F_2 \mid C, F_1) \cdots P(F_n \mid C, F_1, \ldots, F_{n-1})$

Conditional independence:
$P(F_i \mid C, F_j) = P(F_i \mid C)$

Therefore:
$P(C, F_1, F_2, \ldots, F_n) = P(C) \prod_{i=1}^{n} P(F_i \mid C)$

[Figure: a star-shaped network with class node C pointing to feature nodes F1, F2, ..., Fn.]

Naïve Bayes is a simple Bayes net.
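A minimal sketch of this factorization in action (all probabilities below are assumed, for a binary class and three binary features):

```python
# Minimal sketch: P(C, F1, ..., Fn) = P(C) * prod_i P(Fi | C), then a posterior.
p_c = 0.3                      # assumed prior P(C = true)
p_f_given_c = [0.8, 0.4, 0.9]  # assumed P(Fi = true | C = true)
p_f_given_not_c = [0.1, 0.5, 0.2]

def joint(c, features):
    """P(C = c, F = features) under the naive Bayes factorization."""
    prior = p_c if c else 1 - p_c
    probs = p_f_given_c if c else p_f_given_not_c
    likelihood = 1.0
    for p, f in zip(probs, features):
        likelihood *= p if f else 1 - p
    return prior * likelihood

# posterior P(C | F) by normalizing the two joints
f = [True, False, True]
num = joint(True, f)
print(num / (num + joint(False, f)))  # ~0.949
```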

Page 26

Bayes Belief Network: Graph Structure

A directed acyclic graph (DAG):
Nodes are random variables.
Edges indicate causal influences.

[Figure: the burglary network. Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls. Each node is a random variable (RV); the edges define the parent/descendant relationships.]

Page 27

Bayes Belief Network: Conditional Probability Table

Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents.

Roots (sources) of the DAG that have no parents are given prior probabilities.

[Figure: the burglary network from the previous page, annotated with CPTs.]

P(B) = .001      P(E) = .002

B  E | P(A)
T  T | .95
T  F | .94
F  T | .29
F  F | .001

A | P(J)      A | P(M)
T | .90       T | .70
F | .05       F | .01

Page 28

Bayes Belief Network: Joint Distributions

A Bayesian Network implicitly defines a joint distribution.

$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Parents}(X_i))$

Example:

$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B, \neg E)\,P(\neg B)\,P(\neg E)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00062$

This factorization is exactly the conditional independence encoded by the network. Therefore, an inefficient approach to inference is:
1) Compute the joint distribution using this equation.
2) Compute any desired conditional probability from the joint distribution.
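The same computation as a sketch (CPT values taken from Page 27; the function and variable names are my own):

```python
# Minimal sketch: evaluating P(J, M, A, ~B, ~E) from the burglary-network CPTs.
p_b, p_e = 0.001, 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A | B, E)
p_j = {True: 0.90, False: 0.05}  # P(J | A)
p_m = {True: 0.70, False: 0.01}  # P(M | A)

def joint(j, m, a, b, e):
    """Product of each node's probability given its parents."""
    pb = p_b if b else 1 - p_b
    pe = p_e if e else 1 - p_e
    pa = p_a[(b, e)] if a else 1 - p_a[(b, e)]
    pj = p_j[a] if j else 1 - p_j[a]
    pm = p_m[a] if m else 1 - p_m[a]
    return pj * pm * pa * pb * pe

print(joint(True, True, True, False, False))  # 0.9*0.7*0.001*0.999*0.998 ~ 0.00062
```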

Page 29

Conditional Independence & D-separation

D-separation:
Let X, Y, and Z be three sets of nodes. If X and Y are d-separated by Z, then X and Y are conditionally independent given Z.
A is d-separated from B given C if every undirected path between them is blocked.

Path blocking:
Three cases, which expand on the three basic independence structures.

Page 30

Application: Simple Document Classification (1)

Step 1: Assume for the moment that there are only two mutually exclusive classes, S and ¬S (e.g., spam and not spam), such that every element (email) is in either one or the other. That is to say:

$P(D \mid S) = \prod_i P(w_i \mid S) \quad \text{and} \quad P(D \mid \neg S) = \prod_i P(w_i \mid \neg S)$

Step 2: What we are concerned with is:

$P(S \mid D) = \dfrac{P(S)\,P(D \mid S)}{P(D)} \quad \text{and} \quad P(\neg S \mid D) = \dfrac{P(\neg S)\,P(D \mid \neg S)}{P(D)}$

Page 31

Application: Simple Document Classification (2)

Step 3: Dividing one by the other, and then re-factoring, gives:

$\dfrac{P(S \mid D)}{P(\neg S \mid D)} = \dfrac{P(S)\,P(D \mid S)}{P(\neg S)\,P(D \mid \neg S)} = \dfrac{P(S)}{P(\neg S)} \prod_i \dfrac{P(w_i \mid S)}{P(w_i \mid \neg S)}$

Step 4: Take the logarithm of all these ratios to reduce the amount of computation:

$\ln \dfrac{P(S \mid D)}{P(\neg S \mid D)} = \ln \dfrac{P(S)}{P(\neg S)} + \sum_i \ln \dfrac{P(w_i \mid S)}{P(w_i \mid \neg S)}$

The document is classified as spam if this log-ratio is > 0 and as non-spam if it is < 0. The per-word ratios are estimated from known samples through training.
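A minimal sketch of Steps 3 and 4 as a classifier (the word probabilities and the prior below are hypothetical stand-ins for trained estimates):

```python
# Minimal sketch: spam decision via the log-ratio ln P(S|D)/P(~S|D).
import math

p_w_spam = {"free": 0.30, "meeting": 0.02, "viagra": 0.10}   # assumed P(w | S)
p_w_ham = {"free": 0.05, "meeting": 0.20, "viagra": 0.001}   # assumed P(w | ~S)
p_spam = 0.4                                                 # assumed prior P(S)

def log_ratio(words):
    score = math.log(p_spam / (1 - p_spam))  # ln P(S)/P(~S)
    for w in words:
        if w in p_w_spam:  # ignore words unseen in the tiny vocabulary
            score += math.log(p_w_spam[w] / p_w_ham[w])
    return score

doc = ["free", "viagra"]
print("spam" if log_ratio(doc) > 0 else "not spam")
```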

Page 32

Application: Overall

Medical diagnosis: the Pathfinder system outperforms leading experts in the diagnosis of lymph-node disease.
Microsoft applications: problem diagnosis (printer problems); recognizing user intents for HCI.
Text categorization and spam filtering.
Student modeling for intelligent tutoring systems.
Biochemical data analysis: predicting mutagenicity.
So many more...

Which field are you interested in?

Page 33