Page 1:

Developments of Hidden Markov Models

by

Chandima Karunanayake

30th March, 2004

Page 2:

Developments:

• Estimating the Order (Number of Hidden States) of a Hidden Markov Model

• Application of Decision Tree to HMM

Page 3:

A Hidden Markov Model consists of

1. A sequence of states {Xt | t ∈ T} = {X1, X2, ..., XT}, and

2. A sequence of observations {Yt | t ∈ T} = {Y1, Y2, ..., YT}

Page 4:

Some basic problems (from the observations {Y1, Y2, ..., YT}):

1. Determine the sequence of states {X1, X2, ..., XT}.

2. Determine (or estimate) the parameters of the stochastic process that is generating the states and the observations.
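Problem 1 (recovering the state sequence) is usually solved with the Viterbi algorithm, and problem 2 with the Baum-Welch (EM) algorithm. The slides give no code, so the following is only a minimal Viterbi sketch in Python/NumPy for a discrete-emission HMM; the function name and the assumption of strictly positive probabilities are mine.

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete-emission HMM.

    obs : observation indices, length T
    pi  : initial state probabilities, shape (K,)
    A   : transition matrix, A[i, j] = P(X_{t+1}=j | X_t=i), shape (K, K)
    B   : emission matrix, B[i, o] = P(Y_t=o | X_t=i), shape (K, M)
    Assumes all probabilities are strictly positive (sketch only).
    """
    K, T = len(pi), len(obs)
    logd = np.full((T, K), -np.inf)        # log-probability of the best partial path
    back = np.zeros((T, K), dtype=int)     # backpointers
    logd[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        for j in range(K):
            scores = logd[t - 1] + np.log(A[:, j])
            back[t, j] = np.argmax(scores)
            logd[t, j] = scores[back[t, j]] + np.log(B[j, obs[t]])
    # Trace back the best path from the final time step.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(logd[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```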

Page 5:

Estimating the Order (Number of Hidden States) of a Hidden Markov Model

Finite mixture models

A finite mixture model takes the form

F(y) = \sum_{j=1}^{m} \alpha_j f(y, \theta_j)

where the mixing proportions \alpha_j are non-negative and sum to one.

Page 6:

Example: Poisson mixture model with m = 3 components

The density function of the Poisson mixture model:

F(y) = \alpha_1 f(y, \lambda_1) + \alpha_2 f(y, \lambda_2) + \alpha_3 f(y, \lambda_3)

     = \alpha_1 e^{-\lambda_1} \lambda_1^{y} / y! + \alpha_2 e^{-\lambda_2} \lambda_2^{y} / y! + \alpha_3 e^{-\lambda_3} \lambda_3^{y} / y!

i.e. a mixture of Poi(\lambda_1), Poi(\lambda_2) and Poi(\lambda_3) with weights \alpha_1, \alpha_2 and \alpha_3.
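As a quick numerical illustration of the formula above, here is a minimal sketch that evaluates a three-component Poisson mixture density; the weights and means below are made-up placeholders, not values from the slides.

```python
from math import exp, factorial

def poisson_pmf(y, lam):
    """f(y, lambda) = exp(-lambda) * lambda**y / y!"""
    return exp(-lam) * lam ** y / factorial(y)

def poisson_mixture_density(y, alphas, lams):
    """F(y) = sum_j alpha_j * f(y, lambda_j) for a finite Poisson mixture."""
    return sum(a * poisson_pmf(y, l) for a, l in zip(alphas, lams))

# Illustrative m = 3 component mixture (weights must sum to 1).
alphas = [0.5, 0.3, 0.2]
lams = [1.0, 4.0, 9.0]
print(poisson_mixture_density(3, alphas, lams))
```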

Page 7:

Estimation of the number of components of a finite mixture model

• AIC - Akaike Information Criterion

• BIC - Bayesian Information Criterion

Most commonly used, but not justified theoretically:

AIC:  l_m - d_m

BIC:  l_m - (d_m \log n) / 2

d_m - the number of free parameters in the model
m   - the number of components
n   - the sample size
l_m - the log-likelihood with m components
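The two criteria, exactly as written above, are trivial to compute once the m-component model has been fitted; a small sketch (function names are mine):

```python
import numpy as np

def aic(loglik_m, d_m):
    """AIC as on the slide: l_m - d_m (larger is better in this form)."""
    return loglik_m - d_m

def bic(loglik_m, d_m, n):
    """BIC as on the slide: l_m - (log n) * d_m / 2."""
    return loglik_m - np.log(n) * d_m / 2.0

# e.g. for an m-component Poisson mixture, d_m = 2*m - 1
# (m means plus m - 1 free mixing proportions).
```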

Page 8:

Solution

Penalized likelihood methods - only for a finite number of states

• Penalized minimum distance method (Chen & Kalbfleisch, 1996)

• Gives a consistent estimate of the number of components in a finite mixture model

Page 9:

Chen & Kalbfleisch idea

+

The stationary HMMs form a class of finite mixture models with a Markovian property

⇒ Penalized minimum distance method to estimate the number of hidden states in an HMM (MacKay, 2002)

Page 10:

Penalized Distance

Let \{F(x, \theta), \theta \in \Theta\} be a family of density functions and let G(\theta) be a finite distribution function on \Theta. Then the density function of a finite mixture model is

F(x, G) = \sum_{j=1}^{k} p_j F(x, \theta_j)

The mixing distribution is

G(\theta) = \sum_{j=1}^{k} p_j I(\theta_j \le \theta)

Page 11:

The penalized distance is calculated as follows:

D(F_n, F(x, G)) = d(F_n, F(x, G)) - C_n \sum_{j=1}^{k} \log p_j

where d(F_n, F(x, G)) is the distance measure and -C_n \sum_{j=1}^{k} \log p_j is the penalty term.

C_n is a sequence of positive constants; Chen & Kalbfleisch used C_n = 0.01 n^{-1/2} \log n, where n is the number of observations. The penalty penalizes the overfitting of subpopulations whose estimated probabilities are close to zero or which differ only very slightly.

Page 12:

The empirical distribution function:

F_n(x) = (1/n) \sum_{i=1}^{n} I(X_i \le x)

Different distance measures d(F_1, F_2) can be used:

• The Kolmogorov-Smirnov distance
• The Cramér-von Mises distance
• The Kullback-Leibler distance
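Putting the last three slides together, here is a minimal sketch of the penalized distance using the Kolmogorov-Smirnov metric, the empirical CDF just defined, and a candidate Poisson mixture (SciPy is used for the Poisson CDF; the function name and the handling of the discrete ECDF are my own choices):

```python
import numpy as np
from scipy.stats import poisson

def penalized_distance(data, probs, means, c_n=None):
    """D(F_n, F(., G)) = d_KS(F_n, F(., G)) - C_n * sum_j log(p_j).

    data  : observed counts
    probs : mixing proportions p_1, ..., p_k of the candidate Poisson mixture
    means : Poisson means lambda_1, ..., lambda_k
    """
    data = np.sort(np.asarray(data))
    n = len(data)
    if c_n is None:
        c_n = 0.01 * n ** -0.5 * np.log(n)   # constant used by Chen & Kalbfleisch
    # Empirical CDF F_n evaluated just after and just before each data point.
    upper = np.arange(1, n + 1) / n
    lower = np.arange(0, n) / n
    # CDF of the fitted mixture, F(x, G) = sum_j p_j * Poisson(lambda_j) CDF.
    mix_cdf = sum(p * poisson.cdf(data, lam) for p, lam in zip(probs, means))
    ks = max(np.max(np.abs(upper - mix_cdf)), np.max(np.abs(lower - mix_cdf)))
    penalty = -c_n * np.sum(np.log(probs))   # positive, since log(p_j) <= 0
    return ks + penalty
```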

Page 13:

Application to Multiple Sclerosis Lesion Count Data

Patients afflicted with relapsing-remitting multiple sclerosis (MS) experience lesions on the brain stem, with symptoms typically worsening and improving in a somewhat cyclic fashion.

- Reasonable to assume that the distribution of the lesion counts depends on the patient's underlying disease activity.
- The sequence of disease states is hidden.
- Three patients, each of whom has monthly MRI scans for a period of 30 months.

Page 14:

Proposed model:

Y_{it} | Z_{it} ~ Poisson(\mu_{0 Z_{it}})

Y_{it} - the number of lesions observed on patient i at time t

Z_{it} - the associated disease state (unobserved)

\mu_{0 Z_{it}} - distinct Poisson means, one for each disease state

Page 15:

Results: Penalized minimum distances for different numbers of hidden states

Number of states | Estimated Poisson means       | Minimum distance
1                | 4.03                          | 0.1306
2                | 2.48, 6.25                    | 0.0608
3                | 2.77, 2.62, 7.10              | 0.0639
4                | 2.05, 2.96, 3.53, 7.75        | 0.0774
5                | 1.83, 3.21, 3.40, 3.58, 8.35  | 0.0959

The penalized distance is smallest for two states, so a two-state model is selected.

Page 16:

Estimates of the parameters of the hidden process

Initial probability matrix:

\hat{\pi}_0 = [0.594, 0.406]

Transition probability matrix:

\hat{P}_0 = [ 0.619  0.381
              0.558  0.442 ]

Page 17:

The performance of the penalized minimum distance method depends on:

• Number of components
• Sample size
• Separation of components
• Proportion of time in each state

Page 18:

1. Application of Decision Tree to HMM

[Diagram: the observed data sequence ... O_{t-1}, O_t, O_{t+1}, ... is labeled with Viterbi states L_j; a decision tree is trained on the labeled data and supplies the output probabilities Pr(L_j, q_t = s_i).]

Page 19:

The Simulated Hidden Markov model for the Multiple Sclerosis Lesion Count Data (Laverty et al., 2002)

Transition probability matrix:

           State 1   State 2
State 1    0.619     0.381
State 2    0.558     0.442

Initial probability matrix:

State 1: 0.594    State 2: 0.406

Mean vector:

State 1: 2.48    State 2: 6.25
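The matrices above fully specify the simulated model, so a short sketch can generate a comparable 30-month sequence (following the general setup in Laverty et al., 2002; the code and random seed are mine, so the numbers will not match the table on the next slide exactly):

```python
import numpy as np

rng = np.random.default_rng(0)

pi0 = np.array([0.594, 0.406])               # initial probabilities (State 1, State 2)
P = np.array([[0.619, 0.381],
              [0.558, 0.442]])               # transition probability matrix
mu = np.array([2.48, 6.25])                  # Poisson means for the two states

T = 30                                       # 30 monthly observations
states = np.zeros(T, dtype=int)
states[0] = rng.choice(2, p=pi0)
for t in range(1, T):
    states[t] = rng.choice(2, p=P[states[t - 1]])
counts = rng.poisson(mu[states])             # simulated lesion counts

print(states + 1)   # report states as 1/2 to match the slides
print(counts)
```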

Page 20:

Simulated number of lesions (counts) and states, in three columns of (count, state) pairs:

Count  State    Count  State    Count  State
  4      2        1      1        3      2
  3      2        4      2        4      2
  4      2        2      2        7      2
  7      2        0      1        0      1
  1      1        2      2        5      2
  1      1        1      1        3      2
  0      1        2      1        4      2
  1      1        3      2        6      2
  3      1        1      1        4      2
  2      1        4      2        1      2

Page 21:

How this works: tree construction

Greedy Tree Construction Algorithm

Step 0: Start with all labeled data.
Step 1: While the stopping condition is unmet do:
Step 2:   Find the best split threshold over all thresholds and dimensions.
Step 3:   Send data to the left or right child depending on the threshold test.
Step 4: Recursively repeat steps 1-4 for the left and right children.
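A minimal sketch of this greedy procedure for a one-dimensional feature (such as the lesion counts labeled with Viterbi states). The stopping rule (pure or small node), the Gini impurity used as the split score, and the majority-class labeling are my own simple choices for illustration:

```python
import numpy as np

def gini(y):
    """Node impurity: 1 - sum_w p_w**2 over the class proportions in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def grow_tree(x, y, min_size=2):
    """Greedy binary tree on a 1-D feature x with class labels y (NumPy arrays)."""
    # Stopping rule: pure node, too few points, or nothing left to split on.
    if len(set(y)) == 1 or len(y) < min_size or len(set(x)) == 1:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": True, "label": values[np.argmax(counts)]}
    # Splitting rule: try every threshold, keep the lowest weighted impurity.
    best_score, best_thr = None, None
    for thr in sorted(set(x))[:-1]:
        left, right = y[x <= thr], y[x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best_score is None or score < best_score:
            best_score, best_thr = score, thr
    return {"leaf": False, "threshold": best_thr,
            "left": grow_tree(x[x <= best_thr], y[x <= best_thr], min_size),
            "right": grow_tree(x[x > best_thr], y[x > best_thr], min_size)}
```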

Page 22:

Three rules characterize a tree-growing strategy:

A splitting rule: determines where the decision threshold is placed, given the data in a node.

A stopping rule: determines when recursion ends. This is the rule that determines whether a node is a leaf node.

A labeling rule: assigns some value or class label to every leaf node. For the tree considered here, leaves will be associated (labeled) with the state-conditional output probabilities used in the HMM.

Page 23:

Splitting Rules

Entropy criterion: the attribute with the highest information gain is used to split.

The entropy of the set S (in bits):

Info(S) = - \sum_{i=1}^{m} (freq(C_i, S) / |S|) \log_2 (freq(C_i, S) / |S|)

where |S| is the size of S. After a test X splits T into subsets T_1, ..., T_k:

Info_X(T) = \sum_{i=1}^{k} (|T_i| / |T|) Info(T_i)

Gain(X) = Info(T) - Info_X(T)
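The same quantities written out as a short sketch (freq(C_i, S) is just the count of class C_i in S; the function names are mine):

```python
import numpy as np

def info(y):
    """Info(S) = - sum_i freq(C_i, S)/|S| * log2(freq(C_i, S)/|S|), in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain(y, subsets):
    """Gain(X) = Info(T) - Info_X(T), where the split X partitions T into subsets T_i."""
    n = sum(len(s) for s in subsets)
    info_x = sum(len(s) / n * info(s) for s in subsets)
    return info(y) - info_x

# Example: splitting the 30 lesion counts at "count <= 2" would pass the full
# list of state labels as y and the two label subsets as `subsets`.
```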

Page 24:

GINI criterion: the attribute giving the smallest value of the GINI index is used to split.

The GINI criterion for a split into L new nodes is calculated by the following formula:

G(L) = \sum_{l=1}^{L} (N_l / N) (1 - \sum_{w=1}^{K} (N_{wl} / N_l)^2)

where N is the number of observations in the initial node, N_{wl} is the number of observations of the w-th class in the l-th node, and N_l is the number of observations in the l-th new node.
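And the GINI criterion as a sketch, counting N_{wl} directly from the class labels in each new node (names are mine):

```python
import numpy as np

def gini_split(nodes):
    """G(L) = sum_l (N_l / N) * (1 - sum_w (N_wl / N_l)**2).

    `nodes` is a list of label arrays, one per new node l.
    """
    N = sum(len(node) for node in nodes)
    total = 0.0
    for node in nodes:
        _, counts = np.unique(node, return_counts=True)
        p = counts / len(node)
        total += (len(node) / N) * (1.0 - np.sum(p ** 2))
    return total
```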

Page 25:

Decision Tree

[Diagram: the lesion count data are split at the root node; Count <= 2 leads to State 1 and Count > 2 leads to State 2.]

Page 26:

Decision Rule:

If count <= 2 then Classification = State 1, else Classification = State 2.
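The rule can be applied directly to the simulated counts from Page 20; in the sketch below the lists are just those 30 (count, state) pairs typed in column by column, and the agreement figure is computed at run time, not quoted from the slides:

```python
def classify(count):
    """Decision rule from the tree: count <= 2 -> State 1, otherwise State 2."""
    return 1 if count <= 2 else 2

# Counts and simulated states from Page 20, read down each column pair.
counts = [4, 3, 4, 7, 1, 1, 0, 1, 3, 2,
          1, 4, 2, 0, 2, 1, 2, 3, 1, 4,
          3, 4, 7, 0, 5, 3, 4, 6, 4, 1]
states = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
          2, 2, 2, 1, 2, 1, 1, 2, 1, 2,
          2, 2, 2, 1, 2, 2, 2, 2, 2, 2]

predicted = [classify(c) for c in counts]
agreement = sum(p == s for p, s in zip(predicted, states)) / len(states)
print(predicted)
print(f"agreement with the simulated states: {agreement:.0%}")
```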

Page 27:

Decision Tree classification of States (three columns of triples: count, simulated state, classification according to the decision tree):

Count  State  Classification    Count  State  Classification    Count  State  Classification
  4      2        2               1      1        1               3      2        2
  3      2        2               4      2        2               4      2        2
  4      2        2               2      2        1               7      2        2
  7      2        2               0      1        1               0      1        1
  1      1        1               2      2        1               5      2        2
  1      1        1               1      1        1               3      2        2
  0      1        1               2      1        1               4      2        2
  1      1        1               3      2        2               6      2        2
  3      1        1               1      1        1               4      2        2
  2      1        1               4      2        2               1      2        1

Page 28:

Given the state, the state-conditional output probability at time t in state S_i is

Pr(O_t | q_t = S_i)

From the labeled tree one can estimate the probability that a given state emitted a certain observation.
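A sketch of that estimation step: with observations labeled by state (for example the Viterbi-labeled counts, or the tree classifications from the previous sketch), the state-conditional output probabilities can be estimated by relative frequencies within each state; the variable names are mine.

```python
from collections import Counter, defaultdict

def state_conditional_probs(observations, states):
    """Estimate Pr(O_t = o | q_t = S_i) by relative frequency within each state."""
    by_state = defaultdict(list)
    for o, s in zip(observations, states):
        by_state[s].append(o)
    probs = {}
    for s, obs in by_state.items():
        counts = Counter(obs)
        n = len(obs)
        probs[s] = {o: c / n for o, c in counts.items()}
    return probs

# e.g. state_conditional_probs(counts, predicted) with the lists from the
# previous sketch gives an estimated output distribution for each state.
```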

Page 29:

2. Application of Decision Tree to HMM

[Diagram: the observed data sequence ... O_{t-1}, O_t, O_{t+1}, ... is fed to a decision tree, which selects the simplest possible model for the given data.]

Page 30:

Decision Tree

The splitting criterion can depend on several things:

• Type of observed data (independent / autoregressive)

• Type of the transition probabilities (balanced / unbalanced among the states)

• Separation of components (well separated or close together)

Page 31:

[Diagram: a decision tree for choosing the model. The root examines the observed data: independent vs. autoregressive (checked with a Durbin-Watson test). Further splits consider whether the transition probabilities are balanced or unbalanced among the states and whether the components are well separated (S) or close together (C); some leaves are left undetermined (?).]

Page 32:

Advantages of Decision Tree

• Trees can handle high-dimensional spaces gracefully.

• Because of the hierarchical nature, finding a tree-based output probability given the output is extremely fast.

• Trees can cope with categorical as well as continuous data.

Page 33:

Disadvantages of Decision Tree

• The set of class boundaries is relatively inelegant (rough).

• A decision tree model is non-parametric and has many more free parameters than a parametric model of similar power. It therefore requires more storage, and a large amount of training data is needed to obtain good estimates.

Page 34:

References:

• Foote, J.T., Decision-Tree Probability Modeling for HMM Speech Recognition, Ph.D. Thesis, Division of Engineering, Brown University, RI, USA, 1993.

• Kantardzic, M., Data Mining: Concepts, Models, Methods and Algorithms, New York; Chichester: Wiley, 2003.

• Laverty, W.H., M.J. Miket and I.W. Kelly, Simulation of hidden Markov models with Excel, The Statistician, Volume 51, Part 1, pp. 31-40, 2002.

• MacKay, R.J., Estimating the order of a hidden Markov model, The Canadian Journal of Statistics, Vol. 30, pp. 573-589, 2002.

Page 35:

Thanks to

Prof. M.J. Miket

and my supervisor

Prof. W.H. Laverty

for giving me valuable advice and encouragement to make this presentation a success.