Hidden Markov Model for Sequential Data
Dr.-Ing. Michelle Karg
[email protected]
Electrical and Computer Engineering, Cheriton School of Computer Science

Hidden Markov Model - Western University
dlizotte/teaching/cs886_slides/MKarg_hmm_slides.pdf
[5] L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition

Jun 25, 2020

Transcript
Page 1:

Hidden Markov Model for Sequential Data

Dr.-Ing. Michelle Karg

[email protected]

Electrical and Computer Engineering, Cheriton School of Computer Science

Page 2:

Sequential Data

• Measurement of time series:

• Others:

– Characters in a sentence
– Nucleotide base pairs along a strand of DNA

Example: Speech data [1]

Example: Motion data

Page 3:

Sequential Data

• Characteristics:
  – Dependence on previous observations; more recent observations are likely to be more relevant
  – Stationary versus nonstationary sequential distributions (stationary: the generative distribution does not evolve with time)

• Tasks:

– Predict the next value in a time series
– Classify time series

Markov property:

$P(q_t = S_i \mid q_{t-1} = S_j, q_{t-2} = S_k, \ldots) = P(q_t = S_i \mid q_{t-1} = S_j)$

Page 4:

Methods

Deterministic models:
• Frequency analysis
• Statistical features (e.g., mean) + classification
• Dynamic time warping

Probabilistic models:
• Hidden Markov Models

Page 5:

Frequency Analysis

• Fourier transform – Amplitude of frequency

• Pro: – Visualization

• Disadvantage:

– No information about previous state

Example: speech data [1]
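As a sketch of this idea, the one-sided amplitude spectrum of a sampled signal can be computed with NumPy's FFT. The signal and sampling rate below are invented for illustration, not taken from the speech example:

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """One-sided amplitude spectrum of a real-valued signal x sampled at fs Hz."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    amps = np.abs(X) / len(x)  # normalize by signal length
    return freqs, amps

# Toy signal: a 5 Hz sine sampled at 100 Hz for 2 seconds
fs = 100
t = np.arange(0, 2, 1.0 / fs)
x = np.sin(2 * np.pi * 5 * t)
freqs, amps = amplitude_spectrum(x, fs)
print(freqs[np.argmax(amps)])  # dominant frequency: 5.0 Hz
```

Note the disadvantage stated above: the spectrum tells us which frequencies are present, but not in which order the states occurred.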

Page 6:

Statistical Features

• Transformation of the time series into a set of features → conventional classification
• Example: Emotion recognition in gait [2]
  – Step length, time, velocity: 84 % (NN)
  – Min, mean, max: 93 % (Naive Bayes)

Example [2]

Time series [2]
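A minimal sketch of this transformation, computing windowed min/mean/max descriptors with NumPy. The helper `window_features` and the toy signal are illustrative assumptions, not the feature set of [2]:

```python
import numpy as np

def window_features(x, window):
    """Split a time series into non-overlapping windows and compute
    simple statistical descriptors (min, mean, max) per window."""
    n = len(x) // window
    w = np.asarray(x[:n * window]).reshape(n, window)
    return np.column_stack([w.min(axis=1), w.mean(axis=1), w.max(axis=1)])

x = np.sin(np.linspace(0, 4 * np.pi, 100))
feats = window_features(x, 25)
print(feats.shape)  # (4, 3): 4 windows, 3 descriptors each
```

The resulting feature matrix can then be fed to any conventional classifier (e.g., Naive Bayes, as on the slide).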

Page 7:

Statistical Features

• Questions:
  – Which descriptors to calculate? → Feature selection
  – Window size?

(Figure: windowed minimum, maximum, and median of a time series)

Page 8:

Statistical Features

• Questions:
  – Which descriptors to calculate? → Feature selection
  – Window size?
• Pro:
  – Simple approach, fast
• Disadvantage:
  – Easily distorted by noise

(Figure: windowed minimum, maximum, and median of a time series)

Page 9:

Dynamic Time Warping

• Similarity measure between two sequences: spatio-temporal correspondence
• Minimize the error between sequence and reference

Illustration [3]

Aligned points

Page 10:

Dynamic Time Warping

• Computation:
  1. Local cost measure
     – Distance measure (e.g., Euclidean, Manhattan)
     – Sampled at equidistant points in time

Cost matrix C for time series X and Y [6]

Page 11:

Dynamic Time Warping

• Computation:
  1. Local cost measure
     – Distance measure (e.g., Euclidean, Manhattan)
     – Sampled at equidistant points in time

Cost matrix C for time series X and Y [6]

Low cost

Page 12:

Dynamic Time Warping

• Computation:
  1. Local cost measure
     – Distance measure (e.g., Euclidean, Manhattan)
     – Sampled at equidistant points in time

Cost matrix C for time series X and Y [6]

Low cost

Optimal path?

Page 13:

Dynamic Time Warping

• 2. Find the optimal warping path $p = (p_1, \ldots, p_L)$ with $p_\ell = (n_\ell, m_\ell)$:
  – Boundary condition: $p_1 = (1, 1)$ and $p_L = (N, M)$
  – Monotonicity condition: $n_1 \le n_2 \le \ldots \le n_L$ and $m_1 \le m_2 \le \ldots \le m_L$
  – Step size condition

Which figure fulfills all conditions? [6]

Page 14:

Dynamic Time Warping

• Result: Optimal warping path

• Accumulated cost matrix D:

Cost matrix C [6] Accumulated cost matrix D [6]

$D(n, m) = \min\{D(n-1, m-1), D(n-1, m), D(n, m-1)\} + c(n, m)$
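The accumulated-cost recurrence above translates directly into an O(NM) dynamic program. A minimal sketch, assuming the absolute difference as the local cost c(n, m) = |x[n] − y[m]| (one common choice for 1-D series):

```python
import numpy as np

def dtw(x, y):
    """Accumulated cost matrix D and optimal DTW cost for 1-D series x, y."""
    N, M = len(x), len(y)
    C = np.abs(np.subtract.outer(x, y))  # local cost matrix c(n, m)
    D = np.full((N, M), np.inf)
    D[0, 0] = C[0, 0]
    for n in range(N):
        for m in range(M):
            if n == 0 and m == 0:
                continue
            # D(n, m) = min of the three allowed predecessors + local cost
            best = min(D[n-1, m-1] if n > 0 and m > 0 else np.inf,
                       D[n-1, m]   if n > 0 else np.inf,
                       D[n, m-1]   if m > 0 else np.inf)
            D[n, m] = best + C[n, m]
    return D, D[-1, -1]

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])  # same shape, different speed
D, cost = dtw(x, y)
print(cost)  # 0.0: perfect alignment despite the different lengths
```

The optimal warping path itself can be recovered by backtracking through D from (N, M) to (1, 1).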

Page 15:

Dynamic Time Warping

• Pro:
  – Very accurate
  – Copes with different speeds
  – Can be used for generation

Generation: Morphing [3]

• Disadvantages:
  – Alignment of segments? (e.g., different lengths)
  – Computationally intensive
  – Usually applied to low-dimensional data (1-dim.)

Page 16:

Methods

Deterministic models:
• Frequency analysis
• Statistical features (e.g., mean) + classification
• Dynamic time warping

Probabilistic models:
• Hidden Markov Models

Page 17:

Hidden Markov Model

• Sequence of hidden states
• Observations in each state
• Markov property
• Parameters: transition matrix, observation model, prior
• [5] "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"

Concept of HMM [4]

Page 18:

Hidden Markov Model

• Topology of transition matrix • Model for the observations • Methodology (3 basic problems) • Implementation Issues

Page 19:

Topology of Transition Matrix A

• Markov chain: considers the previous state!
• Transition matrix A:
  – Transitions of the hidden states
• Topologies:
  – Ergodic or fully connected
  – Left-right or Bakis model (cyclic, noncyclic)
  – Note: the more "0" entries, the faster the computation!
• What happens if ...
  – All entries of A are equal?
  – All entries in a row/column are zero except for the diagonal?

Topology of a Markov Chain (figure: five-state chain, states 1–5)

$0 \le a_{ij} \le 1$
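The topologies above differ only in which entries of A are forced to zero. A minimal sketch of how a left-right (Bakis) transition matrix could be constructed; the helper `left_right_A` and its uniform initialization are illustrative assumptions, not part of the slides:

```python
import numpy as np

def left_right_A(N, max_jump=1):
    """Left-right (Bakis) transition matrix: from state i you may stay
    or move forward by up to `max_jump` states; all other entries are 0."""
    A = np.zeros((N, N))
    for i in range(N):
        hi = min(i + max_jump, N - 1)
        A[i, i:hi + 1] = 1.0 / (hi - i + 1)  # uniform over allowed moves
    return A

A = left_right_A(4)
print(A)
# Every row sums to 1; the lower triangle is all zeros (no backward transitions),
# which is exactly the sparsity that speeds up computation.
```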

Page 20:

Example for Markov Chain

• Given: 3 states and A
  – State 1: rain or snow, state 2: cloudy, state 3: sunny

• Questions:

– If the sun shines now, what is the most probable weather for tomorrow?

– What is the probability that the weather for the next six days will be "sun – rain – rain – sun – cloudy – sun"

given that the sun shines today?

$A = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$
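Both questions can be answered directly from A by chaining transition probabilities. A minimal check (states indexed from 0 here rather than 1):

```python
import numpy as np

# Transition matrix from the weather example
# (state 1: rain/snow, state 2: cloudy, state 3: sunny)
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

RAIN, CLOUDY, SUN = 0, 1, 2

# Q1: most probable weather tomorrow, given sun today
print(np.argmax(A[SUN]))  # 2 -> sunny again (p = 0.8)

# Q2: P(sun, rain, rain, sun, cloudy, sun | sun today):
# multiply the transition probabilities along the sequence
seq = [SUN, RAIN, RAIN, SUN, CLOUDY, SUN]
p, state = 1.0, SUN
for nxt in seq:
    p *= A[state, nxt]
    state = nxt
print(p)  # 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 ≈ 1.92e-4
```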

Page 21:

• Markov chain: states are observable
• HMM: states are not observable, only the observations
• Observations are either
  – Discrete, e.g., icy – cold – warm
  – Continuous, e.g., temperature

Hidden Markov Model

Comparison of Markov Chain and HMM [4]

Page 22:

HMM – Discrete Observations

• M distinct observation symbols per state
  → Vector quantization of continuous data
• Observation matrix B

(Figure: observation matrix B with rows i = 1, 2, 3 (states) and columns j = 1, 2, 3 (symbols))

Page 23:

Continuous Density HMM

• Example: Identification using gait [7]
  – Extract silhouette from video
  – FED vector for observation:
    • 5 stances $e_n$
    • Distance to each stance: $fed_n(t) = d(x(t), e_n)$
    • Distance: Gaussian distributed

Width vector profile during gait steps [7]

Gait as biometric [7]

FED vector components during a step [7]

Page 24:

Design of an HMM

• An HMM is characterized by
  – The number of states: N
  – The number of distinct observation symbols: M (discrete observations only!)
  – The state transition probabilities: A
  – The observation probability distributions
  – The initial state distribution: π

• A model is described by the parameter set λ

$\lambda = (A, B, \pi)$

Page 25:

3 Basic Problems

1. Learning. Given:
  – Number of states N
  – Number of observations M
  – Structure of the model
  – Set of training observations

How to estimate the probability matrices A and B?

Solution: Baum-Welch algorithm (it can get stuck in local maxima, and the results depend on the initial estimates of A and B)

Application: required for any HMM

Page 26:

Similarity Measure for HMMs

• Kullback-Leibler divergence (why not a metric?)
• Example: Movement imitation in robotics
  – Encode observed behavior as an HMM
  – Calculate the Kullback-Leibler divergence: existing or new behavior?
  – Build a tree of human motions

General Concept [8]

Clustering human movement [8]

Page 27:

3 Basic Problems

2. Evaluation. Given:
  – Trained HMM: $\lambda = (A, B, \pi)$
  – Observation sequence: $V = [v(1), v(2), \ldots, v(T)]$

What is the conditional probability P(V|λ) that the observation sequence V was generated by the model λ?

Solution: Forward-backward algorithm (direct calculation of P(V|λ) would be too computationally intensive)

Application: classification of time series
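The forward pass of the forward-backward algorithm computes P(V|λ) in O(N²T) rather than summing over all Nᵀ state sequences. A minimal sketch for a discrete HMM; the 2-state, 2-symbol model below is invented for illustration:

```python
import numpy as np

def forward(V, A, B, pi):
    """Forward algorithm: P(V | lambda) for a discrete HMM with
    transition matrix A (N x N), observation matrix B (N x M), prior pi."""
    alpha = pi * B[:, V[0]]               # initialization
    for t in range(1, len(V)):
        alpha = (alpha @ A) * B[:, V[t]]  # induction
    return alpha.sum()                    # termination

# Tiny 2-state, 2-symbol model (made up for illustration)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(forward([0, 1, 0], A, B, pi))  # likelihood of observing 0, 1, 0
```

For classification (as on the next page), this likelihood is computed once per class model and the class with the larger P(V|λ) wins.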

Page 28:

Classification of Time Series

• Example: Happy versus neutral gait
• Concept: An HMM is trained for each class:
  $\lambda_h = (A_h, B_h, \pi_h)$ and $\lambda_{neu} = (A_{neu}, B_{neu}, \pi_{neu})$ (Baum-Welch)
• Calculation of the probabilities $P(V \mid \lambda_h)$ and $P(V \mid \lambda_{neu})$ for sequence V (Forward-Backward)
• Comparison: $P(V \mid \lambda_h) \stackrel{?}{>} P(V \mid \lambda_{neu})$

Concept of HMM [4]

Page 29:

3 Basic Problems

3. Decoding. Given:
  – Trained HMM: $\lambda = (A, B, \pi)$
  – Observation sequence: $V = [v(1), v(2), \ldots, v(T)]$

What is the most likely sequence of hidden states?

Solution: Viterbi algorithm

Application: activity recognition
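The Viterbi algorithm replaces the forward pass's sum over predecessor states with a max, and backtracks to recover the best state sequence. A minimal sketch in log space (the small model below is invented for illustration):

```python
import numpy as np

def viterbi(V, A, B, pi):
    """Most likely hidden state sequence for a discrete HMM
    (log-space to avoid underflow on long sequences)."""
    N, T = A.shape[0], len(V)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, V[0]]
    psi = np.zeros((T, N), dtype=int)       # best predecessor per state
    for t in range(1, T):
        scores = delta[:, None] + logA      # scores[i, j]: come from i into j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, V[t]]
    # Backtrack from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
print(viterbi([0, 0, 1, 1], A, B, pi))  # -> [0, 0, 1, 1]
```

In an activity-recognition setting, each decoded state would correspond to an activity phase.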

Page 30:

Implementation Issues

• Scaling
  – Rescale the forward and backward variables so that the computed values do not exceed the machine's precision range
• Multiple observation sequences
  – Training
• Initial estimates of the HMM parameters
  – Random or uniform initialization of π and A is adequate
  – Observation distributions: a good initial estimate is crucial
• Choice of the model
  – Topology of the Markov chain
  – Observations: discrete or continuous?
  – Long or short observations?
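One standard way to realize this rescaling is to run the forward recursion entirely in log space, replacing sums of products with log-sum-exp. A minimal sketch; the model below is invented, and `logsumexp` is a hand-rolled helper rather than a library call:

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def log_forward(V, A, B, pi):
    """Forward algorithm in log space: log P(V | lambda) without the
    underflow that plain probability products suffer on long sequences."""
    logA, logB = np.log(A), np.log(B)
    la = np.log(pi) + logB[:, V[0]]
    for t in range(1, len(V)):
        la = logsumexp(la[:, None] + logA, axis=0) + logB[:, V[t]]
    return logsumexp(la, axis=0)

A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
V = [0, 1] * 500   # 1000 observations: plain products would underflow to 0
print(log_forward(V, A, B, pi))  # finite log-likelihood
```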

Page 31:

Implementation

An HMM can be used to
• Estimate a state sequence
• Classify sequential data
• Predict the next value
• Build a generative model (e.g., application in robotics for motion imitation)

Real-world issues:
• Incomplete sequences
• Data differing in length

(Diagram: pipeline stages – data preprocessing, filtering, segmentation, feature selection, HMM)

Page 32:

References

[1] C. Bishop, Pattern Recognition and Machine Learning, Springer, 2009.
[2] M. Karg, K. Kuehnlenz, M. Buss, Recognition of Affect Based on Gait Patterns, IEEE Transactions on SMC – Part B, 2010, 40(4), p. 1050–1061.
[3] W. Ilg, G. Bakur, J. Mezger, M.A. Giese, On the Representation, Learning, and Transfer of Spatio-Temporal Movement Characteristics, International Journal of Humanoid Robotics, 2004, 1(4), p. 613–636.
[4] M. Karg, Pattern Recognition Algorithms for Gait Analysis with Application to Affective Computing, Doctoral thesis, Technical University of Munich, 2012.
[5] L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989, 77(2), p. 257–286.
[6] M. Müller, Information Retrieval for Music and Motion, Ch. 4, Springer, 2007.
[7] A. Kale et al., Identification of Humans Using Gait, IEEE Transactions on Image Processing, 2004, 13(9), p. 1163–1173.
[8] D. Kulic, W. Takano and Y. Nakamura, Incremental Learning, Clustering and Hierarchy Formation of Whole Body Motion Patterns using Adaptive Hidden Markov Chains, The International Journal of Robotics Research, 2008, 27, p. 761–784.