Hidden Markov Models (HMMs) Dhiraj DSG-MVL
Transcript
  • Hidden Markov Models (HMMs)

    Dhiraj

    DSG-MVL

  • Future is independent of the past, given the present

    Used to model an extraordinarily large number of applications involving temporal or sequential data, e.g. weather, finance, language, and music; it deals with how the world evolves over time

    Andrei Andreyevich Markov (1856–1922)

  • MARKOV CHAINS

  • Markov Chain: Auto Insurance Example

  • Generics

  • Markov Chain: Auto Insurance Example

    The power of a Markov chain: it allows us to travel many steps into the future

  • Markov Chain: Free Throw Confidence

  • Markov Chain: Free Throw Confidence Transitions

  • Markov Chain: Transition Matrix

  • TRANSITION DIAGRAM: EXAMPLE 1

  • TRANSITION DIAGRAM: EXAMPLE 2

  • TRANSITION DIAGRAM: EXAMPLE 3

    Relative Probability

  • MARKOV CHAIN

    Types of states: transient (ephemeral), recurrent, absorbing

  • [Matrix diagram: rows are the current states, columns are the states being gone to]

  • System Behavior

    [Diagram: the initial state vector (the system starts in "Arriving"); multiplying by the transition matrix gives the distribution over states such as "Playing on Phone" and "Paying Attention" after one time unit, and after n time units]

  • System Behavior

    [Diagram: the distribution after two time units over "Playing on Phone", "Paying Attention", "Writing Notes", and "Kicked Out"]

  • System Behavior

    [Diagram: the distribution after 100 time units over "Arriving", "Playing on Phone", "Paying Attention", "Writing Notes", "Listening", and "Kicked Out"]

  • Markov Model

    A Markov model is a type of stochastic process, sometimes referred to as a chain

    The model is similar to a finite state machine (FSM), except that it moves between states probabilistically rather than deterministically: it is nondeterministic, where the FSM is deterministic

  • Markov Models

    A discrete (finite) system:

    N distinct states.

    Begins (at time t = 1) in some initial state(s).

    At each time step (t = 1, 2, ...) the system moves from the current state to the next state according to transition probabilities associated with the current state.

    This kind of system is called a finite, or discrete, Markov model

  • Markov Models

    Set of states: {s1, s2, ..., sN}

    The process moves from one state to another, generating a sequence of states si1, si2, ..., sik, ...

    Markov chain property: the probability of each subsequent state depends only on the previous state:

    P(sik | si1, si2, ..., sik-1) = P(sik | sik-1)

    To define a Markov model, the following probabilities have to be specified: transition probabilities aij = P(si | sj) and initial probabilities πi = P(si)

    The output of the process is the sequence of states at each instant of time
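The definition above translates directly into a simulator: pick a start state from the initial probabilities, then repeatedly pick the next state from the current state's row of transition probabilities. A minimal sketch; the two-state chain and its numbers below are made-up illustrations, not values from the slides:

```python
import random

# Illustrative 2-state chain (numbers are assumptions, not from the slides):
# a[i][j] = transition probability P(next = s_j | current = s_i)
states = ["s1", "s2"]
a = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [0.8, 0.2]  # initial probabilities P(s_i)

def sample_chain(n_steps, rng):
    """Generate a state sequence: the output of the process."""
    i = rng.choices(range(len(states)), weights=pi)[0]
    seq = [states[i]]
    for _ in range(n_steps - 1):
        # The next state depends only on the current state (Markov property).
        i = rng.choices(range(len(states)), weights=a[i])[0]
        seq.append(states[i])
    return seq

print(sample_chain(10, random.Random(0)))
```

The sequence of visited states is exactly the "output of the process" the slide describes.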

  • Markov Property

    Markov Property: the state of the system at time t+1 depends only on the state of the system at time t:

    P[Xt+1 = xt+1 | Xt = xt, Xt-1 = xt-1, ..., X1 = x1, X0 = x0] = P[Xt+1 = xt+1 | Xt = xt]

    [Diagram: a chain Xt=1 → Xt=2 → Xt=3 → Xt=4 → Xt=5]


  • A Markov System

    Has N states, called s1, s2 .. sN; here N = 3 (s1, s2, s3)

    There are discrete timesteps, t = 0, 1, ...

    On the t-th timestep the system is in exactly one of the available states; call it qt. Note: qt ∈ {s1, s2 .. sN}

    Between each timestep, the next state is chosen randomly; the current state determines the probability distribution for the next state. At t = 1 the current state is qt = q1 = s2

    P(qt+1=s1 | qt=s1) = 0, P(qt+1=s2 | qt=s1) = 0, P(qt+1=s3 | qt=s1) = 1

    P(qt+1=s1 | qt=s2) = 1/2, P(qt+1=s2 | qt=s2) = 1/2, P(qt+1=s3 | qt=s2) = 0

    P(qt+1=s1 | qt=s3) = 1/3, P(qt+1=s2 | qt=s3) = 2/3, P(qt+1=s3 | qt=s3) = 0

    Often notated with arcs between states, labelled with these probabilities (1/2, 1/2, 1/3, 2/3, 1)
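These transition probabilities fully determine how a distribution over states evolves: one timestep is a vector-matrix product of the current distribution with the transition matrix. A small sketch using the three-state system from this slide:

```python
# Transition matrix for the 3-state system above:
# row i holds P(next = s_j | current = s_i) for j = 1..3
P = [
    [0,   0,   1],    # from s1: always to s3
    [1/2, 1/2, 0],    # from s2
    [1/3, 2/3, 0],    # from s3
]

def step(dist, P):
    """One timestep: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# At t = 1 the system is in s2 with certainty (q1 = s2).
dist = [0.0, 1.0, 0.0]
dist = step(dist, P)   # distribution at t = 2
print(dist)            # [0.5, 0.5, 0.0]
dist = step(dist, P)   # distribution at t = 3
print(dist)            # [0.25, 0.25, 0.5]
```

Repeating `step` carries the distribution arbitrarily far into the future, which is the "travel many steps" property of a Markov chain.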

  • Markov Property

    qt+1 is conditionally independent of {qt-1, qt-2, ..., q1, q0} given qt.

    In other words: P(qt+1 = sj | qt = si) = P(qt+1 = sj | qt = si, any earlier history)

    (N = 3, t = 1, qt = q1 = s2, with the same transition probabilities as above)

  • Hidden Markov Models

    (probabilistic finite state automata)

    Often we face scenarios where states cannot be directly observed.

    We need an extension: Hidden Markov Models

    [Diagram: four states 1, 2, 3, 4 with self-transitions a11, a22, a33, a44 and forward transitions a12, a23, a34; each state emits the observed phenomenon with output probabilities b11, b12, b13, b14, ...]

    aij are state transition probabilities.

    bik are observation (output) probabilities: b11 + b12 + b13 + b14 = 1, b21 + b22 + b23 + b24 = 1, etc.

  • Hidden Markov Models - HMM

    [Diagram: a chain of hidden variables H1 → H2 → ... → HL-1 → HL, where each hidden variable Hi emits an observed datum Xi (X1, X2, ..., XL-1, XL)]

  • Definition of Hidden Markov Model

    The Hidden Markov Model (HMM) is a finite set of states, each of which is associated with a probability distribution. A Hidden Markov model is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states.

    Transitions among the states are governed by a set of probabilities called transition probabilities.

    In a particular state, an outcome or observation can be generated according to the associated probability distribution.

    Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the observer, hence the name Hidden Markov Model.

  • Hidden Markov Models

    A Hidden Markov model is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states.

    In regular Markov models the state is directly visible, so the state transition probabilities are the only parameters; in an HMM the state is not visible, but the output is.

  • Hidden Markov Model

    Consider a discrete-time Markov process: a system that may be described at any time as being in one of a set of N distinct states.

    At regularly spaced, discrete times, the system undergoes a change of state according to a set of probabilities associated with the state.

    We denote the time instants associated with state changes as t = 1, 2, ..., and the actual state at time t as qt.

  • Essentials

    To define a hidden Markov model, the following probabilities have to be specified: a matrix of transition probabilities A = (aij), aij = P(si | sj), a matrix of observation probabilities B = (bi(vm)), bi(vm) = P(vm | si), and a vector of initial probabilities π = (πi), πi = P(si). The model is represented by M = (A, B, π).
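The triple M = (A, B, π) can be written down directly as arrays. A minimal sketch with made-up numbers (the 2-state, 2-symbol values below are assumptions for illustration only), including a check that each piece really is a probability distribution:

```python
# Hypothetical 2-state, 2-symbol HMM M = (A, B, pi); numbers are illustrative.
A  = [[0.7, 0.3],    # A[i][j]: transition probabilities between states
      [0.4, 0.6]]
B  = [[0.9, 0.1],    # B[i][m]: probability that state i emits symbol v_m
      [0.2, 0.8]]
pi = [0.6, 0.4]      # pi[i]: initial probability of state i

def is_stochastic(rows):
    """Every row must be non-negative and sum to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < 1e-9
        for row in rows
    )

assert is_stochastic(A) and is_stochastic(B) and is_stochastic([pi])
```

The stochasticity check is worth keeping around: every row of A and B, and the vector π itself, must sum to 1 for the model to be valid.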

  • Hidden Markov Model


  • Discrete Markov Model: Example

    A discrete Markov model with 5 states.

    Each aij represents the probability of moving from state i to state j.

    The aij are given in a matrix A = {aij}.

    The probability of starting in a given state i is πi; the vector π represents these start probabilities.

  • Overview

    Mathematical notation

    Example: flow chart

    Mathematical Notation

    To obtain the conditional probability of reaching a particular state based on the previous state, for X1, X2, ..., Xn where X1 represents the variable at time 1:

    P[Xn+1 = j | Xn = i] = P(i,j)

    i.e., the probability that, given the system is in state i, it will move to state j

  • Mathematical Notation

    Probability matrix:

        P(0,0)  P(0,1)  P(0,2)
        P(1,0)  P(1,1)  P(1,2)
        P(2,0)  P(2,1)  P(2,2)

    P(0,0): probability of moving from state 0 to state 0

    For the three-state example (states 0, 1, 2), the transition diagram gives:

        0.5  0.5  0
        0    0    1.0
        0.3  0.4  0.3
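Chaining P(i,j) lookups gives multi-step probabilities: the chance of going from i to j in two steps is the sum over intermediate states k of P(i,k)·P(k,j). A quick sketch with the three-state matrix above:

```python
# Probability matrix from the example: P[i][j] = P(X_{n+1} = j | X_n = i)
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.3, 0.4, 0.3],
]

def two_step(P, i, j):
    """P(X_{n+2} = j | X_n = i): sum over intermediate states k."""
    return sum(P[i][k] * P[k][j] for k in range(len(P)))

# State 1 always moves to state 2, so two steps from state 1 follow row 2:
print(two_step(P, 1, 0))  # 0.3
```

This is exactly one entry of the matrix product P·P; higher powers of P give probabilities further into the future.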

  • Example: Orange Juice

    Assumption: a family of four buys orange juice once a week

    A = someone using Brand A, A' = someone using another brand

    Transition probability matrix (rows: current state, columns: next state):

             A    A'
    P = A   0.9  0.1
        A'  0.7  0.3

    So = initial state distribution matrix = [0.2  0.8] over (A, A')

  • Example: Orange Juice

    [Transition diagram: Start → A with 0.2 and Start → A' with 0.8; A → A 0.9, A → A' 0.1; A' → A 0.7, A' → A' 0.3]

    To find the probability that someone uses Brand A after one week:

    P(Brand A after 1 wk) = (0.2)(0.9) + (0.8)(0.7) = 0.74

    So = [0.2  0.8] (initial state distribution matrix), S1 = [0.74  0.26] over (A, A')
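The one-week update above is just the vector-matrix product S1 = S0 · P, and repeating it gives any later week. A small check of the slide's arithmetic:

```python
# Transition matrix over (Brand A, other brand) and initial distribution
P  = [[0.9, 0.1],
      [0.7, 0.3]]
S0 = [0.2, 0.8]

def next_week(S, P):
    """S_{k+1}[j] = sum_i S_k[i] * P[i][j]."""
    return [sum(S[i] * P[i][j] for i in range(len(S))) for j in range(len(P[0]))]

S1 = next_week(S0, P)
print(S1)  # about [0.74, 0.26]: matches 0.2*0.9 + 0.8*0.7 = 0.74
```

Calling `next_week` on S1 would give the distribution after two weeks, and so on.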

  • Markov Model

  • Hidden Markov Model

    A Markov model is a process in which each state corresponds to a deterministically observable event, and hence the output of any given state is not random

    We extend the concept of Markov models to the case in which the observation is a probabilistic function of the state

    That is, the resulting model is a doubly embedded stochastic process: an underlying stochastic process that is not directly observable (hidden) and can be observed only through another set of stochastic processes that produce the sequence of observations

  • HMM Components

    A set of states (xs)

    A set of possible output symbols (ys)

    A state transition matrix (as): probability of making a transition from one state to the next

    An output emission matrix (bs): probability of emitting/observing a symbol at a particular state

    An initial probability vector: probability of starting at a particular state

    Not shown; sometimes assumed to be 1

  • COIN-TOSS MODEL

  • COIN-TOSS MODEL (contd.)

  • Weather Example Revisited

  • PROBLEM

  • Solution

  • Solution (contd.)

  • [Diagram: states and observations]

  • Main issues using HMMs

    Evaluation problem. Given the HMM M = (A, B, π) and the observation sequence O = o1 o2 ... oK, calculate the probability that model M has generated sequence O.

    Decoding problem. Given the HMM M = (A, B, π) and the observation sequence O = o1 o2 ... oK, calculate the most likely sequence of hidden states si that produced this observation sequence O.

    Learning problem. Given some training observation sequences O = o1 o2 ... oK and the general structure of the HMM (numbers of hidden and visible states), adjust M = (A, B, π) to maximize the probability.

    O = o1 ... oK denotes a sequence of observations, ok ∈ {v1, ..., vM}.

  • Learning/Training Problem

    Adjust the model parameters so that they best represent the observed output, given the output sequence and the model structure.

    Consider the coin-toss example (with 3 biased coins)

    Say we get the observations {HHHHTTHTTTTHHTT}

    Find the model parameters, i.e., the transition matrix, emission matrix, and initial distribution, that best represent the output

  • Evaluation Problem

    What is the chance of a given output observation appearing when the model is known?

    Consider the coin-toss example (with 3 biased coins)

    We know some previous output sequence obtained from the coin-toss experiment, say {HHHHTTHTTTTHHTT}

    We know the model parameters too

    So what is the probability that we will get an output sequence like {HTHT}?
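The evaluation problem is solved by the forward algorithm: instead of enumerating every possible hidden-state path, sum over paths incrementally, one observation at a time. A sketch for a generic discrete HMM; for brevity it uses a hypothetical 2-coin model rather than the 3-coin one, and all the numbers below are made-up illustrations:

```python
def forward(A, B, pi, obs):
    """P(observation sequence | model) via the forward algorithm.
    A[i][j]: transition probs, B[i][k]: emission probs, pi[i]: initial probs,
    obs: list of symbol indices."""
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state at time t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical 2-coin HMM; symbols: 0 = H, 1 = T (illustrative numbers)
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(forward(A, B, pi, [0, 1]))  # P(observing H then T under this model)
```

The returned value is the quantity the evaluation problem asks for: the probability that this model generated the given observation sequence.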

  • Decoding Problem

    What is the state sequence that best explains the output sequence when the model is known?

    Say we get the observations {HHHHTTHTTTTHHTT}

    Decode/find the sequence of states that generates the output sequence

    In simpler words, find the sequence of tosses of the 3 biased coins that generates the given output sequence
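The decoding problem is solved by the Viterbi algorithm: dynamic programming over the single most probable state path, with backpointers for recovery. A sketch for a generic discrete HMM; the 2-coin parameters at the bottom are made-up illustrations, not values from the slides:

```python
def viterbi(A, B, pi, obs):
    """Most likely hidden-state sequence (as state indices) for obs."""
    n = len(pi)
    # delta[i]: probability of the best path ending in state i; psi: backpointers
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []
    for o in obs[1:]:
        back, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            back.append(best)
            new.append(delta[best] * A[best][j] * B[j][o])
        psi.append(back)
        delta = new
    # Backtrack from the most probable final state.
    path = [max(range(n), key=lambda i: delta[i])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path))

# Hypothetical 2-coin HMM; symbols: 0 = H, 1 = T (illustrative numbers)
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(viterbi(A, B, pi, [0, 1]))  # → [0, 1]
```

The returned index sequence is the "sequence of tosses" the decoding problem asks for: which (hidden) coin most plausibly produced each observed symbol.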

  • Solution to the Problems

    Learning/Training:

    Baum-Welch Algorithm

    Viterbi Training (Unsupervised Learning)

    Evaluation:

    Forward Algorithm

    Decoding:

    Forward-Backward Algorithm

    Viterbi Algorithm


  • Thank you