Hidden Markov Models (HMMs) Dhiraj DSG-MVL
Transcript
  • Hidden Markov Models (HMMs)

    Dhiraj

    DSG-MVL

  • Future is independent of the past, given the present

    Used to model an extraordinarily large number of applications involving temporal or sequential data, e.g. weather, finance, language, and music; it deals with how the world evolves over time

    Andrei Andreyevich Markov (1856–1922)

  • MARKOV CHAINS

  • Markov Chain: Auto Insurance Example

  • Generics

  • Markov Chain: Auto Insurance Example

    The power of a Markov chain: it allows us to travel many steps into the future

  • Markov Chain: Free Throw Confidence

  • Markov Chain: Free Throw Confidence Transitions

  • Markov Chain: Transition Matrix

  • TRANSITION DIAGRAM: EXAMPLE 1

  • TRANSITION DIAGRAM: EXAMPLE 2

  • TRANSITION DIAGRAM: EXAMPLE 3

    Relative Probability

  • MARKOV CHAIN

    Types of states: transient (ephemeral), recurrent, absorbing

  • [Matrix diagram: rows are the current states, columns are the states being gone to]

  • System Behavior

    [Diagram: the initial state vector (the system starts in "Arriving"); multiplying by the transition matrix gives the distribution over states such as "Playing on Phone" and "Paying Attention" after one time unit, and after n time units]

  • System Behavior

    [Diagram: the distribution after two time units over "Playing on Phone", "Paying Attention", "Writing Notes", and "Kicked Out"]

  • System Behavior

    [Diagram: the distribution after 100 time units over "Arriving", "Playing on Phone", "Paying Attention", "Writing Notes", "Listening", and "Kicked Out"]

  • Markov Model

    A Markov model is a type of stochastic process, sometimes referred to as a chain

    The model is similar to a finite state machine (FSM), except that it moves between states probabilistically rather than deterministically: it is nondeterministic, where the FSM is deterministic

  • Markov Models

    A discrete (finite) system:

    N distinct states.

    Begins (at time t = 1) in some initial state(s).

    At each time step (t = 1, 2, ...) the system moves from the current state to the next state according to transition probabilities associated with the current state.

    This kind of system is called a finite, or discrete, Markov model

  • Markov Models

    Set of states: {s1, s2, ..., sN}

    The process moves from one state to another, generating a sequence of states si1, si2, ..., sik, ...

    Markov chain property: the probability of each subsequent state depends only on the previous state:

    P(sik | si1, si2, ..., sik-1) = P(sik | sik-1)

    To define a Markov model, the following probabilities have to be specified: transition probabilities aij = P(si | sj) and initial probabilities πi = P(si)

    The output of the process is the sequence of states at each instant of time
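The definition above translates directly into a simulator: pick a start state from the initial probabilities, then repeatedly pick the next state from the current state's row of transition probabilities. A minimal sketch; the two-state chain and its numbers below are made-up illustrations, not values from the slides:

```python
import random

# Illustrative 2-state chain (numbers are assumptions, not from the slides):
# a[i][j] = transition probability P(next = s_j | current = s_i)
states = ["s1", "s2"]
a = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [0.8, 0.2]  # initial probabilities P(s_i)

def sample_chain(n_steps, rng):
    """Generate a state sequence: the output of the process."""
    i = rng.choices(range(len(states)), weights=pi)[0]
    seq = [states[i]]
    for _ in range(n_steps - 1):
        # The next state depends only on the current state (Markov property).
        i = rng.choices(range(len(states)), weights=a[i])[0]
        seq.append(states[i])
    return seq

print(sample_chain(10, random.Random(0)))
```

The sequence of visited states is exactly the "output of the process" the slide describes.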

  • Markov Property

    Markov Property: the state of the system at time t+1 depends only on the state of the system at time t:

    P[Xt+1 = xt+1 | Xt = xt, Xt-1 = xt-1, ..., X1 = x1, X0 = x0] = P[Xt+1 = xt+1 | Xt = xt]

    [Diagram: a chain Xt=1 → Xt=2 → Xt=3 → Xt=4 → Xt=5]


  • A Markov System

    Has N states, called s1, s2 .. sN; here N = 3 (s1, s2, s3)

    There are discrete timesteps, t = 0, 1, ...

    On the t-th timestep the system is in exactly one of the available states; call it qt. Note: qt ∈ {s1, s2 .. sN}

    Between each timestep, the next state is chosen randomly; the current state determines the probability distribution for the next state. At t = 1 the current state is qt = q1 = s2

    P(qt+1=s1 | qt=s1) = 0, P(qt+1=s2 | qt=s1) = 0, P(qt+1=s3 | qt=s1) = 1

    P(qt+1=s1 | qt=s2) = 1/2, P(qt+1=s2 | qt=s2) = 1/2, P(qt+1=s3 | qt=s2) = 0

    P(qt+1=s1 | qt=s3) = 1/3, P(qt+1=s2 | qt=s3) = 2/3, P(qt+1=s3 | qt=s3) = 0

    Often notated with arcs between states, labelled with these probabilities (1/2, 1/2, 1/3, 2/3, 1)
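These transition probabilities fully determine how a distribution over states evolves: one timestep is a vector-matrix product of the current distribution with the transition matrix. A small sketch using the three-state system from this slide:

```python
# Transition matrix for the 3-state system above:
# row i holds P(next = s_j | current = s_i) for j = 1..3
P = [
    [0,   0,   1],    # from s1: always to s3
    [1/2, 1/2, 0],    # from s2
    [1/3, 2/3, 0],    # from s3
]

def step(dist, P):
    """One timestep: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# At t = 1 the system is in s2 with certainty (q1 = s2).
dist = [0.0, 1.0, 0.0]
dist = step(dist, P)   # distribution at t = 2
print(dist)            # [0.5, 0.5, 0.0]
dist = step(dist, P)   # distribution at t = 3
print(dist)            # [0.25, 0.25, 0.5]
```

Repeating `step` carries the distribution arbitrarily far into the future, which is the "travel many steps" property of a Markov chain.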

  • Markov Property

    qt+1 is conditionally independent of {qt-1, qt-2, ..., q1, q0} given qt.

    In other words: P(qt+1 = sj | qt = si) = P(qt+1 = sj | qt = si, any earlier history)

    (N = 3, t = 1, qt = q1 = s2, with the same transition probabilities as above)

  • Hidden Markov Models

    (probabilistic finite state automata)

    Often we face scenarios where states cannot be directly observed.

    We need an extension: Hidden Markov Models

    [Diagram: four states 1, 2, 3, 4 with self-transitions a11, a22, a33, a44 and forward transitions a12, a23, a34; each state emits the observed phenomenon with output probabilities b11, b12, b13, b14, ...]

    aij are state transition probabilities.

    bik are observation (output) probabilities: b11 + b12 + b13 + b14 = 1, b21 + b22 + b23 + b24 = 1, etc.

  • Hidden Markov Models - HMM

    [Diagram: a chain of hidden variables H1 → H2 → ... → HL-1 → HL, where each hidden variable Hi emits an observed datum Xi (X1, X2, ..., XL-1, XL)]

  • Definition of Hidden Markov Model

    The Hidden Markov Model (HMM) is a finite set of states, each of which is associated with a probability distribution. A Hidden Markov model is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states.

    Transitions among the states are governed by a set of probabilities called transition probabilities.

    In a particular state, an outcome or observation can be generated according to the associated probability distribution.

    Only the outcome, not the state, is visible to an external observer; the states are therefore "hidden" from the observer, hence the name Hidden Markov Model.

  • Hidden Markov Models

    A Hidden Markov model is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states.

    In regular Markov models the state is directly visible, so the state transition probabilities are the only parameters; in an HMM the state is not visible, but the output is.

  • Hidden Markov Model

    Consider a discrete-time Markov process: a system that may be described at any time as being in one of a set of N distinct states.

    At regularly spaced, discrete times, the system undergoes a change of state according to a set of probabilities associated with the state.

    We denote the time instants associated with state changes as t = 1, 2, ..., and the actual state at time t as qt.

  • Essentials

    To define a hidden Markov model, the following probabilities have to be specified: a matrix of transition probabilities A = (aij), aij = P(si | sj), a matrix of observation probabilities B = (bi(vm)), bi(vm) = P(vm | si), and a vector of initial probabilities π = (πi), πi = P(si). The model is represented by M = (A, B, π).
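The triple M = (A, B, π) can be written down directly as arrays. A minimal sketch with made-up numbers (the 2-state, 2-symbol values below are assumptions for illustration only), including a check that each piece really is a probability distribution:

```python
# Hypothetical 2-state, 2-symbol HMM M = (A, B, pi); numbers are illustrative.
A  = [[0.7, 0.3],    # A[i][j]: transition probabilities between states
      [0.4, 0.6]]
B  = [[0.9, 0.1],    # B[i][m]: probability that state i emits symbol v_m
      [0.2, 0.8]]
pi = [0.6, 0.4]      # pi[i]: initial probability of state i

def is_stochastic(rows):
    """Every row must be non-negative and sum to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < 1e-9
        for row in rows
    )

assert is_stochastic(A) and is_stochastic(B) and is_stochastic([pi])
```

The stochasticity check is worth keeping around: every row of A and B, and the vector π itself, must sum to 1 for the model to be valid.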

  • Hidden Markov Model


  • Discrete Markov Model: Example

    A discrete Markov model with 5 states.

    Each aij represents the probability of moving from state i to state j.

    The aij are given in a matrix A = {aij}.

    The probability of starting in a given state i is πi; the vector π represents these start probabilities.

  • Overview

    Mathematical notation

    Example: flow chart

    Mathematical Notation

    To obtain the conditional probability of reaching a particular state based on the previous state, for X1, X2, ..., Xn where X1 represents the variable at time 1:

    P[Xn+1 = j | Xn = i] = P(i,j)

    i.e., the probability that, given the system is in state i, it will move to state j

  • Mathematical Notation

    Probability matrix:

        P(0,0)  P(0,1)  P(0,2)
        P(1,0)  P(1,1)  P(1,2)
        P(2,0)  P(2,1)  P(2,2)

    P(0,0): probability of moving from state 0 to state 0

    For the three-state example (states 0, 1, 2), the transition diagram gives:

        0.5  0.5  0
        0    0    1.0
        0.3  0.4  0.3
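Chaining P(i,j) lookups gives multi-step probabilities: the chance of going from i to j in two steps is the sum over intermediate states k of P(i,k)·P(k,j). A quick sketch with the three-state matrix above:

```python
# Probability matrix from the example: P[i][j] = P(X_{n+1} = j | X_n = i)
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.3, 0.4, 0.3],
]

def two_step(P, i, j):
    """P(X_{n+2} = j | X_n = i): sum over intermediate states k."""
    return sum(P[i][k] * P[k][j] for k in range(len(P)))

# State 1 always moves to state 2, so two steps from state 1 follow row 2:
print(two_step(P, 1, 0))  # 0.3
```

This is exactly one entry of the matrix product P·P; higher powers of P give probabilities further into the future.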

  • Example: Orange Juice

    Assumption: a family of four buys orange juice once a week

    A = someone using Brand A, A' = someone using another brand

    Transition probability matrix (rows: current state, columns: next state):

             A    A'
    P = A   0.9  0.1
        A'  0.7  0.3

    So = initial state distribution matrix = [0.2  0.8] over (A, A')

  • Example: Orange Juice

    [Transition diagram: Start → A with 0.2 and Start → A' with 0.8; A → A 0.9, A → A' 0.1; A' → A 0.7, A' → A' 0.3]

    To find the probability that someone uses Brand A after one week:

    P(Brand A after 1 wk) = (0.2)(0.9) + (0.8)(0.7) = 0.74

    So = [0.2  0.8] (initial state distribution matrix), S1 = [0.74  0.26] over (A, A')
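The one-week update above is just the vector-matrix product S1 = S0 · P, and repeating it gives any later week. A small check of the slide's arithmetic:

```python
# Transition matrix over (Brand A, other brand) and initial distribution
P  = [[0.9, 0.1],
      [0.7, 0.3]]
S0 = [0.2, 0.8]

def next_week(S, P):
    """S_{k+1}[j] = sum_i S_k[i] * P[i][j]."""
    return [sum(S[i] * P[i][j] for i in range(len(S))) for j in range(len(P[0]))]

S1 = next_week(S0, P)
print(S1)  # about [0.74, 0.26]: matches 0.2*0.9 + 0.8*0.7 = 0.74
```

Calling `next_week` on S1 would give the distribution after two weeks, and so on.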

  • Markov Model

  • Hidden Markov Model

    A Markov model is a process in which each state corresponds to a deterministically observable event, and hence the output of any given state is not random

    We extend the concept of Markov models to the case in which the observation is a probabilistic function of the state

    That is, the resulting model is a doubly embedded stochastic process: an underlying stochastic process that is not directly observable (hidden) and can be observed only through another set of stochastic processes that produce the sequence of observations

  • HMM Components

    A set of states (xs)

    A set of possible output symbols (ys)

    A state transition matrix (as): probability of making a transition from one state to the next

    An output emission matrix (bs): probability of emitting/observing a symbol at a particular state

    An initial probability vector: probability of starting at a particular state

    Not shown; sometimes assumed to be 1

  • COIN-TOSS MODEL

  • COIN-TOSS MODEL (contd.)

  • Weather Example Revisited

  • PROBLEM

  • Solution

  • Solution (contd.)

  • [Diagram: states and observations]

  • Main issues using HMMs

    Evaluation problem. Given the HMM M = (A, B, π) and the observation sequence O = o1 o2 ... oK, calculate the probability that model M has generated sequence O.

    Decoding problem. Given the HMM M = (A, B, π) and the observation sequence O = o1 o2 ... oK, calculate the most likely sequence of hidden states si that produced this observation sequence O.

    Learning problem. Given some training observation sequences O = o1 o2 ... oK and the general structure of the HMM (numbers of hidden and visible states), adjust M = (A, B, π) to maximize the probability.

    O = o1 ... oK denotes a sequence of observations, ok ∈ {v1, ..., vM}.

  • Learning/Training Problem

    Adjust the model parameters so that they best represent the observed output, given the output sequence and the model structure.

    Consider the coin-toss example (with 3 biased coins)

    Say we get the observations {HHHHTTHTTTTHHTT}

    Find the model parameters, i.e., the transition matrix, emission matrix, and initial distribution, that best represent the output

  • Evaluation Problem

    What is the chance of a given output observation appearing when the model is known?

    Consider the coin-toss example (with 3 biased coins)

    We know some previous output sequence obtained from the coin-toss experiment, say {HHHHTTHTTTTHHTT}

    We know the model parameters too

    So what is the probability that we will get an output sequence like {HTHT}?
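The evaluation problem is solved by the forward algorithm: instead of enumerating every possible hidden-state path, sum over paths incrementally, one observation at a time. A sketch for a generic discrete HMM; for brevity it uses a hypothetical 2-coin model rather than the 3-coin one, and all the numbers below are made-up illustrations:

```python
def forward(A, B, pi, obs):
    """P(observation sequence | model) via the forward algorithm.
    A[i][j]: transition probs, B[i][k]: emission probs, pi[i]: initial probs,
    obs: list of symbol indices."""
    n = len(pi)
    # alpha[i] = P(o_1 .. o_t, state at time t = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Hypothetical 2-coin HMM; symbols: 0 = H, 1 = T (illustrative numbers)
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(forward(A, B, pi, [0, 1]))  # P(observing H then T under this model)
```

The returned value is the quantity the evaluation problem asks for: the probability that this model generated the given observation sequence.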

  • Decoding Problem

    What is the state sequence that best explains the output sequence when the model is known?

    Say we get the observations {HHHHTTHTTTTHHTT}

    Decode/find the sequence of states that generates the output sequence

    In simpler words, find the sequence of tosses of the 3 biased coins that generates the given output sequence
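The decoding problem is solved by the Viterbi algorithm: dynamic programming over the single most probable state path, with backpointers for recovery. A sketch for a generic discrete HMM; the 2-coin parameters at the bottom are made-up illustrations, not values from the slides:

```python
def viterbi(A, B, pi, obs):
    """Most likely hidden-state sequence (as state indices) for obs."""
    n = len(pi)
    # delta[i]: probability of the best path ending in state i; psi: backpointers
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []
    for o in obs[1:]:
        back, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            back.append(best)
            new.append(delta[best] * A[best][j] * B[j][o])
        psi.append(back)
        delta = new
    # Backtrack from the most probable final state.
    path = [max(range(n), key=lambda i: delta[i])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path))

# Hypothetical 2-coin HMM; symbols: 0 = H, 1 = T (illustrative numbers)
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(viterbi(A, B, pi, [0, 1]))  # → [0, 1]
```

The returned index sequence is the "sequence of tosses" the decoding problem asks for: which (hidden) coin most plausibly produced each observed symbol.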

  • Solution to the Problems

    Learning/Training:

    Baum-Welch Algorithm

    Viterbi Training (Unsupervised Learning)

    Evaluation:

    Forward Algorithm

    Decoding:

    Forward-Backward Algorithm

    Viterbi Algorithm


  • Thank you