Page 1

Chapter 6: HIDDEN MARKOV AND MAXIMUM ENTROPY

Heshaam Faili (hfaili@ece.ut.ac.ir)

University of Tehran

Page 2

Introduction

Hidden Markov Model (HMM), Maximum Entropy, and Maximum Entropy Markov Model (MEMM) are machine learning methods.

A sequence classifier or sequence labeler is a model whose job is to assign some label or class to each unit in a sequence.

A finite-state transducer is a non-probabilistic sequence classifier, for example for transducing from sequences of words to sequences of morphemes.

HMM and MEMM extend this notion by being probabilistic sequence classifiers.

Page 3

Markov chain

Also called an observed Markov model; a kind of weighted finite-state automaton.

Markov Chain: a weighted automaton in which the input sequence uniquely determines which states the automaton will go through.

It can't represent inherently ambiguous problems; it is useful for assigning probabilities to unambiguous sequences.

Page 4

Markov Chain

Page 5

Formal Description

Page 6

Formal Description

First-order Markov Chain: the probability of a particular state is dependent only on the previous state

Markov Assumption: $P(q_i \mid q_1 \dots q_{i-1}) = P(q_i \mid q_{i-1})$

Page 7

Markov Chain example

Compute the probability of each of the following sequences:

hot hot hot hot
cold hot cold hot
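As a sketch of how these are computed (stated symbolically, since the chapter's transition-matrix figure is not reproduced in this transcript): multiply the start probability by the transition probabilities along the path.

$$P(\text{hot hot hot hot}) = \pi_{\text{hot}}\, a_{\text{hot,hot}}\, a_{\text{hot,hot}}\, a_{\text{hot,hot}}$$

$$P(\text{cold hot cold hot}) = \pi_{\text{cold}}\, a_{\text{cold,hot}}\, a_{\text{hot,cold}}\, a_{\text{cold,hot}}$$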

Page 8

Hidden Markov Model

In POS tagging we don't observe POS tags in the world; we see words, and have to infer the correct tags from the word sequence. We call the POS tags hidden because they are not observed.

A Hidden Markov Model (HMM) allows us to talk about both observed events (like words) and hidden events (like POS tags) that we think of as causal factors in our probabilistic model.

Page 9

Jason Eisner (2002) example

Imagine that you are a climatologist in the year 2799 studying the history of global warming. You cannot find any records of the weather in Baltimore, Maryland, for the summer of 2007, but you do find Jason Eisner’s diary, which lists how many ice creams Jason ate every day that summer.

Our goal is to use these observations to estimate the temperature every day

Given a sequence of observations O, each observation an integer corresponding to the number of ice creams eaten on a given day, figure out the correct ‘hidden’ sequence Q of weather states (H or C) which caused Jason to eat the ice cream

Page 10

Formal Description

Page 11

Formal Description

Page 12

HMM Example

Page 13

Fully-connected (Ergodic) & Left-to-right (Bakis) HMM

Page 14

Three fundamental problems

Problem 1 (Computing Likelihood): Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ)

Problem 2 (Decoding): Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q

Problem 3 (Learning): Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B

Page 15

COMPUTING LIKELIHOOD: THE FORWARD ALGORITHM

Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ)

For a Markov chain: we could compute the probability of 3 1 3 just by following the states labeled 3 1 3 and multiplying the probabilities along the arcs

We want to determine the probability of an ice-cream observation sequence like 3 1 3, but we don’t know what the hidden state sequence is!

Markov chain: Suppose we already knew the weather, and wanted to predict how much ice cream Jason would eat

For a given hidden state sequence (e.g. hot hot cold) we can easily compute the output likelihood of 3 1 3.

Page 16

THE FORWARD ALGORITHM

Page 17

THE FORWARD ALGORITHM

Page 18

THE FORWARD ALGORITHM

Dynamic programming, O(N²T) for N hidden states and an observation sequence of T observations.

$\alpha_t(j)$ represents the probability of being in state j after seeing the first t observations, given the automaton λ.

$q_t = j$ means "the t-th state in the sequence of states is state j".
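A minimal sketch of the forward recursion, assuming illustrative start, transition, and emission values (placeholders, not the exact numbers from the chapter's figures):

```python
# A minimal sketch of the forward algorithm for the ice-cream HMM.
# The start/transition/emission values are illustrative placeholders,
# not the exact numbers from the chapter's figures.

def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summed over all hidden state sequences."""
    # alpha[t][j]: probability of the first t+1 observations, ending in state j
    alpha = [{j: start_p[j] * emit_p[j][obs[0]] for j in states}]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append({
            j: sum(prev[i] * trans_p[i][j] for i in states) * emit_p[j][obs[t]]
            for j in states
        })
    # Termination: sum the final column of the trellis
    return sum(alpha[-1].values())

# Hypothetical parameters for the hot/cold weather example
states = ["H", "C"]
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

print(forward([3, 1, 3], states, start_p, trans_p, emit_p))
```

Each time step costs O(N²) because every state sums over all N predecessors, giving the O(N²T) total mentioned above.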

Page 19

Page 20

THE FORWARD ALGORITHM

Page 21

THE FORWARD ALGORITHM

Page 22

THE FORWARD ALGORITHM

Page 23

DECODING: THE VITERBI ALGORITHM

Page 24

DECODING: THE VITERBI ALGORITHM

$v_t(j)$ represents the probability that the HMM is in state j after seeing the first t observations and passing through the most probable state sequence $q_0, q_1, \dots, q_{t-1}$, given the automaton λ.
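A minimal sketch of Viterbi decoding, using the same placeholder parameter format as the forward-algorithm sketch above (the dictionaries defined there):

```python
# A minimal sketch of Viterbi decoding. It expects the same kind of
# placeholder parameter dictionaries used in the forward sketch above.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best hidden state sequence, its probability)."""
    # v[t][j]: probability of the best path that ends in state j at time t
    v = [{j: start_p[j] * emit_p[j][obs[0]] for j in states}]
    backpointer = [{}]
    for t in range(1, len(obs)):
        v.append({})
        backpointer.append({})
        for j in states:
            best_prev, best_prob = max(
                ((i, v[t - 1][i] * trans_p[i][j]) for i in states),
                key=lambda pair: pair[1],
            )
            v[t][j] = best_prob * emit_p[j][obs[t]]
            backpointer[t][j] = best_prev
    # Termination: best final state, then follow backpointers to recover the path
    last = max(states, key=lambda j: v[-1][j])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    return list(reversed(path)), v[-1][last]

# Example (with the hypothetical parameters from the forward sketch):
# viterbi([3, 1, 3], states, start_p, trans_p, emit_p)
```

The only change from the forward algorithm is that the sum over predecessor states becomes a max, plus backpointers so the best path can be read off at the end.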

Page 25

TRAINING HMMS: THE FORWARD-BACKWARD ALGORITHM

Given an observation sequence O and the set of possible states in the HMM, learn the HMM parameters A and B

Ice-Cream task: we would start with a sequence of observations O = {1, 3, 2, ...} and the set of hidden states H and C.

part-of-speech tagging task: we would start with a sequence of observations O = {w1,w2,w3 . . .} and a set of hidden states NN, NNS, VBD, IN,...

Page 26

Forward-backward

The forward-backward or Baum-Welch algorithm (Baum, 1972) is a special case of the Expectation-Maximization (EM) algorithm.

Start with a Markov chain: it has no emission probabilities B (alternatively, we could view a Markov chain as a degenerate Hidden Markov Model where all the b probabilities are 1.0 for the observed symbol and 0 for all other symbols).

We only need to train the transition probabilities A.

Page 27

Forward-backward

For a Markov chain, we only need to count the state transitions in the observed sequence to estimate matrix A.

For a Hidden Markov Model, we cannot count these transitions directly, because the state sequence is hidden.

The Baum-Welch algorithm uses two intuitions. The first is to iteratively estimate the counts: compute the forward probability for an observation and then divide that probability mass among all the different paths that contributed to this forward probability. The second relies on the backward probability, introduced on the following slides.
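The next slides define the backward probability; for completeness, the standard textbook formulation (not a slide reproduced here) is

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid q_t = i, \lambda)$$

computed right-to-left by the recursion

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad \beta_T(i) = 1$$

(taking $\beta_T(i) = 1$ in the formulation without an explicit final state).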

Page 28

backward probability.

Page 29

backward probability.

Page 30

backward probability.

Page 31

forward-backward

Page 32

forward-backward

Page 33

forward-backward

Page 34

forward-backward

The probability of being in state j at time t, which we will call $\gamma_t(j)$.
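In the standard formulation this quantity combines the forward and backward probabilities:

$$\gamma_t(j) = P(q_t = j \mid O, \lambda) = \frac{\alpha_t(j)\,\beta_t(j)}{P(O \mid \lambda)}$$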

Page 35

forward-backward

Page 36

forward-backward

Page 37

Page 38

MAXIMUM ENTROPY MODELS

A machine learning framework called Maximum Entropy modeling (MaxEnt).

Used for classification: the task of classification is to take a single observation, extract some useful features describing the observation, and then, based on these features, classify the observation into one of a set of discrete classes.

Probabilistic classifier: gives the probability of the observation being in each class.

Non-sequential classification:
In text classification we might need to decide whether a particular email should be classified as spam or not.
In sentiment analysis we have to determine whether a particular sentence or document expresses a positive or negative opinion.
In sentence-boundary detection we need to classify a period character ('.') as either a sentence boundary or not.

Page 39

MaxEnt

MaxEnt belongs to the family of classifiers known as exponential or log-linear classifiers.

MaxEnt works by extracting some set of features from the input, combining them linearly (meaning that we multiply each by a weight and then add them up), and then using this sum as an exponent, as in the formula below.

Example (tagging): a feature for tagging might be "this word ends in -ing" or "the previous word was 'the'".
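Written out in the usual log-linear form (the general formula, not a slide reproduced here), the weighted feature sum is exponentiated and normalized over the classes:

$$P(c \mid x) = \frac{\exp\!\left(\sum_i w_i f_i(c, x)\right)}{\sum_{c' \in C} \exp\!\left(\sum_i w_i f_i(c', x)\right)}$$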

Page 40

Linear Regression

Two different names for tasks that map some input features to an output value: regression when the output is real-valued, and classification when the output is one of a discrete set of classes.

Page 41

Linear Regression, Example

price = w0 + w1 ∗ Num_Adjectives

Page 42

Multiple linear regression

price = w0 + w1 ∗ Num_Adjectives + w2 ∗ Mortgage_Rate + w3 ∗ Num_Unsold_Houses
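A small sketch of how such a regression can be fit by ordinary least squares; the feature names come from the slide, but the numeric data below are made up purely for illustration:

```python
# Fit the multiple linear regression above by ordinary least squares.
# Feature names are from the slide; the numeric values are invented.
import numpy as np

# Each row: [Num_Adjectives, Mortgage_Rate, Num_Unsold_Houses]
X = np.array([
    [5, 6.5, 120],
    [3, 6.0, 100],
    [8, 7.0, 150],
    [2, 5.5, 90],
], dtype=float)
y = np.array([210_000, 240_000, 180_000, 260_000], dtype=float)  # observed prices

# Prepend a column of ones so that w0 acts as the intercept
X1 = np.hstack([np.ones((len(X), 1)), X])

# Least squares minimizes the sum-squared error ||X1 @ w - y||^2
w, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(w)  # [w0, w1, w2, w3]
```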

Page 43

Learning in linear regression

Sum-squared error: the weights are chosen to minimize the squared difference between predicted and observed values, summed over the training examples.
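As a formula (the standard definition of this criterion):

$$\text{cost}(W) = \sum_{j=1}^{M} \left(y_{\text{pred}}^{(j)} - y_{\text{obs}}^{(j)}\right)^2$$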

Page 44

Logistic regression

Classification in which the output y we are trying to predict takes on one of a small set of discrete values.

Binary classification: y takes one of two values (e.g. 0 or 1).

Odds

logit function
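For reference, the standard definitions behind these two terms (the textbook formulas, not reproduced from the slide itself):

$$\text{odds} = \frac{p}{1-p}, \qquad \text{logit}(p) = \ln\frac{p}{1-p}$$

Logistic regression models the logit of the class probability as a linear function of the features, which is equivalent to

$$P(y = 1 \mid x) = \frac{1}{1 + e^{-\left(w_0 + \sum_i w_i x_i\right)}}$$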

Page 45

Logistic regression

Page 46

Logistic regression

Page 47

Logistic regression: Classification

hyperplane

Page 48

Learning in logistic regression

Conditional maximum likelihood estimation.

Page 49

Learning in logistic regression

Convex Optimization

Page 50

MAXIMUM ENTROPY MODELING

Multinomial logistic regression (MaxEnt).

Most of the time, classification problems that come up in language processing involve larger numbers of classes (such as part-of-speech classes).

y takes on one of C different values, corresponding to the classes c1, ..., cC.

Page 51

Maximum Entropy Modeling

Indicator function: A feature that only takes on the values 0 and 1

Page 52

Maximum Entropy Modeling

Example: Secretariat/NNP is/BEZ expected/VBN to/TO race/?? tomorrow/
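Illustrative indicator features for deciding the tag of "race" (these particular features are hypothetical examples in the spirit of the chapter, not taken from the slide):

$$f_1(c, x) = \begin{cases} 1 & \text{if } t_{i-1} = \text{TO} \text{ and } c = \text{VB} \\ 0 & \text{otherwise} \end{cases}
\qquad
f_2(c, x) = \begin{cases} 1 & \text{if } word_i \text{ ends in } \textit{-ing} \text{ and } c = \text{VBG} \\ 0 & \text{otherwise} \end{cases}$$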

Page 53

Maximum Entropy Modeling

Page 54

Why do we call it Maximum Entropy?

Of all possible distributions, the equiprobable distribution has the maximum entropy.
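A quick numerical check for a two-outcome distribution, using the usual entropy definition:

$$H(p) = -\sum_x p(x)\log_2 p(x), \qquad H(0.5, 0.5) = 1 \text{ bit} \;>\; H(0.8, 0.2) \approx 0.72 \text{ bits}$$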

Page 55

Why do we call it Maximum Entropy?

Page 56

Maximum Entropy

The maximum entropy distribution, subject to the feature constraints, is exactly the probability distribution of a multinomial logistic regression model whose weights W maximize the likelihood of the training data. Thus the exponential (log-linear) model is the maximum entropy model.