
Sequence Labeling - UMass Amherst (mccallum/courses/inlp2007/lect15-memm-crf.p…)

Transcript
Page 1

Sequence Labeling

• Inputs: x = (x_1, …, x_n)
• Labels: y = (y_1, …, y_n)
• Typical goal: Given x, predict y

• Example sequence labeling tasks
  – Part-of-speech tagging
  – Named-entity-recognition (NER)
    • Label people, places, organizations

Page 2

NER Example:
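As an illustration of this kind of labeling (the sentence and the BIO tag scheme below are assumptions for the sketch, not taken from the slide's figure), the input and label sequences can be written side by side:

# Hypothetical NER example: tokens x paired with BIO entity labels y
x = ["Jim", "bought", "shares", "of", "Acme", "Corp.", "in", "Boston"]
y = ["B-PER", "O", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]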

Page 3

First Solution: Maximum Entropy Classifier

• Conditional model p(y|x).
  – Do not waste effort modeling p(x), since x is given at test time anyway.
  – Allows more complicated input features, since we do not need to model dependencies between them.

• Feature functions f(x, y):
  – f1(x, y) = { word is Boston & y = Location }
  – f2(x, y) = { first letter capitalized & y = Name }
  – f3(x, y) = { x is an HTML link & y = Location }
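A minimal sketch of these indicator features in Python (the function names follow the slide, but the token representation passed in is an assumption for illustration):

def f1(x, y):
    # Fires when the current word is "Boston" and the candidate label is Location
    return 1.0 if x["word"] == "Boston" and y == "Location" else 0.0

def f2(x, y):
    # Fires when the word starts with a capital letter and the candidate label is Name
    return 1.0 if x["word"][:1].isupper() and y == "Name" else 0.0

def f3(x, y):
    # Fires when the token is an HTML link and the candidate label is Location
    return 1.0 if x.get("is_html_link", False) and y == "Location" else 0.0

Each feature is a binary function of the input and a candidate label; the classifier learns one weight per feature.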

Page 4

First Solution: MaxEnt Classifier

• How should we choose a classifier?

• Principle of maximum entropy
  – We want a classifier that:
    • Matches feature constraints from training data.
    • Predictions maximize entropy.

• There is a unique, exponential family distribution that meets these criteria.

Page 5

First Solution: MaxEnt Classifier

• p(y|x;θ), inference, learning, and gradient.

• (ON BOARD)
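A hedged sketch of what the board derivation usually covers, using the notation from the feature-function slide (the standard exponential-family form and its log-likelihood gradient, stated here as an assumption about what was derived):

p(y \mid x; \theta) = \frac{\exp\big(\sum_k \theta_k f_k(x, y)\big)}{\sum_{y'} \exp\big(\sum_k \theta_k f_k(x, y')\big)}

\frac{\partial}{\partial \theta_k} \sum_i \log p(y_i \mid x_i; \theta) = \sum_i \Big[ f_k(x_i, y_i) - \sum_y p(y \mid x_i; \theta)\, f_k(x_i, y) \Big]

The gradient is the difference between empirical and model-expected feature counts, so it is zero exactly when the feature constraints from the previous slide are matched.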

Page 6

First Solution: MaxEnt Classifier

• Problem with using a maximum entropy classifier for sequence labeling:

• It makes decisions at each position independently!

Page 7

Second Solution: HMM

• Defines a generative process.
• Can be viewed as a weighted finite state machine.

P(y, x) = \prod_t P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)
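A minimal sketch of this factorization in code (the toy transition and emission tables and the START symbol are assumptions for illustration):

# Hypothetical HMM parameters: P(y_t | y_{t-1}) and P(x_t | y_t)
transition = {("START", "Loc"): 0.3, ("START", "O"): 0.7,
              ("Loc", "O"): 0.8, ("Loc", "Loc"): 0.2,
              ("O", "O"): 0.6, ("O", "Loc"): 0.4}
emission = {("Loc", "Boston"): 0.01, ("O", "visited"): 0.005, ("O", "we"): 0.02}

def joint_prob(x, y):
    # P(y, x) = prod_t P(y_t | y_{t-1}) * P(x_t | y_t), with y_0 = START
    prob, prev = 1.0, "START"
    for word, label in zip(x, y):
        prob *= transition.get((prev, label), 0.0) * emission.get((label, word), 0.0)
        prev = label
    return prob

print(joint_prob(["we", "visited", "Boston"], ["O", "O", "Loc"]))  # 0.7*0.02 * 0.6*0.005 * 0.4*0.01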

Page 8

Second Solution: HMM

• HMM problems: (ON BOARD)
  – Probability of an input sequence.
  – Most likely label sequence given an input sequence.
  – Learning with known label sequences.
  – Learning with unknown label sequences?
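For the second problem, a hedged Viterbi sketch over tables shaped like the toy ones above (log space for numerical stability; this is the generic textbook recursion, not the board derivation):

import math

def viterbi(x, labels, transition, emission):
    # delta[t][y]: best log-probability of any label prefix ending in y at position t
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")
    delta = [{y: logp(transition.get(("START", y), 0.0) * emission.get((y, x[0]), 0.0))
              for y in labels}]
    back = [{y: None for y in labels}]
    for t in range(1, len(x)):
        delta.append({}); back.append({})
        for y in labels:
            step = {yp: delta[t-1][yp] + logp(transition.get((yp, y), 0.0) * emission.get((y, x[t]), 0.0))
                    for yp in labels}
            best = max(step, key=step.get)
            delta[t][y], back[t][y] = step[best], best
    # Backtrace from the best final label
    path = [max(delta[-1], key=delta[-1].get)]
    for t in range(len(x) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Using the toy tables from the HMM sketch above
print(viterbi(["we", "visited", "Boston"], ["Loc", "O"], transition, emission))  # ['O', 'O', 'Loc']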

Page 9

Second Solution: HMM

• How can we represent multiple features in an HMM?
  – Treat them as conditionally independent given the class label?
    • The example features we talked about are not independent.
  – Try to model a more complex generative process of the input features?
    • We may lose tractability (i.e. lose a dynamic-programming algorithm for exact inference).
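The conditional-independence option factors each emission in a naive-Bayes style (a sketch, assuming the observation at position t has components x_t^{(1)}, …, x_t^{(m)}):

P(x_t \mid y_t) = \prod_{j=1}^{m} P\big(x_t^{(j)} \mid y_t\big)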

Page 10

Second Solution: HMM

• Let’s use a conditional model instead.

Page 11

Third Solution: MEMM

• Use a series of maximum entropy classifiers that know the previous label.

• Define a Viterbi algorithm for inference.

P(y \mid x) = \prod_t P_{y_{t-1}}(y_t \mid x)
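A minimal sketch of one locally normalized MEMM factor (the feature extractor, weights, and the position argument t are placeholder assumptions; the point is that each factor is a maximum entropy classifier over the next label given x, the position, and the previous label):

import math

def memm_local_probs(weights, feats, labels, x, t, y_prev):
    # P_{y_prev}(y_t | x): softmax over candidate next labels
    scores = {y: sum(weights.get(f, 0.0) for f in feats(x, t, y_prev, y)) for y in labels}
    z = sum(math.exp(s) for s in scores.values())   # normalization is local to this step
    return {y: math.exp(s) / z for y, s in scores.items()}

The whole-sequence probability is the product of these local distributions over t, and the Viterbi recursion from the HMM carries over with this factor in place of transition times emission.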

Page 12

Third Solution: MEMM

• Finding the most likely label sequence given an input sequence, and learning.

• (ON BOARD)

Page 13

Third Solution: MEMM

• Combines the advantages of maximum entropy and HMM!

• But there is a problem…

Page 14

Problem with MEMMs: Label Bias

• In some state space configurations, MEMMs essentially completely ignore the inputs.

• Example (ON BOARD). Intuitively: because each step is locally normalized, a state with few outgoing transitions must pass nearly all of its incoming probability mass along them whatever the observation says, so the observations cannot down-weight such paths.

• This is not a problem for HMMs, because the input sequence is generated by the model.

Page 15

Fourth Solution: Conditional Random Field

• Conditionally-trained, undirected graphical model.

• For a standard linear-chain structure:

P(y \mid x) = \frac{1}{Z_x} \prod_t \Phi(y_t, y_{t-1}, x)

\Phi(y_t, y_{t-1}, x) = \exp\Big( \sum_k \lambda_k f_k(y_t, y_{t-1}, x) \Big)

where Z_x normalizes by summing the product of factors over all possible label sequences.
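A hedged sketch of how these factors combine in code (the feature extractor and weights are placeholders, a position index t is added so features can look at the current word, and Z(x) is computed by brute-force enumeration to keep the sketch short; real implementations use the forward algorithm):

import math
from itertools import product

def phi(weights, feats, y_t, y_prev, x, t):
    # One linear-chain factor: exp(sum_k lambda_k * f_k(y_t, y_prev, x, t))
    return math.exp(sum(weights.get(f, 0.0) for f in feats(y_t, y_prev, x, t)))

def score(weights, feats, x, y):
    # Unnormalized score of a whole label sequence
    s, prev = 1.0, "START"
    for t, y_t in enumerate(y):
        s *= phi(weights, feats, y_t, prev, x, t)
        prev = y_t
    return s

def crf_prob(weights, feats, labels, x, y):
    # Globally normalized: divide by the summed scores of *all* label sequences
    z = sum(score(weights, feats, x, list(y_alt)) for y_alt in product(labels, repeat=len(x)))
    return score(weights, feats, x, y) / z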

Page 16

Fourth Solution: CRF

• Finding the most likely label sequence given an input sequence, and learning. (ON BOARD)

Page 17

Fourth Solution: CRF

• Have the advantages of MEMMs, but avoid the label bias problem.

• CRFs are globally normalized, whereasMEMMs are locally normalized.

• Widely used and applied. CRFs give state-of-the-art results in many domains.
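The normalization difference, written out with the notation from the earlier slides (a sketch):

MEMM (local): P(y \mid x) = \prod_t \frac{\exp\big(\sum_k \lambda_k f_k(y_t, y_{t-1}, x)\big)}{\sum_{y'} \exp\big(\sum_k \lambda_k f_k(y', y_{t-1}, x)\big)}

CRF (global): P(y \mid x) = \frac{\prod_t \exp\big(\sum_k \lambda_k f_k(y_t, y_{t-1}, x)\big)}{\sum_{y'_{1:n}} \prod_t \exp\big(\sum_k \lambda_k f_k(y'_t, y'_{t-1}, x)\big)}

Because the CRF denominator sums over entire label sequences rather than over the next label at each step, probability mass is not forced through low-entropy states, which is how the label bias problem is avoided.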

Page 18

Example Applications

• CRFs have been applied to:
  – Part-of-speech tagging
  – Named-entity-recognition
  – Table extraction
  – Gene prediction
  – Chinese word segmentation
  – Extracting information from research papers
  – Many more…