Lecture 5: Sequence Models II

Alan Ritter (many slides from Greg Durrett, Dan Klein, Vivek Srikumar, Chris Manning, Yoav Artzi)

Jan 12, 2022

Transcript
Recall: HMMs

‣ Input x = (x_1, ..., x_n), output y = (y_1, ..., y_n)

[Figure: chain-structured graphical model with hidden states y_1, y_2, ..., y_n, each emitting an observation x_1, x_2, ..., x_n]

P(\mathbf{y}, \mathbf{x}) = P(y_1) \prod_{i=2}^{n} P(y_i \mid y_{i-1}) \prod_{i=1}^{n} P(x_i \mid y_i)

‣ Training: maximum likelihood estimation (with smoothing)

‣ Inference problem: \mathrm{argmax}_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x}) = \mathrm{argmax}_{\mathbf{y}} \frac{P(\mathbf{y}, \mathbf{x})}{P(\mathbf{x})}

‣ Viterbi: \mathrm{score}_i(s) = \max_{y_{i-1}} P(s \mid y_{i-1}) \, P(x_i \mid s) \, \mathrm{score}_{i-1}(y_{i-1})
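The Viterbi recurrence above is a small dynamic program. The following is a minimal sketch in log space (not code from the lecture; the function name and table layout are our own), assuming log-probability tables for the initial, transition, and emission distributions:

```python
def viterbi(log_init, log_trans, log_emit):
    """All arguments are log-probabilities:
    log_init[s], log_trans[s_prev][s], log_emit[i][s].
    Returns the highest-scoring tag sequence as a list of state indices."""
    n, S = len(log_emit), len(log_init)
    score = [[0.0] * S for _ in range(n)]   # score[i][s]: best path ending in s at position i
    back = [[0] * S for _ in range(n)]      # backpointers for recovering the argmax
    score[0] = [log_init[s] + log_emit[0][s] for s in range(S)]
    for i in range(1, n):
        for s in range(S):
            best_prev = max(range(S), key=lambda p: score[i - 1][p] + log_trans[p][s])
            back[i][s] = best_prev
            score[i][s] = score[i - 1][best_prev] + log_trans[best_prev][s] + log_emit[i][s]
    tags = [max(range(S), key=lambda s: score[-1][s])]
    for i in range(n - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))
```

Working in log space turns the products in the recurrence into sums and avoids underflow on long sequences.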

This Lecture

‣ Named entity recognition (NER)

‣ CRFs: model (+ features for NER), inference, learning

‣ (if time) Beam search

Named Entity Recognition

Barack Obama will travel to Hangzhou today for the G20 meeting.
PERSON                      LOC                    ORG

B-PER I-PER O O O B-LOC O O O B-ORG O O

‣ BIO tagset: begin, inside, outside

‣ Sequence of tags: should we use an HMM?

‣ Why might an HMM not do so well here?

‣ Lots of O's, so tags aren't as informative about context

‣ Insufficient features/capacity with multinomials (especially for unks)
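BIO tags can be decoded back into labeled entity spans with a short helper. A minimal sketch (the function name and span convention are our own, not from the lecture):

```python
def bio_to_spans(tags):
    """Convert BIO tags to (label, start, end_exclusive) entity spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        # A span closes at O, at a new B-, or at an I- whose label doesn't match.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:
                spans.append((label, start, i))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans
```

On the example sentence above, this recovers the PER span for "Barack Obama", the LOC span for "Hangzhou", and the ORG span for "G20".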


CRFs

Conditional Random Fields

‣ HMMs are expressible as Bayes nets (factor graphs)

[Figure: the same chain-structured model, states y_1, ..., y_n with emissions x_1, ..., x_n]

‣ This reflects the following decomposition:

P(\mathbf{y}, \mathbf{x}) = P(y_1) P(x_1 \mid y_1) P(y_2 \mid y_1) P(x_2 \mid y_2) \ldots

‣ Locally normalized model: each factor is a probability distribution that normalizes

‣ CRFs: discriminative models with the following globally-normalized form:

P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_k \exp(\phi_k(\mathbf{x}, \mathbf{y}))

  (Z is the normalizer; each \phi_k can be any real-valued scoring function of its arguments)

‣ Naive Bayes : logistic regression :: HMMs : CRFs
  (local vs. global normalization <-> generative vs. discriminative)

‣ Locally normalized discriminative models do exist (MEMMs)

‣ How do we max over y? Intractable in general; can we fix this?

Sequential CRFs

‣ HMMs: P(\mathbf{y}, \mathbf{x}) = P(y_1) P(x_1 \mid y_1) P(y_2 \mid y_1) P(x_2 \mid y_2) \ldots

‣ CRFs: P(\mathbf{y} \mid \mathbf{x}) \propto \prod_k \exp(\phi_k(\mathbf{x}, \mathbf{y}))

‣ Sequential CRF, with an initial factor \phi_o, transition factors \phi_t, and emission factors \phi_e:

P(\mathbf{y} \mid \mathbf{x}) \propto \exp(\phi_o(y_1)) \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(x_i, y_i))

‣ We condition on x, so every factor can depend on all of x (including the transitions, but we won't do this):

\prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

  (the token index i lets us look at the current word)

‣ y can't depend arbitrarily on x in a generative model

‣ Notation: omit x from the factor graph entirely (it is implicit)

‣ Don't include an initial distribution; it can be baked into the other factors

Sequential CRFs:

P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

Feature Functions

P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

‣ Phis can be almost anything! Here we use linear functions of sparse features:

\phi_e(y_i, i, \mathbf{x}) = w^\top f_e(y_i, i, \mathbf{x})

\phi_t(y_{i-1}, y_i) = w^\top f_t(y_{i-1}, y_i)

P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

‣ Looks like our single-weight-vector multiclass logistic regression model

Basic Features for NER

Barack Obama will travel to Hangzhou today for the G20 meeting.

(consider position 6: the previous tag is O on "to", the current tag is B-LOC on "Hangzhou")

P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

Transitions: f_t(y_{i-1}, y_i) = \mathrm{Ind}[y_{i-1} \,\&\, y_i] = Ind[O, B-LOC]

Emissions: f_e(y_6, 6, \mathbf{x}) = Ind[B-LOC & Current word = Hangzhou]
                                     Ind[B-LOC & Prev word = to]
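Sparse indicator features like the emissions above are usually materialized as strings. A minimal sketch (the helper name and exact feature templates are our own choices, not the lecture's):

```python
def emission_features(tag, i, words):
    """Sparse indicator features f_e(y_i, i, x), encoded as strings."""
    feats = [f"{tag}&curr={words[i]}"]            # current word conjoined with the tag
    if i > 0:
        feats.append(f"{tag}&prev={words[i-1]}")  # previous word
    if i < len(words) - 1:
        feats.append(f"{tag}&next={words[i+1]}")  # next word
    feats.append(f"{tag}&cap={words[i][0].isupper()}")  # capitalization
    return feats
```

Each string is one dimension of the sparse feature vector; the weight vector w is then just a map from these strings to real values.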

Features for NER

\phi_e(y_i, i, \mathbf{x})

Leicestershire is a nice place to visit…        LOC
I took a vacation to Boston                     LOC
Apple released a new version…                   ORG
According to the New York Times…                ORG
Texas governor Greg Abbott said                 LOC, PER
Leonardo DiCaprio won an award…                 PER

‣ Word features (can use in HMM): capitalization, word shape, prefixes/suffixes, lexical indicators
‣ Context features (can't use in HMM!): words before/after, tags before/after
‣ Gazetteers, word clusters

CRFs Outline

‣ Model: P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

  P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

‣ Inference

‣ Learning

Computing (arg)maxes

P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

‣ \mathrm{argmax}_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x}): can use Viterbi exactly as in the HMM case

\max_{y_1, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots e^{\phi_e(y_2, 2, \mathbf{x})} e^{\phi_t(y_1, y_2)} e^{\phi_e(y_1, 1, \mathbf{x})}

= \max_{y_2, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots e^{\phi_e(y_2, 2, \mathbf{x})} \max_{y_1} e^{\phi_t(y_1, y_2)} e^{\phi_e(y_1, 1, \mathbf{x})}

  (the inner max is \mathrm{score}_1(y_1))

= \max_{y_3, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots \max_{y_2} e^{\phi_t(y_2, y_3)} e^{\phi_e(y_2, 2, \mathbf{x})} \max_{y_1} e^{\phi_t(y_1, y_2)} \mathrm{score}_1(y_1)

‣ \exp(\phi_t(y_{i-1}, y_i)) and \exp(\phi_e(y_i, i, \mathbf{x})) play the role of the Ps now; same dynamic program
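Since the potentials simply replace the HMM probabilities, the same dynamic program runs directly on the φ scores, with products becoming sums of potentials. A minimal sketch, assuming precomputed score tables phi_t[p][s] and phi_e[i][s] (our own table layout, not the lecture's code):

```python
def crf_viterbi(phi_t, phi_e):
    """argmax_y of sum_i phi_t(y_{i-1}, y_i) + sum_i phi_e(y_i, i, x),
    given transition scores phi_t[p][s] and emission scores phi_e[i][s]."""
    n, S = len(phi_e), len(phi_e[0])
    score = [phi_e[0][:]]       # no separate initial factor: it's baked into phi_t/phi_e
    back = [[0] * S]
    for i in range(1, n):
        row, brow = [], []
        for s in range(S):
            best = max(range(S), key=lambda p: score[i - 1][p] + phi_t[p][s])
            brow.append(best)
            row.append(score[i - 1][best] + phi_t[best][s] + phi_e[i][s])
        score.append(row)
        back.append(brow)
    tags = [max(range(S), key=lambda s: score[-1][s])]
    for i in range(n - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))
```

The only difference from the HMM version is that scores are arbitrary reals added together rather than log-probabilities of normalized distributions.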

Inference in General CRFs

‣ Can do inference in any tree-structured CRF

‣ Max-product algorithm: generalization of Viterbi to arbitrary tree-structured graphs (sum-product is the generalization of forward-backward)

CRFs Outline

‣ Model: P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

  P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

‣ Inference: argmax P(y|x) from Viterbi

‣ Learning

Training CRFs

P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

‣ Logistic regression: P(y \mid \mathbf{x}) \propto \exp w^\top f(\mathbf{x}, y)

‣ Maximize \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \log P(\mathbf{y}^* \mid \mathbf{x})

‣ Gradient is completely analogous to logistic regression:

\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=2}^{n} f_t(y^*_{i-1}, y^*_i) + \sum_{i=1}^{n} f_e(y^*_i, i, \mathbf{x}) - \mathbb{E}_{\mathbf{y}} \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

‣ The expectation term looks intractable! (a sum over exponentially many y)

‣ Let's focus on the emission feature expectation:

\mathbb{E}_{\mathbf{y}} \left[ \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right] = \sum_{\mathbf{y} \in \mathcal{Y}} P(\mathbf{y} \mid \mathbf{x}) \left[ \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

= \sum_{i=1}^{n} \sum_{\mathbf{y} \in \mathcal{Y}} P(\mathbf{y} \mid \mathbf{x}) f_e(y_i, i, \mathbf{x})

= \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) f_e(s, i, \mathbf{x})

Computing Marginals

P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

‣ Normalizing constant: Z = \sum_{\mathbf{y}} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

‣ For both HMMs and CRFs:

P(y_i = s \mid \mathbf{x}) = \frac{\mathrm{forward}_i(s) \, \mathrm{backward}_i(s)}{\sum_{s'} \mathrm{forward}_i(s') \, \mathrm{backward}_i(s')}

  (the denominator is Z for CRFs, P(x) for HMMs)

‣ Analogous to P(x) for HMMs
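The forward and backward quantities are computed with the same dynamic program as Viterbi, but with sums in place of maxes. A minimal sketch over potential tables (our own layout; for clarity it exponentiates the potentials directly, while a real implementation would work in log space to avoid overflow):

```python
from math import exp

def marginals(phi_t, phi_e):
    """P(y_i = s | x) via forward-backward over phi_t[p][s] and phi_e[i][s]."""
    n, S = len(phi_e), len(phi_e[0])
    # forward_i(s): sum over all prefixes ending in state s, including emission at i
    fwd = [[exp(phi_e[0][s]) for s in range(S)]]
    for i in range(1, n):
        fwd.append([sum(fwd[i - 1][p] * exp(phi_t[p][s]) for p in range(S)) * exp(phi_e[i][s])
                    for s in range(S)])
    # backward_i(s): sum over all suffixes given y_i = s, excluding emission at i
    bwd = [[1.0] * S for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(exp(phi_t[s][q]) * exp(phi_e[i + 1][q]) * bwd[i + 1][q] for q in range(S))
                  for s in range(S)]
    Z = sum(fwd[-1])  # the normalizer
    return [[fwd[i][s] * bwd[i][s] / Z for s in range(S)] for i in range(n)]
```

Each marginal is the product of the forward and backward scores at that position, normalized by Z, exactly as in the formula above.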

Posteriors vs. Probabilities

P(y_i = s \mid \mathbf{x}) = \frac{\mathrm{forward}_i(s) \, \mathrm{backward}_i(s)}{\sum_{s'} \mathrm{forward}_i(s') \, \mathrm{backward}_i(s')}

‣ The posterior is derived from the parameters and the data (conditioned on x!)

        P(x_i | y_i), P(y_i | y_{i-1})                     P(y_i | x), P(y_{i-1}, y_i | x)
HMM:    model parameter (usually a multinomial             inferred quantity from forward-backward
        distribution)
CRF:    undefined (the model is by definition              inferred quantity from forward-backward
        conditioned on x)

Training CRFs

‣ For emission features:

\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=1}^{n} f_e(y^*_i, i, \mathbf{x}) - \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) f_e(s, i, \mathbf{x})

  (gold features minus expected features under the model)

‣ Transition features: need to compute P(y_i = s_1, y_{i+1} = s_2 \mid \mathbf{x}) using forward-backward as well
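The gold-minus-expected gradient for the emission features can be accumulated directly from the marginals. A minimal sketch, assuming a feature function fe(tag, i, words) that returns string indicators and marginals marg[i][s] = P(y_i = s | x); all of these names are our own illustration, not the lecture's code:

```python
from collections import defaultdict

def emission_gradient(gold_tags, words, tag_set, fe, marg):
    """Emission part of d/dw log P(y*|x): gold features minus expected features."""
    grad = defaultdict(float)
    for i in range(len(words)):
        for feat in fe(gold_tags[i], i, words):   # gold features: count +1
            grad[feat] += 1.0
        for s in tag_set:                         # expected features: weight by marginal
            for feat in fe(s, i, words):
                grad[feat] -= marg[i][s]
    return grad
```

When the model's marginals put all their mass on the gold tags, the two terms cancel and the gradient is zero.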

CRFs Outline

‣ Model: P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))

  P(\mathbf{y} \mid \mathbf{x}) \propto \exp w^\top \left[ \sum_{i=2}^{n} f_t(y_{i-1}, y_i) + \sum_{i=1}^{n} f_e(y_i, i, \mathbf{x}) \right]

‣ Inference: argmax P(y|x) from Viterbi

‣ Learning: run forward-backward to compute posterior probabilities; then

\frac{\partial}{\partial w} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=1}^{n} f_e(y^*_i, i, \mathbf{x}) - \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) f_e(s, i, \mathbf{x})

Pseudocode

for each epoch
    for each example
        extract features on each emission and transition (lookup in cache)
        compute potentials phi based on features + weights
        compute marginal probabilities with forward-backward
        accumulate gradient over all emissions and transitions

Page 106: Lecture 5: Sequence Models II

StructuredPerceptron

Page 107: Lecture 5: Sequence Models II

StructuredPerceptron

argmaxy2Yw>f(x, y)y =

<latexit sha1_base64="lZVohhKf8gIklCebjvLxzG2Hzk8=">AAACG3icbVDLSsNAFJ3UV42vqks3g0VwVRIVdCMU3bisYB/ShjKZTNqhM5MwMxFCyFe4tV/jTty68GMEJ20WtvXChcM593LPPX7MqNKO821V1tY3Nreq2/bO7t7+Qe3wqKOiRGLSxhGLZM9HijAqSFtTzUgvlgRxn5GuP7kv9O4LkYpG4kmnMfE4GgkaUoy0oZ4HY6SzNIe3w1rdaTizgqvALUEdlNUa1n4GQYQTToTGDCnVd51YexmSmmJGcnuQKBIjPEEj0jdQIE6Ul80M5/DMMAEMI2laaDhj/25kiCuVct9McqTHalkryP+0fqLDGy+jIk40EXh+KEwY1BEsvocBlQRrlhqAsKTGK8RjJBHWJiN74UygCm8Lj2TxKDSmc9u2TV7ucjqroHPRcC8bzuNVvXlXJlcFJ+AUnAMXXIMmeAAt0AYYcPAK3sDUmlrv1of1OR+tWOXOMVgo6+sXnA2hXQ==</latexit>

w = w + f(x, y⇤)� f(x, y)<latexit sha1_base64="SHhE6uWGl/Q8vXXxUKr5184hkg8=">AAACM3icbVDLSsNAFJ3UV42vqEsXDhah9VESFXQjFN24rGAf0NYymU7aoZMHMxNrCF36NW7tx4g7cesnCE7aLGzrgQuHc+7l3nvsgFEhTfNdyywsLi2vZFf1tfWNzS1je6cq/JBjUsE+83ndRoIw6pGKpJKResAJcm1Ganb/NvFrT4QL6nsPMgpIy0VdjzoUI6mktrE/gNdwAI+hk38+iR6PCvB0TJs9JONoWGgbObNojgHniZWSHEhRbhs/zY6PQ5d4EjMkRMMyA9mKEZcUMzLUm6EgAcJ91CUNRT3kEtGKx48M4aFSOtDxuSpPwrH6dyJGrhCRa6tOF8memPUS8T+vEUrnqhVTLwgl8fBkkRMyKH2YpAI7lBMsWaQIwpyqWyHuIY6wVNnpU2s6Irlt6pE46Drq6KGu6yovazadeVI9K1rnRfP+Ile6SZPLgj1wAPLAApegBO5AGVQABi/gFbyBkTbSPrRP7WvSmtHSmV0wBe37Fy2gqAc=</latexit>

‣ StructuredPerceptronUpdate:

Page 108: Lecture 5: Sequence Models II

StructuredPerceptron

argmaxy2Yw>f(x, y)y =

<latexit sha1_base64="lZVohhKf8gIklCebjvLxzG2Hzk8=">AAACG3icbVDLSsNAFJ3UV42vqks3g0VwVRIVdCMU3bisYB/ShjKZTNqhM5MwMxFCyFe4tV/jTty68GMEJ20WtvXChcM593LPPX7MqNKO821V1tY3Nreq2/bO7t7+Qe3wqKOiRGLSxhGLZM9HijAqSFtTzUgvlgRxn5GuP7kv9O4LkYpG4kmnMfE4GgkaUoy0oZ4HY6SzNIe3w1rdaTizgqvALUEdlNUa1n4GQYQTToTGDCnVd51YexmSmmJGcnuQKBIjPEEj0jdQIE6Ul80M5/DMMAEMI2laaDhj/25kiCuVct9McqTHalkryP+0fqLDGy+jIk40EXh+KEwY1BEsvocBlQRrlhqAsKTGK8RjJBHWJiN74UygCm8Lj2TxKDSmc9u2TV7ucjqroHPRcC8bzuNVvXlXJlcFJ+AUnAMXXIMmeAAt0AYYcPAK3sDUmlrv1of1OR+tWOXOMVgo6+sXnA2hXQ==</latexit>

w = w + f(x, y⇤)� f(x, y)<latexit sha1_base64="SHhE6uWGl/Q8vXXxUKr5184hkg8=">AAACM3icbVDLSsNAFJ3UV42vqEsXDhah9VESFXQjFN24rGAf0NYymU7aoZMHMxNrCF36NW7tx4g7cesnCE7aLGzrgQuHc+7l3nvsgFEhTfNdyywsLi2vZFf1tfWNzS1je6cq/JBjUsE+83ndRoIw6pGKpJKResAJcm1Ganb/NvFrT4QL6nsPMgpIy0VdjzoUI6mktrE/gNdwAI+hk38+iR6PCvB0TJs9JONoWGgbObNojgHniZWSHEhRbhs/zY6PQ5d4EjMkRMMyA9mKEZcUMzLUm6EgAcJ91CUNRT3kEtGKx48M4aFSOtDxuSpPwrH6dyJGrhCRa6tOF8memPUS8T+vEUrnqhVTLwgl8fBkkRMyKH2YpAI7lBMsWaQIwpyqWyHuIY6wVNnpU2s6Irlt6pE46Drq6KGu6yovazadeVI9K1rnRfP+Ile6SZPLgj1wAPLAApegBO5AGVQABi/gFbyBkTbSPrRP7WvSmtHSmV0wBe37Fy2gqAc=</latexit>

‣ StructuredPerceptronUpdate:Viterbi Algorithm

Page 109: Lecture 5: Sequence Models II

StructuredPerceptron

argmaxy2Yw>f(x, y)y =

<latexit sha1_base64="lZVohhKf8gIklCebjvLxzG2Hzk8=">AAACG3icbVDLSsNAFJ3UV42vqks3g0VwVRIVdCMU3bisYB/ShjKZTNqhM5MwMxFCyFe4tV/jTty68GMEJ20WtvXChcM593LPPX7MqNKO821V1tY3Nreq2/bO7t7+Qe3wqKOiRGLSxhGLZM9HijAqSFtTzUgvlgRxn5GuP7kv9O4LkYpG4kmnMfE4GgkaUoy0oZ4HY6SzNIe3w1rdaTizgqvALUEdlNUa1n4GQYQTToTGDCnVd51YexmSmmJGcnuQKBIjPEEj0jdQIE6Ul80M5/DMMAEMI2laaDhj/25kiCuVct9McqTHalkryP+0fqLDGy+jIk40EXh+KEwY1BEsvocBlQRrlhqAsKTGK8RjJBHWJiN74UygCm8Lj2TxKDSmc9u2TV7ucjqroHPRcC8bzuNVvXlXJlcFJ+AUnAMXXIMmeAAt0AYYcPAK3sDUmlrv1of1OR+tWOXOMVgo6+sXnA2hXQ==</latexit>

Structured Perceptron

‣ Structured perceptron update:

w ← w + f(x, y*) − f(x, ŷ),  where  ŷ = argmax_{y ∈ Y} w⊤ f(x, y)

‣ Compare to gradient of CRF:

∂L(y*, x)/∂w = Σ_{i=2}^{n} f_t(y*_{i−1}, y*_i) + Σ_{i=1}^{n} f_e(y*_i, i, x) − E_y[ Σ_{i=2}^{n} f_t(y_{i−1}, y_i) + Σ_{i=1}^{n} f_e(y_i, i, x) ]

‣ The Viterbi algorithm replaces the expectation with an argmax
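The update above can be sketched in a few lines. This is a minimal illustration, not a full tagger: the feature names, toy data, and brute-force argmax decoder are all invented for the example (a real implementation would use Viterbi for the argmax, as the slide notes).

```python
from collections import Counter
from itertools import product

def features(x, y):
    # f(x, y): emission (tag, word) and transition (prev tag, tag) indicators
    f = Counter()
    for i, (word, tag) in enumerate(zip(x, y)):
        f[("emit", tag, word)] += 1
        if i > 0:
            f[("trans", y[i - 1], tag)] += 1
    return f

def decode(x, w, tags):
    # argmax_y  w . f(x, y), by brute force over all tag sequences
    # (exponential in len(x); Viterbi computes the same argmax in O(n s^2))
    return max(product(tags, repeat=len(x)),
               key=lambda y: sum(w[k] * v for k, v in features(x, y).items()))

def perceptron_update(w, x, y_gold, tags):
    # w <- w + f(x, y*) - f(x, y_hat)
    y_hat = decode(x, w, tags)
    if list(y_hat) != list(y_gold):
        w.update(features(x, y_gold))      # + f(x, y*)
        w.subtract(features(x, y_hat))     # - f(x, y_hat)

# Toy usage: one training sentence, a few passes
data = [(["Fed", "raises", "rates"], ["NNP", "VBZ", "NNS"])]
tags = ["NNP", "VBZ", "NNS"]
w = Counter()
for _ in range(5):
    for x, y in data:
        perceptron_update(w, x, y, tags)
```

Unlike the CRF gradient, the update touches only two feature vectors per sentence: the gold sequence and the current model's single best guess.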

Page 111: Lecture 5: Sequence Models II

NER

‣ CRF with lexical features can get around 85 F1 on this problem

‣ Other pieces of information that many systems capture

‣ World knowledge:

The delegation met the president at the airport, Tanjug said.

Page 118: Lecture 5: Sequence Models II

Nonlocal Features

The delegation met the president at the airport, Tanjug said.   (Tanjug: ORG? PER?)

The news agency Tanjug reported on the outcome of the meeting.

‣ More complex factor graph structures can let you capture this, or just decode sentences in order and use features on previous sentences

Finkel and Manning (2008), Ratinov and Roth (2009)

Page 123: Lecture 5: Sequence Models II

Semi-Markov Models

[Barack Obama]PER [will travel to]O [Hangzhou]LOC [today for the]O [G20]ORG [meeting.]O

‣ Chunk-level prediction rather than token-level BIO

‣ y is a set of touching spans of the sentence

‣ Pros: features can look at a whole span at once

‣ Cons: there's an extra factor of n in the dynamic programs

Sarawagi and Cohen (2004)
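One way to make the "set of touching spans" representation concrete is as (label, start, end) triples that must tile the sentence. A quick sketch (the function name and span encoding are illustrative, not from the lecture):

```python
def is_valid_segmentation(spans, n):
    # y is a set of touching spans (label, start, end): sorted by start,
    # they must tile token positions [0, n) with no gaps or overlaps
    spans = sorted(spans, key=lambda s: s[1])
    return (bool(spans) and spans[0][1] == 0 and spans[-1][2] == n
            and all(a[2] == b[1] for a, b in zip(spans, spans[1:])))

# The example sentence has 11 tokens; one possible y:
y = {("PER", 0, 2), ("O", 2, 5), ("LOC", 5, 6),
     ("O", 6, 9), ("ORG", 9, 10), ("O", 10, 11)}
```

The extra factor of n in the dynamic program comes from scoring every possible span end for each start, rather than a single tag per token.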

Page 129: Lecture 5: Sequence Models II

Evaluating NER

Barack Obama will travel to Hangzhou today for the G20 meeting.
B-PER I-PER  O    O      O  B-LOC    O     O   O   B-ORG    O
[---PERSON---]              [-LOC-]            [-ORG-]

‣ Prediction of all Os still gets 66% accuracy on this example!

‣ What we really want to know: how many named entity chunk predictions did we get right?

‣ Precision: of the ones we predicted, how many are right?

‣ Recall: of the gold named entities, how many did we find?

‣ F-measure: harmonic mean of these two
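These definitions can be sketched as span-level scoring over BIO tags. This is a simplified illustration (it assumes well-formed BIO sequences; the standard conlleval script also handles malformed ones):

```python
def bio_chunks(tags):
    # Extract (label, start, end) entity spans from a well-formed BIO sequence
    chunks, start = set(), None
    for i, t in enumerate(list(tags) + ["O"]):   # sentinel flushes last chunk
        if start is not None and not t.startswith("I-"):
            chunks.add((tags[start][2:], start, i))
            start = None
        if t.startswith("B-"):
            start = i
    return chunks

def ner_f1(gold_tags, pred_tags):
    # Precision: of the chunks we predicted, how many are right?
    # Recall: of the gold chunks, how many did we find?
    gold, pred = bio_chunks(gold_tags), bio_chunks(pred_tags)
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0
    return prec, rec, f1

gold = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O", "O", "O", "B-ORG", "O"]
all_o = ["O"] * len(gold)   # high token accuracy, but finds no entities
```

A chunk counts as correct only if its label and both boundaries match, which is why the all-O baseline scores 0 F1 despite its high token accuracy.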

Page 135: Lecture 5: Sequence Models II

How well do NER systems do?

[Table of CoNLL NER results: Ratinov and Roth (2009); Lample et al. (2016); BiLSTM-CRF + ELMo, 92.2 F1 (Peters et al., 2018)]

Page 138: Lecture 5: Sequence Models II

Beam Search

Page 139: Lecture 5: Sequence Models II

Viterbi Time Complexity

Fed                 raises        interest         rates         0.5     percent
{NNP, VBD, VBN}     {VBZ, NNS}    {VB, VBP, NN}    {VBZ, NNS}    {CD}    {NN}

‣ n word sentence, s tags to consider: what is the time complexity?

‣ O(ns²): s is ~40 for POS, n is ~20
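The O(ns²) cost comes directly from the recurrence from earlier in the lecture, score_i(s) = max_{y_{i−1}} P(s|y_{i−1}) P(x_i|s) score_{i−1}(y_{i−1}): each of n positions does a max over s previous tags for each of s current tags. A sketch in log space (the two-tag toy HMM parameters are invented for illustration):

```python
import math

def viterbi(x, tags, log_init, log_trans, log_emit):
    # score_i(s) = max_{s'} [score_{i-1}(s') + log P(s|s')] + log P(x_i|s)
    score = {s: log_init[s] + log_emit[s][x[0]] for s in tags}
    backptrs = []
    for i in range(1, len(x)):
        new_score, bp = {}, {}
        for s in tags:                       # s current tags ...
            prev = max(tags,                 # ... times a max over s previous
                       key=lambda sp: score[sp] + log_trans[sp][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][x[i]]
            bp[s] = prev
        score = new_score
        backptrs.append(bp)
    # Trace backpointers from the best final tag
    y = [max(tags, key=lambda s: score[s])]
    for bp in reversed(backptrs):
        y.append(bp[y[-1]])
    return y[::-1]

# Toy two-tag HMM (all probabilities invented for illustration)
lg = math.log
tags = ["N", "V"]
log_init = {"N": lg(0.7), "V": lg(0.3)}
log_trans = {"N": {"N": lg(0.4), "V": lg(0.6)},
             "V": {"N": lg(0.7), "V": lg(0.3)}}
log_emit = {"N": {"Fed": lg(0.8), "raises": lg(0.2)},
            "V": {"Fed": lg(0.1), "raises": lg(0.9)}}
```

Working in log space turns the products of probabilities into sums, avoiding underflow on long sentences.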

Page 143: Lecture 5: Sequence Models II

Viterbi Time Complexity

Fed {NNP, VBD, VBN}   raises {VBZ, NNS}   interest {VB, VBP, NN}   rates {VBZ, NNS}   0.5 {CD}   percent {NN}

‣ Many tags are totally implausible

‣ Can any of these be: determiners? prepositions? adjectives?

‣ Features quickly eliminate many outcomes from consideration; no need to consider these going forward

Page 147: Lecture 5: Sequence Models II

Beam Search

‣ Maintain a beam of k plausible states at the current timestep

‣ Expand all states, only keep the k top hypotheses at the new timestep

Fed:    VBD +1.2,  NNP +0.9,  VBN +0.7,  NN +0.3      (k = 2: keep VBD and NNP; VBN and NN not expanded)

raises: VBZ +1.2,  NNS +1.2,  NNS −1.0,  VBZ −2.0,  DT −5.3,  PRP −5.8, …      (again keep the top k; DT, PRP, … not expanded)

‣ Maintain a priority queue to efficiently add things

‣ Beam size of k, time complexity O(nks log(ks))
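The procedure above can be sketched compactly; the scoring function and its signature are illustrative (a real tagger would plug in CRF or perceptron scores), and heapq.nlargest plays the role of the priority queue:

```python
import heapq

def beam_search(x, tags, k, local_score):
    # Each hypothesis is (score, tag sequence). At every timestep, expand all
    # k hypotheses with all s tags, then keep only the k best: O(n k s log(ks))
    beam = [(0.0, [])]
    for i in range(len(x)):
        expanded = [(sc + local_score(x, i, hyp[-1] if hyp else None, t),
                     hyp + [t])
                    for sc, hyp in beam for t in tags]
        beam = heapq.nlargest(k, expanded, key=lambda h: h[0])
    return max(beam, key=lambda h: h[0])

def toy_score(x, i, prev, tag):
    # Invented scores loosely matching the slide's example: "Fed" scores
    # highest as VBD locally, but NNP pays off via the NNP -> VBZ transition
    emit = {("Fed", "NNP"): 0.9, ("Fed", "VBD"): 1.2,
            ("raises", "VBZ"): 0.5, ("raises", "NNS"): 0.4}.get((x[i], tag), -1.0)
    trans = 1.5 if (prev, tag) == ("NNP", "VBZ") else 0.0
    return emit + trans

score, y = beam_search(["Fed", "raises"], ["NNP", "VBD", "VBZ", "NNS"], 2, toy_score)
```

With k = 2 the beam keeps NNP alive at the first step and recovers the better path; with k = 1 (greedy), the same toy scores would commit to VBD at "Fed" and never recover.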

Page 170: Lecture 5: Sequence Models II

How good is beam search?

‣ k = 1: greedy search

‣ Choosing beam size:

‣ 2 is usually better than 1

‣ Usually don't use larger than 50

‣ Depends on problem structure

‣ If beam search is much faster than computing full sums, can use structured perceptron/SVM instead of CRFs

‣ Very similar to structured SVM