CS388: Natural Language Processing, Greg Durrett, Lecture 5: Named Entity Recognition, CRFs

Mar 06, 2021

Page 1

CS388: Natural Language Processing

Greg Durrett

Lecture 5: Named Entity Recognition, CRFs

Page 2

Administrivia

‣ Project 1 due next Thursday

‣ Mini 1 grading underway

Page 3

Recall: HMMs

‣ Input $\mathbf{x} = (x_1, ..., x_n)$, output $\mathbf{y} = (y_1, ..., y_n)$

[Figure: HMM graphical model with tags y_1, y_2, …, y_n and words x_1, x_2, …, x_n]

$$P(\mathbf{y}, \mathbf{x}) = P(y_1) \prod_{i=2}^{n} P(y_i \mid y_{i-1}) \prod_{i=1}^{n} P(x_i \mid y_i)$$

‣ Training: maximum likelihood estimation (count + normalize)

‣ Inference problem:

$$\operatorname{argmax}_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x}) = \operatorname{argmax}_{\mathbf{y}} \frac{P(\mathbf{y}, \mathbf{x})}{P(\mathbf{x})}$$

‣ Viterbi:

$$\mathrm{score}_i(s) = \max_{y_{i-1}} P(s \mid y_{i-1}) \, P(x_i \mid s) \, \mathrm{score}_{i-1}(y_{i-1})$$
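The Viterbi recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function name `viterbi_hmm`, the dictionary layout of the parameters, and the toy probabilities are all assumptions made for the example. Scores are kept as log probabilities.

```python
import math

def viterbi_hmm(words, tags, init, trans, emit):
    """Most probable tag sequence under an HMM, computed in log space."""
    def lp(p):
        # log of a probability, with log(0) mapped to -inf
        return math.log(p) if p > 0 else float("-inf")

    # score[i][s]: log-prob of the best tag sequence for words[:i+1] ending in s
    score = [{s: lp(init[s]) + lp(emit[s].get(words[0], 0.0)) for s in tags}]
    back = []  # back[i-1][s]: best previous tag when position i has tag s
    for i in range(1, len(words)):
        prev = score[-1]
        row, ptr = {}, {}
        for s in tags:
            best = max(tags, key=lambda p: prev[p] + lp(trans[p][s]))
            row[s] = prev[best] + lp(trans[best][s]) + lp(emit[s].get(words[i], 0.0))
            ptr[s] = best
        score.append(row)
        back.append(ptr)

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical toy parameters, for illustration only.
tags = ["N", "V"]
init = {"N": 0.8, "V": 0.2}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"dogs": 0.6, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.7}}
best_path = viterbi_hmm(["dogs", "bark"], tags, init, trans, emit)
```

With these toy numbers the sequence N, V has joint probability 0.8 · 0.6 · 0.7 · 0.7, which is the largest of the four candidates.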

Page 4

Recall: Viterbi Algorithm

slide credit: Dan Klein

‣ Compute scores for next time step (score of optimal tag sequence ending with tag i at time step t)

Page 5

Viterbi/HMMs: Other Resources

‣ Lecture notes from my undergrad course (posted online)

‣ Eisenstein Chapter 7.3, but the notation covers a more general case than what's discussed for HMMs

‣ Jurafsky + Martin 8.4.5

‣ We ignore the STOP token here. It's not in the tag set, and we just don't use these probabilities

Page 6

This Lecture

‣ Conditional random fields

‣ Features for NER

‣ Inference and Learning in CRFs

‣ Next time: finish up NER systems

Page 7

Named Entity Recognition

Barack Obama will travel to Hangzhou today for the G20 meeting .
B-PER  I-PER O    O      O  B-LOC    O     O   O   B-ORG O       O

Entities: Barack Obama (PERSON), Hangzhou (LOC), G20 (ORG)

‣ BIO tagset: begin, inside, outside

‣ Sequence of tags: should we use an HMM?

‣ Why might an HMM not do so well here?

  ‣ Lots of O's

  ‣ Insufficient features/capacity with multinomials (especially for unks)
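To make the BIO encoding concrete, here is a small sketch that turns entity spans into per-token tags. The helper name `spans_to_bio` and the `(start, end, label)` span format with an exclusive end index are assumptions for illustration, not something defined in the lecture.

```python
def spans_to_bio(num_tokens, spans):
    """Convert (start, end, label) spans (end exclusive) into BIO tags."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the entity
    return tags


tokens = "Barack Obama will travel to Hangzhou today for the G20 meeting .".split()
spans = [(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "ORG")]
bio_tags = spans_to_bio(len(tokens), spans)
```

Most tokens end up tagged O, which is the class imbalance the slide points out.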

Page 8

HMMs Pros and Cons

[Figure: HMM graphical model with tags y_1, y_2, …, y_n and words x_1, x_2, …, x_n]

‣ Big advantage: transitions, scoring pairs of adjacent y's

‣ Big downside: not able to incorporate useful word context information

‣ Solution: switch from generative to discriminative model (conditional random fields) so we can condition on the entire input.

‣ Conditional random fields: logistic regression + features on pairs of y's

Page 9

Conditional Random Fields

Page 10

Conditional Random Fields

‣ Flexible discriminative model for tagging tasks that can use arbitrary features of the input. Similar to logistic regression, but structured

Barack Obama will travel to Hangzhou today for the G20 meeting.
B-PER  I-PER

Curr_word=Barack & Label=B-PER
Next_word=Obama & Label=B-PER
Curr_word_starts_with_capital=True & Label=B-PER
Posn_in_sentence=1st & Label=B-PER
Label=B-PER & Next-Label=I-PER
…
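The indicator features listed above can be generated by a short function. This is only a sketch: the function names are hypothetical, the feature strings drop the spaces around `&`, and the position feature uses the integer index rather than the "1st" wording on the slide.

```python
def emission_features(words, i, label):
    """Names of the indicator features that fire for tagging words[i] as label."""
    word = words[i]
    feats = [
        f"Curr_word={word}&Label={label}",
        f"Curr_word_starts_with_capital={word[0].isupper()}&Label={label}",
        f"Posn_in_sentence={i + 1}&Label={label}",   # 1-based position
    ]
    if i + 1 < len(words):
        feats.append(f"Next_word={words[i + 1]}&Label={label}")
    return feats


def transition_feature(label, next_label):
    """Name of the indicator feature on a pair of adjacent labels."""
    return f"Label={label}&Next-Label={next_label}"


feats = emission_features(["Barack", "Obama", "will", "travel"], 0, "B-PER")
```

Each feature is conjoined with the label, so the same word property can carry a different weight for each tag.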

Page 11

Tagging with Logistic Regression

‣ Logistic regression over each tag individually ("different features" approach to features for a single tag):

$$P(y_i = y \mid \mathbf{x}, i) = \frac{\exp(\mathbf{w}^\top \mathbf{f}(y, i, \mathbf{x}))}{\sum_{y' \in \mathcal{Y}} \exp(\mathbf{w}^\top \mathbf{f}(y', i, \mathbf{x}))}$$

This is the probability of the i-th word getting assigned tag y (B-PER, etc.)
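As a rough illustration of the formula above, the per-position distribution is a softmax over the scores of each candidate tag. The names `tag_distribution` and `toy_features` and the toy weights are assumptions; weights are stored sparsely in a dict keyed by feature name.

```python
import math

def tag_distribution(words, i, tags, weights, feature_fn):
    """P(y_i = y | x, i) for every tag y, via a softmax over w . f(y, i, x)."""
    scores = [sum(weights.get(f, 0.0) for f in feature_fn(words, i, y)) for y in tags]
    m = max(scores)                           # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)                             # denominator of the formula
    return {y: e / z for y, e in zip(tags, exps)}


def toy_features(words, i, y):
    # Hypothetical single feature: capitalization conjoined with the tag.
    return [f"cap={words[i][0].isupper()}&label={y}"]


toy_weights = {"cap=True&label=B-PER": 2.0, "cap=False&label=O": 2.0}
dist = tag_distribution(["Barack", "travels"], 0, ["O", "B-PER"], toy_weights, toy_features)
```

Here the capitalized word scores 2 for B-PER and 0 for O, so B-PER gets probability 1 / (1 + e^-2).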

Page 12

Tagging with Logistic Regression

‣ Logistic regression over each tag individually ("different features" approach to features for a single tag):

$$P(y_i = y \mid \mathbf{x}, i) = \frac{\exp(\mathbf{w}^\top \mathbf{f}(y, i, \mathbf{x}))}{\sum_{y' \in \mathcal{Y}} \exp(\mathbf{w}^\top \mathbf{f}(y', i, \mathbf{x}))}$$

‣ Over all tags:

$$P(\mathbf{y} = \tilde{\mathbf{y}} \mid \mathbf{x}) = \prod_{i=1}^{n} P(y_i = \tilde{y}_i \mid \mathbf{x}, i) = \frac{1}{Z} \exp\left(\sum_{i=1}^{n} \mathbf{w}^\top \mathbf{f}(\tilde{y}_i, i, \mathbf{x})\right)$$

‣ Set Z equal to the product of denominators; we'll discuss this in a few slides

‣ Score of a prediction: sum of weights dot features over each individual predicted tag (this is a simple CRF but not the general form)

‣ Conditional model: x is observed, y isn't

Page 13

Example: "Emission Features" f_e

Barack Obama will travel
B-PER  I-PER O    O

feats = f_e(B-PER, i=1, x) + f_e(I-PER, i=2, x) + f_e(O, i=3, x) + f_e(O, i=4, x)

Barack Obama will travel
B-PER  B-PER O    O

feats = f_e(B-PER, i=1, x) + f_e(B-PER, i=2, x) + f_e(O, i=3, x) + f_e(O, i=4, x)

Example features in f_e(I-PER, i=2, x): [CurrWord=Obama & label=I-PER, PrevWord=Barack & label=I-PER, CurrWordIsCapitalized & label=I-PER, …]

Page 14

Adding Structure

$$P(\mathbf{y} = \tilde{\mathbf{y}} \mid \mathbf{x}) = \prod_{i=1}^{n} P(y_i = \tilde{y}_i \mid \mathbf{x}, i) = \frac{1}{Z} \exp\left(\sum_{i=1}^{n} \mathbf{w}^\top \mathbf{f}(\tilde{y}_i, i, \mathbf{x})\right)$$

‣ We want to be able to learn that some tags don't follow other tags, so we want to have features on tag pairs

$$P(\mathbf{y} = \tilde{\mathbf{y}} \mid \mathbf{x}) = \frac{1}{Z} \exp\left(\sum_{i=1}^{n} \mathbf{w}^\top \mathbf{f}_e(\tilde{y}_i, i, \mathbf{x}) + \sum_{i=2}^{n} \mathbf{w}^\top \mathbf{f}_t(\tilde{y}_{i-1}, \tilde{y}_i, i, \mathbf{x})\right)$$

‣ Score: sum of weights dot f_e features over each predicted tag ("emissions") plus sum of weights dot f_t features over tag pairs ("transitions")

‣ This is a sequential CRF

Page 15

Example

Barack Obama will travel
B-PER  I-PER O    O

feats = f_e(B-PER, i=1, x) + f_e(I-PER, i=2, x) + f_e(O, i=3, x) + f_e(O, i=4, x) + f_t(B-PER, I-PER, i=1, x) + f_t(I-PER, O, i=2, x) + f_t(O, O, i=3, x)

Barack Obama will travel
B-PER  B-PER O    O

feats = f_e(B-PER, i=1, x) + f_e(B-PER, i=2, x) + f_e(O, i=3, x) + f_e(O, i=4, x) + f_t(B-PER, B-PER, i=1, x) + f_t(B-PER, O, i=2, x) + f_t(O, O, i=3, x)

‣ Obama can start a new named entity (emission feats look okay), but we're not likely to have two PER entities in a row (transition feats)
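The comparison between the two taggings can be reproduced with a toy scorer. Everything here is hypothetical: the single capitalization feature, the feature-name strings, and the hand-set weights are chosen only so that the emission features cannot tell the two taggings apart while the transition features can.

```python
def emission_features(words, i, tag):
    # Hypothetical emission feature: capitalization conjoined with the tag.
    return [f"CurrWordIsCapitalized={words[i][0].isupper()}&label={tag}"]


def sequence_score(words, tags, weights):
    """Unnormalized log score: emission feature weights plus transition weights."""
    score = 0.0
    for i, tag in enumerate(tags):
        score += sum(weights.get(f, 0.0) for f in emission_features(words, i, tag))
    for prev, cur in zip(tags, tags[1:]):
        score += weights.get(f"prev={prev}&label={cur}", 0.0)
    return score


# Hand-set toy weights, for illustration only.
weights = {
    "CurrWordIsCapitalized=True&label=B-PER": 1.0,
    "CurrWordIsCapitalized=True&label=I-PER": 1.0,
    "CurrWordIsCapitalized=False&label=O": 1.0,
    "prev=B-PER&label=I-PER": 1.0,
    "prev=B-PER&label=B-PER": -2.0,
}
words = ["Barack", "Obama", "will", "travel"]
good = sequence_score(words, ["B-PER", "I-PER", "O", "O"], weights)
bad = sequence_score(words, ["B-PER", "B-PER", "O", "O"], weights)
```

Both taggings collect the same emission score of 4.0; the transition weights separate them (5.0 versus 2.0).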

Page 16

Sequential CRFs

$$P(\mathbf{y} = \tilde{\mathbf{y}} \mid \mathbf{x}) = \frac{1}{Z} \exp\left(\sum_{i=1}^{n} \mathbf{w}^\top \mathbf{f}_e(\tilde{y}_i, i, \mathbf{x}) + \sum_{i=2}^{n} \mathbf{w}^\top \mathbf{f}_t(\tilde{y}_{i-1}, \tilde{y}_i, i, \mathbf{x})\right)$$

‣ Critical property: this structure is going to allow us to use dynamic programming (Viterbi) to sum or max over all sequences

‣ How does this compare to HMMs?

Page 17

HMMs vs. CRFs

‣ Both models are expressible in different factor graph notation

HMM:

[Figure: HMM graphical model with tags y_1, y_2, …, y_n and words x_1, x_2, …, x_n]

$$P(\mathbf{y}, \mathbf{x}) = P(y_1) \prod_{i=2}^{n} P(y_i \mid y_{i-1}) \prod_{i=1}^{n} P(x_i \mid y_i) = P(y_1) P(x_1 \mid y_1) P(y_2 \mid y_1) P(x_2 \mid y_2) \cdots$$

CRF:

[Figure: CRF factor graph over tags y_1, y_2, …, y_n with transition factors φ_t and emission factors φ_e]

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

‣ Phis are "potentials", used in the general CRF formulation

Page 18

HMMs vs. CRFs

‣ HMMs: in the standard HMM, emissions consider one word at a time

‣ CRFs support features over many words simultaneously and non-independent features (e.g., suffixes and prefixes), and they are not generative models

‣ Naive Bayes : logistic regression :: HMMs : CRFs

  local vs. global normalization <-> generative vs. discriminative

  (locally normalized discriminative models do exist (MEMMs))

Page 19

CRFs in General

‣ CRFs: discriminative model with the following form:

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{k} \exp(\phi_k(\mathbf{x}, \mathbf{y}))$$

where $Z$ is the normalizer and each $\phi_k$ is any real-valued scoring function of its arguments

‣ Our special case: linear feature-based potentials $\phi_k(\mathbf{x}, \mathbf{y}) = \mathbf{w}^\top \mathbf{f}_k(\mathbf{x}, \mathbf{y})$

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \exp\left(\sum_{k=1}^{n} \mathbf{w}^\top \mathbf{f}_k(\mathbf{x}, \mathbf{y})\right)$$

‣ Problem: intractable inference in the general case! Computing $Z$ requires a sum over exponentially many label sequences
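To see why the normalizer is expensive, here is a brute-force sketch that enumerates every labeling. It assumes the sequential special case with log potentials given as plain lists (`emit[i][s]` for the emission potential, `trans[p][s]` for the transition potential); those names and the layout are illustrative assumptions. The loop visits k^n sequences for k tags and n words.

```python
import itertools
import math

def brute_force_log_z(emit, trans):
    """log Z by explicit enumeration of all k**n tag sequences."""
    n, k = len(emit), len(emit[0])
    total = 0.0
    for y in itertools.product(range(k), repeat=n):
        score = sum(emit[i][y[i]] for i in range(n))
        score += sum(trans[y[i - 1]][y[i]] for i in range(1, n))
        total += math.exp(score)
    return math.log(total)


# With all potentials zero, every sequence has weight 1, so Z = 2**3 = 8.
emit = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
trans = [[0.0, 0.0], [0.0, 0.0]]
log_z = brute_force_log_z(emit, trans)
```

Enumeration like this is mainly useful as a correctness check on tiny inputs.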

Page 20

Features for NER

Page 21

Basic Features for NER

$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp \mathbf{w}^\top \left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

Barack Obama will travel to Hangzhou today for the G20 meeting.
                         O  B-LOC

Transitions: $\mathbf{f}_t(y_{i-1}, y_i) = \mathrm{Ind}[y_{i-1} \,\&\, y_i] = \mathrm{Ind}[\text{O} \,\&\, \text{B-LOC}]$

Emissions: $\mathbf{f}_e(y_6, 6, \mathbf{x}) =$ Ind[B-LOC & Current word = Hangzhou], Ind[B-LOC & Prev word = to]

Page 22

Emission Features for NER

$\phi_e(y_i, i, \mathbf{x})$

LOC: Leicestershire is a nice place to visit…

LOC: I took a vacation to Boston

ORG: Apple released a new version…

ORG: According to the New York Times…

LOC (Texas), PER (Greg Abbott): Texas governor Greg Abbott said

PER: Leonardo DiCaprio won an award…

Page 23

Emission Features for NER

‣ Word features (can use in HMM)
  ‣ Capitalization
  ‣ Word shape
  ‣ Prefixes/suffixes
  ‣ Lexical indicators

‣ Context features (can't use in HMM!)
  ‣ Words before/after
  ‣ Tags before/after

‣ Gazetteers

‣ Word clusters

Examples: Leicestershire; Boston; Apple released a new version…; According to the New York Times…
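A few of the word-level features above are easy to sketch. The particular shape encoding (X for uppercase, x for lowercase, d for digit, with repeated symbols collapsed), the affix lengths, and the function names are assumptions made for this example; real systems vary in all of these.

```python
def word_shape(word):
    """Collapsed shape: 'Leicestershire' -> 'Xx', 'G20' -> 'Xd'."""
    shape = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch                      # keep punctuation as-is
        if not shape or shape[-1] != c:
            shape.append(c)             # collapse runs of the same symbol
    return "".join(shape)


def word_features(word, gazetteer=frozenset()):
    """A handful of word-level features for a single token."""
    return {
        "capitalized": word[0].isupper(),
        "shape": word_shape(word),
        "prefix3": word[:3],
        "suffix5": word[-5:],
        "in_gazetteer": word in gazetteer,   # gazetteer: a set of known names
    }


leicestershire = word_features("Leicestershire")
boston = word_features("Boston", gazetteer={"Boston", "Hangzhou"})
```

The suffix "shire" is the kind of cue that helps with an unseen place name like Leicestershire, while Boston is more likely to be caught by a gazetteer lookup.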

Page 24

CRFs Outline

‣ Model:

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp \mathbf{w}^\top \left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

‣ Inference

‣ Learning

Page 25

Inference and Learning in CRFs

Page 26

Computing (arg)maxes

[Figure: CRF factor graph over tags y_1, y_2, …, y_n with transition factors φ_t and emission factors φ_e]

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

‣ $\operatorname{argmax}_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$: can use Viterbi exactly as in HMM case

$$\max_{y_1, \ldots, y_n} e^{\phi_t(y_{n-1}, y_n)} e^{\phi_e(y_n, n, \mathbf{x})} \cdots e^{\phi_e(y_2, 2, \mathbf{x})} e^{\phi_t(y_1, y_2)} e^{\phi_e(y_1, 1, \mathbf{x})}$$

‣ $\exp(\phi_t(y_{i-1}, y_i))$ and $\exp(\phi_e(y_i, i, \mathbf{x}))$ play the role of the Ps now; use the exact same Viterbi dynamic program
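A sketch of that dynamic program with potentials in place of probabilities might look like this. It assumes the log potentials are already computed and stored as lists (`emit[i][s]` for φ_e, `trans[p][s]` for φ_t, tags indexed by integer); these names and the layout are assumptions for illustration. Working with φ directly means products of exp(φ) become sums.

```python
def viterbi_crf(emit, trans):
    """Best tag index sequence and its unnormalized log score."""
    k = len(emit[0])
    score = [list(emit[0])]     # score[i][s]: best log score ending in tag s at i
    back = []                   # back[i-1][s]: best previous tag for tag s at i
    for i in range(1, len(emit)):
        prev = score[-1]
        row, ptr = [], []
        for s in range(k):
            best = max(range(k), key=lambda p: prev[p] + trans[p][s])
            row.append(prev[best] + trans[best][s] + emit[i][s])
            ptr.append(best)
        score.append(row)
        back.append(ptr)

    last = max(range(k), key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):  # follow back-pointers to recover the sequence
        path.append(ptr[path[-1]])
    return path[::-1], score[-1][last]


# Toy log potentials (2 tags, 3 positions); switching tags costs -1.
emit = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
trans = [[0.0, -1.0], [-1.0, 0.0]]
best_path, best_score = viterbi_crf(emit, trans)
```

Z never appears: it is the same for every y, so it does not change the argmax.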

Page 27

Inference in General CRFs

[Figure: CRF factor graph over tags y_1, y_2, …, y_n with transition factors φ_t and emission factors φ_e]

‣ Can do efficient inference in any tree-structured CRF

‣ Max-product algorithm: generalization of Viterbi to arbitrary tree-structured graphs (sum-product is generalization of forward-backward)

Page 28

CRFs Outline

‣ Model:

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp \mathbf{w}^\top \left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

‣ Inference: argmax P(y|x) from Viterbi

‣ Learning

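Page 29

Training CRFs

‣ Logistic regression: $P(y \mid \mathbf{x}) \propto \exp \mathbf{w}^\top \mathbf{f}(\mathbf{x}, y)$

‣ For CRFs:

$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp \mathbf{w}^\top \left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

‣ Maximize $\mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \log P(\mathbf{y}^* \mid \mathbf{x})$

‣ Gradient is analogous to logistic regression: gold feats minus expected feats

$$\frac{\partial}{\partial \mathbf{w}} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=2}^{n} \mathbf{f}_t(y^*_{i-1}, y^*_i) + \sum_{i=1}^{n} \mathbf{f}_e(y^*_i, i, \mathbf{x}) - \mathbb{E}_{\mathbf{y}}\left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

‣ The expectation over all sequences y is intractable to compute by direct enumeration!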

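Page 30

Training CRFs

$$\frac{\partial}{\partial \mathbf{w}} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=2}^{n} \mathbf{f}_t(y^*_{i-1}, y^*_i) + \sum_{i=1}^{n} \mathbf{f}_e(y^*_i, i, \mathbf{x}) - \mathbb{E}_{\mathbf{y}}\left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

‣ Let's focus on emission feature expectation

$$\mathbb{E}_{\mathbf{y}}\left[\sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right] = \sum_{\mathbf{y} \in \mathcal{Y}} P(\mathbf{y} \mid \mathbf{x}) \left[\sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right] = \sum_{i=1}^{n} \sum_{\mathbf{y} \in \mathcal{Y}} P(\mathbf{y} \mid \mathbf{x}) \, \mathbf{f}_e(y_i, i, \mathbf{x})$$

$$= \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) \, \mathbf{f}_e(s, i, \mathbf{x})$$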

Page 31

Training CRFs

$$\mathbb{E}_{\mathbf{y}}\left[\sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right] = \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) \, \mathbf{f}_e(s, i, \mathbf{x})$$

‣ $\sum_{i=1}^{n}$: sum over time steps

‣ $\sum_{s}$: sum over tags

‣ $P(y_i = s \mid \mathbf{x})$: marginal probability

‣ $\mathbf{f}_e(s, i, \mathbf{x})$: feats of that tag at that step

Page 32

Forward-Backward Algorithm

‣ How do we compute these marginals $P(y_i = s \mid \mathbf{x})$?

$$P(y_i = s \mid \mathbf{x}) = \sum_{y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_n} P(\mathbf{y} \mid \mathbf{x})$$

‣ What did Viterbi compute?

$$P(\mathbf{y}_{\max} \mid \mathbf{x}) = \max_{y_1, \ldots, y_n} P(\mathbf{y} \mid \mathbf{x})$$

‣ Can compute marginals with dynamic programming as well using forward-backward

Page 33

Forward-Backward Algorithm

$$P(y_3 = 2 \mid \mathbf{x}) = \frac{\text{sum of all paths through state 2 at time 3}}{\text{sum of all paths}}$$

Page 34

Forward-Backward Algorithm

slide credit: Dan Klein

$$P(y_3 = 2 \mid \mathbf{x}) = \frac{\text{sum of all paths through state 2 at time 3}}{\text{sum of all paths}}$$

‣ Easiest and most flexible to do one pass to compute the forward sums and one to compute the backward sums

Page 35

Forward-Backward Algorithm

‣ Initial:

$$\alpha_1(s) = \exp(\phi_e(s, 1, \mathbf{x}))$$

‣ Recurrence:

$$\alpha_t(s_t) = \sum_{s_{t-1}} \alpha_{t-1}(s_{t-1}) \exp(\phi_e(s_t, t, \mathbf{x})) \exp(\phi_t(s_{t-1}, s_t))$$

‣ Same as Viterbi but summing instead of maxing!

‣ These quantities get very small! Store everything as log probabilities

Page 36

Forward-Backward Algorithm

‣ Initial:

$$\beta_n(s) = 1$$

‣ Recurrence:

$$\beta_t(s_t) = \sum_{s_{t+1}} \beta_{t+1}(s_{t+1}) \exp(\phi_e(s_{t+1}, t+1, \mathbf{x})) \exp(\phi_t(s_t, s_{t+1}))$$

‣ Big difference: count the emission for the next time step (not the current one)

Page 37

Forward-Backward Algorithm

$$\alpha_1(s) = \exp(\phi_e(s, 1, \mathbf{x}))$$

$$\alpha_t(s_t) = \sum_{s_{t-1}} \alpha_{t-1}(s_{t-1}) \exp(\phi_e(s_t, t, \mathbf{x})) \exp(\phi_t(s_{t-1}, s_t))$$

$$\beta_n(s) = 1$$

$$\beta_t(s_t) = \sum_{s_{t+1}} \beta_{t+1}(s_{t+1}) \exp(\phi_e(s_{t+1}, t+1, \mathbf{x})) \exp(\phi_t(s_t, s_{t+1}))$$

$$P(s_3 = 2 \mid \mathbf{x}) = \frac{\alpha_3(2) \, \beta_3(2)}{\sum_{i} \alpha_3(i) \, \beta_3(i)}$$

‣ What does the denominator here mean?

‣ Does this explain why beta is what it is?
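The recurrences above translate fairly directly into code. The sketch below keeps α and β as logs, as the earlier slide recommends, and assumes the log potentials are given as lists (`emit[t][s]` for φ_e, `trans[p][s]` for φ_t); the function names and data layout are illustrative assumptions rather than the course's implementation.

```python
import math

def logsumexp(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_backward(emit, trans):
    """Return (marginals, log_z), where marginals[t][s] = P(y_t = s | x)."""
    n, k = len(emit), len(emit[0])
    alpha = [[0.0] * k for _ in range(n)]
    beta = [[0.0] * k for _ in range(n)]    # last row stays log(1) = 0
    alpha[0] = list(emit[0])
    for t in range(1, n):
        for s in range(k):
            alpha[t][s] = emit[t][s] + logsumexp(
                [alpha[t - 1][p] + trans[p][s] for p in range(k)])
    for t in range(n - 2, -1, -1):
        for s in range(k):
            # beta counts the emission at t+1, not at t
            beta[t][s] = logsumexp(
                [beta[t + 1][q] + emit[t + 1][q] + trans[s][q] for q in range(k)])
    log_z = logsumexp(alpha[n - 1])         # sum over all paths
    marginals = [[math.exp(alpha[t][s] + beta[t][s] - log_z) for s in range(k)]
                 for t in range(n)]
    return marginals, log_z


emit = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
trans = [[0.0, -1.0], [-1.0, 0.0]]
marginals, log_z = forward_backward(emit, trans)
```

Each row of `marginals` sums to 1 because the denominator, the sum of α·β over tags, is the total weight of all paths and is the same at every position.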

Page 38

Computing Marginals

[Figure: CRF factor graph over tags y_1, y_2, …, y_n with transition factors φ_t and emission factors φ_e]

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

‣ Normalizing constant

$$Z = \sum_{\mathbf{y}} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

‣ Analogous to P(x) for HMMs

‣ For both HMMs and CRFs:

$$P(y_i = s \mid \mathbf{x}) = \frac{\mathrm{forward}_i(s) \, \mathrm{backward}_i(s)}{\sum_{s'} \mathrm{forward}_i(s') \, \mathrm{backward}_i(s')}$$

The denominator is Z for CRFs, P(x) for HMMs

Page 39

Posteriors vs. Probabilities

$$P(y_i = s \mid \mathbf{x}) = \frac{\mathrm{forward}_i(s) \, \mathrm{backward}_i(s)}{\sum_{s'} \mathrm{forward}_i(s') \, \mathrm{backward}_i(s')}$$

‣ Posterior is derived from the parameters and the data (conditioned on x!)

|     | $P(x_i \mid y_i),\ P(y_i \mid y_{i-1})$ | $P(y_i \mid \mathbf{x}),\ P(y_{i-1}, y_i \mid \mathbf{x})$ |
| HMM | Model parameter (usually multinomial distribution) | Inferred quantity from forward-backward |
| CRF | Undefined (model is by definition conditioned on x) | Inferred quantity from forward-backward |

Page 40

Training CRFs

‣ For emission features:

$$\frac{\partial}{\partial \mathbf{w}} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=1}^{n} \mathbf{f}_e(y^*_i, i, \mathbf{x}) - \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) \, \mathbf{f}_e(s, i, \mathbf{x})$$

gold features minus expected features under the model

‣ Transition features: need to compute $P(y_i = s_1, y_{i+1} = s_2 \mid \mathbf{x})$ using forward-backward as well

‣ …but you can build a pretty good system without learned transition features (use heuristic weights, or just enforce constraints like B-PER -> I-ORG is illegal)
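One way to realize the "just enforce constraints" option is a fixed table of transition log potentials in which illegal BIO transitions get a score of negative infinity. This is a sketch under assumptions: the helper name and the dict-of-pairs layout are made up for illustration, and it uses the common BIO rule that I-X may only follow B-X or I-X.

```python
NEG_INF = float("-inf")

def bio_transition_scores(tags):
    """Log potentials: 0.0 for legal tag bigrams, -inf for illegal ones."""
    def allowed(prev, cur):
        if cur.startswith("I-"):
            # I-X must continue an entity of the same type X.
            return prev[:2] in ("B-", "I-") and prev[2:] == cur[2:]
        return True                     # O and B-X can follow anything
    return {(p, c): (0.0 if allowed(p, c) else NEG_INF)
            for p in tags for c in tags}


tags = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]
trans = bio_transition_scores(tags)
```

Plugged into Viterbi, a negative-infinity potential means no path through an illegal transition can win, so the decoder only outputs well-formed tag sequences.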

Page 41

CRFs Outline

‣ Model:

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z} \prod_{i=2}^{n} \exp(\phi_t(y_{i-1}, y_i)) \prod_{i=1}^{n} \exp(\phi_e(y_i, i, \mathbf{x}))$$

$$P(\mathbf{y} \mid \mathbf{x}) \propto \exp \mathbf{w}^\top \left[\sum_{i=2}^{n} \mathbf{f}_t(y_{i-1}, y_i) + \sum_{i=1}^{n} \mathbf{f}_e(y_i, i, \mathbf{x})\right]$$

‣ Inference: argmax P(y|x) from Viterbi

‣ Learning: run forward-backward to compute posterior probabilities; then

$$\frac{\partial}{\partial \mathbf{w}} \mathcal{L}(\mathbf{y}^*, \mathbf{x}) = \sum_{i=1}^{n} \mathbf{f}_e(y^*_i, i, \mathbf{x}) - \sum_{i=1}^{n} \sum_{s} P(y_i = s \mid \mathbf{x}) \, \mathbf{f}_e(s, i, \mathbf{x})$$

Page 42

Pseudocode and Tips

Page 43

Pseudocode

for each epoch
    for each example
        extract features on each emission and transition (look up in cache)
        compute potentials phi based on features + weights
        compute marginal probabilities with forward-backward
        accumulate gradient over all emissions and transitions
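The last step of the loop, accumulating the gradient, can be sketched for emission features as gold counts minus marginal-weighted counts. The function name, the sparse `Counter` representation, and the toy feature function are assumptions; the marginals are taken as an input here and would normally come from forward-backward.

```python
from collections import Counter

def emission_gradient(words, gold_tags, tags, marginals, feature_fn):
    """Sparse gradient of log P(y* | x) with respect to emission feature weights.

    marginals[i][j] is P(y_i = tags[j] | x).
    """
    grad = Counter()
    for i in range(len(words)):
        for f in feature_fn(words, i, gold_tags[i]):
            grad[f] += 1.0                      # gold feature counts
        for j, s in enumerate(tags):
            for f in feature_fn(words, i, s):
                grad[f] -= marginals[i][j]      # expected feature counts
    return grad


def toy_features(words, i, y):
    # Hypothetical feature function: current word conjoined with the tag.
    return [f"word={words[i]}&label={y}"]


grad = emission_gradient(["Obama"], ["B-PER"], ["O", "B-PER"], [[0.25, 0.75]], toy_features)
```

In this toy case the model already puts 0.75 on the gold tag, so the gradient nudges the B-PER feature up by 0.25 and the O feature down by 0.25.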

Page 44

Implementation Tips for CRFs

‣ Caching is your friend! Cache feature vectors especially

‣ Try to reduce redundant computation, e.g. if you compute both the gradient and the objective value, don't rerun the dynamic program

‣ Exploit sparsity where possible, especially in feature vectors and gradients

‣ Do all dynamic program computation in log space to avoid underflow

‣ If things are too slow, run a profiler and see where time is being spent. Forward-backward should take most of the time
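The log-space tip usually comes down to a log-sum-exp helper for the sums inside forward-backward. A minimal sketch (the helper name is a common convention, not something from the slides):

```python
import math

def logsumexp(xs):
    """Compute log(sum(exp(x) for x in xs)) without underflow."""
    m = max(xs)
    if m == float("-inf"):
        return m                      # all terms are zero probability
    return m + math.log(sum(math.exp(x - m) for x in xs))


log_scores = [-1000.0, -1000.0]
naive_sum = sum(math.exp(s) for s in log_scores)   # exp(-1000) underflows to 0.0
stable = logsumexp(log_scores)                     # -1000 + log(2)
```

Taking `math.log(naive_sum)` here would fail because the sum has underflowed to zero, while the shifted version stays finite.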

Page 45

Debugging Tips for CRFs

‣ Hard to know whether inference, learning, or the model is broken!

‣ Compute the objective: is optimization working?

  ‣ Inference: check gradient computation (most likely place for bugs)

    ‣ Is $\sum_{s} \mathrm{forward}_i(s) \, \mathrm{backward}_i(s)$ the same for all i?

    ‣ Do probabilities normalize correctly + look "reasonable"? (Nearly uniform when untrained, then slowly converging to the right thing)

  ‣ Learning: is the objective going down? Try to fit 1 example / 10 examples. Are you applying the gradient correctly?

‣ If objective is going down but model performance is bad:

  ‣ Inference: check performance if you decode the training set
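The "same for all i" check is cheap to automate. The sketch below assumes forward and backward tables are stored as log values in lists of lists with finite entries; the function names and the tolerance are illustrative choices.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def normalizer_per_position(log_alpha, log_beta):
    """log of sum_s forward_i(s) * backward_i(s) for each position i."""
    return [logsumexp([a + b for a, b in zip(a_row, b_row)])
            for a_row, b_row in zip(log_alpha, log_beta)]


def normalizer_is_consistent(log_alpha, log_beta, tol=1e-6):
    """True if every position gives the same normalizer (up to tol)."""
    log_zs = normalizer_per_position(log_alpha, log_beta)
    return all(abs(z - log_zs[0]) <= tol for z in log_zs)


# Two tags, two positions, all potentials zero: the normalizer is 4 everywhere.
log2 = math.log(2.0)
log_alpha = [[0.0, 0.0], [log2, log2]]
log_beta = [[log2, log2], [0.0, 0.0]]
ok = normalizer_is_consistent(log_alpha, log_beta)
```

If this check fails, a common culprit is an emission counted twice or dropped in one of the two passes.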

Page 46

Next Time

‣ Finish discussing NER

‣ Neural networks