
CS395T: Structured Models for NLP
Lecture 10: Trees 4

Greg Durrett

Administrivia

‣ Project 1 graded by late this week / this weekend

Recall: Eisner's Algorithm

‣ Left and right children are built independently; heads are at the edges of spans
‣ Complete item: all children are attached, head is at the "tall end"
‣ Incomplete item: arc from "tall end" to "short end", may still expect children

[Figure: example dependency tree over "the dog ran to the house" (DT NN VBD TO DT NN), rooted at ROOT]

Recall: MST Algorithm

‣ Eisner: search over the space of projective trees, O(n^3)
‣ MST: find the maximum directed spanning tree; finds nonprojective trees as well as projective trees, O(n^2)
‣ MST is restricted to features on single dependencies; Eisner can be generalized to incorporate higher-order features (grandparents, siblings, etc.) at a time-complexity cost, or with beaming


Recall: Transition-Based Parsing

‣ Arc-standard system: three operations
‣ Start: stack contains [ROOT], buffer contains [I ate some spaghetti bolognese]
‣ Shift: top of buffer -> top of stack
‣ Left-Arc: σ | w₋₂ w₋₁ → σ | w₋₁ (w₋₂ is now a child of w₋₁)
‣ Right-Arc: σ | w₋₂ w₋₁ → σ | w₋₂ (w₋₁ is now a child of w₋₂)
‣ End: stack contains [ROOT], buffer is empty []
‣ Must take 2n steps for n words (n Shifts, n LA/RA)
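To make the three operations concrete, here is a minimal Python sketch of the arc-standard system (my addition, not the lecture's code; the class and method names are illustrative):

    class ParserState:
        def __init__(self, words):
            self.stack = ["ROOT"]       # start: stack contains [ROOT]
            self.buffer = list(words)   # start: buffer contains the sentence
            self.arcs = []              # (head, child) pairs built so far

        def shift(self):                # top of buffer -> top of stack
            self.stack.append(self.buffer.pop(0))

        def left_arc(self):             # w-2 is now a child of w-1
            child = self.stack.pop(-2)
            self.arcs.append((self.stack[-1], child))

        def right_arc(self):            # w-1 is now a child of w-2
            child = self.stack.pop()
            self.arcs.append((self.stack[-1], child))

        def is_final(self):             # end: stack [ROOT], buffer empty
            return self.stack == ["ROOT"] and not self.buffer

For "I ate some spaghetti bolognese", two Shifts followed by a Left-Arc attach I to ate; each of the n words is shifted once and attached once, giving the 2n steps.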

Recall: Transition-Based Parsing

[Figure: worked example on "I ate some spaghetti bolognese" (rooted at ROOT):
stack [ROOT ate] (I attached), buffer [some spaghetti bolognese]
  -> S, S -> stack [ROOT ate some spaghetti], buffer [bolognese]
  -> LA   -> stack [ROOT ate spaghetti] (some attached), buffer [bolognese]]

‣ S: top of buffer -> top of stack
‣ LA: pop two, left arc between them
‣ RA: pop two, right arc between them

This Lecture

‣ Global decoding
‣ Early updating
‣ Connections to reinforcement learning, dynamic oracles
‣ State-of-the-art dependency parsers, related tasks

Greedy Training: Static States

‣ Greedy: each box forms a training example (s, a*)

[Figure: state space cartoon showing a path of boxes from the start state to the gold end state, with bad alternative decisions branching off at each step]


Global Decoding

‣ Greedy parser: trained to make the right decision (S, LA, RA) from any gold state we might come to
‣ Why might this be bad?
‣ What are we optimizing when we decode each sentence?
‣ Nothing... we're executing s ← a_best(s), where a_best = argmax_a w⊤ f(s, a)

Global Decoding

‣ Correct sequence for "I gave him dinner" (rooted at ROOT): Right-arc, Shift, Right-arc, Right-arc

[Figure: state trace:
stack [ROOT gave him] (I attached), buffer [dinner]
  -> RA -> stack [ROOT gave] (I, him attached), buffer [dinner]
  -> S  -> stack [ROOT gave dinner] (I, him attached), buffer []
  -> RA -> stack [ROOT gave] (I, him, dinner attached), buffer []]

Global Decoding: A Cartoon

[Figure: branching from stack [ROOT gave him] (I attached), buffer [dinner]:
  S  -> stack [ROOT gave him dinner], buffer []
  LA -> gave becomes a child of him
  RA -> stack [ROOT gave] (I, him attached), buffer [dinner], then S]

‣ S and LA: both wrong! Also both probably low scoring!
‣ RA then S: correct, high-scoring option

Global Decoding: A Cartoon

[Figure: same start state for "I gave him dinner": stack [ROOT gave him] (I attached), buffer [dinner]]

‣ Lookahead can help us avoid getting stuck in bad spots
‣ Global model: maximize the sum of scores over all decisions
‣ Similar to how Viterbi works: we maintain uncertainty over the current state so that if another one looks more optimal going forward, we can use that one


Global Shift-Reduce Parsing

[Figure: state for "I gave him dinner": stack [ROOT gave him] (I attached), buffer [dinner]]

‣ Greedy: repeatedly execute s ← a_best(s), where a_best = argmax_a w⊤ f(s, a)
‣ Global: argmax_{s,a} f(s, a) = Σ_{i=1}^{2n} w⊤ f(s_i, a_i), subject to s_{i+1} = a_i(s_i)
‣ How many states s are there?
‣ Can we do search exactly?
‣ No! Use beam search

Global Shift-Reduce Parsing

[Figure: two steps of beam search from stack [ROOT gave him] (I attached), buffer [dinner]. The first-step successors (S, LA, RA) receive scores like -1.2 and +0.9; expanding another step yields states scored -3.0, -2.0, and +2.0. The best-scoring path (+2.0) is RA then S, ending in stack [ROOT gave dinner] (I, him attached), buffer []]

‣ Beam search gave us the lookahead to make the right decision
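As a concrete illustration, here is a minimal beam-search decoder sketch in Python (my addition; initial_state, legal_actions, apply_action, and score are assumed helpers over the transition system, with score(weights, s, a) computing w⊤ f(s, a)):

    def beam_decode(words, weights, beam_size=8):
        # Keep the beam_size best-scoring transition prefixes at each step.
        beam = [(0.0, initial_state(words))]       # (cumulative score, state)
        for _ in range(2 * len(words)):            # 2n transitions total
            successors = [(total + score(weights, s, a), apply_action(s, a))
                          for total, s in beam
                          for a in legal_actions(s)]
            # prune to the highest cumulative scores
            beam = sorted(successors, key=lambda x: x[0], reverse=True)[:beam_size]
        return beam[0][1]                          # argmax = top of the last beam

Note that the score being accumulated is exactly the global objective Σ_i w⊤ f(s_i, a_i), and with beam_size = 1 this reduces to greedy decoding.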

Training Global Parsers

‣ Can compute approximate maxes of argmax_{s,a} f(s, a) = Σ_{i=1}^{2n} w⊤ f(s_i, a_i) with beam search
‣ Structured perceptron: normal decode, gradient = gold feats - guess feats
‣ Structured SVM: do loss-augmented decode, gradient = gold feats - guess feats
‣ What happens if we set beam size = 1?

Global Training

for each epoch:
    for each sentence:
        for i = 1 ... 2 * len(sentence):   # 2n transitions in arc-standard
            beam[i] = compute_successors(beam[i-1])
        prediction = beam[2 * len(sentence)][0]    # argmax = top of the last beam
        apply_gradient_update(feats(gold) - feats(prediction))
        # feats are cumulative over the whole sentence


Global Training

‣ Learn negative weights for features in these states; greedy training would never see these states
‣ In global training, we keep going if we screw up!

[Figure: state space cartoon with a start state, the gold end state, and a predicted end state reached after diverging from the gold path]

Global vs. Greedy

‣ Greedy: 2n local training examples
‣ Global: one global example

[Figure: the same state space cartoon, contrasting the two]

Early Updating

[Figure: state space cartoon from the start state toward the gold end state; one decision along the way was bad, but the decisions after it might've been good! Hard to tell]

Collins and Roark (2004)


Early Updating

[Figure: "I gave him dinner". Stack [ROOT gave dinner] (I, him attached), buffer []: wrong state, we already messed up! RA -> stack [ROOT gave] (I, him, dinner attached), buffer []]

‣ That RA made the best of a bad situation by putting a good arc in (gave -> dinner)
‣ Ideally we don't want to penalize this decision (update away from it); instead, just penalize the decision that was obviously wrong

Collins and Roark (2004)

Early Updating

‣ Solution: make an update as soon as the gold parse falls off the beam
‣ Update with gold feats - guess feats, computed up to this point

Early Updating

[Figure: beam search from stack [ROOT gave him] (I attached), buffer [dinner], as before, but with scores (-1.2, +0.9, +1.0, -2.0, -3.0) under which the gold path scores poorly and ends up outside the beam]

‣ Gold has fallen off the beam!
‣ Update: gold feats - pred feats

Training with Early Updating

for each epoch:
    for each sentence:
        for i = 1 ... 2 * len(sentence):   # 2n transitions in arc-standard
            beam[i] = compute_successors(beam[i-1])
            if beam[i] does not contain gold:
                break
        if gold fell off the beam:
            apply_gradient_update(feats(gold[0:i]) - feats(beam[i][0]))
            # feats are cumulative up until this point
        else:
            apply_gradient_update(feats(gold) - feats(beam[2 * len(sentence)][0]))
            # gold survived to the end but may still not be one-best
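A runnable perceptron-style version of this loop might look as follows (my sketch; successors and feats are assumed helpers, where successors expands and prunes a beam of (score, action-prefix) pairs sorted best-first, and feats(sent, actions) returns a cumulative feature Counter):

    from collections import Counter

    def train_early_update(data, weights, epochs=5, beam_size=8):
        for _ in range(epochs):
            for sent, gold_actions in data:            # gold transition sequence
                beam = [(0.0, [])]                     # start from the empty prefix
                fell_off = None
                for i in range(1, 2 * len(sent) + 1):  # 2n transitions
                    beam = successors(sent, beam, weights, beam_size)
                    if gold_actions[:i] not in [acts for _, acts in beam]:
                        fell_off = i                   # gold fell off: update early
                        break
                pred = beam[0][1]                      # top of the (last) beam
                gold_prefix = gold_actions[:fell_off] if fell_off else gold_actions
                if pred != gold_prefix:                # perceptron update
                    for f, v in feats(sent, gold_prefix).items():
                        weights[f] += v
                    for f, v in feats(sent, pred).items():
                        weights[f] -= v
        return weights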


Connections to Reinforcement Learning

Motivation

‣ Part of the benefit is that we see states we wouldn't have seen during greedy decoding
‣ (Still true even with early updating, due to beam search)

Better Greedy Algorithm

for each epoch:
    for each sentence:
        parse the sentence with the current weights
        for each state s in the parse:
            determine what the right action a* was
            train on this example (update towards f(s, a*), away from f(s, a_pred))

‣ How do we determine the right action a*?

Dynamic Oracles

‣ When you make some bad decisions, how do you dig yourself out?
‣ Score of decision a in state s leading to s': loss(a) = loss(best_possible_tree(s')) - loss(best_possible_tree(s))
‣ best_possible_tree(s): computes the optimal decision sequence from state s to the end, resulting in the lowest overall loss
‣ Implemented by a bunch of logic that looks at the tree: "if we put a right arc from a -> b, we can't give b any more children, so lose a point for every unbound child; also lose a point if a isn't b's head..."
‣ a* = argmin_a loss(a)

Goldberg and Nivre (2012)
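Putting the dynamic oracle together with the better greedy algorithm gives a loop like this sketch (my addition; initial_state, legal_actions, apply_action, score, update, and oracle_loss are assumed helpers, with oracle_loss(s, a) implementing loss(a) above):

    def train_with_dynamic_oracle(data, weights, epochs=5):
        for _ in range(epochs):
            for sent in data:
                state = initial_state(sent)
                while not state.is_final():
                    actions = legal_actions(state)
                    # the model's choice vs. the dynamic oracle's choice
                    a_pred = max(actions, key=lambda a: score(weights, state, a))
                    a_star = min(actions, key=lambda a: oracle_loss(state, a))
                    if a_pred != a_star:
                        update(weights, state, a_star, +1.0)   # towards f(s, a*)
                        update(weights, state, a_pred, -1.0)   # away from f(s, a_pred)
                    state = apply_action(state, a_pred)        # follow the predicted path
        return weights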


Connections to Reinforcement Learning

‣ Markov Decision Process: states s, actions a, transitions T, rewards r, discount factor γ
‣ T is deterministic for us, γ = 1 (no discount)
‣ Maximize the sum of rewards over the parse
‣ One reward system: r = 1 if the action is what the dynamic oracle says, 0 otherwise
‣ Using the "better greedy algorithm" corresponds to on-policy learning here
‣ But dynamic oracles are hard to build :(

Searn

‣ What if we just had a loss function l(y, y*) that scored whole predictions? I.e., all reward comes at the end
‣ Searn: framework for turning structured problems into classification problems
‣ Take the current policy (= weights), generate states s by running that policy on a given example
‣ Evaluate action a in state s by taking a, then following your current policy to completion and computing the loss (best_possible_loss is approximated by the current policy)
‣ DAGGER algorithm from the RL literature

Daumé et al. (2009)

Motivation

[Figure: from state s, evaluate actions a by rolling each one out to a full prediction y_i and computing losses against y*: l(y_1, y*), l(y_2, y*), l(y_3, y*)]
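That rollout evaluation can be sketched directly (my addition; apply_action, run_policy_to_completion, and final_output are assumed helpers):

    def rollout_action_losses(state, actions, policy, loss_fn, y_star):
        # Searn-style evaluation: take each action, then follow the current
        # policy to completion and score the finished prediction.
        losses = {}
        for a in actions:
            end_state = run_policy_to_completion(apply_action(state, a), policy)
            losses[a] = loss_fn(final_output(end_state), y_star)   # l(y_i, y*)
        return losses    # train towards the argmin, as with a dynamic oracle's a*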

Global Models vs. RL

‣ Structured prediction problems aren't really "RL" in that the environment dynamics are understood
‣ RL techniques are usually not the right thing to do unless your loss function and state space are really complicated
‣ Otherwise, best to use dynamic oracles or global models
‣ These issues arise far beyond parsing! Coreference, machine translation, dialogue systems, ...


State-of-the-art Parsers

‣ 2005: MSTParser got solid performance (~91 UAS)
‣ 2010: Koo's 3rd-order parser was SOTA for graph-based parsing (~93 UAS)
‣ 2012: MaltParser was SOTA for transition-based parsing (~90 UAS), similar to what you'll build
‣ 2014: Chen and Manning got 92 UAS with a transition-based neural model

State-of-the-art Parsers

[Figure: Chen and Manning (2014) neural architecture]

‣ Feedforward neural nets looking at words and POS associated with (sketched below):
  ‣ words at the top of the stack
  ‣ those words' children
  ‣ words in the buffer
‣ Feature set pioneered by Chen and Manning (2014); Google fine-tuned it

Parsey McParseFace

‣ Current state of the art, released publicly by Google (Andor et al., 2016)
‣ 94.61 UAS on the Penn Treebank using a global transition-based system with early updating
‣ Additional data harvested via "tri-training"
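As a rough illustration of that feature set, here is a hedged sketch of Chen-and-Manning-style feature extraction (my addition; stack_item, buffer_item, leftmost_child, and rightmost_child are assumed helpers over the arcs built so far, and the exact templates in the papers differ):

    def cm14_style_features(state, words, tags):
        positions = []
        for i in (1, 2, 3):                    # top three stack items
            positions.append(stack_item(state, i))
        for i in (1, 2):                       # children of the top two
            s = stack_item(state, i)
            positions.append(leftmost_child(state, s))
            positions.append(rightmost_child(state, s))
        for i in (1, 2, 3):                    # front of the buffer
            positions.append(buffer_item(state, i))
        # each position contributes a word id and a POS id to the feedforward
        # net's embedding lookup; missing positions map to a <null> token
        word_feats = [words[p] if p is not None else "<null>" for p in positions]
        tag_feats = [tags[p] if p is not None else "<null>" for p in positions]
        return word_feats, tag_feats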


Stack LSTMs

‣ Use LSTMs over the stack, the buffer, and the past action sequence; trained greedily
‣ Slightly less good than Parsey

Dyer et al. (2015)

Semantic Role Labeling

‣ Another kind of tree-structured annotation, like a subset of dependency parsing
‣ Verb roles from PropBank (Palmer et al., 2005); nominal predicates too

[Figure from He et al. (2017): example SRL analysis, including the predicate "quicken"]

Abstract Meaning Representation

‣ Graph-structured annotation

[Figure: AMR graph for "The boy wants to go"]

‣ Superset of SRL: full-sentence analyses, contains coreference and multi-word expressions as well
‣ F1 scores in the 60s: hard!
‣ So comprehensive that it's hard to predict, but still doesn't handle tense or some other things...

Banarescu et al. (2014)

Takeaways

‣ Global training is an alternative to greedy training
‣ Use beam search for inference, combined with early updating, for best results
‣ Dynamic oracles + following the predicted path in the state space looks like reinforcement learning


Survey

‣ Pace of last lecture + this lecture: [too slow] [just right] [too fast]
‣ Pace of class overall: [too slow] [just right] [too fast]
‣ Write one thing you like about the class
‣ Write one thing you don't like about the class