Algorithms for NLP (11-711, FA17), Lecture 9: POS Tagging / Parsing I
Taylor Berg-Kirkpatrick (CMU); slides: Dan Klein (UC Berkeley)
Transcript
Page 1:

POS Tagging / Parsing I
Taylor Berg-Kirkpatrick, CMU
Slides: Dan Klein, UC Berkeley
Algorithms for NLP

Page 2: Speech Training

Page 3: What Needs to be Learned?

§ Emissions: P(x | phone class)
§ x is MFCC-valued

§ Transitions: P(state | prev state)
§ If between words, this is P(word | history)
§ If inside words, this is P(advance | phone class)
§ (Really a hierarchical model)

[Figure: an HMM, a chain of hidden states s each emitting an observation x.]

Page 4: Estimation from Aligned Data

§ What if each time step was labeled with its (context-dependent sub)phone?
§ Can estimate P(x | /ae/) as the empirical mean and (co-)variance of the x's with label /ae/ (a sketch follows below)
§ Problem: Don't know the alignment at the frame and phone level

[Figure: a frame sequence x1 ... x5 labeled /k/ /ae/ /ae/ /ae/ /t/.]
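A minimal sketch of that estimate, assuming frames are already labeled (an illustration only; the data layout here is made up, not the lecture's code):

    import numpy as np

    # Fit a Gaussian emission model P(x | phone) from labeled frames.
    def fit_gaussian(frames, labels, phone):
        X = np.array([x for x, p in zip(frames, labels) if p == phone])
        mean = X.mean(axis=0)              # empirical mean
        cov = np.cov(X, rowvar=False)      # empirical (co-)variance
        return mean, cov

    # e.g. frames: a list of 39-dim MFCC vectors; labels: ["k", "ae", "ae", "ae", "t"]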

Page 5: Forced Alignment

§ What if the acoustic model P(x | phone) was known?
§ ... and also the correct sequence of words/phones?
§ Can predict the best alignment of frames to phones
§ Called "forced alignment"

ssssssssppppeeeeeeetshshshshllllaeaeaebbbbb
"speech lab"

Page 6: Forced Alignment

§ Create a new state space that forces the hidden variables to transition through phones in the (known) order
§ Still have uncertainty about durations
§ In this HMM, all the parameters are known
§ Transitions determined by known utterance
§ Emissions assumed to be known
§ Minor detail: self-loop probabilities
§ Just run Viterbi (or approximations) to get the best alignment

/s/ /p/ /ee/ /ch/ /l/ /ae/ /b/

Page 7: EM for Alignment

§ Input: acoustic sequences with word-level transcriptions
§ We don't know either the emission model or the frame alignments
§ Expectation Maximization (Hard EM for now)
§ Alternating optimization
§ Impute completions for unlabeled variables (here, the states at each time step)
§ Re-estimate model parameters (here, Gaussian means, variances, mixture ids)
§ Repeat
§ One of the earliest uses of EM!

Page 8: Soft EM

§ Hard EM uses the best single completion
§ Here, the single best alignment
§ Not always representative
§ Certainly bad when your parameters are just initialized and the alignments are all tied
§ Uses the count of various configurations (e.g. how many tokens of /ae/ have self-loops)

§ What we'd really like is to know the fraction of paths that include a given completion
§ E.g. 0.32 of the paths align this frame to /p/, 0.21 align it to /ee/, etc.
§ Formally, we want to know the expected count of configurations
§ Key quantity: P(s_t | x)

Page 9: Computing Marginals

$P(s_t = s \mid x) = \dfrac{\text{sum of all paths through } s \text{ at } t}{\text{sum of all paths}}$

Page 10: Forward Scores

Page 11: Backward Scores

Page 12: Total Scores
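Pages 10-12 showed the forward, backward, and total scores as images that did not survive extraction. The standard recursions consistent with the marginal above (stated as an assumption about the slides' content, not a transcription):

$\alpha_1(s) = P(s \mid \langle\bullet\rangle)\, P(x_1 \mid s)$, and $\alpha_t(s) = P(x_t \mid s) \sum_{s'} \alpha_{t-1}(s')\, P(s \mid s')$

$\beta_T(s) = 1$, and $\beta_t(s) = \sum_{s'} P(s' \mid s)\, P(x_{t+1} \mid s')\, \beta_{t+1}(s')$

$P(s_t = s \mid x) = \dfrac{\alpha_t(s)\,\beta_t(s)}{\sum_{s'} \alpha_t(s')\,\beta_t(s')}$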

Page 13: Fractional Counts

§ Computing fractional (expected) counts:
§ Compute forward/backward probabilities
§ For each position, compute marginal posteriors
§ Accumulate expectations
§ Re-estimate parameters (e.g. means, variances, self-loop probabilities) from ratios of these expected counts

(A sketch of the forward/backward computation follows below.)
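A minimal forward-backward sketch for the marginals P(s_t | x) above (an illustration, not the lecture's code; trans and emit are assumed nested dicts of probabilities):

    def forward_backward(obs, states, start, trans, emit):
        # alpha[t][s]: total score of paths ending in s at t (forward pass)
        T = len(obs)
        alpha = [dict() for _ in range(T)]
        beta = [dict() for _ in range(T)]
        for s in states:
            alpha[0][s] = trans[start][s] * emit[s][obs[0]]
        for t in range(1, T):
            for s in states:
                alpha[t][s] = emit[s][obs[t]] * sum(
                    alpha[t - 1][sp] * trans[sp][s] for sp in states)
        # beta[t][s]: total score of continuations from s at t (backward pass)
        for s in states:
            beta[T - 1][s] = 1.0
        for t in range(T - 2, -1, -1):
            for s in states:
                beta[t][s] = sum(trans[s][sn] * emit[sn][obs[t + 1]] * beta[t + 1][sn]
                                 for sn in states)
        Z = sum(alpha[T - 1][s] for s in states)  # sum over all paths
        # P(s_t = s | x) = (paths through s at t) / (all paths)
        return [{s: alpha[t][s] * beta[t][s] / Z for s in states} for t in range(T)]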

Page 14: Staged Training and State Tying

§ Creating CD phones:
§ Start with monophones, do EM training
§ Clone Gaussians into triphones
§ Build decision tree and cluster Gaussians
§ Clone and train mixtures (GMMs)

§ General idea:
§ Introduce complexity gradually
§ Interleave constraint with flexibility

Page 15: Parts of Speech

Page 16: Parts-of-Speech (English)

§ One basic kind of linguistic structure: syntactic word classes

Open class (lexical) words:
§ Nouns: Proper (IBM, Italy); Common (cat/cats, snow)
§ Verbs: Main (see, registered); Auxiliary (can, had)
§ Adjectives (yellow)
§ Adverbs (slowly)
§ Numbers (122,312, one)
§ ... more

Closed class (functional) words:
§ Prepositions (to, with)
§ Particles (off, up)
§ Determiners (the, some)
§ Conjunctions (and, or)
§ Pronouns (he, its)
§ ... more

Page 17: The Penn Treebank Tagset

CC    conjunction, coordinating                     and both but either or
CD    numeral, cardinal                             mid-1890 nine-thirty 0.5 one
DT    determiner                                    a all an every no that the
EX    existential there                             there
FW    foreign word                                  gemeinschaft hund ich jeux
IN    preposition or conjunction, subordinating     among whether out on by if
JJ    adjective or numeral, ordinal                 third ill-mannered regrettable
JJR   adjective, comparative                        braver cheaper taller
JJS   adjective, superlative                        bravest cheapest tallest
MD    modal auxiliary                               can may might will would
NN    noun, common, singular or mass                cabbage thermostat investment subhumanity
NNP   noun, proper, singular                        Motown Cougar Yvette Liverpool
NNPS  noun, proper, plural                          Americans Materials States
NNS   noun, common, plural                          undergraduates bric-a-brac averages
POS   genitive marker                               ' 's
PRP   pronoun, personal                             hers himself it we them
PRP$  pronoun, possessive                           her his mine my our ours their thy your
RB    adverb                                        occasionally maddeningly adventurously
RBR   adverb, comparative                           further gloomier heavier less-perfectly
RBS   adverb, superlative                           best biggest nearest worst
RP    particle                                      aboard away back by on open through
TO    "to" as preposition or infinitive marker      to
UH    interjection                                  huh howdy uh whammo shucks heck
VB    verb, base form                               ask bring fire see take
VBD   verb, past tense                              pleaded swiped registered saw
VBG   verb, present participle or gerund            stirring focusing approaching erasing
VBN   verb, past participle                         dilapidated imitated reunified unsettled
VBP   verb, present tense, not 3rd person singular  twist appear comprise mold postpone
VBZ   verb, present tense, 3rd person singular      bases reconstructs marks uses
WDT   WH-determiner                                 that what whatever which whichever
WP    WH-pronoun                                    that what whatever which who whom
WP$   WH-pronoun, possessive                        whose
WRB   WH-adverb                                     however whenever where why

Page 18: Part-of-Speech Ambiguity

§ Words can have multiple parts of speech

§ Two basic sources of constraint:
§ Grammatical environment
§ Identity of the current word

§ Many more possible features:
§ Suffixes, capitalization, name databases (gazetteers), etc.

Fed  raises  interest  rates  0.5  percent
NNP  NNS     NN        NNS    CD   NN
VBN  VBZ     VBP       VBZ
VBD                VB

Page 19: Why POS Tagging?

§ Useful in and of itself (more than you'd think)
§ Text-to-speech: record, lead
§ Lemmatization: saw[v] → see, saw[n] → saw
§ Quick-and-dirty NP-chunk detection: grep {JJ | NN}* {NN | NNS}

§ Useful as a pre-processing step for parsing
§ Less tag ambiguity means fewer parses
§ However, some tag choices are better decided by parsers:

DT  NN      IN  NN        VBD     NNS   VBD
The average of  interbank offered rates plummeted ...
(better: offered = VBN)

DT  NNP     NN     VBD VBN   RP  NN   NNS
The Georgia branch had taken on  loan commitments ...
(better: on = IN)

Page 20: Part-of-Speech Tagging

Page 21: Classic Solution: HMMs

§ We want a model of sequences s and observations w

§ Assumptions:
§ States are tag n-grams
§ Usually a dedicated start and end state/word
§ Tag/state sequence is generated by a Markov model
§ Words are chosen independently, conditioned only on the tag/state
§ These are totally broken assumptions: why?

[Figure: chain s0 → s1 → s2 → ... → sn, each si emitting wi.]
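Under these assumptions the joint probability factors in the standard HMM way (stated here for reference; the slide showed this as a picture rather than a formula):

$P(s_1 \ldots s_n, w_1 \ldots w_n) = \prod_{i=1}^{n} P(s_i \mid s_{i-1})\, P(w_i \mid s_i)$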

Page 22: States

§ States encode what is relevant about the past
§ Transitions P(s | s') encode well-formed tag sequences

§ In a bigram tagger, states = tags
  s0 = <•>;  s1 = <t1>, s2 = <t2>, ..., sn = <tn>

§ In a trigram tagger, states = tag pairs
  s0 = <•,•>;  s1 = <•,t1>, s2 = <t1,t2>, ..., sn = <tn-1,tn>

[Figure: the corresponding state-emission chains s0 → s1 → ... → sn over w1 ... wn.]

Page 23: Estimating Transitions

§ Use standard smoothing methods to estimate transitions:

$P(t_i \mid t_{i-1}, t_{i-2}) = \lambda_2\,\hat P(t_i \mid t_{i-1}, t_{i-2}) + \lambda_1\,\hat P(t_i \mid t_{i-1}) + (1 - \lambda_1 - \lambda_2)\,\hat P(t_i)$

(a sketch of this estimate follows below)

§ Can get a lot fancier (e.g. KN smoothing) or use higher orders, but in this case it doesn't buy much
§ One option: encode more into the state, e.g. whether the previous word was capitalized (Brants 00)
§ BIG IDEA: The basic approach of state-splitting/refinement turns out to be very important in a range of tasks
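A minimal sketch of the interpolated estimate above (an illustration, not the lecture's code; the count tables are assumed Counters over tag n-grams, N the total tag count):

    def trans_prob(t, t1, t2, unigrams, bigrams, trigrams, N, lam1, lam2):
        # hat-P estimates from relative frequencies, guarding empty contexts
        p3 = trigrams[(t2, t1, t)] / bigrams[(t2, t1)] if bigrams[(t2, t1)] else 0.0
        p2 = bigrams[(t1, t)] / unigrams[t1] if unigrams[t1] else 0.0
        p1 = unigrams[t] / N
        return lam2 * p3 + lam1 * p2 + (1.0 - lam1 - lam2) * p1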

Page 24: Estimating Emissions

§ Emissions are trickier:
§ Words we've never seen before
§ Words which occur with tags we've never seen them with
§ One option: break out the fancy smoothing (e.g. KN, Good-Turing)
§ Issue: unknown words aren't black boxes:

343,127.23   11-year   Minteria   reintroducibly

§ Basic solution: unknown word classes (affixes or shapes; a sketch follows below)

D+,D+.D+   D+-x+   Xx+   x+-"ly"

§ Common approach: Estimate P(t | w) and invert
§ [Brants 00] used a suffix trie as its (inverted) emission model
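A minimal sketch of those shape classes (an illustration; the exact class definitions are my assumptions, not the lecture's):

    import re

    def word_shape(w):
        s = re.sub(r"[0-9]+", "D+", w)    # digit runs
        s = re.sub(r"[a-z]+", "x+", s)    # lowercase runs
        s = re.sub(r"[A-Z]+", "X+", s)    # uppercase runs
        return s

    # word_shape("343,127.23") -> "D+,D+.D+"; word_shape("11-year") -> "D+-x+"
    # word_shape("Minteria") -> "X+x+" (the slide writes this class as "Xx+")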

Page 25: Disambiguation (Inference)

§ Problem: find the most likely (Viterbi) sequence under the model
§ Given model parameters, we can score any tag sequence
§ In principle, we're done: list all possible tag sequences, score each one, pick the best one (the Viterbi state sequence)

Fed  raises  interest  rates  0.5  percent  .
NNP  VBZ     NN        NNS    CD   NN       .

P(NNP | <•,•>) · P(Fed | NNP) · P(VBZ | <NNP,•>) · P(raises | VBZ) · P(NN | VBZ,NNP) · ...

NNP VBZ NN NNS CD NN    logP = -23
NNP NNS NN NNS CD NN    logP = -29
NNP VBZ VB NNS CD NN    logP = -27

State sequence: <•,•> → <•,NNP> → <NNP,VBZ> → <VBZ,NN> → <NN,NNS> → <NNS,CD> → <CD,NN> → <STOP>

Page 26: Finding the Best Trajectory

§ Too many trajectories (state sequences) to list
§ Option 1: Beam Search
§ A beam is a set of partial hypotheses
§ Start with just the single empty trajectory
§ At each derivation step:
§ Consider all continuations of previous hypotheses
§ Discard most, keep top k, or those within a factor of the best (a sketch of this loop follows below)

§ Beam search works ok in practice
§ ... but sometimes you want the optimal answer
§ ... and you need optimal answers to validate your beam search
§ ... and there's usually a better option than naive beams

[Figure: beam expansion. <> extends to Fed:NNP, Fed:VBN, Fed:VBD; these extend to Fed:NNP raises:NNS, Fed:NNP raises:VBZ, Fed:VBN raises:NNS, Fed:VBN raises:VBZ, ...]
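A minimal beam-search sketch for this loop (an illustration, not the lecture's code; trans and emit are assumed log-probability scoring functions):

    def beam_tag(words, tags, trans, emit, k=8):
        beam = [([], 0.0)]                   # (partial tag sequence, logP)
        for w in words:
            expanded = [(seq + [t],
                         score + trans(seq[-1] if seq else "<s>", t) + emit(t, w))
                        for seq, score in beam for t in tags]
            # discard most: keep only the top-k hypotheses
            beam = sorted(expanded, key=lambda h: h[1], reverse=True)[:k]
        return max(beam, key=lambda h: h[1])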

Page 27: The State Lattice / Trellis

[Figure: a trellis with one column of states {^, N, V, J, D, $} per position of "START Fed raises interest rates END"; paths through the lattice correspond to tag sequences.]

Page 28: The State Lattice / Trellis

[Figure: the same trellis as on the previous slide.]

Page 29: The Viterbi Algorithm

§ Dynamic program for computing the score of a best path up to position i ending in state s:

$\delta_i(s) = \max_{s_0 \ldots s_{i-1}} P(s_0 \ldots s_{i-1}\, s,\ w_1 \ldots w_{i-1})$

§ Also can store a backtrace (but no one does):

$\psi_i(s) = \arg\max_{s'}\, P(s \mid s')\, P(w \mid s')\, \delta_{i-1}(s')$

§ Memoized solution
§ Iterative solution (a runnable sketch follows below):

$\delta_0(s) = \begin{cases} 1 & \text{if } s = \langle\bullet,\bullet\rangle \\ 0 & \text{otherwise} \end{cases}$

$\delta_i(s) = \max_{s'}\, P(s \mid s')\, P(w \mid s')\, \delta_{i-1}(s')$
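A runnable version of the iterative recurrence (an illustration, not the lecture's code; trans[s'][s] and emit[s][w] are assumed log-probability tables):

    import math

    def viterbi(words, states, start, trans, emit):
        delta = {s: (0.0 if s == start else -math.inf) for s in states}
        backptrs = []
        for w in words:
            # psi[s]: best predecessor of s; delta[s]: best path score into s
            psi = {s: max(states, key=lambda sp: delta[sp] + trans[sp][s])
                   for s in states}
            delta = {s: delta[psi[s]] + trans[psi[s]][s] + emit[s].get(w, -math.inf)
                     for s in states}
            backptrs.append(psi)
        s = max(states, key=lambda x: delta[x])   # best final state
        tags = [s]
        for psi in reversed(backptrs):            # follow the backtrace
            s = psi[s]
            tags.append(s)
        return tags[::-1][1:]                     # drop the start state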

Page 30: So How Well Does It Work?

§ Choose the most common tag:
§ 90.3% with a bad unknown word model
§ 93.7% with a good one

§ TnT (Brants, 2000):
§ A carefully smoothed trigram tagger
§ Suffix trees for emissions
§ 96.7% on WSJ text (SOTA is 97+%)

§ Noise in the data
§ Many errors in the training and test corpora
§ Probably about 2% guaranteed error from noise (on this data)

Example of inconsistent annotation: "chief executive officer" appears variously as NN NN NN, JJ NN NN, JJ JJ NN, and NN JJ NN.

DT  NN      IN  NN        VBD     NNS   VBD
The average of  interbank offered rates plummeted ...

Page 31: Overview: Accuracies

§ Roadmap of (known / unknown) word accuracies:
§ Most freq tag:     ~90% / ~50%
§ Trigram HMM:       ~95% / ~55%
§ TnT (HMM++):       96.2% / 86.0%
§ Maxent P(t|w):     93.7% / 82.6%
§ MEMM tagger:       96.9% / 86.9%
§ State-of-the-art:  97+% / 89+%
§ Upper bound:       ~98%

Most errors are on unknown words.

Page 32: Common Errors

§ Common errors [from Toutanova & Manning 00]:

NN/JJ NN:          "official knowledge"     (noun vs. adjective reading of "official")
VBD RP/IN DT NN:   "made up the story"      (particle vs. preposition for "up")
RB VBD/VBN NNS:    "recently sold shares"   (past tense vs. past participle for "sold")

Page 33: Richer Features

Page 34: Better Features

§ Can do surprisingly well just looking at a word by itself:
§ Word:             the: the → DT
§ Lowercased word:  Importantly: importantly → RB
§ Prefixes:         unfathomable: un- → JJ
§ Suffixes:         Surprisingly: -ly → RB
§ Capitalization:   Meridian: CAP → NNP
§ Word shapes:      35-year: d-x → JJ

§ Then build a maxent (or whatever) model to predict the tag (a feature sketch follows below)
§ Maxent P(t|w): 93.7% / 82.6%
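A minimal sketch of such word-local features for a maxent model (the feature names here are made up for illustration, not taken from the slide):

    def word_features(w):
        return {
            "word=" + w: 1.0,
            "lower=" + w.lower(): 1.0,
            "prefix=" + w[:3]: 1.0,
            "suffix=" + w[-3:]: 1.0,
            "capitalized": 1.0 if w[:1].isupper() else 0.0,
            "has-digit": 1.0 if any(c.isdigit() for c in w) else 0.0,
        }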

Page 35: Why Linear Context is Useful

§ Lots of rich local information!

PRP  VBD  IN  RB    IN  PRP  VBD  .
They left as  soon  as  he   arrived .

§ We could fix this with a feature that looked at the next word (the first "as" should be RB)

NNP       NNS   VBD      VBN        .
Intrinsic flaws remained undetected .

§ We could fix this by linking capitalized words to their lowercase versions ("Intrinsic" should be JJ)

§ Solution: discriminative sequence models (MEMMs, CRFs)

§ Reality check:
§ Taggers are already pretty good on newswire text...
§ What the world needs is taggers that work on other text!

Page 36: Sequence-Free Tagging?

§ What about looking at a word and its environment, but no sequence information?

§ Add in previous/next word:     the __
§ Previous/next word shapes:     X __ X
§ Occurrence pattern features:   [X: x X occurs]
§ Crude entity detection:        __ ..... (Inc.|Co.)
§ Phrasal verb in sentence?      put ...... __
§ Conjunctions of these things

§ All features except sequence: 96.6% / 86.8%
§ Uses lots of features: >200K
§ Why isn't this the standard approach?

[Figure: tag t3 predicted from words w2, w3, w4.]

Page 37: Named Entity Recognition

§ Other sequence tasks use similar models
§ Example: named entity recognition (NER)

Local context:
         Prev   Cur    Next
State:   Other  ???    ???
Word:    at     Grace  Road
Tag:     IN     NNP    NNP
Sig:     x      Xx     Xx

Tim Boon has signed a contract extension with Leicestershire which will keep him at Grace Road .
PER PER  O   O      O O        O         O    ORG            O     O    O    O   O  LOC   LOC  O

Page 38: MEMM Taggers

§ Idea: left-to-right local decisions, condition on previous tags and also the entire input

§ Train up P(ti | w, ti-1, ti-2) as a normal maxent model, then use it to score sequences

§ This is referred to as an MEMM tagger [Ratnaparkhi 96]
§ Beam search effective! (Why?)
§ What about beam size 1?

§ Subtle issues with local normalization (cf. Lafferty et al 01)
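For reference, the sequence score an MEMM assigns is just the chain of these local decisions (the standard MEMM factorization, not transcribed from the slide):

$P(t_1 \ldots t_n \mid \mathbf{w}) = \prod_i P(t_i \mid \mathbf{w}, t_{i-1}, t_{i-2})$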

Page 39: NER Features

Local context:
         Prev   Cur    Next
State:   Other  ???    ???
Word:    at     Grace  Road
Tag:     IN     NNP    NNP
Sig:     x      Xx     Xx

Feature weights for the current decision:

Feature Type           Feature    PERS    LOC
Previous word          at         -0.73   0.94
Current word           Grace       0.03   0.00
Beginning bigram       <G          0.45  -0.04
Current POS tag        NNP         0.47   0.45
Prev and cur tags      IN NNP     -0.10   0.14
Previous state         Other      -0.70  -0.92
Current signature      Xx          0.80   0.46
Prev state, cur sig    O-Xx        0.68   0.37
Prev-cur-next sig      x-Xx-Xx    -0.69   0.37
P. state - p-cur sig   O-x-Xx     -0.20   0.82
...
Total:                            -0.58   2.68

Note on feature weights: because of the regularization term, the more common prefixes have larger weights even though entire-word features are more specific.

Page 40: Decoding

§ Decoding MEMM taggers:
§ Just like decoding HMMs, with different local scores
§ Viterbi, beam search, posterior decoding

§ Viterbi algorithm (HMMs)
§ Viterbi algorithm (MEMMs)
§ General

(The slide's three recurrences were images; standard forms are given below.)
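Standard forms consistent with the Viterbi recurrence on Page 29 (an assumption about what the slide's images showed, not a transcription):

HMM: $\delta_i(s) = \max_{s'}\, P(s \mid s')\, P(w \mid s')\, \delta_{i-1}(s')$

MEMM: $\delta_i(s) = \max_{s'}\, P(s \mid s', \mathbf{w})\, \delta_{i-1}(s')$

General: $\delta_i(s) = \max_{s'}\, \phi(\mathbf{w}, i, s', s)\, \delta_{i-1}(s')$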

Page 41: Conditional Random Fields (and Friends)

Page 42: Maximum Entropy II

§ Remember: the maximum entropy objective (shown as an equation on the slide)

§ Problem: lots of features allow a perfect fit to the training set
§ Regularization (compare to smoothing)
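A common statement of that regularized objective (an assumption about the slide's image): choose weights $\lambda$ to maximize

$\sum_i \log P(y_i \mid x_i; \lambda) \; - \; k \sum_n \lambda_n^2$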

Page 43: Derivative for Maximum Entropy

[The slide's gradient equation was an image; its annotated terms were:]
§ Total count of feature n in correct candidates
§ Expected count of feature n in predicted candidates
§ Big weights are bad (the regularization term)
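A gradient consistent with those three annotations (the standard regularized maxent derivative, stated as an assumption about the slide):

$\frac{\partial}{\partial \lambda_n} = \underbrace{\sum_i f_n(x_i, y_i)}_{\text{observed count}} \; - \; \underbrace{\sum_i \sum_y P(y \mid x_i; \lambda)\, f_n(x_i, y)}_{\text{expected count}} \; - \; 2k\lambda_n$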

Page 44: Perceptron Review

Page 45: Perceptron

§ Linear models:
§ ... that decompose along the sequence
§ ... allow us to predict with the Viterbi algorithm
§ ... which means we can train with the perceptron algorithm (or related updates, like MIRA); a sketch follows below

[Collins 01]
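A minimal structured-perceptron sketch in the spirit of [Collins 01] (an illustration; decode is an assumed Viterbi decoder and features an assumed sequence feature extractor returning a dict of counts):

    def perceptron_train(data, decode, features, epochs=5):
        w = {}                                    # feature weights
        for _ in range(epochs):
            for words, gold in data:
                guess = decode(words, w)          # Viterbi under current weights
                if guess != gold:
                    for f, v in features(words, gold).items():
                        w[f] = w.get(f, 0.0) + v  # promote gold features
                    for f, v in features(words, guess).items():
                        w[f] = w.get(f, 0.0) - v  # demote predicted features
        return w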

Page 46: Conditional Random Fields

§ Make a maxent model over entire taggings:
§ MEMM
§ CRF

(The slide contrasted the two factorizations as equations; see below.)
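The contrast as equations (standard forms, stated as an assumption about the slide's images):

MEMM (locally normalized): $P(\mathbf{t} \mid \mathbf{w}) = \prod_i \dfrac{\exp\big(\lambda^\top f(\mathbf{w}, i, t_i, t_{i-1})\big)}{\sum_{t'} \exp\big(\lambda^\top f(\mathbf{w}, i, t', t_{i-1})\big)}$

CRF (globally normalized): $P(\mathbf{t} \mid \mathbf{w}) = \dfrac{\exp\big(\sum_i \lambda^\top f(\mathbf{w}, i, t_i, t_{i-1})\big)}{Z(\mathbf{w})}$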

Page 47: CRFs

§ Like any maxent model, the derivative is the difference of observed and expected feature counts (the equation itself was an image; see Page 43)
§ So all we need is to be able to compute the expectation of each feature (for example, the number of times the label pair DT-NN occurs, or the number of times NN-interest occurs) under the model distribution
§ Critical quantity: counts of posterior marginals

Page 48: Computing Posterior Marginals

§ How many (expected) times is word w tagged with s?
§ How to compute that marginal?

[Figure: the state lattice over "START Fed raises interest rates END", with states {^, N, V, J, D, $} per position; the marginal sums all paths through a given node.]

Page 49: Global Discriminative Taggers

§ Newer, higher-powered discriminative sequence models
§ CRFs (also perceptrons, M3Ns)
§ Do not decompose training into independent local regions
§ Can be deathly slow to train: require repeated inference on the training set
§ Differences tend not to be too important for POS tagging
§ Differences more substantial on other sequence tasks
§ However: one issue worth knowing about in local models
§ "Label bias" and other explaining-away effects
§ MEMM taggers' local scores can be near one without having both good "transitions" and "emissions"
§ This means that often evidence doesn't flow properly
§ Why isn't this a big deal for POS tagging?
§ Also: in decoding, condition on predicted, not gold, histories

Page 50: Transformation-Based Learning

§ [Brill 95] presents a transformation-based tagger
§ Label the training set with the most frequent tags:

DT  MD   VBD  VBD     .
The can  was  rusted  .

§ Add transformation rules which reduce training mistakes:
§ MD → NN : DT __
§ VBD → VBN : VBD __ .

§ Stop when no transformations do sufficient good
§ Does this remind anyone of anything?

§ Probably the most widely used tagger (esp. outside NLP)
§ ... but definitely not the most accurate: 96.6% / 82.0%

Page 51: Learned Transformations

§ What gets learned? [from Brill 95]

Page 52: EngCG Tagger

§ English constraint grammar tagger
§ [Tapanainen and Voutilainen 94]
§ Something else you should know about
§ Hand-written and knowledge driven
§ "Don't guess if you know" (general point about modeling more structure!)
§ Tag set doesn't make all of the hard distinctions of the standard tag set (e.g. JJ/NN)
§ They get stellar accuracies: 99% on their tag set
§ Linguistic representation matters...
§ ... but it's easier to win when you make up the rules

Page 53: Domain Effects

§ Accuracies degrade outside of domain
§ Up to triple error rate
§ Usually make the most errors on the things you care about in the domain (e.g. protein names)

§ Open questions
§ How to effectively exploit unlabeled data from a new domain (what could we gain?)
§ How to best incorporate domain lexica in a principled way (e.g. UMLS specialist lexicon, ontologies)

Page 54: Unsupervised Tagging

Page 55: Unsupervised Tagging?

§ AKA part-of-speech induction
§ Task:
§ Raw sentences in
§ Tagged sentences out

§ Obvious thing to do:
§ Start with a (mostly) uniform HMM
§ Run EM
§ Inspect results

Page 56: EM for HMMs: Process

§ Alternate between recomputing distributions over hidden variables (the tags) and re-estimating parameters
§ Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under current params
§ Same quantities we needed to train a CRF!

Page 57: EM for HMMs: Quantities

§ Total path values (correspond to probabilities here)

Page 58: The State Lattice / Trellis

[Figure: the state lattice over "START Fed raises interest rates END", states {^, N, V, J, D, $} per position.]

Page 59: EM for HMMs: Process

§ From these quantities, can compute expected transitions
§ And emissions

(The slide's equations were images; standard forms are sketched below.)
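Standard forms built from the forward/backward scores (an assumption about the slide's images, consistent with the quantities on the earlier marginals slides):

expected transitions: $\mathbb{E}[\text{count}(s \to s')] = \sum_t \dfrac{\alpha_t(s)\, P(s' \mid s)\, P(w_{t+1} \mid s')\, \beta_{t+1}(s')}{Z}$

expected emissions: $\mathbb{E}[\text{count}(s, w)] = \sum_{t:\, w_t = w} \dfrac{\alpha_t(s)\, \beta_t(s)}{Z}$

with $Z$ the total score of all paths.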

Page 60: Merialdo: Setup

§ Some (discouraging) experiments [Merialdo 94]

§ Setup:
§ You know the set of allowable tags for each word
§ Fix k training examples to their true labels
§ Learn P(w|t) on these examples
§ Learn P(t|t-1,t-2) on these examples
§ On n examples, re-estimate with EM

§ Note: we know allowed tags but not frequencies

Page 61: Merialdo: Results

Page 62: Distributional Clustering

• the president said that the downturn was over •

Context signatures:
president:  the __ of;  the __ said
governor:   the __ of;  the __ appointed
said:       sources __ •;  president __ that
reported:   sources __ •

Resulting clusters: {president, governor}, {said, reported}, {the, a}

[Finch and Chater 92, Schütze 93, many others]

Page 63: Distributional Clustering

§ Three main variants on the same idea:
§ Pairwise similarities and heuristic clustering
§ E.g. [Finch and Chater 92]
§ Produces dendrograms
§ Vector space methods
§ E.g. [Schütze 93]
§ Models of ambiguity
§ Probabilistic methods
§ Various formulations, e.g. [Lee and Pereira 99]

Page 64: Nearest Neighbors

Page 65: Dendrograms

Page 66: Dendrograms

Page 67: Vector Space Version

§ [Schütze 93] clusters words as points in R^n
§ Vectors too sparse, use SVD to reduce (a sketch follows below)

[Figure: M, the word-by-context count matrix, factored as U S V; keep the reduced word vectors and cluster these 50-200 dim vectors instead.]
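A minimal sketch of the SVD reduction (an illustration, not Schütze's code; M stands in for a real word-by-context count matrix):

    import numpy as np

    M = np.random.rand(1000, 5000)             # stand-in for real context counts
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    k = 100                                     # keep 50-200 dimensions
    word_vectors = U[:, :k] * S[:k]             # reduced vectors; cluster these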

Page 68: A Probabilistic Version?

$P(S, C) = \prod_i P(c_i \mid c_{i-1})\, P(w_i \mid c_i)$

$P(S, C) = \prod_i P(c_i)\, P(w_i \mid c_i)\, P(w_{i-1}, w_{i+1} \mid c_i)$

• the president said that the downturn was over •
  c1  c2        c3   c4   c5  c6       c7  c8

(Two sequence models over hidden classes c_i: an HMM, and a model where each class independently generates its word and its neighbors.)

Page 69: What Else?

§ Various newer ideas:
§ Context distributional clustering [Clark 00]
§ Morphology-driven models [Clark 03]
§ Contrastive estimation [Smith and Eisner 05]
§ Feature-rich induction [Haghighi and Klein 06]

§ Also:
§ What about ambiguous words?
§ Using wider context signatures has been used for learning synonyms (what's wrong with this approach?)
§ Can extend these ideas for grammar induction (later)

Page 70: Computing Marginals

$P(s_t = s \mid x) = \dfrac{\text{sum of all paths through } s \text{ at } t}{\text{sum of all paths}}$

Page 71: Forward Scores

Page 72: Backward Scores

Page 73: Total Scores

Page 74: Syntax

Page 75: Parse Trees

The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market

Page 76: Phrase Structure Parsing

§ Phrase structure parsing organizes syntax into constituents or brackets
§ In general, this involves nested trees
§ Linguists can, and do, argue about details
§ Lots of ambiguity
§ Not the only kind of syntax...

new art critics write reviews with computers

[Figure: a tree over this sentence with S → NP VP, an N' inside the subject NP, and NP and PP nodes inside the VP.]

Page 77: Constituency Tests

§ How do we know what nodes go in the tree?

§ Classic constituency tests:
§ Substitution by proform
§ Question answers
§ Semantic grounds
§ Coherence
§ Reference
§ Idioms
§ Dislocation
§ Conjunction

§ Cross-linguistic arguments, too

Page 78: Conflicting Tests

§ Constituency isn't always clear

§ Units of transfer:
§ think about ~ penser à
§ talk about ~ hablar de

§ Phonological reduction:
§ I will go → I'll go
§ I want to go → I wanna go
§ à le centre → au centre

§ Coordination
§ He went to and came from the store.

La vélocité des ondes sismiques ("the velocity of seismic waves")

Page 79: Classical NLP: Parsing

§ Write symbolic or logical rules:

Grammar (CFG):
ROOT → S
S → NP VP
NP → DT NN
NP → NN NNS
NP → NP PP
VP → VBP NP
VP → VBP NP PP
PP → IN NP

Lexicon:
NN → interest
NNS → raises
VBP → interest
VBZ → raises

§ Use deduction systems to prove parses from words
§ Minimal grammar on "Fed raises" sentence: 36 parses
§ Simple 10-rule grammar: 592 parses
§ Real-size grammar: many millions of parses

§ This scaled very badly, didn't yield broad-coverage tools

Page 80: Ambiguities

Page 81: Ambiguities: PP Attachment

Page 82: Attachments

§ I cleaned the dishes from dinner
§ I cleaned the dishes with detergent
§ I cleaned the dishes in my pajamas
§ I cleaned the dishes in the sink

Page 83: Syntactic Ambiguities I

§ Prepositional phrases: They cooked the beans in the pot on the stove with handles.
§ Particle vs. preposition: The puppy tore up the staircase.
§ Complement structures: The tourists objected to the guide that they couldn't hear. / She knows you like the back of her hand.
§ Gerund vs. participial adjective: Visiting relatives can be boring. / Changing schedules frequently confused passengers.

Page 84: Syntactic Ambiguities II

§ Modifier scope within NPs: impractical design requirements / plastic cup holder
§ Multiple gap constructions: The chicken is ready to eat. / The contractors are rich enough to sue.
§ Coordination scope: Small rats and mice can squeeze into holes or cracks in the wall.

Page 85: Dark Ambiguities

§ Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around)

This analysis corresponds to the correct parse of "This will panic buyers!"

§ Unknown words and new usages
§ Solution: We need mechanisms to focus attention on the best ones; probabilistic techniques do this

Page 86: Ambiguities as Trees

Page 87: PCFGs

Page 88: Probabilistic Context-Free Grammars

§ A context-free grammar is a tuple <N, T, S, R>
§ N: the set of non-terminals
§ Phrasal categories: S, NP, VP, ADJP, etc.
§ Parts-of-speech (pre-terminals): NN, JJ, DT, VB
§ T: the set of terminals (the words)
§ S: the start symbol
§ Often written as ROOT or TOP
§ Not usually the sentence non-terminal S
§ R: the set of rules
§ Of the form X → Y1 Y2 ... Yk, with X, Yi ∈ N
§ Examples: S → NP VP, VP → VP CC VP
§ Also called rewrites, productions, or local trees

§ A PCFG adds:
§ A top-down production probability per rule, P(Y1 Y2 ... Yk | X)

Page 89: Treebank Sentences

Page 90: Treebank Grammars

§ Need a PCFG for broad coverage parsing.
§ Can take a grammar right off the trees (doesn't work well; a counting sketch follows below):

ROOT → S           1
S → NP VP .        1
NP → PRP           1
VP → VBD ADJP      1
...

§ Better results by enriching the grammar (e.g., lexicalization).
§ Can also get state-of-the-art parsers without lexicalization.
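A minimal sketch of reading rule probabilities off trees (an illustration; the nested-tuple tree format is my assumption, not the lecture's):

    from collections import Counter

    def count_rules(tree, counts):
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            return                  # pre-terminal over a word: the lexicon, skipped here
        counts[(label, tuple(c[0] for c in children))] += 1
        for c in children:
            count_rules(c, counts)

    def pcfg_from_trees(trees):
        counts = Counter()
        for t in trees:
            count_rules(t, counts)
        lhs_totals = Counter()
        for (lhs, _), n in counts.items():
            lhs_totals[lhs] += n
        # relative-frequency rule probabilities P(rhs | lhs)
        return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}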

Page 91: Treebank Grammar Scale

§ Treebank grammars can be enormous
§ As FSAs, the raw grammar has ~10K states, excluding the lexicon
§ Better parsers usually make the grammars larger, not smaller

[Figure: an NP rule rendered as an FSA over DET, ADJ, NOUN, PLURAL NOUN, NP, PP, CONJ states.]

Page 92: Chomsky Normal Form

§ Chomsky normal form:
§ All rules of the form X → Y Z or X → w
§ In principle, this is no limitation on the space of (P)CFGs
§ N-ary rules introduce new non-terminals
§ Unaries/empties are "promoted"
§ In practice it's kind of a pain:
§ Reconstructing n-aries is easy
§ Reconstructing unaries is trickier
§ The straightforward transformations don't preserve tree scores
§ Makes parsing algorithms simpler!

Example: VP → VBD NP PP PP binarized with intermediate symbols [VP → VBD NP •] and [VP → VBD NP PP •].

Page 93: CKY Parsing

Page 94: A Recursive Parser

bestScore(X,i,j,s)
  if (j = i+1)
    return tagScore(X,s[i])
  else
    return max score(X->YZ) *
               bestScore(Y,i,k) *
               bestScore(Z,k,j)

§ Will this parser work?
§ Why or why not?
§ Memory requirements?

Page 95: A Memoized Parser

§ One small change:

bestScore(X,i,j,s)
  if (scores[X][i][j] == null)
    if (j = i+1)
      score = tagScore(X,s[i])
    else
      score = max score(X->YZ) *
                  bestScore(Y,i,k) *
                  bestScore(Z,k,j)
    scores[X][i][j] = score
  return scores[X][i][j]

Page 96: A Bottom-Up Parser (CKY)

§ Can also organize things bottom-up (a runnable sketch follows the pseudocode)

bestScore(s)
  for (i : [0,n-1])
    for (X : tags[s[i]])
      score[X][i][i+1] = tagScore(X,s[i])
  for (diff : [2,n])
    for (i : [0,n-diff])
      j = i + diff
      for (X->YZ : rule)
        for (k : [i+1, j-1])
          score[X][i][j] = max(score[X][i][j],
                               score(X->YZ) * score[Y][i][k] * score[Z][k][j])

[Figure: X spanning [i,j] built from Y over [i,k] and Z over [k,j].]
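A runnable CKY sketch mirroring the pseudocode above (an illustration; the grammar formats here are assumptions, not the lecture's):

    from collections import defaultdict

    # lexicon: {(tag, word): prob}; rules: {(X, Y, Z): prob}, grammar in CNF
    def cky(words, lexicon, rules):
        n = len(words)
        score = defaultdict(float)           # (X, i, j) -> best probability
        for i, w in enumerate(words):
            for (tag, word), p in lexicon.items():
                if word == w:
                    score[(tag, i, i + 1)] = p
        for diff in range(2, n + 1):
            for i in range(n - diff + 1):
                j = i + diff
                for (X, Y, Z), p in rules.items():
                    for k in range(i + 1, j):
                        cand = p * score[(Y, i, k)] * score[(Z, k, j)]
                        if cand > score[(X, i, j)]:
                            score[(X, i, j)] = cand
        return score[("S", 0, n)]            # probability of the best parse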

Page 97: Unary Rules

§ Unary rules?

bestScore(X,i,j,s)
  if (j = i+1)
    return tagScore(X,s[i])
  else
    return max( max score(X->YZ) *
                    bestScore(Y,i,k) *
                    bestScore(Z,k,j),
                max score(X->Y) *
                    bestScore(Y,i,j) )

Page 98: CNF + Unary Closure

§ We need unaries to be non-cyclic
§ Can address by pre-calculating the unary closure
§ Rather than having zero or more unaries, always have exactly one
§ Alternate unary and binary layers
§ Reconstruct unary chains afterwards

[Figure: example trees (NP → DT NN, VP → VBD NP, SBAR → S → VP) before and after collapsing unary chains.]

Page 99: Alternating Layers

bestScoreU(X,i,j,s)
  if (j = i+1)
    return tagScore(X,s[i])
  else
    return max max score(X->Y) *
                   bestScoreB(Y,i,j)

bestScoreB(X,i,j,s)
  return max max score(X->YZ) *
                 bestScoreU(Y,i,k) *
                 bestScoreU(Z,k,j)

Page 100: Analysis

Page 101: Memory

§ How much memory does this require?
§ Have to store the score cache
§ Cache size: |symbols| * n^2 doubles
§ For the plain treebank grammar: X ~ 20K symbols, n = 40, double ~ 8 bytes → ~256MB
§ Big, but workable.

§ Pruning: Beams
§ score[X][i][j] can get too large (when?)
§ Can keep beams (truncated maps score[i][j]) which only store the best few scores for the span [i,j]

§ Pruning: Coarse-to-Fine
§ Use a smaller grammar to rule out most X[i,j]
§ Much more on this later...

Page 102: Time: Theory

§ How much time will it take to parse?
§ For each diff (<= n)
§ For each i (<= n)
§ For each rule X → Y Z
§ For each split point k: do constant work

§ Total time: |rules| * n^3
§ Something like 5 sec for an unoptimized parse of a 20-word sentence

[Figure: X over [i,j] from Y over [i,k] and Z over [k,j].]

Page 103: Time: Practice

§ Parsing with the vanilla treebank grammar (~20K rules; not an optimized parser!)
§ Observed exponent: 3.6

§ Why's it worse in practice?
§ Longer sentences "unlock" more of the grammar
§ All kinds of systems issues don't scale

[Figure: parse time vs. sentence length.]

Page 104: Same-Span Reachability

[Figure: a reachability graph over non-terminals buildable over the same span: a large mutually reachable core (ADJP, ADVP, FRAG, INTJ, NP, PP, PRN, QP, S, SBAR, UCP, VP, WHNP) plus TOP, LST, CONJP, WHADJP, WHADVP, WHPP, NX, NAC, SBARQ, SINV, RRC, SQ, X, PRT around it.]

Page 105: Rule State Reachability

§ Many states are more likely to match larger spans!

Example: NP CC • : with NP spanning [0, n-1] and CC at [n-1, n], only 1 alignment
Example: NP CC NP • : with the final NP spanning [n-k, n], n alignments (one per split)

Page 106: Efficient CKY

§ Lots of tricks to make CKY efficient
§ Some of them are little engineering details:
§ E.g., first choose k, then enumerate through the Y:[i,k] which are non-zero, then loop through rules by left child.
§ Optimal layout of the dynamic program depends on grammar, input, even system details.

§ Another kind is more important (and interesting):
§ Many X[i,j] can be suppressed on the basis of the input string
§ We'll see this next class as figures-of-merit, A* heuristics, coarse-to-fine, etc.

Page 107: Agenda-Based Parsing

Page 108: Agenda-Based Parsing

§ Agenda-based parsing is like graph search (but over a hypergraph)
§ Concepts:
§ Numbering: we number fenceposts between words
§ "Edges" or items: spans with labels, e.g. PP[3,5], represent the sets of trees over those words rooted at that label (cf. search states)
§ A chart: records edges we've expanded (cf. closed set)
§ An agenda: a queue which holds edges (cf. a fringe or open set)

0 critics 1 write 2 reviews 3 with 4 computers 5     (e.g. PP[3,5])

Page 109: Word Items

§ Building an item for the first time is called discovery. Items go onto the agenda on discovery.
§ To initialize, we discover all word items (with score 1.0).

0 critics 1 write 2 reviews 3 with 4 computers 5

AGENDA: critics[0,1], write[1,2], reviews[2,3], with[3,4], computers[4,5]
CHART: [EMPTY]

Page 110: Unary Projection

§ When we pop a word item, the lexicon tells us the tag item successors (and scores), which go on the agenda

0 critics 1 write 2 reviews 3 with 4 computers 5

Popped:     critics[0,1], write[1,2], reviews[2,3], with[3,4], computers[4,5]
Discovered: NNS[0,1], VBP[1,2], NNS[2,3], IN[3,4], NNS[4,5]

Page 111: Item Successors

§ When we pop items off of the agenda (a sketch of this loop follows below):
§ Graph successors: unary projections (NNS → critics, NP → NNS)
  Y[i,j] with X → Y forms X[i,j]
§ Hypergraph successors: combine with items already in our chart
  Y[i,j] and Z[j,k] with X → Y Z form X[i,k]
§ Enqueue / promote resulting items (if not in chart already)
§ Record backtraces as appropriate
§ Stick the popped edge in the chart (closed set)

§ Queries a chart must support:
§ Is edge X[i,j] in the chart? (What score?)
§ What edges with label Y end at position j?
§ What edges with label Z start at position i?
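A minimal agenda-loop sketch under the slide's conventions (an illustration; scores and backtraces are omitted, and the rule formats are my assumptions):

    from collections import deque

    # unary_rules: {Y: [X, ...]}; binary_rules: {(Y, Z): [X, ...]}
    def agenda_parse(word_items, unary_rules, binary_rules):
        agenda = deque(word_items)           # items are (label, i, j)
        chart = set()
        while agenda:
            item = agenda.popleft()
            if item in chart:
                continue
            chart.add(item)                  # closed set
            Y, i, j = item
            for X in unary_rules.get(Y, []):         # graph successors
                agenda.append((X, i, j))
            for (Z, k, l) in list(chart):            # hypergraph successors
                if k == j:                           # Y[i,j] + Z[j,l] -> X[i,l]
                    for X in binary_rules.get((Y, Z), []):
                        agenda.append((X, i, l))
                if l == i:                           # Z[k,i] + Y[i,j] -> X[k,j]
                    for X in binary_rules.get((Z, Y), []):
                        agenda.append((X, k, j))
        return chart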

Page 112: An Example

0 critics 1 write 2 reviews 3 with 4 computers 5

Items discovered, in order (each with its backtrace):
NNS[0,1]  VBP[1,2]  NNS[2,3]  IN[3,4]  NNS[4,5]
NP[0,1]  NP[2,3]  NP[4,5]
VP[1,2]  S[0,2]  ROOT[0,2]
PP[3,5]  VP[1,3]  S[0,3]  ROOT[0,3]
NP[2,5]  VP[1,5]  S[0,5]  ROOT[0,5]

Page 113: Empty Elements

§ Sometimes we want to posit nodes in a parse tree that don't contain any pronounced words:

I want you to parse this sentence
I want [ ] to parse this sentence

§ These are easy to add to an agenda-based parser!
§ For each position i, add the "word" edge e[i,i]
§ Add rules like NP → e to the grammar
§ That's it!

0 I 1 like 2 to 3 parse 4 empties 5     (with an e edge at every fencepost)

Page 114: UCS / A*

§ With weighted edges, order matters
§ Must expand optimal parse from bottom up (subparses first)
§ CKY does this by processing smaller spans before larger ones
§ UCS pops items off the agenda in order of decreasing Viterbi score
§ A* search also well defined

§ You can also speed up the search without sacrificing optimality
§ Can select which items to process first
§ Can do with any "figure of merit" [Charniak 98]
§ If your figure-of-merit is a valid A* heuristic, no loss of optimality [Klein and Manning 03]

[Figure: an item X over [i,j], scored by its inside part plus an estimate of the outside parts 0..i and j..n.]

Page 115: (Speech) Lattices

§ There was nothing magical about words spanning exactly one position.
§ When working with speech, we generally don't know how many words there are, or where they break.
§ We can represent the possibilities as a lattice and parse these just as easily.

[Figure: a word lattice with arcs labeled I, awe, of, van, eyes, saw, a, 've, an, Ivan; alternative hypotheses such as "I saw a van" share arcs.]

Page 116: Unsupervised Tagging

Page 117: Unsupervised Tagging?

§ AKA part-of-speech induction
§ Task:
§ Raw sentences in
§ Tagged sentences out

§ Obvious thing to do:
§ Start with a (mostly) uniform HMM
§ Run EM
§ Inspect results

Page 118: EM for HMMs: Process

§ Alternate between recomputing distributions over hidden variables (the tags) and re-estimating parameters
§ Crucial step: we want to tally up how many (fractional) counts of each kind of transition and emission we have under current params
§ Same quantities we needed to train a CRF!

Page 119: EM for HMMs: Quantities

§ Total path values (correspond to probabilities here)

Page 120: The State Lattice / Trellis

[Figure: the state lattice over "START Fed raises interest rates END", states {^, N, V, J, D, $} per position.]

Page 121: EM for HMMs: Process

§ From these quantities, can compute expected transitions
§ And emissions (see the standard forms sketched after Page 59)

Page 122: Merialdo: Setup

§ Some (discouraging) experiments [Merialdo 94]

§ Setup:
§ You know the set of allowable tags for each word
§ Fix k training examples to their true labels
§ Learn P(w|t) on these examples
§ Learn P(t|t-1,t-2) on these examples
§ On n examples, re-estimate with EM

§ Note: we know allowed tags but not frequencies

Page 123: Merialdo: Results

Page 124: Distributional Clustering

• the president said that the downturn was over •

Context signatures:
president:  the __ of;  the __ said
governor:   the __ of;  the __ appointed
said:       sources __ •;  president __ that
reported:   sources __ •

Resulting clusters: {president, governor}, {said, reported}, {the, a}

[Finch and Chater 92, Schütze 93, many others]

Page 125: Distributional Clustering

§ Three main variants on the same idea:
§ Pairwise similarities and heuristic clustering
§ E.g. [Finch and Chater 92]
§ Produces dendrograms
§ Vector space methods
§ E.g. [Schütze 93]
§ Models of ambiguity
§ Probabilistic methods
§ Various formulations, e.g. [Lee and Pereira 99]

Page 126: Nearest Neighbors

Page 127: Dendrograms

Page 128: Dendrograms