Part 2: Detecting and Correcting Odd Collocations in Text
Commonsense for Machine Intelligence: Text to Knowledge and Knowledge to Text

Commonsense for Machine Intelligence: Text to Knowledge ...people.mpi-inf.mpg.de/~ntandon/presentations/cikm-2017-tutorial... · Commonsense for Machine Intelligence: Text to Knowledge

May 09, 2018


buihanh

Introduction to Collocations

• Correct native speaker expression in a given language
  • Strong tea (not powerful tea)
  • Clear sky (not pure sky)
  • Go home (not go to home)
  • Go to school (not go school)
  • House arrest (not arrest house)
  • Friend circle (not circle friend)

Collocation Errors or Odd Collocations

• Expressions that may be grammatically correct but are not typical among native speakers
• Red meat & white meat are correct collocations in English
  • Their literal translations are odd collocations in German
  • Not usually used by German speakers
• Machine translation can often cause such collocation errors
• Can be due to lack of commonsense & world knowledge

Collocations and Idioms

• Some collocations are idiomatic expressions: “couch potato”
• Literal idiom translation may be totally absurd: “sofa potato”
• Note: Correct idiom usage & translation is harder

• Not all collocations are idioms, e.g., “fast cars” (vs “quick cars”)
• Yet, correct collocation usage is important in many situations

Motivation to Address Collocations: Daily Communication

• Tourist wants “black coffee” (regular coffee without milk) in a coffee shop
• Asks for “dark coffee” using online translation help
• Server brings coffee with milk, made with the darkest coffee beans available
• This is not what the tourist intended…
• What if he is lactose intolerant?

• Note: “Coffee shop” in Amsterdam might mean something completely different: a place for drugs!
• Important to address collocations with commonsense & world knowledge

Motivation to Address Collocations: Written Texts

• Classic Bible quote, also in Shakespeare’s Hamlet

• Literal machine translation can yield a different meaning!
• Collocations, e.g., “willing spirit” & “weak flesh”, must be translated with commonsense & reference to context

Motivation to Address Collocations: Search Engines

• Odd collocation “quick cars” returns fewer hits & less appropriate results
• Correct collocation “fast cars” shows better sites & images of cars as good search results
• Machine translation help for search engines should fix collocation errors

Techniques to Address Odd Collocations

• Treatment of Collocations
  • Different types of oddly collocated terms
  • Examples of each type with problems caused

• Linguistic Classification
  • Classifying terms as correct vs incorrect collocations
  • Considering associations / using source language

• Detection and Correction
  • Finding various incorrectly collocated terms using frequency etc.
  • Providing correct responses, similarity measures, ranking the suggestions

Treatment of Collocations

• Collocation errors are typically treated in different categories
  • Insertion Errors: adding a wrong term
  • Deletion Errors: omitting a required term
  • Transposition Errors: changing order of terms
  • Substitution Errors: using one term instead of another

• We briefly describe each type with examples and the problems they could cause

Insertion Errors

• These include adding a term not appropriate in a correct native speaker expression

“I went to home” vs “I went home”

“When will you return back from Singapore?” vs “When will you return from Singapore?”

“Take a break for the lunch” vs “Take a break for lunch”

• Article errors are quite common in this category (adding unnecessary articles)
• Many of these errors involve grammatical mistakes
• These types of errors create problems in
  • Fluency of speech, especially at formal events
  • Clarity of written documents

Deletion Errors

• These are the opposite of insertion errors & involve missing a term needed in an expression

“Einstein was scientist” vs “Einstein was a scientist”

“Hire someone to do job” vs “Hire someone to do the job”

“Let us wait her” vs “Let us wait for her”

• They also create similar problems with respect to fluency and clarity
• Many deletion errors also pertain to odd use of articles (omitting a necessary one)
• Approaches in the literature for article error treatment are applicable here
• These also often pertain to grammatical mistakes

Transposition Errors

• These errors occur when terms are not placed in the appropriate order
• They could be more problematic than insertion & deletion errors

“Don’t talk with your full mouth” vs “Don’t talk with your mouth full”

“How to make friendships close” vs “How to make close friendships”

• They might convey the wrong meaning, e.g., talking with your full mouth is different from talking with your mouth full
• Sometimes it’s almost the opposite meaning, e.g., close friendships vs friendships close
• Often, knowing the native language of the speaker / origin of the source text might help here

Substitution Errors

• These involve using an inappropriate term in an expression instead of the term in correct usage

“This actor does money” vs “This actor makes money”

“Where is the nearest quick food place?” vs “Where is the nearest fast food place?”

• Most common type of collocation error
• Often cause miscommunication problems while talking, writing, searching etc.
• Many approaches in the literature address mainly substitution errors
• They can potentially be applied to address the other types as well
• Incorporation of commonsense knowledge is particularly useful here
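
When an erroneous phrase and its correction are both available, the four categories (insertion, deletion, transposition, substitution) can be told apart mechanically. The following is a minimal sketch and is not taken from any of the cited works: the function name `classify_edit` and the simple word-level comparison are assumptions made for illustration. The example pairs are the ones quoted on the slides.

```python
def classify_edit(wrong: str, right: str) -> str:
    """Label the error in `wrong`, given its correction `right`.

    Hypothetical helper: compares whitespace-separated, lowercased words only.
    """
    w, r = wrong.lower().split(), right.lower().split()
    if w == r:
        return "none"
    if sorted(w) == sorted(r):
        return "transposition"          # same words, different order
    if len(w) == len(r):
        diffs = sum(a != b for a, b in zip(w, r))
        return "substitution" if diffs == 1 else "other"
    longer, shorter = (w, r) if len(w) > len(r) else (r, w)
    if len(longer) - len(shorter) == 1:
        for i in range(len(longer)):
            # Dropping one word from the longer phrase yields the shorter one.
            if longer[:i] + longer[i + 1:] == shorter:
                # Extra word in the erroneous phrase: insertion error.
                # Extra word in the correction: deletion error.
                return "insertion" if longer is w else "deletion"
    return "other"


examples = [
    ("I went to home", "I went home"),
    ("Einstein was scientist", "Einstein was a scientist"),
    ("Don't talk with your full mouth", "Don't talk with your mouth full"),
    ("This actor does money", "This actor makes money"),
]
for wrong, right in examples:
    print(f"{wrong!r}: {classify_edit(wrong, right)}")
```

A real system does not have the correction in advance, which is why the approaches on the following slides rely on corpora and reference databases instead.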

Addressing Odd Collocations by Linguistic Classification

• Some works focus on classifying collocation errors from a linguistic perspective
  • Using collocation measures on syntactic patterns for lexical classification as correctly collocated term vs error [Futagi et al., 2008]
  • Considering source language (of ESL learner or machine generated text) to classify collocations [Dahlmeier et al., 2011]

Collocation Measures on Syntactic Patterns

• This work addresses 7 aspects of lexical collocations
• Collocation errors lexically classified using candidate word strings
• POS tagging of texts is conducted followed by pattern matching

[Futagi et al.]
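
The "POS tagging followed by pattern matching" step can be pictured as a scan over tagged tokens. This is a rough sketch under stated assumptions: the input is already tagged, the tag names (`ADJ`, `NOUN`, `VERB`) and the two patterns in `PATTERNS` are hypothetical stand-ins, and the slides do not list the actual patterns used by Futagi et al.

```python
# Hypothetical POS patterns; the actual inventory used in the cited work
# is not given on the slides.
PATTERNS = {
    ("ADJ", "NOUN"): "adjective + noun",
    ("VERB", "NOUN"): "verb + noun",
}


def candidate_strings(tagged):
    """Return (word string, pattern label) for adjacent pairs matching a pattern."""
    out = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        label = PATTERNS.get((t1, t2))
        if label is not None:
            out.append((f"{w1} {w2}", label))
    return out


# Pre-tagged input; a real pipeline would run a POS tagger first.
tagged = [("she", "PRON"), ("drinks", "VERB"), ("powerful", "ADJ"), ("tea", "NOUN")]
print(candidate_strings(tagged))
```

The extracted strings are only candidates at this point; deciding whether each one is an error is a separate step.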

Collocation Measures on Syntactic Patterns (Contd.)

• After spell checking, variants of word strings built with articles, synonyms etc.
• Word strings looked up in a reference DB (RRDB) to find a match
• If no match found, it is classified as a collocation error

[Futagi et al.]
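
The variant-and-lookup idea can be sketched as follows. This is illustrative only: `REFERENCE_DB` is a toy set standing in for the RRDB, only article variants are generated (synonym and other variants mentioned on the slide are left out), and the function names are hypothetical.

```python
# Toy stand-in for the reference database of attested collocations.
REFERENCE_DB = {"strong tea", "clear sky", "fast cars"}
ARTICLES = ("", "a", "an", "the")


def variants(phrase):
    """Build article variants of a word string (a simplification of the slide)."""
    words = phrase.lower().split()
    if words and words[0] in ARTICLES:
        words = words[1:]               # strip a leading article, if any
    core = " ".join(words)
    return [f"{art} {core}".strip() for art in ARTICLES]


def is_collocation_error(phrase, reference_db=REFERENCE_DB):
    """Flag the phrase when none of its variants is found in the reference DB."""
    return not any(v in reference_db for v in variants(phrase))


print(is_collocation_error("the strong tea"))   # a variant is in the toy DB
print(is_collocation_error("powerful tea"))     # no variant is in the toy DB
```

With a lookup of this kind, anything absent from the reference data is flagged, so the coverage of the reference database directly limits precision.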

Collocation Measures on Syntactic Patterns (Contd.)

• Measure of collocation strength
  • Rank ratio statistic
  • From 1b words of native speaker texts
  • Incorporating commonsense knowledge

• When evaluated against a gold standard with native speakers, this work gives around 85% precision in classification
• This work does not provide correct suggestions as responses to collocation errors

[Futagi et al.]

Source Language to Classify Collocations

• Errors often caused by semantic similarity of words in source language
  • This is called the L1 language
• Literal translation to destination language can cause collocation errors
• Thus, L1 induced paraphrases are proposed for classifying collocations

Over a dozen English translations of one source word: look, see, watch, read etc.
Possible translation from source: “I like to look movies” vs “I like to watch movies”

[Dahlmeier et al.]

Source Language to Classify Collocations (Contd.)

• NUCLE: Annotated 1m word corpus of 1400 essays by ESL university students
• Annotated with start & end offset, error type, gold standard correction
• Incorporates commonsense knowledge from professional English instructors
• They filter out preposition & article errors, focus on collocations involving semantics

Figure: Statistics of NUCLE Analysis

[Dahlmeier et al.]

Source Language to Classify Collocations (Contd.)

• Detected errors classified as: Spelling, Homophone, Synonym, L1-transfer
  • Spelling: Edit distance (erroneous phrase, correction) < threshold
  • Homophone: (erroneous word, correction) have same pronunciation
  • Synonym: (erroneous word, correction) have similar meaning
  • L1-transfer: (erroneous phrase, correction) share a common translation

[Dahlmeier et al.]
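
The four tests above translate fairly directly into code. The sketch below is a toy under stated assumptions: the threshold value, the pronunciation strings, the synonym pairs and the L1 key `"kan"` are all hypothetical placeholders for the real resources (a pronouncing dictionary, a thesaurus and a phrase table), and the labels are not treated as mutually exclusive.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


SPELLING_THRESHOLD = 3                                       # hypothetical value
PRONUNCIATION = {"their": "DH EH R", "there": "DH EH R"}     # toy entries
SYNONYMS = {frozenset(("big", "large"))}                     # toy thesaurus
TRANSLATIONS = {"look": {"kan"}, "watch": {"kan"}, "see": {"kan"}}  # placeholder L1 key


def error_types(wrong: str, right: str) -> list:
    """Return every label whose test the (erroneous, correction) pair passes."""
    labels = []
    if edit_distance(wrong, right) < SPELLING_THRESHOLD:
        labels.append("spelling")
    pw, pr = PRONUNCIATION.get(wrong), PRONUNCIATION.get(right)
    if pw is not None and pw == pr:
        labels.append("homophone")
    if frozenset((wrong, right)) in SYNONYMS:
        labels.append("synonym")
    if TRANSLATIONS.get(wrong, set()) & TRANSLATIONS.get(right, set()):
        labels.append("l1-transfer")
    return labels


for pair in [("look", "watch"), ("their", "there"), ("big", "large")]:
    print(pair, error_types(*pair))
```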

Source Language to Classify Collocations (Contd.)

• Number of errors in L1-transfer > other types
• Extract English-L1, L1-English phrases of max 3 words
• Phrase extraction heuristic:

• Here, f: foreign language phrase
• Translation probabilities p(e1|f), p(f|e2) predicted by max likelihood estimation
• Only keep phrases with probability > threshold (0.001 in this work)
• This serves as the basis for suggesting corrections

Figure: Analysis of Collocation Errors

[Dahlmeier et al.]
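
The formula for the heuristic did not survive in this transcript. Given the symbols that are defined (f, p(e1|f), p(f|e2)), a plausible reading is the usual pivot formulation p(e1|e2) = Σ_f p(e1|f) · p(f|e2), and the sketch below assumes exactly that. The phrase-pair counts, the placeholder L1 phrases `"kan"` and `"qiao"`, the function name `paraphrases`, and the choice to apply the 0.001 cutoff to the final paraphrase probability are all assumptions for illustration.

```python
from collections import defaultdict

# Toy phrase-pair counts (English phrase, L1 phrase), as if read off a
# word-aligned parallel corpus. "kan" and "qiao" are placeholder L1 phrases.
PAIR_COUNTS = {
    ("watch", "kan"): 6,
    ("look", "kan"): 3,
    ("see", "kan"): 1,
    ("look", "qiao"): 1,
}
MAX_WORDS = 3       # phrases of at most 3 words, as on the slide
THRESHOLD = 0.001   # probability cutoff quoted on the slide


def paraphrases(e2, pair_counts=PAIR_COUNTS, threshold=THRESHOLD):
    """Return {e1: p(e1|e2)}, assuming p(e1|e2) = sum_f p(e1|f) * p(f|e2)."""
    pairs = {k: c for k, c in pair_counts.items()
             if all(len(p.split()) <= MAX_WORDS for p in k)}
    e_total, f_total = defaultdict(int), defaultdict(int)
    for (e, f), c in pairs.items():
        e_total[e] += c
        f_total[f] += c
    scores = defaultdict(float)
    for (e, f), c in pairs.items():
        if e != e2:
            continue
        p_f_given_e2 = c / e_total[e2]          # maximum likelihood estimate
        for (e1, f1), c1 in pairs.items():
            if f1 == f and e1 != e2:
                scores[e1] += (c1 / f_total[f]) * p_f_given_e2
    return {e1: p for e1, p in scores.items() if p > threshold}


ranked = sorted(paraphrases("look").items(), key=lambda kv: -kv[1])
print(ranked)   # "watch" ranks above "see" for these toy counts
```

Ranking the surviving paraphrases by probability gives an ordered list of candidate corrections for the erroneous phrase.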

Discussion

• These research works clearly focus more on lexical classification of collocation errors
• Linguistic perspectives are significant here
• Commonsense knowledge is included in collocation error classification using corpora from native speakers / English instructors
• These works provide an insight into the reasons for collocation errors and their grammatical placements
• Such research heads towards proposing corrective measures

Collocation Error Detection and Correction

• These approaches develop tools for the actual detection and correction of collocation errors
  • AwkChecker: While a user writes a text document, flag collocation errors and suggest replacements that correspond closely to consensus using word-level statistical n-grams [Park et al., 2008]
  • CollOrder: When a user enters a term in the tool, detect collocation errors and provide correctly ordered collocated responses as outputs using an ensemble of similarity measures [Varghese et al., 2015]

AwkChecker

• End-user tool to correct collocation errors in written documents
• Users write text, Awkward phrases are Checked by highlighting them
• Users can click awkward phrases to see suggested replacements
• 1st ever tool for collocation error correction

Figure: AwkChecker’s user interface: A) Flagged phrases in the composition window B) Suggested replacement for “powerful tea”

[Park et al.]

AwkChecker (Contd.)

• Builds statistical n-grams (sequences of n words) from training corpus & records frequencies
• Analyzes user input against corpus to find if a phrase is a collocation error
• Flags error if there exist similar phrases with frequency > input frequency
• Generates replacements using n-gram frequency based approach
• Candidates with much higher frequency are potential replacements

[Park et al.]
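
The frequency-based flagging described above can be approximated in a few lines. This is a simplified sketch, not AwkChecker's implementation: "similar phrase" is taken here to mean an n-gram of the same length differing in exactly one word, "much higher frequency" is modelled by a hypothetical `ratio` parameter, and the corpus is a toy list of sentences.

```python
from collections import Counter


def ngram_counts(sentences, n=2):
    """Count word n-grams over a list of sentences."""
    counts = Counter()
    for s in sentences:
        toks = s.lower().split()
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts


def suggest(phrase, counts, ratio=5.0):
    """Return (replacement, frequency) pairs that are far more frequent than `phrase`.

    An empty result means the phrase is not flagged.
    """
    gram = tuple(phrase.lower().split())
    freq = counts.get(gram, 0)
    candidates = []
    for other, c in counts.items():
        one_word_apart = (len(other) == len(gram)
                          and sum(a != b for a, b in zip(gram, other)) == 1)
        if one_word_apart and c > freq and c >= ratio * max(freq, 1):
            candidates.append((" ".join(other), c))
    return sorted(candidates, key=lambda x: (-x[1], x[0]))


corpus = (["she drinks strong tea"] * 6
          + ["he likes powerful tea"]
          + ["a powerful engine"] * 3)
counts = ngram_counts(corpus)
print(suggest("powerful tea", counts))
print(suggest("strong tea", counts))
```

The approach inherits the assumption that the corpus is right more often than it is wrong, since frequency is the only evidence used.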

AwkChecker (Contd.)

• Statistical n-grams are used over relevant corpora including Wikipedia
• Helpful in capturing commonsense with domain-specific knowledge using frequency-based approach
• Example: Referring to a medical corpus to flag phrases awkward in medical research writing
• Assumption: Relevant corpora are correct more frequently than they are incorrect
• Evaluation reveals usefulness in collocation correction, but details of accuracy not discussed

[Park et al.]

CollOrder

• Detects & corrects collocation errors in terms input to the tool
• Outputs ranked responses of correctly collocated terms
• Correct collocations source: ANC/BNC (American/British National Corpus)
• Includes commonsense knowledge from native speakers’ writings
• Useful in Web queries, text documents, ESL translation etc.

Figure: Approach in the CollOrder tool [Varghese et al.]

CollOrder (Contd.)

• Ensemble of measures is used for similarity search and ranking
• Conditional Probability: Measures relative occurrence of terms A & B
• Jaccard’s Coefficient: Measures extent of semantic similarity between A & B
• Web Jaccard: To reduce adverse effects of random co-occurrence (due to scale & noise in Web data) [Bollegala et al., 2009]

[Varghese et al.]
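
The formulas for these measures did not survive in the transcript, so the sketch below uses their textbook forms over occurrence counts and should be read as an assumption rather than CollOrder's exact definitions. Here `f_a` and `f_b` are the counts of terms A and B, `f_ab` is their co-occurrence count, and the cutoff `c` in `web_jaccard` is a hypothetical value.

```python
def conditional_probability(f_ab: int, f_a: int) -> float:
    """P(B | A), estimated as co-occurrence count over the count of A."""
    return f_ab / f_a if f_a else 0.0


def jaccard(f_ab: int, f_a: int, f_b: int) -> float:
    """Jaccard coefficient: |A and B| / |A or B| over occurrence counts."""
    union = f_a + f_b - f_ab
    return f_ab / union if union > 0 else 0.0


def web_jaccard(f_ab: int, f_a: int, f_b: int, c: int = 5) -> float:
    """Jaccard with a cutoff: co-occurrence counts of at most c are treated
    as random noise and scored 0."""
    if f_ab <= c:
        return 0.0
    return jaccard(f_ab, f_a, f_b)


# Toy counts for a candidate pair (A, B).
f_a, f_b, f_ab = 200, 100, 50
print(conditional_probability(f_ab, f_a))
print(jaccard(f_ab, f_a, f_b))
print(web_jaccard(3, f_a, f_b))   # rare co-occurrence is scored 0
```

Each function returns one score per candidate, so the outputs can serve as the feature values that an ensemble then combines.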

CollOrder (Contd.)

• These & other measures (Frequency Normalized, Frequency Ratio) are used [Varghese et al., 2015]
• Different measures empirically yield good results in different scenarios
• Ensemble of measures with classifiers thus proposed to optimize performance
• Classifier used: JRIP, implementation of RIPPER (Repeated Incremental Pruning to Produce Error Reduction) [Cohen, 1995]
• CollOrder evaluation with MTurk on native speakers: Average accuracy 92.44%

Figure: Example of ensemble learning by the classifier
“blue sky” is a valid suggestion, classified as “y”
“night sky” is not a valid suggestion, classified as “n”

[Varghese et al.]

Other Related Works

• [Ramos et al., 2010] build annotation schema with 3D typology to classify collocations mainly in Spanish & English translation:
  • 1st dimension finds if error is for whole or part of collocation
  • 2nd dimension does language-oriented error analysis
  • 3rd dimension does interpretive error analysis

• [Li et al., 2009] use a probabilistic approach for collocation correction:
  • Use BNC and WordNet as language learning sources
  • Suggest corrections based on commonly used expressions
  • Do not develop a tool for collocation detection & correction

Discussion

• Collocation error correction tools in the literature are found useful by users
• Commonsense knowledge from native speakers is typically entailed in the source corpora used for learning
• Approaches in linguistic classification as well as in collocation correction rely heavily on frequency

• Thus, potential issues related to sparse data with correct collocations call for further research

Text to Knowledge and Knowledge to Text

• Collocation approaches start with text and extract knowledge from corpora
  • Different methods used for knowledge extraction: probabilistic, ensemble
  • Extracted knowledge used for linguistic classification, error correction

• Statistical text categorization occurs due to analysis in linguistic classification
• Correctly collocated text responses offered as suggestions in error correction
• Thus, extracted knowledge serves to provide text based outputs

• Commonsense knowledge plays a role mainly in source corpora from native speakers & expert writings

• This contributes to machine intelligence by providing better machine translation incorporating commonsense

References

• Bollegala, D., Matsuo, Y. and Ishizuka, M., Measuring the similarity between implicit semantic relations using web search engines, WSDM 2009, pp. 104–113.

• Cohen, W., Fast effective rule induction. In Proceedings of the International Conference on Machine Learning, ICML 1995, pp. 115–123.

• Dahlmeier, D. and Ng, H. T., Correcting semantic collocation errors with L1-induced paraphrases. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, pp. 107–117.

• Futagi, Y., Deane, P., Chodorow, M. and Tetreault, J., A computational approach to detecting collocation errors in the writing of non-native speakers of English, Computer Assisted Language Learning 2008, 21(4): 353–367.

• Li-E, L. A., Wible, D. and Tsao, N-L., Automated suggestions for miscollocations, Proceedings of the 4th Workshop on Innovative Use of NLP for Building Educational Applications, 2009, pp. 47–50.

• Park, T., Lank, E., Poupart, P. and Terry, M., Is the sky pure today? AwkChecker: An assistive tool for detecting and correcting collocation errors, ACM Symposium on User Interface Software and Technology 2008, pp. 121–130.

• Ramos, M. A., Wanner, L., Vincze, O., del Bosque, G. C., Veiga, N. V., Suárez, E. M. and González, S. P., Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora, LREC 2010, pp. 3209–3214.

• Varghese, A., Varde, A., Peng, J. and Fitzpatrick, E., A framework for collocation error correction in Web pages and text documents, ACM SIGKDD Explorations 2015, 17(1): 14–23.