5525: Speech and Language Processing, Alan Ritter (many slides from Greg Durrett)

5525: Speech and Language Processing | aritter.github.io/courses/5525_slides_v2/lec1-intro.pdf | Language is Ambiguous! ‣ Headlines (slide credit: Dan Klein) ‣ Syntactic/semantic ambiguity:

Jul 05, 2020


dariahiddleston
Transcript
5525: Speech and Language Processing

Alan Ritter (many slides from Greg Durrett)

Administrivia

‣ Course website: http://aritter.github.io/courses/5525_fall19.html

‣ Piazza: link on the course website

‣ My office hours: Friday 4-5pm, DL 595

‣ TA: Ashutosh Baheti; office hours: Wednesday 1-2pm, DL 574

Course Requirements

‣ Prior exposure to machine learning very helpful but not required

‣ Programming/Python experience

‣ Probability

‣ Linear Algebra

‣ Calculus

There will be a lot of math and programming!

Enrollment

‣ Homework 1 is out now (due August 30):

‣ If this seems like it’ll be challenging for you, come and talk to me (this is smaller-scale than the later assignments, which are smaller-scale than the final project)

‣ Please look at the assignment well before then

Texts

‣ 2 great textbooks for NLP

‣ There will be assigned readings from both

‣ Both freely available online

What’s the goal of NLP?

‣ Be able to solve problems that require deep understanding of text

‣ Example: dialogue systems

Siri, what’s the most valuable American company?

Apple

Who is its CEO?

Tim Cook

recognize marketCap is the target value

recognize predicate

do computation

resolve references
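The annotated steps in the Siri exchange can be sketched as a toy query over a small knowledge base. Everything here is an assumption for illustration: the `KB` dictionary, its field names, the extra companies, and the market-cap numbers are placeholders rather than real data, and the hand-written string matching stands in for a learned semantic parser.

```python
# Toy knowledge base. Names, fields, and numbers are illustrative placeholders.
KB = {
    "Apple": {"country": "US", "marketCap": 3.0, "ceo": "Tim Cook"},
    "ExampleCorp": {"country": "US", "marketCap": 1.0, "ceo": "A. Person"},
    "OtherCo": {"country": "DE", "marketCap": 5.0, "ceo": "B. Person"},
}


def most_valuable(kb, country):
    # "most valuable" -> target value marketCap
    # "American"      -> predicate country == "US"
    # superlative     -> computation: argmax over the filtered companies
    candidates = {name: rec for name, rec in kb.items() if rec["country"] == country}
    return max(candidates, key=lambda name: candidates[name]["marketCap"])


class Dialogue:
    def __init__(self, kb):
        self.kb = kb
        self.last_entity = None  # simplest possible discourse state

    def ask(self, question):
        q = question.lower()
        if "most valuable american company" in q:
            self.last_entity = most_valuable(self.kb, "US")
            return self.last_entity
        if "its ceo" in q and self.last_entity is not None:
            # Resolve the reference "its" to the most recently mentioned entity.
            return self.kb[self.last_entity]["ceo"]
        return None


d = Dialogue(KB)
print(d.ask("Siri, what's the most valuable American company?"))
print(d.ask("Who is its CEO?"))
```

The second question only has an answer because the dialogue state remembers the first one; asked in isolation, "its" has nothing to refer to and the sketch returns `None`.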

Automatic Summarization

One of New America’s writers posted a statement critical of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

provide missing context

paraphrase to provide clarity

compress text

Machine Translation

Trump Pope family watch a hundred years a year in the White House balcony

People’s Daily, August 30, 2017

NLP Analysis Pipeline

Text → Text Analysis → Annotations → Applications

Text Analysis: Syntactic parses, Coreference resolution, Entity disambiguation, Discourse analysis

Applications: Summarize, Extract information, Answer questions, Identify sentiment, Translate

‣ NLP is about building these pieces!

‣ All of these components are modeled with statistical approaches trained with machine learning
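The shape of the pipeline (text in, annotations in the middle, an application on top) can be sketched in a few lines. This is only a structural sketch under loud assumptions: the `Annotations` class, the function names, and the capitalized-word heuristic are hypothetical stand-ins, whereas the slides say the real components are statistical models trained with machine learning.

```python
from dataclasses import dataclass, field


@dataclass
class Annotations:
    tokens: list
    mentions: list = field(default_factory=list)


def analyze(text):
    # Stand-in for the text analysis stage: whitespace tokenization plus a
    # crude "run of capitalized tokens" heuristic instead of a trained model.
    tokens = text.split()
    mentions, current = [], []
    for tok in tokens:
        if tok[:1].isupper():
            current.append(tok)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return Annotations(tokens=tokens, mentions=mentions)


def extract_information(ann):
    # Stand-in application: it consumes annotations, not raw text.
    return {"num_tokens": len(ann.tokens), "mentions": ann.mentions}


def pipeline(text):
    return extract_information(analyze(text))


print(pipeline("Tom Cruise stars in the new Mission Impossible film"))
```

The point of the layering is that an application such as information extraction can be swapped for summarization or question answering while reusing the same annotations.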

How do we represent language?

Text

Labels

the movie was good: +

Beyoncé had one of the best videos of all time: subjective

Sequences/tags

Tom Cruise stars in the new Mission Impossible film: PERSON (Tom Cruise), WORK_OF_ART (Mission Impossible)

Trees

I eat cake with icing (tree node labels: S, NP, VP, VBZ, NN, NP, PP)

flights to Miami

λx. flight(x) ∧ dest(x) = Miami
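The four kinds of representation on this slide can be written down as plain Python values. This is a sketch under assumptions: the BIO tagging scheme, the exact bracketing of the tree (including the `PRP` and `IN` tags, which are not on the slide), and the tiny `entities` list are illustrative choices, not something the slides specify.

```python
# Label: one category for the whole text.
labeled = ("the movie was good", "+")

# Sequence/tags: one tag per token (BIO scheme, an assumed encoding).
tokens = "Tom Cruise stars in the new Mission Impossible film".split()
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "O",
        "B-WORK_OF_ART", "I-WORK_OF_ART", "O"]


def spans(tokens, tags):
    """Collect (text, type) spans from BIO tags."""
    out, cur, cur_type = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and cur and tag[2:] == cur_type:
            cur.append(tok)
            continue
        if cur:
            out.append((" ".join(cur), cur_type))
            cur, cur_type = [], None
        if tag != "O":
            cur, cur_type = [tok], tag[2:]
    if cur:
        out.append((" ".join(cur), cur_type))
    return out


# Tree: nested tuples (label, child, ...); one assumed reading of the sentence.
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBZ", "eat"),
               ("NP", ("NP", ("NN", "cake")),
                      ("PP", ("IN", "with"), ("NP", ("NN", "icing"))))))


def leaves(t):
    """Read the words back off the tree, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1:] for w in leaves(child)]


# Logical form: λx. flight(x) ∧ dest(x) = Miami, as a predicate over records.
entities = [
    {"name": "f1", "kind": "flight", "dest": "Miami"},
    {"name": "f2", "kind": "flight", "dest": "Columbus"},
    {"name": "t1", "kind": "train", "dest": "Miami"},
]


def flights_to_miami(x):
    return x["kind"] == "flight" and x["dest"] == "Miami"


print(spans(tokens, tags))
print(leaves(tree))
print([e["name"] for e in entities if flights_to_miami(e)])
```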

How do we use these representations?

Text → Text Analysis → Labels, Sequences, Trees → Applications

Extract syntactic features

Tree transducers (for machine translation)

Tree-structured neural networks

end-to-end models …

‣ Main question: What representations do we need for language? What do we want to know about it?

‣ Boils down to: what ambiguities do we need to resolve?

Why is language hard? (and how can we handle that?)

Language is Ambiguous!

‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared

they advocated

‣ This is so complicated that it’s an AI challenge problem! (AI-complete)

‣ Referential/semantic ambiguity
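One way to see why the schema is hard is to encode the two variants as data and run a resolver that ignores the verb. The sketch below is an illustration only: `WinogradItem`, the baseline, and the accuracy helper are hypothetical names, and the gold answers follow the usual reading of this example (the council fears, the demonstrators advocate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradItem:
    sentence: str
    candidates: tuple
    answer: str


TEMPLATE = ("The city council refused the demonstrators a permit "
            "because they {verb} violence")
CANDIDATES = ("the city council", "the demonstrators")

pair = [
    WinogradItem(TEMPLATE.format(verb="feared"), CANDIDATES, "the city council"),
    WinogradItem(TEMPLATE.format(verb="advocated"), CANDIDATES, "the demonstrators"),
]


def first_mention_baseline(item):
    # Ignores the verb entirely: always picks the earliest-mentioned candidate.
    lowered = item.sentence.lower()
    return min(item.candidates, key=lowered.index)


def accuracy(resolver, items):
    return sum(resolver(it) == it.answer for it in items) / len(items)


print(accuracy(first_mention_baseline, pair))
```

Because the two sentences differ only in one word and have opposite answers, any resolver that does not use that word gives the same referent for both and so gets exactly one of the pair right.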

Language is Ambiguous!

‣ Headlines

‣ Teacher Strikes Idle Kids

‣ Hospitals Sued by 7 Foot Doctors

‣ Ban on Nude Dancing on Governor’s Desk

‣ Iraqi Head Seeks Arms

‣ Stolen Painting Found by Tree

‣ Kids Make Nutritious Snacks

‣ Local HS Dropouts Cut in Half

‣ Syntactic/semantic ambiguity: parsing needed to resolve these, but need context to figure out which parse is correct

slide credit: Dan Klein
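The attachment ambiguity behind several of these headlines can be made concrete by counting parses with a CKY-style chart. This is a minimal sketch under assumptions: the toy grammar and lexicon are invented for illustration, and "ban on dancing on desk" is a simplified stand-in for the headline so that every word is covered by the lexicon.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": {"NP"}, "cake": {"NP"}, "icing": {"NP"},
    "ban": {"NP"}, "dancing": {"NP"}, "desk": {"NP"},
    "eat": {"V"}, "with": {"P"}, "on": {"P"},
}


def count_parses(words, root):
    """Number of distinct parse trees rooted in `root` for the word list."""
    n = len(words)
    # chart[i][k][X] = number of ways to derive words[i:k] from symbol X
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, ()):
            chart[i][i + 1][tag] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for parent, left, right in BINARY_RULES:
                    ways = chart[i][j].get(left, 0) * chart[j][k].get(right, 0)
                    if ways:
                        chart[i][k][parent] += ways
    return chart[0][n].get(root, 0)


print(count_parses("I eat cake with icing".split(), "S"))
print(count_parses("ban on dancing on desk".split(), "NP"))
```

Both calls report two parses. The grammar alone cannot choose between them, which is the slide's point that context is needed to decide which parse is correct.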

Language is Really Ambiguous!

‣ There aren’t just one or two possibilities which are resolved pragmatically

il fait vraiment beau

It is really nice out

It’s really nice

The weather is beautiful

It is really beautiful outside

He makes truly beautiful

He makes truly boyfriend

It fact actually handsome

‣ Combinatorially many possibilities, many you won’t even register as ambiguities, but systems still have to resolve them
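The word "combinatorially" can be made concrete by enumerating word-by-word translations of the French sentence. The per-word option lists below are an assumption: they are read off the candidate translations on the slide, not taken from a real translation table, and a real system would also consider reorderings and multi-word phrases.

```python
from itertools import product

# Assumed per-word options, collected from the candidates shown on the slide.
OPTIONS = {
    "il": ["It", "He"],
    "fait": ["is", "makes", "fact"],
    "vraiment": ["really", "truly", "actually"],
    "beau": ["nice", "beautiful", "handsome", "boyfriend"],
}

source = "il fait vraiment beau".split()

# Every combination of one option per source word, in source order.
candidates = [" ".join(choice) for choice in product(*(OPTIONS[w] for w in source))]

print(len(candidates))
print("He makes truly boyfriend" in candidates)
```

Even this four-word sentence with a handful of options per word yields 2 × 3 × 3 × 4 = 72 candidates, and the count grows multiplicatively with sentence length.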

What do we need to understand language?

‣ Lots of data!

slide credit: Dan Klein

What do we need to understand language?

‣ World knowledge: have access to information beyond the training data

DOJ greenlights Disney-Fox merger

DOJ: Department of Justice

greenlights: metaphor; “approves”

‣ What is a green light? How do we understand what “greenlighting” does?

What do we need to understand language?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

Golland et al. (2010)
McMahan and Stone (2015)

What do we need to understand language?

‣ Linguistic structure

‣ …but computers probably won't understand language the same way humans do

‣ However, linguistics tells us what phenomena we need to be able to deal with and gives us hints about how language works

Centering Theory, Grosz et al. (1995)

What techniques do we use? (to combine data, knowledge, linguistics, etc.)

A brief history of (modern) NLP

Timeline axis: 1980, 1990, 2000, 2010, 2018

‣ “AI winter”; rule-based, expert systems

‣ Earliest stat MT work at IBM

‣ Penn treebank (figure labels: S, NP, VP)

‣ Ratnaparkhi tagger (figure labels: NNP, VBZ)

‣ Collins vs. Charniak parsers

‣ Sup: SVMs, CRFs, NER, Sentiment

‣ Unsup: topic models, grammar induction

‣ Semi-sup, structured prediction

‣ Neural

Structured Prediction

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

‣ Supervised techniques work well on very little data

‣ Even neural nets can do pretty well!

Figure labels: unsupervised learning; annotation (two hours!); better system!

“Learning a Part-of-Speech Tagger from Two Hours of Annotation”, Garrette and Baldridge (2013)

Less Manual Structure?

Bahdanau et al. (2014)
DeNero et al. (2008)

Does manual structure have a place?

‣ Neural nets don't always work out of domain!

‣ Coreference: rule-based systems are still about as good as deep learning out-of-domain

Moosavi and Strube (2017); figure labels: Wikipedia, Newswire

‣ LORELEI: transition point below which phrase-based systems are better

‣ Why is this? Inductive bias!

‣ Can multi-task learning help?

Does manual structure have a place?

Trump Pope family watch a hundred years a year in the White House balcony

‣ Maybe manual structure would help…

Where are we?

‣ NLP consists of: analyzing and building representations for text, solving problems involving text

‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguistics to solve

‣ Knowing which techniques to use requires understanding dataset size, problem complexity, and a lot of tricks!

‣ NLP encompasses all of these things

NLP vs. Computational Linguistics

‣ NLP: build systems that deal with language data

‣ CL: use computational tools to study language

Hamilton et al. (2016)

NLP vs. Computational Linguistics

‣ Computational tools for other purposes: literary theory, political science…

Bamman, O'Connor, Smith (2013)

Outline of the Course

‣ ML and structured prediction for NLP

‣ Neural nets

‣ Syntax/semantics

‣ Applications: MT, IE, summarization, dialogue, etc.

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Understand how to look at language data and approach linguistic phenomena

‣ Cover modern NLP problems encountered in the literature: what are the active research topics in 2018?

‣ Make you a “producer” rather than a “consumer” of NLP tools

‣ The four assignments should teach you what you need to know to understand nearly any system in the literature

Assignments

‣ 4 Homework Assignments

‣ Implementation-oriented, with an open-ended component to each

‣ Homework 1 (Naive Bayes for sentiment classification) is out NOW

‣ ~2 weeks per assignment, 3 “slip days” for automatic extensions

These projects require understanding of the concepts, ability to write performant code, and ability to think about how to debug complex systems. They are challenging, so start early!
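For orientation on Homework 1, the following is a minimal Python sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing. It is an illustration under stated assumptions, not the assignment's required design: the toy training sentences, the whitespace tokenization, and the function names `train_naive_bayes` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, alpha=1.0):
    """Fit log priors and smoothed log likelihoods from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():  # assumption: whitespace tokenization
            word_counts[label][word] += 1
            vocab.add(word)
    total_docs = sum(label_counts.values())
    log_prior = {y: math.log(n / total_docs) for y, n in label_counts.items()}
    log_likelihood = {}
    for y in label_counts:
        # Add-alpha smoothing over the shared vocabulary.
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        log_likelihood[y] = {
            w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(text, model):
    """Return the label with the highest log posterior (up to a constant)."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for y in log_prior:
        score = log_prior[y]
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += log_likelihood[y][word]
        scores[y] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
train = [
    ("a great and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull and boring film", "neg"),
    ("boring plot and dull acting", "neg"),
]
model = train_naive_bayes(train)
print(predict("great story", model))
print(predict("dull plot", model))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a word that appears under only one label from forcing a zero probability under the other.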

Final Project

‣ Final project (20%)

‣ Groups of 3-4 preferred, 1 is possible.

‣ Good idea to run your project idea by me in office hours or by email.

‣ 4 page report + final project presentation.