Applications in NLP
Hanna Hajishirzi [email protected]
Slides adapted from Dan Klein, Yejin Choi, Luke Zettlemoyer, Julia Hockenmaier
What is NLP?
§ Fundamental goal: deep understanding of broad language
  § Not just string processing or keyword matching
§ End systems that we want to build:
  § Simple: spelling correction, text categorization…
  § Complex: speech recognition, machine translation, information extraction, sentiment analysis, question answering…
  § Unknown: human-level comprehension (is this just NLP?)
Machine Translation
§ Translate text from one language to another
§ Recombines fragments of example translations
§ Challenges:
  § What fragments? [learning to translate]
  § How to make it efficient? [fast translation search]
  § Fluency (second half of this class) vs. fidelity (later)
Jeopardy! World Champion
US Cities: "Its largest airport is named for a World War II hero; its second largest, for a World War II battle."
Information Extraction
§ Unstructured text to database entries
§ SOTA: perhaps 80% accuracy for multi-sentence templates, 90%+ for single easy fields
§ But remember: information is redundant!
New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.
State | Post                          | Company                  | Person
start | president and CEO             | New York Times Co.       | Lance R. Primis
end   | executive vice president      | New York Times newspaper | Russell T. Lewis
start | president and general manager | New York Times newspaper | Russell T. Lewis
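To make the task concrete, here is a minimal sketch of pattern-based extraction in Python. The single regular expression and the field names are invented for illustration; real extractors learn many such patterns (or full statistical models) from data.

```python
import re

TEXT = ("New York Times Co. named Russell T. Lewis, 45, president and "
        "general manager of its flagship New York Times newspaper.")

# One hand-written appointment pattern: "COMPANY named PERSON, AGE, POST of ..."
pattern = re.compile(
    r"(?P<company>[A-Z][A-Za-z.& ]+?) named "
    r"(?P<person>[A-Z][A-Za-z. ]+?), \d+, "
    r"(?P<post>[a-z][a-z ]+?) of ")

m = pattern.search(TEXT)
if m:
    print({"state": "start", "post": m.group("post"),
           "company": m.group("company"), "person": m.group("person")})
# {'state': 'start', 'post': 'president and general manager',
#  'company': 'New York Times Co.', 'person': 'Russell T. Lewis'}
```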
Question Answering
§ More than search
§ Can be really easy: "What's the capital of Wyoming?"
§ Can be harder: "How many US states' capitals are also their largest cities?"
§ Can be open ended: "What are the main issues in the global warming debate?"
Natural Language Interaction
§ Understand requests and act on them
§ "Make me a reservation for two at Quinn's tonight"
https://www.youtube.com/watch?v=KkOCeAtKHIc
https://www.youtube.com/watch?v=qGU-SqUTees
• Why are these appearing now?
• What are fundamental limitations in current art?
Will this Be Part of All Our Home Devices?
§ Automatic Speech Recognition (ASR)
  § Audio in, text out
  § SOTA: 0.3% error for digit strings, 5% for dictation, 50%+ for TV
§ Text to Speech (TTS)
  § Text in, audio out
  § SOTA: totally intelligible (if sometimes unnatural)
Speech Recognition
“Speech Lab”
Sentiment/opinion analysis
[Screenshot: Amazon.com product reviews. Source: www.amazon.com]
• Today: in the 2012 election, automatic sentiment analysis was actually being used to complement traditional methods (surveys, focus groups)
• Past: "sentiment analysis" research started in 2002 or so
• Challenge: need statistical models for deeper semantic understanding of subtext, intent, and nuanced messages
Analyzing public opinion, making political forecasts
Summarization
§ Condensing documents
  § Single or multiple docs
  § Extractive or synthetic
  § Aggregative or representative
§ Very context-dependent!
§ An example of analysis with generation (a crude extractive sketch follows below)
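A deliberately crude sketch of the extractive strategy, assuming plain English text with sentence-final punctuation: score each sentence by the overall frequency of its words and keep the top k.

```python
import re
from collections import Counter

def extractive_summary(text, k=2):
    """Keep the k sentences whose words are most frequent overall --
    the simplest extractive (as opposed to synthetic) summarizer."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    score = lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
    top = set(sorted(sents, key=score, reverse=True)[:k])
    return " ".join(s for s in sents if s in top)  # preserve original order
```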
Start-up Summly → Yahoo!
CEO Marissa Mayer announced an update to the app in a blog post, saying, "The new Yahoo! mobile app is also smarter, using Summly's natural-language algorithms and machine learning to deliver quick story summaries. We acquired Summly less than a month ago, and we're thrilled to introduce this game-changing technology in our first mobile application."
Launched 2011; acquired 2013 for $30M.
Can a robot write news?
Despite an expected dip in profit, analysts are generally optimistic about Steelcase as it prepares to report its third-quarter earnings on Monday, December 22, 2014. The consensus earnings per share estimate is 26 cents per share. The consensus estimate remains unchanged over the past month, but it has decreased from three months ago when it was 27 cents. Analysts are expecting earnings of 85 cents per share for the fiscal year. Revenue is projected to be 5% above the year-earlier total of $784.8 million at $826.1 million for the quarter. For the year, revenue is projected to come in at $3.11 billion. The company has seen revenue grow for three quarters straight. The less than a percent revenue increase brought the figure up to $786.7 million in the most recent quarter. Looking back further, revenue increased 8% in the first quarter from the year earlier and 8% in the fourth quarter. The majority of analysts (100%) rate Steelcase as a buy. This compares favorably to the analyst ratings of three similar companies, which average 57% buys. Both analysts rate Steelcase as a buy. Steelcase is a designer, marketer and manufacturer of office furniture. Other companies in the furniture and fixtures industry with upcoming earnings release dates include: HNI and Knoll.
Some of the formulaic news articles are now written by computers.
• Definitely far from "perfect"
• Can we make the generation engine statistically learned rather than engineered?
Writer-bots for earthquake & financial reports
Language and Vision
"Imagine, for example, a computer that could look at an arbitrary scene (anything from a sunset over a fishing village to Grand Central Station at rush hour) and produce a verbal description. This is a problem of overwhelming difficulty, relying as it does on finding solutions to both vision and language and then integrating them. I suspect that scene analysis will be one of the last cognitive tasks to be performed well by computers."
-- David Stork (HAL's Legacy, 2001) on A. Rosenfeld's vision
(From http://cs.stanford.edu/people/karpathy/deepimagesent/)
Multimodal Recurrent Neural Network
Our Multimodal Recurrent Neural Architecture generates sentence descriptions from images.
Below are a few examples of generated sentences:
"man in black shirt is playing guitar."
"construction worker in orange safety vest is working on road."
"two young girls are playing with lego toy."
"baseball player is throwing ball in game."
"woman is holding bunch of bananas."
What begins to work
The flower was so vivid and attractive.
Scenes around the lake on my bike ride.
Blue flowers have no scent. Small white flowers have no idea what they are.
Spring in a white dress.
This horse walking along the road as we drove by.
What begins to work (e.g., Kuznetsova et al. 2014)
We sometimes do well: 1 out of 4 times, machine captions were preferred over the original Flickr captions.
Task: Tokenization/segmentation
We need to split text into words and sentences.
- Languages like Chinese don't have spaces between words.
- Even in English, this cannot be done deterministically: "There was an earthquake near D.C. You could even feel it in Philadelphia, New York, etc."
NLP task: What is the most likely segmentation/tokenization?
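A short Python illustration of why deterministic rules fail; the two regexes here are the naive rules themselves, not a proposed solution.

```python
import re

text = ("There was an earthquake near D.C. You could even feel it "
        "in Philadelphia, New York, etc.")

# Naive rule 1: a sentence ends at ".", "!", or "?" followed by a space.
print(re.split(r"(?<=[.!?])\s+", text))
# Splits after "D.C." -- correct here, but the same rule would also split
# "the U.S. economy" in two; no fixed rule gets both cases right.

# Naive rule 2: a token is a run of word characters.
print(re.findall(r"\w+|[^\w\s]", "near D.C., you know"))
# ['near', 'D', '.', 'C', '.', ',', 'you', 'know'] -- "D.C." is torn apart.
```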
Task: Part-of-speech-tagging
Open the pod door, Hal.
Verb  Det  Noun  Noun  ,  Name  .
Open  the  pod   door  ,  Hal   .
open: verb, adjective, or noun?
Verb: open the door
Adjective: the open door
Noun: in the open
What are the most likely tags T for the sentence S?
We need to define a statistical model of P(T|S).
We need to estimate the parameters of that model, e.g. P(t_i = V | t_{i-1} = N) = 0.3.
How do we decide?
argmax_T P(T|S) = argmax_T P(T) · P(S|T)
P(T) =def ∏_i P(t_i | t_{i-1})
P(S|T) =def ∏_i P(w_i | t_i)
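Given these two component models, the argmax over tag sequences can be computed exactly with the Viterbi algorithm. Below is a minimal sketch with toy, hand-set probabilities; in practice the parameters are estimated from a tagged corpus.

```python
def viterbi(words, tags, transition, emission, start):
    # best[t]: probability of the best tag sequence so far that ends in tag t
    best = {t: start.get(t, 0.0) * emission[t].get(words[0], 0.0) for t in tags}
    back = []                          # back-pointers, one dict per word
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            p, s_best = max((prev[s] * transition[s].get(t, 0.0), s) for s in tags)
            best[t] = p * emission[t].get(w, 0.0)
            ptr[t] = s_best
        back.append(ptr)
    t = max(best, key=best.get)        # best final tag, then follow pointers back
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return seq[::-1]

tags = ["Verb", "Det", "Noun"]
start = {"Verb": 0.4, "Det": 0.3, "Noun": 0.3}                # P(t_1)
transition = {"Verb": {"Det": 0.7, "Noun": 0.3},              # P(t_i | t_{i-1})
              "Det":  {"Noun": 0.9, "Verb": 0.1},
              "Noun": {"Noun": 0.4, "Verb": 0.3, "Det": 0.3}}
emission = {"Verb": {"open": 0.1}, "Det": {"the": 0.6},       # P(w_i | t_i)
            "Noun": {"door": 0.1, "open": 0.02}}

print(viterbi(["open", "the", "door"], tags, transition, emission, start))
# ['Verb', 'Det', 'Noun']
```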
Corpora
§ A corpus is a collection of text
  § Often annotated in some way
  § Sometimes just lots of text
§ Examples
  § Newswire collections: 500M+ words
  § Brown corpus: 1M words of tagged text
  § Penn Treebank: 1M words of parsed WSJ
  § Canadian Hansards: 10M+ words of aligned French / English sentences
  § The Web: billions of words of who knows what
Problem: Ambiguity
§ Ambiguity is a core problem in NLP
§ Statistical models are the main tool to deal with ambiguity
Problem: Ambiguities
§ Headlines:
  § Ban on Nude Dancing on Governor's Desk
  § Teacher Strikes Idle Kids
  § Hospitals Are Sued by 7 Foot Doctors
  § Iraqi Head Seeks Arms
  § Stolen Painting Found by Tree
  § Local HS Dropouts Cut in Half
§ Why are these funny?
Task: Syntactic parsing
Verb  Det  Noun  Noun  ,  Name  .
Open  the  pod   door  ,  Hal   .
[Tree: "the pod door" forms an NP, "Open the pod door" a VP, and the whole utterance an S.]
Observation: Structure corresponds to meaning
Correct analyses:
  eat [sushi with tuna]: the PP attaches to the NP
    (VP (V eat) (NP (NP sushi) (PP (P with) (NP tuna))))
  [eat sushi] [with chopsticks]: the PP attaches to the VP
    (VP (VP (V eat) (NP sushi)) (PP (P with) (NP chopsticks)))
Incorrect analyses: the reverse attachments
    (VP (VP (V eat) (NP sushi)) (PP (P with) (NP tuna)))
    (VP (V eat) (NP (NP sushi) (PP (P with) (NP chopsticks))))
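Assuming the nltk package is available, the two correct analyses can be written as bracketed trees and drawn directly:

```python
from nltk import Tree  # assumes: pip install nltk

# VP attachment: the chopsticks are the instrument of eating.
vp_attach = Tree.fromstring(
    "(VP (VP (V eat) (NP sushi)) (PP (P with) (NP chopsticks)))")

# NP attachment: the tuna is part of the sushi.
np_attach = Tree.fromstring(
    "(VP (V eat) (NP (NP sushi) (PP (P with) (NP tuna))))")

vp_attach.pretty_print()
np_attach.pretty_print()
```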
Syntactic Analysis
§ SOTA: ~90% accurate for many languages when given many training examples, some progress in analyzing languages given few or no examples
Hurricane Emily howled toward Mexico 's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun ,
where frightened tourists squeezed into musty shelters .
The Penn Treebank: Data for Parsing Experiments
§ Penn WSJ Treebank = 50,000 sentences with associated trees
§ Usual set-up: 40,000 training sentences, 2,400 test sentences
An example tree:
Canadian/NNP Utilities/NNPS had/VBD 1988/CD revenue/NN of/IN C$/$ 1.16/CD billion/CD ,/PUNC, mainly/RB from/IN its/PRP$ natural/JJ gas/NN and/CC electric/JJ utility/NN businesses/NNS in/IN Alberta/NNP ,/PUNC, where/WRB the/DT company/NN serves/VBZ about/RB 800,000/CD customers/NNS ./PUNC.
[Full tree: these leaves are grouped into NP, QP, PP, ADVP, WHADVP, SBAR, VP, and S constituents under a TOP node.]
Canadian Utilities had 1988 revenue of C$ 1.16 billion ,
mainly from its natural gas and electric utility businesses in
Alberta , where the company serves about 800,000
customers .
Penn Treebank Non-terminals
Table 1.2. The Penn Treebank syntactic tagset
ADJP    Adjective phrase
ADVP    Adverb phrase
NP      Noun phrase
PP      Prepositional phrase
S       Simple declarative clause
SBAR    Subordinate clause
SBARQ   Direct question introduced by wh-element
SINV    Declarative sentence with subject-aux inversion
SQ      Yes/no questions and subconstituent of SBARQ excluding wh-element
VP      Verb phrase
WHADVP  Wh-adverb phrase
WHNP    Wh-noun phrase
WHPP    Wh-prepositional phrase
X       Constituent of unknown or uncertain category
*       "Understood" subject of infinitive or imperative
0       Zero variant of that in subordinate clauses
T       Trace of wh-Constituent
Predicate-argument structure. The new style of annotation provided three types of information not included in the first phase.
1. A clear, concise distinction between verb arguments and adjuncts where such distinctions are clear, with an easy-to-use notational device to indicate where such a distinction is somewhat murky.
2. A non-context-free annotational mechanism to allow the structure of discontinuous constituents to be easily recovered.
3. A set of null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions, co-indexed with the appropriate lexical material.
The goal of a well-developed predicate-argument scheme is to label each argument of the predicate with an appropriate semantic label to identify its role with respect to that predicate (subject, object, etc.), as well as distinguishing the arguments of the predicate and adjuncts of the predication. Unfortunately, while it is easy to distinguish arguments and adjuncts in simple cases, it turns out to be very difficult to consistently distinguish these two categories for many verbs in actual contexts. It also turns out to be very difficult to determine a set of underlying semantic roles that holds up in the face of a few …
Problem: Sparsity
§ However: sparsity is always a problem
§ New unigrams (words) and bigrams (word pairs) keep appearing, however much text we have seen
[Plot: Fraction Seen vs. Number of Words (0 to 1,000,000), with one curve for Unigrams and one for Bigrams; unigram coverage approaches 1 much faster than bigram coverage.]
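A sketch of how such curves can be computed over any token stream; `tokens` is assumed to be a long list of words from a corpus.

```python
def coverage(tokens, step=10_000):
    """At every `step` tokens, record what fraction of unigram and bigram
    occurrences so far had already been seen earlier in the stream."""
    seen_uni, seen_bi = set(), set()
    hits_uni = hits_bi = 0
    prev, curve = None, []
    for i, w in enumerate(tokens, 1):
        hits_uni += w in seen_uni
        seen_uni.add(w)
        if prev is not None:
            hits_bi += (prev, w) in seen_bi
            seen_bi.add((prev, w))
        prev = w
        if i % step == 0:
            curve.append((i, hits_uni / i, hits_bi / (i - 1)))
    return curve  # [(n_words, unigram_fraction, bigram_fraction), ...]
```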
More than a decade ago, Carl Lewis stood on the threshold of what was to become the greatest athletics career in history. He had just broken two of the legendary Jesse Owens' college records, but never believed he would become a corporate icon, the focus of hundreds of millions of dollars in advertising. His sport was still nominally amateur. Eighteen Olympic and World Championship gold medals and 21 world records later, Lewis has become the richest man in the history of track and field -- a multi-millionaire.
Who is Carl Lewis?
Did Carl Lewis break any world records? (And how do you know that?)
Understanding texts
Coreference Resolution
§ Coreference resolution: determine when two mentions refer to the same individual
[Michael Eisner] and [Donald Tsang] announced the grand opening of [Hong Kong Disneyland] yesterday. [Eisner] thanked [Mr. Tsang] and welcomed [fans] to [the park].
• Requires world knowledge for better coreference
Hajishirzi et al., EMNLP’13
Goal: predict what the (primarily) noun phrases in the text refer to
• John_i hid Bill_j's car keys. He_{i/j} was drunk.
Many different cues can be used to disambiguate:
• Mary_i hid Bill_j's car keys. She_{i/j} was drunk.
Many other factors play a role:
– syntactic structure, discourse relations, world knowledge.
The Problem: Find and Cluster Mentions
[Victoria Chen]_1, [Chief Financial Officer of [Megabucks Banking Corp]_2 since 2004]_3, saw [[her]_4 pay]_5 jump 20%, to $1.3 million, as [the 37-year-old]_6 also became the [[Denver-based financial services company]_7's president]_8. It has been ten years since she came to [Megabucks]_9 from rival [Lotsabucks]_10.
Mention Clustering
Co-reference chains:
1. {Victoria Chen, Chief Financial Officer...since 2004, her, the 37-year-old, the Denver-based financial services company's president}
2. {Megabucks Banking Corp, Denver-based financial services company, Megabucks}
3. {her pay}
4. {rival Lotsabucks}
Data Sets (Lee et al., "Deterministic coreference resolution based on entity-centric, precision-ranked rules")

Corpora              | # Documents | # Sentences | # Words | # Entities | # Mentions
OntoNotes-Dev        |         303 |       6,894 |    136K |      3,752 |     14,291
OntoNotes-Test       |         322 |       8,262 |    142K |      3,926 |     16,291
ACE2004-Culotta-Test |         107 |       1,993 |     33K |      2,576 |      5,455
ACE2004-nwire        |         128 |       3,594 |     74K |      4,762 |     11,398
MUC6-Test            |          30 |         576 |     13K |        496 |      2,136
Table 3. Corpora statistics.
the ACE and MUC corpora using the Stanford parser (Klein and Manning 2003) and the
Stanford named entity recognizer (NER) (Finkel, Grenager, and Manning 2005). We used
the provided parse trees and named entity labels (not gold) in the OntoNotes corpora
to facilitate the comparison with other systems.
4.2 Evaluation Metrics
We use five evaluation metrics widely used in the literature. B³ and CEAF have implementation variations in how they take system mentions into account. We followed the same implementation as used in the CoNLL-2011 shared task.
• MUC (Vilain et al. 1995): link-based metric which measures how many predicted and gold mention clusters need to be merged to cover the gold and predicted clusters respectively.
  R = Σ_i (|G_i| − |p(G_i)|) / Σ_i (|G_i| − 1)   (G_i: a gold mention cluster; p(G_i): partitions of G_i)
  P = Σ_i (|S_i| − |p(S_i)|) / Σ_i (|S_i| − 1)   (S_i: a system mention cluster; p(S_i): partitions of S_i)
  F1 = 2PR / (P + R)
• B³ (Bagga and Baldwin 1998): mention-based metric which measures the proportion of overlap between predicted and gold mention clusters for a given mention. With G_{m_i} the gold cluster of mention m_i and S_{m_i} the system cluster of mention m_i:
  R = Σ_i |G_{m_i} ∩ S_{m_i}| / |G_{m_i}|,  P = Σ_i |G_{m_i} ∩ S_{m_i}| / |S_{m_i}|,  F1 = 2PR / (P + R)
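A small sketch of the MUC computation on toy clusters, with integers standing in for mentions (inputs containing only singleton clusters would zero the denominators, so they are assumed absent):

```python
def n_partitions(cluster, other):
    """|p(C)|: number of pieces C is split into by the clusters in `other`;
    a mention missing from `other` counts as its own piece."""
    pieces = set()
    for m in cluster:
        owner = next((i for i, c in enumerate(other) if m in c), None)
        pieces.add(("singleton", m) if owner is None else owner)
    return len(pieces)

def muc(gold, system):
    R = (sum(len(g) - n_partitions(g, system) for g in gold)
         / sum(len(g) - 1 for g in gold))
    P = (sum(len(s) - n_partitions(s, gold) for s in system)
         / sum(len(s) - 1 for s in system))
    return R, P, 2 * P * R / (P + R)

gold   = [{1, 2, 3}, {4, 5}]   # two gold entity clusters
system = [{1, 2}, {3, 4, 5}]   # the system split one link and merged another
print(muc(gold, system))       # (0.666..., 0.666..., 0.666...)
```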
• Traditionally, systems have used different sets
  – This has made direct comparison surprisingly difficult…
• Differing assumptions about mentions
  – We will assume gold-standard mentions in this lecture
Semantic Ambiguity
At last, a computer that understands you like your mother. [Example from L. Lee]
§ Direct meanings:
  § It understands you like your mother (does) [presumably well]
  § It understands (that) you like your mother
  § It understands you like (it understands) your mother
§ But there are other possibilities, e.g. "mother" could mean:
  § a woman who has given birth to a child
  § a stringy slimy substance consisting of yeast cells and bacteria; it is added to cider or wine to produce vinegar
§ Context matters, e.g. what if the previous sentence was:
  § "Wow, Amazon predicted that you would need to order a big batch of new vinegar brewing ingredients." :)
Dark Ambiguities
§ Dark ambiguities: most structurally permitted analyses are so bad that you can't get your mind to produce them
  [Tree figure: this analysis corresponds to the correct parse of "This will panic buyers!"]
§ Unknown words and new usages
§ Solution: we need mechanisms to focus attention on the best ones; probabilistic techniques do this
Language to Meaning  [from: Semantic parsing tutorial, Yoav Artzi]
(ordered from less to more informative)
§ Information Extraction: recover information about pre-specified relations and entities
  Example task, relation extraction: is_a(OBAMA, PRESIDENT)
§ Broad-coverage Semantics: focus on specific phenomena (e.g., verb-argument matching)
  Example task, summarization: "Obama wins election. Big party in Chicago. Romney a bit down, asks for some tea."
§ Semantic Parsing: recover a complete meaning representation
  Example task, database query: "What states border Texas?" → Oklahoma, New Mexico, Arkansas, Louisiana
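A toy end-to-end illustration of the idea: one invented pattern maps the question to a query over an invented `borders` table. Real semantic parsers induce such mappings from data rather than using hand-written rules.

```python
import re, sqlite3

def parse(question):
    """Map a question to (SQL, arguments) -- a stand-in for a learned parser."""
    m = re.match(r"What states border (\w+( \w+)?)\?", question)
    if m:
        return ("SELECT state1 FROM borders WHERE state2 = ?", (m.group(1),))
    raise ValueError("unparseable")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE borders (state1 TEXT, state2 TEXT)")
db.executemany("INSERT INTO borders VALUES (?, ?)",
               [(s, "Texas") for s in
                ("Oklahoma", "New Mexico", "Arkansas", "Louisiana")])

sql, args = parse("What states border Texas?")
print([row[0] for row in db.execute(sql, args)])
# ['Oklahoma', 'New Mexico', 'Arkansas', 'Louisiana']
```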
Summary: The NLP Pipeline
An NLP system may use some or all of the following steps:
§ Tokenizer/segmenter: identify words and sentences
§ Morphological analyzer/POS-tagger: identify the part of speech and structure of words
§ Word sense disambiguation: identify the meaning of words
§ Syntactic/semantic parser: obtain the structure and meaning of sentences
§ Coreference resolution/discourse model: keep track of the various entities and events mentioned
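For reference, several of these steps in a few lines, using spaCy as one concrete toolkit. This assumes `pip install spacy` plus `python -m spacy download en_core_web_sm`; the small model covers tagging, parsing, and NER, but not word sense disambiguation or coreference.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Open the pod door, Hal. HAL refused, and the door stayed shut.")

for sent in doc.sents:                                       # segmentation
    print([(tok.text, tok.pos_, tok.dep_) for tok in sent])  # POS tags + parse arcs
print([(ent.text, ent.label_) for ent in doc.ents])          # named entities
```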