Applications in NLP
Hanna Hajishirzi [email protected]
Slides adapted from Dan Klein, Yejin Choi, Luke Zettlemoyer, Julia Hockenmaier
What is NLP?
§ Fundamental goal: deep understanding of broad language
  § Not just string processing or keyword matching
§ End systems that we want to build:
  § Simple: spelling correction, text categorization…
  § Complex: speech recognition, machine translation, information extraction, sentiment analysis, question answering…
  § Unknown: human-level comprehension (is this just NLP?)
Machine Translation
§ Translate text from one language to another
§ Recombines fragments of example translations
§ Challenges:
  § What fragments? [learning to translate]
  § How to make it efficient? [fast translation search]
  § Fluency (second half of this class) vs. fidelity (later)
Jeopardy! World Champion
US Cities: "Its largest airport is named for a World War II hero; its second largest, for a World War II battle."
Information Extraction
§ Unstructured text to database entries
§ SOTA: perhaps 80% accuracy for multi-sentence templates, 90%+ for single easy fields
§ But remember: information is redundant!
New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent.
State | Post                          | Company                  | Person
start | president and CEO             | New York Times Co.       | Lance R. Primis
end   | executive vice president      | New York Times newspaper | Russell T. Lewis
start | president and general manager | New York Times newspaper | Russell T. Lewis
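To make the task concrete, here is a minimal sketch of pattern-based extraction in Python. The single regular expression and the field names are invented for illustration; real extractors learn many such patterns (or full statistical models) from data.

```python
import re

TEXT = ("New York Times Co. named Russell T. Lewis, 45, president and "
        "general manager of its flagship New York Times newspaper.")

# One hand-written appointment pattern: "COMPANY named PERSON, AGE, POST of ..."
pattern = re.compile(
    r"(?P<company>[A-Z][A-Za-z.& ]+?) named "
    r"(?P<person>[A-Z][A-Za-z. ]+?), \d+, "
    r"(?P<post>[a-z][a-z ]+?) of ")

m = pattern.search(TEXT)
if m:
    print({"state": "start", "post": m.group("post"),
           "company": m.group("company"), "person": m.group("person")})
# {'state': 'start', 'post': 'president and general manager',
#  'company': 'New York Times Co.', 'person': 'Russell T. Lewis'}
```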
Question Answering
§ More than search
§ Can be really easy: "What's the capital of Wyoming?"
§ Can be harder: "How many US states' capitals are also their largest cities?"
§ Can be open ended: "What are the main issues in the global warming debate?"
Natural Language Interaction
§ Understand requests and act on them
§ "Make me a reservation for two at Quinn's tonight"
https://www.youtube.com/watch?v=KkOCeAtKHIc
https://www.youtube.com/watch?v=qGU-SqUTees
• Why are these appearing now?
• What are fundamental limitations in current art?
Will this Be Part of All Our Home Devices?
§ Automatic Speech Recognition (ASR)
  § Audio in, text out
  § SOTA: 0.3% error for digit strings, 5% for dictation, 50%+ for TV
§ Text to Speech (TTS)
  § Text in, audio out
  § SOTA: totally intelligible (if sometimes unnatural)
Speech Recognition
“Speech Lab”
Sentiment/opinion analysis
[Screenshot: Amazon.com product reviews. Source: www.amazon.com]
• Today: in the 2012 election, automatic sentiment analysis was actually being used to complement traditional methods (surveys, focus groups)
• Past: "sentiment analysis" research started in 2002 or so
• Challenge: need statistical models for deeper semantic understanding of subtext, intent, and nuanced messages
Analyzing public opinion, making political forecasts
Summarization
§ Condensing documents
  § Single or multiple docs
  § Extractive or synthetic
  § Aggregative or representative
§ Very context-dependent!
§ An example of analysis with generation (a crude extractive sketch follows below)
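A deliberately crude sketch of the extractive strategy, assuming plain English text with sentence-final punctuation: score each sentence by the overall frequency of its words and keep the top k.

```python
import re
from collections import Counter

def extractive_summary(text, k=2):
    """Keep the k sentences whose words are most frequent overall --
    the simplest extractive (as opposed to synthetic) summarizer."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    score = lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
    top = set(sorted(sents, key=score, reverse=True)[:k])
    return " ".join(s for s in sents if s in top)  # preserve original order
```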
Start-up Summly → Yahoo!
CEO Marissa Mayer announced an update to the app in a blog post, saying, "The new Yahoo! mobile app is also smarter, using Summly's natural-language algorithms and machine learning to deliver quick story summaries. We acquired Summly less than a month ago, and we're thrilled to introduce this game-changing technology in our first mobile application."
Launched 2011; acquired 2013 for $30M.
Can a robot write news?
Despite an expected dip in profit, analysts are generally optimistic about Steelcase as it prepares to report its third-quarter earnings on Monday, December 22, 2014. The consensus earnings per share estimate is 26 cents per share. The consensus estimate remains unchanged over the past month, but it has decreased from three months ago when it was 27 cents. Analysts are expecting earnings of 85 cents per share for the fiscal year. Revenue is projected to be 5% above the year-earlier total of $784.8 million at $826.1 million for the quarter. For the year, revenue is projected to come in at $3.11 billion. The company has seen revenue grow for three quarters straight. The less than a percent revenue increase brought the figure up to $786.7 million in the most recent quarter. Looking back further, revenue increased 8% in the first quarter from the year earlier and 8% in the fourth quarter. The majority of analysts (100%) rate Steelcase as a buy. This compares favorably to the analyst ratings of three similar companies, which average 57% buys. Both analysts rate Steelcase as a buy. Steelcase is a designer, marketer and manufacturer of office furniture. Other companies in the furniture and fixtures industry with upcoming earnings release dates include: HNI and Knoll.
Some of the formulaic news articles are now written by computers.
• Definitely far from "perfect"
• Can we make the generation engine statistically learned rather than engineered?
Writer-bots for earthquake & financial reports
Language and Vision
"Imagine, for example, a computer that could look at an arbitrary scene (anything from a sunset over a fishing village to Grand Central Station at rush hour) and produce a verbal description. This is a problem of overwhelming difficulty, relying as it does on finding solutions to both vision and language and then integrating them. I suspect that scene analysis will be one of the last cognitive tasks to be performed well by computers."
-- David Stork (HAL's Legacy, 2001) on A. Rosenfeld's vision
(From http://cs.stanford.edu/people/karpathy/deepimagesent/)
Multimodal Recurrent Neural Network
Our Multimodal Recurrent Neural Architecture generates sentence descriptions from images.
Below are a few examples of generated sentences:
"man in black shirt is playing guitar."
"construction worker in orange safety vest is working on road."
"two young girls are playing with lego toy."
"baseball player is throwing ball in game."
"woman is holding bunch of bananas."
What begins to work
The flower was so vivid and attractive.
Scenes around the lake on my bike ride.
Blue flowers have no scent. Small white flowers have no idea what they are.
Spring in a white dress.
This horse walking along the road as we drove by.
What begins to work (e.g., Kuznetsova et al. 2014)
We sometimes do well: 1 out of 4 times, machine captions were preferred over the original Flickr captions.
Task: Tokenization/segmentation
We need to split text into words and sentences.
- Languages like Chinese don't have spaces between words.
- Even in English, this cannot be done deterministically: "There was an earthquake near D.C. You could even feel it in Philadelphia, New York, etc."
NLP task: What is the most likely segmentation/tokenization?
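A short Python illustration of why deterministic rules fail; the two regexes here are the naive rules themselves, not a proposed solution.

```python
import re

text = ("There was an earthquake near D.C. You could even feel it "
        "in Philadelphia, New York, etc.")

# Naive rule 1: a sentence ends at ".", "!", or "?" followed by a space.
print(re.split(r"(?<=[.!?])\s+", text))
# Splits after "D.C." -- correct here, but the same rule would also split
# "the U.S. economy" in two; no fixed rule gets both cases right.

# Naive rule 2: a token is a run of word characters.
print(re.findall(r"\w+|[^\w\s]", "near D.C., you know"))
# ['near', 'D', '.', 'C', '.', ',', 'you', 'know'] -- "D.C." is torn apart.
```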
Task: Part-of-speech-tagging
Open the pod door, Hal.
Verb  Det  Noun  Noun  ,  Name  .
Open  the  pod   door  ,  Hal   .
open: verb, adjective, or noun?
Verb: open the door
Adjective: the open door
Noun: in the open
What are the most likely tags T for the sentence S?
We need to define a statistical model of P(T|S).
We need to estimate the parameters of that model, e.g. P(t_i = V | t_{i-1} = N) = 0.3.
How do we decide?
argmax_T P(T|S) = argmax_T P(T) · P(S|T)
P(T) =def ∏_i P(t_i | t_{i-1})
P(S|T) =def ∏_i P(w_i | t_i)
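Given these two component models, the argmax over tag sequences can be computed exactly with the Viterbi algorithm. Below is a minimal sketch with toy, hand-set probabilities; in practice the parameters are estimated from a tagged corpus.

```python
def viterbi(words, tags, transition, emission, start):
    # best[t]: probability of the best tag sequence so far that ends in tag t
    best = {t: start.get(t, 0.0) * emission[t].get(words[0], 0.0) for t in tags}
    back = []                          # back-pointers, one dict per word
    for w in words[1:]:
        prev, best, ptr = best, {}, {}
        for t in tags:
            p, s_best = max((prev[s] * transition[s].get(t, 0.0), s) for s in tags)
            best[t] = p * emission[t].get(w, 0.0)
            ptr[t] = s_best
        back.append(ptr)
    t = max(best, key=best.get)        # best final tag, then follow pointers back
    seq = [t]
    for ptr in reversed(back):
        t = ptr[t]
        seq.append(t)
    return seq[::-1]

tags = ["Verb", "Det", "Noun"]
start = {"Verb": 0.4, "Det": 0.3, "Noun": 0.3}                # P(t_1)
transition = {"Verb": {"Det": 0.7, "Noun": 0.3},              # P(t_i | t_{i-1})
              "Det":  {"Noun": 0.9, "Verb": 0.1},
              "Noun": {"Noun": 0.4, "Verb": 0.3, "Det": 0.3}}
emission = {"Verb": {"open": 0.1}, "Det": {"the": 0.6},       # P(w_i | t_i)
            "Noun": {"door": 0.1, "open": 0.02}}

print(viterbi(["open", "the", "door"], tags, transition, emission, start))
# ['Verb', 'Det', 'Noun']
```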
Corpora
§ A corpus is a collection of text
  § Often annotated in some way
  § Sometimes just lots of text
§ Examples
  § Newswire collections: 500M+ words
  § Brown corpus: 1M words of tagged text
  § Penn Treebank: 1M words of parsed WSJ
  § Canadian Hansards: 10M+ words of aligned French / English sentences
  § The Web: billions of words of who knows what
Problem: Ambiguity
§ Ambiguity is a core problem in NLP
§ Statistical models are the main tool to deal with ambiguity
Problem: Ambiguities
§ Headlines:
  § Ban on Nude Dancing on Governor's Desk
  § Teacher Strikes Idle Kids
  § Hospitals Are Sued by 7 Foot Doctors
  § Iraqi Head Seeks Arms
  § Stolen Painting Found by Tree
  § Local HS Dropouts Cut in Half
§ Why are these funny?
Task: Syntactic parsing
Verb  Det  Noun  Noun  ,  Name  .
Open  the  pod   door  ,  Hal   .
[Tree: "the pod door" forms an NP, "Open the pod door" a VP, and the whole utterance an S.]
Observation: Structure corresponds to meaning
Correct analyses:
  eat [sushi with tuna]: the PP attaches to the NP
    (VP (V eat) (NP (NP sushi) (PP (P with) (NP tuna))))
  [eat sushi] [with chopsticks]: the PP attaches to the VP
    (VP (VP (V eat) (NP sushi)) (PP (P with) (NP chopsticks)))
Incorrect analyses: the reverse attachments
    (VP (VP (V eat) (NP sushi)) (PP (P with) (NP tuna)))
    (VP (V eat) (NP (NP sushi) (PP (P with) (NP chopsticks))))
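Assuming the nltk package is available, the two correct analyses can be written as bracketed trees and drawn directly:

```python
from nltk import Tree  # assumes: pip install nltk

# VP attachment: the chopsticks are the instrument of eating.
vp_attach = Tree.fromstring(
    "(VP (VP (V eat) (NP sushi)) (PP (P with) (NP chopsticks)))")

# NP attachment: the tuna is part of the sushi.
np_attach = Tree.fromstring(
    "(VP (V eat) (NP (NP sushi) (PP (P with) (NP tuna))))")

vp_attach.pretty_print()
np_attach.pretty_print()
```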
Syntactic Analysis
§ SOTA: ~90% accurate for many languages when given many training examples, some progress in analyzing languages given few or no examples
Hurricane Emily howled toward Mexico 's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun ,
where frightened tourists squeezed into musty shelters .
The Penn Treebank: Data for Parsing Experiments
§ Penn WSJ Treebank = 50,000 sentences with associated trees
§ Usual set-up: 40,000 training sentences, 2,400 test sentences
An example tree:
Canadian/NNP Utilities/NNPS had/VBD 1988/CD revenue/NN of/IN C$/$ 1.16/CD billion/CD ,/PUNC, mainly/RB from/IN its/PRP$ natural/JJ gas/NN and/CC electric/JJ utility/NN businesses/NNS in/IN Alberta/NNP ,/PUNC, where/WRB the/DT company/NN serves/VBZ about/RB 800,000/CD customers/NNS ./PUNC.
[Full tree: these leaves are grouped into NP, QP, PP, ADVP, WHADVP, SBAR, VP, and S constituents under a TOP node.]
Canadian Utilities had 1988 revenue of C$ 1.16 billion ,
mainly from its natural gas and electric utility businesses in
Alberta , where the company serves about 800,000
customers .
Penn Treebank Non-terminals
Table 1.2. The Penn Treebank syntactic tagset
ADJP    Adjective phrase
ADVP    Adverb phrase
NP      Noun phrase
PP      Prepositional phrase
S       Simple declarative clause
SBAR    Subordinate clause
SBARQ   Direct question introduced by wh-element
SINV    Declarative sentence with subject-aux inversion
SQ      Yes/no questions and subconstituent of SBARQ excluding wh-element
VP      Verb phrase
WHADVP  Wh-adverb phrase
WHNP    Wh-noun phrase
WHPP    Wh-prepositional phrase
X       Constituent of unknown or uncertain category
*       "Understood" subject of infinitive or imperative
0       Zero variant of that in subordinate clauses
T       Trace of wh-Constituent
Predicate-argument structure. The new style of annotation provided three types of information not included in the first phase.
1. A clear, concise distinction between verb arguments and adjuncts where such distinctions are clear, with an easy-to-use notational device to indicate where such a distinction is somewhat murky.
2. A non-context-free annotational mechanism to allow the structure of discontinuous constituents to be easily recovered.
3. A set of null elements in what can be thought of as "underlying" position for phenomena such as wh-movement, passive, and the subjects of infinitival constructions, co-indexed with the appropriate lexical material.
The goal of a well-developed predicate-argument scheme is to label each argument of the predicate with an appropriate semantic label to identify its role with respect to that predicate (subject, object, etc.), as well as distinguishing the arguments of the predicate and adjuncts of the predication. Unfortunately, while it is easy to distinguish arguments and adjuncts in simple cases, it turns out to be very difficult to consistently distinguish these two categories for many verbs in actual contexts. It also turns out to be very difficult to determine a set of underlying semantic roles that holds up in the face of a few …
Problem: Sparsity
§ However: sparsity is always a problem
§ New unigrams (words) and bigrams (word pairs) keep appearing, however much text we have seen
[Plot: Fraction Seen vs. Number of Words (0 to 1,000,000), with one curve for Unigrams and one for Bigrams; unigram coverage approaches 1 much faster than bigram coverage.]
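A sketch of how such curves can be computed over any token stream; `tokens` is assumed to be a long list of words from a corpus.

```python
def coverage(tokens, step=10_000):
    """At every `step` tokens, record what fraction of unigram and bigram
    occurrences so far had already been seen earlier in the stream."""
    seen_uni, seen_bi = set(), set()
    hits_uni = hits_bi = 0
    prev, curve = None, []
    for i, w in enumerate(tokens, 1):
        hits_uni += w in seen_uni
        seen_uni.add(w)
        if prev is not None:
            hits_bi += (prev, w) in seen_bi
            seen_bi.add((prev, w))
        prev = w
        if i % step == 0:
            curve.append((i, hits_uni / i, hits_bi / (i - 1)))
    return curve  # [(n_words, unigram_fraction, bigram_fraction), ...]
```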
More than a decade ago, Carl Lewis stood on the threshold of what was to become the greatest athletics career in history. He had just broken two of the legendary Jesse Owens' college records, but never believed he would become a corporate icon, the focus of hundreds of millions of dollars in advertising. His sport was still nominally amateur. Eighteen Olympic and World Championship gold medals and 21 world records later, Lewis has become the richest man in the history of track and field -- a multi-millionaire.
Who is Carl Lewis?
Did Carl Lewis break any world records? (And how do you know that?)
Understanding texts
Coreference Resolution
§ Coreference resolution: determine when two mentions refer to the same individual
[Michael Eisner] and [Donald Tsang] announced the grand opening of [Hong Kong Disneyland] yesterday. [Eisner] thanked [Mr. Tsang] and welcomed [fans] to [the park].
• Requires world knowledge for better coreference
Hajishirzi et al., EMNLP’13
Goal: predict what the (primarily) noun phrases in the text refer to
• John_i hid Bill_j's car keys. He_{i/j} was drunk.
Many different cues can be used to disambiguate:
• Mary_i hid Bill_j's car keys. She_{i/j} was drunk.
Many other factors play a role:
– syntactic structure, discourse relations, world knowledge.
The Problem: Find and Cluster Mentions
[Victoria Chen]_1, [Chief Financial Officer of [Megabucks Banking Corp]_2 since 2004]_3, saw [[her]_4 pay]_5 jump 20%, to $1.3 million, as [the 37-year-old]_6 also became the [[Denver-based financial services company]_7's president]_8. It has been ten years since she came to [Megabucks]_9 from rival [Lotsabucks]_10.
Mention Clustering
Co-reference chains:
1. {Victoria Chen, Chief Financial Officer...since 2004, her, the 37-year-old, the Denver-based financial services company's president}
2. {Megabucks Banking Corp, Denver-based financial services company, Megabucks}
3. {her pay}
4. {rival Lotsabucks}
Data Sets (Lee et al., "Deterministic coreference resolution based on entity-centric, precision-ranked rules")

Corpora              | # Documents | # Sentences | # Words | # Entities | # Mentions
OntoNotes-Dev        |         303 |       6,894 |    136K |      3,752 |     14,291
OntoNotes-Test       |         322 |       8,262 |    142K |      3,926 |     16,291
ACE2004-Culotta-Test |         107 |       1,993 |     33K |      2,576 |      5,455
ACE2004-nwire        |         128 |       3,594 |     74K |      4,762 |     11,398
MUC6-Test            |          30 |         576 |     13K |        496 |      2,136
Table 3. Corpora statistics.
the ACE and MUC corpora using the Stanford parser (Klein and Manning 2003) and the
Stanford named entity recognizer (NER) (Finkel, Grenager, and Manning 2005). We used
the provided parse trees and named entity labels (not gold) in the OntoNotes corpora
to facilitate the comparison with other systems.
4.2 Evaluation Metrics
We use five evaluation metrics widely used in the literature. B³ and CEAF have implementation variations in how they take system mentions into account. We followed the same implementation as used in the CoNLL-2011 shared task.
• MUC (Vilain et al. 1995): link-based metric which measures how many predicted and gold mention clusters need to be merged to cover the gold and predicted clusters respectively.
  R = Σ_i (|G_i| − |p(G_i)|) / Σ_i (|G_i| − 1)   (G_i: a gold mention cluster; p(G_i): partitions of G_i)
  P = Σ_i (|S_i| − |p(S_i)|) / Σ_i (|S_i| − 1)   (S_i: a system mention cluster; p(S_i): partitions of S_i)
  F1 = 2PR / (P + R)
• B³ (Bagga and Baldwin 1998): mention-based metric which measures the proportion of overlap between predicted and gold mention clusters for a given mention. With G_{m_i} the gold cluster of mention m_i and S_{m_i} the system cluster of mention m_i:
  R = Σ_i |G_{m_i} ∩ S_{m_i}| / |G_{m_i}|,  P = Σ_i |G_{m_i} ∩ S_{m_i}| / |S_{m_i}|,  F1 = 2PR / (P + R)
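A small sketch of the MUC computation on toy clusters, with integers standing in for mentions (inputs containing only singleton clusters would zero the denominators, so they are assumed absent):

```python
def n_partitions(cluster, other):
    """|p(C)|: number of pieces C is split into by the clusters in `other`;
    a mention missing from `other` counts as its own piece."""
    pieces = set()
    for m in cluster:
        owner = next((i for i, c in enumerate(other) if m in c), None)
        pieces.add(("singleton", m) if owner is None else owner)
    return len(pieces)

def muc(gold, system):
    R = (sum(len(g) - n_partitions(g, system) for g in gold)
         / sum(len(g) - 1 for g in gold))
    P = (sum(len(s) - n_partitions(s, gold) for s in system)
         / sum(len(s) - 1 for s in system))
    return R, P, 2 * P * R / (P + R)

gold   = [{1, 2, 3}, {4, 5}]   # two gold entity clusters
system = [{1, 2}, {3, 4, 5}]   # the system split one link and merged another
print(muc(gold, system))       # (0.666..., 0.666..., 0.666...)
```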
• Traditionally, systems have used different sets
  – This has made direct comparison surprisingly difficult…
• Differing assumptions about mentions
  – We will assume gold-standard mentions in this lecture
Semantic Ambiguity
At last, a computer that understands you like your mother. [Example from L. Lee]
§ Direct meanings:
  § It understands you like your mother (does) [presumably well]
  § It understands (that) you like your mother
  § It understands you like (it understands) your mother
§ But there are other possibilities, e.g. "mother" could mean:
  § a woman who has given birth to a child
  § a stringy slimy substance consisting of yeast cells and bacteria; it is added to cider or wine to produce vinegar
§ Context matters, e.g. what if the previous sentence was:
  § "Wow, Amazon predicted that you would need to order a big batch of new vinegar brewing ingredients." :)
Dark Ambiguities
§ Dark ambiguities: most structurally permitted analyses are so bad that you can't get your mind to produce them
  [Tree figure: this analysis corresponds to the correct parse of "This will panic buyers!"]
§ Unknown words and new usages
§ Solution: we need mechanisms to focus attention on the best ones; probabilistic techniques do this
Language to Meaning  [from: Semantic parsing tutorial, Yoav Artzi]
(ordered from less to more informative)
§ Information Extraction: recover information about pre-specified relations and entities
  Example task, relation extraction: is_a(OBAMA, PRESIDENT)
§ Broad-coverage Semantics: focus on specific phenomena (e.g., verb-argument matching)
  Example task, summarization: "Obama wins election. Big party in Chicago. Romney a bit down, asks for some tea."
§ Semantic Parsing: recover a complete meaning representation
  Example task, database query: "What states border Texas?" → Oklahoma, New Mexico, Arkansas, Louisiana
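A toy end-to-end illustration of the idea: one invented pattern maps the question to a query over an invented `borders` table. Real semantic parsers induce such mappings from data rather than using hand-written rules.

```python
import re, sqlite3

def parse(question):
    """Map a question to (SQL, arguments) -- a stand-in for a learned parser."""
    m = re.match(r"What states border (\w+( \w+)?)\?", question)
    if m:
        return ("SELECT state1 FROM borders WHERE state2 = ?", (m.group(1),))
    raise ValueError("unparseable")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE borders (state1 TEXT, state2 TEXT)")
db.executemany("INSERT INTO borders VALUES (?, ?)",
               [(s, "Texas") for s in
                ("Oklahoma", "New Mexico", "Arkansas", "Louisiana")])

sql, args = parse("What states border Texas?")
print([row[0] for row in db.execute(sql, args)])
# ['Oklahoma', 'New Mexico', 'Arkansas', 'Louisiana']
```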
Summary: The NLP Pipeline
An NLP system may use some or all of the following steps:
§ Tokenizer/segmenter: identify words and sentences
§ Morphological analyzer/POS-tagger: identify the part of speech and structure of words
§ Word sense disambiguation: identify the meaning of words
§ Syntactic/semantic parser: obtain the structure and meaning of sentences
§ Coreference resolution/discourse model: keep track of the various entities and events mentioned
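For reference, several of these steps in a few lines, using spaCy as one concrete toolkit. This assumes `pip install spacy` plus `python -m spacy download en_core_web_sm`; the small model covers tagging, parsing, and NER, but not word sense disambiguation or coreference.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Open the pod door, Hal. HAL refused, and the door stayed shut.")

for sent in doc.sents:                                       # segmentation
    print([(tok.text, tok.pos_, tok.dep_) for tok in sent])  # POS tags + parse arcs
print([(ent.text, ent.label_) for ent in doc.ents])          # named entities
```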