
Page 1

Global Inference and Learning

Towards Natural Language Understanding

Dan Roth, Department of Computer Science

University of Illinois at Urbana-Champaign

CRI-06 Workshop on Machine Learning in Natural Language Processing


Page 2

Nice to Meet You


Page 3

Learning and Inference

Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcomes.

(Learned) classifiers for different sub-problems

Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints.

Global inference for the best assignment to all variables of interest.


Page 4

Comprehension

1. Christopher Robin was born in England.
2. Winnie the Pooh is a title of a book.
3. Christopher Robin’s dad was a magician.
4. Christopher Robin must be at least 65 now.

A process that maintains and updates a collection of propositions about the state of affairs.

(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.


Page 5

How to Address Comprehension? (cartoon)

It’s an Inference Problem:
- Map into a well-defined language
- Use standard reasoning tools

Huge number of problems:
- Variability of Language: multiple levels of ambiguity; make language precise
- Knowledge Acquisition
- Reasoning Patterns

Canonical mapping? Underspecificity calls for “purposeful”, goal-specific mapping.

Statistics: know a word by its neighbors – Counting, Machine Learning, Clustering, Probabilistic models, Structured Models, Classification.


Page 6

What We Know: Stand-Alone Ambiguity Resolution

- Illinois’ bored of education [board] (spelling/sense)
- “… Nissan car and truck plant …” vs. “… divide life into plant and animal kingdom” (word sense)
- (This Art) (can N) (will MD) (rust V) – vs. the tag sequence V, N, N (part of speech)
- The dog bit the kid. He was taken to [a veterinarian | a hospital] (coreference)

Learn a function f: X → Y that maps observations in a domain to one of several categories.

Broad Coverage


Page 7

Classification is Well Understood

- Theoretically: generalization bounds – how many examples does one need to see in order to guarantee good behavior on previously unobserved examples?
- Algorithmically: good learning algorithms for linear representations. Can deal with very high dimensionality (10^6 features); very efficient in terms of computation and number of examples; on-line.

Key issues remaining:
- Learning protocols: how to minimize interaction (supervision); how to map domain/task information to supervision; semi-supervised learning; active learning; ranking; sequences.
- What are the features? No good theoretical understanding here.
- Programming systems that have multiple classifiers.


Page 8

Comprehension

1. Christopher Robin was born in England.
2. Winnie the Pooh is a title of a book.
3. Christopher Robin’s dad was a magician.
4. Christopher Robin must be at least 65 now.

A process that maintains and updates a collection of propositions about the state of affairs.

(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.


Page 9

This Talk

Integrating Learning and Inference
- Historical Perspective
- Role of Learning

Global Inference over Classifiers

Semantic Parsing

Multiple levels of processing

Textual Entailment

Summary and Future Directions


Page 10

Learning, Knowledge Representation, Reasoning

There has been a lot of work on these three topics in AI

But, mostly, work on Learning and, separately, work on Knowledge Representation and Reasoning.

Very little has been done on an integrative framework.

- Inductive Logic Programming
- Some work from the perspective of probabilistic AI
- Some work from the perspective of Machine Learning (EBL)
- Recent: Valiant’s Robust Logic; Learning to Reason


Page 19

Learning to Reason [’94–’97]: Relevant? How?

A unified framework to study Learning, Knowledge Representation, and Reasoning.

- The goal is to Reason (deduction; abduction – best explanation).
- Reasoning is not done from a static Knowledge Base, but rather with knowledge that is learned via interaction with the world.
- Intermediate representation is important – but only to the extent that it is learnable and that it facilitates reasoning.
- Feedback to learning is given by the reasoning stage.
- There may not be a need (or even a possibility) to learn the intermediate representation exactly, but only to the extent that it supports Reasoning.

[Khardon & Roth, JACM ’97, AAAI ’94; Roth ’95; Roth ’96; Khardon & Roth ’99. Learning to Plan: Khardon ’99]


Page 20

Comprehension

1. Christopher Robin was born in England.
2. Winnie the Pooh is a title of a book.
3. Christopher Robin’s dad was a magician.
4. Christopher Robin must be at least 65 now.

A process that maintains and updates a collection of propositions about the state of affairs.

(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.


Page 21

Today’s Research Issues: What is the Role of Learning?

It’s an Inference Problem.

- Learning serves to support abstraction – a context-sensitive operation that is done at multiple levels: from names (Mr. Robin, Christopher Robin) to relations (wrote, author) and concepts.
- Learning serves to generate the vocabulary over which reasoning is possible (part of speech; subject-of, …).
- Knowledge acquisition; tuning; memory, ….
- Learning in the context of a reasoning system: training with an inference mechanism; feedback is given at the inference level, not at the single-classifier level; labeling using inference.


Page 22

This Talk

Integrating Learning and Inference
- Historical Perspective
- Role of Learning

Global Inference over Classifiers

Semantic Parsing

Multiple levels of processing

Textual Entailment

Summary and Future Directions


Page 23

What I will not talk about: Knowledge Representation

A unified representation that is used as an input to learning processes, and as an output of learning processes ……

Specifically, we use an abstract representation that is centered around a semantic parse (predicate-argument representation), augmented by additional information. It is formalized as a hierarchical concept graph (Description Logic inspired): Feature Description Logic [Cumby & Roth ’00, ’02, ’03].

Example: Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001.
[Figure: concept graph for the sentence, with a meeting node linked to participants (person nodes, with affiliation → organization and nationality → country name(Iraq)), a location (city, name(Prague)), and a time (date, month(April), year(2001)).]


Page 24

Inference and Learning

- Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcomes.
- Learned classifiers for different sub-problems.
- Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints.
- Global inference for the best assignment to all variables of interest.

- How to induce a predicate-argument representation of a sentence.
- How to use inference methods over learned outcomes.
- How to use declarative information over/along with learned information.


Page 25

Semantic Role Labeling

I left my pearls to my daughter in my will.

[I]A0 left [my pearls]A1 [to my daughter]A2 [in my will]AM-LOC .

Roles: A0 – leaver; A1 – things left; A2 – benefactor; AM-LOC – location.

Special case (structured output problem): here, all the data is available at one time; in general, classifiers might be learned from different sources, at different times, in different contexts. This has implications on training paradigms.

Constraints illustrated here: no overlapping arguments; if A2 is present, A1 must also be present.


Page 26

Problem Setting

- Random variables Y: y1, …, y8.
- Conditional distributions P, learned by classifiers.
- Constraints C – any Boolean function defined on partial assignments (possibly with weights W).
- Goal: find the “best” assignment – the assignment that achieves the highest global accuracy.
- This is an Integer Programming problem (see the sketch below).

[Figure: constraint graph over y1–y8, with constraints C(y1, y4) and C(y2, y3, y6, y7, y8).]

Y* = argmax_Y P(Y) subject to constraints C (+ weights W_C)
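One plausible way to read the slide’s objective as a formula – the split into hard and soft constraints is an assumption on my part; the slide only writes C and optional weights W_C:

\[
Y^{*} = \arg\max_{Y} \sum_{i}\log P(y_i)
\;+\; \sum_{C\in\mathcal{C}_{\mathrm{soft}}} W_C\, C(Y)
\qquad \text{s.t. } C(Y)=1 \;\; \forall\, C\in\mathcal{C}_{\mathrm{hard}}
\]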


Page 27

A General Inference Setting

Inference as Optimization [Yih & Roth, CoNLL’04; Punyakanok et al., COLING’04; Punyakanok et al., IJCAI’04]

- Markov Random Field [standard]
- Optimization problems (e.g., Metric Labeling Problems) [Chekuri et al. ’01]
- Linear Programming problems

An Integer Linear Programming (ILP) formulation:
- General: works on non-sequential constraint structure
- Expressive: can represent many types of declarative constraints
- Optimal: finds the optimal solution
- Fast: commercial packages are able to quickly solve very large problems (hundreds of variables and constraints; sparsity is important)


Page 28

Semantic Role Labeling (1/2)

For each verb in a sentence:
1. Identify all constituents that fill a semantic role
2. Determine their roles
- Core arguments, e.g., Agent, Patient, or Instrument
- Their adjuncts, e.g., Locative, Temporal, or Manner

Who did what to whom, when, where, why, …

- I left my pearls to my daughter-in-law in my will. (A0: leaver; A1: thing left; A2: benefactor; AM-LOC)
- The pearls which I left to my daughter-in-law are fake. (A0: leaver; A1: thing left; A2: benefactor; R-A1)
- The pearls, I said, were left to my daughter-in-law. (A0: sayer; A1: utterance; C-A1: utterance)


Page 29

Semantic Role Labeling (2/2)

PropBank [Palmer et al. ’05] provides a large human-annotated corpus of semantic verb-argument relations.

- It adds a layer of generic semantic labels to Penn Treebank II.
- (Almost) all the labels are on the constituents of the parse trees.
- Core arguments: A0–A5 and AA; a different semantics for each verb, specified in the PropBank frame files.
- 13 types of adjuncts, labeled AM-arg, where arg specifies the adjunct type.


Page 30

Algorithmic Approach

- Identify argument candidates: pruning [Xue & Palmer, EMNLP’04]; argument identifier: binary classification (SNoW).
- Classify argument candidates: argument classifier: multi-class classification (SNoW).
- Inference: use the estimated probability distribution given by the argument classifier; use structural and linguistic constraints; infer the optimal global output.

[Figure: the sentence “I left my nice pearls to her”, first bracketed into candidate arguments (identify the vocabulary of candidate arguments), then re-labeled by inference over the old and new vocabulary.]


Page 31

Argument Identification & Classification

- Both the argument identifier and the argument classifier are trained phrase-based classifiers.
- Features (some examples): voice, phrase type, head word, path, chunk, chunk pattern, etc. [Some make use of a full syntactic parse.]
- Learning algorithm – SNoW: a sparse network of linear functions; weights learned by a regularized Winnow multiplicative update rule.
- Probability conversion is done via softmax: p_i = exp(act_i) / Σ_j exp(act_j) (see the sketch below).

[Figure: “I left my nice pearls to her”, bracketed into candidate phrases.]
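A minimal, numerically stable sketch of the softmax conversion above (the activation values are hypothetical, not from the deck):

import numpy as np

def softmax(act):
    # p_i = exp(act_i) / sum_j exp(act_j); subtracting the max
    # leaves the result unchanged but avoids overflow.
    act = np.asarray(act, dtype=float)
    e = np.exp(act - act.max())
    return e / e.sum()

print(softmax([2.0, 1.0, 0.1]))  # -> approx. [0.659 0.242 0.099]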


Page 32

Inference

I left my nice pearls to her

- The output of the argument classifier often violates some constraints, especially when the sentence is long.
- Finding the best legitimate output is formalized as an optimization problem and solved via Integer Linear Programming [Punyakanok et al. ’04; Roth & Yih ’04].
- Input: the probability estimation given by the argument classifier, plus structural and linguistic constraints.
- Allows incorporating expressive (non-sequential) constraints on the variables (the argument types).


Page 33

Integer Linear Programming (ILP)

Maximize a linear objective over Boolean (0–1) variables, subject to linear constraints.
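In standard form (a sketch; the deck’s own notation is instantiated on the next slide):

\[
\max_{x}\; c^{\top}x \qquad \text{s.t. } A x \le b,\;\; x \in \{0,1\}^{n}
\]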


Page 34

Integer Linear Programming Inference

- For each argument a_i, set up a Boolean variable a_{i,t} indicating whether a_i is classified as t.
- Goal: maximize Σ_{i,t} score(a_i = t) · a_{i,t}, subject to the (linear) constraints.
- Any Boolean constraint can be encoded as linear constraint(s).
- If score(a_i = t) = P(a_i = t), the objective is to find the assignment that maximizes the expected number of arguments that are correct and satisfies the constraints. (A code sketch follows.)
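A minimal sketch of this inference step in Python, using the PuLP modeling library (any ILP solver would do; the candidate set, label set, and scores below are hypothetical, not from the deck):

import pulp

candidates = ["a1", "a2", "a3"]
labels = ["A0", "A1", "A2", "NULL"]
score = {  # hypothetical P(a = t) from the argument classifier
    "a1": {"A0": 0.6, "A1": 0.2, "A2": 0.1, "NULL": 0.1},
    "a2": {"A0": 0.5, "A1": 0.3, "A2": 0.1, "NULL": 0.1},
    "a3": {"A0": 0.1, "A1": 0.2, "A2": 0.4, "NULL": 0.3},
}

prob = pulp.LpProblem("srl_inference", pulp.LpMaximize)
x = {(a, t): pulp.LpVariable(f"x_{a}_{t}", cat="Binary")
     for a in candidates for t in labels}

# Objective: expected number of correctly labeled arguments.
prob += pulp.lpSum(score[a][t] * x[a, t] for a in candidates for t in labels)

# Each candidate receives exactly one label.
for a in candidates:
    prob += pulp.lpSum(x[a, t] for t in labels) == 1

# One declarative constraint: at most one A0 in the sentence.
prob += pulp.lpSum(x[a, "A0"] for a in candidates) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({a: next(t for t in labels if x[a, t].value() == 1) for a in candidates})
# -> {'a1': 'A0', 'a2': 'A1', 'a3': 'A2'}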


Page 35

Inference

- Maximize the expected number of correct arguments: T* = argmax_T Σ_i P(a_i = t_i)
- Subject to constraints, structural and linguistic (e.g., R-A1 ⇒ A1)
- Solved with Integer Linear Programming

Worked example over “I left my nice pearls to her” (rows: candidate arguments; columns: labels; entries: classifier probabilities):

0.3 0.2 0.2 0.3
0.6 0.0 0.0 0.4
0.1 0.3 0.5 0.1
0.1 0.2 0.3 0.4

- Independent max: cost = 0.3 + 0.6 + 0.5 + 0.4 = 1.8
- Non-overlapping: cost = 0.3 + 0.4 + 0.5 + 0.4 = 1.6
- Blue ⇒ Red & non-overlapping: cost = 0.3 + 0.4 + 0.3 + 0.4 = 1.4

(A brute-force sketch of this computation follows.)
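A tiny brute-force version of this search. The probability table is the slide’s; the constraint used below (“all labels distinct”) is a hypothetical stand-in, since the slide’s exact non-overlap structure is color-coded and not recoverable from the transcript:

from itertools import product

P = [  # rows: candidates, columns: labels (the slide's table)
    [0.3, 0.2, 0.2, 0.3],
    [0.6, 0.0, 0.0, 0.4],
    [0.1, 0.3, 0.5, 0.1],
    [0.1, 0.2, 0.3, 0.4],
]

def best(constraint):
    # Enumerate every labeling y (one label index per candidate),
    # keep the feasible ones, and return the highest-scoring labeling.
    return max(
        (sum(P[i][t] for i, t in enumerate(y)), y)
        for y in product(range(4), repeat=4)
        if constraint(y)
    )

print(best(lambda y: True))              # unconstrained max: score ~1.8
print(best(lambda y: len(set(y)) == 4))  # all labels distinct: score ~1.6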


Page 36

Constraints (universally quantified rules; any Boolean rule can be encoded as a linear constraint – see the sketch below)

- No duplicate argument classes:
  Σ_{a ∈ POTARG} x_{a = A0} ≤ 1
- R-ARG (if there is an R-ARG phrase, there is an ARG phrase):
  ∀ a2 ∈ POTARG: x_{a2 = R-A0} ≤ Σ_{a ∈ POTARG} x_{a = A0}
- C-ARG (if there is a C-ARG phrase, there is an ARG phrase before it):
  ∀ a2 ∈ POTARG: x_{a2 = C-A0} ≤ Σ_{a ∈ POTARG, a before a2} x_{a = A0}

Many other possible constraints:
- Unique labels
- No overlapping or embedding
- Relations between the number of arguments
- If the verb is of type A, no argument of type B
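Continuing the hypothetical PuLP sketch from the earlier slide, these rules become linear rows (this assumes the label set now also contains "R-A0" and "C-A0", and that candidates is listed in sentence order):

# No duplicate A0:
prob += pulp.lpSum(x[a, "A0"] for a in candidates) <= 1
for j, a2 in enumerate(candidates):
    # An R-A0 anywhere requires an A0 somewhere.
    prob += x[a2, "R-A0"] <= pulp.lpSum(x[a, "A0"] for a in candidates)
    # A C-A0 requires an A0 strictly before it.
    prob += x[a2, "C-A0"] <= pulp.lpSum(x[a, "A0"] for a in candidates[:j])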


Page 37

Semantic Parsing: Summary I

- This approach produces a very good semantic parser: the top-ranked system in the CoNLL’05 shared task.
- The key difference is the inference. Easy and fast: ~1 sentence/second (using Xpress-MP). A lot of room for improvement (additional constraints). Demo available: http://L2R.cs.uiuc.edu/~cogcomp
- Significant also in enabling knowledge acquisition.
- So far, we have shown the use of only declarative (deterministic) constraints. In fact, this approach can be used with both statistical and declarative constraints.


Page 38

ILP as a Unified Algorithmic Scheme

- Consider a common model for sequential inference: HMM/CRF. Inference in this model is done via the Viterbi algorithm.
- Viterbi is a special case of Linear Programming based inference: Viterbi is a shortest-path problem, which is an LP with a canonical constraint matrix that is totally unimodular. Therefore, you get integrality for free.
- One can now incorporate non-sequential/expressive/declarative constraints by modifying this canonical matrix (see the sketch below).
- The extension reduces to a polynomial scheme under some conditions (e.g., when constraints are sequential, when the solution space does not change, etc.).
- It does not necessarily increase complexity, and is very efficient in practice. [Roth & Yih, ICML’05]

[Figure: Viterbi trellis from source s to sink t over inputs x1–x5, with label choices A/B/C at each position and outputs y1–y5.]
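A sketch of the shortest-path LP behind this observation (the notation is mine, not the deck’s): edge variables x_{i,t,t'} pick the transition from label t at position i to label t' at position i+1, the λ_i(t,t') are the learned (log-)scores, and flow conservation makes every integral solution a path through the trellis:

\[
\max_{x \ge 0} \sum_{i,t,t'} \lambda_i(t,t')\, x_{i,t,t'}
\quad \text{s.t.}\quad
\sum_{t} x_{i-1,t,t'} = \sum_{t''} x_{i,t',t''} \;\;\forall\, i,t',
\qquad
\sum_{t,t'} x_{0,t,t'} = 1
\]

Because the constraint matrix is totally unimodular, the LP optimum is integral; added declarative constraints appear as extra linear rows and may break this property, which is when true ILP solving is needed.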


Page 39

Integer Linear Programming Inference – Summary

- An inference method for the “best explanation”, used here to induce a semantic representation of a sentence. A general information integration framework.
- Allows expressive constraints: any Boolean rule can be represented by a set of linear (in)equalities.
- Combines acquired (statistical) constraints with declarative constraints: start with the shortest-path matrix and constraints, then add new constraints to the basic integer linear program.
- Solved using off-the-shelf packages. If the additional constraints don’t change the solution, LP is enough; otherwise, the computational time depends on sparsity, and is fast in practice.
- Demo available: http://L2R.cs.uiuc.edu/~cogcomp


Page 40

This Talk

Integrating Learning and Inference
- Historical Perspective
- Role of Learning

Global Inference over Classifiers

Semantic Parsing

Multiple levels of processing

Textual Entailment

Summary and Future Directions


Page 41

Inference and Learning

- Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcomes.
- So far, this was a single-stage process: learn (acquire a new vocabulary), and run inference over it to guarantee the coherency of the outcome.
- Is that it? Of course this isn’t sufficient: the process of learning and inference needs to be done in phases.

It’s turtles all the way down…


Page 42

Pipeline

Vocabulary is generated in phases; left-to-right processing of sentences is also a pipeline process.

[Figure: pipeline from Raw Data through POS Tagging, Phrases, Parsing, WSD, and Semantic Role Labeling to Semantic Entities and Relations.]

- Pipelining is a crude approximation; interactions occur across levels, and downstream decisions often interact with previous decisions.
- It leads to propagation of errors: occasionally, later-stage problems are easier, but upstream mistakes will not be corrected.
- There are good reasons for pipelining decisions.
- Global inference over the outcomes of the different levels can be used to break away from this paradigm [between pipeline & fully global]. It allows a flexible way to incorporate linguistic and structural constraints.


Page 43

Entities and Relations: Information Integration

J.V. Oswald was murdered at JFK after his assassin, K. F. Johns…

Identify:
J.V. Oswald [person] was murdered at JFK [location] after his assassin, K. F. Johns [person]… → Kill(X, Y)

- Identify named entities.
- Identify relations between entities.
- Exploit mutual dependencies between named entities and relations to yield a coherent global detection. [Roth & Yih, COLING’02; CoNLL’04]
- Some knowledge (classifiers) may be known in advance; some constraints may be available only at decision time.


Page 44

This Talk

Integrating Learning and Inference
- Historical Perspective
- Role of Learning

Global Inference over Classifiers

Semantic Parsing

Multiple levels of processing

Textual Entailment

Summary and Future Directions


Page 45

Comprehension

1. Christopher Robin was born in England.
2. Winnie the Pooh is a title of a book.
3. Christopher Robin’s dad was a magician.
4. Christopher Robin must be at least 65 now.

A process that maintains and updates a collection of propositions about the state of affairs.

(ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in 1925. Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.


Page 46

Textual Entailment

Given Q: “Who acquired Overture?”, determine whether A answers it: “Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year.”

[Figure: “Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year” entails “Yahoo acquired Overture”; hypotheses such as “Overture is a search company”, “Google is a search company”, …, and “Google owns Overture” are shown as subsumed by / related to it.]

By “semantically entailed” we mean: most people would agree that one sentence implies the other. Simply – making plausible inferences.

Building blocks:
- Phrasal verb paraphrasing [Connor & Roth ’06]
- Entity matching [Li et al., AAAI’04, NAACL’04]
- Semantic Role Labeling


Page 47

Discussing Textual Entailment

- Requires an inference process that makes use of a large number of learned (and knowledge-based) operators. A sound approach for determining whether a statement of interest holds in a given sentence. [Braz et al., AAAI’05]
- A pair (sentence, hypothesis) is transformed into a simpler pair, in an entailment-preserving manner.
- A constrained-optimization formulation over a large number of learned operators, aimed at the best (simplest) mapping between predicate-argument representations.
- Inference is purposeful: there is no canonical representation; rather, reasoning is done on sentences, and the transformation depends on the hypothesis.
- What is shown next is a proof. At any stage a large number of operators are entertained; some do not fire, some lead nowhere. This is a path through the optimization process that leads to a justifiable (and explainable) answer.


Page 48

Sample Entailment Pair

S: Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves.

T: Offers by individual European governments involved supplies of crude or refined oil products.

Does T follow from S?


Page 49

OPERATOR 1: Phrasal Verb
Replace phrasal verbs with an equivalent single-word verb (“made up their minds” → “decided”).

S: Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.

T: Offers by individual European governments involved supplies of crude or refined oil products.


Page 50

OPERATOR 2: Coreference Resolution
Replace pronouns/possessive pronouns with the entity to which they refer (“They” → “U.S. and European governments”).

S: Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.

T: Offers by individual European governments involved supplies of crude or refined oil products.


Page 51

OPERATOR 3: Focus of Attention
Remove segments of a sentence that do not appear to be necessary (here, the first sentence of S); this may allow more accurate annotation of the remaining words.

S: U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.

T: Offers by individual European governments involved supplies of crude or refined oil products.


Page 52

OPERATOR 4: Nominalization Promotion
Replace a verb that does not express a useful/meaningful relationship with a nominalization in one of its arguments (“Offers by individual … involved supplies …” → “Individual … offered supplies …”). Requires semantic role labeling (for noun predicates).

S: U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.

T: Individual European governments offered supplies of crude or refined oil products.


Page 53

OPERATOR 4, applied again: Nominalization Promotion
(“… offered supplies of crude …” → “… supplied crude …”).

S: U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves.

T: Individual European governments supplied crude or refined oil products.


Page 54

OPERATOR 5: Predicate Embedding Resolution
Replace a verb compound where the first verb may indicate modality or negation with a single verb, marked with a negation/modality attribute (“decided to release” → “released”).

S: U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves.

T: Individual European governments supplied crude or refined oil products.

‘decided’ (almost) does not change the meaning of the embedded verb. But what if the embedding verb had been ‘refused’? Then “released” would carry a negation attribute, and entailment should not succeed.


Page 55

OPERATOR 6: Predicate Matching
The system matches predicates and their arguments, accounting for monotonicity, modality, negation, and quantifiers. Requires lexical abstraction.

S: U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves.

T: Individual European governments supplied crude or refined oil products.

ENTAILMENT SUCCEEDS


Page 56

Conclusions

- Discussed a general paradigm for learning and inference in the context of natural language understanding tasks.
- Did not discuss: knowledge representation; how to train.
- Key insight – what to learn is driven by global decisions. Luckily, the number of components is much smaller than the number of decisions.
- Emphasis should be on: learning locally and making use globally (via global inference) [Punyakanok et al., IJCAI’05]; the ability to make use of domain knowledge & constraints to drive supervision [Klementiev & Roth, ACL’06].


Page 57

Conclusions (2)

- Discussed a general paradigm for learning and inference in the context of natural language understanding tasks: incorporate classifiers’ information, along with expressive constraints, within an inference framework for the best explanation.
- We can now incorporate many good old ideas.
- Learning allows us to develop the right vocabulary, and supports appropriate abstractions, so that we can study natural language understanding as a problem of reasoning.
- Room for new research on reasoning patterns in NLP.


Page 58

Acknowledgement

Many of my students contributed significantly to this line of work: Vasin Punyakanok, Scott Yih, Mark Sammons, Xin Li, Dav Zimak, Rodrigo de Salvo Braz, Chad Cumby, Yair Even-Zohar, Michael Connor, Kevin Small, Alex Klementiev.

Funding:
- ARDA, under the AQUAINT program
- NSF: ITR IIS-0085836, ITR IIS-0428472, ITR IIS-0085980
- A DOI grant under the Reflex program
- DASH Optimization


Page 59

Questions?

Thank you