Page 1: Semantic Inference for  Question Answering

Semantic Inference for Question Answering

Sanda Harabagiu
Department of Computer Science, University of Texas at Dallas

and

Srini Narayanan
International Computer Science Institute, Berkeley, CA

Page 2: Semantic Inference for  Question Answering

Outline

Part I. Introduction:
• The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations

Part II. Extracting Semantic Relations from Questions and Texts
• Knowledge-intensive techniques
• Supervised and unsupervised techniques

Page 3: Semantic Inference for  Question Answering

Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types

Page 4: Semantic Inference for  Question Answering

Outline

Part IV. From Ontologies to Inference
• From OWL to CPRM
• FrameNet in OWL
• FrameNet to CPRM mapping

Part V. Results of Event Structure Inference for QA
• AnswerBank examples
• Current results for Inference Type
• Current results for Answer Structure

Page 5: Semantic Inference for  Question Answering

The need for Semantic Inference in QA

Some questions are complex! Example:

How can a biological weapons program be detected?

Answer: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory. He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago. A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries. The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel. US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors.

Page 6: Semantic Inference for  Question Answering

Complex questions

Example: How can a biological weapons program be detected?

This question is complex because:
• It is a manner question. All other manner questions that were evaluated in TREC asked about 3 things:
  • Manners to die, e.g. "How did Cleopatra die?", "How did Einstein die?"
  • Manners to get a new name, e.g. "How did Cincinnati get its name?"
  • Manners to say something in another language, e.g. "How do you say house in Spanish?"
• The answer does not contain any explicit manner-of-detection information; instead it talks about reports that give indications that Iraq may be trying to develop a new viral agent and assessments by the United Nations suggesting that Iraq still has chemical and biological weapons.

Page 7: Semantic Inference for  Question Answering

Complex questions and semantic information

Complex questions are not characterized only by a question class (e.g. manner questions).

Example: How can a biological weapons program be detected?
• Associated with the pattern "How can X be detected?"
• And the topic X = "biological weapons program"

Processing complex questions is also based on access to the semantics of the question topic.
• The topic is modeled by a set of discriminating relations, e.g. Develop(program); Produce(biological weapons); Acquire(biological weapons) or stockpile(biological weapons)
• Such relations are extracted from topic-relevant texts.

Page 8: Semantic Inference for  Question Answering

Alternative semantic representations

Using PropBank to access a 1 million word corpus annotated with predicate-argument structures (www.cis.upenn.edu/~ace).

We can train a generative model for recognizing the arguments of each predicate in questions and in the candidate answers.

Example: How can a biological weapons program be detected?

Predicate: detect
  Argument 0 = detector: Answer(1)
  Argument 1 = detected: biological weapons
  Argument 2 = instrument: Answer(2)

Expected Answer Type
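
As a rough illustration of the idea (not the authors' actual model), the sketch below estimates P(role | predicate, phrase type, position) from a handful of invented PropBank-style training tuples; the training list and the smoothing constant are assumptions made purely for the example.

# Minimal count-based role labeller: P(role | predicate, phrase type, position)
from collections import defaultdict

TRAIN = [  # (predicate, phrase_type, appears_before_predicate, role) -- invented
    ("detect", "NP", True,  "ARG0"),
    ("detect", "NP", False, "ARG1"),
    ("detect", "PP", False, "ARG2"),
    ("steal",  "NP", True,  "ARG1"),
    ("steal",  "PP", False, "ARG2"),
]
ROLES = sorted({r for *_, r in TRAIN})

counts, totals = defaultdict(int), defaultdict(int)
for pred, pt, before, role in TRAIN:
    counts[(pred, pt, before, role)] += 1
    totals[(pred, pt, before)] += 1

def role_distribution(pred, pt, before, alpha=0.5):
    """P(role | predicate, phrase type, position) with add-alpha smoothing."""
    z = totals[(pred, pt, before)] + alpha * len(ROLES)
    return {r: (counts[(pred, pt, before, r)] + alpha) / z for r in ROLES}

# Which role does an NP that precedes "detect" most likely fill?
print(max(role_distribution("detect", "NP", True).items(), key=lambda kv: kv[1]))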

Page 9: Semantic Inference for  Question Answering

More predicate-argument structures for questions

Example: From which country did North Korea import its missile launch pad metals?

Predicate: import
  Argument 0 (role = importer): North Korea
  Argument 1 (role = commodity): missile launch pad metals
  Argument 2 (role = exporter): ANSWER

Example: What stimulated India's missile programs?

Predicate: stimulate
  Argument 0 (role = agent): ANSWER (part 1)
  Argument 1 (role = thing increasing): India's missile programs
  Argument 2 (role = instrument): ANSWER (part 2)

Page 10: Semantic Inference for  Question Answering

Additional semantic resources

Using FrameNet:
• frame-semantic descriptions of several thousand English lexical items with semantically annotated attestations (www.icsi.berkeley.edu/~framenet)

Example: What stimulated India's missile programs?

Frame: STIMULATE
  Frame Element CIRCUMSTANCES: ANSWER (part 1)
  Frame Element EXPERIENCER: India's missile programs
  Frame Element STIMULUS: ANSWER (part 2)

Frame: SUBJECT STIMULUS
  Frame Element CIRCUMSTANCES: ANSWER (part 3)
  Frame Element COMPARISON SET: ANSWER (part 4)
  Frame Element EXPERIENCER: India's missile programs
  Frame Element PARAMETER: nuclear/biological proliferation

Page 11: Semantic Inference for  Question Answering

Semantic inference for Q/A

The problem of classifying questions
• E.g. "manner questions": "How did Hitler die?"

The problem of recognizing answer types/structures
• Should "manner of death" be considered an answer type?
• What other manners of event/action should be considered as answer types?

The problem of extracting/justifying/generating answers to complex questions
• Should we learn to extract "manner" relations?
• What other types of relations should we consider?
• Is relation recognition sufficient for answering complex questions? Is it necessary?

Page 12: Semantic Inference for  Question Answering

Manner-of-death

In previous TREC evaluations 31 questions asked about manner of death: "How did Adolf Hitler die?"

State-of-the-art solution (LCC):
• We considered "Manner-of-Death" as an answer type, pointing to a variety of verbs and nominalizations encoded in WordNet.
• We developed text mining techniques for identifying such information based on lexico-semantic patterns from WordNet.

Example:
• [kill #sense1 (verb) – CAUSE die #sense1 (verb)]
• The troponyms of the [kill #sense1 (verb)] concept are candidates for the MANNER-OF-DEATH hierarchy, e.g. drown, poison, strangle, assassinate, shoot.
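
A minimal sketch of this lookup, assuming NLTK with the WordNet corpus installed: WordNet encodes verb troponyms as hyponyms, so the troponyms of kill#1 can be listed directly, together with the CAUSE link to die#1.

# WordNet: CAUSE relation and troponyms of kill#1 as MANNER-OF-DEATH candidates
from nltk.corpus import wordnet as wn

kill = wn.synset('kill.v.01')
print(kill.causes())                  # [Synset('die.v.01')] -- kill CAUSES die
# Verb troponyms are stored as hyponyms of the verb synset.
troponyms = {lemma.name() for s in kill.hyponyms() for lemma in s.lemmas()}
print(sorted(troponyms)[:10])         # e.g. assassinate, drown, poison, shoot, ...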

Page 13: Semantic Inference for  Question Answering

Practical Hurdle

Not all MANNER-OF-DEATH concepts are lexicalized as a verb, so we set out to determine additional patterns that capture such cases.
Goal: (1) a set of patterns and (2) dictionaries corresponding to such patterns, using a well-known IE technique (Riloff & Jones, IJCAI '99).

Results: 100 patterns were discovered, for example:
• X DIE in ACCIDENT (seed: train accident; also extracted: be killed (ACCIDENT), car wreck)
• X DIE {from|of} DISEASE (seed: cancer; also extracted: be killed (DISEASE), AIDS)
• X DIE after suffering MEDICAL CONDITION (seed: stroke; also extracted: suffering of (ACCIDENT), complications caused by diabetes)
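
As an illustration of how such patterns populate a dictionary, the sketch below applies one simplified pattern, "X died {of|from} Y", to a few invented sentences; a real system would use the learned pattern set over a large corpus.

# Harvesting candidate dictionary entries with one lexico-syntactic pattern
import re

TEXTS = [
    "The actor died of cancer.",
    "She died from complications caused by diabetes.",
    "He was killed in a car wreck outside the city.",   # not matched by this pattern
]
PATTERN = re.compile(r"\bdied\s+(?:of|from)\s+([^,.]+)", re.I)

harvested = [m.group(1).strip() for t in TEXTS for m in PATTERN.finditer(t)]
print(harvested)   # ['cancer', 'complications caused by diabetes']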

Page 14: Semantic Inference for  Question Answering

Outline

Part I. Introduction:
• The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations

Page 15: Semantic Inference for  Question Answering

Answer types in State-of-the-art QA systems

• Labels questions with an answer type based on a taxonomy
• Classifies questions (e.g. by using a maximum entropy model)

[Pipeline diagram: Question → Answer Type Prediction (backed by an Answer Type Hierarchy; outputs features and the answer type) → Question Expansion → IR over the document collection (Docs), returning a ranked set of passages → Answer Selection → Answer]

Page 16: Semantic Inference for  Question Answering

In Question Answering two heads are better than one

The idea originated in IBM's PIQUANT project.
• Traditional Q/A systems employ a pipeline approach: question analysis, document/passage retrieval, answer selection.
• Questions are classified based on the expected answer type.
• Answers are also selected based on the expected answer type, regardless of the question class.

Motivated by the success of ensemble methods in machine learning, PIQUANT uses multiple classifiers to produce the final output for an ensemble made of multiple QA agents: a multi-strategy, multi-source approach.

Page 17: Semantic Inference for  Question Answering

Multiple sources, multiple agents

[PIQUANT architecture diagram. A QUESTION goes through Question Analysis (producing a Q-Frame, the Answer Type and QGoals) and Answer Classification; a QPlan Generator and QPlan Executor dispatch the question to multiple answering agents (Predictive Annotation, Statistical, Definitional Question, KSP-Based and Pattern-Based answering agents); the agents consult a Knowledge Source Portal (WordNet, Cyc, the Web, semantic search, keyword search) and the AQUAINT, TREC and CNS collections; Answer Resolution combines the candidates into the final ANSWER.]

Page 18: Semantic Inference for  Question Answering

Multiple Strategies

In PIQUANT, the answer resolution strategies consider that different combinations of question processing, passage retrieval and answer selection from different agents are ideal. This entails that all questions are processed depending on the question class, not the question type.
• There are multiple question classes, e.g. "What" questions asking about people, "What" questions asking about products, etc.
• There are only three types of questions that have been evaluated in systematic ways so far: factoid questions, definition questions and list questions.

Another option is to build an architecture in which question types are processed differently, and the semantic representations and inference mechanisms are adapted for each question type.

Page 19: Semantic Inference for  Question Answering

The Architecture of LCC’s QA System

[Architecture diagram. Question Processing for factoid and list questions: question parse, semantic transformation, recognition of the expected answer type (using an Answer Type Hierarchy based on WordNet and named entity recognition with CICERO LITE), and keyword extraction. Question Processing for definition questions: question parse, pattern matching and keyword extraction. Document Processing builds a document index over the AQUAINT document collection; Passage Retrieval returns single factoid passages, multiple list passages or multiple definition passages. Factoid Answer Processing: answer extraction, answer justification with a theorem prover over an axiomatic knowledge base, and answer reranking, yielding the factoid answer. List Answer Processing: answer extraction with a threshold cutoff, yielding the list answer. Definition Answer Processing: answer extraction by pattern matching against a pattern repository, yielding the definition answer.]

Page 20: Semantic Inference for  Question Answering

Extracting Answers for Factoid Questions

In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions. The Named Entity Recognizer was responsible for 234 of them:

QUANTITY         55   ORGANIZATION   15   PRICE         3
NUMBER           45   AUTHORED WORK  11   SCIENCE NAME  2
DATE             35   PRODUCT        11   ACRONYM       1
PERSON           31   CONTINENT       5   ADDRESS       1
COUNTRY          21   PROVINCE        5   ALPHABET      1
OTHER LOCATIONS  19   QUOTE           5   URI           1
CITY             19   UNIVERSITY      3

Page 21: Semantic Inference for  Question Answering

Special Case of Names

Questions asking for names of authored works:

1934: What is the play "West Side Story" based on?  Answer: Romeo and Juliet
1976: What is the motto for the Boy Scouts?  Answer: Driving Miss Daisy
1982: What movie won the Academy Award for best picture in 1989?  Answer: Driving Miss Daisy
2080: What peace treaty ended WWI?  Answer: Versailles
2102: What American landmark stands on Liberty Island?  Answer: Statue of Liberty

Page 22: Semantic Inference for  Question Answering

NE-driven QA

The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of Named Entities:
• Precision of recognition
• Coverage of name classes
• Mapping into concept hierarchies
• Participation in semantic relations (e.g. predicate-argument structures or frame semantics)

Page 23: Semantic Inference for  Question Answering

Concept Taxonomies

For 29% of questions the QA system relied on an off-line taxonomy with semantic classes such as: Disease, Drugs, Colors, Insects, Games.

The majority of these semantic classes are also associated with patterns that enable their identification.

Page 24: Semantic Inference for  Question Answering

Definition Questions

They asked about:
• PEOPLE (most of them starting with "Who")
• other types of NAMES
• general concepts

People questions:
• Many use the PERSON name in the format [First name, Last name], e.g. Aaron Copland, Allen Iverson, Albert Ghiorso
• Some names had the PERSON name in the format [First name, Last name1, Last name2], e.g. Antonia Coello Novello
• Other names had the name as a single word, for a very well known person, e.g. Nostradamus, Absalom, Abraham
• Some questions referred to names of kings or princes, e.g. Vlad the Impaler, Akbar the Great

Page 25: Semantic Inference for  Question Answering

Answering definition questions

Most QA systems use between 30 and 60 patterns. The most popular patterns:

Id 25, pattern "person-hyponym QP" (freq. 0.43%)
  Usage: "The doctors also consult with former Italian Olympic skier Alberto Tomba, along with other Italian athletes"
  Question 1907: Who is Alberto Tomba?

Id 9, pattern "QP, the AP" (freq. 0.28%)
  Usage: "Bausch Lomb, the company that sells contact lenses, among hundreds of other optical products, has come up with a new twist on the computer screen magnifier"
  Question 1917: What is Bausch & Lomb?

Id 11, pattern "QP, a AP" (freq. 0.11%)
  Usage: "ETA, a Basque language acronym for Basque Homeland and Freedom _ has killed nearly 800 people since taking up arms in 1968"
  Question 1987: What is ETA in Spain?

Id 13, pattern "QP, an AP" (freq. 0.02%)
  Usage: "The kidnappers claimed they are members of the Abu Sayaf, an extremist Muslim group, but a leader of the group denied that"
  Question 2042: Who is Abu Sayaf?

Id 21, pattern "AP such as QP" (freq. 0.02%)
  Usage: "For the hundreds of Albanian refugees undergoing medical tests and treatments at Fort Dix, the news is mostly good: Most are in reasonable good health, with little evidence of infectious diseases such as TB"
  Question 2095: What is TB?
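
A minimal sketch of applying one of these appositive patterns ("QP, the AP") with a regular expression; the sentence is the usage example from the table, and the regex is a simplification of what a deployed system would use.

# "QP, the AP" definition pattern applied as a simple regex
import re

qp = "Bausch Lomb"
sentence = ("Bausch Lomb, the company that sells contact lenses, among "
            "hundreds of other optical products, has come up with a new "
            "twist on the computer screen magnifier")

m = re.search(re.escape(qp) + r",\s+the\s+([^,]+)", sentence)
if m:
    # Bausch Lomb is the company that sells contact lenses
    print(f"{qp} is the {m.group(1)}")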

Page 26: Semantic Inference for  Question Answering

Complex questions

• Characterized by the need for domain knowledge
• There is no single answer type that can be identified; rather, an answer structure needs to be recognized
• Answer selection becomes more complicated, since inference based on the semantics of the answer type needs to be activated
• Complex questions need to be decomposed into a set of simpler questions

Page 27: Semantic Inference for  Question Answering

Example of Complex Question

How have thefts impacted on the safety of Russia's nuclear navy, and has the theft problem been increased or reduced over time?

Need for domain knowledge: To what degree do different thefts put nuclear or radioactive materials at risk?

Question decomposition:
• Definition questions: What is meant by nuclear navy? What does 'impact' mean? How does one define the increase or decrease of a problem?
• Factoid questions: What is the number of thefts that are likely to be reported? What sort of items have been stolen?
• Alternative questions: What is meant by Russia? Only Russia, or also former Soviet facilities in non-Russian republics?

Page 28: Semantic Inference for  Question Answering

The answer structure

For complex questions, the answer structure has a compositional semantics, comprising the answer structures of all the simpler questions into which the question is decomposed.

Example:
Q-Sem: How can a biological weapons program be detected?
Question pattern: How can X be detected?  X = Biological Weapons Program

Conceptual Schemas:
• INSPECTION Schema: Inspect, Scrutinize, Monitor, Detect, Evasion, Hide, Obfuscate
• POSSESSION Schema: Acquire, Possess, Develop, Deliver

Structure of Complex Answer Type EVIDENCE: CONTENT, SOURCE, QUALITY, JUDGE, RELIABILITY

Page 29: Semantic Inference for  Question Answering

Answer Selection

Based on the answer structure. Example:
• The CONTENT is selected based on the conceptual schemas, which are instantiated when predicate-argument structures or semantic frames are recognized in the text passages.
• The SOURCE is recognized when the content source is identified.
• The Quality of the judgements, the Reliability of the judgements and the Judgements themselves are produced by an inference mechanism.

Conceptual Schemas:
• INSPECTION Schema: Inspect, Scrutinize, Monitor, Detect, Evasion, Hide, Obfuscate
• POSSESSION Schema: Acquire, Possess, Develop, Deliver

Structure of Complex Answer Type EVIDENCE: CONTENT, SOURCE, QUALITY, JUDGE, RELIABILITY

Page 30: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing; Likelihood: Medium; Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons)
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors; Likelihood: Medium

possess(Iraq, delivery systems(type: rockets; target: other countries))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing; Likelihood: Medium

Answer Structure

Page 31: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors

Content: Biological Weapons Program:

possess(Iraq, fuel stock(purpose: power launchers))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing; Likelihood: Medium

possess(Iraq, delivery systems(type: scud missiles; launchers; target: other countries))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing; Likelihood: Medium

hide(Iraq, Seeker: UN Inspectors; Hidden: CBW stockpiles & guided missiles)
  Justification: DETECTION Schema; Inspection status: Past; Likelihood: Medium

Answer Structure (continued)

Page 32: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors

Source: UN documents, US intelligence
  SOURCE.Type: Assessment reports; SOURCE.Reliability: Med-high; Likelihood: Medium

Judge: UN, US intelligence, Milton Leitenberg (Biological Weapons expert)
  JUDGE.Type: mixed; JUDGE.manner; JUDGE.stage: ongoing

Quality: low-medium; Reliability: low-medium

Answer Structure (continued)

Page 33: Semantic Inference for  Question Answering

State-of-the-art QA: Learning surface text patterns

Pioneered by Ravichandran and Hovy (ACL 2002). The idea is that, given a specific answer type (e.g. Birth-Date), the system learns all surface patterns that enable the extraction of the answer from any text passage. The approach relies on Web redundancy. Patterns are learned by two algorithms:

Algorithm 1 (Generates Patterns)
  Step 1: Select an answer type AT and a question Q(AT)
  Step 2: Generate a query (Q(AT) & AT) and submit it to a search engine (Google, AltaVista)
  Step 3: Download the first 1000 documents
  Step 4: Select only those sentences that contain the question content words and the AT
  Step 5: Pass the sentences through a suffix tree constructor
  Step 6: Extract only the longest matching substrings that contain the AT and the question word it is syntactically connected with

Algorithm 2 (Measures the Precision of Patterns)
  Step 1: Query by using only the question Q(AT)
  Step 2: Download the first 1000 documents
  Step 3: Select only those sentences that contain the question word connected to the AT
  Step 4: Compute C(a) = #patterns matched by the correct answer; C(o) = #patterns matched by any word
  Step 6: The precision of a pattern is given by C(a)/C(o)
  Step 7: Retain only patterns matching > 5 examples
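
A minimal sketch of the precision step (Algorithm 2): patterns are written as regexes with a capturing slot for <ANSWER>, C(a) counts matches filled by the correct answer, and C(o) counts matches filled by any word. The sentences and patterns here are invented for illustration rather than learned from the Web.

# Precision of surface patterns: C(a)/C(o)
import re

NAME, CORRECT = "Mozart", "1756"
SENTENCES = [
    "Mozart (1756-1791) was a prolific composer.",
    "Mozart was born in 1756 in Salzburg.",
    "Mozart was born in Salzburg.",
]
PATTERNS = [
    re.escape(NAME) + r" \((\d{4})-",          # "<NAME> (<ANSWER>-"
    re.escape(NAME) + r" was born in (\w+)",   # "<NAME> was born in <ANSWER>"
]

for pat in PATTERNS:
    matches = [m.group(1) for s in SENTENCES for m in re.finditer(pat, s)]
    c_o = len(matches)                          # slot filled by any word
    c_a = sum(m == CORRECT for m in matches)    # slot filled by the correct answer
    print(pat, "precision =", c_a / c_o if c_o else 0.0)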

Page 34: Semantic Inference for  Question Answering

Results and Problems

Some results:

Answer Type = INVENTOR:
  <ANSWER> invents <NAME>
  the <NAME> was invented by <ANSWER>
  <ANSWER>'s invention of the <NAME>
  <ANSWER>'s <NAME> was
  <NAME>, invented by <ANSWER>
  That <ANSWER>'s <NAME>

Answer Type = BIRTH-YEAR:
  <NAME> (<ANSWER>- )
  <NAME> was born on <ANSWER>
  <NAME> was born in <ANSWER>
  born in <ANSWER>, <NAME>
  Of <NAME>, (<ANSWER>

Limitations:
• Cannot handle long-distance dependencies
• Cannot recognize paraphrases, since no semantic knowledge is associated with these patterns (unlike patterns used in Information Extraction)
• Cannot recognize paraphrased questions

Page 35: Semantic Inference for  Question Answering

Shallow semantic parsing

Part of the problems can be solved by using shallow semantic parsers: parsers that use shallow semantics encoded as either predicate-argument structures or semantic frames.
• Long-distance dependencies are captured
• Paraphrases can be recognized by mapping on IE architectures

In the past 4 years, several models for training such parsers have emerged:
• Lexico-semantic resources are available (e.g. PropBank, FrameNet)
• Several evaluations measure the performance of such parsers (e.g. SENSEVAL, CoNLL)

Page 36: Semantic Inference for  Question Answering

Outline

Part I. Introduction:
• The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations

Page 37: Semantic Inference for  Question Answering

Proposition Bank Overview

A one million word corpus annotated with predicate argument structures [Kingsbury, 2002]. Currently only predicates lexicalized by verbs are annotated.

Numbered arguments from 0 to 5. Typically ARG0 = agent, ARG1 = direct object or theme, ARG2 = indirect object, benefactive, or instrument.

Functional tags: ARGM-LOC = locative, ARGM-TMP = temporal, ARGM-DIR = direction.

[Parse tree for "The futures halt was assailed by Big Board floor traders": the NP "The futures halt" is labeled ARG1 (entity assailed), "assailed" is the PRED, and the NP "Big Board floor traders" is labeled ARG0 (agent).]

Page 38: Semantic Inference for  Question Answering

The Model

Consists of two tasks: (1) identifying parse tree constituents corresponding to predicate arguments, and (2) assigning a role to each argument constituent.

Both tasks are modeled using C5.0 decision tree learning and two sets of features: Feature Set 1, adapted from [Gildea and Jurafsky, 2002], and Feature Set 2, a novel set of semantic and syntactic features [Surdeanu, Harabagiu et al., 2003].

[Parse tree for "The futures halt was assailed by Big Board floor traders": Task 1 identifies the argument constituents around PRED; Task 2 assigns them the roles ARG1 and ARG0.]

Page 39: Semantic Inference for  Question Answering

Feature Set 1

• PHRASE TYPE (pt): type of the syntactic phrase selected as argument. E.g. NP for ARG1.
• PARSE TREE PATH (path): path between argument and predicate. E.g. NP S VP VP for ARG1.
• PATH LENGTH (pathLen): number of labels stored in the predicate-argument path. E.g. 4 for ARG1.
• POSITION (pos): indicates if the constituent appears before the predicate in the sentence. E.g. true for ARG1 and false for ARG2.
• VOICE (voice): predicate voice (active or passive). E.g. passive for PRED.
• HEAD WORD (hw): head word of the evaluated phrase. E.g. "halt" for ARG1.
• GOVERNING CATEGORY (gov): indicates if an NP is dominated by an S phrase or a VP phrase. E.g. S for ARG1, VP for ARG0.
• PREDICATE WORD: the verb with morphological information preserved (verb), and the verb normalized to lower case and infinitive form (lemma). E.g. for PRED verb is "assailed", lemma is "assail".

[Parse tree for "The futures halt was assailed by Big Board floor traders" with ARG1, PRED and ARG0 marked.]

(A short sketch of computing two of these features follows.)
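
The sketch below is an illustration, not the authors' code: it computes PHRASE TYPE and the PARSE TREE PATH over an nltk.Tree for the example sentence. The bracketing and the path notation are approximations of the slide's figure.

# Two Feature Set 1 features extracted from a constituency parse
import nltk

tree = nltk.Tree.fromstring(
    "(S (NP (DT The) (NNS futures) (NN halt))"
    "   (VP (VBD was) (VP (VBN assailed)"
    "       (PP (IN by) (NP (NNP Big) (NNP Board) (NN floor) (NNS traders))))))")

ARG1 = (0,)        # tree position of the NP "The futures halt"
PRED = (1, 1, 0)   # tree position of the VBN pre-terminal over "assailed"

def path_feature(t, arg, pred):
    """Labels from the argument up to the lowest common ancestor, then down
    to the predicate, e.g. NP^S.VP.VP.VBN (compare 'NP S VP VP' on the slide)."""
    i = 0
    while i < min(len(arg), len(pred)) and arg[i] == pred[i]:
        i += 1                                   # length of the shared prefix (LCA)
    up = [t[arg[:k]].label() for k in range(len(arg), i - 1, -1)]
    down = [t[pred[:k]].label() for k in range(i + 1, len(pred) + 1)]
    return "^".join(up) + "." + ".".join(down)

print("phrase type:", tree[ARG1].label())        # NP
print("path:", path_feature(tree, ARG1, PRED))   # NP^S.VP.VP.VBN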

Page 40: Semantic Inference for  Question Answering

Observations about Feature Set 1

• Because most of the argument constituents are prepositional attachments (PP) and relative clauses (SBAR), often the head word (hw) is not the most informative word in the phrase.
• Due to its strong lexicalization, the model suffers from data sparsity (e.g. hw used < 3%). The problem can be addressed with a back-off model from words to part-of-speech tags.
• The features in Set 1 capture only syntactic information, even though semantic information like named-entity tags should help. For example, ARGM-TMP typically contains DATE entities, and ARGM-LOC includes LOCATION named entities.
• Feature Set 1 does not capture predicates lexicalized by phrasal verbs, e.g. "put up".

[Parse tree fragments illustrating such constituents: the PP "in last June", an SBAR "that occurred yesterday", and the VP "to be declared".]

Page 41: Semantic Inference for  Question Answering

Feature Set 2 (1/2)

• CONTENT WORD (cw): lexicalized feature that selects an informative word from the constituent, other than the head. Selection heuristics available in the paper. E.g. "June" for the phrase "in last June".
• PART OF SPEECH OF CONTENT WORD (cPos): part of speech tag of the content word. E.g. NNP for the phrase "in last June".
• PART OF SPEECH OF HEAD WORD (hPos): part of speech tag of the head word. E.g. NN for the phrase "the futures halt".
• NAMED ENTITY CLASS OF CONTENT WORD (cNE): the class of the named entity that includes the content word; 7 named entity classes (from the MUC-7 specification) are covered. E.g. DATE for "in last June".

Page 42: Semantic Inference for  Question Answering

Feature Set 2 (2/2)

• BOOLEAN NAMED ENTITY FLAGS: set of features that indicate if a named entity is included at any position in the phrase:
  • neOrganization: set to true if an organization name is recognized in the phrase
  • neLocation: set to true if a location name is recognized in the phrase
  • nePerson: set to true if a person name is recognized in the phrase
  • neMoney: set to true if a currency expression is recognized in the phrase
  • nePercent: set to true if a percentage expression is recognized in the phrase
  • neTime: set to true if a time of day expression is recognized in the phrase
  • neDate: set to true if a date temporal expression is recognized in the phrase
• PHRASAL VERB COLLOCATIONS: set of two features that capture information about phrasal verbs:
  • pvcSum: the frequency with which a verb is immediately followed by any preposition or particle
  • pvcMax: the frequency with which a verb is followed by its predominant preposition or particle

(A sketch of the two collocation features follows.)
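
A minimal sketch of the two collocation features from (verb, following particle/preposition) bigram counts; the counts below are invented for illustration.

# pvcSum and pvcMax from verb/particle bigram counts
from collections import Counter

bigrams = Counter({("put", "up"): 120, ("put", "off"): 45, ("put", "in"): 30,
                   ("assail", "by"): 2})   # invented corpus counts

def pvc_features(verb):
    freqs = [c for (v, _), c in bigrams.items() if v == verb]
    return sum(freqs), max(freqs, default=0)   # (pvcSum, pvcMax)

print(pvc_features("put"))     # (195, 120)
print(pvc_features("assail"))  # (2, 2)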

Page 43: Semantic Inference for  Question Answering

Results

Features                          Arg P   Arg R   Arg F1   Role A
FS1                               84.96   84.26   84.61    78.76
FS1 + POS tag of head word        92.24   84.50   88.20    79.04
FS1 + content word and POS tag    92.19   84.67   88.27    80.80
FS1 + NE label of content word    83.93   85.69   84.80    79.85
FS1 + phrase NE flags             87.78   85.71   86.73    81.28
FS1 + phrasal verb information    84.88   82.77   83.81    78.62
FS1 + FS2                         91.62   85.06   88.22    83.05
FS1 + FS2 + boosting              93.00   85.29   88.98    83.74

Page 44: Semantic Inference for  Question Answering

Other parsers based on PropBank

• Pradhan, Ward et al., 2004 (HLT/NAACL and Journal of ML) report on a parser trained with SVMs which obtains an F1-score of 90.4% for argument classification and 80.8% for detecting the boundaries and classifying the arguments, when only the first set of features is used.
• Gildea and Hockenmaier (2003) use features extracted from Combinatory Categorial Grammar (CCG). The F1-measure obtained is 80%.
• Chen and Rambow (2003) use syntactic and semantic features extracted from a Tree Adjoining Grammar (TAG) and report an F1-measure of 93.5% for the core arguments.
• Pradhan, Ward et al. use a set of 12 new features and obtain an F1-score of 93.8% for argument classification and 86.7% for argument detection and classification.

Page 45: Semantic Inference for  Question Answering

Applying Predicate-Argument Structures to QA

Parsing Questions:
Q: What kind of materials were stolen from the Russian navy?
PAS(Q): What [Arg1: kind of nuclear materials] were [Predicate: stolen] [Arg2: from the Russian Navy]?

Parsing Answers:
A(Q): Russia's Pacific Fleet has also fallen prey to nuclear theft; in 1/96, approximately 7 kg of HEU was reportedly stolen from a naval base in Sovetskaya Gavan.
PAS(A(Q)): [Arg1(Predicate 1): Russia's Pacific Fleet] has [ArgM-DIS(Predicate 1): also] [Predicate 1: fallen] [Arg1(Predicate 1): prey to nuclear theft]; [ArgM-TMP(Predicate 2): in 1/96], [Arg1(Predicate 2): approximately 7 kg of HEU] was [ArgM-ADV(Predicate 2): reportedly] [Predicate 2: stolen] [Arg2(Predicate 2): from a naval base] [Arg3(Predicate 2): in Sovetskaya Gavan]

Result: exact answer = "approximately 7 kg of HEU"
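
A minimal sketch of the alignment step behind this example: the question PAS has a questioned role, and a candidate answer PAS is accepted when its predicate matches and at least one of the other arguments overlaps lexically. The structures are written out by hand here rather than produced by a semantic parser, and the overlap test is deliberately crude.

# Aligning question and answer predicate-argument structures
def content_overlap(a, b):
    return bool(set(a.lower().split()) & set(b.lower().split()))

question_pas = {"predicate": "steal",
                "Arg1": "kind of nuclear materials",   # questioned role
                "Arg2": "from the Russian Navy"}
answer_pas = [
    {"predicate": "fall", "Arg1": "Russia's Pacific Fleet"},
    {"predicate": "steal", "Arg1": "approximately 7 kg of HEU",
     "Arg2": "from a naval base", "Arg3": "in Sovetskaya Gavan"},
]
questioned_role = "Arg1"

for pas in answer_pas:
    if pas["predicate"] != question_pas["predicate"]:
        continue
    # require some lexical overlap on at least one non-questioned argument
    others = [r for r in question_pas if r not in ("predicate", questioned_role)]
    if any(content_overlap(question_pas[r], pas.get(r, "")) for r in others):
        print("exact answer:", pas[questioned_role])   # approximately 7 kg of HEU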

Page 46: Semantic Inference for  Question Answering

Outline

Part I. Introduction:
• The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations

Page 47: Semantic Inference for  Question Answering

The Model

Consists of two tasks: (1) identifying parse tree constituents corresponding to frame elements, and (2) assigning a semantic role to each frame element.

Both tasks were introduced for the first time by Gildea and Jurafsky in 2000, using Feature Set 1, which Gildea and Palmer later used for parsing based on PropBank.

[Parse tree for "She clapped her hands in inspiration": Task 1 identifies the frame element constituents around PRED; Task 2 assigns them the roles Agent, Body Part and Cause.]

Page 48: Semantic Inference for  Question Answering

Extensions

Fleischman et al. extend the model in 2003 in three ways:
• Adopt a maximum entropy framework for learning a more accurate classification model.
• Include features that look at previous tags and use previous tag information to find the highest probability for the semantic role sequence of any given sentence.
• Examine sentence-level patterns that exploit more global information in order to classify frame elements.

Page 49: Semantic Inference for  Question Answering

Applying Frame Structures to QA

Parsing Questions:
Q: What kind of materials were stolen from the Russian navy?
FS(Q): What [GOODS: kind of nuclear materials] were [Target-Predicate: stolen] [VICTIM: from the Russian Navy]?

Parsing Answers:
A(Q): Russia's Pacific Fleet has also fallen prey to nuclear theft; in 1/96, approximately 7 kg of HEU was reportedly stolen from a naval base in Sovetskaya Gavan.
FS(A(Q)): [VICTIM(P1): Russia's Pacific Fleet] has also fallen prey to [GOODS(P1): nuclear] [Target-Predicate(P1): theft]; in 1/96, [GOODS(P2): approximately 7 kg of HEU] was reportedly [Target-Predicate(P2): stolen] [VICTIM(P2): from a naval base] [SOURCE(P2): in Sovetskaya Gavan]

Result: exact answer = "approximately 7 kg of HEU"

Page 50: Semantic Inference for  Question Answering

Outline

Part I. Introduction:
• The need for Semantic Inference in QA
• Current State-of-the-art in QA
• Parsing with Predicate Argument Structures
• Parsing with Semantic Frames
• Special Text Relations

Page 51: Semantic Inference for  Question Answering

Additional types of relations

• Temporal relations (TERQUAS ARDA Workshop)
• Causal relations
• Evidential relations
• Part-whole relations

Page 52: Semantic Inference for  Question Answering

Temporal relations in QA

Results of the workshop are accessible from http://www.cs.brandeis.edu/~jamesp/arda/time/documentation/TimeML-use-in-qa-v1.0.pdf

A set of questions that require the extraction of temporal relations was created (the TimeML question corpus), e.g.:
• "When did the war between Iran and Iraq end?"
• "Who was Secretary of Defense during the Gulf War?"

A number of features of these questions were identified and annotated, e.g.:
• Number of TEMPEX relations in the question
• Volatility of the question (how often does the answer change)
• Reference to repetitive events
• Number of events mentioned in the question

Page 53: Semantic Inference for  Question Answering

Outline

Part II. Extracting Semantic Relations from Questions and Texts
• Knowledge-intensive techniques
• Unsupervised techniques

Page 54: Semantic Inference for  Question Answering

Information Extraction from texts

Extracting semantic relations from questions and texts can be solved by adapting IE technology to this new task.

What is Information Extraction (IE)?
• The task of finding facts about a specified class of events from free text
• Filling a table in a database with the information; such a database entry can be seen as a list of slots of a template
• Events are instances comprising many relations that span multiple arguments

Page 55: Semantic Inference for  Question Answering

IE Architecture Overview

[Pipeline diagram: Phrasal parser → Entity coreference → Domain event rules → Domain coreference → Templette merging, supported by rules from a Domain API, coreference filters and a merge condition.]

Page 56: Semantic Inference for  Question Answering

Walk-through Example

Input: "... a bomb rigged with a trip wire that exploded and killed him ..."

Parser: ... a bomb rigged with a trip wire/NG that/P exploded/VG and/P killed/VG him/NG ...

Entity Coref: him → A Chinese restaurant chef

Domain Rules: ... a bomb rigged with a trip wire that exploded/PATTERN and killed him/PATTERN ...
  TEMPLETTE  BOMB: "a bomb rigged with a trip wire"
  TEMPLETTE  DEAD: "A Chinese restaurant chef"

Domain Coref:
  TEMPLETTE  BOMB: "a bomb rigged with a trip wire"  LOCATION: "MIAMI"
  TEMPLETTE  BOMB: "a bomb rigged with a trip wire"  DEAD: "A Chinese restaurant chef"

Merging:
  TEMPLETTE  BOMB: "a bomb rigged with a trip wire"  DEAD: "A Chinese restaurant chef"  LOCATION: "MIAMI"
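
A minimal sketch of the templette-merging step, where string identity of the BOMB slot stands in for the domain coreference check; a real merger would use the coreference decisions and a richer merge condition.

# Merging two partial templettes that describe the same event
def merge_templettes(t1, t2, key="BOMB"):
    if t1.get(key) != t2.get(key):            # merge condition: same event anchor
        return None
    merged = dict(t1)
    merged.update({k: v for k, v in t2.items() if v})
    return merged

a = {"BOMB": "a bomb rigged with a trip wire", "LOCATION": "MIAMI"}
b = {"BOMB": "a bomb rigged with a trip wire", "DEAD": "A Chinese restaurant chef"}
print(merge_templettes(a, b))
# {'BOMB': 'a bomb rigged with a trip wire', 'LOCATION': 'MIAMI', 'DEAD': 'A Chinese restaurant chef'}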

Page 57: Semantic Inference for  Question Answering

Learning domain event rules and domain relations

• Build patterns from examples: Yangarber '97
• Generalize from multiple examples of annotated text: Crystal, Whisk (Soderland), Rapier (Califf)
• Active learning, to reduce annotation: Soderland '99, Califf '99
• Learning from a corpus with relevance judgements: Riloff '96, '99
• Co-learning/bootstrapping: Brin '98, Agichtein '00

Page 58: Semantic Inference for  Question Answering

Changes in the IE architecture for enabling the extraction of semantic relations

[Pipeline diagram with the components: Document, Tokenizer, Entity Recognizer, Entity Coreference, Relation Recognizer, Event Recognizer, Event/Relation Coreference, Relation Merging, EEML File Generation, EEML Results.]

• Addition of a relation layer
• Modification of NE and pronominal coreference to enable relation coreference
• Addition of a relation merging layer

Page 59: Semantic Inference for  Question Answering

Walk-through Example

"The murder of Vladimir Golovlyov, an associate of the exiled tycoon Boris Berezovsky, was the second contract killing in the Russian capital in as many days and capped a week of setbacks for the Russian leader."

[Entity and event annotations over the sentence: two Murder events and entities of types Person (several mentions), City, Time-Quantity and GeopoliticalEntity.]

Page 60: Semantic Inference for  Question Answering

Walk-through Example (continued)

[The same sentence annotated with relations: Event-Entity relations Victim (twice) and EventOccurAt, and Entity-Entity relations AffiliatedWith, GeographicalSubregion and hasLeader.]

"The murder of Vladimir Golovlyov, an associate of the exiled tycoon Boris Berezovsky, was the second contract killing in the Russian capital in as many days and capped a week of setbacks for the Russian leader."

Page 61: Semantic Inference for  Question Answering

Application to QA

• Who was murdered in Moscow this week?  Relations: EventOccurAt + Victim
• Name some associates of Vladimir Golovlyov.  Relations: AffiliatedWith
• How did Vladimir Golovlyov die?  Relations: Victim
• What is the relation between Vladimir Golovlyov and Boris Berezovsky?  Relations: AffiliatedWith

Page 62: Semantic Inference for  Question Answering

Outline

Part II. Extracting Semantic Relations from Questions and Texts
• Knowledge-intensive techniques
• Unsupervised techniques

Page 63: Semantic Inference for  Question Answering

Learning extraction rules and semantic lexicons

• Generating Extraction Patterns: AutoSlog (Riloff 1993), AutoSlog-TS (Riloff 1996)
• Semantic Lexicon Induction: Riloff & Shepherd (1997), Roark & Charniak (1998), Ge, Hale & Charniak (1998), Caraballo (1999), Thompson & Mooney (1999), Meta-Bootstrapping (Riloff & Jones 1999), (Thelen and Riloff 2002)
• Bootstrapping/Co-training: Yarowsky (1995), Blum and Mitchell (1998), McCallum & Nigam (1998)

Page 64: Semantic Inference for  Question Answering

Generating extraction rules

From untagged text: AutoSlog-TS (Riloff 1996). The rule relevance is measured by: relevance rate * log2(frequency).

[Two-stage diagram:]
Stage 1: a sentence analyzer applies the AutoSlog heuristics to pre-classified texts (e.g. Subject: World Trade Center, Verb: was bombed, PP: by terrorists), producing candidate concept nodes such as "<x> was bombed" and "bombed by <y>".
Stage 2: the sentence analyzer applies the resulting concept node dictionary (e.g. "<x> was killed", "<x> was bombed by <y>") to the pre-classified texts and ranks the concept nodes by relevance:

Concept Node        REL%
<x> was bombed       87%
bombed by <y>        84%
<w> was killed       63%
<z> saw              49%

(A sketch of the relevance score follows.)
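
A minimal sketch of this relevance score; the relevant/total counts are chosen so that the relevance rates match the REL% column above, but they are otherwise invented.

# AutoSlog-TS ranking: relevance_rate * log2(frequency)
import math

def pattern_score(rel_freq, total_freq):
    if total_freq == 0:
        return 0.0
    return (rel_freq / total_freq) * math.log2(total_freq)

patterns = {"<x> was bombed": (87, 100), "bombed by <y>": (84, 100),
            "<w> was killed": (63, 100), "<z> saw": (49, 100)}
for p, (rel, tot) in sorted(patterns.items(),
                            key=lambda kv: -pattern_score(*kv[1])):
    print(f"{p:15s} {pattern_score(rel, tot):.2f}")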

Page 65: Semantic Inference for  Question Answering

Learning Dictionaries for IE with mutual bootstrapping (Riloff and Jones, 1999)

• Generate all candidate extraction rules from the training corpus using AutoSlog.
• Apply the candidate extraction rules to the training corpus and save the patterns with their extractions to EPdata.
• SemLex = {seed words}; Cat_EPlist = {}

MUTUAL BOOTSTRAPPING LOOP
1. Score all extraction rules in EPdata
2. best_EP = the highest scoring extraction pattern not already in Cat_EPlist
3. Add best_EP to Cat_EPlist
4. Add best_EP's extractions to SemLex
5. Go to step 1
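
A minimal sketch of this loop with an invented pattern/extraction table and LOCATION seed words; scoring here is simply the number of a pattern's extractions already in the lexicon, a simplification of the scoring used by Riloff and Jones.

# Mutual bootstrapping between extraction patterns and a semantic lexicon
EP_DATA = {   # extraction pattern -> NPs it extracts from the corpus (invented)
    "headquartered in <x>": {"Baghdad", "Moscow", "New York"},
    "traveled to <x>":      {"Moscow", "the summit", "London"},
    "mayor of <x>":         {"Miami", "Moscow", "Baghdad"},
}

sem_lex = {"Baghdad", "Moscow"}        # seed words for the LOCATION category
cat_ep_list = []

def score(pattern):                    # extractions already known to be LOCATIONs
    return len(EP_DATA[pattern] & sem_lex)

for _ in range(2):                     # two iterations of the loop
    candidates = [p for p in EP_DATA if p not in cat_ep_list]
    best_ep = max(candidates, key=score)
    cat_ep_list.append(best_ep)
    sem_lex |= EP_DATA[best_ep]        # the pattern's extractions join the lexicon

print(cat_ep_list)     # ['headquartered in <x>', 'mayor of <x>']
print(sorted(sem_lex)) # ['Baghdad', 'Miami', 'Moscow', 'New York']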

Page 66: Semantic Inference for  Question Answering

The BASILISK approach (Thelen & Riloff)

BASILISK = Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge

[Diagram: extraction patterns and their extractions are collected from the corpus; the best patterns go into a pattern pool and their extractions into a candidate word pool; starting from seed words, the 5 best candidate words are added to the semantic lexicon in each iteration.]

Key ideas:
1. Collective evidence over a large set of extraction patterns can reveal strong semantic associations.
2. Learning multiple categories simultaneously can constrain the bootstrapping process.

Page 67: Semantic Inference for  Question Answering

Learning Multiple Categories Simultaneously

Bootstrapping a single category vs. bootstrapping multiple categories:
• "One Sense per Domain" assumption: a word belongs to a single semantic category within a limited domain.
• The simplest way to take advantage of multiple categories is to resolve conflicts when they arise:
  1. A word cannot be assigned to category X if it has already been assigned to category Y.
  2. If a word is hypothesized for both category X and category Y at the same time, choose the category that receives the highest score.

Page 68: Semantic Inference for  Question Answering

Kernel Methods for Relation Extraction

• Pioneered by Zelenko, Aone and Richardella (2002)
• Uses Support Vector Machines and the Voted Perceptron algorithm (Freund and Schapire, 1999)
• It operates on the shallow parses of texts, by using two functions: a matching function between the nodes of the shallow parse tree, and a similarity function between the nodes
• It obtains very high F1-score values for relation extraction (86.8%)

Page 69: Semantic Inference for  Question Answering

Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types

Page 70: Semantic Inference for  Question Answering

Three representations

• A taxonomy of answer types onto which Named Entity classes are also mapped
• A complex structure that results from schema instantiations
• An answer type generated by inference on the semantic structures

Page 71: Semantic Inference for  Question Answering

Possible Answer Types

[Answer type hierarchy (fragment):]
TOP: PERSON, LOCATION, DATE, TIME, PRODUCT, NUMERICAL VALUE, MONEY, ORGANIZATION, MANNER, REASON
NUMERICAL VALUE: DEGREE, DIMENSION, RATE, DURATION, PERCENTAGE, COUNT
Finer-grained concepts sit under these nodes, e.g. time of day (midnight, prime time, clock time); team, squad (hockey team); institution, establishment (financial institution, educational institution); numerosity, multiplicity and integer, whole number (population, denominator); thickness, width/breadth, distance/length, altitude, wingspan.

Page 72: Semantic Inference for  Question Answering

Examples

[Two question parses mapped onto the answer type hierarchy:]
• "What is the name of the actress that played in Shine?": the question word "What", the focus "actress name" and the predicate "played" (with argument "Shine") map to the answer type PERSON.
• "What does the BMW company produce?": "What", "BMW company" and the predicate "produce" map to the answer type PRODUCT.

Page 73: Semantic Inference for  Question Answering

Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types

Page 74: Semantic Inference for  Question Answering

Extended WordNet

eXtended WordNet is an ongoing project at the Human Language Technology Research Institute, University of Texas at Dallas (http://xwn.hlt.utdallas.edu/).

The goal of this project is to develop a tool that takes as input the current or future versions of WordNet and automatically generates an eXtended WordNet that provides several important enhancements intended to remedy the present limitations of WordNet.

In the eXtended WordNet the WordNet glosses are syntactically parsed, transformed into logic forms, and content words are semantically disambiguated.

Page 75: Semantic Inference for  Question Answering

Logic Abduction

Motivation: goes beyond keyword-based justification by capturing:
• syntax-based relationships
• links between concepts in the question and the candidate answers

[Diagram: the question logic form (QLF) and answer logic form (ALF) feed an Axiom Builder, which draws on XWN axioms, NLP axioms and lexical chains; Justification attempts a proof; if the proof fails, Relaxation is applied and justification is retried; on success, Answer Ranking produces the ranked answers together with an answer explanation.]

Page 76: Semantic Inference for  Question Answering

COGEX = the LCC Logic Prover for QA

Inputs to the Logic Prover: a logic form provides a mapping of the question and candidate answer text into first order logic predicates.

Question: Where did bin Laden's funding come from other than his own wealth?

Question Logic Form:
(_multi_AT(x1)) & bin_NN_1(x2) & Laden_NN(x3) & _s_POS(x5,x4) & nn_NNC(x4,x2,x3) & funding_NN_1(x5) & come_VB_1(e1,x5,x11) & from_IN(e1,x1) & other_than_JJ_1(x6) & his_PRP_(x6,x4) & own_JJ_1(x6) & wealth_NN_1(x6)

Page 77: Semantic Inference for  Question Answering

Justifying the answer

Answer: "... Bin Laden reportedly sent representatives to Afghanistan opium farmers to buy large amounts of opium, probably to raise funds for al-Qaida ..."

Answer Logic Form:
... Bin_NN(x14) & Laden_NN(x15) & nn_NNC(x16,x14,x15) & reportedly_RB_1(e2) & send_VB_1(e2,x16,x17) & representative_NN_1(x17) & to_TO(e2,x21) & Afghanistan_NN_1(x18) & opium_NN_1(x19) & farmer_NN_1(x20) & nn_NNC(x21,x19,x20) & buy_VB_5(e3,x17,x22) & large_JJ_1(x22) & amount_NN_1(x22) & of_IN(x22,x23) & opium_NN_1(x23) & probably_RB_1(e4) & raise_VB_1(e4,x22,x24) & funds_NN_2(x24) & for_IN(x24,x26) & al_NN_1(x25) & Qaida_NN(x26) ...

Page 78: Semantic Inference for  Question Answering

Lexical Chains

Lexical chains provide an improved source of world knowledge by supplying the Logic Prover with much needed axioms to link question keywords with answer concepts.

Question: How were biological agents acquired by bin Laden?

Answer: "On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in ..."

Lexical Chain: (v - buy#1, purchase#1) HYPERNYM (v - get#1, acquire#1)
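
A minimal sketch, assuming NLTK with the WordNet corpus, showing the single HYPERNYM edge behind this chain: buy#1/purchase#1 has get#1/acquire#1 as its hypernym, which lets the prover connect "purchased" in the answer to "acquired" in the question.

# The HYPERNYM link underlying the lexical chain
from nltk.corpus import wordnet as wn

buy = wn.synset('buy.v.01')                      # buy#1 / purchase#1
for hyper in buy.hypernyms():
    print(buy.lemma_names(), "HYPERNYM", hyper.lemma_names())
# ['buy', 'purchase'] HYPERNYM ['get', 'acquire']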

Page 79: Semantic Inference for  Question Answering

Axiom selection

XWN Axioms: another source of world knowledge is a general purpose knowledge base of more than 50,000 parsed and disambiguated glosses that are transformed into logic form for use during the course of a proof.

Gloss: kill is "to cause to die"

GLF: kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)

Page 80: Semantic Inference for  Question Answering

Logic Prover

Axiom Selection: lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all the keywords in the question are found in the answer.

Question: How did Adolf Hitler die?
Answer: "... Adolf Hitler committed suicide ..."

The following lexical chain is detected:
(n - suicide#1, self-destruction#1, self-annihilation#1) GLOSS (v - kill#1) GLOSS (v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25)

The following axioms are loaded into the Usable List of the Prover:
exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).
exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).

Page 81: Semantic Inference for  Question Answering

Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types

Page 82: Semantic Inference for  Question Answering

Intentional Structure of Questions

Example: Does x have y?  (x = Iraq, y = biological weapons)

Predicate-argument structure: have/possess(Iraq, biological weapons), with Arg-0 = Iraq and Arg-1 = biological weapons

Question pattern: possess(x, y)

Intentional structure (each facet filled by coercion):
  Evidence: ??? (Coercion)
  Means of Finding: ??? (Coercion)
  Source: ??? (Coercion)
  Consequence: ??? (Coercion)

Page 83: Semantic Inference for  Question Answering

Coercion of Pragmatic Knowledge

0*Evidence(1-possess(2-Iraq, 3-biological weapons))

A form of logical metonymy: Lapata and Lascarides (Computational Linguistics, 2003) allow coercion of interpretations by collecting possible meanings from large corpora.

Examples:
• Mary finished the cigarette → Mary finished smoking the cigarette.
• Arabic is a difficult language → Arabic is a language that is difficult to learn / Arabic is a language that is difficult to process automatically.

Page 84: Semantic Inference for  Question Answering

The Idea

Logical metonymy is in part processed as verbal metonymy. We model, after Lapata and Lascarides, the interpretation of verbal metonymy as p(e, o, v), where:
  v is the metonymic verb (enjoy)
  o is its object (the cigarette)
  e is the sought-after interpretation (smoking)

Page 85: Semantic Inference for  Question Answering

A probabilistic model

By choosing the ordering $\langle e, v, o \rangle$, the probability may be factored as:

$P(e, o, v) = P(e)\,P(v \mid e)\,P(o \mid e, v)$

where we make the estimations:

$\hat{P}(e) = \frac{f(e)}{N}; \qquad \hat{P}(v \mid e) = \frac{f(v, e)}{f(e)}; \qquad \hat{P}(o \mid e, v) = \frac{f(o, e, v)}{f(v, e)} \approx \hat{P}(o \mid e) = \frac{f(o, e)}{f(e)}$

so that

$P(e, o, v) \approx \frac{f(v, e)\,f(o, e)}{N\,f(e)}$

This is a model of interpretation and coercion.
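
A minimal sketch of ranking candidate interpretations e with this estimate; all counts are invented for illustration, whereas the real model collects f(e), f(v, e) and f(o, e) from a large parsed corpus.

# P(e, o, v) ~= f(v, e) * f(o, e) / (N * f(e)) over invented counts
N = 1_000_000
f_e = {"smoke": 5_000, "roll": 3_000, "buy": 20_000}              # f(e)
f_ve = {("finish", "smoke"): 800, ("finish", "roll"): 150,        # f(v, e)
        ("finish", "buy"): 60}
f_oe = {("cigarette", "smoke"): 900, ("cigarette", "roll"): 400,  # f(o, e)
        ("cigarette", "buy"): 700}

def p_interpretation(e, o, v):
    return f_ve.get((v, e), 0) * f_oe.get((o, e), 0) / (N * f_e[e])

ranked = sorted(f_e, key=lambda e: -p_interpretation(e, "cigarette", "finish"))
for e in ranked:
    print(e, p_interpretation(e, "cigarette", "finish"))
# "smoke" ranks first: "finish the cigarette" ~ "finish smoking the cigarette"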

Page 86: Semantic Inference for  Question Answering

Coercions for intentional structures

0*Evidence(1-possess(2-Iraq, 3-biological weaponry))

Candidate coercions, each scored with the probabilistic model (P(e, 0, 1, v), P(e, 1, 3), P(e, 2, 3) and, for topic coercion, P(e, 3, topic)):
• v = discover(1, 2, 3); v = stockpile(2, 3); v = use(2, 3); v = 0(1, 2, 3)
• e = develop(_, 3); e = acquire(_, 3)
• e = inspections(_, 2, 3); e = ban(_, 2, from 3)

Page 87: Semantic Inference for  Question Answering

Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types

Page 88: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing; Likelihood: Medium; Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons)
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors; Likelihood: Medium

possess(Iraq, delivery systems(type: rockets; target: other countries))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing; Likelihood: Medium

Answer Structure

Page 89: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing; Likelihood: Medium; Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons)
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors; Likelihood: Medium

possess(Iraq, delivery systems(type: rockets; target: other countries))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing; Likelihood: Medium

Answer Structure

Page 90: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing; Likelihood: Medium; Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons)
  Justification: POSSESSION Schema; Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors; Likelihood: Medium

possess(Iraq, delivery systems(type: rockets; target: other countries))
  Justification: POSSESSION Schema; Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing; Likelihood: Medium

Answer Structure
Temporal Reference/Grounding

Page 91: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors.

Content: Biological Weapons Program:

possess(Iraq, fuel stock(purpose: power launchers)) Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing Likelihood: Medium

possess(Iraq, delivery systems(type : scud missiles; launchers; target: other countries)) Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing Likelihood: Medium

Answer Structure

(continued)

hide(Iraq, Seeker: UN Inspectors; Hidden: CBW stockpiles & guided missiles) Justification: DETECTION Schema Inspection status: Past; Likelihood: Medium

Present Progressive Perfect

Present Progressive Continuing

Page 92: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors.

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new)) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors Likelihood: Medium

possess(Iraq, delivery systems(type : rockets; target: other countries)) Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing Likelihood: Medium

Answer Structure: Uncertainty and Belief

Page 93: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors.

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new)) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors Likelihood: Medium

possess(Iraq, delivery systems(type : rockets; target: other countries)) Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing Likelihood: Medium

Answer Structure: Uncertainty and Belief

Multiple Sources with reliability

Page 94: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors.

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new)) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors Likelihood: Medium

possess(Iraq, delivery systems(type : rockets; target: other countries)) Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing Likelihood: Medium

Answer Structure

Event Structure Metaphor

Page 95: Semantic Inference for  Question Answering

Event Structure for semantically based QA

Reasoning about dynamics
• Complex event structure: multiple stages, interruptions, resources, framing
• Evolving events: conditional events, presuppositions
• Nested temporal and aspectual references: past and future event references
• Metaphoric references: use of the motion domain to describe complex events

Reasoning with Uncertainty
• Combining evidence from multiple, unreliable sources
• Non-monotonic inference: retracting previous assertions; conditioning on partial evidence

Page 96: Semantic Inference for  Question Answering

Relevant Previous Work

Event Structure: Aspect (VDT, TimeML), Situation Calculus (Steedman), Frame Semantics (Fillmore), Cognitive Linguistics (Langacker, Talmy), Metaphor and Aspect (Narayanan)

Reasoning about Uncertainty: Bayes Nets (Pearl), Probabilistic Relational Models (Koller), Graphical Models (Jordan)

Reasoning about Dynamics: Dynamic Bayes Nets (Murphy), Distributed Systems (Alur, Meseguer), Control Theory (Ramadge and Wonham), Causality (Pearl)

Page 97: Semantic Inference for  Question Answering

Outline

Part III. Knowledge representation and inference
• Representing the semantics of answers
• Extended WordNet and abductive inference
• Intentional Structure and Probabilistic Metonymy
• An example of Event Structure
• Modeling relations, uncertainty and dynamics
• Inference methods and their mapping to answer types

Page 98: Semantic Inference for  Question Answering

Structured Probabilistic Inference

Page 99: Semantic Inference for  Question Answering

Probabilistic inference

Filtering
• P(X_t | o_1..t, X_1..t)
• Update the state based on the observation sequence and state set

MAP Estimation
• argmax_{h_1..h_n} P(X_t | o_1..t, X_1..t)
• Return the best assignment of values to the hypothesis variables given the observations and states

Smoothing
• P(X_{t-k} | o_1..t, X_1..t)
• Modify assumptions about previous states, given the observation sequence and state set

Projection/Prediction/Reachability
• P(X_{t+k} | o_1..t, X_1..t)
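These operations can be illustrated concretely on a tiny discrete-state model. The sketch below (Python) is a hypothetical two-state example for exposition only; it is not the tutorial's CPRM machinery, and the transition and emission numbers are made up.

import numpy as np

# Hypothetical 2-state model with binary observations (illustrative numbers only).
T = np.array([[0.8, 0.2],          # P(X_t | X_{t-1})
              [0.3, 0.7]])
E = np.array([[0.9, 0.1],          # P(o_t | X_t)
              [0.4, 0.6]])
prior = np.array([0.5, 0.5])

def filter_step(belief, obs):
    # Filtering update: propagate through T, then weight by the likelihood of obs.
    predicted = belief @ T
    updated = predicted * E[:, obs]
    return updated / updated.sum()

def filtering(observations):
    belief = prior
    for o in observations:
        belief = filter_step(belief, o)
    return belief                   # P(X_t | o_1..t)

def predict(belief, k=1):
    # Projection/prediction: propagate the filtered belief k steps with no new evidence.
    for _ in range(k):
        belief = belief @ T
    return belief                   # P(X_{t+k} | o_1..t)

def map_state(belief):
    # MAP over a single hypothesis variable is argmax; over sequences it becomes Viterbi.
    return int(belief.argmax())

obs = [1, 1, 0, 1]
b = filtering(obs)
print("filtered:", b, "MAP:", map_state(b), "2-step prediction:", predict(b, 2))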

Page 100: Semantic Inference for  Question Answering

Answer Type to Inference Method

ANSWER TYPE               INFERENCE             DESCRIPTION
Justify (Proposition)     MAP                   Proposition is part of the MAP
Ability (Agent, Act)      Filtering; Smoothing  Past/current action enabled given the current state
Prediction (State)        P; R'; MAP            Propagate current information and estimate the best new state
Hypothetical (Condition)  S, R_I                Smooth, intervene, and compute the state
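Operationally, the table can be read as a dispatch from answer type to a sequence of inference routines. The sketch below (Python) is a hypothetical illustration: the routine names are placeholders standing in for the structured probabilistic operations on the preceding slide, not an actual API.

# Hypothetical dispatch table; each routine stands in for a CPRM inference operation.
def map_estimate(model, query): ...   # MAP: best assignment of hypothesis variables
def filtering(model, query): ...      # P(X_t | o_1..t)
def smoothing(model, query): ...      # P(X_{t-k} | o_1..t)
def prediction(model, query): ...     # P(X_{t+k} | o_1..t)
def intervene(model, query): ...      # impose a condition, then recompute the state

INFERENCE_FOR_ANSWER_TYPE = {
    "Justify":      [map_estimate],              # proposition is part of the MAP
    "Ability":      [filtering, smoothing],      # action enabled given the current state
    "Prediction":   [prediction, map_estimate],  # propagate, then estimate the best new state
    "Hypothetical": [smoothing, intervene],      # condition on evidence, then intervene
}

def run_inference(model, query, answer_type):
    result = None
    for routine in INFERENCE_FOR_ANSWER_TYPE[answer_type]:
        result = routine(model, query)
    return result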

Page 101: Semantic Inference for  Question Answering

Outline

Part IV. From Ontologies to Inference
• From OWL to CPRM
• FrameNet in OWL
• FrameNet to CPRM mapping

Page 102: Semantic Inference for  Question Answering

Semantic Web

The World Wide Web (WWW) contains a large and expanding information base.
HTML is accessible to humans but does not formally describe data in a machine-interpretable form.
XML remedies this by allowing for the use of tags to describe data (e.g., disambiguating crawl).
Ontologies are useful to describe objects and their inter-relationships.
DAML+OIL (http://www.daml.org) is a markup language based on XML and RDF that is grounded in description logic and is designed to allow for ontology development, transfer, and use on the web.

Page 103: Semantic Inference for  Question Answering

Programmatic Access to the web

Web-accessible programs and devices

Page 104: Semantic Inference for  Question Answering

Knowledge Rep’n for the “Semantic Web”

Layered stack (top to bottom):
• OWL/DAML-L (Logic)
• OWL (Ontology)
• RDFS (RDF Schema)
• RDF (Resource Description Framework)
• XML Schema
• XML (Extensible Markup Language)

Page 105: Semantic Inference for  Question Answering

Knowledge Rep’n for “Semantic Web Services”

Layered stack (top to bottom):
• DAML-S (Services)
• DAML-L (Logic)
• DAML+OIL (Ontology)
• RDFS (RDF Schema)
• RDF (Resource Description Framework)
• XML Schema
• XML (Extensible Markup Language)

Page 106: Semantic Inference for  Question Answering

DAML-S: Semantic Markup for Web Services

DAML-S: A DARPA Agent Markup Language for Services
• DAML+OIL ontology for Web services:
  • well-defined semantics
  • ontologies support reuse, mapping, succinct markup, ...
• Developed by a coalition of researchers from Stanford, SRI, CMU, BBN, Nokia, and Yale, under the auspices of DARPA.
• DAML-S version 0.6 posted October 2001: http://www.daml.org/services/daml-s [DAML-S Coalition, 2001, 2002]

[Narayanan & McIlraith 2003]

Page 107: Semantic Inference for  Question Answering

DAML-S/OWL-S Compositional Primitives

[Diagram] A process is either an atomic process or a composite process; processes have inputs, (conditional) outputs, preconditions, and (conditional) effects. A composite process is composedBy control constructs such as sequence, while, if-then-else, fork, ...

Page 108: Semantic Inference for  Question Answering

PROCESS.OWL

The OWL-S Process Description

Page 109: Semantic Inference for  Question Answering

Implementation

DAML-S translation to the modeling environment KarmaSIM [Narayanan, 97] (http://www.icsi.berkeley.edu/~snarayan)

Basic Program:

Input: DAML-S description of Events

Output: Network Description of Events in KarmaSIM

Procedure:
• Recursively construct a sub-network for each control construct; bottom out at atomic events.
• Construct a net for each atomic event.
• Return the network.
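A minimal sketch of this recursive procedure (Python), with hypothetical stand-ins for the parsed DAML-S description and the KarmaSIM network; the actual translator and network format are not shown here.

from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class AtomicProcess:                 # hypothetical parsed form of an atomic DAML-S process
    name: str

@dataclass
class CompositeProcess:              # composite process built from a control construct
    construct: str                   # "sequence", "if-then-else", "while", "fork", ...
    components: List["Process"]

Process = Union[AtomicProcess, CompositeProcess]

@dataclass
class Network:                       # stand-in for a KarmaSIM net fragment
    label: str
    subnets: List["Network"] = field(default_factory=list)

def translate(process: Process) -> Network:
    # Recursively build a sub-network per control construct; bottom out at atomic events.
    if isinstance(process, AtomicProcess):
        return Network(label=process.name)                 # net for an atomic event
    return Network(label=process.construct,
                   subnets=[translate(p) for p in process.components])

example = CompositeProcess("sequence",
                           [AtomicProcess("request"),
                            CompositeProcess("if-then-else",
                                             [AtomicProcess("approve"),
                                              AtomicProcess("reject")])])
print(translate(example))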

Page 110: Semantic Inference for  Question Answering
Page 111: Semantic Inference for  Question Answering
Page 113: Semantic Inference for  Question Answering
Page 114: Semantic Inference for  Question Answering

Outline

Part IV. From Ontologies to Inference
• From OWL to CPRM
• FrameNet in OWL
• FrameNet to CPRM mapping

Page 115: Semantic Inference for  Question Answering

The FrameNet Project

PI: C Fillmore (ICSI)
Co-PIs: S Narayanan (ICSI, SRI), D Jurafsky (U Colorado), J M Gawron (San Diego State U)
Staff: C Baker (Project Manager), B Cronin (Programmer), C Wooters (Database Designer)
Page 116: Semantic Inference for  Question Answering

Frames and Understanding

Hypothesis: People understand things by performing mental operations on what they already know. Such knowledge is describable in terms of information packets called frames.
Page 117: Semantic Inference for  Question Answering

FrameNet in the Larger Context

The long-term goal is to reason about the world in a way that humans understand and agree with.

Such a system requires a knowledge representation that includes the level of frames.

FrameNet can provide such knowledge for a number of domains.

FrameNet representations complement ontologies and lexicons.

Page 118: Semantic Inference for  Question Answering

The core work of FrameNet

1. characterize frames
2. find words that fit the frames
3. develop descriptive terminology
4. extract sample sentences
5. annotate selected examples
6. derive "valence" descriptions

Page 119: Semantic Inference for  Question Answering

The Core Data

The basic data on which FrameNet descriptions are based take the form of a collection of annotated sentences, each coded for the combinatorial properties of one word in it. The annotation is done manually, but several steps are computer-assisted.

Page 120: Semantic Inference for  Question Answering

Types of Words / Frames

• events
• artifacts, built objects
• natural kinds, parts and aggregates
• terrain features
• institutions, belief systems, practices
• space, time, location, motion
• etc.

Page 121: Semantic Inference for  Question Answering

Event Frames

Event frames have temporal structure, and generally have constraints on what precedes them, what happens during them, and what state the world is in once the event has been completed.

Page 122: Semantic Inference for  Question Answering

Sample Event Frame: Commercial Transaction

Initial state:
  Vendor has Goods, wants Money
  Customer wants Goods, has Money
Transition:
  Vendor transmits Goods to Customer
  Customer transmits Money to Vendor
Final state:
  Vendor has Money
  Customer has Goods

Page 123: Semantic Inference for  Question Answering

Sample Event Frame: Commercial Transaction

Initial state:
  Vendor has Goods, wants Money
  Customer wants Goods, has Money
Transition:
  Vendor transmits Goods to Customer
  Customer transmits Money to Vendor
Final state:
  Vendor has Money
  Customer has Goods

(It’s a bit more complicated than that.)

Page 124: Semantic Inference for  Question Answering

Partial Wordlist for Commercial Transactions

Verbs: pay, spend, cost, buy, sell, charge

Nouns: cost, price, payment

Adjectives: expensive, cheap

Page 125: Semantic Inference for  Question Answering

Meaning and Syntax

The various verbs that evoke this frame introduce the elements of the frame (the identities of the buyer, seller, goods, and money) in different ways. Information expressed in sentences containing these verbs occurs in different places in the sentence depending on the verb.

Page 126: Semantic Inference for  Question Answering

[Diagram: BUY. Customer = subject, Goods = object, Vendor = PP "from", Money = PP "for".]

She bought some carrots from the greengrocer for a dollar.

Page 127: Semantic Inference for  Question Answering

[Diagram: PAY. Customer = subject, Money = object, Vendor = PP "to", Goods = PP "for".]

She paid a dollar to the greengrocer for some carrots.

Page 128: Semantic Inference for  Question Answering

[Diagram: PAY (ditransitive). Customer = subject, Vendor = indirect object, Money = direct object, Goods = PP "for".]

She paid the greengrocer a dollar for the carrots.
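The contrast between the buy and pay patterns above can be summarized as per-verb mappings from frame elements to grammatical positions. The sketch below (Python) is a hypothetical illustration of such a valence table; FrameNet's actual valence descriptions are derived from the annotated sentences rather than written by hand like this.

# Hypothetical valence table: frame element -> grammatical realization, per verb.
VALENCE = {
    "buy": {"Customer": "subject", "Goods": "object",
            "Vendor": "PP[from]",  "Money": "PP[for]"},
    "pay": {"Customer": "subject", "Money": "object",
            "Vendor": "PP[to] / indirect object", "Goods": "PP[for]"},
}

def realization(verb, frame_element):
    # Where does this verb express this frame element in the sentence?
    return VALENCE[verb][frame_element]

# "She bought some carrots from the greengrocer for a dollar."
assert realization("buy", "Vendor") == "PP[from]"
# "She paid a dollar to the greengrocer for some carrots."
assert realization("pay", "Money") == "object"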

Page 129: Semantic Inference for  Question Answering

FrameNet Product

For every target word, describe the frames or conceptual structures which underlie it, and annotate example sentences that cover the ways in which information from the associated frames is expressed in these sentences.

Page 130: Semantic Inference for  Question Answering

Complex Frames

With Criminal_process we have, for example, sub-frame relations (one frame is a component of a larger, more abstract frame) and temporal relations (one process precedes another).

Page 131: Semantic Inference for  Question Answering
Page 132: Semantic Inference for  Question Answering

FrameNet Entities and Relations

• Frames: Background, Lexical
• Frame Elements (Roles)
• Binding Constraints: Identify
• ISA(x:Frame, y:Frame)
• SubframeOf(x:Frame, y:Frame)
• Subframe Ordering: precedes
• Annotation

Page 133: Semantic Inference for  Question Answering

A DAML+OIL Frame Class

<daml:Class rdf:ID="Frame">
  <rdfs:comment> The most general class </rdfs:comment>
  <daml:unionOf rdf:parseType="daml:collection">
    <daml:Class rdf:about="#BackgroundFrame"/>
    <daml:Class rdf:about="#LexicalFrame"/>
  </daml:unionOf>
</daml:Class>

<daml:ObjectProperty rdf:ID="Name">
  <rdfs:domain rdf:resource="#Frame"/>
  <rdfs:range rdf:resource="&rdf-schema;#Literal"/>
</daml:ObjectProperty>

Page 134: Semantic Inference for  Question Answering

DAML+OIL Frame Element

<daml:ObjectProperty rdf:ID="role">
  <rdfs:domain rdf:resource="#Frame"/>
  <rdfs:range rdf:resource="&daml;#Thing"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="frameElement">
  <daml:samePropertyAs rdf:resource="#role"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="FE">
  <daml:samePropertyAs rdf:resource="#role"/>
</daml:ObjectProperty>

Page 135: Semantic Inference for  Question Answering

FE Binding Relation

<daml:ObjectProperty rdf:ID="bindingRelation"> <rdf:comment> See http://www.daml.org/services

</rdf:comment> <rdfs:domain rdf:resource="#Role"/> <rdfs:range rdf:resource="#Role"/></daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="identify"> <rdfs:subPropertyOf rdf:resource="#bindingRelation"/> <rdfs:domain rdf:resource="#Role"/> <daml-s:sameValuesAs rdf:resource="#rdfs:range"/></daml:ObjectProperty>

Page 136: Semantic Inference for  Question Answering

Subframes and Ordering

<daml:ObjectProperty rdf:ID="subFrameOf"> <rdfs:domain rdf:resource="#Frame"/> <rdfs:range rdf:resource="#Frame"/></daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="precedes"> <rdfs:domain rdf:resource="#Frame"/> <rdfs:range rdf:resource="#Frame"/></daml:ObjectProperty>

Page 137: Semantic Inference for  Question Answering

The Criminal Process Frame

Frame Element   Description
Court           The court where the process takes place
Defendant       The charged individual
Judge           The presiding judge
Prosecution     The attorneys prosecuting the defendant
Defense         The attorneys defending the defendant

Page 138: Semantic Inference for  Question Answering

The Criminal Process Frame in DAML+OIL

<daml:Class rdf:ID="CriminalProcess"> <daml:subClassOf rdf:resource="#BackgroundFrame"/></daml:Class>

<daml:Class rdf:ID="CP"> <daml:sameClassAs rdf:resource="#CriminalProcess"/></daml:Class>

Page 139: Semantic Inference for  Question Answering

DAML+OIL Representation of the Criminal Process Frame Elements

<daml:ObjectProperty rdf:ID="court"> <daml:subPropertyOf rdf:resource="#FE"/> <daml:domain rdf:resource="#CriminalProcess"/> <daml:range rdf:resource="&CYC;#Court-Judicial"/></daml:ObjectProperty>

<daml:ObjectProperty rdf:ID="defense"> <daml:subPropertyOf rdf:resource="#FE"/> <daml:domain rdf:resource="#CriminalProcess"/> <daml:range rdf:resource="&SRI-IE;#Lawyer"/></daml:ObjectProperty>

Page 140: Semantic Inference for  Question Answering

FE Binding Constraints

<daml:ObjectProperty rdf:ID="prosecutionConstraint"> <daml:subPropertyOf rdf:resource="#identify"/> <daml:domain rdf:resource="#CP.prosecution"/> <daml-s:sameValuesAs rdf:resource="#Trial.prosecution"/></daml:ObjectProperty>

• The idenfication contraints can be between • Frames and Subframe FE’s.• Between Subframe FE’s

• DAML does not support the dot notation for paths.

Page 141: Semantic Inference for  Question Answering

Criminal Process Subframes

<daml:Class rdf:ID="Arrest">
  <rdfs:comment> A subframe </rdfs:comment>
  <rdfs:subClassOf rdf:resource="#LexicalFrame"/>
</daml:Class>

<daml:Class rdf:ID="Arraignment">
  <rdfs:comment> A subframe </rdfs:comment>
  <rdfs:subClassOf rdf:resource="#LexicalFrame"/>
</daml:Class>

<daml:ObjectProperty rdf:ID="arraignSubFrame">
  <rdfs:subPropertyOf rdf:resource="#subFrameOf"/>
  <rdfs:domain rdf:resource="#CP"/>
  <rdfs:range rdf:resource="#Arraignment"/>
</daml:ObjectProperty>

Page 142: Semantic Inference for  Question Answering

Specifying Subframe Ordering

<daml:Class rdf:about="#Arrest">
  <daml:subClassOf>
    <daml:Restriction>
      <daml:onProperty rdf:resource="#precedes"/>
      <daml:hasClass rdf:resource="#Arraignment"/>
    </daml:Restriction>
  </daml:subClassOf>
</daml:Class>

Page 143: Semantic Inference for  Question Answering

DAML+OIL CP Annotations

<fn:Annotation>
  <tpos> "36352897" </tpos>
  <frame rdf:about="&fn;Arrest">
    <time> In July last year </time>
    <authorities> a German border guard </authorities>
    <target> apprehended </target>
    <suspect> two Irishmen with Kalashnikov assault rifles. </suspect>
  </frame>
</fn:Annotation>

Page 144: Semantic Inference for  Question Answering

Outline

Part IV. From Ontologies to Inference
• From OWL to CPRM
• FrameNet in OWL
• FrameNet to CPRM mapping

Page 145: Semantic Inference for  Question Answering

Representing Event Frames

At the computational level, we use a structured event representation of event frames that formally specifies:
• The frame
• Frame Elements and filler types
• Constraints and role bindings
• Frame-to-Frame relations: Subcase, Subevent

Page 146: Semantic Inference for  Question Answering

Events and actions

schema Event
  roles
    before : Phase
    transition : Phase
    after : Phase
    nucleus
  constraints
    transition :: nucleus

schema Action
  evokes Event as e
  roles
    actor : Entity
    undergoer : Entity
    self ↔ e.nucleus

[Diagram: before → transition → after; the Action's actor and undergoer attach to the nucleus of the evoked Event.]

Page 147: Semantic Inference for  Question Answering

The Commercial-Transaction schema

schema Commercial-Transaction
  subcase of Exchange
  roles
    customer ↔ participant1
    vendor ↔ participant2
    money ↔ entity1 : Money
    goods ↔ entity2
    goods-transfer ↔ transfer1
    money-transfer ↔ transfer2
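Read computationally, the schema amounts to typed role bindings over the parent Exchange frame. The dataclass sketch below (Python) is a hypothetical rendering of that structure for illustration; it is not the actual ECG/CPRM implementation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Exchange:                          # parent schema roles
    participant1: Optional[str] = None
    participant2: Optional[str] = None
    entity1: Optional[str] = None        # constrained to Money in the subcase
    entity2: Optional[str] = None
    transfer1: Optional[str] = None
    transfer2: Optional[str] = None

class CommercialTransaction(Exchange):
    # Subcase of Exchange: CT role names are bound to the parent's roles.
    customer       = property(lambda self: self.participant1)
    vendor         = property(lambda self: self.participant2)
    money          = property(lambda self: self.entity1)
    goods          = property(lambda self: self.entity2)
    goods_transfer = property(lambda self: self.transfer1)
    money_transfer = property(lambda self: self.transfer2)

ct = CommercialTransaction(participant1="She", participant2="greengrocer",
                           entity1="a dollar", entity2="carrots")
assert ct.customer == "She" and ct.money == "a dollar"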

Page 148: Semantic Inference for  Question Answering

Implementation

DAML-S translation to the modeling environment KarmaSIM [Narayanan, 97] (http://www.icsi.berkeley.edu/~snarayan)

Basic Program:

Input: DAML-S description of Frame relations

Output: Network Description of Frames in KarmaSIM

Procedure:
• Recursively construct a sub-network for each control construct; bottom out at atomic frames.
• Construct a net for each atomic frame.
• Return the network.

Page 149: Semantic Inference for  Question Answering
Page 150: Semantic Inference for  Question Answering
Page 151: Semantic Inference for  Question Answering
Page 152: Semantic Inference for  Question Answering
Page 153: Semantic Inference for  Question Answering
Page 154: Semantic Inference for  Question Answering
Page 155: Semantic Inference for  Question Answering

Outline

Part V. Results of Event Structure Inference for QA
• AnswerBank
• Current results for Inference Type
• Current results for Answer Structure

Page 156: Semantic Inference for  Question Answering

AnswerBank

AnswerBank is a collection of over 1,200 QA annotations from the AQUAINT CNS corpus.
Questions and answers cover the different domains of the CNS data.
Questions and answers are POS tagged and syntactically parsed.
Question and answer predicates are annotated with PropBank arguments and FrameNet tags (when available).
FrameNet is annotating CNS data with frame information for use by the AQUAINT QA community.
We are planning to add more semantic information, including temporal and aspectual information (TimeML+) and information about event relations and figurative uses.
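To make the annotation layers concrete, here is a hypothetical rendering of what a single AnswerBank record might look like; the field names, the PropBank role labels, and the frame name are illustrative, not the corpus's actual format.

# Illustrative shape of one AnswerBank QA annotation (not the real schema).
answerbank_record = {
    "question": "How can a biological weapons program be detected?",
    "answer_sentences": [{
        "text": ("US intelligence believes Iraq still has stockpiles of chemical "
                 "and biological weapons and guided missiles, which it hid from "
                 "the UN inspectors."),
        "pos_tags": "...",                      # POS-tagging layer
        "parse": "...",                         # syntactic parse layer
        "propbank": {"predicate": "hide",       # PropBank argument layer (labels illustrative)
                     "ARG0": "it",
                     "ARG1": "which",
                     "ARG2": "from the UN inspectors"},
        "framenet": {"frame": "Hiding_objects", # FrameNet layer, when a frame is available
                     "Agent": "it",
                     "Hidden_object": "which"},
    }],
}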

Page 157: Semantic Inference for  Question Answering

Event Simulation

[Architecture diagram: Retrieved Documents feed Predicate Extraction; FrameNet Frames and OWL/OWL-S Topic Ontologies feed Model Parameterization; the resulting CONTEXT <Pred(args), Topic Model, Answer Type> triggers simulation over the PRM and yields a <PRM Update>.]
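Read as a pipeline, the diagram amounts to a few composed stages. The outline below (Python) is a hypothetical sketch of that flow; the function names are placeholders, not the system's actual components.

# Hypothetical outline of the event-simulation pipeline sketched above.
def extract_predicates(retrieved_documents):
    ...   # predicate extraction -> pred(args) structures

def parameterize_model(predicates, framenet_frames, topic_ontologies):
    ...   # build/parameterize the CPRM from FrameNet frames and OWL/OWL-S topic ontologies

def simulate(prm, predicates, topic_model, answer_type):
    ...   # trigger simulation over the PRM; returns a PRM update / answer structure

def answer_question(documents, frames, ontologies, topic_model, answer_type):
    preds = extract_predicates(documents)
    prm = parameterize_model(preds, frames, ontologies)
    return simulate(prm, preds, topic_model, answer_type)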

Page 158: Semantic Inference for  Question Answering

Answer Types for complex questions in AnswerBank

ANSWER TYPE               EXAMPLE                                                                              NUMBER
Justify (Proposition)     What is the evidence that Iraq has WMD?                                              89
Ability (Agent, Act)      How can a Biological Weapons Program be detected?                                    71
Prediction (State)        What were the possible ramifications of India's launch of the Prithvi missile?       63
Hypothetical (Condition)  If Musharraf is removed from power, will Pakistan become a militant Islamic State?   62

Page 159: Semantic Inference for  Question Answering

Answer Type to Inference Method

ANSWER TYPE               INFERENCE             DESCRIPTION
Justify (Proposition)     MAP                   Proposition is part of the MAP
Ability (Agent, Act)      Filtering; Smoothing  Past/current action enabled given the current state
Prediction (State)        P; R'; MAP            Propagate current information and estimate the best new state
Hypothetical (Condition)  S, R_I                Smooth, intervene, and compute the state

Page 160: Semantic Inference for  Question Answering

Outline

Part V. Results of Event Structure Inference for QA
• AnswerBank
• Current results for Inference Type
• Current results for Answer Structure

Page 161: Semantic Inference for  Question Answering

AnswerBank Data

We used 80 QA annotations from AnswerBank.
Questions were of the four complex types: Justification, Ability, Prediction, Hypothetical.
Answers were combined from multiple sentences (average 4.3) and multiple annotations (average 2.1).
CNS domains covered: WMD-related (54%), nuclear theft (25%), India's missile program (21%).

Page 162: Semantic Inference for  Question Answering

Building Models

Gold Standard: From the hand-annotated data in the CNS corpus, we manually built CPRM domain models for inference.

Semantic Web based: From FrameNet frames and from semantic web ontologies in OWL (SUMO-based, OpenCyc, and others), we built CPRM models semi-automatically.

Page 163: Semantic Inference for  Question Answering

Percent correct by inference type

[Bar chart: % correct (compared to the gold standard) by inference type (Justification, Prediction, Ability, Hypothetical), comparing the OWL-based Domain Model with the model manually generated from CNS data. Reported bar values include 51, 63, 66, 73, 83, and 87.]

Page 164: Semantic Inference for  Question Answering

Event Structure Inferences

For the annotations, we classified complex event structure inferences as:

Aspectual
• Stages of events, viewpoints, temporal relations (such as start(ev1, ev2), interrupt(ev1, ev2))

Action-Based
• Resources (produce, consume, lock), preconditions, maintenance conditions, effects

Metaphoric
• Event Structure Metaphor (ESM): events and predications (Motion => Action), objects (Motion.Mover => Action.Actor), parameters (Motion.speed => Action.rateOfProgress)

Page 165: Semantic Inference for  Question Answering

ANSWER: Evidence-Combined: Pointer to Text Source:

A1: In recent months, Milton Leitenberg, an expert on biological weapons, has been looking at this murkiest and most dangerous corner of Saddam Hussein's armory.

A2: He says a series of reports add up to indications that Iraq may be trying to develop a new viral agent, possibly in underground laboratories at a military complex near Baghdad where Iraqis first chased away inspectors six years ago.

A3: A new assessment by the United Nations suggests Iraq still has chemical and biological weapons - as well as the rockets to deliver them to targets in other countries.

A4: The UN document says Iraq may have hidden a number of Scud missiles, as well as launchers and stocks of fuel.

A5: US intelligence believes Iraq still has stockpiles of chemical and biological weapons and guided missiles, which it hid from the UN inspectors.

Content: Biological Weapons Program:

develop(Iraq, Viral_Agent(instance_of: new)) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Inspection terminated; Status: Attempt ongoing Likelihood: Medium Confirmability: difficult, obtuse, hidden

possess(Iraq, Chemical and Biological Weapons) Justification: POSSESSION Schema Previous (Intent and Ability): Prevent(ability, Inspection); Status: Hidden from Inspectors Likelihood: Medium

possess(Iraq, delivery systems(type : rockets; target: other countries)) Justification: POSSESSION Schema Previous (Intent and Ability): Hidden from Inspectors; Status: Ongoing Likelihood: Medium

Answer Structure

Page 166: Semantic Inference for  Question Answering

Content of Inferences

Component        Number   F-Score (Manual)   F-Score (OWL)
Aspectual        375      .74                .65
Action-Feature   459      .62                .45
Metaphor         149      .70                .62

Page 167: Semantic Inference for  Question Answering

Conclusion

Answering complex questions requires semantic representations at multiple levels:
• NE and extraction-based Predicate Argument Structures
• Frame, Topic and Domain Models

All these representations should be capable of supporting inference about relational structures, uncertain information, and dynamic context.

Both semantic extraction techniques and structured probabilistic KR and inference methods have matured to the point that we understand the various algorithms and their properties.

Flexible architectures that embody these KR and inference techniques and make use of the expanding linguistic and ontological resources (such as on the Semantic Web) point the way to the future of semantically based QA systems!

Page 168: Semantic Inference for  Question Answering

References (URLs)

Semantic Resources
• FrameNet: http://www.icsi.berkeley.edu/framenet (papers on FrameNet and computational modeling efforts using FrameNet can be found here)
• PropBank: http://www.cis.upenn.edu/~ace/
• Gildea's Verb Index: http://www.cs.rochester.edu/~gildea/Verbs/ (links FrameNet, PropBank, and VerbNet)

Probabilistic KR (PRM)
• http://robotics.stanford.edu/~koller/papers/lprm.ps (Learning PRM)
• http://www.eecs.harvard.edu/~avi/Papers/thesis.ps.gz (Avi Pfeffer's PRM Stanford thesis)

Dynamic Bayes Nets
• http://www.ai.mit.edu/~murphyk/Thesis/thesis.pdf (Kevin Murphy's Berkeley DBN thesis)

Event Structure in Language
• http://www.icsi.berkeley.edu/~snarayan/thesis.pdf (Narayanan's Berkeley PhD thesis on models of metaphor and aspect)
• ftp://ftp.cis.upenn.edu/pub/steedman/temporality/temporality.ps.gz (Steedman's article on temporality, with links to previous work on aspect)
• http://www.icsi.berkeley.edu/NTL (publications on Cognitive Linguistics and computational models of cognitive linguistic phenomena can be found here)