© Johan Bos April 2008 Question Answering (QA) Lecture 1 What is QA? Query Log Analysis Challenges in QA History of QA • System Architecture • Methods System Evaluation • State-of-the-art Lecture 2 Question Analysis • Background Knowledge Answer Typing Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking

Dec 25, 2015


Damon Baker
Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking

Pronto QA System

[Figure: architecture diagram of the Pronto QA system. Labels shown: question, parsing, ccg, boxing, drs, knowledge, WordNet, NomLex, answer typing, query, Indri, Indexed Documents, answer extraction, answer selection, answer reranking, answer.]

[Figure: the Pronto architecture diagram, repeated and annotated “Lecture 2”.]

Question Answering (QA)

Lecture 2

• Question Analysis

• Background Knowledge

• Answer Typing

Question Analysis – Why?

• The aim of QA is to output answers, not documents

• We need question analysis to
  – Determine the type of answer that we try to find
  – Estimate the number of answers that we want to return
  – Calculate the probability that an answer is correct

Natural Language Processing

• We need ways to automate the process of manipulating natural language
  – Punctuation
  – The way words are composed
  – The relationship between words
  – The structure of phrases
  – Represent meaning of phrases

• This is where NLP comes in!
  – (NLP = Natural Language Processing)

How to use NLP tools?

• There is a large set of tools available on the web, most of them free for research

• Examples of integrated text processing environments:
  – GATE (University of Sheffield)
  – TTT (University of Edinburgh)
  – LingPipe
  – C&C (used by the Pronto QA system)

• For a general overview of NLP tools, see http://registry.dfki.de/

Architecture of PRONTO

[Figure: the Pronto architecture diagram, repeated.]

Question Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

• Named entity recognition

• Anaphora resolution

Tokenisation

• Tokenisation is the task of splitting words from punctuation
  – Semicolons, colons ; :
  – Exclamation marks, question marks ! ?
  – Commas and full stops , .
  – Quotes “ ‘ `

• Tokens are normally split by spaces
  – In the following slides, we use | to mark token boundaries

Tokenisation: Example 1

• Input (9 tokens):

When was the Buckingham Palace built in London, England?

When | was | the | Buckingham | Palace | built | in | London, | England?

• Output (11 tokens):

When | was | the | Buckingham | Palace | built | in | London | , | England | ?

Tokenisation: Example 2

• Input (7 tokens):

What year did "Snow White" come out?

What | year | did | "Snow | White" | come | out?

• Output (10 tokens):

What | year | did | " | Snow | White | " | come | out | ?
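The two examples can be reproduced with a few lines of code. The sketch below is only an illustration under stated assumptions: the regular expression is our own choice and is not the tokeniser used in Pronto. It treats every punctuation mark as a separate token.

```python
import re

# Assumed pattern: a run of word characters, or any single character
# that is neither a word character nor whitespace (one punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenise(text):
    """Split a string into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

example_1 = "When was the Buckingham Palace built in London, England?"
example_2 = 'What year did "Snow White" come out?'

for example in (example_1, example_2):
    # Whitespace splitting gives the input token count,
    # tokenise() gives the output token count.
    print(len(example.split()), "->", len(tokenise(example)))
    print(" | ".join(tokenise(example)))
```

A pattern this simple also splits "U.S." and "don't" into pieces, which is why combined words and abbreviations need the special treatment discussed on the following slides.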

Tokenisation: combined words

• Combined words are split
  – I’d → I | ’d
  – country’s → country | ’s
  – won’t → wo | n’t
  – “don’t!” → “ | do | n’t | ! | ”

• Some Italian examples
  – gliel’ha detto → glie | l’ | ha | detto
  – posso prenderlo → posso | prender | lo
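The English splits above can be sketched with a single regular expression. The clitic list below is an assumption made for illustration (it is not taken from the C&C tools) and it does not cover the Italian examples, which need language-specific rules.

```python
import re

# Assumed English-only rules: "n't" is split off as one token; the other
# clitics ('s, 'd, 'll, 're, 've, 'm) are split at the apostrophe.
CLITIC = re.compile(r"(?i)(n['’]t|['’](?:s|d|ll|re|ve|m))$")

def split_clitic(word):
    """Return [stem, clitic] if the word ends in a known clitic, else [word]."""
    match = CLITIC.search(word)
    if match and match.start() > 0:
        return [word[:match.start()], match.group(1)]
    return [word]

for word in ["I’d", "country’s", "won’t", "don’t", "Palace"]:
    print(word, "→", " | ".join(split_clitic(word)))
```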

Difficulties with tokenisation

• Abbreviations, acronyms
  – When was the U.S. invasion of Haiti?

• This is particularly hard when the abbreviation or acronym is the last word of a sentence
  – Look at the next word: if it starts in uppercase, then assume it is the end of the sentence
  – But think of cases such as Mr. Jones
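The uppercase heuristic and its "Mr. Jones" exception can be written down directly. This is a minimal sketch; the list of titles is a hypothetical stand-in for a real abbreviation lexicon.

```python
# Assumed list of abbreviations that are normally followed by a name,
# so a capitalised next word does not signal a new sentence.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}

def ends_sentence(token, next_token):
    """Guess whether a token ending in a full stop closes the sentence."""
    if not token.endswith("."):
        return False
    if next_token is None:            # last token of the text
        return True
    if token in TITLES:               # the "Mr. Jones" case
        return False
    return next_token[:1].isupper()   # uppercase next word: assume sentence end

print(ends_sentence("U.S.", "invasion"))  # False
print(ends_sentence("U.S.", "Then"))      # True
print(ends_sentence("Mr.", "Jones"))      # False
```

The heuristic still fails on inputs such as "U.S. Army", where an abbreviation inside a sentence is followed by a capitalised word.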

Why is tokenisation important?

• Required for all subsequent stages of processing
  – Parsing
  – Named entity recognition
  – Lemmatisation
  – To look up a word in an electronic dictionary (such as WordNet)

Question Analysis

• Tokenisation

• Part of speech tagging

• Named Entity Recognition

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

Traditional parts of speech

• Verb

• Noun

• Pronoun

• Adjective

• Adverb

• Preposition

• Conjunction

• Interjection

Parts of speech in NLP

CLAWS1 (132 tags)

Examples:

NN     singular common noun (boy, pencil, ...)
NN$    genitive singular common noun (boy's, parliament's, ...)
NNP    singular common noun with word-initial capital (Austrian, American, Sioux, Eskimo, ...)
NNP$   genitive singular common noun with word-initial capital (Sioux', Eskimo's, Austrian's, American's, ...)
NNPS   plural common noun with word-initial capital (Americans, ...)
NNPS$  genitive plural common noun with word-initial capital (Americans', ...)
NNS    plural common noun (pencils, skeletons, days, weeks, ...)
NNS$   genitive plural common noun (boys', weeks', ...)
NNU    abbreviated unit of measurement unmarked for number (in, cc, kg, ...)

Penn Treebank (45 tags)

Examples:

JJ adjective (green, …)

JJR adjective, comparative (greener,…)

JJS adjective, superlative (greenest, …)

MD modal (could, will, …)

NN noun, singular or mass (table, …)

NNS noun plural (tables, …)

NNP proper noun, singular (John, …)

NNPS proper noun, plural (Vikings, …)

PDT predeterminer (both the boys)

POS possessive ending (friend's)

PRP personal pronoun (I, he, it, …)

PRP$ possessive pronoun (my, his, …)

RB adverb (however, usually, naturally, here, good, …)

RBR adverb, comparative (better, …)

POS tagged example

What year did " Snow White " come out ?

What   WP
year   NN
did    VBD
"      “
Snow   NNP
White  NNP
"      ”
come   VB
out    IN
?      .

Why is POS-tagging important?

• To disambiguate words

• For instance, to distinguish “book” used as a noun from “book” used as a verb
  – Where can I find a book on cooking?
  – Where can I book a room?

• Prerequisite for further processing stages, such as parsing

Question Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

Lemmatisation

• Lemmatising means
  – grouping morphological variants of words under a single headword

• For example, you could group the words am, was, are, is, were, and been together under the word be

Lemmatisation

• Using linguistic terminology, the variants taken together form the lemma of a lexeme

• Lexeme: a “lexical unit”, an abstraction over specific constructions

• Other examples:

  – dying, die, died, dies → die
  – car, cars → car
  – man, men → man
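In its simplest form, lemmatisation is a table lookup from word forms to headwords. The sketch below uses only the examples from the slides; the table and the function name are illustrative assumptions, and a real lemmatiser would rely on a full morphological lexicon and on POS tags.

```python
# Toy lookup table built from the examples on the slides.
LEMMAS = {
    "am": "be", "was": "be", "are": "be", "is": "be", "were": "be", "been": "be",
    "dying": "die", "died": "die", "dies": "die",
    "cars": "car",
    "men": "man",
}

def lemmatise(token):
    """Map a word form to its headword; unknown forms are returned unchanged."""
    word = token.lower()
    return LEMMAS.get(word, word)

print([lemmatise(t) for t in ["Men", "were", "dying", "in", "cars"]])
```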

Question Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

What is Parsing

• Parsing is the process of assigning a syntactic structure to a sequence of words

• The syntactic structure is defined using a grammar

• A grammar consists of a set of symbols (terminal and non-terminal symbols) and production rules (grammar rules)

• The lexicon is built over the terminal symbols (i.e., the words)

Syntactic Categories

• The non-terminal symbols correspond to syntactic categories
  – Det (determiner)
  – N (noun)
  – IV (intransitive verb)
  – TV (transitive verb)
  – PN (proper name)
  – Prep (preposition)
  – NP (noun phrase): the car
  – PP (prepositional phrase): at the table
  – VP (verb phrase): saw a car
  – S (sentence): Mia likes Vincent

Example Grammar

Lexicon

Det: which, a, the,…

N: rock, singer, …

IV: die, walk, …

TV: kill, write,…

PN: John, Lithium, …

Prep: on, from, to, …

Grammar Rules

S → NP VP

NP → Det N

NP → PN

N → N N

N → N PP

VP → TV NP

VP → IV

PP → Prep NP

VP → VP PP

The Parser

• A parser automates the process of parsing

• The input of the parser is a string of words (annotated with POS-tags)

• The output of a parser is a parse tree, connecting all the words

• The way a parse tree is constructed is also called a derivation

Derivation Example

Which rock singer wrote Lithium

Lexical stage

Det    N     N       TV     PN
Which  rock  singer  wrote  Lithium

Use rule: NP → Det N

[NP Which rock] singer wrote Lithium

Use rule: NP → PN

[NP Which rock] singer wrote [NP Lithium]

Use rule: VP → TV NP

[NP Which rock] singer [VP wrote [NP Lithium]]

Backtracking

No rule can attach the remaining N (singer) to the NP built over "Which rock", so that NP is undone:

Which rock singer [VP wrote [NP Lithium]]

Use rule: N → N N

Which [N rock singer] [VP wrote [NP Lithium]]

Use rule: NP → Det N

[NP Which [N rock singer]] [VP wrote [NP Lithium]]

Use rule: S → NP VP

[S [NP Which [N rock singer]] [VP wrote [NP Lithium]]]
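The derivation can be automated with a small chart parser. The following is a sketch under stated assumptions, not the parser used in Pronto: it is a CKY-style bottom-up parser over the example grammar, and the form "wrote" is added to the lexicon by hand because the slides list only "write" and this sketch does no lemmatisation.

```python
from itertools import product

# Lexicon from the example grammar ("wrote" added as an assumption).
LEXICON = {
    "which": "Det", "a": "Det", "the": "Det",
    "rock": "N", "singer": "N",
    "die": "IV", "walk": "IV",
    "kill": "TV", "write": "TV", "wrote": "TV",
    "john": "PN", "lithium": "PN",
    "on": "Prep", "from": "Prep", "to": "Prep",
}
UNARY = {"PN": "NP", "IV": "VP"}          # NP → PN, VP → IV
BINARY = {                                # right-hand side → left-hand side
    ("NP", "VP"): "S", ("Det", "N"): "NP", ("N", "N"): "N",
    ("N", "PP"): "N", ("TV", "NP"): "VP", ("Prep", "NP"): "PP",
    ("VP", "PP"): "VP",
}

def add(cell, cat, tree):
    """Store the first tree found for a category, then apply unary rules."""
    if cat in cell:
        return
    cell[cat] = tree
    if cat in UNARY:
        add(cell, UNARY[cat], (UNARY[cat], tree))

def parse(words):
    """Return one parse tree for S as nested tuples, or None."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):          # lexical stage
        chart[i, i + 1] = {}
        cat = LEXICON[word.lower()]
        add(chart[i, i + 1], cat, (cat, word))
    for width in range(2, n + 1):             # combine adjacent spans
        for start in range(n - width + 1):
            end = start + width
            cell = chart[start, end] = {}
            for mid in range(start + 1, end):
                left, right = chart[start, mid], chart[mid, end]
                for l_cat, r_cat in product(list(left), list(right)):
                    parent = BINARY.get((l_cat, r_cat))
                    if parent:
                        add(cell, parent, (parent, left[l_cat], right[r_cat]))
    return chart[0, n].get("S")

tree = parse("Which rock singer wrote Lithium".split())
print(tree)
```

A chart parser needs no explicit backtracking: the dead-end NP over "Which rock" is stored in the chart but is never used in a complete analysis.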

Wide coverage parsers

• Normally expect tokenised and POS-tagged input

• Examples of wide-coverage parsers:
  – Charniak parser
  – Collins parser
  – RASP (Carroll & Briscoe)
  – CCG parser (Clark & Curran, used in Pronto)

Output of the C&C parser

ba('S[wq]',
   fa('S[wq]',
      fa('S[wq]/(S[q]/PP)',
         fc('(S[wq]/(S[q]/PP))/N',
            lf(1,'(S[wq]/(S[q]/PP))/(S[wq]/(S[q]/NP))'),
            lf(2,'(S[wq]/(S[q]/NP))/N')),
         lf(3,'N')),
      fc('S[q]/PP',
         fa('S[q]/(S[b]\NP)',
            lf(4,'(S[q]/(S[b]\NP))/NP'),
            lex('N','NP',
                lf(5,'N'))),
         lf(6,'(S[b]\NP)/PP'))),
   lf(7,'S[wq]\S[wq]')).

w(1, 'For', for, 'IN', 'O', '(S[wq]/(S[q]/PP))/(S[wq]/(S[q]/NP))').
w(2, which, which, 'WDT', 'O', '(S[wq]/(S[q]/NP))/N').
w(3, newspaper, newspaper, 'NN', 'O', 'N').
w(4, does, do, 'VBZ', 'O', '(S[q]/(S[b]\NP))/NP').
w(5, 'Krugman', krugman, 'NNP', 'I-PER', 'N').
w(6, write, write, 'VB', 'O', '(S[b]\NP)/PP').
w(7, ?, ?, '.', 'O', 'S[wq]\S[wq]').

Question Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

Architecture of PRONTO

[Figure: the Pronto architecture diagram, repeated.]

Boxing (Semantic Analysis)

• Providing a semantic analysis on the basis of the syntactic analysis

• A semantic analysis of a question offers an abstract representation of the meaning of the question

• Boxer uses a particular semantic theory: Discourse Representation Theory

Discourse Representation Theory

• Meaning of natural language expressions represented in first-order logic

• No formulas but box representation (without explicit quantification and conjunction)

• DRT covers a wide range of linguistic phenomena (Kamp & Reyle)

Output of Boxer

Paul Krugman. For which newspaper does Krugman write?

DRS (Discourse Representation Structure):

  _______________________   ____________________________________
 | x0                    | | x1                                 |
 |_______________________| |____________________________________|
(| named(x0,krugman,per) |+| write(x1)                          |)
 | named(x0,paul,per)    | | event(x1)                          |
 |                       | | agent(x1,x0)                       |
 |_______________________| |  _______________     ____________  |
                           | | x2            |   |            | |
                           | |_______________|   |____________| |
                           | | newspaper(x2) | ? | event(x1)  | |
                           | |_______________|   | for(x1,x2) | |
                           |                     |____________| |
                           |____________________________________|

Focus and Topic

• Information expressed in a question can be structured into two parts:
  – the focus: information that is asked for
  – the topic: information about the focus

• Example: How many inhabitants does Rome have?
  – FOCUS: How many inhabitants
  – TOPIC: does Rome have

Focus in DRS

[Figure: the DRS for “Paul Krugman. For which newspaper does Krugman write?”, repeated, with a “Focus” label marking part of the structure.]

Question Answering (QA)

Lecture 2

• Question Analysis

• Background Knowledge

• Answer Typing

Architecture of PRONTO

[Figure: the Pronto architecture diagram, repeated.]

Knowledge Construction

• The knowledge component in Pronto constructs a local knowledge base for the question under consideration
  – This knowledge is used in subsequent components

• The task of the knowledge component is to find all relevant knowledge that might be used
  – But as little as possible beyond that, to ensure efficiency

Manually Constructed Knowledge

• Linguistic knowledge
  – WordNet
  – NomLex
  – FrameNet

• General knowledge
  – CYC
  – CIA Factbook
  – Gazetteers

WordNet

• Electronic dictionary

• Not only words and definitions, but also relations between words

• Four parts of speech
  – Nouns
  – Verbs
  – Adjectives
  – Adverbs

WordNet SynSets

• Words are organised in SynSets

• A SynSet is a group of words with the same meaning --- in other words, a set of synonyms

• Example: { Rome, Roma, Eternal City, Italian Capital, capital of Italy }

Senses

• A word can have several different meanings

• Example: plant
  – A building for industrial labour
  – A living organism lacking the power of locomotion

• The different meanings of a word are called senses

• Therefore, one word can occur in more than one SynSet in WordNet

SynSet Example

- {mug, mugful} = the quantity that can be held in a mug

- {chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug} = a person who is gullible and easy to take advantage of

- {countenance, physiognomy, phiz, visage, kisser, smiler, mug} = the human face
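The relation between words, senses and SynSets can be made concrete with a small data structure. This is a toy sketch: the three SynSets come from the example, while the identifiers ("mug.1" and so on) and the function names are made up here and are not WordNet's own.

```python
# Toy SynSet inventory: identifier -> (set of words, gloss).
SYNSETS = {
    "mug.1": ({"mug", "mugful"},
              "the quantity that can be held in a mug"),
    "mug.2": ({"chump", "fool", "gull", "mark", "patsy", "fall guy",
               "sucker", "soft touch", "mug"},
              "a person who is gullible and easy to take advantage of"),
    "mug.3": ({"countenance", "physiognomy", "phiz", "visage", "kisser",
               "smiler", "mug"},
              "the human face"),
}

def senses(word):
    """Glosses of every SynSet that contains the word (one per sense)."""
    return [gloss for words, gloss in SYNSETS.values() if word in words]

def are_synonyms(word_a, word_b):
    """Two words are synonyms if they share at least one SynSet."""
    return any(word_a in words and word_b in words
               for words, _ in SYNSETS.values())

print(len(senses("mug")))              # 3
print(are_synonyms("mug", "visage"))   # True
print(are_synonyms("fool", "visage"))  # False
```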

Hypernyms and Hyponyms

• Hypernymy is a WordNet relation defined between two SynSets
  – If A is a hypernym of B, then A is more generic than B

• The inverse of hypernymy is hyponymy
  – If A is a hyponym of B, then A is more specific than B

• Take the transitive closure of these relations

• Examples:
  – “cow” and “horse” are hyponyms of “animal”
  – “publication” is a hypernym of “book”
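Taking the transitive closure means following direct hypernym links until the top of the hierarchy is reached. The sketch below assumes a toy hierarchy with one parent per word; the intermediate node "mammal" is an illustrative assumption and does not reproduce WordNet's actual hierarchy, in which a SynSet may also have several hypernyms.

```python
# Toy hypernym links: word -> direct hypernym.
DIRECT_HYPERNYM = {
    "cow": "mammal",
    "horse": "mammal",
    "mammal": "animal",
    "book": "publication",
}

def hypernyms(word):
    """All hypernyms of a word, following direct links transitively."""
    result = []
    while word in DIRECT_HYPERNYM:
        word = DIRECT_HYPERNYM[word]
        result.append(word)
    return result

def is_hyponym(specific, generic):
    """True if 'generic' is reachable from 'specific' via hypernym links."""
    return generic in hypernyms(specific)

print(hypernyms("cow"))               # ['mammal', 'animal']
print(is_hyponym("horse", "animal"))  # True
print(is_hyponym("animal", "cow"))    # False
```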

Examples using WordNet

• Which rock singer wrote Lithium?
  – WordNet: singer is a hyponym of person
  – Knowledge: ∀x(singer(x) → person(x))

• What is the population of Andorra?
  – WordNet: population is a hyponym of number
  – Knowledge: ∀x(population(x) → number(x))
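Turning a hyponym fact into a background-knowledge axiom is a mechanical step: "A is a hyponym of B" becomes a universally quantified implication from A to B. A minimal sketch follows, in which the function name and the string output format are assumptions made for illustration.

```python
def hyponym_axiom(specific, generic):
    """Render 'specific is a hyponym of generic' as a first-order axiom."""
    return f"∀x({specific}(x) → {generic}(x))"

# Hyponym facts taken from the two example questions.
facts = [("singer", "person"), ("population", "number")]
for specific, generic in facts:
    print(hyponym_axiom(specific, generic))
```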

NomLex

• NomLex is a database of nominalisation paraphrases
  – A nominalisation is a “verb promoted to a noun”
  – A paraphrase links the noun to the root verb

• Example:
  – X is an invention by Y → Y invented X
  – the killing of X → X was killed

Harvesting Knowledge

• Often existing knowledge bases are incomplete for particular applications

• There are various ways to automatically construct knowledge bases:
  – Instances and Hyponyms [e.g. Hearst]
  – Paraphrases [e.g. Lin & Pantel]

Hyponyms (X such as Y)

• WordNet has no instances of airlines.

TREC 20.2 (Concorde)

What airlines have Concorde in their fleets?

• Search for “Xs such as Y” patterns in large corpora, such as the web

• Here: X = airline, Y a hyponym of X

• Corpus: …airlines such as Continental and United now fly…

• Knowledge (AQUAINT corpus): Air Asia, Air Canada, Air France, Air Mandalay, Air Zimbabwe, Alaska, Aloha, American Airlines, Angel Airlines, Ansett, Asiana, Bangkok Airways, Belgian Carrier Sabena, British Airways, Canadian, Cathay Pacific, China Eastern Airlines, China Xinhua Airlines, Continental, Garuda, Japan Airlines, Korean Air, Lai, Lao Aviation, Lufthansa, Malaysia Airlines, Maylasian Airlines, Midway, Northwest, Orient Thai Airlines, Qantas, Seage Air, Shanghai Airlines, Singapore Airlines, Skymark Airlines Co., South Africa, Swiss Air, US Airways, United, Virgin, Yangon Airways
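The "Xs such as Y" search can be approximated with a regular expression. The sketch below is an illustration only: the patterns for names and separators are our own simplifications, the second corpus sentence is made up for the example, and a realistic harvester would need filtering to remove noise of the kind visible in the list above.

```python
import re

# Assumed shape of a name: one or more capitalised words, e.g. "Air France".
NAME = r"[A-Z][\w&-]*(?: [A-Z][\w&-]*)*"
# Assumed separators between the listed names.
SEPARATOR = r"(?:, and |, or |, | and | or )"

def such_as_instances(text, plural_noun):
    """Collect Y1, Y2, ... from '<plural_noun> such as Y1, Y2 and Y3'."""
    pattern = re.compile(
        rf"\b{re.escape(plural_noun)} such as ({NAME}(?:{SEPARATOR}{NAME})*)"
    )
    found = []
    for match in pattern.finditer(text):
        found.extend(re.split(SEPARATOR, match.group(1)))
    return found

corpus = ("…airlines such as Continental and United now fly… "
          "Concorde was operated by airlines such as British Airways and Air France.")
print(such_as_instances(corpus, "airlines"))
```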

Paraphrases

• Several methods have been developed for automatically finding paraphrases in large corpora

• This usually proceeds by starting with seed patterns of known positive instances

• Through bootstrapping, new patterns are found, which in turn yield new seeds

Seed example

• Start: Oswald killed JFK

• Search for "Oswald * JFK"

• Results:
  – Oswald assassinated JFK
  – Oswald shot JFK

• Use these new patterns to find other pairs and start again
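One round of this bootstrapping loop can be sketched as two searches over a corpus. Everything below is a toy setup assumed for illustration: the corpus is made up, and sentences are restricted to the shape "X word Y" so that a simple regular expression suffices.

```python
import re

# Made-up mini corpus for illustration only.
CORPUS = [
    "Oswald assassinated JFK",
    "Oswald shot JFK",
    "Booth assassinated Lincoln",
    "Booth shot Lincoln",
    "Lincoln met Booth",
]

def find_patterns(pairs, corpus):
    """Words that occur between a known (X, Y) pair, i.e. matches of 'X * Y'."""
    patterns = set()
    for x, y in pairs:
        for sentence in corpus:
            match = re.fullmatch(rf"{re.escape(x)} (\w+) {re.escape(y)}", sentence)
            if match:
                patterns.add(match.group(1))
    return patterns

def find_pairs(patterns, corpus):
    """(X, Y) pairs that occur around a known pattern, i.e. '* pattern *'."""
    pairs = set()
    for pattern in patterns:
        for sentence in corpus:
            match = re.fullmatch(rf"(\w+) {re.escape(pattern)} (\w+)", sentence)
            if match:
                pairs.add(match.groups())
    return pairs

seeds = {("Oswald", "JFK")}
patterns = find_patterns(seeds, CORPUS)    # patterns linking the seed pair
new_pairs = find_pairs(patterns, CORPUS)   # pairs linked by those patterns
print(sorted(patterns))
print(sorted(new_pairs))
```

Repeating the two steps with the enlarged set of pairs continues the bootstrapping; in practice each round needs filtering, because a single bad pattern quickly pulls in unrelated pairs.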

Paraphrase Example

TREC 4.2 (James Dean)

When did James Dean die?

Knowledge: ∀x∀t(∃e(kill(e) & theme(e,x) & in(e,t)) → ∃e(die(e) & agent(e,x) & in(e,t)))

APW19990929.0165: In 1955, actor James Dean was killed in a two-car collision near Cholame, Calif.

Question Answering (QA)

Lecture 2

• Question Analysis

• Background Knowledge

• Answer Typing

Architecture of PRONTO

[Figure: the Pronto architecture diagram, repeated.]


Answer Typing

• Providing information on the expected answer type
– Type of question
– Type (sortal ontology or taxonomy)
– Answer cardinality

• Issues
– Ambiguities
– Vagueness
– Classification problems


Question Types

• Wh-questions:
– Where was Franz Kafka born?
– How many countries are members of OPEC?
– Who is Thom Yorke?
– Why did David Koresh ask the FBI for a word processor?
– How did Frank Zappa die?
– Which boxer beat Muhammad Ali?


Question Types

• Yes-no questions:
– Does light have weight?
– Scotland is part of England – true or false?

• Choice-questions:
– Did Italy or Germany win the world cup in 1982?
– Who is Harry Potter’s best friend – Ron, Hermione or Sirius?


Indirect Questions

• Imperative mood:
– Name four European countries that produce wine.
– Give the date of birth of Franz Kafka.

• Declarative mood:
– I would like to know when Jim Morrison was born.


Answer Type Taxonomies

• Simple Answer-Type Taxonomy:

PERSON

NUMERAL

DATE

MEASURE

LOCATION

ORGANISATION


Expected Answer Types

• PERSON:
– Who won the Nobel prize for Peace?
– Which rock singer wrote Lithium?


Expected Answer Types

• NUMERAL:
– How many inhabitants does Rome have?
– What’s the population of Scotland?


Expected Answer Types

• DATE:
– When was JFK killed?
– In what year did Rome become the capital of Italy?


Expected Answer Types

• MEASURE:
– How much does a 125 gallon fish tank cost?
– How tall is an African elephant?
– How heavy is a Boeing 777?


Expected Answer Types

• LOCATION:
– Where does Angus Young of AC/DC live?
– What city gives a Christmas tree to Westminster every year as a gift?


Expected Answer Types

• ORGANISATION:
– Which company invented the compact disk?
– Who purchased Gilman Paper Company?


Using background knowledge

• Which rock singer …
– singer is a hyponym of person, therefore expected answer type is PERSON

• What is the population of …
– population is a hyponym of number, hence answer type NUMERAL
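
The hyponym reasoning above amounts to walking up a hypernym chain until a node with an answer type is reached. The sketch below shows that walk under stated assumptions: the `HYPERNYM` table is a tiny hand-made stand-in for a lexical resource such as WordNet, its entries are illustrative only, and `expected_answer_type` is a hypothetical helper, not part of Pronto.

```python
# Tiny hand-made hypernym table standing in for WordNet (entries are illustrative only).
HYPERNYM = {
    "singer": "musician",
    "musician": "person",
    "boxer": "athlete",
    "athlete": "person",
    "population": "number",
    "city": "location",
    "company": "organisation",
}

# Taxonomy nodes that correspond to one of the simple answer types.
ANSWER_TYPE = {
    "person": "PERSON",
    "number": "NUMERAL",
    "location": "LOCATION",
    "organisation": "ORGANISATION",
}

def expected_answer_type(noun):
    """Walk up the hypernym chain until a node with an answer type is reached."""
    seen = set()  # guards against cycles in the table
    while noun is not None and noun not in seen:
        if noun in ANSWER_TYPE:
            return ANSWER_TYPE[noun]
        seen.add(noun)
        noun = HYPERNYM.get(noun)
    return None  # the noun is not covered by the table

print(expected_answer_type("singer"))      # Which rock singer ...
print(expected_answer_type("population"))  # What is the population of ...
print(expected_answer_type("penguin"))     # not in the table
```

A noun that the table does not cover yields no type at all, which is one way the taxonomy gaps discussed later in the lecture show up in practice.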


Answer type tagging

Simple rule-based systems:
– Who … → PERSON
– Where … → LOCATION
– When … → DATE
– How many … → NUMERAL

…often fail…
– Who launched the iPod?
– Where in the human body is the liver?
– When is it time to go to bed?
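
A rule-based tagger of this kind fits in a dozen lines, which also makes its failures easy to reproduce. The following is a minimal sketch: the rule table mirrors the four rules on the slide, while the prefix matching and the name `tag_by_wh_word` are assumptions made for the example.

```python
# Wh-word rules from the slide, tried in order.
RULES = [
    ("how many", "NUMERAL"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
]

def tag_by_wh_word(question):
    """Return the answer type of the first rule whose prefix matches, else None."""
    q = question.lower()
    for prefix, answer_type in RULES:
        if q.startswith(prefix):
            return answer_type
    return None

for question in [
    "Who launched the iPod?",
    "Where in the human body is the liver?",
    "When is it time to go to bed?",
]:
    print(question, "->", tag_by_wh_word(question))
```

The three questions are tagged PERSON, LOCATION and DATE, although the expected answers are a company, a position in the body and a time of day.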


Complex taxonomies

• Simple ontologies cannot account for the large variety of questions

• An example of a more complex ontology is proposed by Li & Roth

• Pronto uses its own complex ontology

• Machine learning approaches are often used to automatically tag questions with answer types
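
To make the machine learning option concrete, here is a minimal sketch of a learned answer-type tagger. It uses a bag-of-words naive Bayes model with add-one smoothing, chosen only because it is short; it is not the classifier used by Li & Roth or by Pronto. The nine training questions are invented for the example, and real systems train on thousands of labelled questions.

```python
import math
from collections import Counter, defaultdict

# Tiny invented training set (lower-cased, punctuation removed).
TRAIN = [
    ("who wrote hamlet", "PERSON"),
    ("which singer wrote lithium", "PERSON"),
    ("who won the nobel prize", "PERSON"),
    ("where was kafka born", "LOCATION"),
    ("what city hosts the festival", "LOCATION"),
    ("when was jfk killed", "DATE"),
    ("in what year did rome become the capital", "DATE"),
    ("which company invented the compact disk", "ORGANISATION"),
    ("which company launched the ipod", "ORGANISATION"),
]

def train(examples):
    """Count label frequencies and per-label word frequencies."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(question, model):
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    words = question.lower().rstrip("?").split()
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in words:
            # add-one smoothing keeps unseen words from zeroing the score
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("Who launched the iPod?", model))
print(classify("Who wrote Paranoid Android?", model))
```

On this toy data the first question is tagged ORGANISATION, because "launched" and "ipod" outweigh "who", which is exactly the case a fixed Who → PERSON rule gets wrong. With so few examples the model is fragile, so the output says nothing about accuracy on real questions.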


Taxonomy of Li & Roth (1/3)

• ENTITY
– animal: animals
– body: organs of body
– color: colors
– creative: inventions, books and other creative pieces
– currency: currency names
– dis.med.: diseases and medicine
– event: events
– food: food
– instrument: musical instrument
– lang: languages
– letter: letters like a-z
– other: other entities
– plant: plants
– product: products
– religion: religions
– sport: sports
– substance: elements and substances
– symbol: symbols and signs
– technique: techniques and methods
– term: equivalent terms
– vehicle: vehicles
– word: words with a special property


Taxonomy of Li & Roth (2/3)

• DESCRIPTION: description and abstract concepts
– definition: definition of sth.
– description: description of sth.
– manner: manner of an action
– reason: reasons

• HUMAN: human beings
– group: a group or organization of persons
– ind: an individual
– title: title of a person
– description: description of a person

• LOCATION: locations
– city: cities
– country: countries
– mountain: mountains
– other: other locations
– state: states


Taxonomy of Li & Roth (3/3)

• NUMERIC: numeric values
– code: postcodes or other codes
– count: number of sth.
– date: dates
– distance: linear measures
– money: prices
– order: ranks
– other: other numbers
– period: the lasting time of sth.
– percent: fractions
– speed: speed
– temp: temperature
– size: size, area and volume
– weight: weight

• ABBREVIATION
– abb: abbreviation
– exp: expansion


Pronto Answer Type Taxonomy



Answer typing: problems

• Ambiguities
– How long: distance or duration

• Vague Wh-words
– What do penguins eat?
– What is the length of a football pitch?

• Taxonomy gaps
– Which alien race featured in Star Trek?
– What is the cultural capital of Italy?


Answer Cardinality

• How many distinct answers does a question have?

• Examples:
– When did Louis Braille die? (1 answer)
– Who won a Nobel prize in chemistry? (1 or more answers)
– What are the seven wonders of the world? (exactly 7 answers)
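
Answer cardinality can be represented as a (minimum, maximum) pair, with an open maximum for "1 or more". The sketch below reproduces the three examples with two deliberately crude heuristics, both of which are assumptions for illustration: a number word in the question fixes the count, and a question starting with "when" is taken to have a single answer. The second heuristic is easy to break, for instance with questions about recurring events.

```python
import re

# Number words recognised by the sketch ("one" is left out to avoid phrases like "one of").
NUMBER_WORDS = {
    "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

def answer_cardinality(question):
    """Return (min_answers, max_answers); max_answers is None when unbounded."""
    tokens = re.findall(r"[a-z]+", question.lower())
    # An explicit count ("the seven wonders", "name four countries") fixes the cardinality.
    for token in tokens:
        if token in NUMBER_WORDS:
            n = NUMBER_WORDS[token]
            return (n, n)
    # Crude assumption: "when" questions about a single event have one answer.
    if tokens and tokens[0] == "when":
        return (1, 1)
    # Otherwise expect at least one answer, with no upper bound.
    return (1, None)

for question in [
    "When did Louis Braille die?",
    "Who won a Nobel prize in chemistry?",
    "What are the seven wonders of the world?",
]:
    print(question, "->", answer_cardinality(question))
```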


Class activity: answer typing

1. How many islands does Italy have?
2. When did Inter win the Scudetto?
3. What are the colours of the Lithuanian flag?
4. Where is St. Andrews located?
5. Why does oil float in water?
6. How did Frank Zappa die?
7. Name the Baltic countries.
8. Which seabird was declared extinct in the 1840s?
9. Who is Noam Chomsky?
10. List names of Russian composers.
11. Edison is the inventor of what?
12. How far is the moon from the sun?
13. What is the distance from New York to Boston?
14. How many planets are there?
15. What is the exchange rate of the Euro to the Dollar?
16. What does SPQR stand for?
17. What is the nickname of Totti?
18. What does the Scottish word “bonnie” mean?
19. Who wrote the song “Paranoid Android”?


Lecture 3

[Diagram: Pronto architecture, with the same component labels as on the "Architecture of PRONTO" slide.]


Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking