Question Answering (QA), Lecture 3. © Johan Bos, April 2008.



Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking

[Figure: PRONTO architecture. Modules: CCG parsing, boxing (DRS construction), answer typing, query generation, Indri retrieval over the indexed documents, answer extraction, answer selection, answer reranking; knowledge resources: WordNet and NomLex.]

[Figure: PRONTO architecture (overview slide for Lecture 3)]


Question Answering

Lecture 3
• Query Generation

• Document Analysis

• Semantic Indexing

• Answer Extraction

• Selection and Ranking

[Figure: Architecture of PRONTO]


Query Generation

• Once we have analysed the question, we need to retrieve appropriate documents

• Most QA systems use an off-the-shelf information retrieval system for this task

• Examples:
– Lemur
– Lucene
– Indri (used by Pronto)

• The input of the IR system is a query; the output is a ranked set of documents


Queries

• Query generation depends on the way documents are indexed

• Based on
– Semantic analysis of the question
– Expected answer type
– Background knowledge

• Computing a good query is hard – we don’t want too few documents, and we don’t want too many!


Generating Query Terms

• Example 1:
– Question: Who discovered prions?

– Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.

– Text B: Prions are a kind of protein that…

• Query terms?


Generating Query Terms

• Example 2:
– Question: When did Franz Kafka die?
– Text A: Kafka died in 1924.
– Text B: Dr. Franz died in 1971.

• Query terms?


Generating Query Terms

• Example 3:
– Question: How did actor James Dean die?
– Text: James Dean was killed in a car accident.

• Query terms?


Useful query terms

• Ranked by importance:
– Named entities
– Dates or time expressions
– Expressions in quotes
– Nouns
– Verbs

• Queries can be expanded using the created local knowledge base
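As an illustration of this ranking, here is a minimal Python sketch (not PRONTO's actual code); it assumes earlier question analysis has already attached a coarse category to each candidate term, and the category labels used here are invented.

# Minimal sketch: order candidate query terms by the priority list above.
# The labels (NE, TIMEX, QUOTE, NOUN, VERB) are assumed to come from
# earlier question analysis; they are illustrative only.
PRIORITY = ["NE", "TIMEX", "QUOTE", "NOUN", "VERB"]

def rank_query_terms(labelled_terms):
    """labelled_terms: list of (term, category) pairs."""
    usable = [(t, c) for t, c in labelled_terms if c in PRIORITY]
    return [t for t, c in sorted(usable, key=lambda tc: PRIORITY.index(tc[1]))]

# Example: "Who discovered prions?" with a hypothetical analysis
print(rank_query_terms([("discovered", "VERB"), ("prions", "NOUN")]))
# ['prions', 'discovered']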


Query expansion example

• Query: sacajawea (returns only five documents)

• Use synonyms in query expansions

• New query: sacajawea OR sagajawea (returns two hundred documents)

TREC 44.6 (Sacajawea): How much is the Sacajawea coin worth?
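A possible implementation of this kind of expansion is sketched below (hypothetical code, not PRONTO's): the variant table stands in for the local knowledge base, and the OR syntax is generic rather than that of any particular retrieval engine.

# Sketch: expand query terms with spelling variants / synonyms taken from
# a (hard-coded, hypothetical) local knowledge base.
VARIANTS = {"sacajawea": ["sagajawea"]}

def expand(terms):
    parts = []
    for t in terms:
        alts = [t] + VARIANTS.get(t.lower(), [])
        parts.append("(" + " OR ".join(alts) + ")" if len(alts) > 1 else t)
    return " ".join(parts)

print(expand(["sacajawea", "coin", "worth"]))
# (sacajawea OR sagajawea) coin worth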

[Figure: Architecture of PRONTO]


Question Answering

Lecture 3
• Query Generation
• Document Analysis

• Semantic Indexing

• Answer Extraction

• Selection and Ranking


Document Analysis – Why?

• The aim of QA is to output answers, not documents

• We need document analysis to
– Find the correct type of answer in the documents
– Calculate the probability that an answer is correct

• Semantic analysis is important to get valid answers


Document Analysis – When?

• After retrieval
– token or word based index
– keyword queries
– low precision
• Before retrieval
– semantic indexing
– concept queries
– high precision
– more NLP required


Document Analysis – How?

• Ideally use the same NLP tools as for question analysis
– This will make the semantic matching of question and answer easier
– Not always possible: wide-coverage tools are usually good at analysing text, but not at analysing questions
– Questions are often not part of the large annotated corpora on which NLP tools are trained


Documents vs Passages

• Split documents into smaller passages
– This will make the semantic matching faster and more accurate
– In Pronto the passage size is two sentences, implemented by a sliding window
• Passages that are too small risk losing important contextual information
– Pronouns and referring expressions
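A two-sentence sliding window can be sketched as follows; the regex-based sentence splitter is a simplification, a real system would use a proper tokeniser.

import re

def passages(text, size=2):
    """Yield overlapping passages of `size` consecutive sentences (sliding window).
    The sentence splitter is a naive regex, for illustration only."""
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for i in range(max(len(sents) - size + 1, 1)):
        yield " ".join(sents[i:i + size])

doc = "Kafka lived in Austria. He died in 1924. Max Brod edited his novels."
for p in passages(doc):
    print(p)
# Kafka lived in Austria. He died in 1924.
# He died in 1924. Max Brod edited his novels.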


Document Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution



Why semantics is important

• Example:
– Question: When did Franz Kafka die?
– Text A: The mother of Franz Kafka died in 1918.
– Text B: Kafka lived in Austria. He died in 1924.
– Text C: Both Kafka and Lenin died in 1924.
– Text D: Max Brod, who knew Kafka, died in 1930.
– Text E: Someone who knew Kafka died in 1930.


DRS for “The mother of Franz Kafka died in 1918.”

x3 x4 x2 x1
---------------------
mother(x3)
named(x4,kafka,per)
named(x4,franz,per)
die(x2)
thing(x1)
event(x2)
of(x3,x4)
agent(x2,x3)
in(x2,x1)
timex(x1)=+1918XXXX


DRS for: “Kafka lived in Austria. He died in 1924.”

(
x3 x1 x2
---------------------
male(x3)
named(x3,kafka,per)
live(x1)
agent(x1,x3)
named(x2,austria,loc)
event(x1)
in(x1,x2)
+
x5 x4
---------------------
die(x5)
thing(x4)
event(x5)
agent(x5,x3)
in(x5,x4)
timex(x4)=+1924XXXX
)


DRS for: “Both Kafka and Lenin died in 1924.”

x6 x5 x4 x3 x2 x1
---------------------
named(x6,kafka,per)
die(x5)
event(x5)
agent(x5,x6)
in(x5,x4)
timex(x4)=+1924XXXX
named(x3,lenin,per)
die(x2)
event(x2)
agent(x2,x3)
in(x2,x1)
timex(x1)=+1924XXXX


DRS for: “Max Brod, who knew Kafka, died in 1930.”

x3 x5 x4 x2 x1
---------------------
named(x3,brod,per)
named(x3,max,per)
named(x5,kafka,per)
know(x4)
event(x4)
agent(x4,x3)
patient(x4,x5)
die(x2)
event(x2)
agent(x2,x3)
in(x2,x1)
timex(x1)=+1930XXXX


DRS for: “Someone who knew Kafka died in 1930.”

x3 x5 x4 x2 x1
---------------------
person(x3)
named(x5,kafka,per)
know(x4)
event(x4)
agent(x4,x3)
patient(x4,x5)
die(x2)
event(x2)
agent(x2,x3)
in(x2,x1)
timex(x1)=+1930XXXX


Document Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution


Recall the Answer-Type Taxonomy

• We divided questions according to their expected answer type

• Simple Answer-Type Taxonomy

PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY


Named Entity Recognition

• In order to make use of the answer types, we need to be able to recognise named entities of the same types in the documents

PERSON, NUMERAL, DATE, MEASURE, LOCATION, ORGANISATION, ENTITY


Example Text

Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of  Arthur Andersen. 



Named Entity Recognition

<ENAMEX TYPE="LOCATION">Italy</ENAMEX>’s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.


NER difficulties

• Several types of entities are too numerous to include in dictionaries

• New names turn up every day

• Ambiguities – Paris, Lazio

• Different forms of the same entity in the same text
– Brian Jones … Mr. Jones

• Capitalisation


NER approaches

• Rule-based approaches
– Hand-crafted rules
– Help from databases of known named entities [e.g. locations]
• Statistical approaches
– Features
– Machine learning
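As a toy illustration of the rule-based flavour, the sketch below combines a tiny gazetteer with one hand-crafted rule; both the dictionary and the rule are invented for this example.

# Tiny rule-based NE tagger: gazetteer lookup plus one hand-crafted rule
# (a capitalised word after "Mr." is tagged PERSON). Purely illustrative.
GAZETTEER = {"italy": "LOCATION", "milan": "LOCATION"}

def tag(tokens):
    tags = []
    for i, tok in enumerate(tokens):
        if tok.lower() in GAZETTEER:
            tags.append((tok, GAZETTEER[tok.lower()]))
        elif i > 0 and tokens[i - 1] == "Mr." and tok[:1].isupper():
            tags.append((tok, "PERSON"))
        else:
            tags.append((tok, "O"))
    return tags

print(tag("Mr. Verdi left Milan for Italy .".split()))
# [('Mr.', 'O'), ('Verdi', 'PERSON'), ('left', 'O'), ('Milan', 'LOCATION'),
#  ('for', 'O'), ('Italy', 'LOCATION'), ('.', 'O')]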


Document Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution


What is anaphora?

• Relation between a pronoun (or other referring expression) and another element in the same or an earlier sentence

• Anaphoric pronouns: – he, him, she, her, it, they, them

• Anaphoric noun phrases:
– the country
– these documents
– his hat, her dress


Anaphora (pronouns)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: ?


Anaphora (definite descriptions)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: ?


Anaphora Resolution

• Anaphora Resolution is the task of finding the antecedents of anaphoric expressions

• Example system:
– Mitkov, Evans & Orasan (2002)
– http://clg.wlv.ac.uk/MARS/


“Kafka lived in Austria. He died in 1924.”

(
x3 x1 x2
---------------------
male(x3)
named(x3,kafka,per)
live(x1)
agent(x1,x3)
named(x2,austria,loc)
event(x1)
in(x1,x2)
+
x5 x4
---------------------
die(x5)
thing(x4)
event(x5)
agent(x5,x3)
in(x5,x4)
timex(x4)=+1924XXXX
)



Co-reference resolution

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: Tourism


Question Answering

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing

• Answer Extraction

• Selection and Ranking

[Figure: Architecture of PRONTO]


Semantic indexing

• If we index documents on the token level, we cannot search for specific semantic concepts

• If we index documents on semantic concepts, we can formulate more specific queries

• Semantic indexing requires a complete preprocessing of the entire document collection [can be costly]



Semantic indexing example

• Example NL question: When did Franz Kafka die?

• Term-based
– query: kafka
– Returns all passages containing the term “kafka”
• Concept-based
– query: DATE & kafka
– Returns all passages containing the term “kafka” and a date expression
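One way to picture a concept-level index is the sketch below: concept tags such as DATE are indexed alongside the tokens, so a query can require both a term and a concept. The four-digit check is a stand-in for a real date recogniser.

from collections import defaultdict

def build_index(passages):
    index = defaultdict(set)
    for pid, text in enumerate(passages):
        for tok in text.lower().replace(".", " ").split():
            index[tok].add(pid)
            if tok.isdigit() and len(tok) == 4:   # stand-in for a TIMEX tagger
                index["DATE"].add(pid)
    return index

def search(index, *terms):
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

idx = build_index(["Kafka was born in 1883.", "Prions are a kind of protein."])
print(search(idx, "DATE", "kafka"))   # {0}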


Question Answering

Lecture 3
• Query Generation

• Document Analysis

• Semantic Indexing
• Answer Extraction

• Selection and Ranking

[Figure: Architecture of PRONTO]


Answer extraction

• Passage retrieval gives us a set of ranked documents

• Match answer with question
– DRS for question
– DRS for each possible document
– Score for amount of overlap

• Deep inference or shallow matching

• Use knowledge


Answer extraction: matching

• Given a question and an expression with a potential answer, calculate a matching score S = match(Q,A) that indicates how well Q matches A

• Example
– Q: When was Franz Kafka born?

– A1: Franz Kafka died in 1924.

– A2: Kafka was born in 1883.


Using logical inference

• Recall that Boxer produces first order representations [DRSs]

• In theory we could use a theorem prover to check whether a retrieved passage entails or is inconsistent with a question

• In practice this is too costly, given the high number of possible answer + question pairs that need to be considered

• Also: theorem provers are precise – they don’t give us information if they almost find a proof, although this would be useful for QA


Semantic matching

• Matching is an efficient approximation to the inference task

• Consider flat semantic representation of the passage and the question

• Matching gives a score of the amount of overlap between the semantic content of the question and a potential answer
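To make this concrete, here is a small, self-contained sketch of such a matcher (a simplification, not PRONTO's implementation): question and passage are flat lists of predicate tuples, uppercase arguments are question variables that get unified greedily, and the score is the fraction of matched question conditions. Its raw numbers differ slightly from the worked example on the next slides, which also credits the answer literal once a suitable date is found.

# Sketch of flat semantic matching: greedily unify question variables
# (uppercase args) against passage terms and score the overlap.
def match(question, passage):
    """question, passage: lists of (predicate, arg, ...) tuples."""
    binding, matched = {}, 0
    for q in question:
        if q[0] == "answer":                 # handled by answer typing, not scored here
            continue
        for p in passage:
            if q[0] != p[0] or len(q) != len(p):
                continue
            trial, ok = dict(binding), True
            for qa, pa in zip(q[1:], p[1:]):
                if qa.isupper():             # question variable
                    if trial.setdefault(qa, pa) != pa:
                        ok = False
                        break
                elif qa != pa:               # constants must match exactly
                    ok = False
                    break
            if ok:
                binding, matched = trial, matched + 1
                break
    scored = [q for q in question if q[0] != "answer"]
    return matched / max(len(scored), 1), binding

Q = [("answer", "X"), ("franz", "Y"), ("kafka", "Y"),
     ("born", "E"), ("patient", "E", "Y"), ("temp", "E", "X")]
A1 = [("franz", "x1"), ("kafka", "x1"), ("die", "x3"),
      ("agent", "x3", "x1"), ("in", "x3", "x2"), ("1924", "x2")]
A2 = [("kafka", "x1"), ("born", "x3"), ("patient", "x3", "x1"),
      ("in", "x3", "x2"), ("1883", "x2")]
print(match(Q, A1)[0], match(Q, A2)[0])   # 0.4 vs 0.6: A2 wins, as on the slides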


Matching Example

• Question: When was Franz Kafka born?

• Passage 1: Franz Kafka died in 1924.

• Passage 2: Kafka was born in 1883.


Semantic Matching [1]

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)

Unifying X=x2 and Y=x1 gives
Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)

Match score = 3/6 = 0.50


Semantic Matching [2]

Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)

Unifying X=x2, Y=x1 and E=x3 gives
Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)

Match score = 4/6 = 0.67


Matching Example

• Question: When was Franz Kafka born?

• Passage 1: Franz Kafka died in 1924. (match score = 0.50)

• Passage 2: Kafka was born in 1883. (match score = 0.67)


Matching Techniques

• Weighted matching
– Higher weight for named entities
– Estimate weights using machine learning

• Incorporate background knowledge
– WordNet [hyponyms]
– NomLex
– Paraphrases: BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
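A weighted variant of the overlap score could look like the sketch below; the weights and type names are invented for illustration and would in practice be tuned or learned.

# Sketch: weight matched conditions by type instead of counting them equally.
WEIGHTS = {"named_entity": 3.0, "timex": 2.0, "default": 1.0}   # hypothetical weights

def weighted_score(matched, all_conditions, kind_of):
    """kind_of maps a condition to a coarse type used for weighting."""
    w = lambda c: WEIGHTS.get(kind_of(c), WEIGHTS["default"])
    return sum(w(c) for c in matched) / sum(w(c) for c in all_conditions)

kinds = {"kafka(x1)": "named_entity", "temp(x3,x2)": "timex"}
conds = ["kafka(x1)", "born(x3)", "patient(x3,x1)", "temp(x3,x2)"]
print(weighted_score(["kafka(x1)", "born(x3)"], conds, lambda c: kinds.get(c, "default")))
# 4.0 / 7.0, roughly 0.57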


Question Answering

Lecture 3
• Query Generation

• Document Analysis

• Semantic Indexing

• Answer Extraction
• Selection and Ranking

[Figure: Architecture of PRONTO]


Answer selection

• Rank answers
– Group duplicates (syntactically or semantically equivalent)
– Sort by frequency
• How specific should an answer be?
– Semantic relations between answers
– Hyponyms, synonyms
– Answer modelling [PhD thesis Dalmas 2007]

• Answer cardinality


Answer selection example 1

• Where did Franz Kafka die?
– In his bed
– In a sanatorium
– In Kierling
– Near Vienna
– In Austria
– In Berlin
– In Germany


Answer selection example 2

• Where is 3M based?
– In Maplewood
– In Maplewood, Minn.
– In Minnesota
– In the U.S.
– In Maplewood, Minn., USA
– In San Francisco
– In the Netherlands

[Figure: Architecture of PRONTO]


Reranking

• Most QA systems first produce a list of possible answers…

• This is usually followed by a process called reranking

• Reranking promotes correct answers to a higher rank


Factors in reranking

• Matching score
– The better the match with the question, the more likely the answer
• Frequency
– If the same answer occurs many times, it is likely to be correct
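A toy combination of the two factors might look like the sketch below; the 0.7/0.3 weighting is arbitrary, and grouping of equivalent answers is reduced to exact string identity.

from collections import Counter

def rerank(candidates):
    """candidates: list of (answer, match_score); returns answers, best first."""
    freq = Counter(a for a, _ in candidates)
    best = {}
    for a, s in candidates:
        best[a] = max(best.get(a, 0.0), s)
    max_f = max(freq.values())
    return sorted(best, key=lambda a: 0.7 * best[a] + 0.3 * freq[a] / max_f,
                  reverse=True)

print(rerank([("Oklahoma", 0.6), ("Britain", 0.7),
              ("Oklahoma", 0.5), ("Okemah, Okla.", 0.6)]))
# ['Oklahoma', 'Britain', 'Okemah, Okla.']  (good match plus higher frequency wins)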


Answer Validation

• Answer validation
– Check whether an answer is likely to be correct using an expensive method
• Tie breaking
– Deciding between two answers with similar probability
• Methods:
– Inference check
– Sanity checking
– Googling


Inference check

• Use first-order logic [FOL] to check whether a potential answer entails the question

• This can be done with the use of a theorem prover
– Translate Q into FOL
– Translate A into FOL
– Translate background knowledge into FOL
– If ((BKfol & Afol) → Qfol) is a theorem, we have a likely answer


Sanity Checking

Answer should be informative, that is, not part of the question

Q: Who is Tom Cruise married to?

A: Tom Cruise

Q: Where was Florence Nightingale born?

A: Florence
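One simple way to operationalise this check is sketched below: reject an answer whose content words all occur in the question. The stopword list is minimal and purely illustrative.

# Sketch of a sanity check: an answer is uninformative if all of its
# content words already occur in the question.
STOP = {"the", "a", "an", "of", "in", "to", "is", "was"}

def is_informative(question, answer):
    q_words = {w.lower().strip("?.,") for w in question.split()}
    a_words = {w.lower().strip("?.,") for w in answer.split()} - STOP
    return bool(a_words) and not a_words.issubset(q_words)

print(is_informative("Who is Tom Cruise married to?", "Tom Cruise"))              # False
print(is_informative("Where was Florence Nightingale born?", "Florence"))         # False
print(is_informative("Where was Florence Nightingale born?", "Florence, Italy"))  # True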


Googling

• Given a ranked list of answers, some of these might not make sense at all

• Promote answers that make sense

• How?

• Use an even larger corpus!
– “Sloppy” approach
– “Strict” approach


The World Wide Web


Answer validation (sloppy)

• Given a question Q and a set of answers A1…An

• For each i, generate a query combining Q and Ai
• Count the number of hits for each i
• Choose the Ai with the highest number of hits
• Use existing search engines
– Google, AltaVista
– Magnini et al. 2002 (CCP)


Corrected Conditional Probability

• Treat Q and A as bags of words
– Q = content words of the question
– A = answer

• CCP(Qsp, Asp) = hits(A NEAR Q) / (hits(A) × hits(Q))

• Accept answers above a certain CCP threshold
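In code, the CCP filter could be sketched as follows; the hit counts and the threshold are made-up numbers, and in a real system the counts would come from a search engine.

# Sketch of CCP-based filtering; hit counts are placeholders for real
# search-engine counts of A, Q and "A NEAR Q".
def ccp(hits_a_near_q, hits_a, hits_q):
    return hits_a_near_q / (hits_a * hits_q) if hits_a and hits_q else 0.0

def validate(candidates, threshold=1e-9):
    """candidates: list of (answer, hits_a_near_q, hits_a, hits_q)."""
    return [a for a, n, ha, hq in candidates if ccp(n, ha, hq) > threshold]

cands = [("Oklahoma", 4200, 9_000_000, 150_000),   # invented counts
         ("Britain", 3, 60_000_000, 150_000)]
print(validate(cands))   # ['Oklahoma'] with these made-up counts and threshold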


Answer validation (strict)

• Given a question Q and a set of answers A1…An

• Create a declarative sentence with the focus of the question replaced by Ai

• Use the strict search option in Google
– High precision
– Low recall
• Any terms of the target not in the sentence are added to the query


Example

• TREC 99.3
– Target: Woody Guthrie.
– Question: Where was Guthrie born?

• Top-5 Answers (* marks a correct answer):
1) Britain
* 2) Okemah, Okla.
3) Newport
* 4) Oklahoma
5) New York


Example: generate queries

• TREC 99.3
– Target: Woody Guthrie.
– Question: Where was Guthrie born?

• Generated queries:
1) “Guthrie was born in Britain”
2) “Guthrie was born in Okemah, Okla.”
3) “Guthrie was born in Newport”
4) “Guthrie was born in Oklahoma”
5) “Guthrie was born in New York”


Example: add target words

• TREC 99.3
– Target: Woody Guthrie.
– Question: Where was Guthrie born?

• Generated queries:
1) “Guthrie was born in Britain” Woody
2) “Guthrie was born in Okemah, Okla.” Woody
3) “Guthrie was born in Newport” Woody
4) “Guthrie was born in Oklahoma” Woody
5) “Guthrie was born in New York” Woody


Example: morphological variants

TREC 99.3

Target: Woody Guthrie.

Question: Where was Guthrie born?

Generated queries:
“Guthrie is OR was OR are OR were born in Britain” Woody

“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody

“Guthrie is OR was OR are OR were born in Newport” Woody

“Guthrie is OR was OR are OR were born in Oklahoma” Woody

“Guthrie is OR was OR are OR were born in New York” Woody
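Generating these strict queries can be sketched as follows; the copula-variant pattern is specific to this where-born example and the function name is invented.

# Sketch: build "strict" validation queries for the where-born example:
# a quoted declarative sentence with copula variants, plus leftover target words.
def strict_queries(subject, target, answers):
    extra = [w for w in target.replace(".", "").split() if w not in subject.split()]
    pattern = '"{} is OR was OR are OR were born in {}" {}'
    return [pattern.format(subject, a, " ".join(extra)).strip() for a in answers]

for q in strict_queries("Guthrie", "Woody Guthrie.", ["Britain", "Okemah, Okla.", "Oklahoma"]):
    print(q)
# "Guthrie is OR was OR are OR were born in Britain" Woody
# "Guthrie is OR was OR are OR were born in Okemah, Okla." Woody
# "Guthrie is OR was OR are OR were born in Oklahoma" Woody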


Example: Google hits

TREC 99.3

Target: Woody Guthrie.

Question: Where was Guthrie born?

Generated queries (number of Google hits after each):
“Guthrie is OR was OR are OR were born in Britain” Woody 0

“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody 10

“Guthrie is OR was OR are OR were born in Newport” Woody 0

“Guthrie is OR was OR are OR were born in Oklahoma” Woody 42

“Guthrie is OR was OR are OR were born in New York” Woody 2


Example: reranked answers

TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

Original answers:
1) Britain
* 2) Okemah, Okla.
3) Newport
* 4) Oklahoma
5) New York

Reranked answers:
* 4) Oklahoma
* 2) Okemah, Okla.
5) New York
1) Britain
3) Newport


Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking


Where to go from here

• Producing answers in real-time

• Improve accuracy

• Answer explanation

• User modelling

• Speech interfaces

• Dialogue (interactive QA)

• Multi-lingual QA

• Non-sequential architectures
