© Johan Bos, April 2008
Question Answering (QA)
Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
[Figure: Architecture of PRONTO — question → parsing (CCG) → boxing (DRS) → answer typing → query generation → Indri retrieval over indexed documents → answer extraction → answer selection → answer reranking; knowledge resources: WordNet, NomLex]
[Figure: PRONTO architecture, with the components covered in Lecture 3 highlighted]
Question Answering
Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
[Figure: Architecture of PRONTO]
Query Generation
• Once we have analysed the question, we need to retrieve appropriate documents
• Most QA systems use an off-the-shelf information retrieval system for this task
• Examples:
– Lemur
– Lucene
– Indri (used by Pronto)
• The input of the IR system is a query; the output is a ranked list of documents
Queries
• Query generation depends on the way documents are indexed
• Based on
– Semantic analysis of the question
– Expected answer type
– Background knowledge
• Computing a good query is hard – we want neither too few documents nor too many!
Generating Query Terms
• Example 1:
– Question: Who discovered prions?
– Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.
– Text B: Prions are a kind of proteins that…
• Query terms?
Generating Query Terms
• Example 2:
– Question: When did Franz Kafka die?
– Text A: Kafka died in 1924.
– Text B: Dr. Franz died in 1971.
• Query terms?
Generating Query Terms
• Example 3:
– Question: How did actor James Dean die?
– Text: James Dean was killed in a car accident.
• Query terms?
Useful query terms
• Ranked on importance:
– Named entities
– Dates or time expressions
– Expressions in quotes
– Nouns
– Verbs
• Queries can be expanded using the created local knowledge base
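The importance ranking above can be sketched as a simple sort over categorised terms. The categories are assumed to come from upstream question analysis (named entity recognition, POS tagging); `IMPORTANCE` and `rank_query_terms` are illustrative names, not Pronto's API.

```python
IMPORTANCE = {          # lower value = more important
    "named_entity": 0,
    "time_expression": 1,
    "quoted_expression": 2,
    "noun": 3,
    "verb": 4,
}

def rank_query_terms(terms):
    """terms: list of (surface_form, category) pairs,
    returned most-important first."""
    return sorted(terms, key=lambda t: IMPORTANCE.get(t[1], 99))

terms = [("die", "verb"), ("1924", "time_expression"),
         ("death", "noun"), ("Franz Kafka", "named_entity")]
print(rank_query_terms(terms))
# named entity first, then time expression, noun, verb
```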
Query expansion example
• Query: sacajawea
Returns only five documents
• Use synonyms in query expansion
• New query: sacajawea OR sagajawea
Returns two hundred documents
TREC 44.6 (Sacajawea)
How much is the Sacajawea coin worth?
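A minimal sketch of this kind of expansion, assuming a list of spelling variants or synonyms is available (plain boolean keyword syntax, not Indri's query language):

```python
def expand_query(term, synonyms):
    """Join a term with its spelling variants / synonyms
    into a disjunctive keyword query."""
    return " OR ".join([term] + synonyms)

print(expand_query("sacajawea", ["sagajawea"]))
# sacajawea OR sagajawea
```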
[Figure: Architecture of PRONTO]
Question Answering
Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Document Analysis – Why?
• The aim of QA is to output answers, not documents
• We need document analysis to
– Find the correct type of answer in the documents
– Calculate the probability that an answer is correct
• Semantic analysis is important to get valid answers
Document Analysis – When?
• After retrieval
– token or word based index
– keyword queries
– low precision
• Before retrieval
– semantic indexing
– concept queries
– high precision
– more NLP required
Document Analysis – How?
• Ideally use the same NLP tools as for question analysis
– This will make the semantic matching of question and answer easier
– Not always possible: wide-coverage tools are usually good at analysing text, but not at analysing questions
– Questions are often not part of large annotated corpora, on which NLP tools are trained
Documents vs Passages
• Split documents into smaller passages
– This will make the semantic matching faster and more accurate
– In Pronto the passage size is two sentences, implemented by a sliding window
• Passages that are too small risk losing important contextual information
– Pronouns and referring expressions
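Pronto's two-sentence sliding window can be sketched as follows; the regex sentence splitter is a naive stand-in for a real tokeniser.

```python
import re

def passages(text, size=2):
    """Split a document into overlapping passages of `size` sentences
    using a sliding window (Pronto uses size 2)."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    if len(sents) <= size:
        return [" ".join(sents)]
    return [" ".join(sents[i:i + size])
            for i in range(len(sents) - size + 1)]

doc = "Kafka lived in Austria. He died in 1924. He wrote novels."
for p in passages(doc):
    print(p)
# Kafka lived in Austria. He died in 1924.
# He died in 1924. He wrote novels.
```

Because the windows overlap, the pronoun "He" still has its antecedent "Kafka" available in the first passage.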
Document Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution
Why semantics is important
• Example:
– Question: When did Franz Kafka die?
– Text A: The mother of Franz Kafka died in 1918.
– Text B: Kafka lived in Austria. He died in 1924.
– Text C: Both Kafka and Lenin died in 1924.
– Text D: Max Brod, who knew Kafka, died in 1930.
– Text E: Someone who knew Kafka died in 1930.
DRS for “The mother of Franz Kafka died in 1918.”
 _____________________
| x3 x4 x2 x1         |
|---------------------|
| mother(x3)          |
| named(x4,kafka,per) |
| named(x4,franz,per) |
| die(x2)             |
| thing(x1)           |
| event(x2)           |
| of(x3,x4)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1918XXXX |
|_____________________|
DRS for:“Kafka lived in Austria. He died in 1924.”
( _______________________
 | x3 x1 x2              |
 |-----------------------|
 | male(x3)              |
 | named(x3,kafka,per)   |
 | live(x1)              |
 | agent(x1,x3)          |
 | named(x2,austria,loc) |
 | event(x1)             |
 | in(x1,x2)             |
 |_______________________|
   +
  _____________________
 | x5 x4               |
 |---------------------|
 | die(x5)             |
 | thing(x4)           |
 | event(x5)           |
 | agent(x5,x3)        |
 | in(x5,x4)           |
 | timex(x4)=+1924XXXX |
 |_____________________| )
DRS for: “Both Kafka and Lenin died in 1924.”
 _____________________
| x6 x5 x4 x3 x2 x1   |
|---------------------|
| named(x6,kafka,per) |
| die(x5)             |
| event(x5)           |
| agent(x5,x6)        |
| in(x5,x4)           |
| timex(x4)=+1924XXXX |
| named(x3,lenin,per) |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1924XXXX |
|_____________________|
DRS for:“Max Brod, who knew Kafka, died in 1930.”
 _____________________
| x3 x5 x4 x2 x1      |
|---------------------|
| named(x3,brod,per)  |
| named(x3,max,per)   |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|
DRS for:“Someone who knew Kafka died in 1930.”
 _____________________
| x3 x5 x4 x2 x1      |
|---------------------|
| person(x3)          |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|
Document Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution
Recall the Answer-Type Taxonomy
• We divided questions according to their expected answer type
• Simple Answer-Type Taxonomy
PERSON
NUMERAL
DATE
MEASURE
LOCATION
ORGANISATION
ENTITY
Named Entity Recognition
• In order to make use of the answer types, we need to be able to recognise named entities of the same types in the documents
PERSON
NUMERAL
DATE
MEASURE
LOCATION
ORGANISATION
ENTITY
Example Text
Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.
Named entities
Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Andersen.
Named Entity Recognition
<ENAMEX TYPE="LOCATION">Italy</ENAMEX>'s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.
NER difficulties
• Several types of entities are too numerous to include in dictionaries
• New names turn up every day
• Ambiguities – Paris, Lazio
• Different forms of same entities in same text
– Brian Jones … Mr. Jones
• Capitalisation
NER approaches
• Rule-based approaches
– Hand-crafted rules
– Help from databases of known named entities [e.g. locations]
• Statistical approaches
– Features
– Machine learning
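A toy illustration of the rule-based approach, combining one hand-crafted rule ("Mr./Dr./Mrs. + capitalised word" implies PERSON) with a tiny gazetteer. Both the rule and the gazetteer entries are invented for illustration; real systems use far richer rule sets and statistical models.

```python
import re

GAZETTEER = {"Italy": "LOCATION", "Milan": "LOCATION"}

def tag_entities(text):
    """Return (surface_form, type) pairs found by rule + gazetteer."""
    entities = []
    # rule: a title followed by a capitalised word is a person name
    for m in re.finditer(r"\b(Mr|Dr|Mrs)\.\s+([A-Z]\w+)", text):
        entities.append((m.group(2), "PERSON"))
    # gazetteer lookup for known locations
    for name, etype in GAZETTEER.items():
        if name in text:
            entities.append((name, etype))
    return entities

print(tag_entities("Italy's business world ... Mr. Verdi would leave ..."))
```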
Document Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution
What is anaphora?
• Relation between an anaphoric expression (e.g. a pronoun) and another element in the same or an earlier sentence
• Anaphoric pronouns:
– he, him, she, her, it, they, them
• Anaphoric noun phrases:
– the country
– these documents
– his hat, her dress
Anaphora (pronouns)
• Question:
What is the biggest sector in Andorra’s economy?
• Corpus:
Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: ?
Anaphora (definite descriptions)
• Question:
What is the biggest sector in Andorra’s economy?
• Corpus:
Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: ?
Anaphora Resolution
• Anaphora Resolution is the task of finding the antecedents of anaphoric expressions
• Example system:
– Mitkov, Evans & Orasan (2002)
– http://clg.wlv.ac.uk/MARS/
“Kafka lived in Austria. He died in 1924.”
( _______________________
 | x3 x1 x2              |
 |-----------------------|
 | male(x3)              |
 | named(x3,kafka,per)   |
 | live(x1)              |
 | agent(x1,x3)          |
 | named(x2,austria,loc) |
 | event(x1)             |
 | in(x1,x2)             |
 |_______________________|
   +
  _____________________
 | x5 x4               |
 |---------------------|
 | die(x5)             |
 | thing(x4)           |
 | event(x5)           |
 | agent(x5,x3)        |
 | in(x5,x4)           |
 | timex(x4)=+1924XXXX |
 |_____________________| )
Co-reference resolution
• Question:
What is the biggest sector in Andorra’s economy?
• Corpus:
Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.
• Answer: Tourism
Question Answering
Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
[Figure: Architecture of PRONTO]
Semantic indexing
• If we index documents on the token level, we cannot search for specific semantic concepts
• If we index documents on semantic concepts, we can formulate more specific queries
• Semantic indexing requires a complete preprocessing of the entire document collection [can be costly]
Page 50
© J
oh
an B
os
Ap
ril 2
008
Semantic indexing example
• Example NL question: When did Franz Kafka die?
• Term-based
– query: kafka
– Returns all passages containing the term "kafka"
• Concept-based
– query: DATE & kafka
– Returns all passages containing the term "kafka" and a date expression
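The effect of a concept query such as DATE & kafka can be approximated by post-filtering retrieved passages. Here a four-digit-year regex is a crude stand-in for real DATE annotations in a semantic index:

```python
import re

def concept_filter(psgs, term):
    """Keep passages containing the term AND a date-like expression
    (approximated here by a four-digit year)."""
    return [p for p in psgs
            if term in p.lower() and re.search(r"\b\d{4}\b", p)]

psgs = ["Kafka died in 1924.", "Prions are proteins.",
        "Kafka wrote The Trial."]
print(concept_filter(psgs, "kafka"))
# only the first passage has both the term and a date expression
```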
Question Answering
Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
[Figure: Architecture of PRONTO]
Answer extraction
• Passage retrieval gives us a set of ranked documents
• Match answer with question
– DRS for question
– DRS for each possible document
– Score for amount of overlap
• Deep inference or shallow matching
• Use knowledge
Answer extraction: matching
• Given a question and an expression with a potential answer, calculate a matching score S = match(Q,A) that indicates how well Q matches A
• Example
– Q: When was Franz Kafka born?
– A1: Franz Kafka died in 1924.
– A2: Kafka was born in 1883.
Using logical inference
• Recall that Boxer produces first order representations [DRSs]
• In theory we could use a theorem prover to check whether a retrieved passage entails or is inconsistent with a question
• In practice this is too costly, given the high number of possible answer + question pairs that need to be considered
• Also: theorem provers are precise – they don’t give us information if they almost find a proof, although this would be useful for QA
Semantic matching
• Matching is an efficient approximation to the inference task
• Consider flat semantic representation of the passage and the question
• Matching gives a score of the amount of overlap between the semantic content of the question and a potential answer
Matching Example
• Question: When was Franz Kafka born?
• Passage 1: Franz Kafka died in 1924.
• Passage 2: Kafka was born in 1883.
Semantic Matching [1]
Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)
Semantic Matching [1]
Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)
Binding: X=x2
Semantic Matching [1]
Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)
Semantic Matching [1]
Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)
Binding: Y=x1
Semantic Matching [1]
Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)
Semantic Matching [1]
Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)
A1: franz(x1), kafka(x1), die(x3), agent(x3,x1), in(x3,x2), 1924(x2)
Match score = 3/6 = 0.50
Semantic Matching [2]
Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Semantic Matching [2]
Q:  answer(X), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,X)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Binding: X=x2
Semantic Matching [2]
Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Semantic Matching [2]
Q:  answer(x2), franz(Y), kafka(Y), born(E), patient(E,Y), temp(E,x2)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Binding: Y=x1
Semantic Matching [2]
Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Semantic Matching [2]
Q:  answer(x2), franz(x1), kafka(x1), born(E), patient(E,x1), temp(E,x2)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Binding: E=x3
Semantic Matching [2]
Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Semantic Matching [2]
Q:  answer(x2), franz(x1), kafka(x1), born(x3), patient(x3,x1), temp(x3,x2)
A2: kafka(x1), born(x3), patient(x3,x1), in(x3,x2), 1883(x2)
Match score = 4/6 = 0.67
Matching Example
• Question: When was Franz Kafka born?
• Passage 1: Franz Kafka died in 1924.
Match score = 0.50
• Passage 2: Kafka was born in 1883.
Match score = 0.67
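The matching procedure walked through above can be sketched as greedy matching over the flat predicate lists. This is a simplification: question literals are matched in order, and the answer variable is bound via the expected answer type (hard-coded here as "any four-digit year"), which is how the X=x2 binding arises even though A1 contains no born or temp literal.

```python
import re

def match(q_lits, a_lits):
    """q_lits/a_lits: lists of (predicate, args); uppercase args are
    question variables. Returns matched-literal fraction of Q."""
    theta, matched = {}, 0
    for pred, args in q_lits:
        if pred == "answer":
            continue                      # scored at the end
        for apred, aargs in a_lits:
            if apred != pred or len(aargs) != len(args):
                continue
            trial, ok = dict(theta), True
            for qa, aa in zip(args, aargs):
                if qa.isupper():          # variable: bind consistently
                    if trial.get(qa, aa) != aa:
                        ok = False
                        break
                    trial[qa] = aa
                elif qa != aa:            # constant mismatch
                    ok = False
                    break
            if ok:
                theta, matched = trial, matched + 1
                break
    # answer(X) counts as matched once X is bound; the expected
    # answer type (DATE) lets a year expression bind it directly
    for pred, args in q_lits:
        if pred == "answer":
            var = args[0]
            if var not in theta:
                for apred, aargs in a_lits:
                    if re.fullmatch(r"\d{4}", apred):
                        theta[var] = aargs[0]
                        break
            if var in theta:
                matched += 1
    return matched / len(q_lits)

Q  = [("answer", ["X"]), ("franz", ["Y"]), ("kafka", ["Y"]),
      ("born", ["E"]), ("patient", ["E", "Y"]), ("temp", ["E", "X"])]
A1 = [("franz", ["x1"]), ("kafka", ["x1"]), ("die", ["x3"]),
      ("agent", ["x3", "x1"]), ("in", ["x3", "x2"]), ("1924", ["x2"])]
A2 = [("kafka", ["x1"]), ("born", ["x3"]), ("patient", ["x3", "x1"]),
      ("in", ["x3", "x2"]), ("1883", ["x2"])]

print(match(Q, A1))  # 0.5
print(match(Q, A2))  # 0.666...
```

The scores reproduce the slides: A1 matches answer, franz, kafka (3/6 = 0.50); A2 matches answer, kafka, born, patient (4/6 = 0.67), with temp vs. in left unmatched until a paraphrase rule applies.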
Matching Techniques
• Weighted matching
– Higher weight for named entities
– Estimate weights using machine learning
• Incorporate background knowledge
– WordNet [hyponyms]
– NomLex
– Paraphrases:
BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
Question Answering
Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
[Figure: Architecture of PRONTO]
Answer selection
• Rank answers
– Group duplicates
– Syntactically or semantically equivalent
– Sort on frequency
• How specific should an answer be?
– Semantic relations between answers
– Hyponyms, synonyms
– Answer modelling [PhD thesis Dalmas 2007]
• Answer cardinality
Answer selection example 1
• Where did Franz Kafka die?
– In his bed
– In a sanatorium
– In Kierling
– Near Vienna
– In Austria
– In Berlin
– In Germany
Answer selection example 2
• Where is 3M based?
– In Maplewood
– In Maplewood, Minn.
– In Minnesota
– In the U.S.
– In Maplewood, Minn., USA
– In San Francisco
– In the Netherlands
[Figure: Architecture of PRONTO]
Reranking
• Most QA systems first produce a list of possible answers…
• This is usually followed by a process called reranking
• Reranking promotes correct answers to a higher rank
Factors in reranking
• Matching score
– The better the match with the question, the more likely the answer is correct
• Frequency
– If the same answer occurs many times, it is likely to be correct
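A hypothetical way to combine the two factors; the weights (0.7/0.3) and the log-frequency term are invented for illustration, whereas a real system would estimate such weights, e.g. by machine learning.

```python
from collections import Counter
from math import log

def rerank(candidates):
    """candidates: (answer, match_score) pairs, one per extracted
    occurrence; duplicates boost an answer's frequency."""
    freq = Counter(a for a, _ in candidates)
    best = {}
    for a, s in candidates:               # best match score per answer
        best[a] = max(best.get(a, 0.0), s)
    scored = [(0.7 * best[a] + 0.3 * log(1 + freq[a]), a) for a in best]
    return [a for _, a in sorted(scored, reverse=True)]

cands = [("1883", 0.67), ("1924", 0.50), ("1883", 0.60)]
print(rerank(cands))
# '1883' wins on both match score and frequency
```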
Answer Validation
• Answer Validation
– Check whether an answer is likely to be correct, using an expensive method
• Tie breaking
– Deciding between two answers with similar probability
• Methods:
– Inference check
– Sanity checking
– Googling
Inference check
• Use first-order logic [FOL] to check whether a potential answer entails the question
• This can be done with the use of a theorem prover
– Translate Q into FOL
– Translate A into FOL
– Translate background knowledge into FOL
– If ((BKfol & Afol) → Qfol) is a theorem, we have a likely answer
Sanity Checking
Answer should be informative, that is, not part of the question
Q: Who is Tom Cruise married to?
A: Tom Cruise
Q: Where was Florence Nightingale born?
A: Florence
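This sanity check can be as simple as a substring test:

```python
def is_sane(question, answer):
    """An answer that already occurs in the question is uninformative."""
    return answer.lower() not in question.lower()

print(is_sane("Who is Tom Cruise married to?", "Tom Cruise"))      # False
print(is_sane("Where was Florence Nightingale born?", "Florence")) # False
print(is_sane("Where was Florence Nightingale born?", "Italy"))    # True
```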
Googling
• Given a ranked list of answers, some of these might not make sense at all
• Promote answers that make sense
• How?
• Use an even larger corpus!
– “Sloppy” approach
– “Strict” approach
The World Wide Web
Answer validation (sloppy)
• Given a question Q and a set of answers A1…An
• For each i, generate the query Q & Ai
• Count the number of hits for each i
• Choose the Ai with the highest number of hits
• Use existing search engines
– Google, AltaVista
– Magnini et al. 2002 (CCP)
Corrected Conditional Probability
• Treat Q and A as bags of words
– Q = content words of the question
– A = answer
• CCP(Qsp, Asp) = hits(Asp NEAR Qsp) / (hits(Asp) × hits(Qsp))
• Accept answers above a certain CCP threshold
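The CCP formula with a stubbed hit-count table: `hits` would normally come from live search-engine queries, and the counts below are invented for illustration.

```python
def ccp(q, a, hits):
    """Corrected Conditional Probability of answer a given question q.
    hits maps a query (or an (a, q) NEAR-pair) to its hit count."""
    denom = hits[a] * hits[q]
    return hits[(a, q)] / denom if denom else 0.0

# hypothetical hit counts
hits = {"kafka born": 1000, "1883": 50000, ("1883", "kafka born"): 400}
score = ccp("kafka born", "1883", hits)
print(score)  # 400 / (50000 * 1000) = 8e-06
```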
Answer validation (strict)
• Given a question Q and a set of answers A1…An
• Create a declarative sentence with the focus of the question replaced by Ai
• Use the strict search option in Google
– High precision
– Low recall
• Any terms of the target not in the sentence are added to the query
Example
• TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?
• Top-5 Answers:
1) Britain
* 2) Okemah, Okla.
3) Newport
* 4) Oklahoma
5) New York
(* marks a correct answer)
Example: generate queries
• TREC 99.3Target: Woody Guthrie.Question: Where was Guthrie born?
• Generated queries:
1) “Guthrie was born in Britain”
2) “Guthrie was born in Okemah, Okla.”
3) “Guthrie was born in Newport”
4) “Guthrie was born in Oklahoma”
5) “Guthrie was born in New York”
Example: add target words
• TREC 99.3Target: Woody Guthrie.Question: Where was Guthrie born?
• Generated queries:
1) “Guthrie was born in Britain” Woody
2) “Guthrie was born in Okemah, Okla.” Woody
3) “Guthrie was born in Newport” Woody
4) “Guthrie was born in Oklahoma” Woody
5) “Guthrie was born in New York” Woody
Example: morphological variants
TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?
Generated queries:
“Guthrie is OR was OR are OR were born in Britain” Woody
“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody
“Guthrie is OR was OR are OR were born in Newport” Woody
“Guthrie is OR was OR are OR were born in Oklahoma” Woody
“Guthrie is OR was OR are OR were born in New York” Woody
Example: google hits
TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?
Generated queries:
“Guthrie is OR was OR are OR were born in Britain” Woody 0
“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody 10
“Guthrie is OR was OR are OR were born in Newport” Woody 0
“Guthrie is OR was OR are OR were born in Oklahoma” Woody 42
“Guthrie is OR was OR are OR were born in New York” Woody 2
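The strict approach then amounts to reranking by hit counts of the generated declarative queries. The query pattern and the counts below are taken from this example; `make_query` and `rerank_by_hits` are illustrative helpers, not a real system's API.

```python
def make_query(answer):
    # declarative pattern with morphological variants of "to be",
    # plus the target word missing from the sentence
    return f'"Guthrie is OR was OR are OR were born in {answer}" Woody'

def rerank_by_hits(answers, hit_counts):
    """Sort candidate answers by descending web hit count."""
    return sorted(answers, key=lambda a: hit_counts[a], reverse=True)

answers = ["Britain", "Okemah, Okla.", "Newport", "Oklahoma", "New York"]
hit_counts = {"Britain": 0, "Okemah, Okla.": 10, "Newport": 0,
              "Oklahoma": 42, "New York": 2}

print(rerank_by_hits(answers, hit_counts)[:3])
# ['Oklahoma', 'Okemah, Okla.', 'New York']
```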
Example: reranked answers
TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

Original answers:
1) Britain
* 2) Okemah, Okla.
3) Newport
* 4) Oklahoma
5) New York

Reranked answers:
* 4) Oklahoma
* 2) Okemah, Okla.
5) New York
1) Britain
3) Newport
Question Answering (QA)
Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Where to go from here
• Producing answers in real-time
• Improve accuracy
• Answer explanation
• User modelling
• Speech interfaces
• Dialogue (interactive QA)
• Multi-lingual QA
• Non-sequential architectures