Page 1
© Johan Bos, April 2008
Question Answering (QA)
Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art
Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing
Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking
Page 2
Pronto QA System
[Figure: architecture diagram of the Pronto QA system. Labels: question, parsing, ccg, boxing, drs, knowledge, WordNet, NomLex, answer typing, query, Indri, Indexed Documents, answer extraction, answer selection, answer reranking, answer.]
Page 3
Lecture 2
[Figure: Pronto QA system architecture diagram]
Page 4
Question Answering (QA)
Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing
Page 5
Question Analysis – Why?
• The aim of QA is to output answers, not documents
• We need question analysis to
  – Determine the type of answer that we try to find
  – Estimate the number of answers that we want to return
  – Calculate the probability that an answer is correct
Page 6
Natural Language Processing
• We need ways to automate the process of manipulating natural language
  – Punctuation
  – The way words are composed
  – The relationship between words
  – The structure of phrases
  – Representing the meaning of phrases
• This is where NLP comes in!
  – (NLP = Natural Language Processing)
Page 7
How to use NLP tools?
• A large set of tools is available on the web, most of them free for research
• Examples of integrated text processing environments:
  – GATE (University of Sheffield)
  – TTT (University of Edinburgh)
  – LingPipe
  – C&C (used by the Pronto QA system)
  – For a general overview of NLP tools, see http://registry.dfki.de/
Page 8
Architecture of PRONTO
[Figure: Pronto QA system architecture diagram]
Page 9
Question Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
• Named entity recognition
• Anaphora resolution
Page 10
Tokenisation
• Tokenisation is the task of splitting words from punctuation
  – semicolons, colons ; :
  – exclamation marks, question marks ! ?
  – commas and full stops , .
  – quotes “ ‘ `
• Tokens are normally separated by spaces
  – In the following slides, we use | as the separator
Page 11
Tokenisation: Example 1
• Input (9 tokens):
When was the Buckingham Palace built in London, England?
Page 13
Tokenisation: Example 1
• Input (9 tokens):
When | was | the | Buckingham | Palace | built | in | London, | England?
• Output (11 tokens):
When | was | the | Buckingham | Palace | built | in | London | , | England | ?
Page 14
Tokenisation: Example 2
• Input (7 tokens):
What year did "Snow White" come out?
Page 16
Tokenisation: Example 2
• Input (7 tokens):
What | year | did | "Snow | White" | come | out?
• Output (10 tokens):
What | year | did | " | Snow | White | " | come | out | ?
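The two examples can be reproduced with a very small tokeniser. This is a minimal sketch and not the tokeniser used by Pronto: the function name `tokenise` and the regular expression are assumptions made for illustration. It only covers the punctuation cases shown so far and would wrongly split abbreviations such as U.S.

```python
import re

# Either a run of word characters, or one character that is neither
# a word character nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenise(text):
    """Split words from punctuation (illustrative sketch only)."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    for question in [
        "When was the Buckingham Palace built in London, England?",
        'What year did "Snow White" come out?',
    ]:
        tokens = tokenise(question)
        # Print the token count followed by the tokens in slide notation.
        print(len(tokens), " | ".join(tokens))
```

Run on the two questions, the sketch yields 11 and 10 tokens respectively, matching the outputs on the slides.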
Page 17
Tokenisation: combined words
• Combined words are split
  – I’d → I | ’d
  – country’s → country | ’s
  – won’t → wo | n’t
  – “don’t!” → “ | do | n’t | ! | ”
• Some Italian examples
  – gliel’ha detto → glie | l’ | ha | detto
  – posso prenderlo → posso | prender | lo
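The English splits above can be sketched with two regular expressions. This is an illustrative assumption, not the rule set of any particular tokeniser: the names `NEGATION`, `CLITIC` and `split_clitics` are hypothetical, and the Italian cases are not handled because they need a lexicon of clitic pronouns.

```python
import re

# "n't" is tried first so that "won't" becomes "wo" + "n't".
NEGATION = re.compile(r"(.+)(n['’]t)$", re.IGNORECASE)
# Other English clitics that attach with an apostrophe.
CLITIC = re.compile(r"(.+?)(['’](?:s|d|ll|re|ve|m))$", re.IGNORECASE)


def split_clitics(word):
    """Split one word into [stem, clitic], or return [word] unchanged."""
    for pattern in (NEGATION, CLITIC):
        match = pattern.match(word)
        if match:
            return [match.group(1), match.group(2)]
    return [word]


if __name__ == "__main__":
    for word in ["I'd", "country's", "won't", "palace"]:
        print(word, "->", " | ".join(split_clitics(word)))
```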
Page 18
Difficulties with tokenisation
• Abbreviations, acronyms
  – When was the U.S. invasion of Haiti?
• In particular if the abbreviation or acronym is the last word of a sentence
  – Look at the next word: if it starts with an uppercase letter, assume it is the end of the sentence
  – But think of cases such as Mr. Jones
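The uppercase heuristic and its Mr. Jones exception can be written down in a few lines. This is a sketch under stated assumptions: the set `TITLES` and the function `is_sentence_end` are hypothetical, and the input is assumed to be whitespace-separated words before punctuation has been split off.

```python
# Hypothetical list of abbreviations that are normally followed by a name,
# so a capitalised next word does not signal a sentence boundary.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}


def is_sentence_end(token, next_token):
    """Guess whether a token ending in a full stop closes a sentence."""
    if not token.endswith("."):
        return False
    if next_token is None:
        return True  # nothing follows: end of the text
    if token in TITLES:
        return False  # "Mr. Jones": the capital belongs to a name
    # The heuristic from the slide: an uppercase next word starts a sentence.
    return next_token[:1].isupper()
```

The heuristic still fails when an abbreviation is followed by a proper name in mid-sentence (for example "the U.S. Army"), which is why the slide lists this as a difficulty.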
Page 19
Why is tokenisation important?
• Required for all subsequent stages of processing
  – Parsing
  – Named entity recognition
  – Lemmatisation
  – To look up a word in an electronic dictionary (such as WordNet)
Page 20
Question Analysis
• Tokenisation
• Part of speech tagging
• Named Entity Recognition
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
Page 21
Traditional parts of speech
• Verb
• Noun
• Pronoun
• Adjective
• Adverb
• Preposition
• Conjunction
• Interjection
Page 22
Parts of speech in NLP
CLAWS1 (132 tags)
Examples:
NN singular common noun (boy, pencil ... )
NN$ genitive singular common noun (boy's, parliament's ... )
NNP singular common noun with word initial capital (Austrian, American, Sioux, Eskimo ... )
NNP$ genitive singular common noun with word initial capital (Sioux', Eskimo's, Austrian's, American's, ...)
NNPS plural common noun with word initial capital (Americans, ... )
NNPS$ genitive plural common noun with word initial capital (Americans', …)
NNS plural common noun (pencils, skeletons, days, weeks ... )
NNS$ genitive plural common noun (boys', weeks' ... )
NNU abbreviated unit of measurement unmarked for number (in, cc, kg …)
Penn Treebank (45 tags)
Examples:
JJ adjective (green, …)
JJR adjective, comparative (greener,…)
JJS adjective, superlative (greenest, …)
MD modal (could, will, …)
NN noun, singular or mass (table, …)
NNS noun plural (tables, …)
NNP proper noun, singular (John, …)
NNPS proper noun, plural (Vikings, …)
PDT predeterminer (both the boys)
POS possessive ending (friend's)
PRP personal pronoun (I, he, it, …)
PRP$ possessive pronoun (my, his, …)
RB adverb (however, usually, naturally, here, good, …)
RBR adverb, comparative (better, …)
Page 23
POS tagged example
What year did " Snow White " come out ?
Page 24
POS tagged example
What WP
year NN
did VBD
“ “
Snow NNP
White NNP
" "
come VB
out IN
? .
Page 25
Why is POS-tagging important?
• To disambiguate words
• For instance, to distinguish “book” used as a noun from “book” used as a verb
  – Where can I find a book on cooking?
  – Where can I book a room?
• Prerequisite for further processing stages, such as parsing
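The "book" example can be made concrete with a toy context rule. This is only an illustration of why context disambiguates a word: real taggers use statistical models trained on annotated corpora, and the determiner list and the function `tag_book` are assumptions invented for this sketch.

```python
# Hypothetical, incomplete list of determiners.
DETERMINERS = {"a", "an", "the", "this", "that"}


def tag_book(tokens):
    """Return 'NN' or 'VB' for each occurrence of 'book', using the previous word."""
    tags = []
    for i, token in enumerate(tokens):
        if token.lower() != "book":
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        # After a determiner "book" is read as a noun, otherwise as a verb.
        tags.append("NN" if previous in DETERMINERS else "VB")
    return tags
```

Applied to the tokenised questions from the slide, the rule tags "book" as NN in the first and as VB in the second.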
Page 26
Question Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
Page 27
Lemmatisation
• Lemmatising means
  – grouping morphological variants of words under a single headword
• For example, you could group the words am, was, are, is, were, and been together under the word be
Page 29
Lemmatisation
• Using linguistic terminology, the variants taken together form the lemma of a lexeme
• Lexeme: a “lexical unit”, an abstraction over specific constructions
• Other examples:
  – dying, die, died, dies → die
  – car, cars → car
  – man, men → man
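The grouping of variants under a headword amounts to a lookup from word form to lemma. The sketch below uses only the examples from the slides; the table `LEMMA_TABLE` and the function `lemmatise` are hypothetical, and a real lemmatiser would rely on morphological rules or a full lexicon rather than a hand-written table.

```python
# Illustrative table built from the examples on the slides.
LEMMA_TABLE = {
    "am": "be", "was": "be", "are": "be", "is": "be", "were": "be", "been": "be",
    "dying": "die", "died": "die", "dies": "die",
    "cars": "car",
    "men": "man",
}


def lemmatise(word):
    """Return the headword for a word form; unknown forms map to themselves (lowercased)."""
    lowered = word.lower()
    return LEMMA_TABLE.get(lowered, lowered)
```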
Page 30
Question Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
Page 31
What is Parsing
• Parsing is the process of assigning a syntactic structure to a sequence of words
• The syntactic structure is defined using a grammar
• A grammar consists of a set of symbols (terminal and non-terminal symbols) and production rules (grammar rules)
• The lexicon is built over the terminal symbols (i.e., the words)
Page 32
Syntactic Categories
• The non-terminal symbols correspond to syntactic categories
  – Det (determiner)
  – N (noun)
  – IV (intransitive verb)
  – TV (transitive verb)
  – PN (proper name)
  – Prep (preposition)
  – NP (noun phrase): the car
  – PP (prepositional phrase): at the table
  – VP (verb phrase): saw a car
  – S (sentence): Mia likes Vincent
Page 33
Example Grammar
Lexicon
Det: which, a, the,…
N: rock, singer, …
IV: die, walk, …
TV: kill, write,…
PN: John, Lithium, …
Prep: on, from, to, …
Grammar Rules
S → NP VP
NP → Det N
NP → PN
N → N N
N → N PP
VP → TV NP
VP → IV
PP → Prep NP
VP → VP PP
Page 34
The Parser
• A parser automates the process of parsing
• The input of the parser is a string of words (annotated with POS-tags)
• The output of a parser is a parse tree, connecting all the words
• The way a parse tree is constructed is also called a derivation
Page 35
Derivation Example
Which rock singer wrote Lithium
Page 36
Lexical stage
Det   N    N      TV    PN
Which rock singer wrote Lithium
Page 37
Use rule: NP → Det N
[NP [Det Which] [N rock]] [N singer] [TV wrote] [PN Lithium]
Page 38
Use rule: NP → PN
[NP [Det Which] [N rock]] [N singer] [TV wrote] [NP [PN Lithium]]
Page 39
Use rule: VP → TV NP
[NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]
Page 40
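Backtracking
[NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]
(No rule combines the NP over "Which rock" with the following N, so that NP is undone.)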
Page 41
Use rule: N → N N
[Det Which] [N [N rock] [N singer]] [VP [TV wrote] [NP [PN Lithium]]]
Page 42
Use rule: NP → Det N
[NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]
Page 43
Use rule: S → NP VP
[S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]
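The derivation can also be checked mechanically. The sketch below is a bottom-up chart recogniser in the CKY style, which is a different strategy from the backtracking search on the slides: instead of undoing the NP over "Which rock", the chart simply keeps every analysis of every span. The rules are those of the example grammar; the names `LEXICON`, `BINARY_RULES`, `UNARY_RULES` and `parse` are assumptions, and the inflected form "wrote" is entered directly as TV because the slide's lexicon lists the lemma "write".

```python
from itertools import product

# Word forms for the example sentence (an assumption: the slide lexicon lists lemmas).
LEXICON = {"which": "Det", "rock": "N", "singer": "N", "wrote": "TV", "lithium": "PN"}

# Binary rules of the example grammar, keyed by their right-hand side.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("N", "N"): "N",
    ("N", "PP"): "N",
    ("TV", "NP"): "VP",
    ("Prep", "NP"): "PP",
    ("VP", "PP"): "VP",
}

# Unary rules NP -> PN and VP -> IV, keyed by their right-hand side.
UNARY_RULES = {"PN": "NP", "IV": "VP"}


def close_unary(categories):
    """Apply each unary rule once (enough here: the unary rules do not chain)."""
    result = set(categories)
    for category in categories:
        if category in UNARY_RULES:
            result.add(UNARY_RULES[category])
    return result


def parse(words):
    """Return a chart where chart[i][j] is the set of categories spanning words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Lexical stage: one category per word, plus unary closure.
    for i, word in enumerate(words):
        chart[i][i + 1] = close_unary({LEXICON[word.lower()]})
    # Combine adjacent spans, shortest spans first.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            found = set()
            for split in range(start + 1, end):
                for left, right in product(chart[start][split], chart[split][end]):
                    if (left, right) in BINARY_RULES:
                        found.add(BINARY_RULES[(left, right)])
            chart[start][end] = close_unary(found)
    return chart


if __name__ == "__main__":
    sentence = "Which rock singer wrote Lithium".split()
    chart = parse(sentence)
    print("S" in chart[0][len(sentence)])
```

The chart contains NP for "Which rock" (the dead end from the slides) as well as N for "rock singer", and S over the whole sentence.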
Page 44
Wide coverage parsers
• Normally expect tokenised and POS-tagged input
• Examples of wide-coverage parsers:
  – Charniak parser
  – Collins parser
  – RASP (Carroll & Briscoe)
  – CCG parser (Clark & Curran, used in Pronto)
Page 45
Output C&C parser
ba('S[wq]',
 fa('S[wq]',
  fa('S[wq]/(S[q]/PP)',
   fc('(S[wq]/(S[q]/PP))/N',
    lf(1,'(S[wq]/(S[q]/PP))/(S[wq]/(S[q]/NP))'),
    lf(2,'(S[wq]/(S[q]/NP))/N')),
   lf(3,'N')),
  fc('S[q]/PP',
   fa('S[q]/(S[b]\NP)',
    lf(4,'(S[q]/(S[b]\NP))/NP'),
    lex('N','NP',
     lf(5,'N'))),
   lf(6,'(S[b]\NP)/PP'))),
 lf(7,'S[wq]\S[wq]')).

w(1,'For', for, 'IN', 'O', '(S[wq]/(S[q]/PP))/(S[wq]/(S[q]/NP))').
w(2,which, which, 'WDT','O', '(S[wq]/(S[q]/NP))/N').
w(3,newspaper, newspaper, 'NN', 'O', 'N').
w(4,does, do, 'VBZ','O', '(S[q]/(S[b]\NP))/NP').
w(5,'Krugman', krugman, 'NNP','I-PER', 'N').
w(6,write, write, 'VB', 'O', '(S[b]\NP)/PP').
w(7,?, ?, '.', 'O', 'S[wq]\S[wq]').
Page 46
Question Analysis
• Tokenisation
• Part of speech tagging
• Lemmatisation
• Syntactic analysis (Parsing)
• Semantic analysis (Boxing)
Page 47
Architecture of PRONTO
[Figure: Pronto QA system architecture diagram]
Page 48
Boxing (Semantic Analysis)
• Providing a semantic analysis on the basis of the syntactic analysis
• A semantic analysis of a question offers an abstract representation of the meaning of the question
• Boxer uses a particular semantic theory: Discourse Representation Theory
Page 49
Discourse Representation Theory
• Meaning of natural language expressions represented in first-order logic
• No formulas but box representation (without explicit quantification and conjunction)
• DRT covers a wide range of linguistic phenomena (Kamp & Reyle)
Page 50
Output of Boxer
Paul Krugman. For which newspaper does Krugman write?
DRS (Discourse Representation Structure):
  _______________________   ____________________________________
 | x0                    | | x1                                 |
 |_______________________| |____________________________________|
(| named(x0,krugman,per) |+| write(x1)                          |)
 | named(x0,paul,per)    | | event(x1)                          |
 |                       | | agent(x1,x0)                       |
 |_______________________| |  _______________     ____________  |
                           | | x2            |   |            | |
                           | |_______________|   |____________| |
                           | | newspaper(x2) | ? | event(x1)  | |
                           | |_______________|   | for(x1,x2) | |
                           |                     |____________| |
                           |____________________________________|
Page 51
Focus and Topic
• Information expressed in a question can be structured into two parts:
  – the focus: information that is asked for
  – the topic: information about the focus
• Example: How many inhabitants (FOCUS) does Rome have (TOPIC)?
Page 52
Focus in DRS
[Figure: the DRS for "Paul Krugman. For which newspaper does Krugman write?" with a "Focus" label.]
Page 53
Question Answering (QA)
Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing
Page 54
Architecture of PRONTO
[Figure: Pronto QA system architecture diagram]
Page 55
Knowledge Construction
• The knowledge component in Pronto constructs a local knowledge base for the question under consideration
  – This knowledge is used in subsequent components
• The task of the knowledge component is to find all relevant knowledge that might be used
  – As little as possible, to ensure efficiency
Page 56
Manually Constructed Knowledge
• Linguistic knowledge
  – WordNet
  – NomLex
  – FrameNet
• General knowledge
  – CYC
  – CIA Factbook
  – Gazetteers
Page 57
WordNet
• Electronic dictionary
• Not only words and definitions, but also relations between words
• Four parts of speech
  – Nouns
  – Verbs
  – Adjectives
  – Adverbs
Page 58
WordNet SynSets
• Words are organised in SynSets
• A SynSet is a group of words with the same meaning, in other words a set of synonyms
• Example: { Rome, Roma, Eternal City, Italian Capital, capital of Italy }
Page 59
Senses
• A word can have several different meanings
• Example: plant
  – A building for industrial labour
  – A living organism lacking the power of locomotion
• The different meanings of a word are called senses
• Therefore, one word can occur in more than one SynSet in WordNet
Page 60
SynSet Example
- {mug, mugful} = the quantity that can be held in a mug
- {chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug} = a person who is gullible and easy to take advantage of
- {countenance, physiognomy, phiz, visage, kisser, smiler, mug} = the human face
Page 61
Hypernyms and Hyponyms
• Hypernymy is a WordNet relation defined between two SynSets
  – If A is a hypernym of B, then A is more generic than B
• The inverse of hypernymy is hyponymy
  – If A is a hyponym of B, then A is more specific than B
• Take the transitive closure of these relations
• Examples:
  – “cow” and “horse” are hyponyms of “animal”
  – “publication” is a hypernym of “book”
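Taking the transitive closure means following hypernym links upwards until no new SynSet is reached. The sketch below uses a toy graph rather than WordNet itself: the dictionary `DIRECT_HYPERNYMS` and its intermediate node "mammal" are invented for illustration, and the function names are assumptions.

```python
# Toy hypernym links (word -> direct hypernyms). The intermediate link via
# "mammal" is invented for illustration and is not copied from WordNet.
DIRECT_HYPERNYMS = {
    "cow": {"mammal"},
    "horse": {"mammal"},
    "mammal": {"animal"},
    "book": {"publication"},
}


def all_hypernyms(word):
    """Transitive closure of the hypernym relation for one word."""
    seen = set()
    stack = [word]
    while stack:
        current = stack.pop()
        for parent in DIRECT_HYPERNYMS.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_hyponym_of(word, candidate):
    """True if `candidate` is reachable from `word` through hypernym links."""
    return candidate in all_hypernyms(word)
```

With the closure, "cow" counts as a hyponym of "animal" even though the toy graph only links it directly to "mammal".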
Page 62
Examples using WordNet
• Which rock singer wrote Lithium?
  – WordNet: singer is a hyponym of person
  – Knowledge: ∀x(singer(x) → person(x))
• What is the population of Andorra?
  – WordNet: population is a hyponym of number
  – Knowledge: ∀x(population(x) → number(x))
Page 63
NomLex
• NomLex is a database of nominalisation paraphrases
  – A nominalisation is a “verb promoted to a noun”
  – A paraphrase links the noun to the root verb
• Example:
  – X is an invention by Y → Y invented X
  – the killing of X → X was killed
Page 64
Harvesting Knowledge
• Often existing knowledge bases are incomplete for particular applications
• There are various ways to automatically construct knowledge bases:
  – Instances and Hyponyms [e.g. Hearst]
  – Paraphrases [e.g. Lin & Pantel]
Page 65
Hyponyms (X such-as Y)
• WordNet has no instances of airlines.
TREC 20.2 (Concorde)
What airlines have Concorde in their fleets?
Page 66
Hyponyms (X such as Y)
• Search for “Xs such as Y” patterns in large corpora, such as the web
• Here: X = airline, Y a hyponym of X
• Corpus: …airlines such as Continental and United now fly…
TREC 20.2 (Concorde)
What airlines have Concorde in their fleets?
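The "Xs such as Y" search can be sketched with a regular expression. This is a simplified illustration, not the extraction procedure used to build the Pronto knowledge base: the names `NAME`, `PATTERN`, `SEPARATOR` and `harvest_hyponyms` are assumptions, instances are assumed to be capitalised names, and the plural is assumed to be formed with a plain "s".

```python
import re

# A name is one or more capitalised words, e.g. "Air France".
NAME = r"[A-Z][\w.]*(?: [A-Z][\w.]*)*"
# "<X>s such as <Name>" followed by further names joined by ", " or " and ".
PATTERN = re.compile(
    r"(\w+)s such as (" + NAME + r"(?:(?:, and |, | and )" + NAME + r")*)"
)
SEPARATOR = re.compile(r", and |, | and ")


def harvest_hyponyms(text):
    """Return (hypernym, instance) pairs found by the 'Xs such as Y' pattern."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group(1)
        for name in SEPARATOR.split(match.group(2)):
            pairs.append((hypernym, name))
    return pairs


if __name__ == "__main__":
    print(harvest_hyponyms("airlines such as Continental and United now fly"))
```

On the corpus fragment from the slide this yields the pairs (airline, Continental) and (airline, United). A capitalised word directly after the list would be swallowed into the last name, which is one reason harvested lists contain noise.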
Page 67
Hyponyms (X such as Y)
• Knowledge (AQUAINT corpus): Air Asia, Air Canada, Air France, Air Mandalay, Air Zimbabwe, Alaska, Aloha, American Airlines, Angel Airlines, Ansett, Asiana, Bangkok Airways, Belgian Carrier Sabena, British Airways, Canadian, Cathay Pacific, China Eastern Airlines, China Xinhua Airlines, Continental, Garuda, Japan Airlines, Korean Air, Lai, Lao Aviation, Lufthansa, Malaysia Airlines, Maylasian Airlines, Midway, Northwest, Orient Thai Airlines, Qantas, Seage Air, Shanghai Airlines, Singapore Airlines, Skymark Airlines Co., South Africa, Swiss Air, US Airways, United, Virgin, Yangon Airways
TREC 20.2 (Concorde)
What airlines have Concorde in their fleets?
Page 68
Paraphrases
• Several methods have been developed for automatically finding paraphrases in large corpora
• This usually proceeds by starting with seed patterns of known positive instances
• Using bootstrapping, new patterns are found, and new seeds can be used
Page 69
Seed example
• Start: Oswald killed JFK
• Search for "Oswald * JFK"
• Results:– Oswald assassinated JFK– Oswald shot JFK
• Use these new patterns to find other pairs and start again
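One round of this bootstrapping loop can be sketched as two searches: first find what occurs between the seed pair, then use a found pattern to collect new pairs. The mini-corpus below is invented for illustration (its first two sentences are the results listed on the slide), and the function names `find_fillers` and `find_pairs` are assumptions.

```python
import re


def find_fillers(corpus, left, right):
    """Return the single words found between `left` and `right` ('left * right')."""
    pattern = re.compile(re.escape(left) + r" (\w+) " + re.escape(right))
    fillers = []
    for sentence in corpus:
        for match in pattern.finditer(sentence):
            if match.group(1) not in fillers:
                fillers.append(match.group(1))
    return fillers


def find_pairs(corpus, verb):
    """Return (X, Y) pairs of capitalised words linked by `verb`."""
    pattern = re.compile(r"([A-Z]\w+) " + re.escape(verb) + r" ([A-Z]\w+)")
    return [match.groups() for sentence in corpus for match in pattern.finditer(sentence)]


# Invented mini-corpus; a real run would search a large corpus or the web.
CORPUS = [
    "Oswald assassinated JFK",
    "Oswald shot JFK",
    "Booth assassinated Lincoln",
]

if __name__ == "__main__":
    new_patterns = find_fillers(CORPUS, "Oswald", "JFK")
    print(new_patterns)
    for verb in new_patterns:
        print(verb, find_pairs(CORPUS, verb))
```

Each new pair (here Booth and Lincoln) can serve as the next seed, which is the "start again" step.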
Page 71
Paraphrase Example
Knowledge: ∀x∀t(∃e(kill(e) & theme(e,x) & in(e,t)) → ∃e(die(e) & agent(e,x) & in(e,t)))
TREC 4.2 (James Dean)
When did James Dean die?
APW19990929.0165: In 1955, actor James Dean was
killed in a two-car collision near Cholame, Calif.
Page 72
Question Answering (QA)
Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing
Page 73
Architecture of PRONTO
[Figure: Pronto QA system architecture diagram]
Page 74
Answer Typing
• Providing information on the expected answer type
  – Type of question
  – Type (sortal ontology or taxonomy)
  – Answer cardinality
• Issues
  – Ambiguities
  – Vagueness
  – Classification problems
Page 75
Question Types
• Wh-questions:
  – Where was Franz Kafka born?
  – How many countries are members of OPEC?
  – Who is Thom Yorke?
  – Why did David Koresh ask the FBI for a word processor?
  – How did Frank Zappa die?
  – Which boxer beat Muhammad Ali?
Page 76
Question Types
• Yes-no questions:
  – Does light have weight?
  – Scotland is part of England – true or false?
• Choice-questions:
  – Did Italy or Germany win the world cup in 1982?
  – Who is Harry Potter’s best friend – Ron, Hermione or Sirius?
Page 77
Indirect Questions
• Imperative mood:
  – Name four European countries that produce wine.
  – Give the date of birth of Franz Kafka.
• Declarative mood:
  – I would like to know when Jim Morrison was born.
Page 78
Answer Type Taxonomies
• Simple Answer-Type Taxonomy:
PERSON
NUMERAL
DATE
MEASURE
LOCATION
ORGANISATION
Page 79
Expected Answer Types
• PERSON:
  – Who won the Nobel prize for Peace?
  – Which rock singer wrote Lithium?
Page 80
Expected Answer Types
• NUMERAL:
  – How many inhabitants does Rome have?
  – What’s the population of Scotland?
Page 81
Expected Answer Types
• DATE:
  – When was JFK killed?
  – In what year did Rome become the capital of Italy?
Page 82
Expected Answer Types
• MEASURE:
  – How much does a 125 gallon fish tank cost?
  – How tall is an African elephant?
  – How heavy is a Boeing 777?
Page 83
Expected Answer Types
• LOCATION:
  – Where does Angus Young of AC/DC live?
  – What city gives a Christmas tree to Westminster every year as a gift?
Page 84
Expected Answer Types
• ORGANISATION:
  – Which company invented the compact disk?
  – Who purchased Gilman Paper Company?
Page 85
Using background knowledge
• Which rock singer …
  – singer is a hyponym of person, therefore the expected answer type is PERSON
• What is the population of …
  – population is a hyponym of number, hence the answer type is NUMERAL
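The step from a hyponym fact to an answer type can be sketched as a walk up the hypernym chain until a concept with a known answer type is reached. The two links below are the ones given on the slide; the dictionaries `HYPERNYMS` and `TYPE_OF_CONCEPT` and the function `answer_type_from_noun` are hypothetical and do not describe how Pronto implements this.

```python
# Hypernym links taken from the slide examples (word -> hypernym).
HYPERNYMS = {"singer": "person", "population": "number"}
# Concepts that correspond directly to an answer type in the simple taxonomy.
TYPE_OF_CONCEPT = {"person": "PERSON", "number": "NUMERAL"}


def answer_type_from_noun(noun):
    """Walk up the toy hypernym chain until a concept with an answer type is found."""
    concept = noun
    visited = set()
    while concept is not None and concept not in visited:
        if concept in TYPE_OF_CONCEPT:
            return TYPE_OF_CONCEPT[concept]
        visited.add(concept)
        concept = HYPERNYMS.get(concept)
    return None  # no answer type can be derived from the available knowledge
```

A noun without a path to a typed concept, such as "newspaper" in this toy table, yields no answer type, which corresponds to a gap in the background knowledge.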
Page 86
Answer type tagging
Simple rule-based systems:
  Who … → PERSON
  Where … → LOCATION
  When … → DATE
  How many … → NUMERAL
…often fail…
  – Who launched the iPod?
  – Where in the human body is the liver?
  – When is it time to go to bed?
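The four rules fit in a few lines, which also makes their weakness easy to see. This is a sketch of such a rule-based tagger, not a component of Pronto; the names `RULES` and `guess_answer_type` are assumptions.

```python
# The four rules from the slide, checked in order.
RULES = [
    ("how many", "NUMERAL"),
    ("who", "PERSON"),
    ("where", "LOCATION"),
    ("when", "DATE"),
]


def guess_answer_type(question):
    """Return the answer type of the first rule whose prefix matches, else None."""
    lowered = question.lower()
    for prefix, answer_type in RULES:
        if lowered.startswith(prefix):
            return answer_type
    return None
```

For "Who launched the iPod?" the rules return PERSON although the expected answer is a company, and for "Where in the human body is the liver?" they return LOCATION although the answer is a body part. Questions such as "Which rock singer wrote Lithium?" match no rule at all.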
Page 87
Complex taxonomies
• Simple ontologies cannot account for the large variety of questions
• An example of a more complex ontology is proposed by Li & Roth
• Pronto uses its own complex ontology
• Machine learning approaches are often used to automatically tag questions with answer types
Page 88
Taxonomy of Li & Roth (1/3)
• ENTITY
  – animal: animals
  – body: organs of body
  – color: colors
  – creative: inventions, books and other creative pieces
  – currency: currency names
  – dis.med.: diseases and medicine
  – event: events
  – food: food
  – instrument: musical instrument
  – lang: languages
  – letter: letters like a-z
  – other: other entities
  – plant: plants
  – product: products
  – religion: religions
  – sport: sports
  – substance: elements and substances
  – symbol: symbols and signs
  – technique: techniques and methods
  – term: equivalent terms
  – vehicle: vehicles
  – word: words with a special property
Page 89
Taxonomy of Li & Roth (2/3)
• DESCRIPTION: description and abstract concepts
  – definition: definition of sth.
  – description: description of sth.
  – manner: manner of an action
  – reason: reasons
• HUMAN: human beings
  – group: a group or organization of persons
  – ind: an individual
  – title: title of a person
  – description: description of a person
• LOCATION: locations
  – city: cities
  – country: countries
  – mountain: mountains
  – other: other locations
  – state: states
Page 90
Taxonomy of Li & Roth (3/3)
• NUMERIC: numeric values
  – code: postcodes or other codes
  – count: number of sth.
  – date: dates
  – distance: linear measures
  – money: prices
  – order: ranks
  – other: other numbers
  – period: the lasting time of sth.
  – percent: fractions
  – speed: speed
  – temp: temperature
  – size: size, area and volume
  – weight: weight
• ABBREVIATION
  – abb: abbreviation
  – exp: expansion
Page 91
Pronto Answer Type Taxonomy
Page 93
Answer typing: problems
• Ambiguities
  – How long … → distance or duration
• Vague Wh-words
  – What do penguins eat?
  – What is the length of a football pitch?
• Taxonomy gaps
  – Which alien race featured in Star Trek?
  – What is the cultural capital of Italy?
Page 94
Answer Cardinality
• How many distinct answers does a question have?
• Examples:
  – When did Louis Braille die? → 1 answer
  – Who won a Nobel Prize in chemistry? → 1 or more answers
  – What are the seven wonders of the world? → exactly 7 answers
Page 95
Class activity: answer typing
1. How many islands does Italy have?
2. When did Inter win the Scudetto?
3. What are the colours of the Lithuanian flag?
4. Where is St. Andrews located?
5. Why does oil float in water?
6. How did Frank Zappa die?
7. Name the Baltic countries.
8. Which seabird was declared extinct in the 1840s?
9. Who is Noam Chomsky?
10. List names of Russian composers.
11. Edison is the inventor of what?
12. How far is the moon from the sun?
13. What is the distance from New York to Boston?
14. How many planets are there?
15. What is the exchange rate of the Euro to the Dollar?
16. What does SPQR stand for?
17. What is the nickname of Totti?
18. What does the Scottish word “bonnie” mean?
19. Who wrote the song “Paranoid Android”?
Page 96
Lecture 3
[Figure: Pronto QA system architecture diagram]