Techniques Used in Modern Question-Answering Systems
Candidacy Exam
Elena Filatova, December 11, 2002
Committee: Luis Gravano (Columbia University), Vasileios Hatzivassiloglou (Department of Computer Science), Rebecca J. Passonneau
Transcript
Present vs Past Research on QA
Current systems
– Mainly systems written for the TREC conference
  • factoid questions
  • short answers
  • huge text collections
Related systems
– IR
  • queries vs questions
  • return documents vs short answers
– Systems based on semantic representations (Lehnert)
  • questions about one text vs text collections
  • inference from the semantic structure of a text vs searching for an answer in the text
– One type of output (NP) from a closed collection (Kupiec): answer inference vs answer extraction
Lehnert's system
John loved Mary but she didn't want to marry him. One day, a dragon stole Mary from the castle. John got on top of his horse and killed the dragon. Mary agreed to marry him. They lived happily ever after.
Q: Why did Mary agree to marry John?
A: Because she was indebted to him.

Problems stated:
– right classification
– dependency of the answer-inference procedure on the type of the question
Current QA Systems
[Pipeline diagram: question → question analysis → query → extracted documents → rules for answer → list of answers]
Information Extraction
• right query
• long text
• domain dependency
• predefined types of answers
Information Retrieval
Plan
• Classification
• Information (document) retrieval
  – Query formation
• Information extraction
  – Passage extraction
  – Answer extraction
• Usage of answer redundancy on the Web in QA
• QA for restricted domains
• Evaluation procedures for current QA systems and analysis of their performance
Classification and QA
Theory of Classification
Rosch et al.: classification of basic objects

The world is structured: real-world attributes do not occur independently of each other.

Each category (class) is a set of attributes common to all the objects in the category.

Types of categories:
• Superordinate – a small number of common attributes (furniture)
• Subordinate – a lot of common attributes (floor lamp, desk lamp)
• Basic – an optimal number of common attributes (lamp): basic objects are the most inclusive categories which delineate the correlation structure of the environment

Though classification is a converging problem for objects, it is not possible to compile a list of all possible basic categories.
• Knowledge of the domain (IBM's system)
• Statistical methods for connecting question and answer spaces:
  – Agichtein: automatic acquisition of patterns that might be good candidates for query expansion; 4 'types' of question
  – Berger: to facilitate query modification (expansion), each question term gets a set of answer terms
• FAQ: closed set of question-answer pairs
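Berger-style question-to-answer term mapping can be sketched as follows. This is a minimal illustration on hypothetical toy data, not the statistical translation model itself: it simply counts co-occurrences of question terms with answer terms across Q-A pairs, then expands a new question with the most strongly associated answer terms.

```python
from collections import Counter, defaultdict

def train_expansion(qa_pairs):
    """Count how often each answer term co-occurs with each
    question term across a collection of (question, answer) pairs."""
    co = defaultdict(Counter)
    for question, answer in qa_pairs:
        q_terms = set(question.lower().split())
        a_terms = answer.lower().split()
        for q in q_terms:
            co[q].update(a_terms)
    return co

def expand_query(question, co, k=2):
    """Expand a question with the k answer terms most strongly
    associated with each of its terms."""
    terms = question.lower().split()
    expanded = list(terms)
    for t in terms:
        for term, _ in co[t].most_common(k):
            if term not in expanded:
                expanded.append(term)
    return expanded

# Hypothetical toy Q-A pairs, for illustration only.
pairs = [
    ("who invented the telephone",
     "alexander graham bell invented the telephone"),
    ("who invented the lightbulb",
     "thomas edison invented the lightbulb"),
]
co = train_expansion(pairs)
print(expand_query("who invented the telephone", co, k=3))
```

The expanded query keeps the original question terms and appends likely answer-space terms, which helps the retrieval step find answer-bearing documents that share few words with the question itself.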
Information Retrieval
• Classical IR is the first step of QA
• Vector-space model (calculation of similarity between terms in the query and terms in the document)
• IR techniques used in current QA systems usually target one database (either the Web or the TREC collection)
• Is it possible to apply Distributed IR techniques?
  – domain-restricted QA with extra knowledge about the text collection (IBM's system)
  – "splitting" one big collection of documents into smaller collections about specific topics
  – it might require changes in classification: the type of the question might cause changes in query formulation, document extraction, and answer extraction
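The vector-space similarity step above can be sketched minimally. This uses raw term frequencies and cosine similarity only (no idf weighting or stemming, which real IR engines would add):

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector for a piece of text (no idf weighting,
    to keep the sketch minimal)."""
    return Counter(text.lower().split())

def cosine(v1, v2):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(v1[t] * v2[t] for t in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

query = tf_vector("who killed the dragon")
doc_a = tf_vector("john killed the dragon with his sword")
doc_b = tf_vector("mary lived in the castle")

# The document sharing more query terms scores higher.
assert cosine(query, doc_a) > cosine(query, doc_b)
```

Ranking every document in the collection by this score and returning the top hits is exactly the "classical IR first step" that QA systems then refine with passage and answer extraction.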
Passage Extraction
• Passages of a particular length (Cardie) + vector representation for each passage
• Paragraphs or sentences
• Classical text excerpting
  – Each sentence is assigned a score
  – Retrieved passages are formed by taking the sentences with the highest scores
• Global-Local Processing (Salton)
• McCallum: passage extraction based not only on words but also on other features (e.g., syntactic constructions)
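The classical excerpting scheme above (score each sentence, take the highest-scoring ones) can be sketched in a few lines. This toy version scores a sentence by query-term overlap; the example document is hypothetical and the naive split on '.' stands in for real sentence segmentation:

```python
def score_sentences(document, query):
    """Classical excerpting: score each sentence by the number of
    query terms it contains; return (score, sentence) pairs, best first."""
    q_terms = set(query.lower().split())
    sentences = [s.strip() for s in document.split('.') if s.strip()]
    scored = [(sum(1 for w in s.lower().split() if w in q_terms), s)
              for s in sentences]
    scored.sort(key=lambda x: -x[0])
    return scored

doc = ("John loved Mary. A dragon stole Mary from the castle. "
       "John killed the dragon. They lived happily ever after.")
best = score_sentences(doc, "who killed the dragon")[0][1]
print(best)
```

Systems like McCallum's replace the bare word-overlap score with richer features (syntactic constructions, named-entity matches), but the extract-by-score skeleton stays the same.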
Information Extraction
• Domain dependency (Grishman): predefined set of attributes for the search, specific to each topic, e.g., terrorism: victims, locations, perpetrators
• Usually requires a lot of manually tagged data for training, or texts divided into two groups: one topic vs all other texts (Riloff)

In both cases, division into topics is a necessary step, which is not applicable to open-domain QA systems.
What Information Can Be Extracted (IE)
• Named entities (NE-tagging)
  – Numbers (incl. dates, ZIP codes, etc.)
  – Proper names (locations, people, etc.)
  – Other, depending on the system

TREC-8: 80% of questions asked for NEs
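A minimal sketch of pattern-based NE-tagging for the number-like entities listed above (dates, ZIP codes, plain numbers). The patterns and precedence rule are illustrative assumptions; real NE taggers also use gazetteers and learned models, especially for proper names:

```python
import re

# Earlier patterns take precedence over later ones, so a date's digits
# are not re-tagged as bare numbers.
PATTERNS = [
    ("DATE", re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2},\s*\d{4}\b")),
    ("ZIP", re.compile(r"\b\d{5}(?:-\d{4})?\b")),
    ("NUMBER", re.compile(r"\b\d+\b")),
]

def tag_entities(text):
    """Return (type, surface string) pairs found in the text."""
    found = []
    taken = set()  # character offsets already claimed by an entity
    for label, pat in PATTERNS:
        for m in pat.finditer(text):
            if not any(i in taken for i in range(m.start(), m.end())):
                found.append((label, m.group()))
                taken.update(range(m.start(), m.end()))
    return found

print(tag_entities("The exam took place on December 11, 2002 in NYC 10027."))
```

Since so many TREC questions ask for named entities, even a tagger this simple lets an answer-extraction step restrict candidate answers to spans of the expected type.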