
Question Answering (Open-Domain) (modified lecture from E. Riloff’s webpage)

• Grand Challenge Problem for NLP: A program that can find the answer to arbitrary questions from text resources.

• WWW, encyclopedias, books, manuals, medical literature, scientific papers, etc.

• Another application: database queries

• Converting natural language questions into database queries was one of the earliest NLP applications!

• A scientific reason to do Q/A: the ability to answer questions about a story is the hallmark of understanding.

Multiple Document Question Answering

• A multiple document Q/A task involves questions posed against a collection of documents.

• The answer may appear in the collection multiple times, or may not appear at all! For this task, it doesn’t matter where the answer is found.

• Applications include WWW search engines, and searching text repositories such as news archives, medical literature, or scientific articles.

TREC-9 Q/A Task

Sample questions:

• How much folic acid should an expectant mother get daily?
• Who invented the paper clip?
• What university was Woodrow Wilson president of?
• Where is Rider College located?
• Name a film in which Jude Law acted.
• Where do lobsters like to live?

Number of Documents: 979,000
Megabytes of Text: 3,033
Document Sources: AP, WSJ, Financial Times, San Jose Mercury News, LA Times, FBIS
Number of Questions: 682
Question Sources: Encarta log, Excite log

TREC and AQUAINT

• TREC-10: new questions from MSN Search logs and AskJeeves, some of which have no answers, or require fusion across documents

• List questions: Name 32 countries Pope John Paul II has visited.

• “Dialogue” processing: Which museum in Florence was damaged by a major bomb explosion in 1993? On what day did this happen?

• AQUAINT: Advanced QUestion Answering for INTelligence (e.g., beyond factoids, the Multiple Perspective Q-A work at Pitt)

Single Document Question Answering

• A single document Q/A task involves questions associated with one particular document.

• In most cases, the assumption is that the answer appears somewhere in the document, and probably only once.

• Applications involve searching an individual resource, such as a book, encyclopedia, or manual.

• Reading comprehension tests are also a form of single document question answering.

Reading Comprehension Tests

Mars Polar Lander: Where Are You?

(January 18, 2000) After more than a month of searching for a single sign from NASA’s Mars Polar Lander, mission controllers have lost hope of finding it. The Mars Polar Lander was on a mission to Mars to study its atmosphere and search for water, something that could help scientists determine whether life ever existed on Mars. Polar Lander was to have touched down on December 3 for a 90-day mission. It was to land near Mars’ south pole. The lander was last heard from minutes before beginning its descent. The last effort to communicate with the three-legged lander ended with frustration at 8 a.m. Monday. “We didn’t see anything,” said Richard Cook, the spacecraft’s project manager at NASA’s Jet Propulsion Laboratory. The failed mission to the Red Planet cost the American government more than $200 million. Now, space agency scientists and engineers will try to find out what could have gone wrong. They do not want to make the same mistakes in the next mission.

– When did the mission controllers lose hope of communication with the lander? (Answer: 8 a.m. Monday, Jan. 17)

– Who is the Polar Lander’s project manager? (Answer: Richard Cook)

– Where on Mars was the spacecraft supposed to touch down? (Answer: near Mars’ south pole)

– What was the mission of the Mars Polar Lander? (Answer: to study Mars’ atmosphere and search for water)

Judging Answers

There are several possible ways to present an answer:

Short Answer: the exact answer to the question

Answer Sentence: the sentence containing the answer.

Answer Passage: a passage containing the answer. (e.g., a paragraph)

Short answers are difficult to score automatically because many variations are often acceptable. Example:

Text: The 2002 Winter Olympics will be held in beautiful Salt Lake City, Utah.

Q: Where will the 2002 Winter Olympics be held?

A1: beautiful Salt Lake City, Utah
A2: Salt Lake City, Utah
A3: Salt Lake City
A4: Salt Lake
A5: Utah
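As a quick illustration (not from the original lecture), here is a minimal Python sketch contrasting exact string match with token overlap on these candidates; the normalize() helper is hypothetical:

```python
# Illustrative only: compare exact match vs. token overlap for the
# candidate answers above. normalize() is a hypothetical helper.

def normalize(answer):
    """Lowercase an answer and split it into a set of tokens."""
    return set(answer.lower().replace(",", "").split())

gold = "Salt Lake City, Utah"
candidates = ["beautiful Salt Lake City, Utah", "Salt Lake City, Utah",
              "Salt Lake City", "Salt Lake", "Utah"]

for cand in candidates:
    exact = (cand == gold)
    # Fraction of gold tokens found in the candidate: partial credit
    # that a strict string comparison cannot give.
    overlap = len(normalize(cand) & normalize(gold)) / len(normalize(gold))
    print(f"{cand!r:<35} exact={exact} overlap={overlap:.2f}")
```

Only A2 matches exactly, yet a human judge would likely accept A1 and A3 as well, which is exactly why automatic scoring of short answers is hard.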

Reciprocal Ranking Scheme

In a real Q/A application, it doesn’t make much sense to produce several possible answers. But for the purposes of evaluating computer models, several answer candidates are often ranked by confidence.

Reciprocal Ranking Scheme: the score for a question is 1/R, where R is the rank of the first correct answer in the list.

Q: What is the capital of Utah?

A1: Ogden
A2: Salt Lake City
A3: Provo
A4: St. George
A5: Salt Lake

The score for the question Q would be ½.
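A minimal Python sketch of this scoring scheme (the function names are illustrative, not from the lecture):

```python
def reciprocal_rank(ranked_answers, is_correct):
    """Return 1/R for the rank R (1-based) of the first correct
    answer, or 0 if no answer in the list is correct."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if is_correct(answer):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(rankings, is_correct):
    """Average per-question scores over a test set (the MRR used in TREC)."""
    return sum(reciprocal_rank(r, is_correct) for r in rankings) / len(rankings)

# The slide's example: the first correct answer sits at rank 2.
ranked = ["Ogden", "Salt Lake City", "Provo", "St. George", "Salt Lake"]
print(reciprocal_rank(ranked, lambda a: a == "Salt Lake City"))  # 0.5
```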

Architecture of Typical Q/A Systems

Question Typing: input = question; output = entity type(s)

Document/Passage Retrieval: input = question, text collection; output = relevant texts

Named Entity Tagging: input = relevant texts; output = tagged texts

Answer Identification: input = question, entity type(s), tagged text; output = answer
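A skeletal sketch of this pipeline in Python; the function names are hypothetical placeholders, and only the data flow between the four modules is the point:

```python
# Skeleton of the four-module pipeline; bodies are placeholders.

def classify_question(question):
    """Question Typing: question -> expected entity type(s)."""
    ...

def retrieve_passages(question, collection):
    """Document/Passage Retrieval: texts -> most relevant texts."""
    ...

def tag_entities(passages):
    """Named Entity Tagging: relevant texts -> tagged texts."""
    ...

def identify_answer(question, qtype, tagged):
    """Answer Identification: pick the best candidate of the right type."""
    ...

def answer_question(question, collection):
    qtype = classify_question(question)
    passages = retrieve_passages(question, collection)
    tagged = tag_entities(passages)
    return identify_answer(question, qtype, tagged)
```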

Question Typing

Many common varieties of questions expect a specific type of answer. For example:

WHO: person, organization, or country.

WHERE: location (specific or general)

WHEN: date or time period

HOW MUCH: an amount

HOW MANY: a number

WHICH CITY: a city

Most Q/A systems use a question classifier to assign a type to each question. The question type constrains the set of possible answers. The classification rules are often developed by hand and are quite simple.
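A minimal sketch of such a hand-built classifier; the regex rules and type labels below are illustrative, not the lecture's actual rules:

```python
import re

# Hand-built rules mapping question patterns to expected answer types.
RULES = [
    (re.compile(r"^who\b", re.I), ["PERSON", "ORGANIZATION", "COUNTRY"]),
    (re.compile(r"^where\b", re.I), ["LOCATION"]),
    (re.compile(r"^when\b", re.I), ["DATE", "TIME"]),
    (re.compile(r"^how much\b", re.I), ["AMOUNT"]),
    (re.compile(r"^how many\b", re.I), ["NUMBER"]),
    (re.compile(r"^(which|what) city\b", re.I), ["CITY"]),
]

def question_type(question):
    for pattern, types in RULES:
        if pattern.match(question):
            return types
    return ["DEFAULT_NP"]  # fall back to any noun phrase

print(question_type("Who invented the paper clip?"))     # ['PERSON', ...]
print(question_type("Where do lobsters like to live?"))  # ['LOCATION']
```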

A Question Type Hierarchy (excerpt)

• Default NP
• Thing Name: Title
• Temporal: Time, Date
• Definition
• Agent: Organization, Person, Country
• Location: Country

Document/Passage Retrieval

• For some applications, the text collection that must be searched is very large. For example, the TREC Q/A collection is about 3 GB!

• Applying NLP techniques to large text collections is too expensive to do in real time. So, information retrieval (IR) engines identify the most relevant texts, using the question words as keywords (a toy sketch follows this list).

• Document Retrieval Systems return the N documents that are most relevant to the question. Passage retrieval systems return the N passages that are most relevant to the question.

• Only the most relevant documents/passages are given to the remaining modules of the Q/A system. If the IR engine doesn’t retrieve text(s) containing the answer, the Q/A system is out of luck!
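A toy keyword-overlap ranker illustrating the idea; a real system would use a proper IR engine (e.g., with tf-idf weighting):

```python
# Toy ranker: score each passage by how many question words it shares,
# then keep the top n. Stands in for a real IR engine.

def top_passages(question, passages, n=5):
    q_words = set(question.lower().split())
    scored = sorted(passages,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return scored[:n]
```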

Named Entity Tagging

Named Entity (NE) Taggers recognize certain types of named objects and other easily identifiable semantic classes. Common NE classes are:

People: Mr. Fripper; John Fripper; President Fripper

Locations: Salt Lake City; Massachusetts; France

Dates/Times: November; Monday; 5:10 pm

Companies: KVW Co.; KVW Inc.; KVW corporation

Measures: 500 dollars; 40 miles; 32 lbs

Sample Text

Consider this sentence:

President George Bush announced a new bill that would send $1.2 million dollars to Miami, Florida for a new hurricane tracking system.

After applying a Named Entity Tagger, the text might look like this:

<PERSON=“President George Bush”> announced a new bill that would send <MONEY=“$1.2 million dollars”> to <LOCATION=“Miami, Florida”> for a new hurricane tracking system.

Rules for Named Entity Tagging

Most Named Entity Taggers use simple rules that are developed by hand. Most rules use the following types of clues:

Keywords: Ex. “Mr.”, “Corp.”, “city”

Common Lists: Ex. Cities, countries, months of the year, common first names, common last names

Special Symbols: Ex. Dollar signs, percent signs

Structured Phrases: Ex. Dates often appear as “MONTH, DAY #, YEAR”

Syntactic Patterns: (more rarely) Ex. “LOCATION_NP, LOCATION_NP” is usually a single location (e.g., Boston, Massachusetts).
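A minimal sketch of a rule-based tagger built from these clue types; the patterns are illustrative and far from complete:

```python
import re

# Illustrative rules only; real taggers use many more patterns and lists.
MONTH = (r"(January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")

NE_RULES = [
    # Keyword clue: honorific + capitalized name -> PERSON
    ("PERSON", re.compile(r"(Mr\.|Mrs\.|President) [A-Z]\w+")),
    # Keyword clue: name + corporate suffix -> COMPANY
    ("COMPANY", re.compile(r"[A-Z]\w+ (Co\.|Inc\.|Corp\.)")),
    # Structured phrase: "MONTH DAY, YEAR" -> DATE
    ("DATE", re.compile(MONTH + r" \d{1,2}, \d{4}")),
    # Special symbol: dollar sign -> MEASURE
    ("MEASURE", re.compile(r"\$\d[\d,.]*")),
]

def tag(text):
    for label, pattern in NE_RULES:
        text = pattern.sub(lambda m: f'<{label}="{m.group(0)}">', text)
    return text

print(tag("Mr. Fripper paid $500 to KVW Co. on November 5, 1999."))
```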

Answer Identification

At this point, we’ve assigned a type to the question and we’ve tagged the text with Named Entities. So we can now narrow down the candidate pool to entities of the right type.

Problem: There are often many objects of the right type, even in a single text.

• The Answer Identification module is responsible for finding the best answer to the question.

• For questions that have Named Entity types, this module must figure out which item of the right type is correct.

• For questions that do not have Named Entity types, this module is essentially starting from scratch.

Word Overlap

The most common method of Answer Identification is to measure the amount of Word Overlap between the question and an answer candidate.

Basic Word Overlap: Each answer candidate is scored by counting how many question words are present in or near the candidate.

Stop Words: sometimes closed class words (often called Stop Words in IR) are not included in the word overlap measure.

Stemming: sometimes morphological analysis is used to compare only the root forms of words (e.g., “walk” and “walked” would match).

Weights: some words may be weighted more heavily than others (e.g., verbs might be given more weight than nouns).
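A sketch combining these refinements; the stop list and suffix-stripping stemmer are toy stand-ins for real resources (e.g., a Porter stemmer):

```python
# Toy stop list and stemmer; real systems use larger resources.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "is", "was", "will",
              "be", "do", "does", "did", "where", "what", "who", "when"}

def stem(word):
    """Crude suffix stripping standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def content_stems(text):
    words = (w.lower().strip(".,?!") for w in text.split())
    return {stem(w) for w in words if w and w not in STOP_WORDS}

def overlap_score(question, candidate, weights=None):
    """Count question stems present in the candidate, optionally weighted."""
    shared = content_stems(question) & content_stems(candidate)
    weights = weights or {}
    return sum(weights.get(w, 1.0) for w in shared)

q = "Where will the 2002 Winter Olympics be held?"
s = "The 2002 Winter Olympics will be held in beautiful Salt Lake City, Utah."
print(overlap_score(q, s))  # 4.0: shared stems 2002, winter, olympic, held
```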

The State of the Art in Q/A

Best Remedia reading comprehension results (answer sentence identification): around 40%

Best TREC-9 results (Mean Reciprocal Rank):

• 50-byte answers: MRR = 0.58; no correct answer was found for 34% of questions

• 250-byte answers: MRR = 0.76; no correct answer was found for 14% of questions

The best TREC Q/A system used a more sophisticated Q/A model incorporating syntactic dependency structures, semantic hierarchies, etc. But more intelligent Q/A models are still highly experimental.

Answer Confusability Experiments

• Manually annotated data for 165 TREC-9 questions and 186 CBC questions, simulating perfect question typing, perfect answer sentence identification, and perfect semantic tagging.

– Idea: An oracle gives you the correct question type, a sentence containing the answer, and correctly tags all entities in the sentence that match the question type.

– Ex. The oracle tells you that the question expects a person, gives you a sentence containing the correct person, and tags all person entities in that sentence. The one thing the oracle does not tell you is which person is the correct one.

• Measured the “answer confusability”: the score that a Q/A system would get if it randomly selected an item of the designated type from the answer sentence.

Example

Q1: When was Fred Smith born?

S1: Fred Smith lived from 1823 to 1897.

Q2: What city is Massachusetts General Hospital located in?

S2: It was conducted by a cooperative group of oncologists from Hoag, Massachusetts General Hospital in Boston, Dartmouth College in New Hampshire, UC San Diego Medical Center, McGill University in Montreal and the University of Missouri in Columbia.
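A back-of-the-envelope way to see the problem: guessing uniformly among k same-type candidates, exactly one of which is correct, succeeds with probability 1/k. The candidate lists below are one illustrative reading of S1 and S2:

```python
# With k same-type candidates and one correct answer, a uniform random
# pick is correct with probability 1/k. Candidate lists are one reading
# of S1 and S2 above, for illustration only.

def confusability(candidates, correct):
    assert correct in candidates
    return 1.0 / len(candidates)

# Q1: S1 contains two DATE candidates, 1823 and 1897.
print(confusability(["1823", "1897"], "1823"))        # 0.5

# Q2: S2 mentions several LOCATION candidates besides Boston.
locations = ["Boston", "New Hampshire", "San Diego",
             "Montreal", "Missouri", "Columbia"]
print(round(confusability(locations, "Boston"), 2))   # 0.17
```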

Reading Comprehension Tests

Mars Polar Lander: Where Are You?

(January 18, 2000) After more than a month of searching for a single sign from NASA’s Mars Polar Lander, mission controllers have lost hope of finding it. The Mars Polar Lander was on a mission to Mars to study its atmosphere and search for water, something that could help scientists determine whether life ever existed on Mars. Polar Lander was to have touched down on December 3 for a 90-day mission. It was to land near Mars’ south pole. The lander was last heard from minutes before beginning its descent. The last effort to communicate with the three-legged lander ended with frustration at 8 a.m. Monday. “We didn’t see anything,” said Richard Cook, the spacecraft’s project manager at NASA’s Jet Propulsion Laboratory. The failed mission to the Red Planet cost the American government more than $200 million. Now, space agency scientists and engineers will try to find out what could have gone wrong. They do not want to make the same mistakes in the next mission.

– When did the mission controllers lose hope of communication with the lander?

– Who is the Polar Lander’s project manager?

– Where on Mars was the spacecraft supposed to touch down?

– What was the mission of the Mars Polar Lander?

Why use reading comprehension tests?

• The tests were designed to ask questions that would demonstrate whether a child understands a story. So they are an objective way to evaluate the reading ability of computer programs.

• Questions and answer keys already exist!

• Tests are available for many grade levels, so we can challenge our Q/A computer programs with progressively harder questions.

• The grade level of an exam can give us some idea of the “reading ability” of our computer programs (e.g., “it reads at a 2nd grade level”).

• Grade school exams typically ask factual questions that mimic real-world applications (as opposed to high school exams that often ask general inferential questions, e.g. “what is the topic of the story”).