Automatic Set Expansion for List Question Answering
Richard C. Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg
Language Technologies Institute, Carnegie Mellon University
Pittsburgh, PA 15213 USA

Dec 21, 2015




Task
• Automatically improve answers generated by Question Answering systems for list questions, by using a Set Expansion system.
• For example: Name cities that have Starbucks.

QA Answers: Boston, Seattle, Carnegie-Mellon, Aquafina, Google, Logitech
Expanded Answers: Seattle, Boston, Chicago, Pittsburgh, Carnegie-Mellon, Google (Better!)


Outline
• Introduction
  • Question Answering
  • Set Expansion
• Proposed Approach
  • Aggressive Fetcher
  • Lenient Extractor
  • Hinted Expander
• Experimental Results
  • QA System: Ephyra
  • Other QA Systems
• Conclusion


Question Answering (QA)
• Question Answering task:
  • Retrieve answers to natural language questions
• Different question types:
  • Factoid questions
  • List questions
  • Definitional questions
  • Opinion questions
• Major QA evaluations:
  • Text REtrieval Conference (TREC): English
  • NTCIR: Japanese, Chinese
  • CLEF: European languages


Typical QA Pipeline
Stages (drawing on Knowledge Sources): Question Analysis, Query Generation & Search, Candidate Generation, Answer Scoring
Data flow: Question String → Analyzed Question → Search Results → Candidate Answers → Scored Answers

Example:
• Question String: “Who invented the smiley?”
• Analyzed Question: Answer type: Person; Keywords: invented, smiley ...
• Search Results: The two original text smileys were invented on September 19, 1982 by Scott E. Fahlman ...
• Candidate Answers: smileys; September 19, 1982; Scott E. Fahlman
• Scored Answers:
  Candidate            Score
  Scott E. Fahlman     0.853
  smileys              0.418
  September 19, 1982   0.239


QA System: Ephyra (Schlaefer et al., TREC 2007)
• History:
  • Developed at University of Karlsruhe, Germany and Carnegie Mellon University, USA
  • TREC participations in 2006 (13th out of 27 teams) and 2007 (7th out of 21 teams)
  • Released into open source in 2008
• Different candidate generators:
  • Answer type classification
  • Regular expression matching
  • Semantic parsing
• Available for download at: http://www.ephyra.info/


Set Expansion (SE)
• For example,
  • Given a query: {“survivor”, “amazing race”}
  • Answer is: {“american idol”, “big brother”, ...}
• More formally,
  • Given a small number of seeds: x1, x2, …, xk, where each xi ∈ St
  • Answer is a listing of other probable elements: e1, e2, …, en, where each ei ∈ St
• A well-known example of a web-based set expansion system is Google Sets™ http://labs.google.com/sets


SE System: SEAL (Wang & Cohen, ICDM 2007)
• Features
  • Independent of human/markup language
    • Support seeds in English, Chinese, Japanese, Korean, ...
    • Accept documents in HTML, XML, SGML, TeX, WikiML, …
  • Does not require pre-annotated training data
    • Utilize readily-available corpus: World Wide Web
• Based on two research contributions
  • Automatically construct wrappers for extracting candidate items
  • Rank extracted items using random graph walk
• Try it out for yourself: http://rcwang.com/seal


SEAL’s SE Pipeline
• Fetcher: downloads web pages from the Web
• Extractor: learns wrappers from web pages
• Ranker: ranks entities extracted by wrappers

Example seeds: Canon, Nikon, Olympus
Example output: Pentax, Sony, Kodak, Minolta, Panasonic, Casio, Leica, Fuji, Samsung, …


Challenge
• SE systems require relevant (non-noisy) seeds, but answers produced by QA systems are often noisy.
• How can we integrate those two systems together?
• We propose three extensions to SEAL
  • Aggressive Fetcher
  • Lenient Extractor
  • Hinted Expander


Original Fetcher
Procedure:
1. Compose a search query by concatenating all seeds
2. Use Google to request top 100 web pages
3. Fetch web pages and send to the Extractor

Seeds: Boston, Seattle, Carnegie-Mellon
Query: Boston Seattle Carnegie-Mellon


Proposed Fetcher
• Aggressive Fetcher (AF)
  • Sends a two-seed query for every possible pair of seeds to the search engines
  • More likely to compose queries containing only relevant seeds

Seeds: Boston, Seattle, Carnegie-Mellon
Queries:
• Boston Seattle
• Boston Carnegie-Mellon
• Seattle Carnegie-Mellon
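
The difference between the two fetchers comes down to how the query strings are composed. The following is a minimal sketch of that step only; the function names are hypothetical and the actual search-engine request is left out.

```python
from itertools import combinations


def original_query(seeds):
    """Original Fetcher: a single query that concatenates all seeds."""
    return " ".join(seeds)


def aggressive_queries(seeds):
    """Aggressive Fetcher: one two-seed query for every pair of seeds."""
    return [f"{a} {b}" for a, b in combinations(seeds, 2)]


seeds = ["Boston", "Seattle", "Carnegie-Mellon"]
print(original_query(seeds))      # Boston Seattle Carnegie-Mellon
print(aggressive_queries(seeds))
# ['Boston Seattle', 'Boston Carnegie-Mellon', 'Seattle Carnegie-Mellon']
```

With one noisy seed out of three, the single concatenated query always contains the noise, while one of the three pair queries contains only relevant seeds.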


Original Extractor
• A wrapper is a pair of L and R context strings
  • Maximally-long contextual strings that bracket at least one instance of every seed
  • Extracts strings between L and R
• Learn wrappers from web pages and seeds on the fly
  • Utilize semi-structured documents
• Wrappers defined at character level
  • No tokenization required (language-independent)
  • However, very page specific (page-dependent)


<li class="honda"><a href="http://www.curryauto.com/"> <img src="/common/logos/honda/logo-horiz-rgb-lg-dkbg.gif" alt="4"></a> <ul><li><a href="http://www.curryhonda-ga.com/"> <span class="dName">Curry Honda Atlanta</span>...</li> <li><a href="http://www.curryhondamass.com/"> <span class="dName">Curry Honda</span>...</li> <li class="last"><a href="http://www.curryhondany.com/"> <span class="dName">Curry Honda Yorktown</span>...</li></ul> </li>

<li class="acura"><a href="http://www.curryauto.com/"> <img src="/curryautogroup/images/logo-horiz-rgb-lg-dkbg.gif" alt="5"></a> <ul><li class="last"><a href="http://www.curryacura.com/"> <span class="dName">Curry Acura</span>...</li></ul> </li>

<li class="toyota"><a href="http://www.curryauto.com/"> <img src="/common/logos/toyota/logo-horiz-rgb-lg-dkbg.gif" alt="7"></a> <ul><li class="last"><a href="http://www.geisauto.com/toyota/"> <span class="dName">Curry Toyota</span>...</li></ul> </li>

<li class="nissan"><a href="http://www.curryauto.com/"> <img src="/common/logos/nissan/logo-horiz-rgb-lg-dkbg.gif" alt="6"></a> <ul><li class="last"><a href="http://www.geisauto.com/"> <span class="dName">Curry Nissan</span>...</li></ul> </li>

<li class="ford"><a href="http://www.curryauto.com/"> <img src="/common/logos/ford/logo-horiz-rgb-lg-dkbg.gif" alt="3"></a> <ul><li class="last"><a href="http://www.curryauto.com/"> <span class="dName">Curry Ford</span>...</li></ul> </li>


Proposed Extractor
• Lenient Extractor (LE)
  • Maximally-long contextual strings that bracket at least one instance of a minimum of two seeds
  • More likely to find useful contexts that bracket only relevant seeds

Text

... in Boston City Hall ...

... in Seattle City Hall ...

... at Boston University ...

... at Seattle University ...

... at Carnegie-Mellon University ...

Learned Wrapper (w/o LE)

at <blah> University

Learned Wrappers (w/ LE)

at <blah> University

in <blah> City Hall
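
The wrapper example above can be reproduced with a small brute-force sketch. It is not SEAL's implementation (the slides do not describe one); it assumes each text snippet is a separate string, bounds contexts by the snippet, and uses hypothetical function names. Setting `min_seeds` to the number of seeds gives the original (strict) behaviour, and `min_seeds=2` gives the lenient one.

```python
from itertools import combinations, product
from os.path import commonprefix


def common_suffix(strings):
    """Longest string that every input ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def occurrences(snippets, seed):
    """Return (left context, right context) for every occurrence of seed."""
    found = []
    for text in snippets:
        start = text.find(seed)
        while start != -1:
            found.append((text[:start], text[start + len(seed):]))
            start = text.find(seed, start + 1)
    return found


def learn_wrappers(snippets, seeds, min_seeds):
    """Return maximally long (L, R) pairs bracketing >= min_seeds distinct seeds."""
    contexts = {seed: occurrences(snippets, seed) for seed in seeds}
    candidates = set()
    for group in combinations(seeds, min_seeds):
        # Pick one occurrence per seed in the group; keep what their contexts share.
        for picked in product(*(contexts[seed] for seed in group)):
            left = common_suffix([l for l, _ in picked])
            right = commonprefix([r for _, r in picked])
            if left and right:
                candidates.add((left, right))

    def extendable(w):
        # Not maximal if another candidate has longer contexts around the same slot.
        return any(o != w and o[0].endswith(w[0]) and o[1].startswith(w[1])
                   for o in candidates)

    return sorted(w for w in candidates if not extendable(w))


snippets = [
    "in Boston City Hall",
    "in Seattle City Hall",
    "at Boston University",
    "at Seattle University",
    "at Carnegie-Mellon University",
]
seeds = ["Boston", "Seattle", "Carnegie-Mellon"]
print(learn_wrappers(snippets, seeds, min_seeds=len(seeds)))
# [('at ', ' University')]
print(learn_wrappers(snippets, seeds, min_seeds=2))
# [('at ', ' University'), ('in ', ' City Hall')]
```

In the strict setting the noisy seed Carnegie-Mellon forces the only surviving wrapper to be the university context; the lenient setting also recovers the City Hall context shared by the two relevant seeds.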


Hinted Expander (HE)
• Utilizes contexts in the question to constrain SEAL’s search space on the Web
  • Extract up to three keywords from the question using Ephyra’s keyword extractor
  • Append the keywords to the search query
• Example: Name cities that have Starbucks.
• More likely to find documents containing desired set of answers
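
A minimal sketch of the query composition step follows. Ephyra's keyword extractor is not reproduced here: the keyword list is passed in by the caller, and the keywords used in the example are a hypothetical stand-in for whatever the extractor would return for the Starbucks question.

```python
def hinted_query(seeds, keywords, max_keywords=3):
    """Append up to max_keywords question keywords to the seed query."""
    hints = list(keywords)[:max_keywords]
    return " ".join(list(seeds) + hints)


# Hypothetical keywords for "Name cities that have Starbucks."
print(hinted_query(["Boston", "Seattle"], ["cities", "Starbucks"]))
# Boston Seattle cities Starbucks
```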


Experiment #1: Ephyra
• Evaluate on TREC 13, 14, and 15 datasets
  • 55, 93, and 89 list questions respectively
• Use SEAL to expand top four answers from Ephyra
  • Outputs a list of answers ranked by confidence scores
• For each dataset, we report:
  • Mean Average Precision (MAP)
    • Mean of average precision for each ranked list
  • Average F1 with Optimal Per-Question Threshold
    • For each question, cut off the list at a threshold which maximizes the F1 score for that particular question
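
The two measures can be sketched as follows. This assumes the common definition of average precision (normalized by the number of correct answers), treats the threshold as a cutoff rank in a duplicate-free ranked list, and uses a made-up gold set for illustration; none of these details are stated on the slide.

```python
def average_precision(ranked, relevant):
    """Average of precision@rank over the ranks where a correct answer appears."""
    hits, total = 0, 0.0
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(questions):
    """Mean of average precision over (ranked list, gold set) pairs."""
    return sum(average_precision(r, g) for r, g in questions) / len(questions)


def f1_at(ranked, relevant, k):
    """F1 of the top-k answers against the gold set."""
    tp = sum(1 for answer in ranked[:k] if answer in relevant)
    if tp == 0:
        return 0.0
    precision, recall = tp / k, tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


def optimal_f1(ranked, relevant):
    """Best F1 over all cutoffs for this one question."""
    return max((f1_at(ranked, relevant, k) for k in range(1, len(ranked) + 1)),
               default=0.0)


# Hypothetical ranked list and gold set, for illustration only.
ranked = ["Seattle", "Boston", "Google", "Chicago"]
gold = {"Seattle", "Boston", "Chicago", "Pittsburgh"}
print(average_precision(ranked, gold))  # 0.6875
print(optimal_f1(ranked, gold))         # 0.75
```

Because the cutoff is chosen per question with knowledge of the gold set, the optimal F1 is an upper bound on what a fixed threshold can achieve.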


Experiment #1: Ephyra
[Figure: Mean Average Precision. Mean Avg. Precision (%) by TREC dataset (Trec 13, Trec 14, Trec 15). Series: Ephyra, Ephyra's Top 4, SEAL, SEAL+LE, SEAL+LE+AF, SEAL+LE+AF+HE.]
[Figure: F1 with Optimal Per-Question Threshold. Avg. Optimal F1 (%) by TREC dataset (Trec 13, Trec 14, Trec 15). Series: Ephyra, Ephyra's Top 4, SEAL, SEAL+LE, SEAL+LE+AF, SEAL+LE+AF+HE.]


Experiment #2: Ephyra
• In practice, thresholds are unknown
• For each dataset, do 5-fold cross validation:
  • Train: Find one optimal threshold for four folds
  • Test: Use the threshold to evaluate the fifth fold
• Introduce a fourth dataset: All
  • Union of TREC 13, 14, and 15
• Introduce another system: Hybrid
  • Intersection of original answers from Ephyra and expanded answers from SEAL
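
The trained-threshold protocol and the Hybrid system can be sketched as below. Several details are assumptions not given on the slide: answers are kept when their confidence score is at least the threshold, candidate thresholds are the scores seen in training, folds are assigned round-robin, and Hybrid keeps the QA system's ordering. There must be at least as many questions as folds.

```python
def f1(predicted, relevant):
    """F1 of a predicted answer set against the gold set."""
    predicted = set(predicted)
    tp = len(predicted & relevant)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


def avg_f1(questions, threshold):
    """Average F1 when every question's list is cut at the same score threshold."""
    total = 0.0
    for scored, relevant in questions:
        kept = [answer for answer, score in scored if score >= threshold]
        total += f1(kept, relevant)
    return total / len(questions)


def train_threshold(questions):
    """Pick the single threshold that maximizes average F1 on the training folds."""
    candidates = sorted({score for scored, _ in questions for _, score in scored})
    return max(candidates, key=lambda t: avg_f1(questions, t))


def cross_validate(questions, folds=5):
    """Train a threshold on folds-1 folds, test on the held-out fold, average."""
    results = []
    for i in range(folds):
        test = questions[i::folds]
        train = [q for j, q in enumerate(questions) if j % folds != i]
        results.append(avg_f1(test, train_threshold(train)))
    return sum(results) / len(results)


def hybrid(qa_answers, seal_answers):
    """Answers returned by both systems, in the QA system's order (an assumption)."""
    seal = set(seal_answers)
    return [answer for answer in qa_answers if answer in seal]


qa = ["Boston", "Seattle", "Carnegie-Mellon", "Aquafina", "Google", "Logitech"]
seal = ["Seattle", "Boston", "Chicago", "Pittsburgh", "Carnegie-Mellon", "Google"]
print(hybrid(qa, seal))  # ['Boston', 'Seattle', 'Carnegie-Mellon', 'Google']
```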


Experiment #2: Ephyra
[Figure: F1 with Trained Threshold. Avg. F1 (%) by TREC dataset (Trec 13, Trec 14, Trec 15, All). Series: Ephyra, SEAL+LE+AF+HE, Hybrid.]


Experiment: Other QA Systems
• Top five QA systems that perform the best on list questions in TREC 15 evaluation
  1. Language Computer Corporation (lccPA06)
  2. The Chinese University of Hong Kong (cuhkqaepisto)
  3. National University of Singapore (NUSCHUAQA1)
  4. Fudan University (FDUQAT15A)
  5. National Security Agency (QACTIS06C)
• For each QA system, train thresholds for SEAL and Hybrid on the union of TREC 13 and 14
• Expand top four answers from the QA systems on TREC 15, and apply the trained threshold


Experiment: Top QA Systems
[Figure: F1 with Trained Threshold. Average F1 (%) for lccPA06 (axis 30% to 46%) and for cuhkqaepisto, NUSCHUAQA1, FDUQAT15A, QACTIS06C (axis 12% to 22%). Series: Baseline, Top 4 Ans., Google Sets, SEAL+LE+AF+HE, Hybrid.]


Conclusion
• A feasible method for integrating an SE approach into any QA system
• Proposed SE approach is effective
  • Improves QA systems on list questions by using only a few top answers as seeds
• Proposed hybrid system is effective
  • Improves Ephyra and (most) top five QA systems


Thank You!