1 QA in Discussion Boards Companies (e.g., Dell, IBM) use discussion boards as ways for customers to get answers to their questions 90% of 40 analyzed discussion boards contain questions and answers Online QA services could benefit (Yahoo! Answers, Answers.com, etc) But… finding questions and their answers is hard Post may not be in question format

Dec 20, 2015


1

QA in Discussion Boards

Companies (e.g., Dell, IBM) use discussion boards as ways for customers to get answers to their questions

90% of 40 analyzed discussion boards contain questions and answers

Online QA services could benefit (Yahoo! Answers, Answers.com, etc)

But… finding questions and their answers is hard
  Posts may not be in question format
  Answers are provided asynchronously
  Messages in a single thread may respond to different questions


2

Research questions

Can we detect question threads in an efficient and effective manner?

What features should be used (content/non)?

Can we effectively discover answers without analyzing content of replied posts?

Who posts these answers and where do they appear?

Can this task be treated as a traditional IR problem suitable for relevance detection?


3

Question Post from UbuntuForums.org

There are a number of threads on Firefox crashes, so it’s nothing new. I upgraded from U8.04 to U8.10, but it’s no better. Then I tried Seamonkey, and it worked fine for a couple of days. Now it too is crashing. I’m baffled. Anyone have any ideas what I can do?


4

Method: Classification

Features for question classification
  Question mark
  5W1H words (who, what, when, where, why, how)
  Total # of posts within 1 thread: long posts problematic
  Authorship: a new poster is more likely a questioner, and vice versa for the answerer
  N-grams
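The question features listed on this slide could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the function name, the `thread_length` and `author_post_count` inputs, and the choice of bigrams for the n-gram feature are all assumptions made for illustration.

```python
import re

# "how" is the H in 5W1H; the other five are the W words.
FIVE_W_ONE_H = {"who", "what", "when", "where", "why", "how"}

def question_features(first_post, thread_length, author_post_count):
    """Return simple question-detection features for a thread's first post.

    thread_length and author_post_count are hypothetical inputs standing in
    for the "total # of posts" and "authorship" features on the slide.
    """
    tokens = re.findall(r"[a-z']+", first_post.lower())
    return {
        "has_question_mark": "?" in first_post,
        "num_5w1h": sum(1 for t in tokens if t in FIVE_W_ONE_H),
        "thread_length": thread_length,
        # Assumption: a poster with at most one post counts as "new".
        "author_is_new": author_post_count <= 1,
        # Word bigrams as one possible n-gram representation.
        "bigrams": list(zip(tokens, tokens[1:])),
    }

post = "Now it too is crashing. I'm baffled. Anyone have any ideas what I can do?"
feats = question_features(post, thread_length=6, author_post_count=1)
```

In practice the dictionary would be converted to a numeric vector before being passed to a classifier.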


5

Answer detection features

  Position of post: answer usually not near bottom
  Authorship
  N-grams
  Stop words: an answer probably contains fewer
  Query likelihood model score: tests relevance to question
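To make the answer features concrete, here is a hedged sketch of how position, authorship, stop-word ratio, and a query likelihood score might be computed for each reply. The slide does not say which smoothing the query likelihood model uses; Jelinek-Mercer smoothing with an add-one collection model is assumed here, and the stop-word list and all function names are illustrative.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "in", "i", "you", "that"}

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def query_likelihood(question, post, collection, lam=0.5):
    """Log P(question | post) under a unigram model (assumed Jelinek-Mercer smoothing)."""
    post_tf = Counter(tokenize(post))
    coll_tf = Counter(tokenize(collection))
    post_len = sum(post_tf.values()) or 1
    coll_len = sum(coll_tf.values())
    score = 0.0
    for term in tokenize(question):
        p_post = post_tf[term] / post_len
        # Add-one smoothing keeps the collection probability above zero.
        p_coll = (coll_tf[term] + 1) / (coll_len + len(coll_tf) + 1)
        score += math.log(lam * p_post + (1 - lam) * p_coll)
    return score

def answer_features(question, replies, index, asker, authors):
    """Features for the reply at position index within a thread."""
    tokens = tokenize(replies[index])
    return {
        "relative_position": index / max(len(replies) - 1, 1),
        "by_asker": authors[index] == asker,
        "stop_word_ratio": sum(t in STOP_WORDS for t in tokens) / max(len(tokens), 1),
        "ql_score": query_likelihood(question, replies[index], " ".join(replies)),
    }

question = "Firefox keeps crashing after upgrade, any ideas?"
replies = [
    "Try starting firefox in safe mode and disable the flash plugin.",
    "Thanks, that fixed it!",
]
authors = ["helper", "asker"]
f0 = answer_features(question, replies, 0, "asker", authors)
f1 = answer_features(question, replies, 1, "asker", authors)
```

Here the first reply shares the term "firefox" with the question, so it receives the higher query likelihood score of the two.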


6

Experiments

Baseline: previously published system (Cong) using syntactic patterns (Q) and query relevance (A)

Data:
  Photography on next (700K posts)
  Ubuntu Forum (555K posts)

Training data: manually labeled all first posts and answers from 2580 Ubuntu posts and 3962 photo posts

Balanced the data set (50% positive, 50% negative)

SVM classifier and 10-fold cross validation
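The evaluation protocol above (a 50/50 balanced set and 10-fold cross validation) can be sketched in plain Python. The slides do not say how the data was balanced, so downsampling the larger class is an assumption here, as are the function names and the toy data. The SVM itself is left out; any SVM implementation could be trained on each training split.

```python
import random

def balance(examples, seed=0):
    """Downsample the larger class so positives and negatives are 50/50.

    examples is a list of (features, label) pairs with label in {0, 1}.
    Downsampling is an assumed strategy, not one stated in the slides.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced

def k_fold_splits(examples, k=10):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    folds = [examples[i::k] for i in range(k)]
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, folds[i]

# Toy data: 100 positives and 200 negatives.
data = [({"id": i}, 1 if i % 3 == 0 else 0) for i in range(300)]
balanced = balance(data)
splits = list(k_fold_splits(balanced, k=10))
```

Averaging a metric such as accuracy or F1 over the ten test folds gives the cross-validated estimate.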


7


8

Answer Detection

Relevance model did not do well
  Perhaps ranking is difficult since all posts are more or less relevant to the question

N-gram does not outperform other features

Stop words perform similarly to n-grams

Simple heuristics (position, authorship) perform best

Combination outperforms all others


9


10


11

Proposed Solution Paraphrase Templates

How did Mahatma Gandhi die?
  Mahatma Gandhi died <how>
  Mahatma Gandhi died of <what>
  Mahatma Gandhi died from <what>
  Mahatma Gandhi’s death from <what>
  Mahatma Gandhi drowned
  Mahatma Gandhi suffocated
  Mahatma Gandhi froze to death
  <who> killed Mahatma Gandhi
  <who> assassinated Mahatma Gandhi
  Mahatma Gandhi was killed


12

Use

In IR to find documents more likely to contain answer

To rank sentences within documents that are returned by IR


13

Reformulations

  Hand-built or manual generalizations of automatically produced paraphrases
  Specify type of relation to original
  Number of reformulations: 1-30, average: 3.24


14

Other Reformulation Types

Lexical:
  Buy/sell: John sold the laptop to Mary = Mary bought the laptop from John

Syntactic:
  How deep is Crater Lake?
  Crater Lake has a depth of <what distance>

Inference

Reformulation Chains
  Where did Bill Gates go to college?
  Bill Gates was a student at <which college>?
  Bill Gates dropped out of <which college>?
  Bill Gates was a <which college> dropout?
  Text: Bill Gates was a Harvard dropout.
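One way to picture how a reformulation chain is matched against text is to turn each reformulation into a regular expression with a capture group for the slot. The sketch below is an illustration only: the slides do not describe the matching mechanism, and `template_to_regex` and `find_answer` are hypothetical helpers that handle just one slot per template.

```python
import re

def template_to_regex(template):
    """Turn 'Bill Gates was a <which college> dropout' into a regex whose
    named group 'answer' captures the text filling the slot."""
    parts = re.split(r"<[^>]+>", template.rstrip("?"))
    if len(parts) != 2:
        raise ValueError("expected exactly one <slot> in the template")
    before, after = (p.strip() for p in parts)
    pieces = []
    if before:
        pieces.append(re.escape(before) + r"\s+")
    # Lazy match: with no trailing context this captures a single word.
    pieces.append(r"(?P<answer>\w+(?:\s+\w+)*?)")
    if after:
        pieces.append(r"\s+" + re.escape(after))
    return re.compile("".join(pieces), re.IGNORECASE)

def find_answer(chain, text):
    """Try each reformulation in order; return (template, answer) or None."""
    for template in chain:
        match = template_to_regex(template).search(text)
        if match:
            return template, match.group("answer")
    return None

chain = [
    "Bill Gates was a student at <which college>?",
    "Bill Gates dropped out of <which college>?",
    "Bill Gates was a <which college> dropout?",
]
result = find_answer(chain, "Bill Gates was a Harvard dropout.")
```

Only the third reformulation matches the example text, which is the point of keeping a chain of progressively more inferential paraphrases.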


15

But… Their Web IR System

Preserve quoted terms and quote the smallest NPs:
  “What is the longest river in the United States?” -> “longest river” and “United States”

Expand the query using WordNet synonyms:
  “What is the length of the border between Ukraine and Russia?” -> (“length” or “distance”) and (“border” or “surround”) and (“Ukraine” or “Ukrainia”) and (“Russia” or “Soviet Union”) and (“between” or “betwixt”)

Using Contex reformulations, add quoted reformulations of the question’s declarative form:
  “What is an atom?” -> “is an atom”, “an atom is”, “an atom is one of”
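The query construction on this slide can be sketched as a small function that quotes each term and ORs it with its synonyms. This is an assumption-laden illustration: `expand_query` is a hypothetical helper, and the synonym table is hand-written to mirror the slide's example rather than looked up in WordNet.

```python
def expand_query(terms, synonyms):
    """Build a boolean query: every term is quoted, and terms that have
    synonyms become a parenthesized OR-group."""
    groups = []
    for term in terms:
        alternatives = [term] + synonyms.get(term, [])
        quoted = " or ".join(f'"{alt}"' for alt in alternatives)
        groups.append(f"({quoted})" if len(alternatives) > 1 else quoted)
    return " and ".join(groups)

# Hand-written table copied from the slide's example; a real system
# would obtain these from a WordNet lookup.
SYNONYMS = {
    "length": ["distance"],
    "border": ["surround"],
    "Ukraine": ["Ukrainia"],
    "Russia": ["Soviet Union"],
    "between": ["betwixt"],
}

expanded = expand_query(["length", "border", "Ukraine", "Russia", "between"], SYNONYMS)
phrases = expand_query(["longest river", "United States"], {})
```

With an empty synonym table the same function covers the noun-phrase quoting case, joining the quoted phrases with "and".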


16

Evaluation