Classification of unanswerable questions: the rhetoric of Twitter. David Tomás, Department of Software and Computing Systems, University of Alicante, Spain. [email protected]. CERI 2012

Jan 18, 2017

Page 1: Classification of unanswerable  questions: the rhetoric of Twitter

Classification of unanswerable questions: the rhetoric of Twitter

David Tomás

Department of Software and Computing Systems

University of Alicante, Spain

[email protected]

CERI 2012

Page 2

This presentation is about…

Twitter

Questions that look like questions but are not really asking for an answer

Corpus-based question classification

A preliminary evaluation

A lot of future work

Page 3

Twitter

tweet (140 characters)

mention @

hashtag #

url

RT (retweet)

friend

follower
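The Twitter vocabulary on this slide maps onto simple surface patterns. The Python sketch below pulls them out of a tweet; the function name `tweet_elements` and the regular expressions are illustrative assumptions, not the author's code, and they ignore edge cases such as e-mail addresses or a `#` inside a URL.

```python
import re

# Illustrative (assumed) patterns for Twitter conventions; real tweet
# tokenization has many more edge cases.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")

def tweet_elements(text: str) -> dict:
    """Return the Twitter-specific elements found in a tweet."""
    return {
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "urls": URL.findall(text),
        # Classic retweets are prefixed with "RT @user".
        "is_retweet": text.startswith("RT @"),
        # Tweets were limited to 140 characters at the time of the talk.
        "within_limit": len(text) <= 140,
    }

example = "RT @alice: Which laptop should I buy? #lazyweb http://example.com"
print(tweet_elements(example))
```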

Page 4

Twitter

Page 5

Twitter

A perfect way to spread information

Fast, fast, fast

Immediacy: many people are asking questions

Page 6

Twitter

Page 7

Proposal

Wouldn’t it be nice if someone came to your aid when you needed an answer?

New paradigm: systems going to the user

First problem: who really needs an answer?

Page 8

Proposal

Page 9

Proposal

Question classification problem

Real questions vs. rhetorical questions

Supervised / corpus-based

Corpus + Features + Algorithms

Page 10

Corpus

Real question: expects an answer, either from the crowd at large or from a specific individual

Rhetorical question: any other question

wh-words: what, who, whom, whose, which, when, where, why, how

9 wh-words × 100 questions = 900 questions = 220 real + 680 rhetorical
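The corpus size follows from 9 wh-words × 100 questions = 900 questions, of which 220 are real and 680 rhetorical. The slide does not say how candidate tweets were selected, so the sketch below assumes a simple rule (the first token is a wh-word and the tweet contains a question mark). `group_by_wh_word` is a hypothetical helper, and it does not assign the real/rhetorical labels.

```python
from collections import defaultdict

WH_WORDS = ["what", "who", "whom", "whose", "which",
            "when", "where", "why", "how"]

def group_by_wh_word(tweets, per_word=100):
    """Keep up to `per_word` candidate question tweets per wh-word.

    Assumed selection rule (not from the slides): the tweet's first token
    is a wh-word and the tweet contains a question mark.
    """
    groups = defaultdict(list)
    for text in tweets:
        tokens = text.lower().split()
        if not tokens or "?" not in text:
            continue
        first = tokens[0].strip("?!.,:;\"'")
        if first in WH_WORDS and len(groups[first]) < per_word:
            groups[first].append(text)
    return groups

# Corpus size reported on the slide: 9 wh-words x 100 questions each.
total = len(WH_WORDS) * 100
assert total == 220 + 680  # 900 = 220 real + 680 rhetorical
```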

Page 11

Features

punctuation marks: ? ! “

Twitter language: @ # links

words: interjections

part-of-speech: NN NP V

named entity recognition: entities

WordNet: average length, % terms found, total terms found

sentiment analysis: polarity

relations: friends, followers, friends/followers
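Several of the feature groups above can be computed directly from the tweet text and the author's profile. The sketch below is a minimal illustration under stated assumptions: the interjection list is hypothetical, and the part-of-speech, named-entity, WordNet and polarity features are left out because they depend on external tools the slides do not name.

```python
import re

# Hypothetical interjection lexicon; the slides do not give the real one.
INTERJECTIONS = {"oh", "wow", "lol", "omg", "hey", "ugh", "haha"}

def surface_features(text: str, friends: int, followers: int) -> dict:
    """Compute a subset of the feature groups listed on the slide."""
    tokens = re.findall(r"[\w@#']+", text.lower())
    return {
        # Punctuation marks (straight and curly double quotes both counted)
        "n_question": text.count("?"),
        "n_exclamation": text.count("!"),
        "n_quote": sum(text.count(q) for q in ('"', "\u201c", "\u201d")),
        # Twitter language
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_links": len(re.findall(r"https?://\S+", text)),
        # Words
        "n_interjections": sum(t in INTERJECTIONS for t in tokens),
        # Relations (ratio set to 0.0 for users with no followers)
        "friends": friends,
        "followers": followers,
        "friends_followers": friends / followers if followers else 0.0,
    }
```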

Page 12

Algorithm
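The experiments compare SVM (support vector machine), NB (naive Bayes), IB1 and RF (random forest). IB1 is the name Weka gives to its 1-nearest-neighbour classifier; the slides do not say whether Weka was used. As an illustration of the simplest of the four, here is a minimal 1-NN classifier over numeric feature vectors. The class name is hypothetical, and unlike a production implementation it does no feature scaling.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class OneNearestNeighbour:
    """Minimal 1-NN classifier in the spirit of IB1 (no feature scaling)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Label of the closest training instance (first one wins ties).
            best = min(range(len(self.X)),
                       key=lambda i: euclidean(x, self.X[i]))
            preds.append(self.y[best])
        return preds

clf = OneNearestNeighbour().fit([[0, 0], [10, 10]], ["real", "rhetorical"])
print(clf.predict([[1, 1], [9, 8]]))
```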

Page 13

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the real + rhetorical corpus]
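The slides do not state how accuracy was measured. Assuming k-fold cross-validation, a common protocol for corpora of this size, the computation looks like the sketch below; `make_model` stands for any hypothetical factory returning an object with `fit` and `predict` methods, and the folds are plain (unstratified).

```python
import random

def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Mean accuracy over k folds (assumed protocol, unstratified)."""
    if len(X) < k:
        raise ValueError("need at least k examples")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, non-empty folds
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / k
```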

Page 14

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the real + rhetorical corpus, compared with a baseline]
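The slide does not define the baseline. A common choice, assumed here, is the majority-class baseline, which always predicts the most frequent class: with 220 real and 680 rhetorical questions it scores 680 / 900 ≈ 75.6%, and on a balanced corpus such as the later 680 + 680 one it scores 50%.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

unbalanced = ["real"] * 220 + ["rhetorical"] * 680
balanced = ["real"] * 680 + ["rhetorical"] * 680
print(f"{majority_baseline(unbalanced):.1%}")  # 75.6%
print(f"{majority_baseline(balanced):.1%}")    # 50.0%
```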

Page 15

Experiments and results

[Charts: precision and recall (%) of SVM, NB, IB1 and RF for the real and rhetorical classes]
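Per-class precision and recall show how each classifier treats the minority real class separately from the rhetorical class, something overall accuracy can hide on an unbalanced corpus. A minimal sketch of the computation follows; the function name and the labels in the test are hypothetical.

```python
def per_class_precision_recall(gold, pred, classes=("real", "rhetorical")):
    """Precision and recall for each class from gold and predicted labels."""
    scores = {}
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        predicted = sum(p == c for p in pred)  # items labelled c by the model
        actual = sum(g == c for g in gold)     # items that really are c
        scores[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return scores
```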

Page 16

Corpus (2nd attempt)

An unbalanced corpus biases the classification toward the majority (rhetorical) class

Problem: need for more real questions

Solution: collect questions tagged #lazyweb

Page 17

Corpus (2nd attempt)

Page 18

Corpus (2nd attempt)

Balanced corpus of 1360 questions:

680 rhetorical

680 real (selected from a set of 2,800 #lazyweb tweets)
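The balancing step pairs the 680 rhetorical questions with 680 real ones drawn from the 2,800 #lazyweb tweets. The slides do not say how the 680 were chosen; the sketch below assumes uniform random sampling without replacement, and `build_balanced_corpus` is a hypothetical helper.

```python
import random

def build_balanced_corpus(rhetorical, lazyweb, seed=0):
    """Return (text, label) pairs with equal numbers of real and rhetorical.

    Uniform random sampling is an assumption; the slides only say that
    680 real questions were taken from 2,800 #lazyweb tweets.
    """
    if len(lazyweb) < len(rhetorical):
        raise ValueError("not enough #lazyweb questions to balance the corpus")
    real = random.Random(seed).sample(lazyweb, len(rhetorical))
    return [(t, "real") for t in real] + [(t, "rhetorical") for t in rhetorical]
```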

Page 19

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the balanced real + rhetorical corpus]

Page 20

Experiments and results

[Chart: accuracy (%) of SVM, NB, IB1 and RF on the balanced real + rhetorical corpus, compared with a 50% baseline]

Page 21

Experiments and results

[Chart: accuracy (%) in the ablation study for the feature groups Punctuation, Language, Entities, POS, WordNet, Polarity and Relations, plus Selection and All]
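One common form of ablation study removes one feature group at a time and compares the resulting accuracy with that of the full feature set; the chart may instead report each group on its own, and the slide does not say which. The sketch below shows the leave-one-group-out variant. `evaluate` is a hypothetical callback that trains and tests a classifier on the given groups, and the bar labelled Selection is not reproduced.

```python
FEATURE_GROUPS = ["Punctuation", "Language", "Entities", "POS",
                  "WordNet", "Polarity", "Relations"]

def ablation(evaluate, groups=FEATURE_GROUPS):
    """Leave-one-group-out ablation.

    `evaluate(groups)` is a hypothetical callback returning the accuracy
    of a classifier trained and tested on the listed feature groups.
    """
    results = {"All": evaluate(list(groups))}
    for g in groups:
        kept = [x for x in groups if x != g]
        results[f"without {g}"] = evaluate(kept)
    return results
```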

Page 22

Conclusions and future work

Just a first step

Room for improvement

Augment the corpus

Truly analyze the rhetoric of Twitter

Integrate into a question answering (QA) system

Page 23

Thank you very much


CERI 2012