Arkaitz Zubiaga University of Warwick Maria Liakata 1 , Rob Procter 1 , Kalina Bontcheva 2 , Peter Tolmie 1 1 University of Warwick, UK 2 University of Sheffield, UK Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Jul 21, 2015

Transcript
Page 1: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Arkaitz Zubiaga, University of Warwick

Maria Liakata¹, Rob Procter¹, Kalina Bontcheva², Peter Tolmie¹

¹ University of Warwick, UK
² University of Sheffield, UK

Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Page 2: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

Page 3: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

Page 4: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

● However, the stream of tweets tends to be riddled with (occasionally false) rumours.

Page 5: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

Page 6: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

FALSE

Page 7: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

Page 8: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

FALSE

Page 9: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

Page 10: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Motivation

[Figure labels: Support (×3), Denial (×2), ? (×2)]

Page 11: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Objectives

● Scenario where a journalist is tracking a breaking news story.

● Identify rumours, distinguishing them from non-rumours.

● Study the conversational aspects of rumours, towards determining their veracity.

Page 12: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Objectives

● Study conversational aspects of rumours.

1) Build a dataset with diverse sets of rumourous stories.

2) Annotate linguistic and interaction patterns within rumours to enable automated analysis.

3) Analyse these patterns and use machine learning techniques to determine the veracity of rumours.

Page 13: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Related Work

Previous work on rumour detection in social media [Qazvinian et al. 2011, Procter et al. 2013, Castillo et al. 2013, Starbird et al. 2014]:

● Rumours known a priori, keyword search, e.g., “sandy sharks” or “london eye fire”.

● Looking at tweets individually, no interactions captured.

Page 14: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Our Approach

Our approach:

● Identify a diverse set of rumours, which need not be known a priori, e.g., follow #hurricanesandy and see what comes up.

● Annotate conversational aspects (with respect to veracity), capturing interaction between tweets.

Page 15: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Creating a corpus of rumourous conversations

Steps:

● Formal definition of rumour.

● Annotation of rumours and non-rumours.

● Annotation of conversational aspects.

Page 16: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Definition of rumour

Putting together OED and previous research on rumours:

Rumour: “a circulating story of questionable veracity, which is apparently credible but hard to verify, and produces sufficient skepticism and/or anxiety.”

Page 17: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Data Collection

• Track event on streaming API, e.g. #ferguson.

• Data sampling: following the definition of rumour, sample tweets that spark a high number of retweets (RTs).

• Conversations: Collecting associated conversations.
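The sampling and conversation-collection steps above could be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the study: tweets are simplified dicts with hypothetical field names (`id`, `retweet_count`, `in_reply_to`), and the retweet threshold is an assumed parameter, since the slides do not state one.

```python
from collections import defaultdict


def sample_candidates(tweets, min_retweets):
    """Keep tweets whose retweet count reaches the (assumed) threshold."""
    return [t for t in tweets if t["retweet_count"] >= min_retweets]


def collect_conversations(sources, replies):
    """Group reply ids under the sampled source tweet they descend from."""
    parent_of = {r["id"]: r["in_reply_to"] for r in replies}
    source_ids = {s["id"] for s in sources}
    threads = defaultdict(list)
    for r in replies:
        # Walk up the reply chain until a sampled source (or a dead end) is reached.
        node = r["in_reply_to"]
        seen = set()
        while node is not None and node not in source_ids and node not in seen:
            seen.add(node)
            node = parent_of.get(node)
        if node in source_ids:
            threads[node].append(r["id"])
    return dict(threads)


# Toy data, for illustration only.
tweets = [{"id": 1, "retweet_count": 250}, {"id": 2, "retweet_count": 3}]
replies = [
    {"id": 10, "in_reply_to": 1},   # direct reply to tweet 1
    {"id": 11, "in_reply_to": 10},  # nested reply in the same thread
    {"id": 12, "in_reply_to": 2},   # reply to a tweet that was not sampled
]
sources = sample_candidates(tweets, min_retweets=100)
threads = collect_conversations(sources, replies)
```

Replies to tweets that were not sampled are dropped here, so only conversations around highly retweeted tweets remain.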

Page 18: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Example rumour and conversation thread

Page 19: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Rumour annotation tool (is_rumour? + categorisation + true/false/unverified)

Page 20: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Annotation (rumour vs non-rumour): Results

Annotations:

● Ferguson: 291 / 1,185 rumours (24.6%) – 42 stories.

● Ottawa shootings: 475 / 901 rumours – 51 stories.

● Essien contracted Ebola: 18 / 18 rumours (100%) – 1 story.
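The proportions can be recomputed from the counts on the slide. The short sketch below does only that arithmetic; the Ottawa percentage is derived here from 475 / 901 and does not appear on the slide itself.

```python
# (rumours, total annotated tweets) per event, as reported on the slide.
counts = {
    "Ferguson": (291, 1185),
    "Ottawa shootings": (475, 901),
    "Essien contracted Ebola": (18, 18),
}


def rumour_share(rumours, total):
    """Percentage of annotated tweets marked as rumours, to one decimal place."""
    return round(100 * rumours / total, 1)


shares = {event: rumour_share(r, n) for event, (r, n) in counts.items()}
# shares == {"Ferguson": 24.6, "Ottawa shootings": 52.7, "Essien contracted Ebola": 100.0}
```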

Page 21: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Annotation scheme: conversational aspects of rumours

• Designed annotation scheme to:

    ● Capture sequential features of conversation thread.

    ● Analyse effect of interaction at a given point.

    ● Break down annotation into tweet triples (or fewer).
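One way to read "tweet triples (or fewer)" is sketched below. It assumes, and the slides do not confirm, that each annotation unit shows the source tweet, the tweet being replied to, and the reply itself, with shorter tuples for the source and for direct replies. The function name and the `parent_of` mapping are hypothetical.

```python
def annotation_units(source_id, parent_of):
    """Return one context tuple per tweet in a thread.

    parent_of maps each reply id to the id of the tweet it replies to.
    """
    units = [(source_id,)]  # the source tweet is shown on its own
    for reply_id, parent_id in parent_of.items():
        if parent_id == source_id:
            # Direct reply to the source: a pair.
            units.append((source_id, reply_id))
        else:
            # Nested reply: source, parent, and the reply form a triple.
            units.append((source_id, parent_id, reply_id))
    return units


# Toy thread: 10 replies to the source (1), and 11 replies to 10.
units = annotation_units(1, {10: 1, 11: 10})
# units == [(1,), (1, 10), (1, 10, 11)]
```

Under this reading an annotator never sees more than three tweets at once, which keeps each micro-task small while still exposing the immediate conversational context.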

Page 22: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Annotation scheme: conversational aspects of rumours

Page 23: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Crowdsourcing the annotation of tweets

• Used CrowdFlower for crowdsourcing, with 5-10 annotators per tweet-feature pair.

• All data also annotated by two of us, as a reference.

Page 24: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Example micro-task

Page 25: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Crowdsourcing task results

• Annotation of 216 tweets in 8 threads.

    ● 3-4 features per tweet: 4,974 units.

• 98 different annotators.

• Final set of annotations obtained through majority voting.
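Majority voting over the crowdsourced judgments can be sketched as below. The tuple layout and the label and feature names are illustrative assumptions, and the tie-breaking behaviour (the label seen first wins) is a property of this sketch, not something the slides specify.

```python
from collections import Counter, defaultdict


def majority_vote(judgments):
    """Aggregate (tweet_id, feature, label) judgments into one label per unit."""
    by_unit = defaultdict(list)
    for tweet_id, feature, label in judgments:
        by_unit[(tweet_id, feature)].append(label)
    # most_common(1) returns the most frequent label; on a tie,
    # the label encountered first is returned.
    return {
        unit: Counter(labels).most_common(1)[0][0]
        for unit, labels in by_unit.items()
    }


# Toy judgments from five hypothetical annotators on one tweet-feature pair.
judgments = [
    ("t1", "support", "supporting"),
    ("t1", "support", "denying"),
    ("t1", "support", "supporting"),
    ("t1", "support", "comment"),
    ("t1", "support", "supporting"),
]
final = majority_vote(judgments)
# final == {("t1", "support"): "supporting"}
```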

Page 26: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Crowdsourcing task agreement

        CS        REF
CS      60.2%     68.84%
REF               78.57%

● CS: crowdsourced annotations.

● REF: reference annotations.
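The slides do not say how the agreement figures were computed. As an assumption, the sketch below uses simple percentage agreement: the share of units that two annotation sets label identically. The function name and toy labels are hypothetical.

```python
def percent_agreement(a, b):
    """Percentage of shared units labelled identically in both annotation sets.

    a and b map a unit (e.g. a tweet-feature pair) to its label.
    """
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two annotation sets have no units in common")
    matches = sum(1 for unit in shared if a[unit] == b[unit])
    return 100 * matches / len(shared)


# Toy example: crowdsourced (cs) vs reference (ref) labels on four units.
cs = {1: "supporting", 2: "denying", 3: "comment", 4: "comment"}
ref = {1: "supporting", 2: "denying", 3: "appeal", 4: "comment"}
agreement = percent_agreement(cs, ref)
# agreement == 75.0
```

Raw percentage agreement does not correct for chance, so a chance-corrected coefficient may be preferable when label distributions are skewed.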

Page 27: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Evaluation against reference annotations

• Disagreement occurs with Certainty.

Page 28: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Annotation scheme: conversational aspects of rumours

(+) Underspecified

Page 29: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Distribution of annotations

• Skewed distribution of annotations:

    ● 66.5% of replies are comments.

    ● 79.8% of replies provide no evidence.

    ● 84% of comments provide no evidence.
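Shares like these, including conditional ones such as "of comments, how many provide no evidence", can be computed with a small helper. The sketch below uses toy data and hypothetical field names (`type`, `evidence`); it does not reproduce the study's figures.

```python
def share(annotations, predicate, within=None):
    """Percentage of annotations satisfying `predicate`.

    If `within` is given, the percentage is computed only over the
    annotations for which `within` is true.
    """
    subset = [a for a in annotations if within is None or within(a)]
    if not subset:
        raise ValueError("empty subset")
    return 100 * sum(1 for a in subset if predicate(a)) / len(subset)


def is_comment(a):
    return a["type"] == "comment"


def no_evidence(a):
    return a["evidence"] == "none"


# Toy replies, for illustration only.
replies = [
    {"type": "comment", "evidence": "none"},
    {"type": "comment", "evidence": "none"},
    {"type": "comment", "evidence": "url"},
    {"type": "supporting", "evidence": "none"},
]
comment_share = share(replies, is_comment)                           # 75.0
no_evidence_share = share(replies, no_evidence)                      # 75.0
no_evidence_in_comments = share(replies, no_evidence, within=is_comment)
```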

Page 30: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Annotation scheme: conversational aspects of rumours

(+) Underspecified

● Comments should not be annotated for certainty and evidentiality (they're not adding anything to veracity anyway).

Page 31: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Conclusion

● Described a novel method to collect and annotate rumourous conversations from Twitter.

● Introduced annotation scheme for annotation of conversation threads.

    ● Annotations looking at tweet triples.

    ● Differentiating source tweets and replies.

● Scheme iteratively revised and validated through crowdsourcing.

Page 32: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Future Work

● With validated annotation scheme, perform larger scale crowdsourced annotation of conversations.

● Annotation of a wider variety of events, e.g., Charlie Hebdo shooting, Germanwings plane crash, etc.

● Development of Machine Learning tools:

    ● Rumour identification and veracity assessment.

    ● Tweet classification: supporting/denying, providing evidence, etc.

Page 33: Crowdsourcing the Annotation of Rumourous Conversations in Social Media

Questions?

Thank you for listening!

http://www.pheme.eu/