Alethiometer: a framework for assessing trustworthiness and content validity in social media

Eva Jaho, Efstratios Tzoannos, Aris Papadopoulos, Nikos Sarris
Dec 14, 2015

MOTIVATION AND CHALLENGE

5 Vs of Big Data: Volume, Velocity, Variety, Veracity, Value

3 Cs of Veracity: Contributor, Content, Context

ALETHIOMETER FRAMEWORK

Contributor

Content

Context

C1 CONTRIBUTOR

What can we find out about the source of information?

Contributor modalities

• Reputation
- Analyse comments in the course of time, discover sentiments and opinions towards a source.
- Measured by the number of upvotes or likes.

• History
- Information about activity on different social media platforms, combined with validity data.
- Measured by the update frequency of valid posts.

• Popularity
- Information about following source activity (readings, recommendations).
- Measured by the number of friends/followers, and the number of responses.

Contributor modalities

• Influence
- Information about activities triggered by this source (re-posts, discussions or comments).
- Measured by number of retweets/shares, Klout influence score.

• Presence
- Information about type of source (individual, organisation, officially verified account, fake identity, etc.) and its presence on multiple social media platforms.
- Measured by the number of accounts in different social media.

C2 CONTENT

Does the posted content look reliable?

Content modalities

• Reputation of linked web content
- Measured in terms of domain reputation, page rank (Google PageRank or Alexa rank), or properties of the contributors to the content.

• Provenance
- Finding the original occurrence of the content and its whole path across sources, places and time, and measuring the reputation of these sources.

• Popularity
- Information about how many people are following this content.
- Measured by the number of followers, and the number of responses.

Content modalities

• Influence
- Analyse if this content is triggering discussions or other actions in the social sphere.
- Measured by number of retweets/shares.

• Originality
- Check whether the content or parts thereof have been used in the past (e.g., reused text or images that have appeared in the past).

• Authenticity
- Check whether the content has been changed with respect to its original state (e.g., changed text or attached multimedia content).

• Objectivity and Diversity
- Measured by the variation of opinions found for people, content, or general entities.

C3 CONTEXT

Does the 'what', 'when' and 'where' stick together?

Context modalities

• Cross-checking
- Measured by the number of different reports or mentions about the same thing coming from independent sources.

• Coherence
- Measurement of text coherence (e.g., Coh-Metrix) and coherence between the content and tags, attached web-links, or attached multimedia.

• Proximity
- Measurement of coherence between reference location/time and publication location/time.

How to combine all these parameters?

Contributor

Content

Context

Approach for rating of modality parameters

Rate parameters on a 5-point discrete scale, from 0 to 4
- [0, a0) → 0, [a0, a1) → 1, [a1, a2) → 2, [a2, a3) → 3, [a3, ∞) → 4.
- a0: 20th percentile, a1: 40th percentile, a2: 60th percentile, a3: 80th percentile (the thresholds are chosen so that the ratings follow a uniform distribution).

To derive a total score, weight the parameter ratings either uniformly or according to their significance
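
The rating and weighting steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the percentiles are taken from whatever sample of raw values is available, and the total score is assumed to be a normalised weighted average (the slides do not say whether a sum or an average is used).

```python
from bisect import bisect_right
from statistics import quantiles


def thresholds(values):
    """Return [a0, a1, a2, a3]: the 20th, 40th, 60th and 80th percentiles."""
    # n=5 splits the sample into five equally populated groups (4 cut points).
    return quantiles(values, n=5, method="inclusive")


def rate(value, cuts):
    """Map a raw value to 0..4: [0, a0) -> 0, [a0, a1) -> 1, ..., [a3, inf) -> 4."""
    # bisect_right counts the cut points that are <= value, which is the rating.
    return bisect_right(cuts, value)


def total_score(ratings, weights=None):
    """Weighted average of ratings (assumption); uniform weights if none given."""
    if weights is None:
        weights = {name: 1.0 for name in ratings}
    weight_sum = sum(weights[name] for name in ratings)
    return sum(ratings[name] * weights[name] for name in ratings) / weight_sum


if __name__ == "__main__":
    # Hypothetical follower counts, used only to derive example thresholds.
    sample_followers = [3, 12, 40, 55, 90, 150, 300, 800, 2500, 12000]
    cuts = thresholds(sample_followers)
    ratings = {"followers": rate(420, cuts), "tweets": 2, "account_age": 3}
    print(cuts, ratings, total_score(ratings))
```

Passing a `weights` dictionary (for example, a larger weight for a parameter judged more significant) switches from uniform to significance-based weighting without changing the rating step.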

Are all these parameters necessary?

Preliminary statistical results

Parameters studied

• Number of followers

• Number of tweets

• User account age

Sample: ~10M tweets, 5K users

Collection period: July-September 2013

Empirical distributions

Heavy-tailed distributions

Multimodal heavy-tailed distributions with three different peaks (6.7 months, 23.3 months, 4.4 years)

Correlation coefficients

• Friends - followers: 0.1222
• Friends - tweets: 0.08
• Followers - tweets: 0.0197

Conclusion:
- All parameters are relatively independent of one another
- They need to be studied independently
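
For reference, a pairwise correlation coefficient of this kind can be computed as sketched below. The slides do not name the coefficient used, so this assumes the Pearson (linear) correlation; the function name `pearson` and the input lists are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products and squares (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)
```

Calling `pearson(friends, followers)` on per-user counts would give one coefficient per parameter pair; values near 0, such as those listed above, indicate little linear relationship between the two parameters.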

Summary and future work

• Summary
- Defined Alethiometer: a framework taking into account all aspects: Contributor, Content and Context
- Showed an approach for combining the ratings of all parameters
- Attested the relative independence of parameters and the need to consider a variety of measures (also previously emphasized in the literature)

• Future work
- Investigate statistical properties of other modalities
- Extract the significance of modalities
- Study correlation between content, contributor and context modalities

Find us at http://ilab.atc.gr; follow us @iLabATC

Thank you

Questions & Answers