Discovering Key Moments from Social Media Streams

Post on 26-Jan-2017

Cody Buntain, cbuntain@cs.umd.edu, Human-Computer Interaction Lab, University of Maryland

Jimmy Lin, jimmylin@uwaterloo.ca, University of Waterloo

Jennifer Golbeck, golbeck@cs.umd.edu, University of Maryland

CCNC’16, 11 January 2016, Las Vegas, NV

Discovering Key Moments in Social Media Streams

Introduction

Most event detection systems track human-generated seed keywords

Tweets per second mentioning “gol copa, gool copa, goool, golaço” during the match on June 12th, 2014 [1]

Tweets per hour related to earthquakes [2]

Typical Approach

Step 1: Identify Keywords (goal, score)

Step 2: Find Bursts

Weaknesses

goal == gooooal?
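The weakness can be made concrete with a small sketch. It assumes a hypothetical seed list and a naive normalizer that collapses repeated characters; neither comes from the slides. Exact matching misses every elongated spelling, and even the normalizer does not recover variants such as "gollll" or "golaço".

```python
import re

def collapse_repeats(token: str) -> str:
    """Collapse any run of a repeated character to a single character."""
    return re.sub(r"(.)\1+", r"\1", token.lower())

# Hypothetical seed keywords and observed tokens, for illustration only.
seeds = {"goal", "score"}
observed = ["goal", "gooooal", "goooal", "gollll", "golaço"]

# Exact matching finds only the literal seed spelling.
exact_hits = [t for t in observed if t in seeds]

# Collapsing repeats helps with elongation but not with other variants.
collapsed_hits = [t for t in observed if collapse_repeats(t) in seeds]

print(exact_hits)
print(collapsed_hits)
```

Note that collapsing repeats would also damage seeds that legitimately contain doubled letters, which is one reason hand-maintained keyword lists are fragile.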

Can we identify interesting moments without seed tokens?

Step 1: Identify Keywords

goal, score

Step 2: Find Bursts

LABurst Algorithm

goooal, 0-1, 0:1, 1-0, gollll, holandaaaa, penal, penalti, persie

LABurst Algorithm

Discover Unanticipated Moments

suarez, bit, biting

Identify Keywords

Methods

193 Key Moments

Can we transfer these sports-trained models to more impactful domains?

Event                                   Tweet Count

Training Data
2010 NFL Division Championship          109,809
2012 Premier League Soccer Games        1,064,040
2014 NHL Stanley Cup Playoffs           2,421,065
2014 NBA Playoffs                       500,170
2014 Kentucky Derby Horse Race          233,172
2014 Belmont Stakes Horse Race          226,160
2014 FIFA World Cup Stages A+B          5,867,783

Testing Data
2013 MLB World Series Game 5            1,052,852
2013 MLB World Series Game 6            1,026,848
2013 Honshu Earthquake                  444,018
2014 NFL Super Bowl                     1,024,367
2014 FIFA World Cup Third Place         809,426
2014 FIFA World Cup Final               1,166,767
2014 Iwaki Earthquake                   358,966

Total                                   16,305,443

LABurst learns bursts from sporting event data

How do we model these bursts?

Extract Tokens
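A minimal sketch of token extraction follows. The slides do not specify the tokenizer, so the regular expression, the (timestamp, text) message format, and the once-per-message counting are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed token pattern: words, optionally prefixed by # or @, and
# allowing internal '-', ':' or "'" so that scores such as 1-0 survive.
TOKEN_RE = re.compile(r"[#@]?\w+(?:[-:']\w+)*")

def extract_tokens(text):
    """Lowercase a message and split it into tokens."""
    return TOKEN_RE.findall(text.lower())

def tokens_per_minute(messages):
    """Count, per minute, how many messages mention each token.

    messages: iterable of (timestamp_in_seconds, text) pairs (assumed format).
    """
    counts = defaultdict(Counter)
    for ts, text in messages:
        # set() counts a token once per message, not once per occurrence.
        counts[ts // 60].update(set(extract_tokens(text)))
    return counts

messages = [(0, "Goal!"), (30, "goal goal"), (61, "GOOOAL 1-0 #WorldCup")]
counts = tokens_per_minute(messages)
```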

Token Feature Vector v

Freq. Regression

ΔAverage Freq.

Inter-Arrival Time

Message Entropy

Network Density

TF-IDF

TF-PDF¹

BursT²
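The slides name the features but not their formulas. The sketch below shows one plausible reading of four of them over a sliding window of per-minute counts; every definition here is an assumption, and the remaining features (network density, TF-IDF, TF-PDF, BursT) are omitted.

```python
import math
from collections import Counter

def slope(ys):
    """Least-squares slope of ys against 0..n-1 (needs at least 2 points)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def delta_average(ys):
    """Latest count minus the mean of the earlier counts in the window."""
    history = ys[:-1]
    return ys[-1] - sum(history) / len(history)

def mean_inter_arrival(timestamps):
    """Mean gap between consecutive mentions (needs at least 2 timestamps)."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps)

def message_entropy(messages):
    """Shannon entropy in bits over distinct message texts (assumed definition)."""
    total = len(messages)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(messages).values())

def feature_vector(counts, timestamps, messages):
    """Assemble a partial feature vector v for one token in one window."""
    return [slope(counts), delta_average(counts),
            mean_inter_arrival(timestamps), message_entropy(messages)]

v = feature_vector([2, 2, 2, 10], [0, 10, 30], ["a", "a", "b", "b"])
```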

Token Feature Vector v

Bursty Classifier: Bursty or Not?

SVM, Random Forests, Ensemble

The more tokens that experience bursts in a given minute, the more important the moment

Key moment!
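The aggregation step can be sketched as follows. The trained classifier is represented by a hypothetical `is_bursty` callable, and the one-element feature vectors and the threshold of 3 are stand-ins chosen only to make the example run.

```python
def bursty_token_counts(minute_features, is_bursty):
    """Count tokens classified as bursty in each minute.

    minute_features: {minute: {token: feature_vector}} (assumed layout).
    is_bursty: callable taking a feature vector and returning True/False.
    """
    return {minute: sum(1 for vec in feats.values() if is_bursty(vec))
            for minute, feats in minute_features.items()}

def key_moments(counts, threshold):
    """Minutes in which at least `threshold` tokens are bursting."""
    return sorted(m for m, c in counts.items() if c >= threshold)

# Stand-in for a trained classifier: flags a token if its first feature is large.
is_bursty = lambda vec: vec[0] > 5

minute_features = {
    0: {"the": [0.1], "goal": [9.0]},
    1: {"goal": [8.0], "gol": [7.5], "1-0": [6.2], "the": [0.2]},
}
counts = bursty_token_counts(minute_features, is_bursty)
moments = key_moments(counts, threshold=3)
```

Sweeping `threshold` trades recall for precision, which is what makes a ROC-style comparison possible.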

Evaluation

We evaluate LABurst by comparing it against two baseline methods

Baseline 1: RawBurst

Find “bursts” in Twitter’s raw message frequency

Current Freq – Avg Freq ≥ Threshold

If the difference reaches the threshold: KEY MOMENT!
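A minimal sketch of the RawBurst rule, assuming per-minute message counts and a trailing-window average. The slide says only "Avg Freq", so the window length is a hypothetical parameter.

```python
def raw_burst(freqs, window, threshold):
    """Return indices of minutes flagged as key moments.

    freqs: per-minute message counts.
    window: number of preceding minutes averaged (assumed, not from the slides).
    threshold: minimum excess of the current count over that average.
    """
    moments = []
    for i in range(window, len(freqs)):
        avg = sum(freqs[i - window:i]) / window
        if freqs[i] - avg >= threshold:
            moments.append(i)
    return moments

freqs = [100, 110, 90, 100, 400, 120]
moments = raw_burst(freqs, window=3, threshold=150)
print(moments)
```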

Baseline 2: TokenBurst

Modify RawBurst to use frequency of pre-specified seed tokens

Current Freq – Avg Freq ≥ Threshold

Sport           Seed Tokens
World Series    run, home, homerun
Super Bowl      score, touchdown, td, fieldgoal, points
World Cup       goal, gol, golazo, score, foul, penalty, card, red, yellow, points
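The seed-token frequency that TokenBurst thresholds could be computed as in this sketch. The whitespace tokenizer and the (timestamp, text) message format are assumptions; the seed list is the World Cup row of the table.

```python
from collections import Counter

WORLD_CUP_SEEDS = {"goal", "gol", "golazo", "score", "foul",
                   "penalty", "card", "red", "yellow", "points"}

def seed_frequency(messages, seeds):
    """Per-minute count of messages containing at least one seed token.

    messages: iterable of (timestamp_in_seconds, text) pairs (assumed format).
    """
    freq = Counter()
    for ts, text in messages:
        # Naive whitespace tokenization; punctuation is not stripped.
        if seeds & set(text.lower().split()):
            freq[ts // 60] += 1
    return freq

messages = [
    (5, "GOAL for Germany"),
    (20, "what a gol"),
    (70, "gooooal!!!"),   # missed: not an exact seed token
    (80, "red card"),
]
freq = seed_frequency(messages, WORLD_CUP_SEEDS)
```

The resulting per-minute series would then be compared against its running average exactly as in RawBurst.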

Compare using ROC-AUC

LABurst Threshold: Number of tokens experiencing a burst in this minute

Baseline Thresholds: Difference between current frequency and average frequency
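ROC-AUC summarizes performance across all threshold settings at once. A small sketch using the pairwise (Mann-Whitney) formulation is shown below: the AUC is the probability that a randomly chosen key-moment minute scores higher than a randomly chosen ordinary minute, with ties counted as half. The scores and labels are made-up illustration data.

```python
def roc_auc(scores, labels):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    scores: per-minute score (e.g. bursty-token count or frequency excess).
    labels: 1 if the minute is a true key moment, else 0.
    Requires at least one positive and one negative example.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: two key moments among six minutes.
perfect = roc_auc([0, 1, 7, 2, 9, 1], [0, 0, 1, 0, 1, 0])
chance = roc_auc([3, 1, 2, 4], [1, 0, 1, 0])
```

This quadratic version is fine for a few hundred minutes per event; a rank-based implementation would scale better.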

Results

How well does our method perform?

10-Fold Cross Validation

Best-scoring LABurst ensemble classifier: ROC-AUC of 89.84% on training data
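For reference, a sketch of how 10-fold splits can be generated. The slide says only "10-Fold Cross Validation", so the plain, unshuffled, unstratified split used here is an assumption.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over n examples."""
    # Assign example i to fold i % k so fold sizes differ by at most one.
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(20, 10))
```

Each example appears in exactly one test fold, and the reported score would typically be the mean ROC-AUC across the ten folds.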

Which features are the most important?

Feature Set                   ROC-AUC    Difference
AdaBoost, All Features        89.84%     –
Without Regression            87.79%     -2.05
Without Entropy               87.94%     -1.90
Without TF-IDF                88.85%     -0.99
Without TF-PDF                89.00%     -0.84
Without Density               89.07%     -0.77
Without InterArrival          89.46%     -0.38
Without BursT                 89.52%     -0.31
Without Average Difference    90.56%     +0.72

Composite ROC-AUC

Competitive without seed keywords or prior domain knowledge

Discussion

Why is the Super Bowl hard?

Training/Testing Data:

Other Impactful Moments:

What was bursting at these moments?

Match: Brazil v. Netherlands, 12 July 2014
Event: Netherlands' Van Persie scores a goal on a penalty at 3', 1-0
Bursty Tokens: 0-1, 1-0, 1:0, 1x0, card, goaaaaaaal, goal, gol, goool, holandaaaa, kırmızı, pen, penal, penalti, pênalti, persie, red

Match: Brazil v. Netherlands, 12 July 2014
Event: Brazil's Oscar gets a yellow card at 68'
Bursty Tokens: dive, juiz, penalty, ref

Match: Germany v. Argentina, 13 July 2014
Event: Germany's Götze scores a goal at 113', 1-0
Bursty Tokens: goaaaaallllllll, goalllll, godammit, goetze, gollllll, gooooool, gotze, gotzeeee, götze, nooo, yessss

What other moments did LABurst discover?

LABurst vs. TokenBurst at World Cup Final

Moment: "puyol", "gisele", and "bundchen"

LABurst vs. Baseline at World Cup Final

Moment: "pipita", "higuaín", "pipa", and "choke"

Can these models be useful in other domains?

Earthquake Detection

Honshu, Japan Earthquake - 25 October 2013

Iwaki, Japan Earthquake - 11 July 2014

Simultaneously detects spikes about the earthquake

Also detects an aftershock

Conclusions

Can discover key moments from Twitter streams without seed tokens

Cody Buntain, cbuntain@cs.umd.edu, @codybuntain, Human-Computer Interaction Lab, University of Maryland

Thank you! Questions?

Backup Slides

How do we train these classifiers?

Examples of Bursty Tokens: saints, peterson, 7-0, 1-0, touchdown, score, goal, penalty, td, fumble, persie, messi, tonalist

Examples of Non-Bursty Tokens: ??

Stop Words: the, i, me, my, myself, we, our, ours, ourselves, you, before, after, above, below, to, from, up, down, in, out, on
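One way to read this slide is that tokens seen at known key moments serve as positive examples and stop words as negative examples. The sketch below builds such labels under that assumption; the function name and the 1/0 encoding are hypothetical, and in practice each label would be attached to a token's feature vector for a specific minute rather than to the token string alone.

```python
BURSTY_TOKENS = ["saints", "peterson", "7-0", "1-0", "touchdown", "score",
                 "goal", "penalty", "td", "fumble", "persie", "messi", "tonalist"]

STOP_WORDS = ["the", "i", "me", "my", "myself", "we", "our", "ours",
              "ourselves", "you", "before", "after", "above", "below",
              "to", "from", "up", "down", "in", "out", "on"]

def build_training_labels(bursty_tokens, stop_words):
    """Map each token to 1 (bursty) or 0 (non-bursty)."""
    labels = {token: 0 for token in stop_words}
    # Bursty labels win if a token somehow appears in both lists.
    labels.update({token: 1 for token in bursty_tokens})
    return labels

labels = build_training_labels(BURSTY_TOKENS, STOP_WORDS)
```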
