
Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Jul 14, 2015

Transcript
Page 1: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams*

Authors: Miray Kas, Bongwon Suh
Presenter: Sebastian Alfers - HTW Berlin

Semantic Modeling

* http://link.springer.com/chapter/10.1007%2F978-3-319-02993-1_9

Page 2: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Visual Summaries of Twitter Streams

http://flowingdata.com/wp-content/uploads/2010/02/treemap-revised1.gif

http://www.infobarrel.com/media/image/54054.jpg

Page 3: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 1: get & pre-process data
Step 2: construct graph & clustering
Step 3: extract keywords & summarize

Pipeline stages shown in the diagram:

• Keywords
• Stream Tweets
• Preprocessing / Cleaning
• Construct Graph
• Clustering
• Select Relevant Clusters
• Extract Topical Keywords
• Visual Cluster Summary

Page 4: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Input: Keywords

• initial set of Keywords

• similar to Twitter Search

Page 6: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 1: Stream Tweets

• HTTP-based API
  - JSON, REST

Page 7: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

• OAuth + HTTP

• here: Java library with Scala and Play! Framework

Page 8: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 1: Preprocessing

• transform Tweets
  - easy-to-analyze / clean format

• Process of cleaning:
  1. lowercase
  2. remove URLs, user mentions and stop words
     • like @user, „a“ or „123“
  3. remove special characters (#, .)
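The cleaning steps above can be sketched in a few lines of Python. This is only an illustration: the presenter's implementation used Scala, and the stop-word list, the regular expressions and the function name `clean_tweet` are assumptions that do not come from the slides. Stop words are filtered after the special characters are stripped so that a token like "a." is still recognised.

```python
import re

# Illustrative stop-word list (assumption); the slides do not name the list used.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "and", "to", "with", "for", "on"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Anything that is not a lowercase letter, digit, hyphen or whitespace.
SPECIAL_RE = re.compile(r"[^a-z0-9\-\s]")


def clean_tweet(text):
    """Return the list of cleaned tokens for one tweet."""
    text = text.lower()                 # 1. lowercase
    text = URL_RE.sub(" ", text)        # 2. remove URLs ...
    text = MENTION_RE.sub(" ", text)    #    ... and user mentions
    text = SPECIAL_RE.sub(" ", text)    # 3. remove special characters (#, .)
    tokens = text.split()
    # 2. (continued) drop stop words and purely numeric tokens such as "123"
    return [t for t in tokens if t not in STOP_WORDS and not t.isdigit()]


tokens = clean_tweet("Akka-HTTP based Reactive Streams with #Scala @user http://t.co/abc 123")
```

Here `tokens` keeps only the content words of the tweet; the URL, the mention, the stop word "with" and the number are dropped.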

Page 9: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 1: Preprocessing

• Example Keywords:
  - SCALA
  - Scala
  - scala
  - #scala
  (all reduced to: scala)

• LingPipe Library*
  - remove tense and plurals

* http://alias-i.com/lingpipe/
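A rough way to see what this normalisation does, assuming a deliberately crude rule in place of a real stemmer: the function below is not LingPipe and does not handle tense; it only lowercases, drops a leading "#" and strips a simple plural "s". The name `normalize` and the length rule are hypothetical, and the rule will mis-stem some words.

```python
def normalize(token):
    """Lowercase a token, drop a leading '#', and strip a simple plural 's'."""
    token = token.lower().lstrip("#")
    # Crude plural handling (assumption): "streams" -> "stream", keep "class".
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token


variants = {normalize(k) for k in ["SCALA", "Scala", "scala", "#scala"]}
```

All four spellings from the slide collapse into the single entry held in `variants`.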

Page 10: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 1: Preprocessing

• Example Tweets
  - new york time reactive programming tool scala scale techrepublic
  - akka-http based reactive stream scala scaladay

Page 12: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Graph

• Word Co-Occurrence Graph
  - Word = Node (Unigrams)
  - Tweet = Link between Nodes

• Example: akka-http based reactive stream scala scaladay

* http://alias-i.com/lingpipe/
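A minimal sketch of how such a word co-occurrence graph could be built, assuming each tweet has already been cleaned into a list of unigrams. The function name and the edge representation (a counter keyed by sorted word pairs) are illustrative choices, not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(tweets):
    """Map each unordered word pair to the number of tweets containing both."""
    edge_counts = Counter()
    for tokens in tweets:
        # Each distinct word is a node; every pair within one tweet is linked.
        for a, b in combinations(sorted(set(tokens)), 2):
            edge_counts[(a, b)] += 1
    return edge_counts


# Token lists adapted from the example tweets on the slides.
tweets = [
    ["akka-http", "based", "reactive", "stream", "scala", "scaladay"],
    ["new", "york", "time", "reactive", "programming", "tool",
     "scala", "scale", "techrepublic"],
]
graph = build_cooccurrence_graph(tweets)
```

The pair ("reactive", "scala") appears in both tweets, so its count is higher than that of pairs seen only once.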

Page 14: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Graph

• Word Co-Occurrence Graph: Example

[Figure: the example tweet drawn as a graph; the words akka-http, based, reactive, stream, scala and scaladay are the nodes, joined by links]

Page 19: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Graph

• Co-Occurrence Graph
  - connect nodes (words) within and between tweets
  - add strength (weight) and cost (distance)

• Words that occur together more frequently
  - increase the strength
  - decrease the cost
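One simple way to realise "more strength, less cost" is to take the cost as the inverse of the strength. This formula is an assumption; the slides only state the direction of the relationship. It does match the clustering slides, where a cost of 0.5 and a cost of 1 appear.

```python
def edge_costs(edge_strengths):
    """Turn co-occurrence counts (strength) into distances: cost = 1 / strength."""
    return {pair: 1.0 / strength
            for pair, strength in edge_strengths.items()
            if strength > 0}


# Hypothetical strengths: "reactive" and "scala" co-occur in two tweets.
strengths = {("reactive", "scala"): 2, ("based", "stream"): 1}
costs = edge_costs(strengths)
```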

Page 20: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Graph

• Summary

[Figure: the word graphs of individual tweets are combined (+, =) into one graph; node labels: reactive, scala, based, stream, programming, uses]

Page 21: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

• Here: „complete link (max) clustering“ algorithm*
  - hierarchical clustering algorithm that forms clusters by merging subgroups

• Group Words from Tweets
  - words that frequently appear together on a topic
  - cluster = topic

* http://nlp.stanford.edu/IR-book/html/htmledition/single-link-and-complete-link-clustering-1.html

Page 22: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

• Here: „complete link (max) clustering“ algorithm

• each node starts as an individual cluster
  - Clusters = Nodes = Words in tweet

• close clusters are successively merged together
  - closeness is measured by the highest cost (maximum distance) between the words of two clusters
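The merge rule can be sketched as a small agglomerative loop. This is a naive illustration of complete-link clustering in general, not the authors' code; it assumes a cost is known for every pair of words, and the distances below are made-up values chosen to resemble the figures on the following slides.

```python
from itertools import combinations


def complete_link(words, dist):
    """Agglomerative complete-link clustering.

    dist maps frozenset({a, b}) to the cost between words a and b.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [frozenset([w]) for w in words]   # every word starts alone
    merges = []

    def linkage(c1, c2):
        # Complete link: the distance between the two farthest members.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Merge the pair of clusters whose farthest members are closest.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


words = ["reactive", "scala", "based", "stream"]
pairs = {  # illustrative costs (assumption)
    ("reactive", "scala"): 0.5, ("based", "stream"): 1.0,
    ("reactive", "based"): 1.0, ("scala", "based"): 1.5,
    ("reactive", "stream"): 2.0, ("scala", "stream"): 2.0,
}
dist = {frozenset(p): d for p, d in pairs.items()}
history = complete_link(words, dist)
```

With these costs, "reactive" and "scala" merge first, then "based" and "stream", and the two pairs are joined last, because their farthest members are 2.0 apart.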

Page 23: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

[Figure: Graph Representation next to Cluster Representation of the words reactive, scala, based and stream; edge labels: cost = distance = 0.5 and cost = distance = 1]

Page 28: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

[Figure: clusters are merged step by step; merge labels: distance = 0.5, distance = 1, distance = 1, distance = 2]

Page 30: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

• Final step: Dendrogram
  - tree diagram
  - represents the arrangement of hierarchical clusters

• why?
  - easy to apply threshold metrics
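Applying a threshold to a dendrogram amounts to keeping only the merges at or below a chosen distance. The sketch below assumes the dendrogram is stored as a bottom-up list of merges; that storage format, the function name and the hand-written merge history are all illustrative.

```python
def cut_dendrogram(words, merges, threshold):
    """Return the clusters left when every merge above threshold is undone."""
    clusters = {frozenset([w]) for w in words}
    for left, right, distance in merges:   # merges are ordered bottom-up
        if distance > threshold:
            break                           # complete-link distances never decrease
        clusters -= {left, right}
        clusters.add(left | right)
    return clusters


f = frozenset
words = ["reactive", "scala", "based", "stream"]
merges = [  # hand-written example history (assumption)
    (f(["reactive"]), f(["scala"]), 0.5),
    (f(["based"]), f(["stream"]), 1.0),
    (f(["reactive", "scala"]), f(["based", "stream"]), 2.0),
]
two_topics = cut_dendrogram(words, merges, threshold=1.0)
```

A low threshold yields many small clusters, a high one yields few large clusters, which is why the tree form makes thresholds easy to compare.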

Page 31: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

• Final step: Dendrogram
  - closer to the root = lower similarity

[Figure: dendrogram with root; reactive and scala form the first cluster]

Page 33: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 2: Clustering

• Final step: Dendrogram
  - closer to the root = lower similarity

[Figure: dendrogram with root; leaves: reactive, scala, new, york, programming, …, akka-http, based, stream, scaladay; thresholds drawn across the tree]

Page 35: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 3: Extract topical keywords

[Figure: pipeline stages Preprocessing / Cleaning, Construct Graph, Extract Topical Keywords]

Page 36: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 3: Extract topical keywords

• keywords
  - express a topic
  - frequently used
  - summarize tweet content

• Questions
  - „What are the relevant keywords?“
  - „In what clusters do they appear?“

Page 37: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 3: Extract topical keywords

• How?
  - „topical tweets“ vs. „general tweets“

• frequently in topical tweets
  - search keywords „reactive scala“

• not frequently in general tweets
  - general twitter stream (all tweets)

Page 39: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 3: Extract topical keywords

• Strength of a word
  - is a word relevant for that topical cluster?

[Figure: 2×2 matrix of frequency (low / high) in Topical Tweets against frequency (low / high) in General Tweets; ✔ marks the cell that is relevant for the topic / cluster]
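The slides do not give the formula for the strength of a word. As a stand-in, the sketch below scores a word by the ratio of its relative frequency in topical tweets to its (smoothed) relative frequency in general tweets; the ratio, the add-one smoothing and the function name are assumptions made only to illustrate the "high in topical, low in general" idea.

```python
from collections import Counter


def topical_strength(topical_tweets, general_tweets):
    """Score topical words by topical vs. general relative frequency.

    Hypothetical ratio, not the formula from the paper.
    """
    topical = Counter(w for tweet in topical_tweets for w in tweet)
    general = Counter(w for tweet in general_tweets for w in tweet)
    n_topical = sum(topical.values())
    n_general = sum(general.values())
    vocab = len(set(topical) | set(general))
    scores = {}
    for word, count in topical.items():
        p_topical = count / n_topical
        # Add-one smoothing so words unseen in general tweets do not divide by zero.
        p_general = (general[word] + 1) / (n_general + vocab)
        scores[word] = p_topical / p_general
    return scores


topical_tweets = [["reactive", "scala", "stream"], ["reactive", "scala", "new"]]
general_tweets = [["new", "day"], ["new", "music", "day"]]
scores = topical_strength(topical_tweets, general_tweets)
```

In this toy data, "reactive" scores highest because it is frequent in the topical tweets and absent from the general ones, while "new" scores lowest because it is common in the general stream.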

Page 40: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Step 3: Extract topical keywords

• Result
  - topical strength for each keyword
  - sort them by relevancy
  - select top 20 keywords

• choose clusters that contain these words
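The result step can be sketched as a sort followed by a filter. The strengths and clusters below are made-up values, and treating a cluster as relevant when it contains at least one selected keyword is an assumption; the slides do not say whether one or all keywords must be present.

```python
def select_keywords_and_clusters(strengths, clusters, top_n=20):
    """Keep the top_n strongest keywords and the clusters containing any of them."""
    ranked = sorted(strengths, key=strengths.get, reverse=True)[:top_n]
    keywords = set(ranked)
    relevant = [c for c in clusters if keywords & set(c)]
    return ranked, relevant


# Hypothetical topical strengths and clusters.
strengths = {"reactive": 3.7, "scala": 3.5, "stream": 1.8, "new": 0.6, "day": 0.1}
clusters = [{"reactive", "scala"}, {"stream", "based"}, {"day", "music"}]
top_keywords, relevant_clusters = select_keywords_and_clusters(strengths, clusters, top_n=3)
```

With `top_n=3` the cluster {"day", "music"} is dropped, since none of its words made the cut; the slides use the top 20.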

Page 41: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Final Step

• Combine clusters and keywords

• create visual summary

Page 42: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Final Step

• Keyword1
• Keyword2
• Keyword3
• Keyword4
• …

(ordered from high relevancy to low relevancy)

Page 44: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Final Step

• Treemap Visualisation
  - color = cluster
  - area of word = frequency of word
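To make "area of word = frequency of word" concrete, here is a one-level slice layout, the simplest treemap variant. It is only a sketch: the slides do not say which layout algorithm was used, and the canvas size, field names and sample frequencies are assumptions. The cluster id is carried along so that a renderer could map it to a color.

```python
def slice_layout(words, width=100.0, height=60.0):
    """Lay out (word, frequency, cluster_id) triples as vertical strips.

    Every strip spans the full height, so its area is proportional to its
    width and therefore to the word frequency.
    """
    total = sum(freq for _, freq, _ in words)
    x = 0.0
    rects = []
    for word, freq, cluster_id in sorted(words, key=lambda item: -item[1]):
        w = width * freq / total
        rects.append({"word": word, "cluster": cluster_id,
                      "x": x, "y": 0.0, "w": w, "h": height})
        x += w
    return rects


# Hypothetical frequencies; cluster ids 0 and 1 would map to two colors.
words = [("reactive", 30, 0), ("scala", 50, 0), ("stream", 20, 1)]
rects = slice_layout(words)
```

The most frequent word gets the widest strip, and the strips together fill the whole canvas.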

Page 45: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Final Step

• Wordcloud Visualisation
  - color = cluster
  - size of word = frequency of word

Page 46: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Final Notes

• 4. Million Topical Tweets

• 15 Days

• User Study
  - Treemap vs. Word Cloud

Page 47: Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams

Thank You!

• Discussion
  - Losing precision while cleaning tweets
  - Losing meaning when removing stop words like „not“ (negation)
  - Unigram vs. Multigram?
  - ?