Pei Lee, ICDE 2014, Chicago, IL, USA Incremental Cluster Evolution Tracking from Highly Dynamic Network Data Pei Lee, Laks V.S. Lakshmanan Computer Science Department University of British Columbia Vancouver, BC, Canada Evangelos E. Milios Computer Science Department Dalhousie University Halifax, NS, Canada 1 2014-4-16

[ICDE 2014] Incremental Cluster Evolution Tracking from Highly Dynamic Network Data

Jul 05, 2015

Data & Analytics

Pei Lee

This talk describes techniques for tracking cluster evolution patterns in large evolving networks. Put simply, given a dynamic graph that is updated at each time moment, how can we incrementally monitor the evolution patterns of graph clusters? Typical evolution patterns include appear/disappear, grow/decay, and merge/split. We discuss the incremental computation framework, in contrast to the traditional approach based on a sequence of graph snapshots. The ICDE 2014 paper can be found at http://www.cs.ubc.ca/~peil/research.html
Transcript
Page 1: [ICDE 2014] Incremental Cluster Evolution Tracking from Highly Dynamic Network Data

Pei Lee, ICDE 2014, Chicago, IL, USA

Incremental Cluster Evolution Tracking

from Highly Dynamic Network Data

Pei Lee, Laks V.S. Lakshmanan

Computer Science Department

University of British Columbia

Vancouver, BC, Canada

Evangelos E. Milios

Computer Science Department

Dalhousie University

Halifax, NS, Canada

2014-4-16

Outline

Motivation

Evolving network meets social event

Incremental Computation Framework

Divide-and-conquer vs. incremental computation

Post Network Construction

Combat noise

Network and Cluster Evolution

Evolution operations

Empirical Study

Examples

Evolving Network

Network changes with time

Examples:

Social Network

add/remove friends or followers

Co-authorship/citation network

new collaborations/citations added every year

Email/Calling Graph

every edge has a time stamp


An illustration of evolving co-authorship network

Taken from http://wiki.cns.iu.edu/pages/viewpage.action?pageId=2199676

Social Streams: Twitter, Facebook, etc.

Social Event Evolution Tracking

Event Evolution Patterns

[Figure: the post network at time t and at time t+1, with the event snapshots derived from each]

Evolution patterns: emerge, disappear, grow, decay, merge, split, evolve

Evolving Network

Social Events

Model a social stream as an evolving network

Traditional Evolving Network Mining Approaches

Divide and Conquer:

decompose a dynamic network into a series of snapshots, one for each moment,

apply graph mining algorithms on each snapshot to find useful patterns,

match patterns between consecutive moments to generate a dynamic pattern sequence.

Imagine applying this to find evolving clusters

Illustrating Divide-and-Conquer

[Figure: network snapshots at Moment 1 through Moment 5. Taken from http://sydney.edu.au/engineering/it/~shhong/gallery.htm]

Divide-and-Conquer: Clustering in evolving networks

Ct: a cluster found in the snapshot at time t;

Ct+1: a cluster found in the snapshot at time t+1.

How to define “Ct evolves to Ct+1”?

Heuristic:

If the overlap between Ct and Ct+1 is above a given threshold, we say they are matched.

Formally, based on Jaccard similarity:
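
A minimal sketch of this matching heuristic in Python, assuming the standard Jaccard definition |Ct ∩ Ct+1| / |Ct ∪ Ct+1| and a threshold k. The function names and the toy clusters are illustrative and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(clusters_t, clusters_t1, k):
    """Return (i, j, score) for every cluster pair whose similarity is at least k."""
    matches = []
    for i, ct in enumerate(clusters_t):
        for j, ct1 in enumerate(clusters_t1):
            score = jaccard(ct, ct1)
            if score >= k:
                matches.append((i, j, score))
    return matches


# Toy snapshots: the first cluster loses node 1 and gains node 7.
clusters_t = [{1, 2, 3, 4}, {5, 6}]
clusters_t1 = [{2, 3, 4, 7}, {8, 9}]
matches = match_clusters(clusters_t, clusters_t1, k=0.5)
```

Note that every pair of clusters is compared at every moment, and the outcome depends entirely on the choice of k.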

Drawbacks of Divide-and-Conquer

Quality:

It is difficult to choose the threshold K

Matching clusters between two consecutive snapshots loses accuracy

Performance:

Each snapshot must be clustered from scratch

This causes a lot of redundant computation

New Proposal: Incremental Computation for dense subgraph mining

Basic Idea:

For the very first snapshot, mine the graph pattern set S0 from scratch

After this, the from-scratch step is never applied again.

In the steady state, let t start at 1

Obtain the graph update ΔG by comparing the network at moment t with the network at moment t-1

Derive St from St-1 based on ΔG

Let t increase to t+1
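
The loop above can be sketched in Python as follows. As a stand-in for dense subgraph mining, the "pattern set" here is simply the set of connected components; that choice, the edge-set representation of snapshots, and all function names are assumptions made for illustration only.

```python
def mine_from_scratch(edges):
    """Initial step: connected components of an edge set (the stand-in pattern)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return set(comps)


def graph_update(prev_edges, curr_edges):
    """ΔG: edges added and edges removed between moment t-1 and moment t."""
    return curr_edges - prev_edges, prev_edges - curr_edges


def derive(patterns, curr_edges, added, removed):
    """Derive S_t from S_{t-1}: only components touched by ΔG are recomputed."""
    touched = {n for e in (added | removed) for n in e}
    kept = {c for c in patterns if not (c & touched)}
    affected = touched.union(*(c for c in patterns if c & touched))
    local_edges = {e for e in curr_edges if e[0] in affected or e[1] in affected}
    return kept | mine_from_scratch(local_edges)


# Three toy snapshots, each an edge set.
snapshots = [
    {(1, 2), (3, 4)},
    {(1, 2), (2, 3), (3, 4)},
    {(2, 3), (3, 4), (5, 6)},
]
patterns = mine_from_scratch(snapshots[0])      # S0, computed once
for t in range(1, len(snapshots)):              # steady state
    added, removed = graph_update(snapshots[t - 1], snapshots[t])
    patterns = derive(patterns, snapshots[t], added, removed)
```

Components untouched by ΔG are carried over unchanged, which is where the saving over re-clustering every snapshot comes from.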

Divide-and-Conquer vs. Incremental Computation

Divide-and-Conquer:

1, 2, 3, 4

Incremental Computation:

Initial step: 1

Steady state: 5

Advantages:

Avoid redundant computation

More accurately capture the evolution patterns

Incremental Computation Framework

Adjust the clusters at each moment as the network is updated

Post Network Construction

A social stream is a FIFO queue of posts

Post similarity depends on:

Content similarity

Time distance

Post Network:

Each post is a node

An edge is constructed between two posts if their similarity is higher than a given threshold
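
A minimal Python sketch of post network construction. The transcript does not preserve the similarity formula, so the measure below (Jaccard word overlap damped by an exponential time decay), the `decay` parameter and the post dictionaries are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def content_similarity(a, b):
    """Jaccard overlap of the word sets of two post texts (an assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def post_similarity(p, q, decay=3600.0):
    """Content similarity damped by time distance (assumed exponential decay, seconds)."""
    time_distance = abs(p["time"] - q["time"])
    return content_similarity(p["text"], q["text"]) * math.exp(-time_distance / decay)


def build_post_network(posts, threshold):
    """Each post is a node; an edge links two posts whose similarity exceeds threshold."""
    nodes = [p["id"] for p in posts]
    edges = {}
    for p, q in combinations(posts, 2):
        s = post_similarity(p, q)
        if s > threshold:
            edges[(p["id"], q["id"])] = s
    return nodes, edges


posts = [
    {"id": 1, "text": "apple announces new iphone", "time": 0},
    {"id": 2, "text": "new iphone announced by apple", "time": 600},
    {"id": 3, "text": "good morning", "time": 900},
]
nodes, edges = build_post_network(posts, threshold=0.3)
```

In this toy input only posts 1 and 2 are linked; post 3 shares no words with the others and stays isolated.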

Evolving Post Network

We can build a post network for your daily timeline in Facebook/Twitter/LinkedIn

As posts stream in, the post network evolves very quickly

Challenges of evolving post network mining:

The quick surge of post streams (speed)

A large number of posts are noise (quality)

The huge number of posts (scalability)

Observing Time Window

Notations:

Len: time window length

Δt: the amount by which the time window shifts at each moment
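
A small Python sketch of a sliding observation window built on a FIFO queue. The class name, the half-open interval convention and the `(timestamp, post_id)` tuples are assumptions; `length` and `step` play the roles of Len and Δt.

```python
from collections import deque


class TimeWindow:
    """Sliding window of length `length` that advances by `step` at each moment."""

    def __init__(self, length, step):
        self.length = length
        self.step = step
        self.end = 0             # the window covers (end - length, end]
        self.posts = deque()     # (timestamp, post_id) in arrival order

    def shift(self, incoming):
        """Advance the window by one step; return (added, expired) post ids."""
        self.end += self.step
        added = []
        for ts, pid in incoming:
            self.posts.append((ts, pid))
            added.append(pid)
        expired = []
        start = self.end - self.length
        # Posts are time-ordered, so expired ones are always at the front.
        while self.posts and self.posts[0][0] <= start:
            expired.append(self.posts.popleft()[1])
        return added, expired


window = TimeWindow(length=10, step=5)
first = window.shift([(1, "a"), (4, "b")])    # window (-5, 5]
second = window.shift([(7, "c")])             # window (0, 10]
third = window.shift([(12, "d")])             # window (5, 15]
remaining = [pid for _, pid in window.posts]
```

The pair returned by each `shift` call is the node-level part of the graph update between two consecutive moments.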

How to filter out noise?

Noise is ubiquitous in social streams

“Good morning”, “thank you ^.^”, etc.

About 40% of tweets carry very little meaning

How to filter out noise?

Classify posts into three types: core, border and noise

wt(p): the priority of post p at moment t

By analogy with a social network:

Core: a person with lots of friends

Border: not core, but a friend of a core

Noise: not core, and not a friend of any core
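
A hedged Python sketch of the three-way classification. The exact definition of the priority wt(p) is not preserved in this transcript, so the sketch assumes it is the sum of a post's incident edge weights and that a post is core when this sum reaches a threshold `delta`. Both are illustrative assumptions.

```python
def classify_posts(nodes, edges, delta):
    """Label each post 'core', 'border' or 'noise'.

    edges maps (u, v) pairs to similarity weights; delta is the assumed
    priority threshold above which a post counts as core.
    """
    weight = {n: 0.0 for n in nodes}
    neighbors = {n: set() for n in nodes}
    for (u, v), s in edges.items():
        weight[u] += s
        weight[v] += s
        neighbors[u].add(v)
        neighbors[v].add(u)
    core = {n for n in nodes if weight[n] >= delta}
    labels = {}
    for n in nodes:
        if n in core:
            labels[n] = "core"
        elif neighbors[n] & core:
            labels[n] = "border"      # not core, but linked to a core post
        else:
            labels[n] = "noise"       # not core, and no core neighbor
    return labels


nodes = ["a", "b", "c", "d", "e"]
edges = {("a", "b"): 0.6, ("a", "c"): 0.5, ("b", "c"): 0.4, ("c", "d"): 0.3}
labels = classify_posts(nodes, edges, delta=0.9)
```

Here "a", "b" and "c" are core, "d" is a border post attached to "c", and the isolated post "e" is noise.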

Skeletal graph of a post network

Skeletal Graph:

A graph consisting of all core posts

A brief summary of the original post network

Clusters can be derived from skeletal graphs

Our algorithm monitors changes in the skeletal graph
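
One way to picture the skeletal graph in Python: keep only the edges between core posts, then read clusters off as connected components. Treating clusters as connected components of the skeletal graph is an assumption made for this sketch; the function names are hypothetical.

```python
def skeletal_graph(edges, core):
    """Keep only edges whose two endpoints are both core posts."""
    return {(u, v) for (u, v) in edges if u in core and v in core}


def clusters_from_skeleton(core, skeleton):
    """Connected components of the skeletal graph (assumed cluster definition)."""
    adj = {n: set() for n in core}
    for u, v in skeleton:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(core):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


# "x" and "y" are non-core posts, so their edges drop out of the skeleton.
edges = {("a", "b"), ("b", "c"), ("c", "x"), ("d", "e"), ("e", "y")}
core = {"a", "b", "c", "d", "e"}
skeleton = skeletal_graph(edges, core)
clusters = clusters_from_skeleton(core, skeleton)
```

Because the skeleton is much smaller than the full post network, tracking its changes is cheaper than tracking every post.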

Network Evolution Operations

Add a post

Remove a post

Cluster Evolution Operations

We define 6 cluster evolution patterns:

appear, disappear, grow, decay, merge and split

Summary: Cluster Evolution

Add a post:

A new cluster may appear

An existing cluster may grow

Multiple clusters may merge into a single one

Delete a post:

An existing cluster may disappear

An existing cluster may decay

An existing cluster may split into multiple clusters
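
The six cases above can be sketched in Python under simplifying assumptions: every post is treated as a core post, clusters are connected components, and the names `on_add` and `on_delete` are hypothetical. This illustrates the case analysis only and is not the paper's ICM or eTrack algorithm.

```python
def components(nodes, adj):
    """Connected components of the subgraph induced on `nodes`."""
    nodes, seen, comps = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend((adj[n] & nodes) - comp)
        seen |= comp
        comps.append(comp)
    return comps


def on_add(clusters, adj, post, neighbors):
    """Add a post linked to `neighbors`; return the evolution pattern observed."""
    adj[post] = set(neighbors)
    for n in neighbors:
        adj[n].add(post)
    hit = [c for c in clusters if c & adj[post]]   # clusters the new post touches
    for c in hit:
        clusters.remove(c)
    clusters.append(set().union({post}, *hit))
    return "appear" if not hit else "grow" if len(hit) == 1 else "merge"


def on_delete(clusters, adj, post):
    """Delete a post; return the evolution pattern observed."""
    home = next(c for c in clusters if post in c)
    clusters.remove(home)
    for n in adj.pop(post):
        adj[n].discard(post)
    parts = components(home - {post}, adj)         # what is left of the cluster
    clusters.extend(parts)
    return "disappear" if not parts else "decay" if len(parts) == 1 else "split"


clusters, adj = [], {}
events = [
    on_add(clusters, adj, 1, []),
    on_add(clusters, adj, 2, [1]),
    on_add(clusters, adj, 3, []),
    on_add(clusters, adj, 4, [2, 3]),
    on_delete(clusters, adj, 4),
    on_delete(clusters, adj, 2),
    on_delete(clusters, adj, 3),
]
```

Only the cluster containing the changed post is re-examined on each update; all other clusters are left alone.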

Network Evolution to Cluster Evolution

Cluster evolution of adding a post

Network Evolution to Cluster Evolution

Cluster evolution of deleting a post

Bulk Updating

Existing incremental computation on dynamic graphs usually treats the addition/deletion of nodes or edges one by one

Since social posts arrive at high speed, post-by-post incremental updating leads to very poor performance

Bulk updating: update subgraph-by-subgraph

a bulk = a post cluster

More details in Section VII of the paper

Proposed Algorithms

ICM: Incremental Cluster Maintenance

eTrack: Cluster Evolution Tracking

Twitter Technology domain data sets

Time span: 1 month

Tech-Lite: all the timelines of users listed in the Technology category of “Who to follow” and their retweeted users

streaming rate is about 11700 tweets/day

Tech-Full: all the timelines followed by users who are in the Technology category

streaming rate is about 7216 tweets/hour

Ground Truth

Major events from news articles:

Crawl news from major technology websites

By treating the news article titles as posts, we apply our approach to extract events

Peaks in Google Trends

Precision and recall

HashtagPeaks: use common hashtags to compute post similarity

UnigramPeaks: use common unigrams to compute post similarity

Louvain: use common entities to compute post similarity and apply the Louvain community detection algorithm

eTrack: use common entities to compute post similarity and apply our approach

Top 10 social events detected by different methods

Running time

(a) Adjusting time window length

(b) Adjusting step length

Cluster Evolution Examples

Conclusion

Theoretical side:

We propose an incremental computation framework for cluster evolution tracking in highly dynamic networks

Application side:

We propose an efficient tracking system for event evolution patterns in social streams

Q & A

Post Network Mining

A snapshot of the post network is constructed from the posts in the same time window

As social posts stream in, events (dense clusters) are identified

Relationships between post network, skeletal graph and clusters

The skeletal graph is a sketch of the post network

Clusters can be generated from the skeletal graph