A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks. Min-Soo Kim and Jiawei Han. Proceedings of the International Conference on Very Large Data Bases (VLDB), 2009. Speaker: Chien-Liang Wu

A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks

Jan 06, 2016

Page 1: A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks

A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks

Min-Soo Kim and Jiawei Han

Proceedings of the International Conference on Very Large Data Bases (VLDB), 2009

Speaker: Chien-Liang Wu

Page 2: Outline

Introduction
Motivation & Goals
Particle-and-Density Based Evolutionary Clustering
  Modeling of Community
  Local Clustering
  Mapping of Local Clusters
Experiments
Conclusions

Page 3: Dynamic Networks

A sequence of networks with different timestamps
  Allows new nodes to attach and existing nodes to detach
Great potential for capturing natural and social phenomena over time
  Ex: network/telephone traffic data, bibliographic data, dynamic social networks, etc.

[Figure: a dynamic network at timestamps t=0, t=1, t=2, t=3, t=4]

Page 4: Evolutionary Clustering

Features
  Clusters each snapshot of temporal data while considering its relationship with existing data points
  Captures the evolutionary process of clusters in temporal data
    Assumes that the structure of clusters does not change significantly within a very short time
    Uses the temporal smoothness framework
  Produces a sequence of local clustering results
Comparison with incremental clustering
  Updates dynamically when new data points arrive
  Produces one updated clustering result

Page 5: Temporal Smoothness

Tries to smooth out each cluster over time by trading off snapshot quality against history quality
  Snapshot quality: how accurately the clustering result captures the structure of the current network
  History quality: how similar the current clustering result is to the previous clustering result
Uses a user-specified parameter α in a cost function that is minimized
  High α: better snapshot quality
  Low α: better history quality
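The cost function itself did not survive in this transcript. As a hedged sketch, assuming the weighted-sum form commonly used in the temporal smoothness framework, with SC a snapshot cost (low when snapshot quality is high) and TC a temporal cost (low when history quality is high), both names assumed here:

```latex
\mathrm{cost} = \alpha \cdot SC + (1 - \alpha) \cdot TC, \qquad 0 \le \alpha \le 1
```

Under this assumed form, α = 1 keeps only the snapshot term and α = 0 keeps only the temporal term, which matches the high-α/low-α trade-off listed above.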

Page 6: Motivation

Previous evolutionary clustering methods
  Assume only a fixed number of clusters over time
  Do not allow communities to start or stop arbitrarily over time
However, the forming of new communities and the dissolving of existing ones are natural and common phenomena in real dynamic networks
  Ex: research groups form or dissolve at some point in the co-authorship dynamic network built from the DBLP data

Page 7: Goals

Propose a new evolutionary clustering method that
  Removes the constraint of a fixed number of communities
  Allows the forming of new communities and the dissolving of existing communities
Solve two sub-problems
  Problem 1: how to cluster G_t with temporal smoothing when |CR_{t-1}| ≠ |CR_t|
  Problem 2: how to connect C_{t-1} ∈ CR_{t-1} with C_t ∈ CR_t when |CR_{t-1}| ≠ |CR_t|, so as to determine the stage of each community among three stages: evolving, forming, and dissolving

Page 8: Definitions of Symbols

[Table: definitions of symbols]

Page 9: Modeling of Community: Nano-Community

Definition
  Neighborhood N(v) of a node v ∈ V_t: N(v) = {x ∈ V_t | ⟨v, x⟩ ∈ E_t} ∪ {v}
  Nano-community NC(v, w) of two nodes v ∈ V_{t-1} and w ∈ V_t is defined by a sequence [N(v), N(w)] that has a non-zero score under a similarity function Γ: N(·) × N(·) → ℝ
Features
  A kind of particle capturing how dynamic networks evolve over time at the nano level
  Can be represented by a link

Page 10: Similarity Function Γ_E()

Similarity function Γ_E():

$$\Gamma_E(N(v), N(w)) = \begin{cases} 1, & \text{if } v = w \text{ or a common edge exists} \\ 0, & \text{otherwise} \end{cases}$$

Example

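[Figure: G_{t-1} with nodes a, b, c, d and G_t with nodes a, b, c, d, e; N(a) in G_{t-1} is linked to N(a), N(b), and N(d) in G_t, forming the nano-communities NC(a,a), NC(a,b), and NC(a,d), i.e., the links between a and G_t]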
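A minimal Python sketch of N(v), Γ_E, and the resulting nano-community links between two snapshots. The slide does not say which edges belong to a neighborhood; this sketch assumes they are the edges incident to its center node, so for v ≠ w a common edge exists exactly when ⟨v, w⟩ appears in both G_{t-1} and G_t. The adjacency-set graph format and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical snapshot format: adjacency sets of an undirected graph G_t.
Graph = Dict[str, Set[str]]


def neighborhood(g: Graph, v: str) -> Set[str]:
    """Closed neighborhood N(v) = {x | <v, x> in E_t} ∪ {v}."""
    return g[v] | {v}


def neighborhood_edges(g: Graph, v: str) -> Set[FrozenSet[str]]:
    """ASSUMPTION: the edges of N(v) are the edges incident to v."""
    return {frozenset((v, x)) for x in g[v]}


def gamma_e(g_prev: Graph, v: str, g_cur: Graph, w: str) -> int:
    """Gamma_E(N(v), N(w)): 1 if v == w or a common edge exists, else 0."""
    if v == w:
        return 1
    common = neighborhood_edges(g_prev, v) & neighborhood_edges(g_cur, w)
    return 1 if common else 0


def nano_community_links(g_prev: Graph, g_cur: Graph) -> List[Tuple[str, str]]:
    """Every pair (v in G_{t-1}, w in G_t) with a non-zero Gamma_E score."""
    return sorted(
        (v, w) for v in g_prev for w in g_cur if gamma_e(g_prev, v, g_cur, w)
    )


# Toy snapshots: edge c-d disappears and node e joins at time t.
g_prev = {"a": {"b", "d"}, "b": {"a"}, "c": {"d"}, "d": {"a", "c"}}
g_cur = {"a": {"b", "d"}, "b": {"a"}, "c": set(), "d": {"a"}, "e": set()}
print(nano_community_links(g_prev, g_cur))
# [('a', 'a'), ('a', 'b'), ('a', 'd'), ('b', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'a'), ('d', 'd')]
```

In this toy case node a at t-1 links to a, b, and d at t because the edges a-b and a-d persist, while c loses its only edge and keeps just the identity link.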

Page 11: Community

Topological model of a community M in the t-partite graph
  Clique K_s is the structure of a local cluster
    Has the highest density in networks
  Biclique K_{r,s} is the structure of a community
    Extend the number of partites of the biclique from two to l
    Consider a cross section (i.e., a local cluster) of a community
  Define the l-clique-by-clique (l-KK) by generalizing the biclique
    The l-KK is the densest community structure
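Reading the definition as "one clique per timestamp, with consecutive cliques joined by a biclique of links", a hypothetical checker could look like the sketch below. Links between consecutive snapshots are given as (v, w) pairs, for example nano-community links; the function names and data layout are assumptions, and this is an interpretation of the slide rather than the paper's code.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]   # hypothetical adjacency-set snapshot
Link = Tuple[str, str]        # (node at t, node at t + 1)


def is_clique(g: Graph, nodes: Set[str]) -> bool:
    """A cross section (local cluster) must be a clique K_s in its snapshot."""
    return all(y in g[x] for x, y in combinations(sorted(nodes), 2))


def is_biclique(links: Set[Link], prev: Set[str], cur: Set[str]) -> bool:
    """Consecutive cross sections must be fully linked, forming K_{r,s}."""
    return all((v, w) in links for v in prev for w in cur)


def is_l_kk(snapshots: List[Graph], links: List[Set[Link]],
            community: List[Set[str]]) -> bool:
    """Check an l-KK: l cliques chained by bicliques (our interpretation).

    snapshots[i] is G_i, community[i] is the cross section M_i, and
    links[i] holds the links between snapshot i and snapshot i + 1.
    """
    if not all(is_clique(g, m) for g, m in zip(snapshots, community)):
        return False
    return all(is_biclique(links[i], community[i], community[i + 1])
               for i in range(len(community) - 1))
```

A quasi l-KK would relax both checks to "dense enough" instead of "complete"; the thresholds for that are not given on the slides.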

Page 12: Quasi l-KK

In real applications, most communities have a looser structure, i.e., a quasi l-KK
The data-inherent quasi l-KKs in a given dynamic network
  Have relatively dense links and edges
  Provide guidance on how to find the communities

[Figure: a quasi l-KK over timestamps t1 to t5]

Page 13: Clustering with Temporal Smoothing

Previous methods
  Adjust the clustering result CR_t itself iteratively (⇒ degraded performance)
  Smooth C_t ∈ CR_t by using the corresponding C_{t-1} ∈ CR_{t-1} (⇒ a 1:1 mapping is required)
Four cases of the relationship between two nodes v and w at timestamps t-1 and t
  Case 2: with high α, v and w are in the same cluster at t; with low α, v and w are in different clusters at t

Page 14: Cost Embedding Technique

The method proposed in this paper
  No iterative adjustment of CR_t: the cost formula is pushed down into the node distance d_t (⇒ no performance degradation)
  Smoothing at the data level, which is independent of clustering results (⇒ no 1:1 mapping required)
where
  d_o(v, w): original distance between v and w at time t without temporal smoothing
  d_t(v, w): smoothed distance between v and w at time t
  SC_N = |d_o(v, w) - d_t(v, w)|, TC_N = |d_{t-1}(v, w) - d_t(v, w)|

Page 15: Cost Embedding Technique (contd.)

The optimal distance d'_t(v, w) that minimizes cost_N
  α = 1: d'_t(v, w) = d_o(v, w)
  α = 0: d'_t(v, w) = d_{t-1}(v, w)
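A sketch of the cost embedding for a single node pair. The formula for cost_N is not legible here; this sketch assumes cost_N = α·SC_N + (1 - α)·TC_N with squared SC_N and TC_N, whose minimizer is the convex combination α·d_o + (1 - α)·d_{t-1}. With the absolute values printed on page 14, the minimizer would instead jump to d_o (α > 0.5) or to d_{t-1} (α < 0.5); both readings agree with the α = 1 and α = 0 cases above. Function names are hypothetical.

```python
def cost_n(d: float, d_o: float, d_prev: float, alpha: float) -> float:
    """ASSUMED embedded cost: alpha * SC_N + (1 - alpha) * TC_N, squared terms."""
    return alpha * (d_o - d) ** 2 + (1.0 - alpha) * (d_prev - d) ** 2


def smoothed_distance(d_o: float, d_prev: float, alpha: float) -> float:
    """Closed-form minimizer of cost_n over d.

    Setting d(cost_n)/dd = -2*alpha*(d_o - d) - 2*(1 - alpha)*(d_prev - d) = 0
    gives d = alpha * d_o + (1 - alpha) * d_prev.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * d_o + (1.0 - alpha) * d_prev


print(smoothed_distance(0.2, 0.6, 1.0))   # alpha = 1: the original distance d_o
print(smoothed_distance(0.2, 0.6, 0.0))   # alpha = 0: the previous distance d_{t-1}
```

Because the smoothed value depends only on the two distances and α, it can be computed per node pair before clustering, which is what removes the need to adjust CR_t iteratively.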

Page 16: Density-Based Clustering

Use the DBSCAN method to find all local clusters at timestamp t
Extend the cosine similarity with the cost embedding technique
  σ(v, w) ranges from 0.0 to 1.0 and, in particular, becomes 1.0 when v and w are both in a clique
  where σ_t(v, w) denotes σ(v, w) in G_t
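A sketch of the similarity, assuming the closed-neighborhood cosine used in SCAN-style structural clustering, σ(v, w) = |N(v) ∩ N(w)| / sqrt(|N(v)|·|N(w)|), and assuming the cost embedding is applied to similarities as the same convex combination used for distances. Under this assumed form σ(v, w) is exactly 1.0 when N(v) = N(w), for instance when v and w belong to a clique with no outside edges. Function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # hypothetical adjacency-set snapshot


def cosine_similarity(g: Graph, v: str, w: str) -> float:
    """ASSUMED sigma(v, w) = |N(v) ∩ N(w)| / sqrt(|N(v)| * |N(w)|), closed N(.)."""
    n_v, n_w = g[v] | {v}, g[w] | {w}
    return len(n_v & n_w) / sqrt(len(n_v) * len(n_w))


def smoothed_similarity(sigma_now: float, sigma_prev: float, alpha: float) -> float:
    """ASSUMED cost embedding on similarities: the same convex combination as d_t."""
    return alpha * sigma_now + (1.0 - alpha) * sigma_prev


k4 = {v: set("abcd") - {v} for v in "abcd"}           # an isolated clique
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}      # a - b - c
print(cosine_similarity(k4, "a", "b"))                 # 1.0
print(cosine_similarity(path, "a", "c"))               # 0.5
```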

Page 17: Clustering of Optimal Modularity

DBSCAN requires two kinds of user-defined parameters
  ε_t: the minimum similarity between nodes within a cluster
  μ_t: the minimum size of a cluster
The clustering result is sensitive to ε_t but not very sensitive to μ_t
Determine ε_t automatically by using the novel concept of modularity
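The sketch below shows how ε_t and μ_t could drive a DBSCAN-style clustering on a graph. It assumes a SCAN-like adaptation: a node is a core when at least μ of its closed-neighborhood nodes (itself included) have σ ≥ ε, clusters grow outward from cores, and the remaining nodes are left as noise. Here μ bounds the ε-neighborhood of a core, which only approximates the slide's "minimum size of a cluster"; all names are hypothetical.

```python
from collections import deque
from math import sqrt
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # hypothetical adjacency-set snapshot


def sigma(g: Graph, v: str, w: str) -> float:
    """ASSUMED closed-neighborhood cosine similarity."""
    n_v, n_w = g[v] | {v}, g[w] | {w}
    return len(n_v & n_w) / sqrt(len(n_v) * len(n_w))


def eps_neighbors(g: Graph, v: str, eps: float) -> Set[str]:
    """Nodes of N(v), v included, whose similarity to v is at least eps."""
    return {x for x in g[v] | {v} if sigma(g, v, x) >= eps}


def density_clusters(g: Graph, eps: float, mu: int) -> List[Set[str]]:
    """Grow clusters from core nodes; non-core eps-neighbors join as border nodes."""
    cores = {v for v in g if len(eps_neighbors(g, v, eps)) >= mu}
    assigned: Set[str] = set()
    clusters: List[Set[str]] = []
    for seed in sorted(cores):
        if seed in assigned:
            continue
        cluster, queue = {seed}, deque([seed])
        while queue:
            v = queue.popleft()
            for x in eps_neighbors(g, v, eps):
                if x not in cluster and x not in assigned:
                    cluster.add(x)
                    if x in cores:          # only cores expand the cluster further
                        queue.append(x)
        assigned |= cluster
        clusters.append(cluster)
    return clusters


# Two triangles joined by the bridge c-d.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print([sorted(c) for c in density_clusters(g, 0.7, 3)])   # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Lowering ε to 0.4 lets the bridge (σ(c, d) = 0.5) merge both triangles into one cluster, which illustrates why the result is so sensitive to ε_t.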

Page 18: Clustering of Optimal Modularity (contd.)

The extended modularity Q_S
  NC: the number of clusters
  TS: the total similarity between all pairs of nodes in the graph
  IS_c: the total similarity between pairs of nodes within cluster c
  DS_c: the total similarity between a node in cluster c and any node in the graph
Optimal clustering is achieved by maximizing Q_S
  Finding the clustering that maximizes Q_S is NP-complete
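The Q_S formula itself is not in the transcript. A sketch assuming the similarity-based modularity that matches the definitions above, Q_S = Σ_{c=1}^{NC} [IS_c/TS - (DS_c/TS)^2], with every sum taken over ordered pairs of distinct nodes (an assumption) and σ assumed to be the closed-neighborhood cosine; function names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # hypothetical adjacency-set snapshot


def sigma(g: Graph, v: str, w: str) -> float:
    """ASSUMED closed-neighborhood cosine similarity."""
    n_v, n_w = g[v] | {v}, g[w] | {w}
    return len(n_v & n_w) / sqrt(len(n_v) * len(n_w))


def qs_modularity(g: Graph, clusters: List[Set[str]]) -> float:
    """Q_S = sum over clusters c of IS_c / TS - (DS_c / TS) ** 2.

    ASSUMPTION: TS, IS_c, and DS_c sum sigma over ordered pairs of distinct nodes.
    """
    sim = {(u, v): sigma(g, u, v) for u in g for v in g if u != v}
    ts = sum(sim.values())                                            # TS
    q = 0.0
    for c in clusters:
        is_c = sum(s for (u, v), s in sim.items() if u in c and v in c)  # IS_c
        ds_c = sum(s for (u, _), s in sim.items() if u in c)             # DS_c
        q += is_c / ts - (ds_c / ts) ** 2
    return q


g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(qs_modularity(g, [set("abc"), set("def")]))   # two triangles: positive
print(qs_modularity(g, [set("abcdef")]))            # one cluster: 0.0
```

Putting every node in one cluster gives IS_c = DS_c = TS and hence Q_S = 0, so a positive Q_S means the clusters hold more internal similarity than a random split would.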

Page 19: Clustering of Optimal Modularity (contd.)

Start with an initial clustering using a seed density parameter seedε_t (e.g., the median)
Decrease or increase it by Δε (e.g., 0.01 or 0.02) until the maximum modularity is reached

[Figure: example on the NCAA football data (2006)]
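The ε search can be sketched as a simple hill climb. `qs_of` is a hypothetical callback that would cluster G_t with a given ε and return Q_S; the seed and Δε follow the slide's examples, while trying the increasing direction first and stopping at the first non-improving step in each direction are assumptions of this sketch.

```python
from typing import Callable, Tuple


def search_epsilon(qs_of: Callable[[float], float], seed: float,
                   delta: float = 0.01) -> Tuple[float, float]:
    """Move eps away from the seed in steps of delta while Q_S keeps improving.

    qs_of(eps) is a hypothetical callback that clusters the snapshot with eps
    and returns the extended modularity Q_S of the result.
    """
    best_eps, best_q = seed, qs_of(seed)
    for direction in (1, -1):                  # try increasing, then decreasing
        k = 1
        while True:
            # Integer step counting avoids accumulating floating-point drift.
            eps = round(seed + direction * k * delta, 10)
            if not 0.0 <= eps <= 1.0:          # similarities lie in [0, 1]
                break
            q = qs_of(eps)
            if q <= best_q:                    # no improvement: stop this direction
                break
            best_eps, best_q = eps, q
            k += 1
    return best_eps, best_q


# Synthetic stand-in for "cluster with eps, then score with Q_S".
best_eps, best_q = search_epsilon(lambda e: -(e - 0.37) ** 2, seed=0.5, delta=0.01)
print(best_eps)
```

A real `qs_of` would be non-smooth and may have local maxima, so this local search trades optimality for speed, in line with Q_S maximization being NP-complete.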

Page 20: Three Stages of Community M

Notations
  B_{t-1,t}: bipartite graph between C_{t-1} ∈ CR_{t-1} and C_t ∈ CR_t
  θ(B_{t-1,t}): link density
Three cases of relationships between C_{t-1} and C_t
  Forming: there is no C_{t-1} ∈ CR_{t-1} s.t. θ(B_{t-1,t}) > δ_threshold
  Dissolving: there is no C_t ∈ CR_t s.t. θ(B_{t-1,t}) > δ_threshold
  Evolving: θ(B_{t-1,t}) > δ_threshold

    A. Growing (M_{t-1} ⊂ M_t): M grows between t-1 and t
    B. Shrinking (M_{t-1} ⊃ M_t): M shrinks between t-1 and t
    C. Drifting (|M_{t-1} ∩ M_t| ≠ 0): M drifts between t-1 and t
Perform the mapping based on mutual information instead of using a fixed δ_threshold

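Link density:

$$\theta(B_{t-1,t}) = \frac{|L_{B_{t-1,t}}|}{|V_{M_{t-1}}| \cdot |V_{M_t}|}$$

where L_{B_{t-1,t}} is the set of links in B_{t-1,t}, V_{M_t} is the node set of M_t, and M_t is the local cluster of M at time t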
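A sketch of the three stages under a fixed δ_threshold, i.e., the baseline that the mutual-information mapping replaces. Links are (v, w) pairs between G_{t-1} and G_t, θ is computed as the number of links between two clusters divided by |C_{t-1}|·|C_t|, and the dictionary layout of clusters and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]            # (node in G_{t-1}, node in G_t)
Clusters = Dict[str, Set[str]]    # hypothetical: cluster id -> node set


def link_density(links: Set[Link], c_prev: Set[str], c_cur: Set[str]) -> float:
    """theta(B_{t-1,t}): links between the two clusters over |C_{t-1}| * |C_t|."""
    n = sum(1 for v, w in links if v in c_prev and w in c_cur)
    return n / (len(c_prev) * len(c_cur))


def classify_stages(links: Set[Link], prev: Clusters, cur: Clusters,
                    delta: float) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (evolving pairs, forming clusters at t, dissolving clusters at t-1)."""
    evolving = [(p, c) for p in prev for c in cur
                if link_density(links, prev[p], cur[c]) > delta]
    matched_prev = {p for p, _ in evolving}
    matched_cur = {c for _, c in evolving}
    forming = [c for c in cur if c not in matched_cur]        # no dense predecessor
    dissolving = [p for p in prev if p not in matched_prev]   # no dense successor
    return evolving, forming, dissolving


prev = {"P1": {"a", "b"}, "P2": {"c"}}
cur = {"C1": {"a", "b"}, "C2": {"x", "y"}}
links = {("a", "a"), ("a", "b"), ("b", "a"), ("b", "b")}
print(classify_stages(links, prev, cur, delta=0.5))
# ([('P1', 'C1')], ['C2'], ['P2'])
```

The outcome hinges on the single value of delta, which is the weakness the slide points at when it proposes mutual information instead.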

Page 21: Mapping of Local Clusters: Link Counting

The mapping task is performed based on the number of links (in particular, the link distribution)

Lemma:

[Figure: local clusters at t2 and t3 with α = 0.8]
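To make the link distribution concrete, the sketch below counts links between every pair of local clusters at t-1 and t. The names `mat`, `arr_prev`, `arr_cur`, and `lc_total` mirror Mat_{t-1,t}, Arr_{t-1}, Arr_t, and LCTotal from the purifying process; reading Arr as the row and column marginals is an assumption, and the function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def link_count_matrix(links: Iterable[Tuple[str, str]],
                      prev_clusters: List[Set[str]], cur_clusters: List[Set[str]]):
    """Count links from each cluster at t-1 (rows) to each cluster at t (columns)."""
    row_of = {v: i for i, c in enumerate(prev_clusters) for v in c}
    col_of = {w: j for j, c in enumerate(cur_clusters) for w in c}
    mat = [[0] * len(cur_clusters) for _ in prev_clusters]
    for v, w in links:
        if v in row_of and w in col_of:        # links touching unclustered nodes are skipped
            mat[row_of[v]][col_of[w]] += 1
    arr_prev = [sum(row) for row in mat]       # ASSUMED Arr_{t-1}: row marginals
    arr_cur = [sum(col) for col in zip(*mat)]  # ASSUMED Arr_t: column marginals
    lc_total = sum(arr_prev)                   # ASSUMED LCTotal: total link count
    return mat, arr_prev, arr_cur, lc_total


links = [("a", "a"), ("a", "b"), ("b", "b"), ("b", "c"), ("c", "c"), ("c", "d")]
print(link_count_matrix(links, [{"a", "b"}, {"c"}], [{"a", "b"}, {"c", "d"}]))
# ([[3, 1], [0, 2]], [4, 2], [3, 3], 6)
```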

Page 22: Mutual Information

Mutual information equation

Properties
  If the link distribution over X and Y is purely random, i.e., X and Y are independent, MI(X; Y) becomes 0
  If the joint distribution of X and Y is skewed, MI(X; Y) becomes high
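For reference, the standard definition is MI(X; Y) = Σ_x Σ_y P(x, y) log(P(x, y) / (P(x) P(y))). A sketch that estimates it from a link count matrix (rows: clusters at t-1, columns: clusters at t), using natural logarithms; the function name is hypothetical.

```python
from math import log
from typing import List


def mutual_information(mat: List[List[float]]) -> float:
    """MI(X; Y) in nats, with P(x, y) = mat[x][y] / total link count."""
    total = sum(sum(row) for row in mat)
    p_x = [sum(row) / total for row in mat]            # marginal of clusters at t-1
    p_y = [sum(col) / total for col in zip(*mat)]      # marginal of clusters at t
    mi = 0.0
    for i, row in enumerate(mat):
        for j, count in enumerate(row):
            if count > 0:                               # 0 * log 0 is taken as 0
                p_xy = count / total
                mi += p_xy * log(p_xy / (p_x[i] * p_y[j]))
    return mi


print(mutual_information([[1, 1], [1, 1]]))   # independent: 0.0
print(mutual_information([[5, 0], [0, 5]]))   # skewed: log(2), about 0.693
```

The two calls mirror the two properties on the slide: uniform links between all cluster pairs give no information, while links concentrated on a one-to-one pattern give the maximum for two clusters.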

Page 23: Purifying Process

If relatively low probability values are set to zero, the distribution becomes purer and MI(X; Y) increases
Derivation of the MI equation for the link distribution

[Equation: unit MI]
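The equation for unit MI is not legible here. Assuming unit MI is the contribution of one cell, P(x, y) log(P(x, y) / (P(x) P(y))), the sketch below shows purification on a small hand-made matrix: zeroing the low off-diagonal counts raises MI. This illustrates the example only; zeroing an arbitrary cell (for instance a large diagonal one) can lower MI instead. Function names are hypothetical.

```python
from math import log
from typing import List

Matrix = List[List[float]]


def unit_mi(mat: Matrix, i: int, j: int) -> float:
    """ASSUMED unit MI: the contribution P(x,y) * log(P(x,y) / (P(x) P(y))) of one cell."""
    total = sum(map(sum, mat))
    p_xy = mat[i][j] / total
    if p_xy == 0:
        return 0.0
    p_x = sum(mat[i]) / total
    p_y = sum(row[j] for row in mat) / total
    return p_xy * log(p_xy / (p_x * p_y))


def mutual_information(mat: Matrix) -> float:
    """MI as the sum of unit MI over all cells."""
    return sum(unit_mi(mat, i, j) for i in range(len(mat)) for j in range(len(mat[0])))


noisy = [[8, 2], [2, 8]]        # hand-made link counts with low off-diagonal cells
purified = [[8, 0], [0, 8]]     # low cells set to zero
print(mutual_information(noisy), mutual_information(purified))
```

The low cells have negative unit MI here, which is why removing them helps.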

Page 24: Purifying Process (contd.)

Mapping C_{t-1} to C_t means setting all cells of Mat_{t-1,t}[C_{t-1}][·] and Mat_{t-1,t}[·][C_t] to zero except Mat_{t-1,t}[C_{t-1}][C_t], and updating Arr_{t-1}, Arr_t, and LCTotal
A combinatorial optimization problem: choose at most min(|CR_{t-1}|, |CR_t|) pairs from the |CR_{t-1}| × |CR_t| pairs
Propose a heuristic algorithm for maximizing MI
  First choose the pair ⟨C_{t-1}, C_t⟩ with the highest unit MI
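The greedy heuristic can be sketched as follows. Rows are clusters at t-1 and columns are clusters at t; each step picks the unmapped pair with the highest unit MI and purifies its row and column, and the marginals and total (Arr, LCTotal) are simply recomputed from the matrix. The slide does not say when to stop, so this sketch assumes it stops as soon as purification would lower MI; unit MI is again assumed to be a cell's contribution to MI, and all names are hypothetical.

```python
from math import log
from typing import List, Tuple

Matrix = List[List[float]]


def mutual_information(mat: Matrix) -> float:
    """MI of the link distribution; marginals and total are recomputed each call."""
    total = sum(map(sum, mat))
    p_x = [sum(row) / total for row in mat]
    p_y = [sum(col) / total for col in zip(*mat)]
    return sum((v / total) * log((v / total) / (p_x[i] * p_y[j]))
               for i, row in enumerate(mat) for j, v in enumerate(row) if v > 0)


def unit_mi(mat: Matrix, i: int, j: int) -> float:
    """ASSUMED unit MI: the contribution of cell (i, j) to MI."""
    total = sum(map(sum, mat))
    p = mat[i][j] / total
    p_x = sum(mat[i]) / total
    p_y = sum(row[j] for row in mat) / total
    return p * log(p / (p_x * p_y)) if p > 0 else 0.0


def purify(mat: Matrix, i: int, j: int) -> Matrix:
    """Zero row i and column j except cell (i, j).

    (r == i) == (c == j) keeps cell (i, j) and every cell outside row i and column j.
    """
    return [[v if (r == i) == (c == j) else 0 for c, v in enumerate(row)]
            for r, row in enumerate(mat)]


def greedy_mapping(mat: Matrix) -> List[Tuple[int, int]]:
    """Greedily map C_{t-1} (rows) to C_t (columns) by highest unit MI."""
    pairs: List[Tuple[int, int]] = []
    free_rows, free_cols = set(range(len(mat))), set(range(len(mat[0])))
    current = mutual_information(mat)
    while free_rows and free_cols:     # at most min(|CR_{t-1}|, |CR_t|) pairs
        candidates = [(unit_mi(mat, i, j), i, j)
                      for i in free_rows for j in free_cols if mat[i][j] > 0]
        if not candidates:
            break
        _, i, j = max(candidates)
        trial = purify(mat, i, j)
        trial_mi = mutual_information(trial)
        if trial_mi < current:         # ASSUMPTION: stop once MI would drop
            break
        mat, current = trial, trial_mi
        pairs.append((i, j))
        free_rows.discard(i)
        free_cols.discard(j)
    return pairs


print(sorted(greedy_mapping([[8, 2, 0], [1, 9, 0], [0, 1, 5]])))   # [(0, 0), (1, 1), (2, 2)]
```

Clusters left without a partner at the end would correspond to forming (unmapped columns) or dissolving (unmapped rows) communities.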

Page 25: Example

Page 26: Experiments

Synthetic data set
  Timestamps: 1 to 10
  Noise level z_out: the average number of edges from a node to nodes in other communities
SYN-FIX
  # clusters: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
  # nodes in each community: 32 (total 128)
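A hedged sketch of one SYN-FIX-like snapshot. The slide only gives the cluster counts, the community size, and z_out; this sketch additionally assumes a Girvan-Newman style benchmark with expected degree 16 (so z_in = 16 - z_out), independent random intra- and inter-community edges, and it does not model how communities change across timestamps. The function name and parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple


def syn_fix_snapshot(z_out: float, seed: int = 0, n_comm: int = 4, size: int = 32,
                     avg_degree: float = 16.0) -> Tuple[List[int], Set[Tuple[int, int]]]:
    """One random snapshot with n_comm communities of `size` nodes each.

    ASSUMPTIONS: Girvan-Newman style benchmark with expected degree avg_degree,
    of which z_out edges per node go to other communities on average.
    """
    if not 0.0 <= z_out <= avg_degree:
        raise ValueError("z_out must lie in [0, avg_degree]")
    rng = random.Random(seed)
    n = n_comm * size
    labels = [v // size for v in range(n)]            # ground-truth community of v
    p_in = (avg_degree - z_out) / (size - 1)          # expected z_in intra edges
    p_out = z_out / (n - size)                        # expected z_out inter edges
    edges: Set[Tuple[int, int]] = set()
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.add((u, v))
    return labels, edges


labels, edges = syn_fix_snapshot(z_out=4.0, seed=1)
print(len(labels), 2 * len(edges) / len(labels))   # 128 nodes, average degree near 16
```

Raising z_out blurs the community boundaries, which is how the noise level would make the benchmark harder.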

Page 27: Experiments (contd.)

SYN-VAR
  # clusters: [4, 5, 6, 7, 8, 8, 7, 6, 5, 4]
  # nodes per cluster: 32 to 64 (total 256), lasting for 5 timestamps and its nodes return to the original clusters

Page 28: Experiments (contd.)

Real data: DBLP
  Co-authorship information
  127,214 unique authors
  10 years, from 1999 to 2008
Measures
  Effectiveness: normalized mutual information (NMI) between the ground truth and the clustering result; a higher NMI indicates better accuracy
  Efficiency: running time (sec.)
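A sketch of NMI between ground-truth and predicted labels. The slide does not specify the normalization; this sketch assumes NMI = MI(X; Y) / sqrt(H(X)·H(Y)) with natural logarithms, and the function name is hypothetical.

```python
from collections import Counter
from math import log, sqrt
from typing import Hashable, Sequence


def nmi(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """NMI = MI(X; Y) / sqrt(H(X) * H(Y)) (ASSUMED normalization), natural logs."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(c / n * log(c * n / (count_t[a] * count_p[b])) for (a, b), c in joint.items())
    h_t = -sum(c / n * log(c / n) for c in count_t.values())
    h_p = -sum(c / n * log(c / n) for c in count_p.values())
    if h_t == 0.0 or h_p == 0.0:     # a single cluster on either side
        return 1.0 if h_t == h_p else 0.0
    return mi / sqrt(h_t * h_p)


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition, relabeled
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))   # independent partitions: 0.0
```

NMI ignores cluster labels and depends only on how nodes are grouped, so a relabeled but identical partition scores 1.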

Page 29: Effectiveness

SYN-FIX:

SYN-VAR:

Page 30: Efficiency

Improves the time performance by more than 10 times
  By avoiding a large number of iterations
  Suitable for large-scale dynamic network data

Page 31: DBLP Data

When α is high,
  Communities become less temporally smooth
  The number of communities increases
    A local cluster is not connected with other local clusters due to the low density between them
  The average lifetime of communities decreases
Low α shows the opposite trend

Page 32: Conclusions

Propose a particle-and-density based evolutionary clustering method
  Nano-community (particle) & quasi l-KK (density)
  Provides guidance on how to discover a variable number of communities that form and dissolve arbitrarily
Cost-embedding technique & density-based clustering using optimal modularity
  Does not require a 1:1 mapping for temporal smoothing
  Achieves high clustering quality and time performance
Mapping method based on mutual information
  Makes the sequence of local clusters as close as possible to the data-inherent quasi l-KKs