Evolutionary Clustering and Analysis of Bibliographic Networks Manish Gupta (UIUC) Charu C. Aggarwal (IBM) Jiawei Han (UIUC) Yizhou Sun (UIUC) ASONAM 2011

Dec 28, 2015


Evolutionary Clustering and Analysis of Bibliographic Networks

Manish Gupta (UIUC), Charu C. Aggarwal (IBM)

Jiawei Han (UIUC), Yizhou Sun (UIUC)

ASONAM 2011


Introduction

• Information networks are everywhere: social networks, the web, academic networks, and biological networks.

• Heterogeneous information networks
  – Contain multi-typed nodes.
  – Provide a richer representation than homogeneous networks.

• We study clustering and evolution diagnosis in massive heterogeneous information networks.


Contributions

• We present an evolutionary clustering algorithm for heterogeneous information networks (ENetClus)

• We define metrics to characterize clustering behavior

• We perform a study of evolution in a bibliographic heterogeneous network: DBLP


ENetClus features
• Multi-typed
• Evolutionary
• Temporal smoothness
• Agglomerative
• Multiple granularities
• Based on NetClus

Evolution metrics
• Consistency
• Quality
• Cluster sizes
• Evolution rate
• Cluster appearance/disappearance
• Stability of objects
• Sociability of objects
• Social influence

Study over DBLP


Problem Formulation

• Net-cluster
• Net-cluster tree
• Net-cluster tree sequence
• Problem: Given a graph sequence GS, generate a net-cluster tree sequence CTS such that the trees are consistent and represent high-quality clusters.

[Figure: a net-cluster tree with three levels (Level 1 to Level 3) and K = 3, and a net-cluster tree sequence CTS made of trees CT1, CT2, ..., CTN]
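To make the structures in the problem formulation concrete, the Python sketch below models a net-cluster tree as nested nodes and a net-cluster tree sequence as one tree per snapshot. The class `NetCluster`, its fields `level`, `members`, and `children`, and the helpers `build_tree` and `nodes_at_level` are hypothetical names, and the complete K-way split at every level is an assumption chosen to mirror the K = 3 figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NetCluster:
    """Hypothetical node of a net-cluster tree."""
    level: int
    # object id -> membership probability in this net-cluster (filled in by clustering)
    members: Dict[str, float] = field(default_factory=dict)
    children: List["NetCluster"] = field(default_factory=list)


def build_tree(levels: int, k: int, level: int = 1) -> NetCluster:
    """Build a complete tree: every node above the last level splits into k children."""
    node = NetCluster(level=level)
    if level < levels:
        node.children = [build_tree(levels, k, level + 1) for _ in range(k)]
    return node


def nodes_at_level(node: NetCluster, level: int) -> List[NetCluster]:
    """Collect all net-clusters at a given level (level 1 is the root)."""
    if node.level == level:
        return [node]
    return [n for child in node.children for n in nodes_at_level(child, level)]


# A net-cluster tree sequence CTS: one tree CT_t per network snapshot (5 snapshots here).
cts = [build_tree(levels=3, k=3) for _ in range(5)]
```

With three levels and K = 3, each tree holds 1 + 3 + 9 = 13 net-clusters.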


[Figure: a three-level net-cluster tree with K = 3; every node is a net-cluster (nc)]


[Figure: net-cluster tree sequence CT1, CT2, ..., CTN]


Approaches

• Problem: Perform evolutionary clustering over a sequence of heterogeneous network snapshots.

• Approaches
  – Use homogeneous clustering techniques
    • Do not exploit the rich typed information in the network.
    • Objects related to the same entity may be placed in different clusters.
  – Use an existing heterogeneous network clustering algorithm on each snapshot
    • May provide high clustering quality for each snapshot.
    • But may not provide good consistency between clusterings across snapshots.


NetClus

• NetClus is an algorithm for clustering heterogeneous networks.

• It iteratively alternates between ranking and clustering objects.

• A probabilistic generative model captures the probability that each cluster generates different objects.

• A maximum likelihood technique is used to estimate the posterior probability that an object belongs to a cluster.
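The posterior step can be illustrated with Bayes' rule. The sketch below assumes, as a simplification, that a target object d (for example a paper) is generated by drawing each of its attribute neighbors independently from its cluster's distribution p(x | k), so that p(k | d) is proportional to p(k) times the product of p(x | k) over the neighbors of d. The function name `posterior`, its argument layout, and the `eps` floor for attributes a cluster has never seen are assumptions, not NetClus's published formulation.

```python
import math
from typing import Dict, List


def posterior(neighbors: List[str],
              cluster_prior: List[float],
              attr_given_cluster: List[Dict[str, float]],
              eps: float = 1e-12) -> List[float]:
    """Return p(k | d) for a target object d described by its attribute neighbors."""
    log_scores = []
    for k, prior in enumerate(cluster_prior):
        score = math.log(prior)
        for x in neighbors:
            # Unseen attributes get a tiny floor probability instead of zero (assumption).
            score += math.log(attr_given_cluster[k].get(x, eps))
        log_scores.append(score)
    # Normalize in log space (log-sum-exp) to avoid underflow for long neighbor lists.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical clusters with hand-written attribute distributions.
dists = [{"KDD": 0.6, "mining": 0.4}, {"KDD": 0.2, "query": 0.8}]
print(posterior(["KDD", "mining"], [0.5, 0.5], dists))
```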


NetClus

• Priors: Initialize the prior probabilities.
• Initialize: Generate the initial net-clusters.
• Rank: Build a probabilistic generative model for each net-cluster.
• Cluster-target: Compute the posterior membership probabilities of target objects and adjust their cluster assignments.
• Iterate: Repeat the Rank and Cluster-target steps until the clusters no longer change significantly.
• Cluster-attribute: Calculate the membership probability of each attribute object in each net-cluster.
• Return these membership probabilities.
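The steps above can be sketched as a simplified loop over a star-shaped network in which each target object (a paper) is described by its attribute neighbors (authors, venues, terms). This is an illustration under stated assumptions, not NetClus itself: the Rank step uses smoothed attribute frequencies in place of authority ranking, cluster priors come from smoothed cluster sizes, and Cluster-target assigns each paper to its highest-posterior cluster. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List

Paper = List[str]  # a target object, represented by its attribute neighbors


def rank(papers: List[Paper], assign: List[int], k: int, vocab: List[str],
         smoothing: float = 0.1) -> List[Dict[str, float]]:
    """Rank (simplified): p(x | cluster) from smoothed attribute counts per cluster."""
    counts = [Counter() for _ in range(k)]
    for neighbors, c in zip(papers, assign):
        counts[c].update(neighbors)
    dists = []
    for c in range(k):
        total = sum(counts[c].values()) + smoothing * len(vocab)
        dists.append({x: (counts[c][x] + smoothing) / total for x in vocab})
    return dists


def cluster_prior(assign: List[int], k: int) -> List[float]:
    """Priors: p(k) from add-one smoothed cluster sizes (assumption)."""
    sizes = Counter(assign)
    return [(sizes[c] + 1) / (len(assign) + k) for c in range(k)]


def cluster_target(papers: List[Paper], dists, prior) -> List[int]:
    """Cluster-target: argmax over k of log p(k) + sum of log p(x | k)."""
    assign = []
    for neighbors in papers:
        scores = [math.log(prior[c]) + sum(math.log(dists[c][x]) for x in neighbors)
                  for c in range(len(dists))]
        assign.append(max(range(len(scores)), key=scores.__getitem__))
    return assign


def netclus_sketch(papers: List[Paper], k: int, max_iter: int = 20, seed: int = 0):
    vocab = sorted({x for p in papers for x in p})
    rng = random.Random(seed)
    assign = [rng.randrange(k) for _ in papers]                 # Initialize
    for _ in range(max_iter):                                   # Iterate
        dists = rank(papers, assign, k, vocab)                  # Rank
        new_assign = cluster_target(papers, dists, cluster_prior(assign, k))
        if new_assign == assign:                                # clusters stopped changing
            break
        assign = new_assign
    dists, prior = rank(papers, assign, k, vocab), cluster_prior(assign, k)
    membership = {}
    for x in vocab:                                             # Cluster-attribute
        w = [dists[c][x] * prior[c] for c in range(k)]          # p(k | x) ∝ p(x | k) p(k)
        membership[x] = [v / sum(w) for v in w]
    return assign, membership                                   # Return
```

Because the initial assignment is random, the resulting clusters can depend on `seed`.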


ENetClus

• For the first time instant, priors and net-clusters are initialized as in NetClus.

• For subsequent time instants:
  – The prior probability of an object o belonging to cluster c_k is defined as its representativeness in the corresponding cluster of the net-cluster tree for the previous time instant.
  – A target object o is assigned to cluster c_k with probability p_k, where p_k is the normalized sum of the prior probabilities of its neighboring attribute-type objects.

• Ranking is similar to NetClus, except that the prior probabilities are used along with the authority-based ranking. A prior weight controls the effect of the priors and hence the degree of temporal smoothness.
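The two ENetClus changes described above can be sketched as follows. `init_probabilities` computes p_k for a target object as the normalized sum of its attribute neighbors' priors carried over from the previous snapshot, and `sample_cluster` draws a cluster with those probabilities. The linear mix in `blend_ranking`, (1 - w) * authority + w * prior with w the prior weight, is an assumed way of combining priors with authority-based ranking, and the uniform fallback for objects whose neighbors have no priors is also an assumption. All names are hypothetical.

```python
import random
from typing import Dict, List


def init_probabilities(neighbors: List[str], prior: Dict[str, List[float]],
                       k: int) -> List[float]:
    """p_k: normalized sum over attribute neighbors of their prior for cluster k."""
    sums = [0.0] * k
    for x in neighbors:
        for c, p in enumerate(prior.get(x, [0.0] * k)):
            sums[c] += p
    total = sum(sums)
    if total == 0:
        return [1.0 / k] * k  # no neighbor seen before: uniform (assumption)
    return [s / total for s in sums]


def sample_cluster(probs: List[float], rng: random.Random) -> int:
    """Assign a target object to cluster c with probability probs[c]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def blend_ranking(authority: Dict[str, float], prior: Dict[str, float],
                  prior_weight: float) -> Dict[str, float]:
    """Hypothetical smoothing: mix authority ranking with priors, then renormalize."""
    keys = set(authority) | set(prior)
    mixed = {x: (1 - prior_weight) * authority.get(x, 0.0) + prior_weight * prior.get(x, 0.0)
             for x in keys}
    z = sum(mixed.values())
    return {x: v / z for x, v in mixed.items()}


# Priors from the previous snapshot's tree (made-up values for two clusters).
prev_prior = {"KDD": [0.8, 0.2], "SIGMOD": [0.1, 0.9]}
probs = init_probabilities(["KDD", "SIGMOD"], prev_prior, k=2)
cluster = sample_cluster(probs, random.Random(1))
```

Setting the prior weight to 0 recovers pure authority ranking, while values near 1 keep clusters close to the previous snapshot.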


How is ENetClus better than NetClus?

[Figure: clusterings over Snapshot1, Snapshot2, and Snapshot3. NetClus: inconsistent clusters; ENetClus: consistent clusters]


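Metrics

• Each object o has a membership probability for each cluster c_i.

• Consistency: computed between the clusterings of two consecutive snapshots, i.e., over one interval.

• Chained path consistency: the product of the consistency values over the intervals in the sequence.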
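The per-interval consistency measure is not spelled out here, so the sketch below substitutes a hypothetical one: the mean dot product of membership vectors for objects present in both snapshots. Only the chaining step, multiplying consistency over consecutive intervals, follows the definition above. The names `interval_consistency` and `chained_path_consistency` are illustrative.

```python
import math
from typing import Dict, List

Membership = Dict[str, List[float]]  # object id -> membership probabilities over clusters


def interval_consistency(prev: Membership, curr: Membership) -> float:
    """Hypothetical stand-in: mean dot product of membership vectors of shared objects."""
    shared = prev.keys() & curr.keys()
    if not shared:
        return 0.0
    return sum(sum(a * b for a, b in zip(prev[o], curr[o])) for o in shared) / len(shared)


def chained_path_consistency(snapshots: List[Membership]) -> float:
    """Product of interval consistencies over consecutive snapshot pairs."""
    return math.prod(interval_consistency(a, b) for a, b in zip(snapshots, snapshots[1:]))
```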


Metrics

• Snapshot quality
  – Compactness
  – Entropy
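Entropy is one of the snapshot-quality measures listed above. As an assumed reading, since its exact definition is not given here, the sketch scores a snapshot by the average entropy, in bits, of its objects' cluster-membership distributions, so crisper clusterings score lower. Compactness is not sketched, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def membership_entropy(probs: List[float]) -> float:
    """Shannon entropy (bits) of one object's membership distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def snapshot_entropy(memberships: Dict[str, List[float]]) -> float:
    """Assumed snapshot score: mean membership entropy over all objects."""
    return sum(membership_entropy(v) for v in memberships.values()) / len(memberships)
```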


Metrics

• O′: objects present at time y but not at time y−1
• O″: objects present at time y
• O‴: objects present at time y but not at time y+1
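The object sets O′, O″, and O‴ are plain set differences between snapshots, as the sketch below shows. The dictionary-of-sets input format, the function name `object_sets`, and treating a missing neighboring year as an empty snapshot are assumptions.

```python
from typing import Dict, Set, Tuple


def object_sets(snapshots: Dict[int, Set[str]], y: int) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (O', O'', O''') for year y."""
    present = snapshots[y]
    before = snapshots.get(y - 1, set())  # missing year treated as empty (assumption)
    after = snapshots.get(y + 1, set())
    return present - before, present, present - after


years = {2000: {"a", "b"}, 2001: {"b", "c"}, 2002: {"c"}}
new_objs, current, leaving = object_sets(years, 2001)
```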


Metrics

• Stability of objects
  – Degree to which an object is stable with respect to its cluster or network
• Sociability of objects
  – Degree to which an object interacts with different clusters
• Effect of social influence: normality
  – Normality is the degree to which an object follows the cluster trend
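The stability and sociability definitions above are qualitative. The sketch below gives one hypothetical way to quantify them: stability as the mean cosine similarity between an object's membership vectors in consecutive snapshots, and sociability as the fraction of the K clusters that its neighbors belong to. Both formulas and all function names are illustrative assumptions, not the paper's definitions; normality is not sketched.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def stability(history: List[List[float]]) -> float:
    """Hypothetical: mean cosine similarity of consecutive membership vectors of one object."""
    pairs = list(zip(history, history[1:]))
    if not pairs:
        return 1.0  # seen in a single snapshot only: treated as fully stable (assumption)
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def sociability(neighbor_clusters: List[int], k: int) -> float:
    """Hypothetical: fraction of the k clusters that the object's neighbors fall into."""
    return len(set(neighbor_clusters)) / k
```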


Experiments

• Datasets
  – DBLP
    • 1993 to 2008: 654K papers, 484K authors, 107K title terms, and 3900 conferences
    • Number of clusters = 4
    • Levels of net-cluster tree = 4
    • Prior weight varied from 0 to 1
  – Four_area
    • DM, DB, IR, and ML papers
    • 1993 to 2008: 29K papers, 28K authors, 20 conferences


Related work

• Clustering graphs: Mincut, Min-max cut, Spectral, density-based, RankClus [Sun EDBT 09], NetClus [Sun KDD 09]

• Evolutionary clustering: k-means [Chak KDD06], spectral [Chi KDD07], text streams [Mei KDD05], social network structure [Kuma KDD06]

• Evolutionary graph studies: GraphScope [Sun KDD07], density-based [Kim VLDB09], analysis [Back KDD06, Lesk KDD05, Lesk KDD08], communities using FacetNet [Lin WWW08], individual objects [Asur KDD07]


Conclusion

• A clustering algorithm for evolution diagnosis of heterogeneous information networks.

• Metrics that give novel insights into evolution at both the object level and the clustering level

• Analysis and evolutionary study of DBLP


Acknowledgements

Research was sponsored in part by the U.S. National Science Foundation under grant IIS-09-05215, and by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053 (NS-CTA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.


References (1)


References (2)


References (3)