Estimating Clustering Coefficients and Size of Social Networks via Random Walk
Stephen J. Hardiman*, Capital Fund Management, France
Liran Katzir, Advanced Technology Labs, Microsoft Research, Israel
*Research was conducted while the author was unaffiliated
Motivation: Social Networks
Facebook, Twitter, Qzone, Google+, Sina Weibo, Habbo, Renren, LinkedIn, Vkontakte, Bebo, Tagged, Orkut, Netlog, Friendster, hi5, Flixster, MyLife, Classmates.com, Sonico.com, Plaxo
Motivation: External access
[Figure: example graph with nodes v1 to v9]
Social Analytics
The online social network
Disk Space
Communication
Privacy
Task: Estimate parameters
Business development / advertisement / market size.
Predicting Social Products' Potential.
Global Clustering Coefficient
Network Average CC
Number of Registered Users
$\text{Global CC} = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$
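The definition above can be checked by brute force on a small graph. The following is a minimal sketch, not part of the original slides: the function name `global_cc`, the adjacency-dictionary input format, and the toy graph are all assumptions made for illustration.

```python
from itertools import combinations

def global_cc(adj):
    """Exact global clustering coefficient of an undirected simple graph.

    adj maps each node to the set of its neighbors (hypothetical input format).
    """
    triplets = 0  # connected triplets, counted once per center node
    closed = 0    # triplets whose two end nodes are also adjacent
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            triplets += 1
            if w in adj[u]:
                closed += 1
    # Each triangle closes three triplets (one per center node),
    # so `closed` equals 3 x number of triangles.
    return closed / triplets if triplets else 0.0

# Toy graph (an assumption): triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(global_cc(toy))  # 3 closed triplets out of 5 -> 0.6
```

This enumeration reads the whole graph, which is exactly what the external-access setting does not allow; it only serves as a reference value for small examples.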
Global Clustering Coefficient
[Figure: example graph with nodes v1 to v9, with the labels "Triangle" and "Connected Triplet"]
Global Clustering Coefficient
Exact: [Alon et al, 1997]
Estimation - input is read at least once:
• Random Access: [Avron, 2010]
• Streaming Model: [Buriol et al, 2006]
Estimation - sampling:
• Random Access: [Schank et al, 2005]
• External Access: This work.
Local Clustering Coefficient
$C_i = \frac{\#\text{connections between } v_i\text{'s neighbors}}{d_i (d_i - 1)/2}$
$d_i$ - degree of node $i$
[Figure: example graph with nodes v1 to v9]
Example: $d_1 = 1$, $d_9 = 2$, $d_2 = 3$, $C_2 = 1/3$
Network Average CC = average local CC
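As a rough illustration of the local coefficient and its network average, here is a short sketch. The function names, the input format, and the toy graph are assumptions; so is the convention that nodes of degree below 2 get a local coefficient of 0, which the slides do not state.

```python
from itertools import combinations

def local_cc(adj, v):
    """Local clustering coefficient C_v of node v.

    adj maps each node to the set of its neighbors (hypothetical input format).
    Nodes with degree < 2 are assigned 0 here; that convention is an assumption.
    """
    d = len(adj[v])
    if d < 2:
        return 0.0
    # Count connections between the neighbors of v.
    links = sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
    return links / (d * (d - 1) / 2)

def network_average_cc(adj):
    """Average of the local clustering coefficients over all nodes."""
    return sum(local_cc(adj, v) for v in adj) / len(adj)

# Toy graph (an assumption): triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(local_cc(toy, "c"))       # 1 link among 3 neighbor pairs -> 1/3
print(network_average_cc(toy))  # (1 + 1 + 1/3 + 0) / 4
```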
Network Average CC
Exact: Naïve.
Estimation - input is read at least once:
• Streaming Model: [Becchetti et al, 2010]
Estimation - sampling:
• Random Access: [Schank et al, 2005]
• External Access: [Ribeiro et al, 2010], [Gjoka et al, 2010], This work - Improved accuracy.
Number of Registered Users
Exact: trivial
Estimation - sampling:
• External Access: [Hardiman et al, 2009], [Katzir et al, 2011], This work - Improved accuracy.
Random Walk
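[Figure: random walk on the example graph (nodes v1 to v9); each node is labeled with its stationary probability, with the values 1/22, 2/22, 3/22 and 4/22 appearing]
Sampled Nodes: v1 v2 v3 v4 v5
Stationary Distribution: $\pi_i = \frac{d_i}{\sum_j d_j}$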
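The degree-proportional stationary distribution can be observed empirically. The sketch below simulates a simple random walk and compares visit frequencies with degree divided by total degree. The function name, the toy graph, the seed, and the walk length are illustrative assumptions, not values from the slides.

```python
import random
from collections import Counter

def visit_frequencies(adj, start, steps, rng):
    """Fraction of time a simple random walk spends at each node.

    adj maps each node to the set of its neighbors (hypothetical input format).
    """
    nbrs = {v: sorted(ns) for v, ns in adj.items()}  # fixed order for reproducibility
    visits = Counter()
    v = start
    for _ in range(steps):
        visits[v] += 1
        v = rng.choice(nbrs[v])  # move to a uniformly random neighbor
    return {u: visits[u] / steps for u in adj}

# Toy graph (an assumption): triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
freq = visit_frequencies(toy, "a", 100_000, random.Random(0))
total_degree = sum(len(ns) for ns in toy.values())
for v in sorted(toy):
    # Empirical frequency next to the predicted value d_v / (sum of degrees).
    print(v, round(freq[v], 3), len(toy[v]) / total_degree)
```

High-degree nodes are visited more often, which is why the estimators built on the walk have to reweight the samples by degree.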
Random Walk - Summary
[Figure: example graph with nodes v1 to v9. Legend: visible nodes, invisible nodes, sampled nodes, visible edges, invisible edges]
Global CC Algorithm
1. $\Psi_g$ - Sampled nodes average degree minus 1.
$\phi_k = 1$ if there is an edge $v_{k-1} - v_{k+1}$, 0 otherwise.
2. $\Phi_g$ - Sampled nodes average $\phi_k d_k$.
The estimated global clustering coefficient:
$c_g = \frac{\Phi_g}{\Psi_g}$
$\phi_k = 1$ iff $v_{k-1}, v_k, v_{k+1}$ is a triangle
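The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the adjacency-dictionary stand-in for the network's external interface, the absence of any burn-in period, and the choice to average the degree term over all sampled nodes and the triangle term over interior nodes only are all assumptions.

```python
import random

def estimate_global_cc(adj, start, steps, rng):
    """Random-walk estimate of the global clustering coefficient.

    adj maps each node to the set of its neighbors; it stands in for the
    network's external interface (hypothetical input format).
    """
    if steps < 3:
        raise ValueError("need at least 3 sampled nodes")
    nbrs = {v: sorted(ns) for v, ns in adj.items()}  # fixed order for reproducibility
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(nbrs[walk[-1]]))

    # Step 1: average of (degree - 1) over the sampled nodes.
    psi = sum(len(adj[v]) - 1 for v in walk) / len(walk)

    # Step 2: average of phi_k * d_k over interior positions, where phi_k = 1
    # when the previous and the next node of the walk are adjacent.
    interior = range(1, len(walk) - 1)
    phi = sum(
        len(adj[walk[k]])
        for k in interior
        if walk[k + 1] in adj[walk[k - 1]]
    ) / len(interior)

    return phi / psi if psi else 0.0

# Toy graph (an assumption): triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(estimate_global_cc(toy, "a", 100_000, random.Random(0)))
```

On this toy graph the exact global clustering coefficient is 3/5, so a long walk should give an estimate close to 0.6. Only the degrees of sampled nodes and adjacency tests between nodes two steps apart are needed, which matches the external-access setting.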