Finding and Evaluating Community Structure in Networks M.E.J. Newman and M. Girvan Physical Review E 69, 026113 (2004) 11 July 2014 SNU IDB Lab. Namyoon Kim
Page 1: Finding and Evaluating Community  Structure in Networks

Finding and Evaluating Community Structure in Networks
M.E.J. Newman and M. Girvan
Physical Review E 69, 026113 (2004)

11 July 2014
SNU IDB Lab.

Namyoon Kim

Page 2: Finding and Evaluating Community  Structure in Networks

Outline

- Introduction
- Hierarchical Clustering
- Edge Betweenness
- The Algorithm
- Implementation
  - Weighting
  - Edge betweenness contribution
- Community strength
- Modularity
- Tests
- Conclusion

Page 3: Finding and Evaluating Community  Structure in Networks

Introduction

Networks
- Theoretical modelling of networks has attracted growing interest in recent years
- Spans a wide variety of fields, such as statistical physics, applied mathematics, computational biology and social networks

Community Structure
- Connections within a community: dense
- Connections between communities: sparse

Page 4: Finding and Evaluating Community  Structure in Networks

Hierarchical Clustering: Agglomerative

Agglomerative
- Edges are added one by one to an initially empty network
- Tends to find only the cores of communities
- Peripheral nodes, which matter for finding the true size of a community, tend to be left out

Page 5: Finding and Evaluating Community  Structure in Networks

Hierarchical Clustering: Divisive

Divisive
- Start with the full network, find the least similar pairs of vertices and remove the edges between them

Newman’s approach
- Look for the edges that lie between communities

Page 6: Finding and Evaluating Community  Structure in Networks

Edge Betweenness

Betweenness
- All paths from community A to community B (and vice versa) must pass through either edge 1 or edge 2
- Edges 1 and 2 therefore have high betweenness

[Figure: two communities A and B joined only by edges 1 and 2. Source: www.cs.kent.edu/~jin/DM07/PPT/muad.ppt]

Page 7: Finding and Evaluating Community  Structure in Networks

The Algorithm

Shortest path betweenness
- Find the shortest paths between all pairs of vertices and count how many run along each edge

Recalculation step
- Remove the edge with the highest count
- Recalculate shortest path betweenness for all remaining edges

Steps
1. Calculate betweenness scores for all edges in the network
2. Find the edge with the highest score and remove it from the network
3. Recalculate betweenness for all remaining edges
4. Repeat from step 2
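The four steps can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: betweenness is computed by brute force (every shortest path between every pair of vertices is enumerated, and a pair with several shortest paths splits one unit of score equally among them), and the stopping rule, a target number of communities `n_communities`, is an assumption added so that the loop ends. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """All shortest paths between s and t (listed from t back to s), by brute force."""
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if t not in dist:
        return []                      # s and t are no longer connected
    paths = [[t]]
    while paths[0][-1] != s:           # all shortest paths have the same length
        paths = [p + [u] for p in paths for u in adj[p[-1]]
                 if dist[u] == dist[p[-1]] - 1]
    return paths

def edge_betweenness(adj):
    """Count the shortest paths that run along each edge (steps 1 and 3)."""
    score = {}
    for s, t in combinations(sorted(adj), 2):
        paths = shortest_paths(adj, s, t)
        for p in paths:
            for u, v in zip(p, p[1:]):
                e = frozenset((u, v))
                score[e] = score.get(e, 0.0) + 1.0 / len(paths)
    return score

def components(adj):
    """Connected components, used here as the current communities."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in part:
                part.add(u)
                stack.extend(adj[u] - part)
        seen |= part
        parts.append(part)
    return parts

def girvan_newman(edges, n_communities):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Assumed stopping rule: stop once the network has split into n_communities parts.
    while len(components(adj)) < n_communities and any(adj.values()):
        score = edge_betweenness(adj)          # recalculated after every removal
        u, v = max(score, key=score.get)       # step 2: edge with the highest score
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(girvan_newman(edges, 2))
```

On this toy network the bridge carries all nine shortest paths between the two triangles, so it is removed first and the two triangles are returned as communities.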

Page 8: Finding and Evaluating Community  Structure in Networks

Implementation – weighting (i)

Weighting
i. The initial vertex s is given distance d_s = 0 and weight w_s = 1

[Figure: source vertex s labelled (d_s = 0, w_s = 1)]

Page 9: Finding and Evaluating Community  Structure in Networks

Weighting (ii)

Weighting
ii. Every vertex i adjacent to s is given distance d_i = d_s + 1 = 1 and weight w_i = w_s = 1

[Figure: s labelled (0, 1); its two neighbours i are each labelled (d_i = 1, w_i = 1)]

Page 10: Finding and Evaluating Community  Structure in Networks

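Weighting (iii)

Weighting
iii. For each vertex j adjacent to i, do:
  a) If d_j is not assigned yet, set d_j = d_i + 1 and w_j = w_i
  b) If d_j is already assigned and d_j = d_i + 1, add the weight of the other incoming vertex i: w_j = w_j + w_i
  c) If d_j is already assigned and d_j < d_i + 1, do nothing

[Figure: s labelled (0, 1); two vertices i labelled (1, 1); two vertices j labelled (d_j = 2, w_j = 2) and (d_j = 2, w_j = 1)]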

Page 11: Finding and Evaluating Community  Structure in Networks

Weighting (iv)

Weighting
iv. Repeat from iii until no vertices remain that have assigned distances but whose neighbours do not have assigned distances

Time complexity: O(E)

[Figure: example network with each vertex labelled (distance, weight): s (0, 1); (1, 1), (1, 1); (2, 2), (2, 1); (3, 1), (3, 3)]
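Weighting steps i to iv amount to a breadth-first search from s that also counts shortest paths. The Python sketch below is one way to write it. The function name, the vertex names a to f and the edge list are assumptions: the edge list is inferred from the (distance, weight) labels and edge scores shown on the slides, not taken from the original figure.

```python
from collections import deque

def shortest_path_weights(adj, s):
    """Return (distance, weight) dictionaries for source vertex s."""
    dist = {s: 0}                  # step i: d_s = 0
    weight = {s: 1}                #         w_s = 1
    queue = deque([s])
    while queue:                   # step iv: repeat until nothing is left to visit
        i = queue.popleft()
        for j in adj[i]:
            if j not in dist:                  # distance not assigned yet
                dist[j] = dist[i] + 1
                weight[j] = weight[i]
                queue.append(j)
            elif dist[j] == dist[i] + 1:       # another shortest route into j
                weight[j] += weight[i]
            # otherwise j is no farther from s than i: do nothing
    return dist, weight

# Assumed reconstruction of the seven-vertex example network.
EDGES = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("b", "d"), ("c", "e"), ("d", "e"), ("d", "f")]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

dist, weight = shortest_path_weights(adj, "s")
for v in dist:
    print(v, (dist[v], weight[v]))
```

Each edge is examined at most twice (once from each end), which is where the O(E) running time comes from.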

Page 12: Finding and Evaluating Community  Structure in Networks

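Implementation – edge betweenness contribution (i)

Edge betweenness contribution
i. Find every “leaf” vertex t, i.e. a vertex that no shortest path from s to another vertex passes through

[Figure: the example network with vertex weights s (1); (1), (1); (2), (1); (1), (3). The two bottom vertices, with weights 1 and 3, are the leaves t]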

Page 13: Finding and Evaluating Community  Structure in Networks

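Edge betweenness contribution (ii)

Edge betweenness contribution
ii. For each vertex i neighbouring t, assign the edge from t to i a score of w_i/w_t

[Figure: the leaf with weight 3 has neighbours i with weights 2 and 1, so its edges score 2/3 and 1/3; the leaf with weight 1 has one neighbour with weight 1, so its edge scores 1]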

Page 14: Finding and Evaluating Community  Structure in Networks

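Edge betweenness contribution (iii)

Edge betweenness contribution
iii. Work upwards towards s. For the edge from vertex j to vertex i (j farther from s than i), assign a score of w_i/w_j × (1 + sum of the scores of all edges immediately below j)

[Figure: scores on the example network]
- Edges from the leaves: 2/3, 1/3 and 1
- Edge above the vertex j with weight 1: 1/1 × (1 + 1 + 1/3) = 7/3
- Each of the two edges above the vertex j with weight 2: 1/2 × (1 + 2/3) = 5/6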

Page 15: Finding and Evaluating Community  Structure in Networks

Edge betweenness contribution (iv)

Edge betweenness contribution
iv. Repeat from iii until s is reached

Time complexity: O(E)

[Figure: scores on the example network]
- Lower edges, as before: 2/3, 1/3, 1, 5/6, 5/6 and 7/3
- Edges into s: 1/1 × (1 + 5/6 + 7/3) = 25/6 and 1/1 × (1 + 5/6) = 11/6
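Steps i to iv of the edge betweenness contribution can be sketched as a single backward pass over the breadth-first order, so that leaves are handled first and s last. The sketch below is an illustration under assumptions: the function name, the vertex names a to f and the edge list are hypothetical (the edge list is inferred from the scores on the slides), and exact fractions are used so the results can be compared with the slide values.

```python
from collections import deque
from fractions import Fraction

def edge_scores_from(adj, s):
    """Contribution of source s to the betweenness of every edge."""
    # Weighting pass: distances and shortest-path counts from s.
    dist, weight, order = {s: 0}, {s: 1}, []
    queue = deque([s])
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in adj[i]:
            if j not in dist:
                dist[j], weight[j] = dist[i] + 1, weight[i]
                queue.append(j)
            elif dist[j] == dist[i] + 1:
                weight[j] += weight[i]
    # Backward pass: farthest vertices (the leaves) first, s last.
    below = {v: Fraction(0) for v in order}   # sum of scores on edges just below v
    score = {}
    for j in reversed(order):
        for i in adj[j]:
            if dist[i] == dist[j] - 1:        # i is one step closer to s than j
                c = Fraction(weight[i], weight[j]) * (1 + below[j])
                score[(i, j)] = c             # for a leaf, below[j] is 0, giving w_i/w_t
                below[i] += c
    return score

# Assumed reconstruction of the seven-vertex example network.
EDGES = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("b", "d"), ("c", "e"), ("d", "e"), ("d", "f")]
adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

for edge, value in edge_scores_from(adj, "s").items():
    print(edge, value)
```

The two edges into s score 11/6 and 25/6, which add up to 6: one unit for each of the six vertices other than s, a quick sanity check on the pass.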

Page 16: Finding and Evaluating Community  Structure in Networks

Algorithm complexity

Edge betweenness contribution
- Repeat the weighting and edge betweenness contribution calculations for all V source vertices s, and do so E times (once for every edge removed)

Time complexity: (O(E) + O(E)) × V × E = O(E^2 V), which is O(n^3) for a sparse network of n vertices, where E grows in proportion to n

[Figure: the example network with its final edge scores: 2/3, 1/3, 1, 5/6, 5/6, 7/3, 11/6 and 25/6]

Page 17: Finding and Evaluating Community  Structure in Networks

Community strength

Community structure strength
- How do we know the algorithm produces good results?

Some definitions
- Say we have a network which is currently divided into k communities
- We have a k × k symmetric matrix e, where each element e_ij = (edges that link vertices in community i to vertices in community j) / (all edges in the original* network)
  *The network's initial state, with no edges removed
- $\mathrm{Tr}\, e = \sum_i e_{ii}$: fraction of edges in the network that connect vertices in the same community
- $a_i = \sum_j e_{ij}$: fraction of edges that connect to vertices in community i

Page 18: Finding and Evaluating Community  Structure in Networks

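Modularity

Modularity
- $Q = \sum_i (e_{ii} - a_i^2) = \mathrm{Tr}\, e - \lVert e^2 \rVert$, where $\lVert x \rVert$ is the sum of the elements of the matrix x
- Q = 0 means the split is no better than random partitioning
- Values approaching the maximum, Q = 1, mean the network has strong community structure
- Generally, networks with reasonably well split communities have Q of about 0.3 – 0.7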
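As an illustration, the Python sketch below builds the matrix e for a given division, forms the row sums a_i and evaluates Q = Σ_i (e_ii − a_i²). The function name and the toy network are assumptions. So is the bookkeeping for edges between communities: each such edge is split half-and-half between e_ij and e_ji, one convention that keeps e symmetric with all elements summing to 1.

```python
from fractions import Fraction

def modularity(edges, community):
    """Q for an undirected edge list and a vertex -> community label mapping.

    `edges` should be the edge list of the original network, before any removals.
    """
    labels = sorted(set(community.values()))
    index = {label: pos for pos, label in enumerate(labels)}
    k, m = len(labels), len(edges)
    e = [[Fraction(0)] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        # Half of the edge goes to e[i][j] and half to e[j][i];
        # for an edge inside one community both halves land on e[i][i].
        e[i][j] += Fraction(1, 2 * m)
        e[j][i] += Fraction(1, 2 * m)
    trace = sum(e[i][i] for i in range(k))     # Tr e
    a = [sum(row) for row in e]                # a_i = sum_j e_ij
    return trace - sum(x * x for x in a)

# Two triangles joined by one bridge edge, split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))                # 5/14, about 0.36
print(modularity(edges, {v: "A" for v in split}))   # 0: a single community
```

For the two-triangle split, Tr e = 6/7 and a_A = a_B = 1/2, so Q = 6/7 − 1/2 = 5/14, which falls inside the 0.3 – 0.7 range quoted above.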

Page 19: Finding and Evaluating Community  Structure in Networks

Tests – shortest-paths

z_in = mean number of edges from a vertex to other vertices in the same community

z_out = mean number of edges from a vertex to vertices in different communities
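For a network whose community assignment is known, the two quantities can be measured directly. The short Python sketch below does so; the function name and the toy network are assumptions made for illustration.

```python
def mean_in_out_degree(edges, community):
    """Return (z_in, z_out) for an undirected edge list and a vertex -> community map."""
    n = len(community)
    same = sum(1 for u, v in edges if community[u] == community[v])
    different = len(edges) - same
    # Every edge adds one to the degree of each of its two end vertices.
    return 2 * same / n, 2 * different / n

# Two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
z_in, z_out = mean_in_out_degree(edges, community)
print(z_in, z_out)
```

Here every vertex has two neighbours inside its own triangle, so z_in is 2, and the single bridge gives z_out = 2/6.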

Page 20: Finding and Evaluating Community  Structure in Networks

Tests - correctness

Page 21: Finding and Evaluating Community  Structure in Networks

Tests – random walk and recalculation

Page 22: Finding and Evaluating Community  Structure in Networks

Conclusion

Contributions
- A new class of algorithms for performing network clustering
- Described the task of extracting the natural community structure from networks of vertices and edges

Future Work
- Reduce time complexity

Page 23: Finding and Evaluating Community  Structure in Networks

References

[1] M.E.J. Newman and M. Girvan. Finding and Evaluating Community Structure in Networks. Phys. Rev. E 69, 026113, 2004.
[2] M.E.J. Newman. Fast Algorithm for Detecting Community Structure in Networks. Phys. Rev. E 69, 066133, 2004.
Presentation by Muad Abu-Ata, www.cs.kent.edu/~jin/DM07/PPT/muad.ppt