Statistical properties of network community structure
Jure Leskovec, CMU
Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research
Dec 16, 2015
Network communities
Communities: sets of nodes with lots of connections inside and few to outside (the rest of the network)
Assumption: networks are (hierarchically) composed of communities (modules)
Communities, clusters, groups, modules
Community score (quality)
How community-like is a set of nodes?
Want a measure that corresponds to intuition
Conductance (normalized cut): Φ(S) = # edges cut / # edges inside
Small Φ(S) corresponds to more community-like sets of nodes
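The score above can be sketched in a few lines of Python. This is a minimal illustration of the slide's ratio (# edges cut / # edges inside), not the authors' code; the function name `community_score`, the edge-list representation, and the toy graph are assumptions made for the example. Note that the widely used form of conductance divides by the volume (sum of degrees) of the smaller side instead; the sketch follows the slide's simpler ratio.

```python
def community_score(edges, S):
    """Return (# edges cut) / (# edges inside) for node set S, as written on the slide."""
    S = set(S)
    cut = 0      # edges with exactly one endpoint in S
    inside = 0   # edges with both endpoints in S
    for u, v in edges:
        if u in S and v in S:
            inside += 1
        elif u in S or v in S:
            cut += 1
    # Assumption: a set with no internal edges gets the worst possible score.
    return float("inf") if inside == 0 else cut / inside


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

print(community_score(toy_edges, {0, 1, 2}))  # one triangle: 1 cut edge, 3 inside
print(community_score(toy_edges, {2, 3}))     # the two bridge endpoints: 4 cut, 1 inside
```

Lower values mean a more community-like set, so the triangle scores better than the pair of bridge endpoints.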
Community score (quality): examples
Score: Φ(S) = # edges cut / # edges inside
Bad community: Φ = 5/7 ≈ 0.7
Better community: Φ = 2/5 = 0.4
Best community: Φ = 2/8 = 0.25
Network Community Profile Plot
We define the network community profile (NCP) plot: plot the score of the best community of size k
Search over all subsets of size k and find the best: Φ(k=5) = 0.25
The NCP plot is intractable to compute exactly; use an approximation algorithm
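To make the definition concrete, here is a brute-force sketch that enumerates every subset of size k on a tiny graph. It only illustrates why the exact computation is intractable (the number of subsets grows combinatorially); it is not the approximation algorithm the talk refers to. The names `score` and `ncp_bruteforce` and the toy graph are assumptions for the example.

```python
from itertools import combinations


def score(edges, S):
    """Slide's score: (# edges cut) / (# edges inside); inf if S has no inside edge."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return float("inf") if inside == 0 else cut / inside


def ncp_bruteforce(edges, max_k):
    """For each size k, return the best (smallest) score over ALL node subsets of size k."""
    nodes = sorted({x for e in edges for x in e})
    profile = {}
    for k in range(1, max_k + 1):
        # Exhaustive search: C(n, k) subsets, feasible only for toy graphs.
        profile[k] = min(score(edges, set(c)) for c in combinations(nodes, k))
    return profile


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
profile = ncp_bruteforce(toy_edges, max_k=3)
for k, phi in profile.items():
    print(k, phi)
```

On this toy graph the best set of size 3 is one of the triangles, which is why the profile drops sharply at k = 3.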
NCP plot: Small Social Network
Dolphin social network: two communities of dolphins
[Figure: NCP plot and network]
NCP plot: Zachary’s karate club
Zachary’s university karate club social network
During the study the club split into 2
The split (squares vs. circles) corresponds to cut B
[Figure: NCP plot and network]
NCP plot: Network Science
Collaborations between scientists working on networks
[Figure: NCP plot and network]
Geometric and Hierarchical graphs
[Figure panels: hierarchical network; geometric (grid-like) network]
– Small social networks
– Geometric and
– Hierarchical networks
have a downward NCP plot
Our work: Large networks
Previously researchers examined community structure of small networks (~100 nodes)
We examined more than 70 different large social and information networks
Large real-world networks look completely different!
Example of our findings
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
NCP: LiveJournal (N=5M, E=42M)
[Figure: NCP plot; x-axis: Community size, y-axis: Community score. Annotations: "Better and better communities", "Best community has 100 nodes", "Worse and worse communities"]
Explanation: Downward part
Whiskers are responsible for the downward slope of the NCP plot
A whisker is a set of nodes connected to the network by a single edge
[Figure: NCP plot with the largest whisker marked]
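The whisker definition can be illustrated with a small sketch: remove each edge in turn, and if the graph falls apart, report the smaller side. This is a simplification that assumes a connected, simple, undirected graph, and it is not claimed to be the procedure used in the talk; `find_whiskers` and the toy graph are hypothetical names and data.

```python
from collections import defaultdict


def find_whiskers(edges):
    """Return maximal node sets attached to the rest of the graph by a single edge."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)

    def reachable(start, blocked):
        # Nodes reachable from `start` without crossing the edge `blocked`.
        a, b = blocked
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if (x, y) in ((a, b), (b, a)):
                    continue
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    candidates = []
    for u, v in edges:
        side = reachable(u, (u, v))
        if v in side:
            continue                      # removing (u, v) does not disconnect anything
        if 2 * len(side) > n:             # keep the smaller side of the bridge
            side = reachable(v, (u, v))
        candidates.append(side)
    # Drop candidates nested inside a larger whisker (e.g. the tip of a dangling path).
    return [w for w in candidates if not any(w < other for other in candidates)]


# Hypothetical toy graph: a 4-clique "core" with a dangling path 3-4-5 and a leaf 6.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (0, 6)]
whiskers = find_whiskers(toy_edges)
largest = max(whiskers, key=len)
print(sorted(sorted(w) for w in whiskers), sorted(largest))
```

The edge-by-edge search costs one traversal per edge, which is fine for a toy graph; a linear-time bridge-finding algorithm would be the natural replacement on large networks.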
Explanation: Upward part
Each new edge inside the community costs more
[Figure: NCP plot]
Φ = 2/1 = 2
Φ = 8/3 ≈ 2.7
Φ = 64/11 ≈ 5.8
Each node has twice as many children
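The three ratios above can be reproduced with a short calculation. The sketch assumes a tree in which the root has one child and every node has twice as many children as its parent, so a node at depth d has 2^d children; this reading of the figure is an assumption, chosen because it yields exactly 2/1, 8/3 and 64/11. The function name `tree_scores` is hypothetical.

```python
def tree_scores(depth):
    """Scores of the communities made of all tree nodes down to depth 1, 2, ..., depth."""
    level_sizes = [1]                          # a single root at depth 0
    for d in range(depth + 1):
        # Assumed branching rule: a node at depth d has 2**d children.
        level_sizes.append(level_sizes[d] * 2 ** d)
    scores = []
    for d in range(1, depth + 1):
        inside = sum(level_sizes[1:d + 1])     # each non-root member adds one tree edge
        cut = level_sizes[d + 1]               # edges to the first level outside the set
        scores.append((cut, inside, cut / inside))
    return scores


for cut, inside, phi in tree_scores(3):
    print(f"{cut}/{inside} = {phi:.2f}")
```

Under this assumption the cut grows much faster than the number of inside edges, which is the sense in which each new inside edge "costs more".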
Suggested Network Structure
Network structure: core-periphery, jellyfish, octopus
Whiskers are responsible for good communities
Denser and denser core of the network
Core contains 60% of the nodes and 80% of the edges
Caveat: Bag of whiskers
What if we allow cuts that give disconnected communities?
Cut all whiskers and compose communities out of them
Caveat: Bag of whiskers
[Figure: NCP plot; x-axis: Community size, y-axis: Community score. Curves: Connected communities, Bag of whiskers]
We get better community scores when composing disconnected sets of whiskers
Comparison to a rewired network
Rewired network: random network with the same degree distribution
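One common way to obtain a random network with the same degrees is repeated double-edge swaps: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b), rejecting swaps that would create a self-loop or a duplicate edge. The sketch below illustrates that idea under the assumption of a simple undirected graph; it is not stated in the slides that this is the rewiring procedure the authors used, and `rewire`, `degrees` and the ring graph are hypothetical.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    return Counter(x for e in edges for x in e)


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by double-edge swaps (simple graph assumed)."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue                            # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue                            # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical toy graph: a ring of 10 nodes.
ring = [(i, (i + 1) % 10) for i in range(10)]
rewired = rewire(ring, n_swaps=20)
print(degrees(rewired) == degrees(ring))
```

Every accepted swap leaves each node's degree unchanged, so any structure that survives in the rewired graph is attributable to the degree sequence alone.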
What is a good model?
What is a good model that explains such network structure?
None of the existing models work
Models: Pref. attachment, Small World, Geometric Pref. Attachment
NCP plot shapes: Flat, Down and Flat, Flat and Down
Forest Fire model works
Forest Fire: connections spread like a fire
– New node joins the network
– Selects a seed node
– Connects to some of its neighbors
– Continues recursively
As a community grows it blends into the core of the network
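The steps above can be sketched as a small generator. This is a simplified undirected variant written for illustration: each neighbor of a burned node catches fire independently with probability `burn_prob`, which is a hypothetical parameter, and the published Forest Fire model differs in its details (it uses directed links with separate forward and backward burning). The function name `forest_fire` is also an assumption.

```python
import random


def forest_fire(n_nodes, burn_prob=0.3, seed=0):
    """Grow an undirected graph; returns an adjacency dict {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n_nodes):
        seed_node = rng.randrange(new)        # select a seed among existing nodes
        burned = {seed_node}
        frontier = [seed_node]
        while frontier:                       # the fire spreads recursively
            x = frontier.pop()
            for y in adj[x]:
                if y not in burned and rng.random() < burn_prob:
                    burned.add(y)
                    frontier.append(y)
        adj[new] = set()
        for x in burned:                      # the new node links to every burned node
            adj[new].add(x)
            adj[x].add(new)
    return adj


graph = forest_fire(200)
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(len(graph), n_edges)
```

Because each new node always links to its seed, the generated graph stays connected; larger values of `burn_prob` let the fire reach further and produce denser graphs.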
Forest Fire NCP plot
[Figure: NCP plot with curves for the rewired network and the bag of whiskers]
Conclusion and connections
Whiskers:
– Largest whisker has ~100 nodes
– Independent of network size
– Dunbar number: a person can maintain social relationships with 150 people
– Bond vs. identity communities
Core:
– Newman et al. analyzed a 400k-node product network
▪ Largest community has 50% of the nodes
▪ Community was labeled “miscellaneous”
Conclusion and connections
NCP plot is a way to analyze network community structure
Our results agree with previous work on small networks
But large networks are fundamentally different
Large networks have core-periphery structure
Small, well-isolated communities blend into the core of the network as they grow