Subgraph Frequencies: The Empirical and Extremal Geography of Large Graph Collections
Johan Ugander, Lars Backstrom, Jon Kleinberg
World Wide Web Conference, May 16, 2013
Graph collections
▪ Neighborhoods: graph induced by the friends of a single ego, excluding the ego
▪ Groups: graph induced by the members of a Facebook ‘group’
▪ Events: graph induced by the ‘Yes’ respondents to a Facebook ‘event’
[Figure panels: All Friends, One-way Communication, Mutual Communication, Maintained Relationships]
Seeking a ‘coordinate system’ on these graphs
Subgraphs
[Figure panels: All Friends, One-way Communication, Mutual Communication, Maintained Relationships]
Compute frequencies
Subgraph Frequencies
▪ Definition: The subgraph frequency s(F,G) of a k-node subgraph F in a graph G is the fraction of k-tuples of nodes in G that induce a copy of F.
▪ Subgraph frequency vectors:
  3-node subgraphs: s(·, G) = (x1, x2, x3, x4) = (0.18, 0.37, 0.14, 0.31)
  4-node subgraphs: s(·, G) = (y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11)
Motifs/Frequent subgraphs: Inokuchi et al. 2000, Milo et al. 2002, Yan-Han 2002, Kuramochi-Karypis 2004
Triad census: Davis-Leinhardt 1971, Wasserman-Faust 1994
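The definition above can be turned into a brute-force counter. The following is a minimal sketch, not the authors' implementation: it enumerates every k-node subset and labels the induced subgraph by its sorted degree sequence, which happens to separate all 4 isomorphism classes for k = 3 and all 11 classes for k = 4 (it does not for k = 5). The function name `subgraph_frequencies` and the example graph are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def subgraph_frequencies(nodes, edges, k):
    """Return {sorted induced degree sequence: fraction of k-node subsets}.

    Assumption of this sketch: classes are keyed by degree sequence,
    which identifies the isomorphism class only for k <= 4.
    """
    if k > 4:
        raise ValueError("degree sequences only separate classes for k <= 4")
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = Counter()
    for subset in combinations(nodes, k):
        members = set(subset)
        # Degree of each node inside the induced subgraph on `subset`.
        degs = tuple(sorted(len(adj[v] & members) for v in subset))
        counts[degs] += 1
    total = sum(counts.values())
    return {degs: c / total for degs, c in counts.items()}


# Example graph: the 4-cycle 0-1-2-3-0 plus the chord 0-2.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
triads = subgraph_frequencies(nodes, edges, 3)
quads = subgraph_frequencies(nodes, edges, 4)
```

In this example half of the node triples induce a triangle, key `(2, 2, 2)`, and half induce a wedge, key `(1, 1, 2)`. Enumerating all subsets costs on the order of n^k, so this is only practical for small graphs such as the 50-node collections discussed here.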
Empirical/Extremal Questions
▪ What’s a property of graphs and what’s a property of people?
▪ Consider the subgraph frequencies as a ‘coordinate system’
▪ Empirical Geography:
▪ What subgraph frequencies do social graphs exhibit?
▪ Is there a good model?
▪ Extremal Geography:
▪ How much of this space is even feasible, combinatorially?
▪ Do empirical graphs fill the feasible space?
What do we expect?
[Figure: triadic closure]
We expect few wedges, many triangles for social networks.
The triad space
[Figure: the triad space for 50-node graphs, with ‘You are here’ and Gn,p labels. Orange: Neighborhoods, Green: Groups, Lavender: Events]
[Figure: subgraph frequency plots for 50-node graphs, with Gn,p labeled; the subgraph named on each axis was an image and is not recoverable. Orange: Neighborhoods, Green: Groups, Lavender: Events]
Extremal Graph Theory
Frequency of the ‘forbidden triad’ is bounded at ≤ 3/4.
Sharp for Kn/2,n/2 (complete bipartite graph) when n is even.
[Figure: subgraph frequency plot for 50-node graphs. Orange: Neighborhoods, Green: Groups, Lavender: Events]
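The bipartite extremal example can be checked by direct counting. The sketch below assumes the ‘forbidden triad’ is the wedge (a triple with exactly two edges) and counts such triples in the balanced complete bipartite graph on n = 2m nodes. The function name is hypothetical. Counting by hand gives 3n / (4(n − 1)), which sits slightly above 3/4 for finite n and approaches it as n grows, so the 3/4 figure is presumably best read as the large-n limit.

```python
from fractions import Fraction
from itertools import combinations


def wedge_frequency_complete_bipartite(m):
    """Fraction of node triples in K_{m,m} that induce exactly two edges."""
    side = {v: 0 for v in range(m)}
    side.update({v: 1 for v in range(m, 2 * m)})
    wedges = total = 0
    for a, b, c in combinations(range(2 * m), 3):
        # Two nodes are adjacent exactly when they lie on opposite sides.
        n_edges = ((side[a] != side[b])
                   + (side[a] != side[c])
                   + (side[b] != side[c]))
        wedges += (n_edges == 2)
        total += 1
    return Fraction(wedges, total)


# Compare the brute-force count with the closed form 3n / (4(n - 1)).
for n in (4, 10, 50):
    counted = wedge_frequency_complete_bipartite(n // 2)
    closed_form = Fraction(3 * n, 4 * (n - 1))
    print(n, counted, closed_form, float(counted))
```

Exact fractions are used so that the comparison with the closed form is not blurred by floating-point rounding.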
Subgraph frequencies
‘Crowd-sourced’ inner bounds
Consider all social graphs together with their complements, the ‘anti-social’ graphs (which are also graphs!)
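One way to see why complements are useful here: taking the complement swapping edges and non-edges maps the empty triad to the triangle and the single edge to the wedge, so the triad frequency vector of the complement is the original vector reversed. The sketch below illustrates this on a small made-up graph. The helper names and the choice to index the vector by edge count (0, 1, 2, 3 edges) are assumptions of this example, not notation from the talk.

```python
from itertools import combinations


def triad_frequencies(n, edges):
    """Return [x0, x1, x2, x3]: fraction of triples inducing 0..3 edges."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(range(n), 3):
        k = sum(frozenset(p) in edge_set for p in combinations(triple, 2))
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]


def complement(n, edges):
    """Return the edges of the complement graph on nodes 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in combinations(range(n), 2)
            if frozenset(p) not in edge_set]


# Illustrative graph: a triangle 0-1-2, a pendant node 3, an isolated node 4.
n = 5
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
x = triad_frequencies(n, edges)
x_bar = triad_frequencies(n, complement(n, edges))
```

Here `x_bar` equals `x` reversed, so every observed graph contributes two points to an inner bound on the feasible region: its own frequency vector and the mirrored one.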
What graphs are missing?
Triadic Closure and Squares
▪ Square unlikely to form:
▪ Square has very short ‘half-life’:
Continuous Time Markov Chain Model
[Figure: triadic closure]
Edge Formation Random Walk (EFRW)
▪ Continuous-time Markov chain
▪ Transitions between unlabeled, undirected graphs based on edge formation.
▪ Independent Poisson processes for all node pairs:
  ▪ Arbitrary formation: rate ɣ > 0
  ▪ Arbitrary deletion: rate δ > 0
▪ Triadic closure formation for each wedge: rate λ ≥ 0
▪ For 4-node graphs, succinct Markov chain state transition diagram:
[Figure: state transition diagram over the 4-node graphs; the rate labels on the arrows are not recoverable from the extraction]
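The 4-node diagram did not survive extraction, but the same construction is easy to write out for 3-node graphs, where the chain has only four states (0, 1, 2, or 3 edges). The sketch below is an illustration under stated assumptions rather than the model code from the talk: each open pair forms at rate ɣ, each edge is deleted at rate δ, and the single open pair of a wedge forms at the combined rate ɣ + λ. The function name `efrw_triad_stationary` is hypothetical. Because the chain only moves between neighboring edge counts, the stationary distribution follows from detailed balance.

```python
def efrw_triad_stationary(gamma, delta, lam):
    """Stationary distribution over 3-node graphs with 0, 1, 2, 3 edges."""
    if gamma <= 0 or delta <= 0 or lam < 0:
        raise ValueError("need gamma > 0, delta > 0, lam >= 0")
    # Upward (edge formation) rates out of the states with 0, 1, 2 edges.
    up = [
        3 * gamma,     # empty graph: three open pairs
        2 * gamma,     # one edge: two open pairs, no wedge to close yet
        gamma + lam,   # wedge: one open pair, which also closes the wedge
    ]
    # Downward (edge deletion) rates out of the states with 1, 2, 3 edges.
    down = [delta, 2 * delta, 3 * delta]
    weights = [1.0]
    for u, d in zip(up, down):
        # Detailed balance: pi[i + 1] * down[i] == pi[i] * up[i].
        weights.append(weights[-1] * u / d)
    total = sum(weights)
    return [w / total for w in weights]


# With lam = 0 each pair is an independent on/off process, as in Gn,p.
baseline = efrw_triad_stationary(1.0, 1.0, 0.0)
# Turning on triadic closure shifts probability mass toward the triangle.
closed = efrw_triad_stationary(1.0, 1.0, 5.0)
```

With λ = 0 the result is the Gn,p triad distribution with p = ɣ / (ɣ + δ), which is a convenient sanity check. Raising λ moves mass from the other states to the triangle, in line with the expectation of few wedges and many triangles.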
Fitting λ to subgraph data
▪ How well can we fit λ?
▪ Subgraph frequencies are modeled very well by triadic closure.
[Figure: three panels (Neighborhoods, n=50; Groups, n=50; Events, n=50) showing subgraph frequency on a log-scale y-axis from 0.001 to 1.000. Each panel compares the mean of the data with the fit model. Legend values: λ ν = 19.37 for Neighborhoods, λ ν = 7.02 for Groups, λ ν = 7.38 for Events]
Extremal graph theory
▪ Subgraph frequencies s(F,G) closely related to homomorphism density t(F,G). [Borgs et al. 2006, Lovász 2009]
▪ Frequency of cliques, lower bounds: Moon-Moser 1962, Razborov 2008
▪ Frequency of cliques, upper bounds: Kruskal-Katona Theorem
▪ Frequency of trees: Sidorenko Conjecture (‘Theorem for trees’)
▪ Also linear relationships across sizes.
▪ => Linear Program!
▪ A proposition for all subgraphs:
Proposition. For every k, there exist constants ε and n0 such that the following holds. If F is a k-node subgraph that is not a clique and not empty, and G is any graph on n ≥ n0 nodes, then s(F,G) < 1 − ε.
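To make the homomorphism density t(F,G) concrete, the sketch below computes it by brute force as the fraction of all maps from the nodes of F to the nodes of G that send every edge of F to an edge of G. It then checks two classical triangle bounds on a small made-up graph: Goodman's lower bound t(K3) ≥ t(K2)(2 t(K2) − 1), used here as a simple stand-in for the Moon-Moser and Razborov results, and the Kruskal-Katona style upper bound t(K3) ≤ t(K2)^(3/2). The function name and example graph are assumptions of this illustration.

```python
from itertools import product


def hom_density(f_nodes, f_edges, g_nodes, g_edges):
    """t(F, G): fraction of maps V(F) -> V(G) sending F-edges to G-edges."""
    g_adj = {(u, v) for u, v in g_edges} | {(v, u) for u, v in g_edges}
    index = {v: i for i, v in enumerate(f_nodes)}
    hits = total = 0
    # Maps need not be injective, unlike the induced copies counted by s(F, G).
    for image in product(g_nodes, repeat=len(f_nodes)):
        total += 1
        if all((image[index[a]], image[index[b]]) in g_adj
               for a, b in f_edges):
            hits += 1
    return hits / total


edge = ([0, 1], [(0, 1)])
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])

# Example host graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
g_nodes = [0, 1, 2, 3]
g_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

t_edge = hom_density(*edge, g_nodes, g_edges)
t_tri = hom_density(*triangle, g_nodes, g_edges)

goodman_lower = t_edge * (2 * t_edge - 1)
kruskal_katona_upper = t_edge ** 1.5
```

The enumeration visits n^k maps, so it is only meant for tiny examples. Its purpose is to show the kind of inequality between densities that the extremal bounds constrain.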
Audience graph classification
▪ How do different audience graphs differ?
▪ Classification challenges
  A) 75-node neighborhoods vs. 75-node events
  B) 400-node neighborhoods vs. 400-node groups
▪ Features (accuracy on A / B):
  Quad frequencies: 76% / 76%
  Global features: 69% / 76%
  Quad frequencies + Global features: 81% / 82%
[Figure: average edge density versus graph size (size ticks 20 to 1000, density ticks 0.05 to 1.00) for Neighborhoods, Neighborhoods + ego, Groups, and Events, with sizes 75 and 400 marked]
Conclusions
▪ Subgraph frequencies usefully characterize social graphs, have extremal limits!
▪ Edge Formation Random Walk model of dense social graphs:
[Figure: state transition diagram over the 4-node graphs; the rate labels on the arrows are not recoverable from the extraction]
▪ Homomorphism density bounds yield subgraph density bounds: