CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu
Power‐law degree distributions
What do power‐law degree networks look like?
A function is scale free if: $f(ax) = c\,f(x)$, i.e., rescaling x by a only multiplies f by a constant c that depends on a but not on x.
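A quick numeric illustration of the definition, as a sketch: the helper `is_scale_free`, the exponent 2.5, the factor a = 3 and the sample points are arbitrary choices made here, and checking finitely many points only illustrates the property rather than proving it. A power law passes because $(ax)^{-2.5} = a^{-2.5} x^{-2.5}$; an exponential fails because $e^{-ax}/e^{-x}$ depends on x.

```python
import math


def is_scale_free(f, a=3.0, xs=(1.0, 2.0, 5.0, 10.0, 100.0), rel_tol=1e-9):
    """Check f(a*x) = c * f(x) with the same c at every sample point x."""
    ratios = [f(a * x) / f(x) for x in xs]
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)


def power_law(x):
    return x ** -2.5            # f(a*x) = a**-2.5 * f(x): c does not depend on x


def exponential(x):
    return math.exp(-x)         # f(a*x) / f(x) = exp(-(a - 1) * x): depends on x


print(is_scale_free(power_law))    # True
print(is_scale_free(exponential))  # False
```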
10/27/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
[Figure: Random network (Erdos‐Renyi random graph) vs. scale‐free (power‐law) network]
In the Preferential Attachment model, power‐law degrees naturally emerge [Albert‐Barabasi ‘99]:
Nodes arrive in order
A new node j creates m out‐links
The probability of linking to a node i is proportional to its degree $d_i$: $P(j \to i) \propto d_i$
Note: Preferential Attachment is not the only model that generates power‐law networks. What other mechanisms give rise to power‐law degree networks?
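A minimal simulation sketch of this growth process. Assumptions not fixed by the slide: the graph starts from a small clique on m+1 nodes, $d_i$ counts both in‐ and out‐links, and duplicate targets are redrawn so that each new node gets m distinct out‐links. The function name `preferential_attachment` is hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a graph in which each arriving node j creates m out-links to
    distinct existing nodes, picking node i with probability proportional
    to its degree d_i."""
    rng = random.Random(seed)
    # Assumption: start from a clique on m + 1 nodes so every degree is > 0.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # A node appears in `endpoints` once per incident edge, so a uniform draw
    # from this list selects node i with probability d_i / sum of all degrees.
    endpoints = [v for e in edges for v in e]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:           # redraw duplicates
            targets.add(rng.choice(endpoints))
        for i in targets:
            edges.append((j, i))
            endpoints.extend((j, i))
    return edges


edges = preferential_attachment(10_000, m=2, seed=1)
degree = Counter(v for e in edges for v in e)
print(len(edges), max(degree.values()))
```

Keeping every node in `endpoints` once per incident edge turns degree‐proportional sampling into a single uniform draw, so no normalizing sum has to be recomputed on each arrival.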
Preferential attachment:
Power‐law degree distributions. Node degrees: $p(d) \propto d^{-3}$
But no local clustering: the clustering coefficient tends to 0 as the network grows.
Can we get multiple properties at once?
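The slide's clustering‐coefficient formula did not survive extraction. As a sketch, the common local definition (the fraction of pairs of a node's neighbors that are themselves linked, averaged over all nodes) can be computed as below; the adjacency‐dict representation and the convention that nodes with fewer than two neighbors score 0 are assumptions.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are linked to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0                          # convention: undefined -> 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean local clustering over all nodes; adj maps node -> set of neighbors."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# A triangle 0-1-2 with a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))   # only the pair (1, 2) out of 3 pairs is linked
print(average_clustering(adj))
```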
Preferential attachment is a model of a growing network
What governs network growth and evolution?
P1) Node arrival process: when nodes enter the network
P2) Edge initiation process: each node decides when to initiate an edge
P3) Edge destination process: the node determines the destination of the edge
[Leskovec et al. KDD 08]
4 online social networks with the exact edge arrival sequence: for every edge (u,v) we know the exact time of its appearance $t_{uv}$, for millions of edges
Directly observe the mechanisms leading to global network properties
[Figure: number of nodes over time: Flickr (F) exponential, Delicious (D) linear, Answers (A) sub‐linear, LinkedIn (L) quadratic]
How long do nodes live? Node lifetime is the time between the first and the last edge of a node.
How often do nodes “wake up” to create edges?
Lifetime a: time between a node’s first and last edge
[Figure: distribution of node lifetimes in LinkedIn]
Node lifetime is exponential: $p(a) = \lambda e^{-\lambda a}$
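A sketch of how lifetimes could be measured and the rate λ estimated, assuming each node's edge timestamps are available (as in the four datasets above) in a hypothetical mapping `edge_times`. For the exponential density $p(a) = \lambda e^{-\lambda a}$ the maximum‐likelihood estimate is $\hat\lambda = 1/\bar a$, the reciprocal of the mean lifetime. Both function names are hypothetical.

```python
def node_lifetimes(edge_times):
    """Lifetime a = time of a node's last edge minus time of its first edge.

    edge_times: dict mapping node -> list of timestamps of that node's edges."""
    return {v: max(ts) - min(ts) for v, ts in edge_times.items() if ts}


def fit_exponential_rate(lifetimes):
    """Maximum-likelihood lambda for p(a) = lam * exp(-lam * a): 1 / mean(a)."""
    values = list(lifetimes)
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("all lifetimes are zero; lambda is undefined")
    return 1.0 / mean


edge_times = {"u": [0, 2, 5], "v": [1, 4], "w": [3, 3, 9]}
lifetimes = node_lifetimes(edge_times)        # {'u': 5, 'v': 3, 'w': 6}
lam = fit_exponential_rate(lifetimes.values())
```

Nodes with a single edge get lifetime 0; whether to keep them in the fit is a modeling choice the slide does not settle.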
How often do nodes “wake up” to create edges?
Edge gap δ(d): time between the d‐th and the (d+1)‐st edge of a node.
Let $t_i(d)$ be the creation time of the d‐th edge of node i: $\delta_i(d) = t_i(d+1) - t_i(d)$
Then δ(d) is the distribution (histogram) of $\delta_i(d)$ over all nodes i
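The definition translates directly into code. A sketch, assuming a hypothetical mapping `edge_times` from each node to the creation times of its edges; the result maps each d to the list of gaps $\delta_i(d)$ over all nodes i, i.e., the raw data behind the histogram δ(d).

```python
from collections import defaultdict


def edge_gaps(edge_times):
    """Map d -> list of gaps delta_i(d) = t_i(d+1) - t_i(d) over all nodes i.

    edge_times: dict node -> creation times of that node's edges (any order).
    Edges are numbered from d = 1, as in the definition above."""
    gaps = defaultdict(list)
    for times in edge_times.values():
        ts = sorted(times)
        for d in range(1, len(ts)):
            # ts[d - 1] is t_i(d) and ts[d] is t_i(d + 1)
            gaps[d].append(ts[d] - ts[d - 1])
    return dict(gaps)


print(edge_gaps({"a": [0, 5, 1], "b": [2, 10]}))  # {1: [1, 8], 2: [4]}
```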
Edge gap δ(d): inter‐arrival time between the d‐th and (d+1)‐st edge
[Figure: edge gap distribution in LinkedIn]
For every d we get a different plot, each fit with a power law with exponential cutoff: $p_g(\delta; \alpha, \beta) \propto \delta^{-\alpha} e^{-\beta \delta}$
As the degree of the node increases, how do α and β change?
α is constant, β is linear in d: gaps get smaller with d
$p_g(\delta; \alpha, \beta, d) \propto \delta^{-\alpha} e^{-\beta d \delta}$
[Figure: probability vs. edge gap for degree d = 1, 2, 3]
Source node i wakes up and creates an edge. How does i select a target node j?
What is the degree of the target j? Does preferential attachment really hold?
How many hops away is the target j? Are edges attaching locally?
[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]
Are edges more likely to connect to higher‐degree nodes?
Probability that a new edge attaches to a node of degree k: $p_e(k) \propto k^{\tau}$
[Figure: p_e(k) vs. k for Gnp, PA and Flickr]

Network     τ
Gnp         0
PA          1
Flickr      1
Delicious   1
Answers     0.9
LinkedIn    0.6
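The slide does not say how $p_e(k)$ is measured. One natural estimator, assumed here, replays the time‐ordered edge list: each time an edge arrives, it records the target's current degree k and also how many nodes currently have degree k, so that $p_e(k)$ is the number of edges landing on degree‐k nodes divided by the accumulated count of degree‐k candidates. τ is then a least‐squares slope on log‐log axes. Both function names are hypothetical, degree counts in‐ and out‐links, and k = 0 is dropped so the logarithm is defined.

```python
import math
from collections import Counter


def attachment_by_degree(edges):
    """Estimate p_e(k) from a time-ordered list of (source, target) edges.

    p_e(k) = (# edges whose target had degree k at that moment)
             / (sum over edge events of # nodes that had degree k then).
    Nodes enter with degree 0 when first seen; k = 0 is dropped."""
    deg = {}                       # current degree of every node seen so far
    n_with_deg = Counter()         # number of nodes that currently have degree k
    hits, exposure = Counter(), Counter()
    for u, w in edges:
        for v in (u, w):
            if v not in deg:
                deg[v] = 0
                n_with_deg[0] += 1
        exposure.update(n_with_deg)    # candidates of each degree at this moment
        hits[deg[w]] += 1
        for v in (u, w):               # the new edge raises both endpoint degrees
            n_with_deg[deg[v]] -= 1
            deg[v] += 1
            n_with_deg[deg[v]] += 1
    return {k: hits[k] / exposure[k] for k in sorted(hits) if k > 0}


def fit_loglog_slope(points):
    """Least-squares slope of log y against log x (here: tau in p_e(k) ~ k**tau)."""
    pts = [(math.log(x), math.log(y)) for x, y in points if x > 0 and y > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den
```

Usage would be `fit_loglog_slope(attachment_by_degree(edges).items())`; under this estimator a τ near 0 means degree does not matter (as in Gnp) and τ near 1 means linear preferential attachment.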
[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]
Just before the edge (u,w) is placed, how many hops are there between u and w?
Fraction of triad‐closing edges:

Network     %Δ
Flickr      66%
Delicious   28%
Answers     23%
LinkedIn    50%

[Figure: number of hops between u and w just before the edge is placed, for Gnp, PA and Flickr; a triad‐closing edge (u,w) via a common neighbor v]
Real edges are local. Most of them close triangles!
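A sketch of how the fraction of triad‐closing edges could be measured from a time‐ordered edge list, treating edges as undirected (an assumption): an edge (u,w) closes a triad if, just before it is placed, u and w are not yet linked but share a neighbor v. The function name is hypothetical.

```python
from collections import defaultdict


def triad_closing_fraction(edges):
    """Fraction of new edges (u, w) whose endpoints were exactly 2 hops apart,
    i.e. had a common neighbor v, just before the edge was placed.

    edges: list of (u, w) pairs in order of creation, treated as undirected."""
    adj = defaultdict(set)
    closing = total = 0
    for u, w in edges:
        if u == w or w in adj[u]:
            continue                      # skip self-loops and repeated edges
        total += 1
        if adj[u] & adj[w]:               # a common neighbor exists
            closing += 1
        adj[u].add(w)
        adj[w].add(u)
    return closing / total if total else 0.0


# Edge (1, 3) closes the triangle 1-2-3; the other three edges do not.
print(triad_closing_fraction([(1, 2), (2, 3), (1, 3), (3, 4)]))  # 0.25
```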
A new triad‐closing edge (u,w) appears next. We model this as:
1. Choose u’s neighbor v
2. Choose v’s neighbor w
3. Connect (u,w)
Compute the edge probability under Random‐Random: $p(u,w) = \sum_{v \in N(u) \cap N(w)} \frac{1}{d_u} \cdot \frac{1}{d_v}$
“Score” of a graph $= \prod_{(u,w)} p(u,w)$
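A sketch of the Random‐Random strategy, assuming an undirected adjacency dict (node → set of neighbors). `random_random_prob` evaluates the two‐step probability p(u,w) and `random_random_target` samples w by following the three steps; both names are hypothetical. As written, the walk may come back to u or land on a current neighbor of u, and neither case is excluded here. Summing $\log p(u,w)$ over observed edges gives the log of the “score”.

```python
import random


def random_random_prob(adj, u, w):
    """p(u, w) = sum over common neighbors v of (1 / d_u) * (1 / d_v).

    adj: dict node -> set of neighbors (undirected)."""
    d_u = len(adj[u])
    return sum(1.0 / (d_u * len(adj[v])) for v in adj[u] if w in adj[v])


def random_random_target(adj, u, rng=random):
    """Sample w: 1. pick u's neighbor v uniformly, 2. pick v's neighbor w uniformly."""
    v = rng.choice(sorted(adj[u]))
    return rng.choice(sorted(adj[v]))


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(random_random_prob(adj, 0, 3))   # only v = 2 connects them: 1/(2*3)
```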
Improvement over the baseline:
[Figure: improvement for each combination of the strategy to select v (1st node) and the strategy to select w (2nd node)]
Strategies to pick a neighbor:
random: uniformly at random
deg: proportional to its degree
com: proportional to the number of common friends
last: proportional to the time since last activity
comlast: proportional to com*last
[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]
Theorem: Exponential node lifetimes and power‐law‐with‐exponential‐cutoff edge gaps lead to power‐law degree distributions
Interesting, as temporal behavior predicts a structural network property
[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]
Node lifetime: $p_l(a) = \lambda e^{-\lambda a}$
For a node of lifetime a, what is its final degree D?
What is the distribution of D as a function of λ, α, β?
The 2 exponential functions “cancel”. The power law survives.
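A simulation sketch of this result. Assumptions beyond the slide: a node's first edge is at time 0; with current degree d the next gap is drawn from the edge‐gap density $p_g(\delta) \propto \delta^{-\alpha} e^{-\beta d \delta}$, which for $0 \le \alpha < 1$ is exactly a Gamma distribution with shape $1-\alpha$ and rate $\beta d$; and the parameter values are illustrative, not fitted. Heuristically, the mean gap shrinks like 1/d, so reaching degree D takes time growing like $\frac{1-\alpha}{\beta}\ln D$; an exponentially distributed lifetime then turns into a power‐law tail in D, which is the sense in which the two exponentials “cancel”.

```python
import random


def simulate_final_degrees(n_nodes, lam, alpha, beta, seed=None):
    """Final degree D of each node under exponential lifetimes and
    power-law-with-exponential-cutoff edge gaps (a sketch).

    Lifetime a ~ Exp(lam). The 1st edge is at time 0; at degree d the gap to
    the next edge ~ Gamma(shape=1 - alpha, rate=beta * d), whose density is
    proportional to delta**-alpha * exp(-beta * d * delta)."""
    if not 0 <= alpha < 1:
        raise ValueError("the Gamma sampling trick needs 0 <= alpha < 1")
    rng = random.Random(seed)
    degrees = []
    for _ in range(n_nodes):
        lifetime = rng.expovariate(lam)
        t, d = 0.0, 1
        while True:
            # random.gammavariate takes a scale parameter, i.e. 1 / rate
            t += rng.gammavariate(1 - alpha, 1.0 / (beta * d))
            if t > lifetime:
                break
            d += 1
        degrees.append(d)
    return degrees


# Illustrative parameters only (not fitted to any of the networks above).
degrees = simulate_final_degrees(2000, lam=0.05, alpha=0.5, beta=0.01, seed=7)
```

A log‐log histogram of the returned degrees is one way to inspect the resulting tail.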
The model of network evolution:

Process               Model
P1) Node arrival      Node arrival function is given
P2) Edge initiation   Node lifetime is exponential; edge gaps get smaller as the degree increases
P3) Edge destination  Pick edge destination using random‐random
Given the model, one can take an existing network and continue its evolution
Compare the true and predicted degree exponent
How do networks evolve at the macro level? What are the global phenomena of network growth?
Questions:
What is the relation between the number of nodes n(t) and the number of edges e(t) over time t?
How does the diameter change as the network grows?
How does the degree distribution evolve as the network grows?
[Leskovec et al. KDD 05]
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 · N(t)
Q: what is E(t+1)?
A: over‐doubled! $E(t+1) > 2 \cdot E(t)$, but obeying the Densification Power Law
[w/ Kleinberg‐Faloutsos, KDD ’05]
What is the relation between the number of nodes and the number of edges over time?
Prior work assumes: constant average degree over time
Networks are denser over time. Densification Power Law: $E(t) \propto N(t)^{a}$
a … densification exponent ($1 \le a \le 2$)
[Figure: E(t) vs. N(t) on log‐log axes: Internet, a = 1.2; Citations, a = 1.6]
[Leskovec et al. KDD 05]
Densification Power Law: the number of edges grows faster than the number of nodes, so the average degree is increasing:
$E(t) \propto N(t)^{a}$, or equivalently $E(t)/N(t) \propto N(t)^{a-1}$
a … densification exponent, $1 \le a \le 2$:
a = 1: linear growth, constant out‐degree (traditionally assumed)
a = 2: quadratic growth, clique
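A sketch for estimating the densification exponent a from snapshots (N(t), E(t)) by a least‐squares fit on log‐log axes; the function name and the synthetic numbers are made up for illustration and are not data from the lecture.

```python
import math


def densification_exponent(snapshots):
    """Least-squares slope a of log E(t) against log N(t), assuming
    E(t) proportional to N(t)**a.

    snapshots: list of (N(t), E(t)) pairs with N(t), E(t) > 0."""
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Synthetic snapshots (not real data): E = 3 * N**1.5, so the fit returns ~1.5.
synthetic = [(n, 3 * n ** 1.5) for n in (100, 200, 400, 800, 1600)]
print(round(densification_exponent(synthetic), 6))  # 1.5
```

The exponent also quantifies the over‐doubling: with a = 1.2, doubling N(t) multiplies E(t) by $2^{1.2} \approx 2.30$.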
[w/ Kleinberg‐Faloutsos, KDD ’05]
Prior models and intuition say that the network diameter slowly grows (like log N, log log N)
[Figure: diameter vs. size of the graph (Internet) and diameter vs. time (Citations)]
Diameter shrinks over time: as the network grows, the distances between the nodes slowly decrease
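A sketch of an exact diameter computation with one breadth‐first search per node, assuming an undirected adjacency dict. This costs O(n·m), so for large graphs one would typically sample source nodes instead; that is an assumption here, not a statement of how the plots were produced. The example shows that adding a single edge can shrink the diameter.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest shortest-path length over all connected pairs (one BFS per node)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}   # path plus edge (0, 3)
print(diameter(path), diameter(cycle))  # 3 2
```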
[Leskovec et al. TKDD 07]
Is shrinking diameter just a consequence of densification?
[Figure: diameter vs. size of the graph for an Erdos‐Renyi random graph with densification exponent a = 1.3]
A densifying random graph has increasing diameter. There is more to shrinking diameter than just densification.
[Leskovec et al. TKDD 07]
Is it the degree sequence? Compare the diameter of a:
True network (red)
Random network with the same degree distribution (blue)
[Figure: diameter vs. size of the graph, Citations]
Densification + degree sequence give shrinking diameter
[Leskovec et al. TKDD 07]
How does the degree distribution evolve to allow for densification?
Option 1) Degree exponent γ is constant:
Fact 1: For degree exponent $1 < \gamma < 2$: $a = 2/\gamma$
[Figure: Email network]
[Leskovec et al. TKDD 07]
How does the degree distribution evolve to allow for densification?
Option 2) Exponent $\gamma_n$ evolves with graph size n:
Fact 2:
[Figure: Citation network]
[Leskovec et al. TKDD 07]
Let’s assume the community structure:
[Figure: University splits into Science and Arts; Science into CS and Math; Arts into Drama and Music]
One expects many within‐group friendships and fewer cross‐group ones
How hard is it to cross communities?
Self‐similar university community structure
Assume the cross‐community linking probability of nodes at tree‐distance h is:
$f(h) = c^{-h}$
where: c ≥ 1 … the Difficulty constant
h … tree‐distance
$n = 2^k$ nodes reside in the leaves of the b‐way community hierarchy (assume b = 2)
Each node then independently creates edges based on the community hierarchy: $f(h) = c^{-h}$
How many edges m are in a graph of n nodes?
The community tree evolves by adding a complete new level of nodes in each time step
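A sketch of the generative process for Community Guided Attachment. Assumptions: leaves are labeled 0..n−1 from left to right, so two leaves share an ancestor at height h exactly when their labels agree after integer division by $b^h$; every ordered pair is tried independently, which is O(n²) and only meant for small k. The helper `expected_out_degree` sums the exact expectation, using the fact that $(b-1)b^{h-1}$ leaves lie at tree distance h from any given leaf. All three function names are hypothetical.

```python
import random


def tree_distance(u, v, b):
    """Height h of the lowest common ancestor of leaves u and v in a complete
    b-ary community tree whose leaves are labeled 0..n-1 from left to right."""
    h = 0
    while u != v:
        u, v, h = u // b, v // b, h + 1
    return h


def community_guided_attachment(k, b=2, c=1.5, seed=None):
    """Directed graph on n = b**k leaves: each node u links to every other node v
    independently with probability f(h) = c**(-h), h = tree_distance(u, v)."""
    rng = random.Random(seed)
    n = b ** k
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < c ** -tree_distance(u, v, b)]


def expected_out_degree(k, b, c):
    """Exact expectation: (b - 1) * b**(h - 1) leaves lie at tree distance h."""
    return sum((b - 1) * b ** (h - 1) * c ** -h for h in range(1, k + 1))
```

With c = 1 every pair links (a clique, matching a = 2); with c = b, `expected_out_degree(k, b, b)` equals k(b−1)/b and so grows only logarithmically in n.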
[Leskovec et al. TKDD 07]
Claim: In the Community Guided Attachment graph model, the expected out‐degree of a node is proportional to $n^{1-\log_b c}$
[Leskovec et al. TKDD 07]
What is the link probability? $p(u,v) = c^{-h(u,v)}$
What is the expected out‐degree of a node x?
How many nodes are at distance h?
Analyze separate cases:
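One way to carry out the case analysis, as a derivation sketch under the model's assumptions (a complete b‐ary tree of depth $\log_b n$, link probability $c^{-h}$, and $(b-1)b^{h-1}$ nodes at tree distance h from x). It reproduces the claimed $n^{1-\log_b c}$ out‐degree for $1 \le c < b$; at c = b the out‐degree grows only logarithmically, so a = 1 holds up to a log factor.

```latex
% Nodes at tree distance h from x: (b-1) b^{h-1}, for h = 1, ..., \log_b n
\mathbb{E}[d_{\text{out}}(x)]
  = \sum_{h=1}^{\log_b n} (b-1)\, b^{h-1} c^{-h}
  = \frac{b-1}{b} \sum_{h=1}^{\log_b n} \left(\frac{b}{c}\right)^{h}

% Case c = b: every term equals 1
\mathbb{E}[d_{\text{out}}(x)] = \frac{b-1}{b}\,\log_b n

% Case c < b: the geometric series is dominated by its last term
\mathbb{E}[d_{\text{out}}(x)]
  = \Theta\!\left(\left(\frac{b}{c}\right)^{\log_b n}\right)
  = \Theta\!\left(\frac{n}{c^{\log_b n}}\right)
  = \Theta\!\left(n^{1-\log_b c}\right)

% Case c > b: the series converges
\mathbb{E}[d_{\text{out}}(x)] = \Theta(1)

% Total edges for 1 \le c < b:
m = n \cdot \mathbb{E}[d_{\text{out}}(x)] \propto n^{2-\log_b c}
```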
Can also generalize the model to get power‐law degrees and densification [see TKDD 07]
Claim: The Community Guided Attachment leads to the Densification Power Law with exponent:
$a = 2 - \log_b c$
a … densification exponent
b … community tree branching factor
c … difficulty constant, 1 ≤ c ≤ b
DPL: $a = 2 - \log_b c$
Gives any non‐integer densification exponent
If c = 1: easy to cross communities. Then a = 2: quadratic growth of edges, near clique
If c = b: hard to cross communities. Then a = 1: linear growth of edges, constant out‐degree
[Leskovec et al. TKDD 07]
But we do not want to have explicit communities. We want to model graphs that densify and have shrinking diameters.
Intuition: How do we meet friends at a party? How do we identify references when writing papers?
[Figure: new node v linking through an ambassador w]
[Leskovec et al. TKDD 07]
The Forest Fire model has 2 parameters:
p … forward burning probability
r … backward burning probability
The model:
Each turn a new node v arrives
v chooses an “ambassador” w uniformly at random
Flip 2 geometric coins to determine the number of in‐ and out‐links of w to follow
Fire spreads recursively until it dies
New node v links to all burned nodes
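A minimal sketch of the model, assuming the geometric coins have means $p/(1-p)$ for out‐links and $r/(1-r)$ for in‐links and that the fire visits each node at most once; these parameterization details, the stack‐based spreading order and the names `forest_fire` and `geometric` are choices made here, not taken from the slide.

```python
import random


def geometric(p, rng):
    """Number of successes before the first failure: mean p / (1 - p)."""
    k = 0
    while rng.random() < p:
        k += 1
    return k


def forest_fire(n, p, r, seed=None):
    """Return out-neighbor sets of a Forest Fire graph on nodes 0..n-1.

    p: forward burning probability (follow out-links of a burning node)
    r: backward burning probability (follow in-links of a burning node)"""
    if not (0 <= p < 1 and 0 <= r < 1):
        raise ValueError("p and r must lie in [0, 1)")
    rng = random.Random(seed)
    out_nbrs, in_nbrs = {0: set()}, {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)        # uniform over existing nodes
        burned, frontier = {ambassador}, [ambassador]
        while frontier:                      # fire spreads until it dies out
            w = frontier.pop()
            x, y = geometric(p, rng), geometric(r, rng)
            outs = [u for u in sorted(out_nbrs[w]) if u not in burned]
            ins = [u for u in sorted(in_nbrs[w]) if u not in burned]
            chosen = rng.sample(outs, min(x, len(outs))) + rng.sample(ins, min(y, len(ins)))
            for u in chosen:
                if u not in burned:          # u may be both an out- and in-neighbor
                    burned.add(u)
                    frontier.append(u)
        out_nbrs[v], in_nbrs[v] = set(burned), set()   # v links to every burned node
        for u in burned:
            in_nbrs[u].add(v)
    return out_nbrs


g = forest_fire(2000, p=0.35, r=0.2, seed=0)
print(sum(len(s) for s in g.values()))   # total number of edges
```

With p = r = 0 the fire never spreads past the ambassador, so every new node creates exactly one link; raising p makes each arrival burn, and link to, larger neighborhoods.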
Forest Fire generates graphs that densify and have shrinking diameter
[Figure: densification, E(t) vs. N(t) with slope 1.32; diameter vs. N(t)]
Forest Fire also generates graphs with power‐law degree distributions
[Figure: log count vs. log in‐degree; log count vs. log out‐degree]
Fix the backward burning probability r and vary the forward burning probability p
[Figure: phase plot with regions: sparse graph with increasing diameter, constant diameter, decreasing diameter, clique‐like graph]
Notice a sharp transition between sparse and clique‐like graphs
The sweet spot is very narrow