
CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University

http://cs224w.stanford.edu

Power-law degree distributions

What do power-law degree networks look like?

A function f is scale-free if: f(ax) = c f(x)

10/27/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

Random network (Erdos-Renyi random graph)

Scale‐free (power‐law) network

In the Preferential Attachment model, power-law degrees naturally emerge [Albert-Barabasi '99]:
 Nodes arrive in order
 A new node j creates m out-links
 The probability of linking to a node i is proportional to its degree d_i:

 $P(j \to i) = \frac{d_i}{\sum_k d_k}$

Note: Preferential Attachment is not the only model to generate power-law networks.
What are other mechanisms giving power-law degree networks?
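The growth rule can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names, the initial clique on m + 1 nodes, and the choice of m distinct targets per new node are all assumptions made here.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Grow a graph with n nodes; each new node j creates m out-links.

    A target i is picked with probability proportional to its degree d_i
    by sampling uniformly from a list that holds each node once per
    incident edge endpoint.
    """
    rng = random.Random(seed)
    # Assumption: start from a clique on m + 1 nodes so all degrees are nonzero.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    endpoints = [x for e in edges for x in e]
    for j in range(m + 1, n):
        targets = set()
        while len(targets) < m:            # m distinct targets, no multi-edges
            targets.add(rng.choice(endpoints))
        for i in targets:
            edges.append((j, i))
            endpoints.extend((j, i))
    return edges

def degree_histogram(edges):
    """Map degree -> number of nodes with that degree."""
    degree = Counter(x for e in edges for x in e)
    return Counter(degree.values())

edges = preferential_attachment(n=2000, m=2)
hist = degree_histogram(edges)
for k in sorted(hist)[:5]:
    print(k, hist[k])
```

Plotting `hist` on log-log axes should show the heavy tail; the endpoint list is what makes the degree-proportional choice cheap.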


Preferential attachment:
 Power-law degree distributions (node degrees)
 But no local clustering (clustering coefficient)

Can we get multiple properties?


Preferential attachment is a model of a growing network

What governs the network growth and evolution?
P1) Node arrival process: When nodes enter the network
P2) Edge initiation process: Each node decides when to initiate an edge
P3) Edge destination process: The node determines destination of the edge


[Leskovec et al. KDD 08]

4 online social networks with exact edge arrival sequence
For every edge (u,v) we know the exact time of its appearance t_uv (and so on for millions of edges)

Directly observe mechanisms leading to global network properties

[Figure: four panels, one per network. (F) Flickr: exponential; (D) Delicious: linear; (A) Answers: sub-linear; (L) LinkedIn: quadratic]

How long do nodes live?
 Node lifetime is the time between the 1st and the last edge of a node

How often do nodes "wake up" to create edges?


Lifetime a: time between a node's first and last edge

[Figure: LinkedIn]

Node lifetime is exponential: p(a) = λ exp(-λa)
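A short sketch of how the rate λ could be estimated from observed lifetimes. The data here is synthetic and the function names are hypothetical; the estimator is the standard maximum-likelihood one for an exponential distribution (one over the sample mean).

```python
import math
import random

def sample_lifetimes(lam, n, seed=0):
    """Draw n synthetic lifetimes from p(a) = lam * exp(-lam * a)."""
    rng = random.Random(seed)
    return [rng.expovariate(lam) for _ in range(n)]

def fit_lambda(lifetimes):
    """Maximum-likelihood estimate of the rate: 1 / mean lifetime."""
    return len(lifetimes) / sum(lifetimes)

def density(a, lam):
    """Exponential density p(a) = lam * exp(-lam * a)."""
    return lam * math.exp(-lam * a)

lifetimes = sample_lifetimes(lam=0.01, n=10000)   # 0.01 is an arbitrary rate
lam_hat = fit_lambda(lifetimes)
print(lam_hat)
```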

How often do nodes "wake up" to create edges?
Edge gap δ(d): time between the d-th and (d+1)-st edge of a node:
 Let t_i(d) be the creation time of the d-th edge of node i
 δ_i(d) = t_i(d+1) - t_i(d)
 Then δ(d) is a distribution (histogram) of δ_i(d) over all nodes i
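The definition of the edge gap translates directly into code. The sketch below assumes edges arrive as (time, source, target) triples and that only the initiating (source) node's clock is of interest; both the input layout and the function name are assumptions for illustration.

```python
from collections import defaultdict

def edge_gaps(edges):
    """Collect delta_i(d) = t_i(d+1) - t_i(d) for every node i, grouped by d.

    edges: iterable of (time, u, v), where u initiates the edge.
    Returns {d: [gap, ...]} with d counted from 1.
    """
    times = defaultdict(list)
    for t, u, v in sorted(edges):
        times[u].append(t)               # assumption: gaps are per source node
    gaps = defaultdict(list)
    for ts in times.values():
        for d in range(1, len(ts)):
            # ts is 0-based: ts[d - 1] is t_i(d), ts[d] is t_i(d + 1)
            gaps[d].append(ts[d] - ts[d - 1])
    return dict(gaps)

example = [(0, "a", "b"), (3, "a", "c"), (4, "b", "c"), (10, "a", "d"), (9, "b", "d")]
print(edge_gaps(example))
```

Each list `gaps[d]` is the raw sample behind the histogram δ(d).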


Edge gap δ(d): inter-arrival time between the d-th and (d+1)-st edge

[Figure: LinkedIn]

For every d we get a different plot

$p_g(\delta(d); \alpha, \beta) \propto \delta(d)^{-\alpha} e^{-\beta \delta(d)}$

As the degree of the node increases, how do α and β change?


α is constant, β is linear in d: gaps get smaller with d

$p_g(\delta \mid d; \alpha, \beta) \propto \delta^{-\alpha} e^{-\beta d \delta}$

[Figure: probability vs. edge gap, one curve each for degree d=1, d=2, d=3]
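To see why gaps shrink with degree, one can evaluate the mean gap numerically. This sketch assumes the gap distribution is a power law with exponential cutoff whose cutoff rate is β·d, discretized on integer gaps 1..max_gap; the parameter values α = 0.8 and β = 0.002 are arbitrary and not taken from the data.

```python
import math

def gap_pmf(d, alpha, beta, max_gap=10000):
    """Discrete gap distribution with weights g^(-alpha) * exp(-beta * d * g)."""
    weights = [g ** -alpha * math.exp(-beta * d * g) for g in range(1, max_gap + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def mean_gap(d, alpha, beta, max_gap=10000):
    """Expected gap for a node of current degree d."""
    pmf = gap_pmf(d, alpha, beta, max_gap)
    return sum(g * p for g, p in enumerate(pmf, start=1))

for d in (1, 2, 3):
    print(d, round(mean_gap(d, alpha=0.8, beta=0.002), 1))
```

With α fixed, a larger d tightens the exponential cutoff, so the mean gap decreases as d grows.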

Source node i wakes up and creates an edge
How does i select a target node j?
 What is the degree of the target j?
  Does preferential attachment really hold?
 How many hops away is the target j?
  Are edges attaching locally?


[w/ Backstrom‐Kumar‐Tomkins, KDD ’08]

Are edges more likely to connect to higher degree nodes?

$p_e(k) \propto k^{\tau}$

Network    τ
Gnp        0
PA         1
Flickr     1
Delicious  1
Answers    0.9
LinkedIn   0.6
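The exponent τ interpolates between uniform target choice and linear preferential attachment. A minimal sketch, assuming the attachment probability of a node with degree k is proportional to k^τ (the function name and the toy degrees are made up for illustration):

```python
def attachment_probabilities(degrees, tau):
    """Probability of picking each node when p_e(k) is proportional to k**tau.

    degrees: dict mapping node -> current degree.
    """
    weights = {node: k ** tau for node, k in degrees.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

degrees = {"a": 1, "b": 2, "c": 4}
print(attachment_probabilities(degrees, tau=0))    # uniform, as in Gnp
print(attachment_probabilities(degrees, tau=1))    # linear in degree, as in PA
print(attachment_probabilities(degrees, tau=0.6))  # sub-linear preference
```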



Just before the edge (u,w) is placed, how many hops are between u and w?

Fraction of triad-closing edges:

Network    %Δ
Flickr     66%
Delicious  28%
Answers    23%
LinkedIn   50%

[Figure: panels including Gnp, PA, and Flickr; sketch of an edge (u,w) closing a triangle through v]

Real edges are local. Most of them close triangles!
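The fraction of triad-closing edges can be measured by replaying the edge sequence. The sketch below treats edges as undirected and counts an edge (u,w) as triad-closing when u and w already share a neighbor just before it is placed; the function name and the toy sequence are assumptions.

```python
from collections import defaultdict

def triad_closing_fraction(edges):
    """Fraction of edges (u, w) whose endpoints were exactly 2 hops apart.

    edges: list of (u, w) pairs in arrival order, assumed to have no duplicates.
    """
    neighbors = defaultdict(set)
    closing = 0
    for u, w in edges:
        if neighbors[u] & neighbors[w]:    # a common neighbor v exists
            closing += 1
        neighbors[u].add(w)
        neighbors[w].add(u)
    return closing / len(edges)

sequence = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]
print(triad_closing_fraction(sequence))
```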

New triad-closing edge (u,w) appears next
We model this as:
1. Choose u's neighbor v
2. Choose v's neighbor w
3. Connect (u,w)

Compute edge prob. under Random-Random: p(u,w)
"Score" of a graph = p(u,w)
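Under the random-random strategy both choices are uniform, so p(u,w) can be computed by summing over the intermediate nodes v. This is a sketch under stated assumptions: the graph is an undirected adjacency dict, and v picks uniformly among all of its neighbors (including u).

```python
def random_random_prob(adj, u, w):
    """Probability that u reaches w by two uniform neighbor choices.

    adj: dict mapping node -> set of neighbors.
    Sums (1 / d_u) * (1 / d_v) over every neighbor v of u adjacent to w.
    """
    total = 0.0
    for v in adj[u]:
        if w in adj[v]:
            total += (1 / len(adj[u])) * (1 / len(adj[v]))
    return total

adj = {
    "u": {"v1", "v2"},
    "v1": {"u", "w"},
    "v2": {"u", "w", "x"},
    "w": {"v1", "v2"},
    "x": {"v2"},
}
print(random_random_prob(adj, "u", "w"))   # 1/2 * 1/2 + 1/2 * 1/3
```

Other strategies (deg, com, last) would replace the two uniform factors with the corresponding weights.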

Improvement over the baseline:

[Table: rows are the strategy to select v (1st node), columns are the strategy to select w (2nd node)]

Strategies to pick a neighbor:
 random: uniformly at random
 deg: proportional to its degree
 com: prop. to the number of common friends
 last: prop. to time since last activity
 comlast: prop. to com*last


Theorem: Exponential node lifetimes and power-law with exponential cutoff edge gaps lead to power-law degree distributions

Interesting as temporal behavior predicts structural network property



Node lifetime: $p_l(a) = \lambda e^{-\lambda a}$
For a node of lifetime a, what is its final degree D?

What is the distribution of D as a function of λ, α, β?

The 2 exponential functions "cancel". Power-law survives

The model of network evolution

Process               Model
P1) Node arrival      Node arrival function is given
P2) Edge initiation   Node lifetime is exponential; edge gaps get smaller as the degree increases
P3) Edge destination  Pick edge destination using random-random

Given the model, one can take an existing network and continue its evolution

Compare true and predicted degree exponent:


How do networks evolve at the macro level?
 What are global phenomena of network growth?

Questions:
 What is the relation between the number of nodes n(t) and number of edges e(t) over time t?
 How does diameter change as the network grows?
 How does degree distribution evolve as the network grows?



N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is E(t+1)?
A: over-doubled! But obeying the Densification Power Law


[w/ Kleinberg‐Faloutsos, KDD ’05]

What is the relation between the number of nodes and the edges over time?
Prior work assumes: constant average degree over time
Networks are denser over time
Densification Power Law: $E(t) \propto N(t)^a$
a … densification exponent (1 ≤ a ≤ 2)

[Figures: E(t) vs. N(t). Internet: a=1.2; Citations: a=1.6]
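The densification exponent is the slope of log E(t) against log N(t), so it can be estimated with an ordinary least-squares fit. The sketch below uses synthetic data generated with a = 1.2; the function name and the constants are illustrative only.

```python
import math

def fit_densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) versus log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic snapshots: the node count doubles at every step.
nodes = [1000 * 2 ** t for t in range(6)]
edges = [3 * n ** 1.2 for n in nodes]

print(fit_densification_exponent(nodes, edges))
# When N doubles, E grows by a factor of 2**a, which exceeds 2 whenever a > 1.
print(edges[1] / edges[0], 2 ** 1.2)
```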


Densification Power Law: the number of edges grows faster than the number of nodes, so the average degree is increasing:

$E(t) \propto N(t)^a$, or equivalently, $\log E(t)$ grows linearly in $\log N(t)$ with slope a

a … densification exponent, 1 ≤ a ≤ 2:
 a=1: linear growth, constant out-degree (traditionally assumed)
 a=2: quadratic growth, clique


[w/ Kleinberg‐Faloutsos, KDD ’05]

Prior models and intuition say that the network diameter slowly grows (like log N, log log N)

[Figures: diameter vs. size of the graph and vs. time, for the Internet and Citations networks]

Diameter shrinks over time: as the network grows, the distances between the nodes slowly decrease


[Leskovec et al. TKDD 07]

Is shrinking diameter just a consequence of densification?

[Figure: Erdos-Renyi random graph, densification exponent a = 1.3; diameter vs. size of the graph]

Densifying random graph has increasing diameter
There is more to shrinking diameter than just densification


Is it the degree sequence?
Compare diameter of a:
 True network (red)
 Random network with the same degree distribution (blue)

[Figure: Citations; diameter vs. size of the graph]

Densification + degree sequence give shrinking diameter



How does degree distribution evolve to allow for densification?

Option 1) Degree exponent γ is constant:
 Fact 1: For degree exponent 1 < γ < 2: a = 2/γ

[Figure: Email network]



How does degree distribution evolve to allow for densification?

Option 2) Exponent γ_n evolves with graph size n:
 Fact 2:

[Figure: Citation network]



Let's assume the community structure
One expects many within-group friendships and fewer cross-group ones
How hard is it to cross communities?

[Figure: self-similar university community structure. University splits into Science (CS, Math) and Arts (Drama, Music)]


Assume the cross-community linking probability of nodes at tree-distance h is:

$f(h) = c^{-h}$

where: c ≥ 1 … the difficulty constant
 h … tree-distance


n = 2^k nodes reside in the leaves of the b-way community hierarchy (assume b=2)

Each node then independently creates edges based on the community hierarchy: $f(h) = c^{-h}$

How many edges m are in a graph of n nodes?
Community tree evolves by a complete new level of nodes being added in each time step



Claim: In the Community Guided Attachment graph model, the expected out-degree of a node is proportional to:
 $n^{1 - \log_b c}$ if 1 ≤ c < b
 $\log_b n$ if c = b



What is the link prob.: $p(u,v) = c^{-h(u,v)}$
What is the expected out-degree of a node x?

How many nodes are at distance h?

Analyze separate cases:


Can also generalize the model to get power-law degrees and densification [see TKDD 07]

Claim: The Community Guided Attachment leads to Densification Power Law with exponent:

$a = 2 - \log_b c$

a … densification exponent
b … community tree branching factor
c … difficulty constant, 1 ≤ c ≤ b


DPL: gives any non-integer densification exponent
If c = 1: easy to cross communities
 Then: a=2, quadratic growth of edges, near clique
If c = b: hard to cross communities
 Then: a=1, linear growth of edges, constant out-degree
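The two limiting cases can be checked numerically. The sketch below assumes a complete b-ary community tree of height k with n = b^k leaves, tree-distance h defined as the height of the smallest subtree containing both nodes (so (b-1)·b^(h-1) leaves sit at distance h from any given leaf), and link probability c^(-h). Function names are hypothetical.

```python
import math

def expected_out_degree(k, b=2, c=1.5):
    """Expected out-degree of a leaf: sum over h of (#leaves at distance h) * c**-h."""
    return sum((b - 1) * b ** (h - 1) * c ** -h for h in range(1, k + 1))

def expected_edges(k, b=2, c=1.5):
    """Expected number of edges m in a graph with n = b**k nodes."""
    return b ** k * expected_out_degree(k, b, c)

def empirical_exponent(k, b=2, c=1.5):
    """Slope of log m versus log n between tree heights k - 1 and k."""
    ratio = expected_edges(k, b, c) / expected_edges(k - 1, b, c)
    return math.log(ratio) / math.log(b)

for c in (1, 1.5, 2):
    print(c, round(empirical_exponent(40, b=2, c=c), 3), round(2 - math.log(c, 2), 3))
```

For c = b the measured slope approaches 1 only slowly, because the out-degree still grows like log n in that case.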



But we do not want to have explicit communities
Want to model graphs that densify and have shrinking diameters

Intuition:
 How do we meet friends at a party?
 How do we identify references when writing papers?

[Figure: new node v and its contact w]


The Forest Fire model has 2 parameters:
 p … forward burning probability
 r … backward burning probability
The model:
 Each turn a new node v arrives
 Uniformly at random chooses an "ambassador" w
 Flip 2 geometric coins to determine the number of in- and out-links of w to follow
 Fire spreads recursively until it dies
 New node v links to all burned nodes
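The steps of the model can be sketched as a small simulation. Several details are assumptions made here rather than taken from the slide: the geometric coins are parameterized directly by p (out-links) and r (in-links), links to follow are picked uniformly among unburned neighbors, and the graph starts from a single node.

```python
import random

def geometric(rng, prob):
    """Number of successes before the first failure (mean prob / (1 - prob))."""
    count = 0
    while rng.random() < prob:
        count += 1
    return count

def forest_fire(n, p, r, seed=0):
    """Return {node: set of out-neighbors} for a Forest Fire graph on n nodes."""
    rng = random.Random(seed)
    out_links = {0: set()}
    in_links = {0: set()}
    for v in range(1, n):
        ambassador = rng.randrange(v)          # uniform over existing nodes
        burned = {ambassador}
        frontier = [ambassador]
        while frontier:                        # fire spreads until it dies
            w = frontier.pop()
            forward = [x for x in sorted(out_links[w]) if x not in burned]
            rng.shuffle(forward)
            forward = forward[:geometric(rng, p)]
            burned.update(forward)
            backward = [x for x in sorted(in_links[w]) if x not in burned]
            rng.shuffle(backward)
            backward = backward[:geometric(rng, r)]
            burned.update(backward)
            frontier.extend(forward + backward)
        out_links[v] = set(burned)             # v links to all burned nodes
        in_links[v] = set()
        for x in burned:
            in_links[x].add(v)
    return out_links

graph = forest_fire(n=1000, p=0.35, r=0.3)
print(sum(len(targets) for targets in graph.values()), "edges on", len(graph), "nodes")
```

Recording the edge count at several values of n gives the E(t) versus N(t) curve needed to check densification.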

Forest Fire generates graphs that densify and have shrinking diameter

[Figures: densification, E(t) vs. N(t), with slope 1.32; diameter vs. N(t)]

Forest Fire also generates graphs with Power-Law degree distribution

[Figures: in-degree, log count vs. log in-degree; out-degree, log count vs. log out-degree]

Fix backward probability r and vary forward burning probability p

Notice a sharp transition between sparse and clique-like graphs

Sweet spot is very narrow

[Figure: regions labeled sparse graph, clique-like graph, increasing diameter, constant diameter, decreasing diameter]
