Traffic-driven model of the World-Wide-Web Graph
A. Barrat, LPT, Orsay, France
M. Barthélemy, CEA, France
A. Vespignani, LPT, Orsay, France
Feb 10, 2016
Outline
• The WebGraph
• Some empirical characteristics
• Various models
• Weights and strengths
• Our model:
  - Definition
  - Analysis: analytics + numerics
• Conclusions
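The Web as a directed graph
[Figure: nodes i, j and l joined by directed links]
• Nodes i: web pages
• Directed links: hyperlinks
• In- and out-degrees: number of hyperlinks pointing to a page (k_in) and leaving it (k_out)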
• Small world: captured by Erdős-Rényi graphs
• With probability p, an edge is established between each pair of vertices
• Poisson degree distribution, with average degree <k> = p N
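The Erdős-Rényi construction above can be sketched in a few lines. This is only an illustration in plain Python; the function names and parameter values are hypothetical, not taken from the slides. For finite N the exact mean degree is p (N - 1), which is approximately p N when N is large.

```python
import random

def erdos_renyi(n, p, seed=0):
    """Return adjacency sets of an undirected G(n, p) random graph."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Each pair of vertices is linked independently with probability p.
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def mean_degree(adj):
    """Average number of neighbours per vertex."""
    return sum(len(nb) for nb in adj.values()) / len(adj)

adj = erdos_renyi(n=500, p=0.02, seed=1)
print(mean_degree(adj), 0.02 * (500 - 1))  # measured vs. expected <k>
```

The two printed values should be close but not identical, since the graph is random.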
Empirical facts
• Small world
• Large clustering: different neighbours of a node are likely to know each other
[Figure: node n with neighbours 1, 2, 3; the neighbours have a higher probability of being connected]
=> graph models with large clustering, e.g. Watts-Strogatz 1998
Empirical facts
• Small world
• Large clustering
• Dynamical network
• Broad connectivity distributions
  - also observed in many other contexts (from biological to social networks)
  - huge modeling activity
(Barabási-Albert 1999; Broder et al. 2000; Kumar et al. 2000; Adamic-Huberman 2001; Laura et al. 2003)
Various growing network models
• Barabási-Albert (1999): preferential attachment
• Many variations on the BA model: rewiring (Tadic 2001, Krapivsky et al. 2001), addition of edges, directed model (Dorogovtsev-Mendes 2000, Cooper-Frieze 2001), fitness (Bianconi-Barabási 2001), ...
• Kumar et al. (2000): copying mechanism
• Pandurangan et al. (2002): PageRank + preferential attachment
• Laura et al. (2002): multi-layer model
• Menczer (2002): textual content of web pages
The Web as a directed graph
• Broad P(k_in); cut-off for P(k_out)
(Broder et al. 2000; Kumar et al. 2000; Adamic-Huberman 2001; Laura et al. 2003)
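Additional level of complexity: Weights and Strengths
[Figure: nodes i, j and l; the link from i to j carries weight w_ij]
• Links carry weights/traffic: w_ij
• In- and out-strengths: s_in and s_out, the total weight carried by a node's incoming and outgoing links
• Adamic-Huberman 2001: broad distribution of s_in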
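As a small illustration of the strengths just introduced, the sketch below sums link weights per node. It assumes the usual convention (s_out of i sums w_ij over links leaving i, s_in of j sums w_ij over links arriving at j); the dictionary layout and the function name are hypothetical.

```python
from collections import defaultdict

def strengths(weights):
    """Compute in- and out-strengths from a dict {(i, j): w_ij} of directed links i -> j."""
    s_in = defaultdict(float)
    s_out = defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w  # traffic leaving page i
        s_in[j] += w   # traffic arriving at page j
    return dict(s_in), dict(s_out)

# Toy example: three pages and three weighted hyperlinks.
links = {("a", "b"): 2.0, ("c", "b"): 1.5, ("b", "a"): 0.5}
s_in, s_out = strengths(links)
print(s_in["b"], s_out["b"])  # 3.5 0.5
```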
Model: directed network
[Figure: new node n linking to node i, which links to node j]
(i) Growth
(ii) Strength-driven preferential attachment (each new node n creates k_out = m out-links)
AND...
Weights reinforcement mechanism: “Busy gets busier”
• The new traffic n→i increases the traffic i→j
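The growth, attachment and reinforcement steps above can be simulated with a short sketch. Several details are assumptions, because the slides do not spell them out: new links carry weight `w0 = 1`; the reinforcement spreads a total extra traffic `delta` over the out-links of the chosen node i, in proportion to their current weights; the seed is a complete directed graph on m + 1 nodes; and a small offset `a` is added to s_in so that new pages (which start with s_in = 0) can be chosen at all. All names are hypothetical.

```python
import random

def traffic_driven_graph(n_nodes, m=2, delta=0.5, w0=1.0, a=0.1, seed=0):
    """Grow a weighted directed graph; returns (weights, s_in).

    weights[i][j] is the weight w_ij of the directed link i -> j.
    """
    rng = random.Random(seed)
    n0 = m + 1
    # Assumed seed: every initial node points to all other initial nodes.
    weights = {i: {j: w0 for j in range(n0) if j != i} for i in range(n0)}
    s_in = [m * w0] * n0
    for n in range(n0, n_nodes):
        # (ii) Strength-driven attachment: pick m distinct targets with
        # probability proportional to s_in + a (the offset a is an assumption).
        attract = [s + a for s in s_in]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(n), weights=attract, k=1)[0])
        # (i) Growth: add the new node n with m out-links.
        weights[n] = {}
        s_in.append(0.0)
        for i in sorted(targets):
            weights[n][i] = w0
            s_in[i] += w0
            # "Busy gets busier": the new traffic n -> i increases the traffic
            # on each link i -> j (assumed proportional rule, total delta).
            s_out_i = sum(weights[i].values())
            for j in weights[i]:
                dw = delta * weights[i][j] / s_out_i
                weights[i][j] += dw
                s_in[j] += dw
    return weights, s_in

weights, s_in = traffic_driven_graph(2000, m=2, delta=0.5, seed=1)
n_links = sum(len(out) for out in weights.values())
print(sum(s_in) / n_links)  # average weight, approaches w0 + delta for large graphs
```

Under these assumptions every node keeps exactly m out-links, while its out-strength keeps growing each time it receives a new in-link.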
Evolution equations (continuous approximation)
• The equations contain a coupling term
Resolution
• Ansatz, supported by numerics
Results
Approximation:
• Total in-weight Σ_i s_in,i: approximately proportional to the total number of in-links Σ_i k_in,i, times the average weight <w> = 1 + δ
• Then: A = 1 + δ
• Exponent of P(s_in): γ ∈ [2, 2 + 1/m]
Measure of A
prediction of
Numerical simulations
Approx of
Numerical simulations
NB: P(s_out) is broad even though k_out = m for every node
Clustering spectrum
• Clustering of node i: fraction of connected pairs of neighbours of node i
• δ increases => clustering increases
• New pages: point to various well-known pages, often connected together => large clustering for small nodes
• Old, popular pages with large k: many in-links from many less popular pages which are not connected together => smaller clustering for large nodes
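The clustering spectrum discussed above can be measured with the sketch below. It assumes link directions are ignored (undirected adjacency sets) and that the spectrum is the average clustering of nodes with the same degree k; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of neighbours of i that are themselves connected."""
    nb = adj[i]
    k = len(nb)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nb, 2) if v in adj[u])
    return 2.0 * linked / (k * (k - 1))

def clustering_spectrum(adj):
    """Average local clustering of all nodes with the same degree k."""
    by_k = defaultdict(list)
    for i in adj:
        by_k[len(adj[i])].append(local_clustering(adj, i))
    return {k: sum(c) / len(c) for k, c in sorted(by_k.items())}

# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_spectrum(adj))  # degree 2 -> 1.0, degree 3 -> 1/3
```

In this toy graph the highest-degree node has the lowest clustering among nodes with at least two neighbours, the same qualitative trend as described for old, popular pages.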
Clustering and weighted clustering
• Weighted clustering takes into account the relevance of triangles in the global traffic
• Weighted clustering larger than topological clustering: triangles carry a large part of the traffic
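The comparison above can be illustrated with a small sketch. The slide's formula is not recoverable, so the code assumes a common definition for undirected weighted graphs: each connected pair of neighbours (j, h) of node i contributes (w_ij + w_ih), and the sum is normalised by s_i (k_i - 1), where s_i is the total weight on the links of i. With equal weights this reduces to the topological clustering. Names and data are hypothetical.

```python
from itertools import combinations

def clustering_pair(w, i):
    """Return (topological, weighted) clustering of node i.

    w[i][j] is the symmetric weight of the undirected link i - j.
    """
    nb = list(w[i])
    k = len(nb)
    if k < 2:
        return 0.0, 0.0
    s = sum(w[i].values())  # strength of node i
    triangles = [(u, v) for u, v in combinations(nb, 2) if v in w[u]]
    topo = 2.0 * len(triangles) / (k * (k - 1))
    weighted = sum(w[i][u] + w[i][v] for u, v in triangles) / (s * (k - 1))
    return topo, weighted

# Node 0 has three neighbours; its two heavy links close a triangle.
w = {
    0: {1: 5.0, 2: 5.0, 3: 1.0},
    1: {0: 5.0, 2: 1.0},
    2: {0: 5.0, 1: 1.0},
    3: {0: 1.0},
}
print(clustering_pair(w, 0))  # weighted value exceeds the topological one
```

Here the only triangle around node 0 uses its two heaviest links, so the weighted clustering (10/22) is larger than the topological one (1/3).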
Assortativity
• k_nn: average connectivity of the nearest neighbours of node i
• k_nn: disassortative behaviour, as usual in growing network models, and typical of technological networks
• Lack of correlations in popularity as measured by the in-degree
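The average nearest-neighbour connectivity can be computed as in the sketch below. It assumes undirected adjacency sets and averages k_nn over nodes of equal degree k; a k_nn that decreases with k signals disassortative behaviour. The function name is hypothetical.

```python
from collections import defaultdict

def knn_spectrum(adj):
    """Average degree of the neighbours of nodes with degree k."""
    by_k = defaultdict(list)
    for i, nb in adj.items():
        if nb:  # isolated nodes have no neighbours to average over
            by_k[len(nb)].append(sum(len(adj[j]) for j in nb) / len(nb))
    return {k: sum(v) / len(v) for k, v in sorted(by_k.items())}

# Star graph: a hub linked to four leaves, a strongly disassortative case.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(knn_spectrum(star))  # {1: 4.0, 4: 1.0}
```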
Summary
• Web: heterogeneous topology and traffic
• Mechanism taking into account the interplay between topology and traffic
• Simple mechanism => complex behaviour, scale-free distributions for connectivity and traffic
• Analytical study possible
• Study of correlations: non-trivial hierarchical behaviour
• Possibility to add features (fitnesses, rewiring, addition of edges, etc.), to modify the redistribution rule...
• Empirical studies of traffic and correlations?