Yandex wg-talk
Post on 29-Jun-2015
Random graph process models of large networks
Colin Cooper, Department of Informatics
King’s College London
28th October 2013, Yandex
Random graph process
Graph process: at each step the existing graph is modified by making a small number of structural changes, e.g.
- Add a new vertex with edges incident to the existing graph
- Add edges within the existing graph
- Delete some edges or vertices
- Exchange some existing edges for others
If these changes are random, then some asymptotic structural properties may emerge as the process evolves. For example:
The degree sequence has a power law with parameter γ.
Outline
Introduction
Various web graph models
Degree distribution: Undirected model
Hub-Authority model: Directed
Web-graphs of increasing degree
Experimental studies
Large-scale dynamic networks such as the Internet and the World Wide Web:
- Barabási and Albert, Emergence of scaling in random networks (1999).
- Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins and Wiener, Graph structure in the Web (2000).
- M. Faloutsos, P. Faloutsos and C. Faloutsos, On power-law relationships of the Internet topology (1999).
Power law degree sequence
The proportion of vertices of a given degree k follows an approximate inverse power law
$$n_k \sim C k^{-\gamma}$$
for some constants C, γ. Various explanatory models, e.g.:
- Bollobás, Riordan, Spencer and Tusnády, The degree sequence of a scale-free random graph process (2001).
- Aiello, Chung and Lu, A random graph model for massive graphs (2000).
- Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins and Upfal, Stochastic models for the web graph (2000).
- Dorogovtsev, Mendes and Samukhin, Structure of growing networks with preferential linking (2000).
Preferential attachment
One approach: generate graphs via preferential attachment (PA), i.e. attach to a vertex with probability proportional to its degree. PA gives a power law degree distribution with parameter γ = 3.
The preferential attachment model dates back to Yule:
G. Yule, A mathematical theory of evolution based on the conclusions of Dr. J. C. Willis, Philosophical Transactions of the Royal Society of London (Series B) (1924).
Yule model: a random tree. Each point independently generates children at rate 1 (probability about Δt in a short time interval Δt), so the early points have the most children.
PA was proposed as a random graph model for the web by Barabási and Albert, Emergence of scaling in random networks (1999).
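A minimal Python sketch of the PA rule with m edges per new vertex, for readers who want to experiment. It assumes the process starts from one vertex carrying m self-loops (an illustrative choice, not from the talk), and `preferential_attachment` is a hypothetical helper name. The flat endpoint list is a standard implementation trick: a vertex of degree d appears d times, so a uniform draw from the list is a degree-proportional choice.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Grow a PA multigraph on vertices 0..n-1 (hypothetical helper).

    Each new vertex v sends m edges to vertices of G(t-1) chosen with
    probability proportional to degree. Assumption: vertex 0 starts with
    m self-loops (degree 2m) so the first step has a target.
    """
    rng = random.Random(seed)
    # A vertex of degree d appears d times here, so a uniform draw from
    # this list is a degree-proportional (PA) choice.
    endpoints = [0] * (2 * m)
    edges = [(0, 0)] * m
    for v in range(1, n):
        # Pick all m targets from G(t-1) before v's own endpoints are added.
        targets = [rng.choice(endpoints) for _ in range(m)]
        for w in targets:
            edges.append((v, w))
            endpoints.extend((v, w))
    degrees = Counter(endpoints)
    return edges, degrees


edges, degrees = preferential_attachment(20_000, 2, seed=1)
print(len(edges), max(degrees.values()))
```

Comparing the maximum degree with n^{1/2} for several values of n gives a rough feel for the Δ = O(n^{1/2}) behaviour of PA graphs.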
Publications relevant to this talk
- Cooper and Frieze, A general model of web graphs, RSA (2003). An analysis of the recurrence for the expected number of vertices of degree k, combined with concentration results and bounds for the maximum degree. Uses Laplace’s method to solve recurrences with rational coefficients.
- Cooper, The age specific degree distribution of web-graphs, CPC (2006). Derives the degree distribution directly, and uses this to obtain the expected number of vertices of degree k.
- Cooper and Pralat, Scale-free graphs of increasing degree, RSA (2011). Adapts the degree distribution method to obtain results for a growth model.
Web-graph models
Simple undirected or directed process models where a mixture of vertices and edges are added at each step, either preferentially or uniformly at random.
For undirected web-graph processes, as the degree k tends to infinity, the expected proportion of vertices of degree k tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by
$$\gamma = 1 + 1/\eta.$$
Here η is the limiting ratio of the expected number of edge endpoints inserted in the process by preferential attachment to the expected total degree. The maximum degree Δ in this model is a.s.
$$\Delta = O(n^{\eta}),$$
where n is the number of vertices. Surprisingly, these results seem to hold for other types of process model, and can be useful as a general heuristic.
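The heuristic is plain arithmetic once η is known. A Python sketch, assuming η is supplied as per-step counts of preferentially placed edge endpoints and of all edge endpoints (the helper name `power_law_heuristic` is ours, not from the talk); exact fractions keep the results readable.

```python
from fractions import Fraction


def power_law_heuristic(pa_endpoints, total_endpoints):
    """Return (gamma, exponent of n in Delta) from the heuristic
    gamma = 1 + 1/eta and Delta = O(n**eta).

    eta = (expected PA edge endpoints per step) / (expected total
    endpoints per step), i.e. the share of total degree placed by PA.
    """
    eta = Fraction(pa_endpoints, total_endpoints)
    if not 0 < eta <= 1:
        raise ValueError("eta must lie in (0, 1]")
    return 1 + 1 / eta, eta


# Standard PA with m = 3: 3 of the 2m = 6 new endpoints are preferential.
print(power_law_heuristic(3, 6))
# Triangle closing: 1 of the 4 new endpoints is preferential.
print(power_law_heuristic(1, 4))
```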
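Some examples of the power law heuristic
Standard preferential attachment: make G(t) from G(t − 1) by adding a new vertex $v_t$ with (an average of) m neighbours chosen preferentially from G(t − 1). Each step adds 2m edge endpoints, of which the m at the old vertices are chosen preferentially, so
$$\eta = \frac{m}{2m} = \frac{1}{2}, \qquad \gamma = 1 + \frac{1}{\eta} = 1 + 2 = 3, \qquad \Delta = O(n^{1/2}).$$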
Experimental evidence: PA model
Rapid convergence for PA graphs to γ = 3: 20,000 vertices is enough (see light blue plot data).
Thanks to Yiannis Siantos for the figure
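Non-standard triangle closing model
Make G(t) from G(t − 1) by adding a new vertex $v_t$ with one neighbour u chosen u.a.r. from G(t − 1), and one edge from $v_t$ to a random neighbour w of u. Heuristically,
$$\Pr(w \text{ chosen}) \propto d(w).$$
Each step adds 2 edges, i.e. 4 edge endpoints, and only the endpoint at w is chosen preferentially. So one edge endpoint in 4 is added preferentially, and the proportion of endpoints added preferentially is
$$\eta = \frac{1}{4}.$$
So heuristically
$$\gamma = 1 + \frac{1}{\eta} = 1 + 4 = 5, \qquad \Delta = O(n^{1/4}).$$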
Experimentally this seems to be true in the limit (see next slide). The model seems difficult to analyze formally.
The heuristic gives no information on the convergence rate. Slow convergence: large experiments with up to 4 × 10^8 vertices have still not quite arrived at γ = 5, Δ = O(n^{1/4}).
Thanks to Yiannis Siantos for the figure
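The triangle-closing model can be simulated directly. This Python sketch rests on our own assumptions (start from a single edge 0-1; `triangle_closing` is a hypothetical helper name). Picking a random neighbour of a uniformly chosen vertex makes Pr(w chosen) only approximately proportional to d(w), which is the heuristic used above; given the slow convergence reported, a small run like this one should not be expected to show γ = 5 or Δ = O(n^{1/4}) cleanly.

```python
import random


def triangle_closing(n, seed=None):
    """Sketch of the triangle-closing process (hypothetical helper).

    New vertex v picks u uniformly at random from G(t-1), links to u, then
    links to a uniformly random neighbour w of u. Assumption: the process
    starts from the single edge 0-1.
    """
    rng = random.Random(seed)
    adj = [[1], [0]]
    for v in range(2, n):
        u = rng.randrange(v)        # UAR choice of u among existing vertices
        w = rng.choice(adj[u])      # random neighbour of u, so w != u
        adj.append([u, w])
        adj[u].append(v)
        adj[w].append(v)
    return adj


adj = triangle_closing(100_000, seed=7)
n = len(adj)
max_deg = max(len(nbrs) for nbrs in adj)
# Heuristic prediction: Delta = O(n**(1/4)); compare the two numbers.
print(max_deg, round(n ** 0.25, 1))
```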
Web-graph model generative choices
Web-graph model: Power law degree sequence
For the undirected web-graph process, as the vertex degree k tends to infinity, the expected proportion of vertices of degree k tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by
$$\gamma = 1 + 1/\eta,$$
where η is the limiting ratio of the expected number of edge endpoints inserted by preferential attachment to the expected total degree.
Any γ > 2 can be obtained by suitable choices of parameters.
Undirected web-graph model parameters
- At each step either a NEW vertex (+edges) is added with probability α, or extra edges are added between OLD vertices with probability β = 1 − α. For convenience, edges are regarded as "directed out" from the new vertex.
- The number of edges is sampled from a distribution depending on the choice made (NEW, OLD).
- Each edge endpoint makes an independent UAR or PA choice:
  A. New vertex v: choice for edges directed OUT from v.
  B. Old vertex v: choice for an extra edge directed OUT from v.
  C. Old vertex v: choice for an extra edge directed IN to v.
Undirected model continued
NEW procedure. All edges are "directed out" from the new vertex. Each edge of v chooses its other endpoint w independently, using the probability mixture with parameter A:
$$\Pr(w \text{ is selected}) = A_1\frac{d(w,t)}{2|E(t)|} + A_2\frac{1}{|V(t)|},$$
where
$$\sum_w \Pr(w \text{ is selected by } e_i) = A_1 + A_2 = 1.$$
More generally, for each choice type Z = A, B, C (with $Z_1 + Z_2 = 1$) we have
$$p_Z(v,t) = Z_1\frac{d(v,t-1)}{2|E(t-1)|} + Z_2\frac{1}{|V(t-1)|}.$$
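One way to realise this mixture in code, assuming the graph is stored as a flat list of edge endpoints (a common implementation trick; the talk does not specify a data structure). Both helper names are hypothetical. The same sampler serves for the OLD choices B and C with $(B_1, B_2)$ or $(C_1, C_2)$ in place of $(A_1, A_2)$.

```python
import random


def selection_probabilities(degrees, A1):
    """Exact Pr(w is selected) = A1*d(w)/(2|E|) + A2/|V|, with A2 = 1 - A1."""
    two_e = sum(degrees)                 # sum of degrees = 2|E|
    n = len(degrees)
    A2 = 1.0 - A1
    return [A1 * d / two_e + A2 / n for d in degrees]


def select_vertex(endpoints, n, A1, rng):
    """Draw one vertex from the mixture.

    With probability A1, take a uniform edge endpoint (a PA choice, since a
    vertex of degree d occurs d times in `endpoints`); otherwise take a
    uniformly random vertex from 0..n-1 (a UAR choice).
    """
    if rng.random() < A1:
        return rng.choice(endpoints)
    return rng.randrange(n)


# Path 0 - 2 - 1: degrees [1, 1, 2], so 2|E| = 4.
probs = selection_probabilities([1, 1, 2], A1=0.5)
rng = random.Random(0)
draws = [select_vertex([0, 2, 2, 1], 3, 0.5, rng) for _ in range(100_000)]
print(probs, draws.count(2) / len(draws))
```

Sampling this way costs O(1) per choice, whereas computing the exact probabilities costs O(|V|); the exact version is kept only to check the sampler.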
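Result of these choices
- At each step, with probability α a NEW vertex (+edges) is added; with probability β = 1 − α extra edges are added between OLD vertices.
- The numbers of edges added by NEW and OLD steps are sampled from probability distributions with expectations m and M respectively.
- A. New vertex v, edges directed OUT from v; B. Old vertex v, edges directed OUT from v; C. Old vertex v, edges directed IN to v.
- The degree distribution depends on two parameters η, ν:
$$\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)} \quad \text{(PA)}, \qquad \nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha} \quad \text{(UAR)}.$$
Since $|E(t)| \sim (\alpha m + \beta M)t$ and $|V(t)| \sim \alpha t$, a vertex of degree d gains an edge endpoint at step t with probability about $(\eta d + \nu)/t$.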
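Computing η and ν from the model parameters is direct arithmetic. A Python sketch (the function name and the (Z1, Z2) argument layout are our own choices). As a sanity check, pure PA with m edges per new vertex gives η = 1/2, ν = 0, γ = 3, matching the standard PA example.

```python
def undirected_parameters(alpha, m, M, A, B, C):
    """Return (eta, nu, gamma) for the undirected web-graph model.

    A, B, C are (Z1, Z2) pairs: the PA weight and the UAR weight of each
    choice type, with Z1 + Z2 = 1. m and M are the expected numbers of
    edges added by a NEW and an OLD step; beta = 1 - alpha.
    """
    if not 0 < alpha <= 1:
        raise ValueError("need 0 < alpha <= 1")
    for Z1, Z2 in (A, B, C):
        if abs(Z1 + Z2 - 1.0) > 1e-12:
            raise ValueError("each mixture must satisfy Z1 + Z2 = 1")
    beta = 1.0 - alpha
    eta = (alpha * m * A[0] + beta * M * (B[0] + C[0])) / (2 * (alpha * m + beta * M))
    nu = (alpha * m * A[1] + beta * M * (B[1] + C[1])) / alpha
    # eta = 0 means no PA at all: geometric degree sequence, no power law.
    gamma = 1 + 1 / eta if eta > 0 else float("inf")
    return eta, nu, gamma


# Pure PA, m = 3 edges per new vertex, no OLD steps.
print(undirected_parameters(1.0, 3, 0, (1, 0), (1, 0), (1, 0)))
# NEW steps only, each edge chooses PA or UAR with probability 1/2.
print(undirected_parameters(1.0, 2, 0, (0.5, 0.5), (1, 0), (1, 0)))
```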
Degree distribution: Undirected model
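Vertex v of initial degree m is added at step v. The distribution of the degree d(v, t) of v at step t is
$$\Pr(d(v,t) = m+\ell \mid m) \sim \binom{\ell + m + \nu/\eta - 1}{\ell}\left(\frac{v}{t}\right)^{m\eta+\nu}\left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{\ell}.$$
This assumes t → ∞, that v is added after time $v_0 \to \infty$, and that $\ell = o(t^{1/4})$.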
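The formula is a negative binomial law in ℓ with r = m + ν/η and success probability $(v/t)^{\eta}$. A Python sketch evaluating it (the helper name is hypothetical), using log-gamma because r need not be an integer. The usage lines check that the probabilities sum to 1 and that the mean matches the negative binomial mean r(1 − p)/p.

```python
from math import exp, lgamma, log, log1p


def degree_pmf(ell, m, eta, nu, x):
    """Asymptotic Pr(d(v,t) = m + ell | m) for x = v/t in (0, 1):

        binom(ell + m + nu/eta - 1, ell) * x**(m*eta + nu) * (1 - x**eta)**ell

    i.e. a negative binomial in ell with r = m + nu/eta and success
    probability x**eta. Requires eta > 0.
    """
    r = m + nu / eta
    # Generalised binomial coefficient via log-gamma (r need not be an integer).
    log_binom = lgamma(ell + r) - lgamma(r) - lgamma(ell + 1)
    return exp(log_binom + (m * eta + nu) * log(x) + ell * log1p(-x ** eta))


m, eta, nu, x = 2, 0.5, 0.5, 0.3
pmf = [degree_pmf(ell, m, eta, nu, x) for ell in range(400)]
r, p = m + nu / eta, x ** eta
print(sum(pmf), sum(ell * q for ell, q in enumerate(pmf)), r * (1 - p) / p)
```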
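Illustration: Pr(degree increases by 2)
Probability of a change p, and of no change q, at step t, when the degree of v has increased by j so far:
$$p(j,t) \sim \frac{\eta(m+j)}{t} + \frac{\nu}{t}, \qquad q(j,t) = 1 - p(j,t).$$
Change points τ1, τ2:
v |--------| τ1 --------| τ2 --------| t
Probability of exactly 2 changes, at τ1 and τ2:
$$q(0,v+1)\cdots q(0,\tau_1-1)\,p(0,\tau_1) \quad \text{first change at } \tau_1$$
$$\times\, q(1,\tau_1+1)\cdots q(1,\tau_2-1)\,p(1,\tau_2) \quad \text{second change at } \tau_2$$
$$\times\, q(2,\tau_2+1)\cdots q(2,t) \quad \text{no further changes}$$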
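Using $\prod_{s=a+1}^{b}(1 - c/s) \sim (a/b)^{c}$, this evaluates to
$$F(\tau_1,\tau_2) \sim \frac{\eta m+\nu}{\eta}\cdot\frac{\eta(m+1)+\nu}{\eta}\left(\frac{v}{t}\right)^{\eta m+\nu}\left(\frac{\eta\tau_1^{\eta-1}}{t^{\eta}}\right)\left(\frac{\eta\tau_2^{\eta-1}}{t^{\eta}}\right)$$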
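Add over all possible τ1 < τ2. The summand is symmetric in τ1, τ2, so the sum is half of the unrestricted double integral over $[v,t]^2$:
$$\sum F(\tau_1,\tau_2) \sim \frac{1}{2!}\cdot\frac{\eta m+\nu}{\eta}\cdot\frac{\eta(m+1)+\nu}{\eta}\left(\frac{v}{t}\right)^{\eta m+\nu}\left(\int_v^t \frac{\eta\tau^{\eta-1}}{t^{\eta}}\,d\tau\right)^2$$
$$\sim \frac{1}{2!}\cdot\frac{\eta m+\nu}{\eta}\cdot\frac{\eta(m+1)+\nu}{\eta}\left(\frac{v}{t}\right)^{\eta m+\nu}\left(1-\left(\frac{v}{t}\right)^{\eta}\right)^2,$$
which agrees with the general formula for ℓ = 2, since $\binom{m+\nu/\eta+1}{2} = \frac{1}{2!}\cdot\frac{\eta m+\nu}{\eta}\cdot\frac{\eta(m+1)+\nu}{\eta}$.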
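The two-change calculation can be checked numerically. This Python sketch runs the step-by-step chain with p(j, t) = (η(m + j) + ν)/t taken as exact (an assumption; the slides only state ∼) from step v + 1 to t, and compares Pr(exactly 2 changes) with the closed form. With v = 4000, t = 16000, m = 1, η = ν = 1/2, the closed form is 3 · (1/4) · (1/2)^2 = 0.1875. The helper names are hypothetical.

```python
def two_changes_exact(m, eta, nu, v, t):
    """Pr(degree of v grows by exactly 2 between steps v and t), computed by
    running the chain step by step. States j = 0, 1, 2, and 3 (= 3 or more)."""
    dist = [1.0, 0.0, 0.0, 0.0]
    for s in range(v + 1, t + 1):
        p = [(eta * (m + j) + nu) / s for j in range(3)]  # p(j, s)
        dist = [
            dist[0] * (1 - p[0]),
            dist[1] * (1 - p[1]) + dist[0] * p[0],
            dist[2] * (1 - p[2]) + dist[1] * p[1],
            dist[3] + dist[2] * p[2],
        ]
    return dist[2]


def two_changes_closed_form(m, eta, nu, v, t):
    """(1/2!) * ((eta*m + nu)/eta) * ((eta*(m+1) + nu)/eta)
    * (v/t)**(eta*m + nu) * (1 - (v/t)**eta)**2"""
    x = v / t
    const = ((eta * m + nu) / eta) * ((eta * (m + 1) + nu) / eta) / 2
    return const * x ** (eta * m + nu) * (1 - x ** eta) ** 2


exact = two_changes_exact(1, 0.5, 0.5, 4000, 16000)
approx = two_changes_closed_form(1, 0.5, 0.5, 4000, 16000)
print(exact, approx)
```

The agreement improves as v grows, since the product approximation used above has relative error of order 1/v.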
From the degree distribution we can obtain:
- n(ℓ | m), the expected proportion of vertices of degree m + ℓ (given initial degree m):
$$n(\ell \mid m) = \frac{((\ell+m-1)\eta+\nu)\cdots(m\eta+\nu)}{((\ell+m)\eta+\nu+1)\cdots(m\eta+\nu+1)}$$
- The proportion $N_t(\ell \mid m)$ of vertices of degree m + ℓ is concentrated around n(ℓ | m), provided t → ∞ and ℓ is not too large.
- As ℓ → ∞, $n(\ell \mid m) \sim K \ell^{-(1+1/\eta)}$.
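The range of η is 0 < η < 1, so the power law coefficient γ = 1 + 1/η ≥ 2, where
$$\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}.$$
- As η → 0 we get a random graph with a geometric degree sequence:
$$\lim_{\eta\to 0} n_\eta(\ell \mid m) = \frac{1}{\nu+1}\left(\frac{\nu}{\nu+1}\right)^{\ell}.$$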
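A Python sketch of n(ℓ | m), built from the ratio of consecutive terms of the product formula (the recurrence is just that ratio; the helper name is hypothetical). Because n(ℓ | m) is the age-average of the negative binomial law above, its values over all ℓ should sum to 1; the usage lines also check the $\ell^{-(1+1/\eta)}$ tail and the η = 0 geometric case.

```python
from math import log


def limiting_proportions(m, eta, nu, L):
    """n(ell | m) for ell = 0..L, built from the ratio of consecutive terms
    of the product formula:
        n(0 | m)     = 1 / (m*eta + nu + 1)
        n(ell+1 | m) = n(ell | m) * ((ell+m)*eta + nu) / ((ell+m+1)*eta + nu + 1)
    """
    n = [1.0 / (m * eta + nu + 1)]
    for ell in range(L):
        n.append(n[-1] * ((ell + m) * eta + nu) / ((ell + m + 1) * eta + nu + 1))
    return n


n = limiting_proportions(m=1, eta=0.5, nu=0.5, L=10_000)
# Local log-log slope; it should approach -(1 + 1/eta) = -3 for large ell.
slope = log(n[10_000] / n[5_000]) / log(2)
print(sum(n), slope)

# eta = 0 reproduces the geometric law (1/(nu+1)) * (nu/(nu+1))**ell.
g = limiting_proportions(m=1, eta=0.0, nu=2.0, L=10)
print(g[:3])
```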
Hub-Authority model: Directed
Hub: a vertex with a lot of edges directed out (opinionated page).
Authority: a vertex with a lot of edges directed in (popular page).
The initial in- and out-degrees are given by a distribution $(P^-, P^+)$. Below, $d^+$ is out-degree and $d^-$ is in-degree.
How does a new vertex v added at step t + 1 choose its IN-neighbours?
$$\Pr(w \text{ points to } v) = D_1\frac{d^+(w,t)}{|E(t)|} + D_2\frac{1}{|V(t)|}$$
So a hub vertex is the most likely to point an edge to v.
How does a new vertex added at step t + 1 choose its OUT-neighbours?
$$\Pr(v \text{ points to } w) = A_1\frac{d^-(w,t)}{|E(t)|} + A_2\frac{1}{|V(t)|},$$
so v is most likely to point to an authority vertex.
Results summary
Undirected model
(√) Age dependent degree distribution
(√) Number of vertices with given degree
(√) Asymptotic degree sequence $n(k) \sim k^{-x}$
Hub-Authority model
(√) Age dependent in- and out-degree distribution
(√, ×) Number of vertices with given in- and out-degree (as an integral)
(√) Asymptotic degree sequence $n(k,\ell) \sim k^{-x^-}\ell^{-x^+}$, $x = x(k,\ell)$
General directed model
(×) The in- and out-degree distribution is not obtainable explicitly: it is a sum of path dependent integrals (the order of events matters).
Directed model. Definition only
In general, the choice type can be made on a mixture of IN- and OUT-degree. E.g. how does a new vertex added at step t choose its OUT-neighbours?
$$\Pr(v \text{ points to } w) = A_{(1,+)}\frac{d^+(w,t-1)}{|E(t-1)|} + A_{(1,-)}\frac{d^-(w,t-1)}{|E(t-1)|} + A_2\frac{1}{|V(t-1)|},$$
where $A_{(1,+)} + A_{(1,-)} + A_2 = 1$.
An in-degree of 2 at w could be made up of various choices (++), (+−), (−+), (−−) at w by subsequent vertices t > w: each of the two edges into w may have been attracted by the out-degree (+) or by the in-degree (−) of w.
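A Python sketch of this mixed IN/OUT-degree choice, assuming the digraph is kept as a list of (source, target) edges: the source of a uniformly chosen edge is an out-degree-proportional choice, and its target is an in-degree-proportional one. Setting $A_{(1,+)} = 0$ gives the hub-authority OUT-neighbour rule. Helper names are hypothetical.

```python
import random


def out_neighbour_probabilities(edges, n, A1_plus, A1_minus):
    """Exact Pr(v points to w) = A(1,+) d+(w)/|E| + A(1,-) d-(w)/|E| + A2/|V|."""
    out_deg, in_deg = [0] * n, [0] * n
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    E = len(edges)
    A2 = 1.0 - A1_plus - A1_minus
    return [A1_plus * out_deg[w] / E + A1_minus * in_deg[w] / E + A2 / n
            for w in range(n)]


def choose_out_neighbour(edges, n, A1_plus, A1_minus, rng):
    """Sample w from the same mixture without computing degrees."""
    u = rng.random()
    if u < A1_plus:
        return rng.choice(edges)[0]   # source of a uniform edge: prop. to d+
    if u < A1_plus + A1_minus:
        return rng.choice(edges)[1]   # target of a uniform edge: prop. to d-
    return rng.randrange(n)           # uniform vertex


edges = [(0, 1), (0, 2), (1, 2)]
probs = out_neighbour_probabilities(edges, 3, 0.5, 0.2)
rng = random.Random(42)
draws = [choose_out_neighbour(edges, 3, 0.5, 0.2, rng) for _ in range(200_000)]
print(probs, [draws.count(w) / len(draws) for w in range(3)])
```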
Results: Hub-Authority model
Degree distribution: explicit distribution (similar to undirected).
Power law: the number of vertices n(r, s) of in-degree r and out-degree s is of the form
$$n(r,s \mid m^-,m^+) = C_{r,s}\, r^{-x^-} s^{-x^+}.$$
The parameters $x^-, x^+$ depend on the relative sizes of r, s. They change as s increases from 1 to $s = \Theta(r^{\eta^+/\eta^-})$. Functional form: $x = f(\eta^+, \eta^-, \nu, m^+, m^-)$, a quotient.
$\eta^+, \eta^-$ are the preferential attachment parameters. The parameter $\eta^-$ is the limiting ratio of the expected number of edges whose terminal vertex was chosen by preferential attachment to the expected number of edges of the process:
$$\eta^- = \frac{\alpha m^+ A_1 + \beta M C_1}{\alpha m^+ + \gamma m^- + \beta M}$$
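How does the degree sequence differ from the undirected case? For a given vertex v, the in- and out-degrees at step t are asymptotically independent:
$$\Pr(d^-(v,t) = r,\ d^+(v,t) = s) \sim \Pr(d^-(v,t) = r)\,\Pr(d^+(v,t) = s),$$
but averaging over the age of v couples them. The expected proportion of vertices of degree (r, s) is
$$n(r,s) = C\, r^{-(1-\xi^-)}\, s^{-(1-\xi^+)}\, J(r,s),$$
where $\xi^+ = m^+ + \nu^+/\eta^+$ and
$$J(r,s) = \int_0^1 x^{a}(1-x)^{r}(1-x^{b})^{s}\,dx,$$
where $b = \eta^+/\eta^-$ and $a = \frac{\eta^+}{\eta^-}\xi^+ + \frac{1}{\eta^-} + \xi^- - 1$.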
Asymptotics for J(r , s) depend on relative sizes of r , s
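When only numerical values are needed, J(r, s) can simply be evaluated by quadrature. A Python sketch using composite Simpson's rule (our choice of method, not the talk's), assuming a > 0 and b > 0 so the integrand is bounded on [0, 1]. Two checks: for s = 0, and for b = 1, J reduces to Beta functions, $B(a+1, r+1)$ and $B(a+1, r+s+1)$ respectively.

```python
from math import exp, lgamma


def J(r, s, a, b, N=20_000):
    """Composite Simpson approximation of
    J(r, s) = integral_0^1 x**a * (1 - x)**r * (1 - x**b)**s dx."""
    if N % 2:
        N += 1                           # Simpson's rule needs an even N
    h = 1.0 / N

    def f(x):
        return x ** a * (1 - x) ** r * (1 - x ** b) ** s

    total = f(0.0) + f(1.0)
    for i in range(1, N):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


def beta(p, q):
    """Euler Beta function B(p, q) via log-gamma."""
    return exp(lgamma(p) + lgamma(q) - lgamma(p + q))


a, b = 2.5, 1.7
print(J(3, 0, a, b), beta(a + 1, 4))            # s = 0: B(a+1, r+1)
print(J(3, 5, a, 1.0), beta(a + 1, 3 + 5 + 1))  # b = 1: B(a+1, r+s+1)
```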
Increasing degree model: Preferential attachment
Can we escape from the power law γ = 3 by increasing the number of edges added at each step? At each step t, add a NEW vertex with f(t) edges, where
$$f(t) = [t^c], \qquad 0 < c < 1.$$
For $k \gg t^c$ the power law we get is
$$n_k = C\left(\frac{t}{k^{(3-c)/(1+c)}}\right)^{(1+c)/(1-c)}.$$
We need c > 0 constant to escape the power law γ = 3 given by PA models. When c = 1, all vertices have degree ∼ t, so there is no power law any more. For 0 < c < 1 the power law is γ(c) = 1 + 2/(1 − c) > 3, which matches the exponent of k above: $\frac{3-c}{1+c}\cdot\frac{1+c}{1-c} = \frac{3-c}{1-c} = 1 + \frac{2}{1-c}$.
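The exponent bookkeeping can be checked with exact rational arithmetic. A Python sketch (the helper name is hypothetical) confirming that the k-exponent of the $n_k$ formula equals γ(c) = 1 + 2/(1 − c), and that c = 0 (one edge per step, i.e. plain PA) recovers γ = 3.

```python
from fractions import Fraction


def increasing_degree_gamma(c):
    """gamma(c) = 1 + 2/(1 - c) for f(t) = [t**c], 0 <= c < 1 (c = 0 is plain PA).

    Also checks it against the exponent of k in
    n_k = C * (t / k**((3 - c)/(1 + c)))**((1 + c)/(1 - c)).
    """
    c = Fraction(c)
    if not 0 <= c < 1:
        raise ValueError("need 0 <= c < 1")
    gamma = 1 + 2 / (1 - c)
    k_exponent = (3 - c) / (1 + c) * (1 + c) / (1 - c)
    assert gamma == k_exponent
    return gamma


for c in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
    print(c, increasing_degree_gamma(c))
```

As c → 1 the exponent grows without bound, consistent with there being no power law at c = 1.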
Concluding remarks
Good points of the web-graph model
- The method works well for undirected models.
- Provides a heuristic for predicting the degree sequence power law and the maximum degree in unrelated models.
- Generalizes to hypergraph models (not covered in this talk).
- If $1 \le m(t) = t^{o(1)}$ edges are added at step t, the power law is 3.
Not so good points of the web-graph model
- Directed models are less pleasing, as the power law varies as a function of the relative sizes of in-degree and out-degree.
- General directed model: no closed form for the degree distribution?
- The model does not explain/predict power laws with parameter γ < 2 (as η ≤ 1, it must be that γ = 1 + 1/η ≥ 2).
THANK YOU
QUESTIONS