May 11, 2015
SNA = Complex Network Analysis on Social Networks
Outline
Notation & Metrics: Degree Distribution, Path Lengths, Transitivity
Models: Random Graphs, Small-Worlds, Preferential Attachment
Models Discussion
Conclusion
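Network: $G = (V, E)$, $E \subseteq V^2$, $\{(x, x) : x \in V\} \cap E = \emptyset$
Adjacency Matrix: $A_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$
Directed Network: $k_i^{in} = \sum_j A_{ji}$, $k_i^{out} = \sum_j A_{ij}$, $k_i = k_i^{in} + k_i^{out}$
Undirected Network ($A$ symmetric): $k_i = \sum_j A_{ji} = \sum_j A_{ij}$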
Degree Distribution: $p_x = \frac{1}{n} \#\{i : k_i = x\}$
Average Degree: $\langle k \rangle = n^{-1} \sum_{x \in V} k_x$
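A minimal Python sketch of these definitions, assuming the adjacency matrix is given as a list of 0/1 lists; the function names are hypothetical helpers, not part of any library. For an undirected graph the row sums already give $k_i$, so the in/out formulas are only needed for directed networks.

```python
from collections import Counter


def degrees_directed(A):
    """Return (k_in, k_out, k) for a directed 0/1 adjacency matrix A."""
    n = len(A)
    k_out = [sum(A[i][j] for j in range(n)) for i in range(n)]  # k_i^out = sum_j A_ij
    k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]   # k_i^in  = sum_j A_ji
    k = [a + b for a, b in zip(k_in, k_out)]                     # k_i = k_i^in + k_i^out
    return k_in, k_out, k


def degree_distribution(k):
    """p_x = (1/n) * #{i : k_i = x}, for every degree value x that occurs."""
    n = len(k)
    return {x: c / n for x, c in sorted(Counter(k).items())}


def average_degree(k):
    """<k> = n^-1 * sum of all degrees."""
    return sum(k) / len(k)


# Undirected example: a triangle 0-1-2 plus the pendant edge 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
k = [sum(row) for row in A]  # A is symmetric, so k_i = sum_j A_ij
print(k)                       # [2, 2, 3, 1]
print(degree_distribution(k))  # {1: 0.25, 2: 0.5, 3: 0.25}
print(average_degree(k))       # 2.0
```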
Local Clustering Coefficient: $C_i = \binom{k_i}{2}^{-1} T(i)$, where $T(i)$ is the number of distinct triangles with $i$ as a vertex
Clustering Coefficient: $C = \frac{1}{n} \sum_{i \in V} C_i$
Measure of Transitivity: $C = \frac{\text{number of closed paths of length 2}}{\text{number of paths of length 2}} = \frac{\text{number of triangles} \times 3}{\text{number of connected triples}}$
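The two clustering measures can be compared on a small graph with the Python sketch below. It assumes a 0/1 adjacency-matrix input and, as a convention not stated in the slides, sets $C_i = 0$ when $k_i < 2$; the helper names are hypothetical.

```python
from itertools import combinations


def neighbours(A, i):
    return [j for j, a in enumerate(A[i]) if a and j != i]


def local_clustering(A, i):
    """C_i = T(i) / binom(k_i, 2); taken as 0 when k_i < 2 (assumed convention)."""
    nb = neighbours(A, i)
    k = len(nb)
    if k < 2:
        return 0.0
    t = sum(1 for u, v in combinations(nb, 2) if A[u][v])  # T(i)
    return t / (k * (k - 1) / 2)


def average_clustering(A):
    """C = (1/n) * sum_i C_i."""
    return sum(local_clustering(A, i) for i in range(len(A))) / len(A)


def transitivity(A):
    """3 * (number of triangles) / (number of connected triples)."""
    n = len(A)
    triangles = sum(
        1 for a, b, c in combinations(range(n), 3) if A[a][b] and A[b][c] and A[a][c]
    )
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in (neighbours(A, i) for i in range(n)))
    return 3 * triangles / triples if triples else 0.0


# Triangle 0-1-2 plus the pendant edge 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(average_clustering(A))  # (1 + 1 + 1/3 + 0) / 4 = 7/12
print(transitivity(A))        # 3 * 1 / 5 = 0.6
```

The two values differ because the average weights every vertex equally, while transitivity weights each vertex by the number of triples centred on it.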
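Shortest Path Length and Diameter
Set of Adjacency Matrices with the semiring $(A, +, \cdot)$: $AB = A \mathbin{+.\cdot} B$, with scalar operations $[AB]_{ij} = \sum_k A_{ik} \cdot B_{kj}$
The matrix product depends on the operations of the semiring.
Other matrix products make sense, e.g. $(A, +, \wedge)$ or $(A, \wedge, +)$, with $\wedge = \min$
We consider: $S_k(M) = M \mathbin{+.\wedge} \left(M_k \mathbin{\wedge.+} M_k\right)$
Shortest path lengths matrix: $L = (S_n \circ \dots \circ S_1)(M)$
Diameter: $d = \max_{ij} L_{ij}$. Average shortest path: $\ell = \langle L_{ij} \rangle$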
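In the $(\min, +)$ semiring the product replaces $\sum$ with $\min$ and $\cdot$ with $+$, so repeated products of a distance matrix accumulate shortest path lengths. The Python sketch below assumes an unweighted graph, encodes non-edges as infinity and uses repeated squaring; it illustrates the semiring idea and is not meant to reproduce the exact operator $S_k$ of the slide. All names are hypothetical.

```python
import math

INF = math.inf


def min_plus(A, B):
    """(min, +) matrix product: [AB]_ij = min_k (A_ik + B_kj)."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def shortest_path_matrix(adj):
    """All-pairs shortest path lengths L of an unweighted graph given as a 0/1 matrix."""
    n = len(adj)
    # M_ij = 0 on the diagonal, 1 for an edge, infinity otherwise.
    L = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)] for i in range(n)]
    hops = 1  # L currently covers paths with at most `hops` edges
    while hops < n - 1:
        L = min_plus(L, L)  # squaring doubles the number of hops covered
        hops *= 2
    return L


def diameter(L):
    """d = max_ij L_ij (infinite if the graph is disconnected)."""
    return max(max(row) for row in L)


def average_shortest_path(L):
    """Mean of L_ij over ordered pairs i != j."""
    n = len(L)
    values = [L[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


# Path graph 0-1-2-3.
P = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
L = shortest_path_matrix(P)
print(L[0][3], diameter(L))  # 3 3
```

Each product costs $O(n^3)$ and about $\log_2 n$ squarings are needed; the products are independent row by row, which is what makes the matrix formulation easy to parallelize.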
Computational Complexity of ASPL:
All pairs shortest path, matrix based (parallelizable): $O(n^{3+\alpha})$, $\alpha \approx 3/4$
All pairs shortest path, Bellman-Ford: $O(n^2 m)$, i.e. $O(n^3)$ on sparse graphs ($m = O(n)$)
All pairs shortest path, Dijkstra with Fibonacci heaps: $O(n^2 \log n + nm)$
Computing the CPL
$x = M_q(S)$: $q\,\#S$ elements are $\le x$ and $(1-q)\,\#S$ are $> x$
$x = L_{q\delta}(S)$: $q\,\#S(1-\delta)$ elements are $\le x$ and $(1-q)\,\#S(1-\delta)$ are $> x$
Huber Algorithm
Let $R$ be a random sample of $S$ such that $\#R = s$, with $s = \frac{2 q^2 \ln 2}{(1-\delta)^2 \delta^2}$; then $L_{q\delta}(S) = M_q(R)$ with probability $p = 1 - \varepsilon$.
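The sampling idea can be sketched in Python. Two assumptions are made here: $S$ is taken to be the set of per-vertex mean distances (computing each element costs one BFS, which is what sampling saves), and the sample size `s` is passed in directly rather than derived from the bound above. `sampled_cpl` and the other names are hypothetical.

```python
import math
import random
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src to every vertex reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_distance(adj, src):
    """Average shortest path length from src to the other vertices it reaches."""
    d = [x for v, x in bfs_distances(adj, src).items() if v != src]
    return sum(d) / len(d) if d else 0.0


def quantile(values, q):
    """M_q: smallest element x such that at least q * len(values) elements are <= x."""
    xs = sorted(values)
    return xs[max(0, math.ceil(q * len(xs)) - 1)]


def sampled_cpl(adj, s, q=0.5, rng=random):
    """Estimate the q-quantile of per-vertex mean distances from s random vertices."""
    sample = rng.sample(list(adj), min(s, len(adj)))
    return quantile([mean_distance(adj, v) for v in sample], q)


# Cycle on 11 vertices: every vertex has mean distance 2 * (1 + 2 + 3 + 4 + 5) / 10 = 3.
ring = {i: [(i - 1) % 11, (i + 1) % 11] for i in range(11)}
print(sampled_cpl(ring, s=5, rng=random.Random(0)))  # 3.0
```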
[Figure: Facebook Hugs degree distribution, log-log plot]
Nodes: 1322631
Edges: 1555597
m/n: 1.17
CPL: 11.74
Clustering Coefficient: 0.0527
Number of Components: 18987
Isles: 0
Largest Component Size: 1169456
For small k power-laws do not hold.
For large k we have statistical fluctuations.
[Figure: reference power-law, γ = 3, log-log plot]
Many networks have a power-law degree distribution, $p_k \propto k^{-\gamma}$ with $\gamma > 1$:
• Citation networks
• Biological networks
• WWW graph
• Internet graph
• Social Networks
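One common way to estimate γ, assumed here rather than taken from the slides, is the continuous maximum-likelihood estimator $\hat\gamma = 1 + n \left[\sum_i \ln(x_i / x_{\min})\right]^{-1}$ over the observations $x_i \ge x_{\min}$. The Python sketch checks it on synthetic power-law samples drawn by inverse transform; for integer degrees it is only an approximation, and choosing $x_{\min}$ is a separate problem. Function names are hypothetical.

```python
import math
import random


def powerlaw_mle(xs, xmin):
    """Continuous MLE of gamma for p(x) proportional to x^-gamma, using only x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    return 1 + len(tail) / sum(math.log(x / xmin) for x in tail)


def sample_powerlaw(n, gamma, xmin, rng):
    """Inverse-transform sampling from a continuous power law with exponent gamma > 1."""
    return [xmin * (1 - rng.random()) ** (-1 / (gamma - 1)) for _ in range(n)]


rng = random.Random(42)
data = sample_powerlaw(100_000, gamma=3.0, xmin=1.0, rng=rng)
print(round(powerlaw_mle(data, xmin=1.0), 2))  # close to 3.0
```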
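Erdős-Rényi Random Graphs
Ensembles of Graphs: $G(n, p)$ and $G(n, m)$; in $G(n, p)$ every pair of nodes is linked independently, $\Pr(A_{ij} = 1) = p$
When we describe the value of a property, we actually mean its expected value over the ensemble:
$\langle d \rangle = \sum_G \Pr(G) \cdot d(G) \propto \frac{\log n}{\log \langle k \rangle}$
$\Pr(G) = p^m (1-p)^{\binom{n}{2} - m}$
$\langle m \rangle = \binom{n}{2} p$, $\langle k \rangle = (n-1) p$
$p_k = \binom{n-1}{k} p^k (1-p)^{n-1-k}$, and for $n \to \infty$: $p_k = e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$
$C = \langle k \rangle (n-1)^{-1}$
Connectedness Threshold: $p = \log n / n$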
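A Python sketch that samples one graph from $G(n, p)$ and compares its mean degree and transitivity with $\langle k \rangle = (n-1)p$ and $C \approx p$. Helper names are hypothetical; triangles are counted once per edge through common neighbours, which counts each triangle three times and so directly gives the numerator $3 \times$ triangles.

```python
import random
from itertools import combinations


def gnp(n, p, rng=random):
    """Sample from G(n, p): each unordered pair is linked independently with probability p."""
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def mean_degree(adj):
    return sum(len(nb) for nb in adj.values()) / len(adj)


def transitivity(adj):
    """3 * triangles / connected triples, counting triangles edge by edge."""
    closed = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v)  # = 3 * triangles
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples if triples else 0.0


n, p = 1000, 0.01
g = gnp(n, p, rng=random.Random(1))
print(mean_degree(g), (n - 1) * p)  # sample <k> versus (n - 1) p = 9.99
print(transitivity(g), p)           # both should be close to p
```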
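Watts-Strogatz Model
In the modified model, shortcut edges are only added; no lattice edge is rewired.
$k_i = \kappa + s_i$: $\kappa$ edges in the lattice plus $s_i$ added shortcuts
$p_s = e^{-\kappa p} \frac{(\kappa p)^s}{s!}$
$p_k = e^{-\kappa p} \frac{(\kappa p)^{k-\kappa}}{(k-\kappa)!}$, for $k \ge \kappa$
$C = \frac{3(\kappa - 2)}{4(\kappa - 1) + 8\kappa p + 4\kappa p^2}$
$\ell \approx \frac{\log(np\kappa)}{\kappa^2 p}$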
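A Python sketch of the modified model under these assumptions: κ is even, each vertex is joined to its κ/2 nearest lattice neighbours on each side of a ring, and for every lattice edge a shortcut between two uniformly random distinct vertices is added with probability p, so each vertex receives on average κp shortcut ends, matching $p_s$. It compares the measured transitivity with the formula for C; `small_world` and the other helpers are hypothetical names.

```python
import random


def small_world(n, kappa, p, rng=random):
    """Ring lattice with kappa neighbours per vertex plus randomly added shortcuts."""
    assert kappa % 2 == 0 and n > kappa
    adj = {i: set() for i in range(n)}
    lattice_edges = [(i, (i + d) % n) for i in range(n) for d in range(1, kappa // 2 + 1)]
    for u, v in lattice_edges:
        adj[u].add(v)
        adj[v].add(u)
    for _ in lattice_edges:  # one Bernoulli(p) trial per lattice edge
        if rng.random() < p:
            u, v = rng.sample(range(n), 2)  # two distinct random vertices
            adj[u].add(v)
            adj[v].add(u)
    return adj


def transitivity(adj):
    closed = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v)  # = 3 * triangles
    triples = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    return closed / triples


def c_formula(kappa, p):
    """C = 3(kappa - 2) / (4(kappa - 1) + 8 kappa p + 4 kappa p^2)."""
    return 3 * (kappa - 2) / (4 * (kappa - 1) + 8 * kappa * p + 4 * kappa * p ** 2)


g = small_world(2000, 6, 0.1, rng=random.Random(7))
print(transitivity(g), c_formula(6, 0.1))  # measured vs predicted (about 0.479)
```

At $p = 0$ the formula gives $3(\kappa-2)/(4(\kappa-1))$, e.g. 0.5 for κ = 4, which is exactly the transitivity of the bare ring lattice.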
[Figure: Watts-Strogatz model, 10000 nodes, k = 4: CPL(p)/CPL(0) and C(p)/C(0) versus p, marking the short-CPL threshold and the large-clustering-coefficient threshold]
Image © Matt Britt
Barabási-Albert Model
BARABASI-ALBERT-MODEL(G, M, STEPS)
  FOR K FROM 1 TO STEPS
    A ← MAKE-ARRAY()
    FOR N IN NODES(G)
      PUSH(A, N)
      FOR J FROM 1 TO DEGREE(N)
        PUSH(A, N)
    N0 ← NEW-NODE(G)
    ADD-NODE(G, N0)
    FOR J FROM 1 TO M
      N ← RANDOM-CHOICE(A)
      ADD-LINK(N0, N)
$\Pr(V = x) = \sum_{e \in N(x)} \Pr(E = e) = \frac{k_x}{2m} = \frac{k_x}{\sum_x k_x}$
$p_k \propto k^{-3}$
No analytical proof available
$\ell \approx \frac{\log n}{\log \log n}$
$C \approx n^{-3/4}$
Scale-free entails short CPL
Transitivity disappears with network size
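A Python transcription of the pseudocode's urn trick, as a sketch: each existing node enters the urn once plus once per incident edge, so it is drawn with probability proportional to $k + 1$ (which keeps zero-degree nodes selectable, a slight variation on $k_x / 2m$). The new node is added only after the urn is built, which rules out self-loops. Assumptions: the graph starts from `seed_nodes` isolated nodes, and repeated draws of the same target collapse into a single link.

```python
import random


def barabasi_albert(m, steps, seed_nodes=1, rng=random):
    """Grow a graph: each new node links to up to m targets drawn with weight k + 1."""
    adj = {i: set() for i in range(seed_nodes)}
    for _ in range(steps):
        urn = []
        for node, nbrs in adj.items():
            urn.extend([node] * (len(nbrs) + 1))  # once, plus once per unit of degree
        new = len(adj)
        adj[new] = set()  # added after building the urn, so no self-loops
        for _ in range(m):
            target = rng.choice(urn)
            adj[new].add(target)
            adj[target].add(new)
    return adj


g = barabasi_albert(m=2, steps=1000, rng=random.Random(3))
degrees = sorted((len(nb) for nb in g.values()), reverse=True)
print(len(g), degrees[:5])  # 1001 nodes; a few early nodes tend to become hubs
```

Rebuilding the urn at every step mirrors the pseudocode but costs $O(n)$ per step; keeping one persistent urn and appending the new node and each new link's endpoints to it gives essentially the same process in $O(m)$ per step.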
OSN | Refs. | Users | Links | <k> | C | CPL | d | γ | r
Club Nexus | Adamic et al | 2.5 K | 10 K | 8.2 | 0.2 | 4 | 13 | n.a. | n.a.
Cyworld | Ahn et al | 12 M | 191 M | 31.6 | 0.2 | 3.2 | 16 | | -0.1
Cyworld T | Ahn et al | 92 K | 0.7 M | 15.3 | 0.3 | 7.2 | n.a. | n.a. | 0.4
LiveJournal | Mislove et al | 5 M | 77 M | 17 | 0.3 | 5.9 | 20 | | 0.2
Flickr | Mislove et al | 1.8 M | 22 M | 12.2 | 0.3 | 5.7 | 27 | | 0.2
Twitter | Kwak et al | 41 M | 1700 M | n.a. | n.a. | 4 | 4.1 | | n.a.
Orkut | Mislove et al | 3 M | 223 M | 106 | 0.2 | 4.3 | 9 | 1.5 | 0.1
Orkut | Ahn et al | 100 K | 1.5 M | 30.2 | 0.3 | 3.8 | n.a. | 3.7 | 0.3
Youtube | Mislove et al | 1.1 M | 5 M | 4.29 | 0.1 | 5.1 | 21 | | -0
Facebook | Gjoka et al | 1 M | n.a. | n.a. | 0.2 | n.a. | n.a. | | 0.23
FB H | Nazir et al | 51 K | 116 K | n.a. | 0.4 | n.a. | 29 | | n.a.
FB GL | Nazir et al | 277 K | 600 K | n.a. | 0.3 | n.a. | 45 | | n.a.
BrightKite | Scellato et al | 54 K | 213 K | 7.88 | 0.2 | 4.7 | n.a. | | n.a.
FourSquare | Scellato et al | 58 K | 351 K | 12 | 0.3 | 4.6 | n.a. | | n.a.
LiveJournal | Scellato et al | 993 K | 29.6 M | 29.9 | 0.2 | 4.9 | n.a. | | n.a.
Twitter | Java et al | 87 K | 829 K | 18.9 | 0.1 | n.a. | 6 | | 0.59
Twitter | Scellato et al | 409 K | 183 M | 447 | 0.2 | 2.8 | n.a. | | n.a.
Model | Static | Deg | C | Rigid
ER | Yes | Poisson | Low | -
WS | Yes | Poisson | Ok | Yes
BA | No | PL γ=3 | Fixable | Yes
• Moreover:
  • Mostly no navigability
  • Uniformity assumption
  • Sometimes too complex for analytic study
  • Few features studied
  • Power-law?
Alternative models for degree distributions
Power-laws are difficult to fit, and even when they do fit, other distributions often fit better.
A power-law with cutoff almost always fits better than a plain power-law:
$f(x; \gamma, \beta) = x^{-\gamma} e^{-\beta x}$
Sometimes the log-normal distribution is more appropriate:
$f(x; \sigma, m) = \frac{1}{x \sigma (2\pi)^{1/2}} \exp\left(-\frac{(\log(x/m))^2}{2\sigma^2}\right)$
Most of the time, random attachment and preferential attachment act together:
$F(x; r) = 1 - (rm)^{1+r} (x + rm)^{-(1+r)}$: for $r \to 0$ it is scale-free, for $r \to \infty$ it becomes a negative exponential distribution.
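The three candidate distributions written as Python functions, as a sketch for comparing shapes rather than a fitting procedure. Two assumptions: the cutoff is written $e^{-\beta x}$ with $\beta > 0$, and the mixed CDF is evaluated in the algebraically equivalent form $1 - \left(\frac{rm}{x + rm}\right)^{1+r}$, which avoids overflowing $(rm)^{1+r}$ for large r. The last line illustrates the $r \to \infty$ limit, $F(x; r) \to 1 - e^{-x/m}$.

```python
import math


def powerlaw_cutoff(x, gamma, beta):
    """Unnormalised power law with exponential cutoff: x^-gamma * e^(-beta x)."""
    return x ** -gamma * math.exp(-beta * x)


def lognormal_pdf(x, sigma, m):
    """f(x; sigma, m) = exp(-(log(x/m))^2 / (2 sigma^2)) / (x sigma sqrt(2 pi))."""
    return math.exp(-math.log(x / m) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def mixed_cdf(x, r, m):
    """F(x; r) = 1 - (r m)^(1+r) (x + r m)^-(1+r), in an overflow-safe form."""
    return 1 - (r * m / (x + r * m)) ** (1 + r)


x, m = 5.0, 4.0
print(mixed_cdf(x, 1e6, m), 1 - math.exp(-x / m))  # nearly equal for large r
```

For small r the tail $1 - F(x; r)$ decays like $x^{-(1+r)}$, i.e. as a power law, which is the scale-free end of the family.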
Milgram's Experiment
[Figure: map of Nebraska (Omaha), Kansas (Wichita) and Massachusetts (Boston)]
6 Degrees
• Random people from Omaha & Wichita were asked to send a postcard to a person in Boston:
  • Add their own name to the postcard
  • Forward the message only to a personal acquaintance who was more likely to know the target
1st run: 64/296 arrived, most delivered to him by 2 men
2nd run: 24/160 arrived, 2/3 delivered by "Mr. Jacobs"
2 ≤ hops ≤ 10; µ = 5.x
CPL, hubs, ... and Kleinberg's Intuition
Biased Preferential Attachment
At each step:
• A new node is added to the network and is assigned to one of the sets P, I and L according to a probability distribution h;
• $e_0 \in \mathbb{N}^+$ edges are added to the network;
• for each edge $(u, v)$, $u$ is chosen with distribution $D_0$ and:
  • if $u \in I$, $v$ is a new node and is assigned to P;
  • if $u \in L$, $v$ is chosen according to $D_\gamma$.
$D_\beta(u) \propto \begin{cases} (\beta + 1)(k_u + 1) & u \in L \\ k_u + 1 & u \in I \\ 0 & u \in P \end{cases}$
No analytic results available.
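A Python sketch of the biased distribution $D_\beta$, assuming node degrees and group labels ("P", "I", "L") are kept in dictionaries; the function names are hypothetical. With β = 0 both I and L nodes get weight $k_u + 1$, so $D_0$ is plain preferential attachment over $I \cup L$; nodes in P always get weight 0 and are never chosen.

```python
import random


def d_beta_weights(degree, group, beta):
    """Unnormalised D_beta(u): (beta+1)(k_u+1) on L, k_u+1 on I, 0 on P."""
    weights = {}
    for u, k in degree.items():
        if group[u] == "L":
            weights[u] = (beta + 1) * (k + 1)
        elif group[u] == "I":
            weights[u] = k + 1
        else:  # group "P": never selected
            weights[u] = 0
    return weights


def sample_d_beta(degree, group, beta, rng=random):
    """Draw one node with probability proportional to D_beta(u)."""
    weights = d_beta_weights(degree, group, beta)
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[u] for u in nodes], k=1)[0]


degree = {"a": 1, "b": 1, "c": 5}
group = {"a": "L", "b": "I", "c": "P"}
print(d_beta_weights(degree, group, beta=2))  # {'a': 6, 'b': 2, 'c': 0}
```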
Transitive Linking Model [Davidsen 02]
• At each step:
  • TL: a random node is chosen and introduces two of its neighbours to each other, linking them; if the node does not have 2 edges, it introduces itself to a random node
  • RM: with probability p a node is chosen, removed along with its edges, and replaced with a node with one random edge
• When p ≪ 1 the TL dominates the process:
  • the degree distribution is a power-law with cutoff
  • $1 - C = p(\langle k \rangle - 1)$, i.e., C is quite large in practice
• For larger values of p the two processes combine to form an exponential degree distribution
• For p → 1 the degree distribution is essentially a Poisson distribution
Bergenti, Franchi, Poggi (Univ. Parma), Models for Agent-based Simulation of SN, SNAMAS '11
Transitive Linking
Instead of a single p, it would make sense to have distinct p and r parameters for nodes leaving and entering the network.
Few analytic results available.
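A simulation sketch of one step of the transitive linking model in Python, assuming an undirected graph stored as a dict of neighbour sets and a single removal probability p. A removed node is modelled by reusing its identifier for the replacement node, so the number of nodes stays constant. `tl_step` is a hypothetical name.

```python
import random


def tl_step(adj, p, rng=random):
    """One step: transitive linking (TL), then random removal (RM) with probability p."""
    nodes = list(adj)
    # TL: a random node introduces two of its neighbours to each other...
    i = rng.choice(nodes)
    if len(adj[i]) >= 2:
        u, v = rng.sample(sorted(adj[i]), 2)
    else:  # ...or, with fewer than 2 edges, introduces itself to a random node.
        u, v = i, rng.choice([x for x in nodes if x != i])
    adj[u].add(v)
    adj[v].add(u)
    # RM: with probability p a node loses all its edges and comes back with one random edge.
    if rng.random() < p:
        r = rng.choice(nodes)
        for x in adj[r]:
            adj[x].discard(r)
        adj[r] = set()
        other = rng.choice([x for x in nodes if x != r])
        adj[r].add(other)
        adj[other].add(r)


rng = random.Random(11)
graph = {i: set() for i in range(100)}
for _ in range(5000):
    tl_step(graph, p=0.05, rng=rng)
print(sum(len(nb) for nb in graph.values()) / len(graph))  # average degree after 5000 steps
```

Running the loop for several values of p and measuring transitivity is one way to observe the regimes listed on the previous slide.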
[1] Dorogovtsev, S. N. and Mendes, J. F. F. 2003. Evolution of Networks: From Biological Nets to the Internet and WWW (Physics). Oxford University Press, USA.
[2] Watts, D. J. 2003. Small Worlds: The Dynamics of Networks between Order and Randomness (Princeton Studies in Complexity). Princeton University Press.
[3] Jackson, M. O. 2010. Social and Economic Networks. Princeton University Press.
[4] Newman, M. 2010. Networks: An Introduction. Oxford University Press, USA.
[5] Wasserman, S. and Faust, K. 1994. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press.
[6] Scott, J. P. 2000. Social Network Analysis: A Handbook. Sage Publications Ltd.
[7] Kepner, J. and Gilbert, J. 2011. Graph Algorithms in the Language of Linear Algebra (Software, Environments, and Tools). Society for Industrial & Applied Mathematics.
[8] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2009. Introduction to Algorithms. The MIT Press.
[9] Skiena, S. S. 2010. The Algorithm Design Manual. Springer.
[10] Bollobás, B. 1998. Modern Graph Theory. Springer.
[11] Watts, D. J. and Strogatz, S. H. 1998. Collective dynamics of 'small-world' networks. Nature. 393, 6684, 440-442.
[12] Barabási, A. L. and Albert, R. 1999. Emergence of scaling in random networks. Science. 286, 5439, 509.
[13] Kleinberg, J. 2000. The small-world phenomenon: an algorithmic perspective. Proceedings of the thirty-second annual ACM symposium on Theory of computing. 163-170.
[14] Milgram, S. 1967. The small world problem. Psychology Today. 2, 1, 60-67.
Thanks for your kind attention.
Enrico Franchi ([email protected]), AOTLAB, Dipartimento di Ingegneria dell'Informazione, Università di Parma