CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu
11/1/16 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
Observations | Models | Algorithms
Small diameter, Edge clustering | Erdös-Renyi model, Small-world model | Decentralized search
Patterns of signed edge creation | Structural balance, Theory of status | Models for predicting edge signs
Viral Marketing, Blogosphere, Memetracking | Independent cascade model, Game theoretic model | Influence maximization, Outbreak detection, LIM
Scale-Free | Preferential attachment, Copying model | PageRank, Hubs and authorities
Densification power law, Shrinking diameters | Microscopic model of evolving networks | Link prediction, Supervised random walks
Strength of weak ties, Core-periphery | Kronecker Graphs | Community detection: Girvan-Newman, Modularity
What do we observe that needs explaining?
• Small-world model:
  - Diameter
  - Clustering coefficient
• Preferential attachment:
  - Node degree distribution
  - What fraction of nodes has degree k (as a function of k)?
  - Prediction from simple random graph models: p(k) is an exponential function of k
  - Observation: often a power law: $p(k) \propto k^{-\alpha}$
• Take a network, plot a histogram of P(k) vs. k
[Figure panels: "Expected based on $G_{np}$" vs. "Found in data", the latter following $P(k) \propto k^{-\alpha}$]
Flickr social network (n = 584,207, m = 3,555,115) [Leskovec et al. KDD '08]
Plot: fraction of nodes with degree k:
$$p(k) = \frac{|\{u : d_u = k\}|}{N}$$
[Figure: probability $p(k) = P(X = k)$ vs. k on linear axes]
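The definition of $p(k)$ above translates directly into code. A minimal Python sketch, assuming the network is given as an undirected edge list of node pairs; `degree_distribution` and the toy edge list are hypothetical illustrations, and nodes that appear in no edge are not counted in $N$:

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: p(k)} with p(k) = |{u : d_u = k}| / N for an undirected edge list.

    Assumption: N counts only nodes that appear in at least one edge.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    nodes_per_degree = Counter(degree.values())  # k -> number of nodes with degree k
    return {k: c / n for k, c in sorted(nodes_per_degree.items())}

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
print(degree_distribution(edges))  # {1: 0.2, 2: 0.6, 3: 0.2}
```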
• Plot the same data on log-log scale:
[Figure: the same Flickr social network data (n = 584,207, m = 3,555,115) [Leskovec et al. KDD '08] on log-log axes]
How to distinguish $P(k) \propto \exp(-k)$ vs. $P(k) \propto k^{-\alpha}$?
• Take logarithms: if $y = f(x) = e^{-x}$, then $\log y = -x$
• If $y = x^{-\alpha}$, then $\log y = -\alpha \log(x)$
• So on log-log axes a power law looks like a straight line of slope $-\alpha$!
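The log-log argument can be checked numerically: the slope of $\log f(x)$ against $\log x$ is the same between every pair of points for a power law, but keeps changing for an exponential. A small Python sketch; the helper `loglog_slopes` and the sample points are illustrative assumptions, with $\alpha = 1.75$ borrowed from the Flickr fit:

```python
import math

def loglog_slopes(f, xs):
    """Slope of log f(x) against log x between consecutive sample points."""
    return [
        (math.log(f(b)) - math.log(f(a))) / (math.log(b) - math.log(a))
        for a, b in zip(xs, xs[1:])
    ]

alpha = 1.75
xs = [1, 2, 4, 8, 16, 32]
power_slopes = loglog_slopes(lambda x: x ** -alpha, xs)  # equal to -alpha up to rounding
exp_slopes = loglog_slopes(lambda x: math.exp(-x), xs)   # keep getting steeper
print(power_slopes)
print(exp_slopes)
```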
[Figure annotation: fitted slope $= -\alpha = -1.75$, i.e. $P(k) \propto k^{-1.75}$; y-axis: probability $p(k) = P(X = k)$]
• Internet Autonomous Systems [Faloutsos, Faloutsos and Faloutsos, 1999]
[Figure: Internet domain topology]
• The World Wide Web [Broder et al., 2000]
• Other networks [Barabasi-Albert, 1999]
[Figure panels: Power grid, Web graph, Actor collaborations]
• Above a certain x value, the power law is always higher than the exponential!
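Where the two curves cross can be computed. A Python sketch, assuming both curves share the same constant $c$ (so it cancels) and comparing $x^{-\alpha}$ with $e^{-x}$; the function `crossover` is hypothetical. Taking logs, $x^{-\alpha} \ge e^{-x}$ exactly when $g(x) = x - \alpha \ln x \ge 0$:

```python
import math

def crossover(alpha):
    """Smallest x* with x**-alpha >= exp(-x) for every x >= x* (same constant c on both sides).

    g(x) = x - alpha*ln(x) decreases on (0, alpha) and increases on (alpha, inf),
    so beyond its minimum there is exactly one sign change, found by bisection.
    """
    if alpha <= 0:
        raise ValueError("assumes a decaying power law, alpha > 0")

    def g(x):
        return x - alpha * math.log(x)

    if g(alpha) >= 0:      # minimum of g is non-negative: the power law is never below
        return 0.0
    lo, hi = alpha, 2 * alpha
    while g(hi) < 0:       # widen the bracket until the sign flips
        hi *= 2
    for _ in range(100):   # bisection keeps g(lo) < 0 <= g(hi)
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi

print(crossover(1.0), crossover(3.0))
```

With equal constants, $x^{-1}$ never drops below $e^{-x}$, while $x^{-3}$ is below $e^{-x}$ on part of the range and overtakes it for good at roughly $x \approx 4.54$.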
[Figure: p(x) vs. x on linear axes for $p(x) = c\,x^{-0.5}$, $p(x) = c\,x^{-1}$, and an exponential]
• Power law vs. exponential on log-log and semi-log (log-lin) scales
[Clauset-Shalizi-Newman 2007]
[Figure panels: the same three functions on semi-log axes (x linear, y logarithmic) and on log-log axes (x logarithmic, y logarithmic)]
• Power-law degree exponent is typically 2 < α < 3
  - Web graph: $\alpha_{in} = 2.1$, $\alpha_{out} = 2.4$ [Broder et al. 00]
  - Autonomous systems: α = 2.4 [Faloutsos³, 99]
  - Actor collaborations: α = 2.3 [Barabasi-Albert 00]
  - Citations to papers: α ≈ 3 [Redner 98]
  - Online social networks: α ≈ 2 [Leskovec et al. 07]
• Definition: Networks with a power-law tail in their degree distribution are called "scale-free networks"
• Where does the name come from?
  - Scale invariance: there is no characteristic scale
  - Scale invariance means that laws do not change if scales of length, energy, or other variables are multiplied by a common factor
  - Scale-free function: $f(ax) = a^{\lambda} f(x)$
  - Power-law function $f(x) = x^{\lambda}$: $f(ax) = a^{\lambda} x^{\lambda} = a^{\lambda} f(x)$
log() or exp() are not scale free!
  $f(ax) = \log(ax) = \log a + \log x = \log a + f(x)$
  $f(ax) = \exp(ax) = (\exp x)^{a} = f(x)^{a}$
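A quick numerical check of the definition: for a scale-free $f$, the ratio $f(ax)/f(x)$ equals $a^{\lambda}$ no matter which $x$ is used, while for log and exp it depends on $x$. A Python sketch; `scale_ratios`, $a = 3$ and $\lambda = -2$ are arbitrary illustrative choices, and the exponential is written as $e^{-x}$ to avoid overflow:

```python
import math

def scale_ratios(f, a, xs):
    """f(a*x) / f(x) at several x: constant (a**lam) for a scale-free f, x-dependent otherwise."""
    return [f(a * x) / f(x) for x in xs]

a, lam = 3.0, -2.0
xs = [2.0, 5.0, 10.0]
power = scale_ratios(lambda x: x ** lam, a, xs)       # every entry is a**lam = 1/9
logs = scale_ratios(math.log, a, xs)                  # log(3x) / log(x) changes with x
exps = scale_ratios(lambda x: math.exp(-x), a, xs)    # exp(-3x) / exp(-x) = exp(-2x)
print(power, logs, exps)
```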
Many other quantities follow heavy-tailed distributions
[Clauset-Shalizi-Newman 2007]
[Chris Anderson, Wired, 2004]
[Figure: CMU grad students at the G20 meeting in Pittsburgh in Sept 2009]
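• Degrees are heavily skewed: distribution $P(X > x)$ is heavy-tailed if, for every $\lambda > 0$,
$$\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$$
• Note:
  - Normal PDF: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
  - Exponential PDF: $p(x) = \lambda e^{-\lambda x}$
    - then $P(X > x) = 1 - P(X \le x) = e^{-\lambda x}$
  - Neither is heavy-tailed!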
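The definition can be probed numerically by tracking $e^{\lambda x} P(X > x)$ as $x$ grows. The limit has to be infinite for every $\lambda > 0$: for an exponential with rate $\mu$ the product diverges only when $\lambda > \mu$ and vanishes when $\lambda < \mu$. A Python sketch, assuming a power law with $\alpha = 2.5$, $x_m = 1$ and a rate-1 exponential (illustrative values; `tail_ratio` is a hypothetical helper):

```python
import math

def tail_ratio(ccdf, lam, xs):
    """e^(lam*x) * P(X > x); heavy-tailed iff this diverges for every lam > 0."""
    return [math.exp(lam * x) * ccdf(x) for x in xs]

alpha, x_min = 2.5, 1.0

def pareto_ccdf(x):
    return (x / x_min) ** -(alpha - 1)   # power law: P(X > x) for x >= x_min

def exp_ccdf(x):
    return math.exp(-x)                  # exponential with rate 1

lam = 0.1   # one small lam suffices to show the exponential is not heavy-tailed
xs = [10, 50, 100, 200]
pareto = tail_ratio(pareto_ccdf, lam, xs)   # increases with x (diverges as x grows)
expo = tail_ratio(exp_ccdf, lam, xs)        # decreases toward 0
print(pareto)
print(expo)
```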
• Various names, kinds and forms:
  - Long tail, Heavy tail, Zipf's law, Pareto's law
• Heavy-tailed distributions:
  - P(x) is proportional to: [Table: functional forms of P(x)]
[Clauset-Shalizi-Newman 2007]
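• What is the normalizing constant? $p(x) = Z x^{-\alpha}$, $Z = ?$
  - $p(x)$ is a distribution: $\int p(x)\,dx = 1$ (continuous approximation)
  - $1 = \int_{x_m}^{\infty} p(x)\,dx = Z \int_{x_m}^{\infty} x^{-\alpha}\,dx$
  - $= -\frac{Z}{\alpha - 1}\left[x^{-\alpha+1}\right]_{x_m}^{\infty} = -\frac{Z}{\alpha - 1}\left[\infty^{1-\alpha} - x_m^{1-\alpha}\right]$
  - $\Rightarrow Z = (\alpha - 1)\, x_m^{\alpha - 1}$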
[Clauset-Shalizi-Newman 2007]
$$p(x) = \frac{\alpha - 1}{x_m} \left(\frac{x}{x_m}\right)^{-\alpha}$$
p(x) diverges as $x \to 0$, so $x_m$ is the minimum value of the power-law distribution: $x \in [x_m, \infty)$
Need: $\alpha > 1$!
Integral: $\int (ax)^n\,dx = \frac{(ax)^{n+1}}{a(n+1)}$
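The normalization can be sanity-checked numerically. A Python sketch, assuming $\alpha = 2.5$, $x_m = 1$ (plus a second case $\alpha = 3$, $x_m = 2$ in the checks); it integrates the normalized density with the trapezoid rule after substituting $x = x_m e^{s}$ and truncating at $10^6 x_m$, so the result should be $1 - 10^{6(1-\alpha)}$ up to discretization error. The helper names are hypothetical:

```python
import math

def power_law_pdf(x, alpha, x_min):
    """p(x) = Z * x^-alpha with Z = (alpha - 1) * x_min^(alpha - 1), i.e.
    (alpha - 1) / x_min * (x / x_min)^-alpha for x >= x_min, and 0 below x_min."""
    if alpha <= 1:
        raise ValueError("the density can only be normalized for alpha > 1")
    if x < x_min:
        return 0.0
    return (alpha - 1) / x_min * (x / x_min) ** (-alpha)

def total_mass(alpha, x_min, upper_ratio=1e6, steps=200_000):
    """Trapezoid rule for the integral of p(x) over [x_min, upper_ratio * x_min],
    using x = x_min * e^s so that dx = x ds."""
    s_max = math.log(upper_ratio)
    h = s_max / steps
    total = 0.0
    for k in range(steps + 1):
        x = x_min * math.exp(k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * power_law_pdf(x, alpha, x_min) * x
    return total * h

print(total_mass(2.5, 1.0))
```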
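• What's the expected value of a power-law random variable X?
• $E[X] = \int_{x_m}^{\infty} x\,p(x)\,dx = Z \int_{x_m}^{\infty} x^{-\alpha+1}\,dx$
• $= \frac{Z}{2-\alpha}\left[x^{2-\alpha}\right]_{x_m}^{\infty} = \frac{(\alpha-1)\,x_m^{\alpha-1}}{-(\alpha-2)}\left[\infty^{2-\alpha} - x_m^{2-\alpha}\right]$
$\Rightarrow E[X] = \frac{\alpha - 1}{\alpha - 2}\,x_m$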
[Clauset-Shalizi-Newman 2007]
Need: $\alpha > 2$!
Power-law density: $p(x) = \frac{\alpha - 1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$, with $Z = (\alpha - 1)\,x_m^{\alpha - 1}$
• Power laws have infinite moments!
  - If $\alpha \le 2$: $E[X] = \infty$
  - If $\alpha \le 3$: $Var[X] = \infty$
  - Average is meaningless, as the variance is too high!
• Consequence: sample average of n samples from a power law with exponent α
$E[X] = \frac{\alpha - 1}{\alpha - 2}\,x_m$
In real networks $2 < \alpha < 3$, so: $E[X] = const$, $Var[X] = \infty$
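The effect of infinite moments shows up when averaging samples. A Python sketch, assuming continuous power-law samples drawn by inverse-transform sampling from $P(X \ge x) = (x/x_m)^{-(\alpha-1)}$ (helper names and the exponents 1.8, 2.5, 3.5 are illustrative). For $\alpha = 3.5$ the sample average settles near $\frac{\alpha-1}{\alpha-2}x_m \approx 1.67$; for $\alpha = 2.5$ the mean is finite, but the infinite variance makes the average converge slowly and erratically; for $\alpha = 1.8$ there is no finite mean to converge to:

```python
import math
import random

def power_law_mean(alpha, x_min):
    """E[X] = (alpha - 1) / (alpha - 2) * x_min; infinite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return (alpha - 1) / (alpha - 2) * x_min

def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform sampling: P(X >= x) = (x / x_min)^-(alpha - 1) for x >= x_min,
    so x = x_min * (1 - u)^(-1 / (alpha - 1)) with u uniform on [0, 1)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]

rng = random.Random(0)
for alpha in (1.8, 2.5, 3.5):
    for n in (1_000, 10_000, 100_000):
        xs = sample_power_law(alpha, 1.0, n, rng)
        print(alpha, n, power_law_mean(alpha, 1.0), sum(xs) / n)
```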
Estimating α from data:
• (1) Fit a line on log-log axes using least squares:
  - Solve $\arg\min_{\alpha, b} \sum \left(\log y - (-\alpha \log x + b)\right)^2$
BAD!
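Estimating α from data:
• Plot the complementary CDF (CCDF) $P(X \ge x)$. Then the estimated $\alpha = 1 + \alpha'$, where $-\alpha'$ is the slope of $P(X \ge x)$ on log-log axes.
• Fact: if $p(x) = P(X = x) \propto x^{-\alpha}$, then $P(X \ge x) \propto x^{-(\alpha - 1)}$
  - $P(X \ge x) = \sum_{j=x}^{\infty} p(j) \approx \int_{x}^{\infty} Z y^{-\alpha}\,dy$
  - $= \frac{Z}{1-\alpha}\left[y^{1-\alpha}\right]_{x}^{\infty} = \frac{Z}{\alpha - 1}\,x^{-(\alpha - 1)}$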
OK!
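Estimating α from data:
• Use maximum likelihood approach:
  - The log-likelihood of observed data $d_i$:
  - $L(\alpha) = \ln \prod_{i=1}^{n} p(d_i) = \sum_{i=1}^{n} \ln p(d_i)$
  - $= \sum_{i=1}^{n} \left[\ln(\alpha - 1) - \ln x_m - \alpha \ln \frac{d_i}{x_m}\right]$
  - Want to find α that maximizes $L(\alpha)$: set $\frac{dL(\alpha)}{d\alpha} = 0$
  - $\frac{dL(\alpha)}{d\alpha} = 0 \Rightarrow \frac{n}{\alpha - 1} - \sum_{i=1}^{n} \ln \frac{d_i}{x_m} = 0$
  - $\Rightarrow \hat{\alpha} = 1 + n \left[\sum_{i=1}^{n} \ln \frac{d_i}{x_m}\right]^{-1}$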
Power-law density: $p(x) = \frac{\alpha - 1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$
OK!
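Both recommended estimators can be compared on synthetic data where the true exponent is known. A Python sketch, assuming continuous power-law samples with $\alpha = 2.5$, $x_m = 1$ generated by inverse-transform sampling; `alpha_ccdf_fit` regresses $\log P(X \ge x)$ on $\log x$ by least squares and returns $1 + \alpha'$, and `alpha_mle` implements $\hat{\alpha}$ above (all helper names are hypothetical):

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform sampling from p(x) = (alpha-1)/x_min * (x/x_min)^-alpha, x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]

def alpha_mle(data, x_min):
    """alpha_hat = 1 + n * [sum_i ln(d_i / x_min)]^-1 over the points with d_i >= x_min."""
    tail = [d for d in data if d >= x_min]
    return 1.0 + len(tail) / sum(math.log(d / x_min) for d in tail)

def alpha_ccdf_fit(data):
    """Least-squares line through log P(X >= x) vs. log x; its slope is -(alpha - 1)."""
    xs = sorted(data)
    n = len(xs)
    lx = [math.log(x) for x in xs]
    ly = [math.log((n - i) / n) for i in range(n)]   # empirical P(X >= xs[i])
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return 1.0 - slope

data = sample_power_law(2.5, 1.0, 50_000, random.Random(42))
print("MLE:", alpha_mle(data, 1.0), "CCDF fit:", alpha_ccdf_fit(data))
```

On clean continuous data like this both land close to 2.5; the MLE has the advantage of a closed form with a standard error of roughly $(\hat{\alpha} - 1)/\sqrt{n}$.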
[Figure panels: linear scale; log scale, α = 1.75; CCDF, log scale, α = 1.75; CCDF, log scale, α = 1.75, exp. cutoff]
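• Power-law degrees cannot arise from sums of independent events!
  - Recall: in $G_{np}$ each pair of nodes is connected independently with prob. p
  - $X$ … degree of node v
  - $X_w$ … event that w links to v
  - $X = \sum_w X_w$
  - $E[X] = \sum_w E[X_w] = (n - 1)p$
  - Now, what is $P(X = k)$? Central limit theorem!
  - $X_1, \ldots, X_n$: independent random vars with mean μ, variance σ²
  - $S_n = \sum_i X_i$: $E[S_n] = n\mu$, $Var[S_n] = n\sigma^2$, $SD[S_n] = \sigma\sqrt{n}$
  - $P(S_n = E[S_n] + x \cdot SD[S_n]) \sim \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$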
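A small simulation makes the concentration visible. A Python sketch, assuming $n = 2000$ and $p = 0.005$ (illustrative values; `gnp_degrees` is a hypothetical helper): every degree is a sum of $n - 1$ independent indicators, so degrees cluster around $(n-1)p$ with standard deviation $\sqrt{(n-1)p(1-p)}$, and even the largest degree stays within a handful of standard deviations of the mean, unlike in a power law:

```python
import random

def gnp_degrees(n, p, seed=0):
    """Degrees in G(n, p): each of the n(n-1)/2 pairs is an edge independently with prob. p."""
    rng = random.Random(seed)
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                deg[u] += 1
                deg[v] += 1
    return deg

n, p = 2000, 0.005
deg = gnp_degrees(n, p)
mean = sum(deg) / n
sd = (sum((d - mean) ** 2 for d in deg) / n) ** 0.5
print(mean, (n - 1) * p)                       # empirical vs. E[X] = (n - 1)p
print(sd, ((n - 1) * p * (1 - p)) ** 0.5)      # empirical vs. binomial SD
print(max(deg))                                # largest degree observed
```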
[Figure: random network (Erdos-Renyi random graph), degree distribution is binomial, vs. scale-free (power-law) network, degree distribution is power-law]
• How does network connectivity change as nodes get removed? [Albert et al. 00; Palmer et al. 01]
• Nodes can be removed:
  - Random failure:
    - Remove nodes uniformly at random
  - Targeted attack:
    - Remove nodes in order of decreasing degree
• This is important for robustness of the internet as well as epidemiology
• Networks with equal number of nodes and edges:
  - ER random graph
  - Scale-free network
• Study the properties of the network as an increasing fraction of nodes are removed
  - Node selection:
    - Random (this corresponds to random failures)
    - Nodes with largest degrees (corresponds to targeted attacks)
• Measures:
  - Fraction of nodes in the largest connected component
  - Average shortest path length between nodes in the largest component
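The experiment above can be sketched in Python. Assumptions not taken from the slides: the scale-free network is grown by preferential attachment with 2 links per new node, the random graph is an ER-style graph with exactly the same number of nodes and edges, degrees for the targeted attack come from the original graph (not recomputed after each removal), and only the first measure (largest-component fraction) is computed. All helper names are hypothetical:

```python
import random
from collections import deque

def preferential_attachment(n, m, rng):
    """Each new node links to m distinct existing nodes chosen with prob. proportional to
    degree (a uniform pick from the list of all edge endpoints)."""
    adj = {u: set() for u in range(n)}
    endpoints = []
    for u in range(m + 1):              # seed: a clique on the first m + 1 nodes
        for v in range(u):
            adj[u].add(v)
            adj[v].add(u)
            endpoints += [u, v]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj

def random_graph(n, num_edges, rng):
    """Uniform random graph with exactly num_edges distinct edges."""
    adj = {u: set() for u in range(n)}
    added = 0
    while added < num_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

def largest_component_fraction(adj, removed):
    """Largest connected component after deleting `removed`, as a fraction of all nodes (BFS)."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        size, queue = 1, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    size += 1
                    queue.append(w)
        best = max(best, size)
    return best / len(adj)

def remove_and_measure(adj, fraction, targeted, rng):
    """Delete a fraction of nodes, uniformly at random or in decreasing order of degree."""
    k = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda u: len(adj[u]), reverse=True)
    else:
        order = rng.sample(list(adj), len(adj))
    return largest_component_fraction(adj, order[:k])

rng = random.Random(7)
sf = preferential_attachment(2000, 2, rng)
num_edges = sum(len(nbrs) for nbrs in sf.values()) // 2
er = random_graph(2000, num_edges, rng)        # same number of nodes and edges
for name, g in (("scale-free", sf), ("ER", er)):
    for f in (0.05, 0.1, 0.2):
        print(name, f, remove_and_measure(g, f, False, rng), remove_and_measure(g, f, True, rng))
```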
• Scale-free graphs are resilient to random failures, but sensitive to targeted attacks.
• For random networks there is a smaller difference between the two
[Figure: size of the largest connected component under random failure vs. targeted attack]
• What proportion of the nodes must be removed in order for the size (S) of the giant component to drop to 0?
• Infinite scale-free networks with $\gamma < 3$ never break down under random node failures
Source: Cohen et al., Resilience of the Internet to Random Breakdowns
[Figure: fraction of nodes in the largest connected component vs. fraction of deleted nodes; γ … degree exponent, K … maximum degree]
• Real networks are resilient to random failures
• $G_{np}$ has better resilience to targeted attacks
  - Need to remove all pages of degree > 5 to disconnect the Web
  - But this is a very small fraction of all web pages
[Figure: mean path length vs. fraction of removed nodes under random failures and targeted attack, for a $G_{np}$ network and the AS network]
Source: Error and attack tolerance of complex networks. Réka Albert, Hawoong Jeong and Albert-László Barabási
[Figure: fraction of nodes in the largest connected component, and shortest path length, vs. fraction of deleted nodes]
• The first few % of nodes removed.
• d … avg. shortest path length
[Figure: shortest path length vs. fraction of deleted nodes]
• Preferential attachment [Price '65, Albert-Barabasi '99, Mitzenmacher '03]
  - Nodes arrive in order 1, 2, …, n
  - At step j, let $d_i$ be the degree of node i < j
  - A new node j arrives and creates m out-links
  - Prob. of j linking to a previous node i is proportional to degree $d_i$ of node i:
$$P(j \rightarrow i) = \frac{d_i}{\sum_k d_k}$$
• New nodes are more likely to link to nodes that already have high degree
• Herbert Simon's result:
  - Power laws arise from "rich get richer" (cumulative advantage)
• Examples
  - Citations [de Solla Price '65]: new citations to a paper are proportional to the number it already has
    - Herding: if a lot of people cite a paper, then it must be good, and therefore I should cite it too
  - Sociology: Matthew effect
    - Eminent scientists often get more credit than a comparatively unknown researcher, even if their work is similar
    - http://en.wikipedia.org/wiki/Matthew_effect
We will analyze the following model:
• Nodes arrive in order 1, 2, 3, …, n
• When node j is created it makes a single out-link to an earlier node i chosen:
  - 1) With prob. p, j links to i chosen uniformly at random (from among all earlier nodes)
  - 2) With prob. 1 − p, node j chooses i uniformly at random and links to a random node l that i points to
    - This is the same as saying: with prob. 1 − p, node j links to node l with prob. proportional to $d_l$ (the in-degree of l)
  - Our graph is directed: every node has out-degree 1
[Mitzenmacher, ‘03]
• Claim: The described model generates networks where the fraction of nodes with in-degree k scales as:
$$P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)} \quad \text{where } q = 1 - p$$
So we get a power-law degree distribution with exponent:
$$\alpha = 1 + \frac{1}{1 - p}$$
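The model is easy to simulate. A Python sketch, assuming $p = 0.5$ and $n = 100{,}000$ (so the claim predicts $\alpha = 3$ and a CCDF falling roughly like $k^{-2}$), and assuming a seed node 0 with no out-link, to which a new node links directly whenever the copying step picks it; `copying_model` is a hypothetical helper:

```python
import random
from collections import Counter

def copying_model(n, p, rng):
    """out[j] = the earlier node that node j links to (node 0 is a seed with no out-link).

    With prob. p, j links to a uniformly random earlier node i; otherwise it picks i
    uniformly and copies i's out-link, which selects l with prob. proportional to d_l.
    Assumption: if the chosen i is the seed (no out-link), j links to i itself.
    """
    out = [None] * n
    for j in range(1, n):
        i = rng.randrange(j)
        if rng.random() < p or out[i] is None:
            out[j] = i            # 1) link to the uniformly chosen node
        else:
            out[j] = out[i]       # 2) copy that node's out-link
    return out

p, n = 0.5, 100_000
out = copying_model(n, p, random.Random(3))
in_deg = Counter(out[1:])
degrees = [in_deg[u] for u in range(n)]
print("predicted exponent:", 1 + 1 / (1 - p))
for k in (1, 10, 100):
    print(k, sum(d >= k for d in degrees) / n)   # empirical CCDF P(d >= k)
```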
• Consider a deterministic and continuous approximation to the degree of node i as a function of time t
  - t is the number of nodes that have arrived so far
  - In-degree $d_i(t)$ of node i (i = 1, 2, …, n) is a continuous quantity and it grows deterministically as a function of time t
• Plan: Analyze $d_i(t)$, the continuous in-degree of node i at time t > i
  - Note: Node i arrives in the graph at time t = i
• Initial condition:
  - $d_i(t) = 0$ when $t = i$ (node i just arrived)
• Expected change of $d_i(t)$ over time:
  - Node i gains an in-link at step t + 1 only if the link from the newly created node t + 1 points to it.
  - What's the probability of this event?
  - With prob. p node t + 1 links randomly:
    - Links to our node i with prob. 1/t
  - With prob. 1 − p node t + 1 links preferentially:
    - Links to our node i with prob. $d_i(t)/t$ (the total in-degree is about t, since every node has out-degree 1)
  - Prob. node t + 1 links to i is: $p\,\frac{1}{t} + (1 - p)\,\frac{d_i(t)}{t}$
• At t = 4 node i = 4 comes. It has out-degree 1, which it shares deterministically among the existing nodes:
• $d_i(t) - d_i(t-1) = \frac{d\,d_i(t)}{dt} = p\,\frac{1}{t} + (1 - p)\,\frac{d_i(t)}{t}$
• How does $d_i(t)$ evolve as $t \to \infty$?
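Node i | $d_i(t)$ | $d_i(t+1)$
0 | 0 | $= 0 + p\,\frac{1}{4} + (1 - p)\,\frac{0}{4}$
1 | 2 | $= 2 + p\,\frac{1}{4} + (1 - p)\,\frac{2}{4}$
2 | 0 | $= 0 + p\,\frac{1}{4} + (1 - p)\,\frac{0}{4}$
3 | 1 | $= 1 + p\,\frac{1}{4} + (1 - p)\,\frac{1}{4}$
4 | / | 0
[Figure: example graph on nodes 0, 1, 2, 3, 4]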
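The update rule can be iterated directly and compared with the continuous solution. Separating variables in $\frac{d\,d_i}{dt} = \frac{p + q\,d_i}{t}$ with $d_i(i) = 0$ gives $p + q\,d_i(t) = p\,(t/i)^{q}$, i.e. $d_i(t) = \frac{p}{q}\left[\left(\frac{t}{i}\right)^{q} - 1\right]$. A Python sketch, assuming $0 < p < 1$ and $p = 0.5$ for the run (helper names are hypothetical):

```python
def iterate_expected_degree(i, t_end, p):
    """d_i(t_end) from the update d(t+1) = d(t) + p/t + (1-p)*d(t)/t, starting at d(i) = 0."""
    d = 0.0
    for t in range(i, t_end):
        d += p / t + (1 - p) * d / t
    return d

def continuous_degree(i, t, p):
    """Closed form of dd/dt = (p + q*d)/t with d(i) = 0, where q = 1 - p (assumes 0 < p < 1)."""
    q = 1 - p
    return p / q * ((t / i) ** q - 1)

p = 0.5
for i in (10, 100, 1000):
    print(i, iterate_expected_degree(i, 100_000, p), continuous_degree(i, 100_000, p))
```

The two agree to within a few percent for these values, and older nodes (smaller i) end up with a larger in-degree.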