Multigraph Sampling of Online Social Networks
Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou
Dec 26, 2015
Outline
• Multigraph sampling
  – Motivation
  – Sampling method
  – Internet Measurements
  – Conclusion
Problem statement
• Obtain a representative sample of OSN users by exploration of the social graph.
[Figure: example social graph with users A to I]
Motivation for multiple relations
• Principled methods for graph sampling
  – Metropolis-Hastings Random Walk (MHRW)
  – Re-weighted Random Walk (RWRW)
  “Walking in Facebook: A Case Study of Unbiased Sampling of OSNs,” INFOCOM ’10
• But graph characteristics affect mixing and convergence
  – fragmented social graph
  – highly clustered areas
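The two principled methods named above differ in where they correct the degree bias of a random walk: MHRW corrects it during the walk with an acceptance test, while RWRW walks freely and corrects it afterwards by weighting each sample by 1/degree. A minimal sketch, assuming an undirected graph stored as adjacency lists; `Graph`, `mhrw_walk`, `rw_walk`, and `rwrw_mean` are illustrative names, not code from the study.

```python
import random
from typing import Callable, Dict, Hashable, List

# Undirected simple graph as adjacency lists (every edge listed in both directions).
Graph = Dict[Hashable, List[Hashable]]


def mhrw_walk(g: Graph, start: Hashable, steps: int, rng: random.Random) -> List[Hashable]:
    """Metropolis-Hastings random walk whose stationary distribution is uniform over nodes."""
    v = start
    samples: List[Hashable] = []
    for _ in range(steps):
        w = rng.choice(g[v])  # propose a uniformly random neighbor
        # Accept with probability min(1, deg(v) / deg(w)); otherwise stay at v.
        if rng.random() < len(g[v]) / len(g[w]):
            v = w
        samples.append(v)
    return samples


def rw_walk(g: Graph, start: Hashable, steps: int, rng: random.Random) -> List[Hashable]:
    """Simple random walk; in the long run it visits nodes proportionally to their degree."""
    v = start
    samples: List[Hashable] = []
    for _ in range(steps):
        v = rng.choice(g[v])
        samples.append(v)
    return samples


def rwrw_mean(samples: List[Hashable],
              degree: Callable[[Hashable], int],
              f: Callable[[Hashable], float]) -> float:
    """Re-weighted random walk estimate of the node average of f (weights 1/degree)."""
    num = sum(f(v) / degree(v) for v in samples)
    den = sum(1.0 / degree(v) for v in samples)
    return num / den
```

On the path 0-1-2, a simple random walk sits on node 1 half of the time, yet `rwrw_mean` with `f(v) = 1 if v == 1 else 0` recovers the uniform share of 1/3.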
Fragmented social graph
[Figure: connected components of the Friendship, Event attendance, and Group membership graphs and of their Union; legend: Largest Connected Component, Other Connected Components]
Proposal
• Graph exploration using multiple user relations
  – perform random walk
  – re-weighting at the end of the walk
  – online convergence diagnostics applicable
• Theoretical benefits
  – faster mixing
  – discovery of isolated components
• Open questions
  – how to combine relations
  – implementation efficiency
  – evaluation of sampling benefits in a realistic scenario
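The proposal can be sketched as a random walk on the union multigraph G* of all relations, followed by re-weighting each sample by 1/d*(v), where d*(v) counts v's edges summed over relations. A minimal sketch under our own assumptions: each relation is an undirected adjacency-list dictionary, every edge at the current user is equally likely regardless of its relation, and `Relations`, `multigraph_walk`, and `reweighted_mean` are illustrative names.

```python
import random
from typing import Callable, Dict, Hashable, List

# Relation name -> adjacency lists of that relation (undirected: edges listed both ways).
Relations = Dict[str, Dict[Hashable, List[Hashable]]]


def multigraph_degree(rels: Relations, v: Hashable) -> int:
    """d*(v): number of edges at v in the union multigraph, summed over all relations."""
    return sum(len(adj.get(v, [])) for adj in rels.values())


def multigraph_walk(rels: Relations, start: Hashable, steps: int,
                    rng: random.Random) -> List[Hashable]:
    """Random walk on the union multigraph: each of the d*(v) edges at v is equally likely."""
    v = start
    samples: List[Hashable] = []
    for _ in range(steps):
        edges = [w for adj in rels.values() for w in adj.get(v, [])]
        v = rng.choice(edges)
        samples.append(v)
    return samples


def reweighted_mean(rels: Relations, samples: List[Hashable],
                    f: Callable[[Hashable], float]) -> float:
    """Re-weighting at the end of the walk: weight each sample by 1/d*(v)."""
    weights = [1.0 / multigraph_degree(rels, v) for v in samples]
    return sum(w * f(v) for w, v in zip(weights, samples)) / sum(weights)
```

As a toy case, let Friends contain only the edges a-b and c-d and Groups only b-c: a Friends-only walk started at a never reaches d, while the multigraph walk does, illustrating the "discovery of isolated components" benefit.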
[Figure: users A to K connected by three relations, shown in separate panels: Friends, Events, Groups]
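Combination of multiple relations
[Figure: union multigraph G* over users A to K, with edges colored by relation]
G* = Friends + Events + Groups (G* is a union multigraph: a pair of users gets one edge per relation that connects them)
deg(F, tot) = 8, deg(F, red) = 1, deg(F, blue) = 3, deg(F, green) = 4
[Figure: union graph G over the same users]
G = Friends + Events + Groups (G is a union graph: users connected by several relations share a single edge)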
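The difference between the two combinations shows up in the degree of a single user: G* counts every relation's edge separately, while G counts distinct neighbors. A minimal sketch with hypothetical neighbor lists for F; the neighbor identities are invented, and only the counts (1, 3, and 4 edges per relation, union-graph degree 5) mirror the numbers given for F in the slides. `degrees` is an illustrative name.

```python
from typing import Dict, Hashable, List, Tuple

Relations = Dict[str, Dict[Hashable, List[Hashable]]]


def degrees(rels: Relations, v: Hashable) -> Tuple[Dict[str, int], int, int]:
    """Return per-relation degrees, the degree in G* (multigraph), and in G (simple graph)."""
    per_relation = {name: len(adj.get(v, [])) for name, adj in rels.items()}
    d_star = sum(per_relation.values())  # parallel edges from different relations all count
    d_union = len({w for adj in rels.values() for w in adj.get(v, [])})  # distinct neighbors only
    return per_relation, d_star, d_union


# Hypothetical neighbor lists for F: only the per-relation counts follow the slides.
rels_F = {
    "Friends": {"F": ["E"]},
    "Groups": {"F": ["E", "G", "H"]},
    "Events": {"F": ["G", "H", "I", "J"]},
}
per_relation, d_star, d_union = degrees(rels_F, "F")
print(per_relation, d_star, d_union)
# {'Friends': 1, 'Groups': 3, 'Events': 4} 8 5
```

E, G, and H are reachable through two relations each, which is why 8 edges in G* collapse to 5 edges in G.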
Implementation efficiency
• Degree information available without enumeration
  $d(F) = 5$, $d^*(F) = 8$
  $p(\text{Friends}) = 1/8$, $p(\text{Events}) = 4/8$, $p(\text{Groups}) = 3/8$, i.e. $p(r) = \deg(F, r) / d^*(F)$
• Take advantage of pages functionality
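Choosing a relation with probability deg(F, r)/d*(F) and then a uniform neighbor within it is the same as picking one of the d*(F) multigraph edges uniformly, so only the per-relation counts are needed up front. A minimal sketch, assuming (as the slide suggests) that these counts are available without enumerating neighbors and that each relation's neighbor list is split into fixed-size pages; the 1-indexed page layout, `page_size`, and the function names are hypothetical.

```python
import random
from fractions import Fraction
from typing import Dict, Tuple


def relation_probabilities(counts: Dict[str, int]) -> Dict[str, Fraction]:
    """p(r) = deg(v, r) / d*(v): choose relation r in proportion to its degree at v."""
    total = sum(counts.values())
    return {r: Fraction(c, total) for r, c in counts.items()}


def pick_edge_location(counts: Dict[str, int], page_size: int,
                       rng: random.Random) -> Tuple[str, int, int]:
    """Pick one of the d*(v) edges uniformly, using only the relation sizes.

    Returns (relation, page number, offset on that page), so only one listing
    page has to be fetched. Pages are assumed to be 1-indexed and to hold
    `page_size` entries each (hypothetical layout).
    """
    k = rng.randrange(sum(counts.values()))  # uniform edge index in 0 .. d*(v)-1
    for relation, c in counts.items():
        if k < c:
            page, offset = divmod(k, page_size)
            return relation, page + 1, offset
        k -= c
    raise RuntimeError("unreachable: k is always below d*(v)")


counts_F = {"Friends": 1, "Events": 4, "Groups": 3}
print(relation_probabilities(counts_F))
# {'Friends': Fraction(1, 8), 'Events': Fraction(1, 2), 'Groups': Fraction(3, 8)}
```

`Fraction` reduces 4/8 to 1/2; the probabilities are otherwise exactly those listed for F.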
Internet Measurements
• Last.fm, an Internet radio service
  – social networking features
  – multiple relations
  – fragmented graph components and highly clustered users expected
• Last.fm relations used
  – Friends
  – Groups
  – Events
  – Neighbors
Data Collection
• Crawling using the Last.fm API and HTML scraping
• Sampled node information: userID, country, age, registration time, …
Summary of datasets (Last.fm, July 2010)

Crawl type                          # Total Users   % Unique Users
Friends                             5x50K           71%
Events                              5x50K           58%
Groups                              5x50K           74%
Neighbors                           5x50K           53%
Friends-Events-Groups-Neighbors     5x50K           76%
UNI                                 500K            99%
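Reading "5x50K" as five independent walks of 50K samples per crawl type, the walks can be checked against one another for convergence. A minimal sketch of one standard multi-walk diagnostic, the Gelman-Rubin potential scale reduction factor, applied to a scalar node property (for example age) recorded along each walk; the slides only state that online convergence diagnostics are applicable, not which ones were used, and `gelman_rubin` is an illustrative name.

```python
from statistics import mean, variance
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Potential scale reduction factor R-hat for m equal-length walks.

    Values close to 1 suggest the walks sample the same distribution;
    values well above 1 suggest they have not mixed yet.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 walks of equal length >= 2")
    w = mean(variance(c) for c in chains)          # within-walk variance W
    b = n * variance([mean(c) for c in chains])    # between-walk variance B
    var_hat = (n - 1) / n * w + b / n              # pooled variance estimate
    return (var_hat / w) ** 0.5
```

A commonly used rule of thumb treats R-hat below about 1.1 as acceptable; walks stuck in different graph components or clusters produce much larger values.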
Related Work
• Fastest mixing Markov chain
  – Boyd et al., SIAM Review 2004
• Sampling in fragmented graphs
  – Ribeiro et al., Frontier Sampling, IMC 2010
• Last.fm studies
  – Konstas et al., SIGIR ’09
  – Schifanella et al., WSDM ’10
Conclusion
• Introduced multigraph sampling
  – simple and efficient
  – discovers isolated components
  – better approximation of distributions and means
  – multigraph dataset planned for public release
• Future work on multigraph sampling
  – selection of relations
  – weighted relations