A Community-Based Model of Online Social Networks

Leendert Botha and Steve Kroon

lwbotha@ml.sun.ac.za, [email protected]

Electronic Media Lab

1. Problem Statement

An accurate random graph model for social networks (SNs) can help provide:

• insight into how SNs grow;

• a basis for SN analysis without violating privacy; and

• a test bed for algorithms and novel data structures.

2. Solution

We propose a model for generating SNs. Our community-based model simulates the growth of SNs over time, focusing on reproducing distinctive properties of SNs, including

• low average separation;

• high level of clustering; and

• a power-law degree distribution.

3. Our approach

We propose a community-based approach, first modeling the community structure and then translating that model into an SN. A major advantage of this approach is that it is intuitive, with an obvious correspondence to real-world behavior, where people meet new friends through the communities they belong to.

[Figure: communities A, B and C and users 1 to 11, illustrating the community structure.]

4. Model definition

We develop a bipartite graph B representing a community structure as follows:

1. Community nodes, user nodes and connections are added at different rates.

2. With users we associate activity values, with communities density values, and with connections commitment values.

3. The mechanism for creating connections is as follows:

• A user node u_j is chosen preferentially based on activity.

• A community c is selected preferentially based on the commitments of u_j.

• A community c_k is selected from the set of communities u_j is not a member of, using preferential attachment (PA) based on the overlap between c and these communities. (The overlap θ(c, c_k) is the number of mutual members of c and c_k.)

• The user node u_j is connected to the community node c_k.
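The connection mechanism in step 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based data layout, and the initial commitment value of 1.0 for a new connection are all hypothetical, and "preferentially" is read here as sampling with probability proportional to the given weight.

```python
import random

def weighted_choice(items, weights, rng):
    """Pick one item with probability proportional to its weight."""
    return rng.choices(items, weights=weights, k=1)[0]

def add_connection(activity, commitment, members, rng):
    """One connection step on the bipartite graph (hypothetical layout).

    activity:   {user: activity value}
    commitment: {(user, community): commitment value}
    members:    {community: set of users}
    Returns (user, new_community), or None if no candidate exists.
    """
    # Choose user u_j preferentially by activity.
    users = list(activity)
    u = weighted_choice(users, [activity[x] for x in users], rng)

    # Choose one of u_j's communities, c, preferentially by commitment.
    own = [c for c in members if u in members[c]]
    if not own:
        return None
    c = weighted_choice(own, [commitment[(u, x)] for x in own], rng)

    # Choose c_k among the communities u_j is not in, weighted by the
    # overlap theta(c, c_k) = number of mutual members.
    others = [x for x in members if u not in members[x]]
    overlaps = [len(members[c] & members[x]) for x in others]
    if not others or sum(overlaps) == 0:
        return None
    ck = weighted_choice(others, overlaps, rng)

    # Connect u_j to c_k (initial commitment 1.0 is an assumption).
    members[ck].add(u)
    commitment[(u, ck)] = 1.0
    return u, ck
```

Returning None when the user has no eligible community is one possible convention; the poster does not say how such cases are handled.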

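5. Building the Social Network

Whenever a user node u_j is connected to a community node c_k in B, u_j is connected in the SN to each member u_i of c_k with probability

\[ f(\delta_{ik}, \delta_{jk}, d_k) \propto \exp\left[-\left(\frac{1}{\delta_{ik}} + \frac{1}{\delta_{jk}} + \frac{1}{d_k}\right)\right] \]

The final probability that two users u_i and u_j will be connected in the completed SN is given by:

\[ P(e_{ij}) = \sum_{k=1}^{r} \left[ f(u_i, u_j, c_k) \cdot \prod_{l=1}^{k-1} \bigl(1 - f(u_i, u_j, c_l)\bigr) \right] \]

with the sum over their r mutual communities, where f(u_i, u_j, c_k) is shorthand for f(δ_ik, δ_jk, d_k).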
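The two probabilities can be evaluated numerically as in the sketch below. It assumes a proportionality constant of 1 for f (the poster gives f only up to proportionality) and takes the per-community values f_1, ..., f_r as an already computed list. The function names are hypothetical.

```python
import math

def link_prob(delta_i, delta_j, d_k):
    """f = exp(-(1/delta_i + 1/delta_j + 1/d_k)); constant of 1 is assumed."""
    return math.exp(-(1.0 / delta_i + 1.0 / delta_j + 1.0 / d_k))

def final_edge_prob(fs):
    """P(e_ij) for mutual communities c_1..c_r, given fs = [f_1, ..., f_r]."""
    total = 0.0
    none_so_far = 1.0  # product of (1 - f_l) for l < k
    for f in fs:
        total += f * none_so_far
        none_so_far *= 1.0 - f
    return total
```

The sum telescopes, so the result equals 1 minus the product of (1 − f_k) over all mutual communities, which is the probability that at least one of the r independent attempts succeeds.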

7. Current Work

Shrinking diameters and densification power laws: In a real-world SN, the number of connections grows superlinearly in the number of users and the diameter shrinks over time. We are currently investigating if and when our model replicates this behavior.
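One simple way to check for densification is to fit an exponent a in e ∝ n^a, where n is the number of users and e the number of connections at successive snapshots; a > 1 indicates superlinear growth. The sketch below uses an ordinary least-squares fit on the log-log values. The function name and the choice of fitting method are assumptions, not taken from the poster.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log(edges) against log(nodes)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```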

Parameter estimation: We are currently implementing an automated parameter estimation technique based on simulated annealing.
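For readers unfamiliar with simulated annealing, a generic sketch is given below. It is not the authors' estimation procedure: the loss function (for example, a distance between statistics of a simulated network and the target network), the Gaussian proposal, and the geometric cooling schedule are all assumptions chosen for illustration.

```python
import math
import random

def anneal(loss, start, step, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimise loss(params) over a list of floats by simulated annealing."""
    rng = random.Random(seed)
    current, current_loss = list(start), loss(start)
    best, best_loss = list(current), current_loss
    temp = t0
    for _ in range(n_iter):
        # Propose a Gaussian perturbation of every parameter.
        cand = [p + rng.gauss(0.0, step) for p in current]
        cand_loss = loss(cand)
        # Always accept improvements; accept a worse candidate with
        # probability exp(-(loss increase) / temperature).
        accept = cand_loss <= current_loss or (
            rng.random() < math.exp((current_loss - cand_loss) / temp)
        )
        if accept:
            current, current_loss = cand, cand_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
        temp *= cooling  # geometric cooling schedule
    return best, best_loss
```

In a model-fitting setting, each call to `loss` would typically involve generating a network with the candidate parameters, so the number of iterations has to be balanced against simulation cost.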

6. Results

[Figure, left panel: clustering coefficient (CC, log scale) against number of nodes n, for FN, PA, GL and Our Model.]

[Figure, centre panel: count (log scale) against path distance, for FN, PA, GL and Our Model.]

[Figure, right panel: power-law parameter α against number of nodes n, for FN, PA, GL and Our Model.]

Clustering coefficient: The leftmost figure shows the evolution of the clustering coefficients of the true data (FN) and the fitted models (PA [1], GL [2] and Our Model). Our model provides the best fit and is the only model to capture the initial growth period of the network, in which the clustering increases with network size.
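The poster does not state which definition of the clustering coefficient is used. As an illustration, the sketch below computes the average local clustering coefficient of an undirected graph, assuming an adjacency structure that maps each node to the set of its neighbours (a hypothetical layout).

```python
def average_clustering(adj):
    """Mean local clustering coefficient; adj maps node -> set of neighbours."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        nbr_list = list(nbrs)
        # Count the edges that exist among this node's neighbours.
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbr_list[b] in adj[nbr_list[a]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```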

Average Separation: The centre figure shows a histogram of the shortest path lengths. Our model matches the histogram noticeably better than the other two models, both of which overproduce shorter paths and fail to produce paths of longer length.
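A histogram of shortest path lengths can be obtained with a breadth-first search from every node. The sketch below assumes an undirected, unweighted graph stored as a mapping from node to neighbour set, and counts each unordered pair once; whether the poster counts ordered or unordered pairs is not stated.

```python
from collections import Counter, deque

def path_length_histogram(adj):
    """Count unordered node pairs by shortest-path distance (unweighted BFS)."""
    hist = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for d in dist.values():
            if d > 0:
                hist[d] += 1
    # Each pair was reached from both endpoints, so halve the counts.
    return {d: c // 2 for d, c in sorted(hist.items())}
```

Running a search from every node costs time proportional to the number of nodes times the number of edges, which is feasible for networks of the size shown in the figures but may call for sampling on much larger graphs.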

Degree distribution: The rightmost figure shows the evolution of the power-law parameters of the degree distribution. The PA model provides a poor fit, whereas our model and the GL model yield a good fit at the end of the simulation. Our model is the only one to show the same downward trend in α, although it decreases more rapidly than in the true data.
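The poster does not say how the power-law parameter α is estimated. One common choice, shown here purely as an assumption, is the approximate maximum-likelihood estimator for discrete data, α ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)), applied to the degrees at or above a chosen cutoff x_min. The function name and the default cutoff are hypothetical.

```python
import math

def powerlaw_alpha(degrees, x_min=1):
    """Approximate discrete MLE: alpha = 1 + n / sum(ln(x / (x_min - 0.5)))."""
    tail = [x for x in degrees if x >= x_min]
    if not tail:
        raise ValueError("no degrees at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

Recomputing this estimate on snapshots of a growing network gives a curve of α against network size, comparable in spirit to the rightmost figure.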

References

[1] A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509, 1999.

[2] J. Guillaume and M. Latapy. Bipartite graphs as models of complex networks. Physica A: Statistical Mechanics and its Applications, 371(2):795–813, 2006.

ACKNOWLEDGEMENTS:

• MIH for supporting the research project.

• Brian Amberg for the poster style sheet: http://www.brian-amberg.de/uni/poster/.
