ELECTRONIC MEDIA LAB

A Community-Based Model of Online Social Networks
Leendert Botha and Steve Kroon
[email protected], [email protected]

1. Problem Statement
An accurate random graph model for social networks (SNs) can help provide:
• insight into how SNs grow;
• a basis for SN analysis without violating privacy; and
• a test bed for algorithms and novel data structures.

2. Solution
We propose a model for generating SNs. Our community-based model simulates the growth of SNs over time, focusing on reproducing distinctive properties of SNs, including:
• low average separation;
• a high level of clustering; and
• a power-law degree distribution.

3. Our approach
We propose a community-based approach, first modeling the community structure and then translating that model into an SN. A major advantage of this approach is that it is intuitive: it corresponds directly to real-world behavior, where people meet new friends through the communities they belong to.

[Figure: example community structure with communities A, B, C and users 1 to 11.]

4. Model definition
We develop a bipartite graph $B$ representing a community structure as follows:
1. Community nodes, user nodes and connections are added at different rates.
2. With users we associate activity values, with communities density values, and with connections commitment values.
3. The mechanism for creating connections is as follows:
• A user node $u_j$ is chosen preferentially based on activity.
• A community $c$ is selected preferentially based on the commitments of $u_j$.
• A community $c_k$ is selected from the set of communities $u_j$ is not a member of, using preferential attachment (PA) based on the overlap between $c$ and these communities. (The overlap $\theta(c, c_k)$ is the number of mutual members of $c$ and $c_k$.)
• The user node $u_j$ is connected to the community node $c_k$.

5.
Building the Social Network
Whenever a user node $u_j$ is connected to a community node $c_k$ in $B$, $u_j$ is connected in the SN to each member $u_i$ of $c_k$ with probability

\[ f(\delta_{ik}, \delta_{jk}, d_k) \propto \exp\left( -\left( \frac{1}{\delta_{ik}} + \frac{1}{\delta_{jk}} + \frac{1}{d_k} \right) \right) \]

The final probability that two users $u_i$ and $u_j$ will be connected in the completed SN is given by:

\[ P(e_{ij}) = \sum_{k=1}^{r} \left[ f(u_i, u_j, c_k) \cdot \prod_{l=1}^{k-1} \bigl( 1 - f(u_i, u_j, c_l) \bigr) \right] \]

where the sum runs over the $r$ mutual communities of $u_i$ and $u_j$, and $f(u_i, u_j, c_k)$ abbreviates $f(\delta_{ik}, \delta_{jk}, d_k)$.

6. Results
Clustering coefficient: The leftmost figure shows the evolution of the clustering coefficients of the true data (FN) and the fitted models (PA [1], GL [2] and Our Model). Our model provides the best fit and is the only model to capture the initial growth period of the network, in which the clustering increases with network size.
Average separation: The centre figure shows a histogram of the shortest path lengths. Our model matches the histogram noticeably better than the other two models, both of which overproduce short paths and fail to produce longer paths.
Degree distribution: The rightmost figure shows the evolution of the power-law parameters of the degree distribution. The PA model provides a poor fit, whereas our model and the GL model yield a good fit at the end of the simulation. Our model is the only one to show the same downward trend in α as the true data, although α decreases more rapidly in our model.

7. Current Work
Shrinking diameters and densification power laws: In a real-world SN, the number of connections grows superlinearly in the number of users and the diameter shrinks over time. We are currently investigating if and when our model replicates this behavior.
Parameter estimation: We are currently implementing an automated parameter estimation technique based on simulated annealing.

References
[1] A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509, 1999.
[2] J. Guillaume and M. Latapy. Bipartite graphs as models of complex networks.
Physica A: Statistical Mechanics and its Applications, 371(2):795–813, 2006.

ACKNOWLEDGEMENTS:
• MIH for supporting the research project.
• Brian Amberg for the poster style sheet: http://www.brian-amberg.de/uni/poster/.
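
Worked example for Section 5: the pairwise connection probability can be illustrated with a minimal Python sketch. It evaluates the sum-of-products expression for P(e_ij) and compares it with the equivalent form 1 − ∏(1 − f_k), which follows by telescoping the sum. This is a sketch under stated assumptions, not the authors' implementation: the proportionality constant in f is taken to be 1, δ is read as a commitment value and d as a density value, and the function names `link_prob` and `edge_prob` and all numeric inputs are hypothetical.

```python
import math


def link_prob(delta_ik, delta_jk, d_k, scale=1.0):
    """Per-community link probability f.

    ASSUMPTION: the poster only states f up to proportionality;
    scale=1.0 is a placeholder constant chosen for illustration.
    """
    return scale * math.exp(-(1.0 / delta_ik + 1.0 / delta_jk + 1.0 / d_k))


def edge_prob(fs):
    """P(e_ij) as the sum over mutual communities of f_k times the
    probability that no earlier community already linked the pair."""
    total = 0.0
    not_linked_yet = 1.0  # running product of (1 - f_l) for l < k
    for f in fs:
        total += f * not_linked_yet
        not_linked_yet *= 1.0 - f
    return total


# Hypothetical (delta_ik, delta_jk, d_k) triples for three mutual communities.
shared = [(2.0, 1.0, 4.0), (0.5, 3.0, 2.0), (1.5, 1.5, 1.0)]
fs = [link_prob(*triple) for triple in shared]

p = edge_prob(fs)
closed_form = 1.0 - math.prod(1.0 - f for f in fs)
print(p, closed_form)  # the two values agree up to rounding
```

Because the sum equals 1 − ∏(1 − f_k), the order in which the mutual communities are enumerated does not change P(e_ij).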