Measurement and Analysis of Online Social Networks Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee Presented by Aleksandra Potapova

Dec 22, 2015


Charla Allen

Measurement and Analysis of Online Social Networks

Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee

Presented by Aleksandra Potapova


Focus

• graphs of online social networks
– how they were obtained
– how they were verified

• how measurement and analysis was performed

• properties of obtained graphs

• why these properties are relevant


What was studied?

• Flickr
• YouTube
• LiveJournal
• Orkut


Why should we perform measurements and analysis in social networks?

• To design future online social network based systems

• To understand the impact of online social networks on the Internet

• To reduce spam

• To improve security


Summary of graph properties

• small-world
• power-law
• scale-free
• correlation between indegree and outdegree
• large strongly connected core of high-degree nodes surrounded by small clusters of low-degree nodes


Crawling Algorithms for large graphs

• BFS and DFS
• Snowball method (crawling only a small subset of a graph by ending BFS early):
– Partial BFS crawls overestimate node degree and underestimate the level of symmetry.
– In social networks, they underestimate the power-law coefficient, but closely match other metrics such as the overall clustering coefficient.
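The snowball method above can be sketched as a BFS that stops discovering users once a budget is reached. This is only an illustration: `get_links`, `max_nodes`, and the toy network are assumptions made for the example, not part of the paper's crawler.

```python
from collections import deque

def snowball_crawl(get_links, seed, max_nodes):
    """Partial BFS: stop discovering users once max_nodes have been seen.

    get_links is a hypothetical callback returning a user's outgoing links.
    """
    seen = {seed}
    queue = deque([seed])
    edges = []
    while queue:
        user = queue.popleft()
        for friend in get_links(user):
            if friend not in seen:
                if len(seen) >= max_nodes:
                    continue  # budget exhausted: skip users not yet discovered
                seen.add(friend)
                queue.append(friend)
            edges.append((user, friend))
    return seen, edges

# Hypothetical toy network standing in for a real site.
toy = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["e"], "e": []}
users, links = snowball_crawl(lambda u: toy.get(u, []), "a", max_nodes=3)
```

With a budget of 3 the crawl never sees "d" or "e", so links that touch the unexplored part of the graph are lost. Truncation of this kind is why metrics measured on a partial crawl can be biased.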


How should social networks be crawled?

• The focus of the paper
– WCC (weakly connected component)
– Forward and reverse links should be used
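A small sketch of why both link directions matter when the target is the WCC: following only forward links reaches the nodes downstream of the seed, while following forward and reverse links reaches the whole weakly connected component. The function names and the edge list are made up for illustration.

```python
from collections import defaultdict, deque

def reachable(neighbors, seed):
    """Plain BFS over an adjacency mapping."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def crawl_sets(edges, seed):
    """Return (nodes found with forward links only, nodes in the seed's WCC)."""
    forward = defaultdict(set)
    both = defaultdict(set)
    for src, dst in edges:
        forward[src].add(dst)
        both[src].add(dst)
        both[dst].add(src)  # reverse link: lets the crawl walk "upstream"
    return reachable(forward, seed), reachable(both, seed)

# Hypothetical directed links: c -> a -> b and d -> c.
fw_only, wcc = crawl_sets([("a", "b"), ("c", "a"), ("d", "c")], "a")
```

Here the forward-only crawl from "a" finds just "a" and "b", while the crawl that also uses reverse links finds all four nodes.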


How were the graphs obtained?

• API
– users
– groups
– forward/backward links

• HTML Screen Scraping


How to Verify Samples

1. Obtain a random user sample
– LJ: feature which returns 5,000 random users
– Flickr: random 8-digit user id generation
2. Conduct a crawl using these random users as seeds
3. See if these random nodes connect to the original WCC
4. See how the structure of the newly crawled graph compares to the original
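Steps 1 and 3 can be sketched as follows. The ID format (a plain zero-padded 8-digit string) and both helper names are assumptions for illustration; a real check would also have to discard generated IDs that do not belong to any user.

```python
import random

def random_user_ids(count, rng):
    """Draw candidate 8-digit numeric IDs (assumed format; not every ID is a real user)."""
    return [f"{rng.randrange(10**8):08d}" for _ in range(count)]

def coverage(sample, crawled_users):
    """Fraction of the sampled users that already appear in the crawled WCC."""
    sample = list(sample)
    if not sample:
        return 0.0
    hits = sum(1 for user in sample if user in crawled_users)
    return hits / len(sample)

candidates = random_user_ids(5, random.Random(0))
share = coverage(["1", "2", "3", "4"], {"1", "2"})
```

A coverage close to 1 would suggest the original crawl captured most of the network; a low value would point to large regions the crawl missed.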


Crawling Concerns – FW links

• no effect on largest WCC


Crawling Concerns – FW links

• increasing the size of the WCC by starting at a different seed


Site         Users (millions)  Links (millions)  Symmetry  Access
YouTube      1.1               4.9               79.1%     API (users only), FW; SS for group info
Flickr       1.8               22                62.0%     API (users + groups), FW
LiveJournal  5.2               72                73.5%     API (users + groups), FW + BW
Orkut        3                 223               100.0%    SS for users + groups

Access: FW = forward links only, BW = backward links, SS = HTML screen-scraping


Link Symmetry

• even with directed links, there is a high level of symmetry

• possibly because sites inform users of new incoming links

• makes it harder to identify reputable sources due to dilution
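One way to compute a symmetry figure like those in the table is the share of directed links whose reverse link also exists. This is a sketch under that assumed definition; the function name and sample edges are illustrative.

```python
def link_symmetry(edges):
    """Fraction of distinct directed links (u, v) for which (v, u) is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# Hypothetical links: a <-> b is mutual, a -> c and c -> d are one-way.
symmetry = link_symmetry([("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")])
```

In a network where every link is reciprocated by construction, this measure is 1.0.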


Power-law node degrees

• Orkut deviates:
– only 11.3% of network reached (effect of partial BFS crawl, i.e. the Snowball method)
– an artificial cap on a user’s number of outgoing links distorts the distribution of high degrees

• differs from Web
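A power-law coefficient can be estimated from a degree sequence with the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)). The sketch below is a generic illustration of that estimator, not necessarily the exact procedure used for the slides; `x_min` and the function name are assumptions.

```python
import math

def powerlaw_alpha(degrees, x_min=1):
    """Continuous MLE for the power-law exponent over degrees >= x_min."""
    tail = [d for d in degrees if d >= x_min]
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one degree above x_min")
    return 1 + len(tail) / log_sum

# Sanity check on synthetic values: every ln(x / x_min) equals 1, so alpha is 2.
alpha = powerlaw_alpha([math.e] * 4, x_min=1)
```

Because the estimate depends on which degrees are observed, a crawl that misses low-degree nodes or a site that caps outgoing links will shift the result.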


Correlation of indegree and outdegree

• over 50% of nodes have indegree within 20% of their outdegree
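A statistic of this kind can be computed by counting nodes whose indegree lies within a tolerance of their outdegree. The sketch below assumes "within 20%" means |indegree − outdegree| ≤ 0.2 × outdegree; the function name and sample edges are illustrative.

```python
from collections import Counter

def fraction_balanced(edges, tolerance=0.2):
    """Share of nodes whose indegree is within `tolerance` of their outdegree."""
    outdeg = Counter(src for src, _ in edges)
    indeg = Counter(dst for _, dst in edges)
    nodes = set(outdeg) | set(indeg)
    if not nodes:
        return 0.0
    balanced = sum(
        1 for n in nodes if abs(indeg[n] - outdeg[n]) <= tolerance * outdeg[n]
    )
    return balanced / len(nodes)

# Hypothetical links: only "b" has matching indegree and outdegree.
balanced_share = fraction_balanced([("a", "b"), ("b", "a"), ("a", "c")])
```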


Path lengths and diameter

• all four networks have short path lengths
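Average path length and diameter can be measured with repeated BFS. The sketch below runs BFS from every node, which is only practical on small graphs; on graphs with millions of nodes one would sample source nodes instead. The adjacency dict and function names are assumptions for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def path_stats(adj):
    """Return (average shortest-path length, longest shortest path) over reachable pairs."""
    total = pairs = longest = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                longest = max(longest, d)
    average = total / pairs if pairs else 0.0
    return average, longest

# Hypothetical undirected chain a - b - c.
average, diameter = path_stats({"a": ["b"], "b": ["a", "c"], "c": ["b"]})
```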


Link degree correlations

• JDD: joint degree distribution (how often nodes of different degree connect to each other)

• Knn: mapping between outdegree and average indegree of all nodes connected to nodes of that outdegree
– Used as an approximation of the JDD

• YouTube differs because extremely popular users are linked to by many unpopular users

• Orkut shows a bump due to undersampling
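The Knn mapping described above can be sketched as follows: for each outdegree k, average the indegree of every node that a node of outdegree k links to. The function name and sample edges are illustrative.

```python
from collections import Counter, defaultdict

def knn(edges):
    """Map outdegree k -> average indegree of the targets of nodes with outdegree k."""
    outdeg = Counter(src for src, _ in edges)
    indeg = Counter(dst for _, dst in edges)
    sums = defaultdict(int)
    counts = defaultdict(int)
    for src, dst in edges:
        k = outdeg[src]
        sums[k] += indeg[dst]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical links: "a" (outdegree 2) points at b and c, "d" (outdegree 1) points at b.
mapping = knn([("a", "b"), ("a", "c"), ("d", "b")])
```

A curve that rises with k means high-degree nodes tend to link to other high-degree nodes; a falling curve means many low-degree nodes point at a few very popular ones.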


Joint degree distribution and Scale-free behaviour

Figure annotations: undersampling of low-degree nodes; celebrity-driven nature; cap on links


Densely connected core

• removing 10% of core nodes results in breaking up graph into millions of very small SCCs
• graphs below show results as nodes are removed starting with highest-degree nodes (left) and path length as graph is constructed beginning with highest-degree nodes (right)

Sub-logarithmic growth
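The node-removal experiment can be sketched as: rank nodes by degree, drop the top fraction, and measure the largest remaining SCC. The sketch uses an iterative Kosaraju pass as a stand-in SCC algorithm; the function names, the use of total degree for ranking, and the toy hub graph are all assumptions.

```python
from collections import defaultdict

def largest_scc_size(nodes, edges):
    """Size of the largest strongly connected component (iterative Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next((n for n in it if n not in seen), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen.add(nxt)
                stack.append((nxt, iter(fwd[nxt])))
    # Pass 2: flood-fill the reversed graph in reverse completion order.
    best, assigned = 0, set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for n in rev[node]:
                if n not in assigned:
                    assigned.add(n)
                    stack.append(n)
        best = max(best, size)
    return best

def core_after_removal(edges, fraction):
    """Drop the top `fraction` of nodes by total degree, then measure the largest SCC."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree, key=lambda n: (-degree[n], str(n)))
    removed = set(ranked[: int(len(ranked) * fraction)])
    kept_nodes = [n for n in ranked if n not in removed]
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return largest_scc_size(kept_nodes, kept_edges)

# Hypothetical hub graph: "h" has mutual links with four low-degree nodes.
hub_edges = [(x, y) for leaf in "abcd" for x, y in (("h", leaf), (leaf, "h"))]
```

On this toy graph, `core_after_removal(hub_edges, 0)` keeps all five nodes in one SCC, while removing the single hub leaves only isolated nodes.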


Tightly clustered fringe

• based on clustering coefficient
• social network graphs show stronger clustering, most likely due to mutual friends

Possibly because personal content is not shared
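A per-node clustering coefficient for a directed graph can be sketched as the number of directed links among a node's neighbors divided by the k(k − 1) links that could exist among them. Treating any node linked in either direction as a neighbor is an assumption of this sketch, as are the function name and sample graphs.

```python
from collections import defaultdict

def clustering_coefficients(edges):
    """Map node -> directed links among its neighbors / (k * (k - 1))."""
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-loops
    neighbors = defaultdict(set)
    for u, v in edge_set:
        neighbors[u].add(v)
        neighbors[v].add(u)
    result = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0  # fewer than two neighbors: no possible links
            continue
        links = sum(1 for a in nbrs for b in nbrs if a != b and (a, b) in edge_set)
        result[node] = links / (k * (k - 1))
    return result

# Hypothetical graphs: a fully mutual triangle, and a hub with two unconnected friends.
triangle = clustering_coefficients(
    [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c"), ("c", "a")]
)
star = clustering_coefficients([("h", "a"), ("a", "h"), ("h", "b"), ("b", "h")])
```

In the triangle every node scores 1.0 because its two neighbors are mutual friends; in the star the hub scores 0.0 because its neighbors do not know each other.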


Groups

• group sizes follow power-law distribution
• represent tightly clustered communities


Groups

• Orkut is a special case, possibly because of the partial crawl


Node Value Determination

Directed Graph, current model

• nodes with many incoming links (hubs) have value due to their connection to many users
• it becomes easy to spread important information to the other nodes, e.g. DNS
• unhealthy in case of spam or viruses

• in order to send spam, a user has to become a more important node by amassing friends


• Questions?