Social Network Analysis - Leiden University · liacs.leidenuniv.nl/~takesfw/pdf/sna.pdf · Social Network Analysis, Challenges in Computer Science – April 1, 2014, Frank Takes (ftakes@liacs.nl)


Social Network Analysis

Challenges in Computer Science – April 1, 2014

Frank Takes (ftakes@liacs.nl)

LIACS, Leiden University


Overview

Context

Social Network Analysis

Online Social Networks

Friendship Graph

Centrality Measures

Example

Conclusions


Data Science

Data Science is the study of the generalizable extraction of knowledge from data, yet the key word is science (Wikipedia)

Builds on techniques and theories from many fields, including machine learning, computer programming, statistics, data engineering, pattern recognition and learning, visualization, …

Goal is extracting meaning from data and creating data products

Data science is a buzzword, often used interchangeably with analytics or big data…


Big Data?

Online Social Network

Friendship Graph

8 million users

1 billion friendships between users

10GB on disk

Is this Big Data?

No.


Three V’s of Big Data

Volume

Velocity

Variety


Or….?


From Data to Networks

Unstructured data: numeric measurements from a temperature sensor, textual contents of a news article

Structured data: data organized according to a model or data structure, for example:

Database: tables with rows and columns

Graph/Network: nodes and edges


Graphs

[Figure: example graph with five nodes A, B, C, D and E]

Vertex/Node (Dutch: knoop)

Relationship/Edge/Link (Dutch: tak)

Distance (Dutch: afstand): d(C, E) = d(E, C) = 2

n = 5 nodes, m = 6 edges
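
The notions of nodes, edges and distance can be sketched in a few lines of Python. The edge list below is hypothetical: the edges of the graph on the slide cannot be recovered from the transcript, so the list is only chosen to match n = 5, m = 6 and d(C, E) = 2. The helper names `build_graph` and `distance` are likewise illustrative.

```python
from collections import deque

# Hypothetical edge list (assumption): 5 nodes, 6 undirected edges.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def distance(graph, source, target):
    """Number of edges on a shortest path (breadth-first search), or None."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None  # target is not reachable from source


graph = build_graph(EDGES)
n = len(graph)   # number of nodes
m = len(EDGES)   # number of edges
print(n, m, distance(graph, "C", "E"), distance(graph, "E", "C"))
```

Because the graph is undirected, the distance is symmetric, which is why d(C, E) and d(E, C) agree.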


Social Network Analysis

Social Network Analysis (SNA): the study of social networks to understand their structure and behavior.

Social Network: a social structure of people, related (directly or indirectly) to each other through a common relation or interest.

Social Networks != Social Media


Social Network Analysis

Social Network Analysis (SNA):

Sociology

Algorithms

Data Mining

Social Networks:

Real-life (explicit)

Online (explicit)

Derived (implicit): e-mail networks, citation networks, co-author networks, terrorist collaboration networks


Online Social Networks


History

1997: SixDegrees.com

2000: Friendster

2003: LinkedIn & MySpace

2004: Hyves

2005: Facebook

2006: Twitter

................

2010: The Social Network (movie)


Online Social Networks

User (node) has a profile

Profiles have attributes (labels/annotations)

Explicit links (edges):

Social links / friendship links

User groups

Implicit links:

Social messaging

Common attributes

Directed vs. undirected links


Example: Facebook

More than 1 billion active users

Average user has 130 friends

Estimated 100 billion social links

Over 600 million interactive objects (pages, groups and events)

More than 45 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each month


OSNA Research Topics

User behavior

Privacy & Anonymity

Trust & Authorities

Diffusion of information

Sampling & Crawling

Community Detection

Friendship Graph


Friendship Graph

Static analysis:

Who is the most important person in a network?

Can we distinguish between groups of people in the network?

What is the average distance between two people in the network?

Dynamic analysis:

Who are likely to become friends next?

How does the social network evolve over time?


Centrality measures

Degree centrality

Betweenness centrality

Closeness centrality

Graph centrality (eccentricity centrality)

Eigenvector centrality

Random walk centrality

Hyperlink Induced Topic Search (HITS)

PageRank


Centrality

Who has a central position in this graph?

[Figure: example graph with labeled nodes A to H]

Degree centrality: C has the highest degree
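
Degree centrality simply counts the links of each node. The sketch below illustrates this on a hypothetical 8-node graph; the edge list is an assumption (the slide's figure is not recoverable) and is only chosen so that C ends up with the highest degree, as on the slide. Dividing by n - 1 is one common normalization, not necessarily the one used in the lecture.

```python
# Hypothetical edge list (assumption): a dense group A-D linked to E, F, G, H.
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("C", "E"), ("E", "F"), ("E", "G"), ("G", "H"),
]


def degrees(edges):
    """Count the number of undirected links attached to each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


deg = degrees(EDGES)
n = len(deg)

# Normalized degree centrality: the fraction of other nodes a node links to.
centrality = {node: d / (n - 1) for node, d in deg.items()}

most_central = max(deg, key=deg.get)
print(most_central, deg[most_central])
```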


Centrality

Who has a central position in this graph?

[Figure: example graph with labeled nodes A to H]

Betweenness centrality: E is part of the largest number of shortest paths
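
Betweenness centrality counts, for each node, the shortest paths between other nodes that pass through it. The sketch below uses Brandes' algorithm for unweighted graphs, which is one standard way to compute it and not necessarily the method used in the lecture. The edge list is hypothetical, chosen so that E lies on the most shortest paths while C has the highest degree.

```python
from collections import deque

# Hypothetical edge list (assumption): a dense group A-D linked to E, F, G, H.
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("C", "E"), ("E", "F"), ("E", "G"), ("G", "H"),
]


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def betweenness(graph):
    """Betweenness centrality of every node (Brandes, unweighted, undirected)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from s first.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in bc.items()}


bc = betweenness(build_graph(EDGES))
most_between = max(bc, key=bc.get)
print(most_between, bc[most_between])
```

In this assumed graph the node with the most links (C) and the node on the most shortest paths (E) differ, which is the contrast the two centrality slides illustrate.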


Google PageRank

4 webpages A, B, C and D (N = 4)

Initially: PR(A) = PR(B) = PR(C) = PR(D) = 1/N = 0.25

L(A) is the outdegree of page A

Now if B, C and D each link to A, the simple PageRank PR(A) of page A is equal to:

$$PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}$$


Google PageRank

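[Figure: the four pages A, B, C and D with their PageRank values before and after one iteration]

Page   Initial PR   PR after one iteration
A      0.25         0.458
B      0.25         0.083
C      0.25         0.208
D      0.25         0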
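
One iteration of the simple PageRank can be sketched as follows. The link structure in `LINKS` is an assumption: it is not stated in the transcript and is chosen because it reproduces the values 0.458, 0.083, 0.208 and 0 after one step. The function name `simple_pagerank_step` is illustrative.

```python
# Assumed link structure: B -> A, C;  C -> A;  D -> A, B, C;  A links nowhere.
LINKS = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}


def simple_pagerank_step(links, pr):
    """Each page splits its current PageRank evenly over its outgoing links."""
    new_pr = {page: 0.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            new_pr[target] += pr[source] / len(targets)  # PR(source) / L(source)
    return new_pr


initial = {page: 1 / len(LINKS) for page in LINKS}  # PR = 1/N for every page
after_one = simple_pagerank_step(LINKS, initial)
print({page: round(value, 3) for page, value in after_one.items()})
```

Note that in this simple version the rank held by a page without outgoing links (A here) is lost in the next step, which is one reason the full formula adds a random jump.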


Google PageRank

PageRank as suggested by Larry Page in 1999

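N = number of pages, p_i and p_j are pages

M(p_i) is the set of pages linking to p_i

L(p_j) is the outdegree of p_j

d = 0.85: 85% chance to follow a link, 15% chance to jump to a random page (random surfer)

At t = 0:

$$PR(p_i; 0) = \frac{1}{N}$$

At each step t = t + 1:

$$PR(p_i; t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)}$$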
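
The iterative PageRank with damping factor d can be sketched as below. Two things are assumptions rather than part of the slides: the example link structure in `LINKS`, and the handling of pages without outgoing links, whose rank is spread evenly over all pages so that the values keep summing to 1. A fixed number of iterations is used instead of a convergence test to keep the sketch short.

```python
# Assumed example link structure (not given in the transcript).
LINKS = {
    "A": [],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["A", "B", "C"],
}


def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(p; t+1) = (1 - d)/N + d * sum of PR(q; t)/L(q) over q in M(p)."""
    pages = list(links)
    n = len(pages)
    pr = {page: 1.0 / n for page in pages}  # t = 0: uniform start
    for _ in range(iterations):
        # Assumption: pages without outgoing links share their rank with everyone.
        dangling = sum(pr[page] for page in pages if not links[page])
        new_pr = {page: (1 - d) / n + d * dangling / n for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_pr[target] += d * pr[source] / len(targets)
        pr = new_pr
    return pr


ranks = pagerank(LINKS)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```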


Frequent Subgraphs

What pattern occurs frequently in this graph?

[Figure: example graph with nodes labeled A, B and C]

Frequent subgraph: A-B-C


Clustering

[Figure: clustering illustration; source: http://bayramannakov.wordpress.com]


Six Degrees of Separation

"I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on this planet. The president of the United States. A gondolier in Venice. Fill in the names. I find that A) tremendously comforting that we're so close and B) like Chinese water torture that we're so close. Because you have to find the right six people to make the connection. It's not just big names. It's anyone. A native in a rain forest. A Tierra del Fuegan. An Eskimo. I am bound to everyone on this planet by a trail of six people. It's a profound thought."

John Guare, 1990


Six Degrees of Separation

Stanley Milgram, 1969

300 letters from Omaha to Boston

Addressed to 300 random people, with the request to forward the letter toward the final addressee.

On average, the letter reached the addressee after 5.5 steps.


Six Degrees of Separation

Testing on an online social network

Dataset

8 million users

900 million friendships between users

9GB in text (datafile), 4GB in memory

Comparing all distances: 8M x 8M = 64 x 10^12

Sampling: determine the pairwise distances between 1000 random users using Dijkstra's shortest-path algorithm
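
The sampling idea can be sketched as follows. Note the swap: the sketch uses breadth-first search rather than Dijkstra's algorithm, since on an unweighted friendship graph both give the same distances. The function names, the `seed` parameter and the tiny path graph are illustrative assumptions; a real dataset would be loaded from disk.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Distances (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def sampled_average_distance(graph, sample_size, seed=0):
    """Average distance over connected pairs of randomly sampled nodes."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    total, pairs = 0, 0
    for source in sample:
        dist = bfs_distances(graph, source)  # one search per sampled user
        for target in sample:
            if target != source and target in dist:
                total += dist[target]
                pairs += 1
    return total / pairs if pairs else float("nan")


# Toy example (assumption): a path 0 - 1 - 2 - 3 instead of a real network.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
average = sampled_average_distance(path, sample_size=4)
print(round(average, 3))
```

With 1000 sampled users this needs 1000 searches instead of one per user in the network, which is what makes the estimate feasible on a graph with millions of nodes.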


Average distances

For datasets of social networks (1000 samples, randomly chosen):

Network       Users       Friendships   Average distance
Flickr        1,800,000   22,600,000    5.67
Hyves         8,000,000   900,000,000   4.75
LiveJournal   5,300,000   77,400,000    5.88
Orkut         3,100,000   223,500,000   4.25
YouTube       1,160,000   4,950,000     5.10


Distance Distribution


Data storage in memory

n nodes, m links, k links per node on average

1 < k < n < m < n^2

                 Adjacency matrix                       Sorted adjacency list
Size             8M x 8M x 1 bit = 64 Tbit = 8 Tbyte    900M x 8 bytes (INT pairs) = 7.2 Gbyte
Space            O(n^2)                                 O(m)
Link existence   O(1) time                              O(log k) time
Link addition    O(1) time                              O(k log k) time
Link deletion    O(1) time                              O(1) time
Neighborhood     O(n) time                              O(1) time
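
The two storage schemes can be sketched side by side. The class and method names are illustrative, and Python lists stand in for the compact arrays a real implementation would use. The sorted list uses the standard `bisect` module for the O(log k) link lookup; insertion into a Python list additionally shifts elements, so its cost here is not meant to match the table exactly.

```python
from bisect import bisect_left, insort


class AdjacencyMatrix:
    """n x n table of booleans: O(n^2) space, O(1) link lookup."""

    def __init__(self, n):
        self.cells = [[False] * n for _ in range(n)]

    def add_link(self, u, v):
        self.cells[u][v] = self.cells[v][u] = True

    def has_link(self, u, v):
        return self.cells[u][v]

    def neighborhood(self, u):
        # Has to scan a full row of n cells.
        return [v for v, linked in enumerate(self.cells[u]) if linked]


class SortedAdjacencyList:
    """One sorted neighbor list per node: O(m) space in total."""

    def __init__(self, n):
        self.neighbors = [[] for _ in range(n)]

    def has_link(self, u, v):
        row = self.neighbors[u]
        i = bisect_left(row, v)  # binary search over k neighbors
        return i < len(row) and row[i] == v

    def add_link(self, u, v):
        for a, b in ((u, v), (v, u)):
            if not self.has_link(a, b):
                insort(self.neighbors[a], b)  # keeps the list sorted

    def neighborhood(self, u):
        return self.neighbors[u]  # already available, no scan needed


matrix = AdjacencyMatrix(5)
adjacency = SortedAdjacencyList(5)
for u, v in [(0, 3), (0, 1), (1, 2), (3, 4)]:
    matrix.add_link(u, v)
    adjacency.add_link(u, v)

print(matrix.neighborhood(0), adjacency.neighborhood(0))
print(matrix.has_link(0, 2), adjacency.has_link(0, 2))
```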


Friendship Graph Analysis

Static analysis

Densely connected core

Fringe of low-degree nodes

Few isolated communities & singletons

Static properties

Node degree distribution, average distance, diameter

Edge/node ratio, level of symmetry

Number of cliques, k-cliques, etc.

Small world phenomenon
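
Two of the static properties listed above, the node degree distribution and the edge/node ratio, are straightforward to compute from an adjacency structure. The small graph in `GRAPH` and the function names are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical undirected graph (assumption): node -> set of neighbors.
GRAPH = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}


def degree_distribution(graph):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(len(neighbors) for neighbors in graph.values())


def edge_node_ratio(graph):
    """Number of undirected edges divided by the number of nodes."""
    # Every undirected edge appears in two neighbor sets.
    num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
    return num_edges / len(graph)


distribution = degree_distribution(GRAPH)
ratio = edge_node_ratio(GRAPH)
print(dict(distribution), ratio)
```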


Degree Distribution


Small World Networks

Class of networks with certain properties:

Sparse graphs

Highly connected

Short average node-to-node distance: d ~ log(n)

Fat tailed power law node degree distribution

Densely connected core with many (near-)cliques

Existence of hubs: nodes with a very high degree

Fringe of low(er)-degree nodes


Regular vs. Small World


Small World Networks

Other examples of small world networks

Web graphs

Gene networks

E-mail networks

Telephone call graphs

Information networks

Internet topology networks

Scientific co-authorship networks

Corporate networks (interlocks or ownerships)


Collaboration @ LIACS


Friendship Graph Analysis

Static analysis

Dynamic analysis

Network evolution

Network modelling

Network growth

Link Prediction

Triadic Closure

Preferential Attachment


Link Prediction

[Figure: two example graphs with nodes A, B, C, D and E]

Two principles: preferential attachment and triadic closure


Triadic Closure

[Figure: two example graphs with nodes A, B and C]

Two principles: preferential attachment and triadic closure
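
Triadic closure suggests that two people with a common friend are likely to become friends themselves. One simple way to turn this into a link prediction score is to count common neighbors for every pair that is not yet linked. This heuristic and the small edge list are assumptions for illustration, not necessarily the method used in the lecture.

```python
from itertools import combinations

# Hypothetical edge list (assumption): A is friends with B and C, C with D.
EDGES = [("A", "B"), ("A", "C"), ("C", "D")]


def build_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def common_neighbor_scores(graph):
    """Score every unlinked pair by its number of common neighbors."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u]:
            scores[(u, v)] = len(graph[u] & graph[v])
    return scores


graph = build_graph(EDGES)
scores = common_neighbor_scores(graph)

# Pairs with more common friends are ranked as more likely future links.
ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
print(ranked)
```

Here B and C share the friend A, so the pair (B, C) gets a positive score, while B and D have no common friend and rank last.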


Preferential Attachment

Nodes with a large degree acquire new links at a faster rate.

[Figure: example graph with nodes A, B, C, D, E, F, G and J]
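
Preferential attachment can be illustrated with a small growth simulation in which each new node links to an existing node chosen with probability proportional to its degree. The setup (one link per new node, a single starting edge, the `seed` parameter and function names) is an assumption made for the sketch.

```python
import random


def grow_network(num_nodes, seed=0):
    """Grow a network where high-degree nodes attract new links more often."""
    rng = random.Random(seed)
    edges = [(0, 1)]       # start from a single link
    endpoints = [0, 1]     # each node appears once per link it has
    for new_node in range(2, num_nodes):
        # Picking uniformly from endpoints selects nodes proportional to degree.
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


def degrees(edges):
    """Count the number of links attached to each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


edges = grow_network(200)
deg = degrees(edges)
top = sorted(deg.values(), reverse=True)[:5]
print("five highest degrees:", top)
```

Running this with different seeds typically shows a few early nodes collecting many links while most nodes keep a single link, which is the hub-and-fringe pattern described for small world networks.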


Conclusions

When you hear “big data”, then it is almost never really big data.

Online social networks are an excellent domain of study for data (or graph-) miners.

Social Network Analysis is important for many areas of research, not only computer science.

(Small world) networks are everywhere.


Try this at home

Graph visualization: http://www.gephi.org, http://nodexl.codeplex.com

Network datasets: http://snap.stanford.edu, http://konect.uni-koblenz.de/networks