School of Information University of Michigan SI 614 Basic network concepts and intro to Pajek Lecture 2 Instructor: Lada Adamic

Dec 22, 2015

Page 1:

School of Information
University of Michigan

SI 614
Basic network concepts and intro to Pajek

Lecture 2

Instructor: Lada Adamic

Page 2:

Outline

Basic network metrics
Bipartite graphs
Graph theory in math
Pajek

Page 3:

Network elements: edges

Directed (also called arcs): A -> B
  e.g. A likes B, A gave a gift to B, A is B’s child

Undirected: A <-> B or A – B
  e.g. A and B like each other, A and B are siblings, A and B are co-authors

Edge attributes
  weight (e.g. frequency of communication)
  ranking (best friend, second best friend, …)
  type (friend, relative, co-worker)
  properties depending on the structure of the rest of the graph, e.g. betweenness

Page 4:

Directed networks

[Figure: directed network in which each arc is labelled 1 or 2. Nodes: Ada, Cora, Louise, Jean, Helen, Martha, Alice, Robin, Marion, Maxine, Lena, Hazel, Hilda, Frances, Eva, Ruth, Edna, Adele, Jane, Anna, Mary, Betty, Ella, Ellen, Laura, Irene]

girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)

first and second choices shown

Page 5:

Edge weights can have positive or negative values

  One gene activates/inhibits another
  One person trusts/distrusts another

Research challenge: How does one ‘propagate’ negative feelings in a social network? Is my enemy’s enemy my friend?

[Figure: transcription regulatory network in baker’s yeast]

Page 6:

Adjacency matrices

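Representing edges (who is adjacent to whom) as a matrix:

$$A_{ij} = \begin{cases} 1 & \text{if node } i \text{ has an edge to node } j \\ 0 & \text{if node } i \text{ does not have an edge to node } j \end{cases}$$

$A_{ii} = 0$ unless the network has self-loops

$A_{ij} = A_{ji}$ if the network is undirected, or if i and j share a reciprocated edge

Example (nodes 1 to 5):

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$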

Page 7:

Adjacency lists

Edge list
  2 3
  2 4
  3 2
  3 4
  4 5
  5 2
  5 1

Adjacency list
  is easier to work with if the network is large and sparse
  lets you quickly retrieve all neighbors of a node

  1:
  2: 3 4
  3: 2 4
  4: 5
  5: 1 2

[Figure: the example directed graph on nodes 1 to 5]
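The two representations can be built from the same edge list. The following is a minimal Python sketch, not part of the original slides; the function names `to_adjacency_list` and `to_adjacency_matrix` are illustrative choices, and the edge list is the one from the 5-node example.

```python
# Directed edges of the 5-node example, as (source, target) pairs.
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]
n = 5  # nodes are numbered 1..n


def to_adjacency_list(edges, n):
    """Map each node to the sorted list of nodes it points to."""
    adj = {v: [] for v in range(1, n + 1)}
    for src, dst in edges:
        adj[src].append(dst)
    return {v: sorted(targets) for v, targets in adj.items()}


def to_adjacency_matrix(edges, n):
    """Return an n x n list of lists with A[i-1][j-1] = 1 for each edge i -> j."""
    A = [[0] * n for _ in range(n)]
    for src, dst in edges:
        A[src - 1][dst - 1] = 1  # shift by one: Python lists start at 0
    return A


adj_list = to_adjacency_list(edges, n)
A = to_adjacency_matrix(edges, n)

for node, neighbors in adj_list.items():
    print(f"{node}:", *neighbors)
for row in A:
    print(*row)
```

The adjacency list stores only the 7 edges that exist, while the matrix stores all 25 entries, which is why the list form scales better for large, sparse networks.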

Page 8:

Nodes

Node network properties

  from immediate connections
    indegree: how many directed edges (arcs) are incident on a node
    outdegree: how many directed edges (arcs) originate at a node
    degree (in or out): number of edges incident on a node

  from the entire graph
    centrality (betweenness, closeness)

[Figure: example node with outdegree = 2, indegree = 3, degree = 5]

Page 9:

Node degree from matrix values

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$

Outdegree $= \sum_{j=1}^{n} A_{ij}$

example: the outdegree for node 3 is 2, which we obtain by summing the entries in the 3rd row: $\sum_{j=1}^{n} A_{3j} = 2$

Indegree $= \sum_{i=1}^{n} A_{ij}$

example: the indegree for node 3 is 1, which we obtain by summing the entries in the 3rd column: $\sum_{i=1}^{n} A_{i3} = 1$

[Figure: the example directed graph on nodes 1 to 5]
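The row and column sums can be checked with a few lines of Python. This is a small sketch added for illustration; the helper names `outdegree` and `indegree` are not from the slides, and the matrix is the 5-node example.

```python
# Adjacency matrix of the 5-node example: A[i-1][j-1] = 1 if i has an edge to j.
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]


def outdegree(A, i):
    """Sum of row i (1-based): number of arcs that originate at node i."""
    return sum(A[i - 1])


def indegree(A, j):
    """Sum of column j (1-based): number of arcs that point to node j."""
    return sum(row[j - 1] for row in A)


print(outdegree(A, 3))  # 2
print(indegree(A, 3))   # 1

# Every arc leaves one node and enters one node, so the totals agree.
n = len(A)
total_out = sum(outdegree(A, i) for i in range(1, n + 1))
total_in = sum(indegree(A, j) for j in range(1, n + 1))
print(total_out, total_in)  # 7 7
```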

Page 10:

Other node attributes

Homophily: tendency of like individuals to associate with one another

take your pick: geographical location, function, musical tastes, …

Page 11:

Network metrics: degree sequence and degree distribution

Degree sequence: An ordered list of the (in,out) degree of each node

In-degree sequence: [2, 2, 2, 1, 1, 1, 1, 0]

Out-degree sequence: [2, 2, 2, 2, 1, 1, 1, 0]

(undirected) degree sequence: [3, 3, 3, 2, 2, 1, 1, 1]

Degree distribution: A frequency count of the occurrence of each degree

In-degree distribution: [(2,3) (1,4) (0,1)]

Out-degree distribution: [(2,4) (1,3) (0,1)]

(undirected) distribution: [(3,3) (2,2) (1,3)]

[Figure: histogram of the in-degree distribution; x-axis: indegree (0 to 2), y-axis: frequency (0 to 5)]
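Going from a degree sequence to a degree distribution is a frequency count. A minimal Python sketch, using the sequences listed on this slide (the function name `degree_distribution` is an illustrative choice):

```python
from collections import Counter


def degree_distribution(sequence):
    """Return (degree, count) pairs, highest degree first."""
    counts = Counter(sequence)
    return sorted(counts.items(), reverse=True)


in_degrees = [2, 2, 2, 1, 1, 1, 1, 0]
out_degrees = [2, 2, 2, 2, 1, 1, 1, 0]
degrees = [3, 3, 3, 2, 2, 1, 1, 1]

print(degree_distribution(in_degrees))   # [(2, 3), (1, 4), (0, 1)]
print(degree_distribution(out_degrees))  # [(2, 4), (1, 3), (0, 1)]
print(degree_distribution(degrees))      # [(3, 3), (2, 2), (1, 3)]
```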

Page 12:

Network metrics: connected components

Strongly connected components: each node within the component can be reached from every other node in the component by following directed links

Strongly connected components in the example
  B C D E
  A
  G H
  F

Weakly connected components: every node can be reached from every other node by following links in either direction

Weakly connected components in the example
  A B C D E
  G H F

[Figure: example directed graph on nodes A to H, shown once for each kind of component]

In undirected networks one talks simply about ‘connected components’
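As a rough illustration of the two definitions, the sketch below finds both kinds of component by plain reachability searches. It is an assumption-laden example rather than the method used in the lecture: it reuses the 5-node directed graph from the adjacency-matrix slides (the edges of the A to H figure are not recoverable here), the names `reachable` and `components` are made up for this sketch, and the intersection-based approach is simple but not the fastest known algorithm.

```python
def reachable(start, adj):
    """Set of nodes that can be reached from start by following adj."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def components(nodes, edges, strong):
    """Return strongly (strong=True) or weakly connected components."""
    fwd = {v: set() for v in nodes}   # follow arcs forwards
    bwd = {v: set() for v in nodes}   # follow arcs backwards
    for src, dst in edges:
        fwd[src].add(dst)
        bwd[dst].add(src)
    either = {v: fwd[v] | bwd[v] for v in nodes}  # ignore direction

    found, assigned = [], set()
    for v in nodes:
        if v in assigned:
            continue
        if strong:
            # u is in v's strong component if v reaches u and u reaches v.
            comp = reachable(v, fwd) & reachable(v, bwd)
        else:
            comp = reachable(v, either)
        assigned |= comp
        found.append(sorted(comp))
    return found


nodes = [1, 2, 3, 4, 5]
edges = [(2, 3), (2, 4), (3, 2), (3, 4), (4, 5), (5, 2), (5, 1)]

print(components(nodes, edges, strong=True))   # [[1], [2, 3, 4, 5]]
print(components(nodes, edges, strong=False))  # [[1, 2, 3, 4, 5]]
```

Node 1 has no outgoing arcs, so it can be reached from the others but cannot reach them back; it forms its own strongly connected component while still belonging to the single weakly connected component.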

Page 13:

Network metrics: shortest paths

Shortest path (also called a geodesic path)
  The shortest sequence of links connecting two nodes
  Not always unique: A and C are connected by 2 shortest paths
    A – E – B – C
    A – E – D – C

Diameter: the largest geodesic distance in the graph
  The distance between A and C is the maximum for the graph: 3

Caution: some people use the term ‘diameter’ to mean the average shortest path distance; in this class we will use it only to refer to the maximal distance.

[Figure: example graph on nodes A, B, C, D, E with distances marked]
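Shortest-path distances in an unweighted graph can be found with breadth-first search. The sketch below is illustrative only: the edge set is inferred from the two shortest paths listed on this slide (A–E, E–B, E–D, B–C, D–C), which is an assumption about the figure, and the names `bfs` and `diameter` are made up for this example.

```python
from collections import deque

# Undirected example graph, assumed from the two listed shortest paths.
adj = {
    "A": ["E"],
    "B": ["E", "C"],
    "C": ["B", "D"],
    "D": ["E", "C"],
    "E": ["A", "B", "D"],
}


def bfs(adj, source):
    """Return (distance, number of shortest paths) from source to every node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:            # first time we reach w
                dist[w] = dist[v] + 1
                paths[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                paths[w] += paths[v]
    return dist, paths


def diameter(adj):
    """Largest geodesic distance; assumes the graph is connected."""
    return max(max(bfs(adj, s)[0].values()) for s in adj)


dist, paths = bfs(adj, "A")
print(dist["C"], paths["C"])  # 3 2
print(diameter(adj))          # 3
```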

Page 14:

Giant components and the web graph

if the largest component encompasses a significant fraction of the graph, it is called the giant component

Page 15:

The bowtie model of the web

The Web is a directed graph: webpages link to other webpages

The connected components tell us what set of pages can be reached from any other just by surfing (no ‘jumping’ around by typing in a URL or using a search engine)

Broder et al. 1999 – crawl of over 200 million pages and 1.5 billion links
  SCC – 27.5%
  IN and OUT – 21.5% each
  Tendrils and tubes – 21.5%
  Disconnected – 8%

image: Mark Levene

Page 16:

bipartite (two-mode) networks

edges occur only between two groups of nodes, not within those groups

for example, we may have
  individuals and events
  directors and boards of directors
  customers and the items they purchase
  metabolites and the reactions they participate in

Page 17:

going from a bipartite to a one-mode graph

One-mode projection
  two nodes from the first group are connected if they link to the same node in the second group
  some loss of information
  naturally high occurrence of cliques

[Figure: two-mode network with group 1 and group 2]

Page 18:

Now in matrix notation

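$$B_{ij} = \begin{cases} 1 & \text{if node } i \text{ from the first group links to node } j \text{ from the second group} \\ 0 & \text{otherwise} \end{cases}$$

B is usually not a square matrix! For example: we have n customers and m products

$$B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

(rows are indexed by i, columns by j)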

Page 19:

Collapsing to a one-mode network

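i and k are linked if they both link to j:

$$A_{ik} = \sum_{j} B_{ij} B_{kj}$$

$$A = B B^{T}$$

the transpose of a matrix swaps $B_{xy}$ and $B_{yx}$

if B is an n×m matrix, $B^{T}$ is an m×n matrix

$$B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad B^{T} = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$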

Page 20:

Matrix multiplication

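general formula for matrix multiplication:

$$Z_{ij} = \sum_{k} X_{ik} Y_{kj}$$

let Z = A, X = B, Y = $B^{T}$

$$A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 2 & 2 & 0 \\ 1 & 1 & 2 & 4 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$

example: row 4 of B, (1 1 1 1), times column 3 of $B^{T}$, (1 1 0 0):

$$A_{43} = 1 \cdot 1 + 1 \cdot 1 + 1 \cdot 0 + 1 \cdot 0 = 2$$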
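The projection can be reproduced in a few lines of plain Python. This is an illustrative sketch (the helper names `transpose` and `matmul` are not from the slides); it multiplies the example matrix B by its transpose using the general formula for matrix multiplication.

```python
# Two-mode matrix from the example: 5 nodes in group 1 (rows), 4 in group 2 (columns).
B = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]


def transpose(M):
    """Swap rows and columns: an n x m matrix becomes m x n."""
    return [list(column) for column in zip(*M)]


def matmul(X, Y):
    """Z[i][j] = sum over k of X[i][k] * Y[k][j]."""
    return [
        [sum(x * y for x, y in zip(row, column)) for column in zip(*Y)]
        for row in X
    ]


A = matmul(B, transpose(B))
for row in A:
    print(*row)

# Entry (4, 3) counts the group-2 nodes shared by group-1 nodes 4 and 3.
print(A[3][2])  # 2
```

The result is a 5 x 5 symmetric matrix: the diagonal holds each group-1 node's number of links, and each off-diagonal entry holds the number of group-2 nodes that a pair has in common.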

Page 21:

Collapsing a two-mode network to a one mode-network

Assume the nodes in group 1 are people and the nodes in group 2 are movies

The diagonal entries of A give the number of movies each person has seen

The off-diagonal elements of A give the number of movies that both people have seen

A is symmetric

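$$A = \begin{pmatrix} 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 2 & 2 & 0 \\ 1 & 1 & 2 & 4 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$

[Figure: the resulting one-mode network, with edges labelled by weight]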

Page 22:

Networks of actors

Page 23:

History: Graph theory

Euler’s Seven Bridges of Königsberg – one of the first problems in graph theory

Is there a route that crosses each bridge only once and returns to the starting point?

Page 24:

Eulerian paths

Eulerian path: traverse each edge exactly once
  If the starting point and end point are the same: only possible if no nodes have an odd degree (each path must visit and leave each shore)
  If you don’t need to return to the starting point: can have 0 or 2 nodes with an odd degree

Hamiltonian path: visit each vertex exactly once
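The odd-degree rule is easy to check by counting. The sketch below is an illustration, not material from the lecture: it encodes the seven Königsberg bridges as a multigraph whose four land masses carry made-up labels (`island`, `north`, `south`, `east`), and the function names are hypothetical. The check assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as pairs of land masses (labels are illustrative).
bridges = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("east", "island"),
    ("north", "east"),
    ("south", "east"),
]


def odd_degree_nodes(edges):
    """Nodes touched by an odd number of edge ends."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def euler_type(edges):
    """Classify a connected graph by its number of odd-degree nodes."""
    odd = len(odd_degree_nodes(edges))
    if odd == 0:
        return "closed"  # a route exists that returns to its starting point
    if odd == 2:
        return "open"    # a route exists, but it starts and ends at the odd nodes
    return "none"        # no route crosses every edge exactly once


print(odd_degree_nodes(bridges))  # ['east', 'island', 'north', 'south']
print(euler_type(bridges))        # none

triangle = [("a", "b"), ("b", "c"), ("c", "a")]
print(euler_type(triangle))       # closed
```

All four land masses have odd degree (5, 3, 3 and 3), so no walk can cross each bridge exactly once.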

Page 25:

Bi-cliques (cliques in bipartite graphs)

Km,n is the complete bipartite graph with m and n vertices of the two different types

K3,3 maps to the utility graph
  Is there a way to connect three utilities, e.g. gas, water, electricity, to three houses without having any of the pipes cross?

[Figure: K3,3 and the utility graph]

Page 26:

Planar graphs

A graph is planar if it can be drawn on a plane without any edges crossing

Page 27:

When graphs are not planar

Two graphs are homeomorphic if you can make one into the other by repeatedly adding (or removing) a vertex of degree 2 in the middle of an edge

Page 28:

Cliques and complete graphs

Kn is the complete graph (clique) with n vertices
  each vertex is connected to every other vertex
  there are n*(n-1)/2 undirected edges

[Figure: K3, K5 and K8]

Page 29:

Petersen graph

Example of using edge contractions to show a graph is not planar

Page 30:

Edge contractions defined

A finite graph G is planar if and only if it has no subgraph that is homeomorphic or edge-contractible to the complete graph on five vertices (K5) or the complete bipartite graph K3,3. (Kuratowski's Theorem)

Page 31:

graph density

Of the connections that may exist between n nodes
  directed graph: emax = n*(n-1)
    each of the n nodes can connect to (n-1) other nodes
  undirected graph: emax = n*(n-1)/2
    since edges are undirected, count each one only once

What fraction are present?
  density = e / emax

For example, out of 12 possible connections, this graph has 7, giving it a density of 7/12 = 0.583

But it is more difficult for a larger network to achieve the same density
  measure not useful for comparing networks of different sizes
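A short Python sketch of the density formula, added for illustration (the function names are hypothetical). The second call uses made-up numbers, a 400-node network with the same 1.75 arcs per node as the 4-node example, to show how density shrinks as a network grows.

```python
def max_edges(n, directed=True):
    """Largest possible number of edges among n nodes (no self-loops)."""
    return n * (n - 1) if directed else n * (n - 1) // 2


def density(e, n, directed=True):
    """Fraction of the possible edges that are present."""
    return e / max_edges(n, directed)


# Example from the slide: 7 of 12 possible directed connections among 4 nodes.
print(max_edges(4))             # 12
print(round(density(7, 4), 3))  # 0.583

# Hypothetical larger network with the same number of arcs per node (1.75).
print(round(density(700, 400), 4))  # 0.0044
```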

Page 32:

#s of planar graphs of different sizes

number of nodes: number of planar graphs
  1: 1
  2: 2
  3: 4
  4: 11

Every planar graph has a straight line embedding

(homework exercise)

Page 33:

Trees

Trees are connected undirected graphs that contain no cycles

Page 34:

examples of trees

In nature
  trees
  river networks
  arteries (or veins, but not both)

Man made
  sewer system

Computer science
  binary search trees
  decision trees (AI)

Network analysis
  minimum spanning trees
    from one node: how to reach all other nodes most quickly
    may not be unique, because shortest paths are not always unique
    depends on weight of edges

Page 35:

Using Pajek for exploratory social network analysis

Pajek – (pronounced in Slovenian as Pah-yek) means ‘spider’

website: vlado.fmf.uni-lj.si/pub/networks/pajek/
  download application (free)
  tutorials
  lectures
  data sets

Windows only (works on Linux via Wine)

can be installed via NAL in the student lab (DIAD)

helpful book: ‘Exploratory Social Network Analysis with Pajek’ by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj
  first 2 chapters are required reading and on cTools

Page 36:

Pajek interface

Drop down list of networks opened or created with Pajek. The active network is displayed

Drop down list of network partitions by discrete variables, e.g. degree, mode, label

Drop down list of continuous node attributes, e.g. centrality, clustering coefficients

things we’ll use right away

things we’ll use later for clustering

Page 37:

opening a network file

click on folder icon to open a file

Save changes to your network, network partitions, etc., if you’d like to keep them

Page 38:

Working with network files in Pajek

The active network, partition, etc is shown on top of the drop down list

Draw the network

Page 39:

Pajek data format

    *Vertices 26
    1 "Ada" 0.1646 0.2144 0.5000
    2 "Cora" 0.0481 0.3869 0.5000
    3 "Louise" 0.3472 0.1913 0.5000
    ..
    *Arcs
    1 3 2 c Black
    ..
    *Edges
    1 2 1 c Black
    ..

*Vertices 26: number of vertices
numbers after each label: vertex x,y,z coordinates (optional)
*Arcs: directed edges
  1 3 2 c Black: from Ada(1) to Louise(3) as choice “2” and color Black
*Edges: undirected edges
  1 2 1 c Black: between Ada(1) and Cora(2) as choice “1” and color Black

[Figure: the nodes Ada, Cora and Louise with the links between them labelled by choice]
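To make the layout of the file concrete, here is a minimal Python sketch of a reader for the three sections shown on this slide. It is not Pajek's own parser and handles only this subset of the format; the function name `parse_pajek` is hypothetical, the sample text is shortened to three vertices, anything after the third number on a link line (such as `c Black`) is ignored, and treating a missing value as 1 is an assumption of this sketch.

```python
import shlex

# Shortened sample in the layout shown on the slide (3 of the 26 vertices).
sample = '''*Vertices 3
1 "Ada" 0.1646 0.2144 0.5000
2 "Cora" 0.0481 0.3869 0.5000
3 "Louise" 0.3472 0.1913 0.5000
*Arcs
1 3 2 c Black
*Edges
1 2 1 c Black
'''


def parse_pajek(text):
    """Return (vertices, arcs, edges) from a small Pajek-style network text."""
    vertices, arcs, edges = {}, [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section header such as "*Vertices 26", "*Arcs" or "*Edges".
            section = line.split()[0].lower()
            continue
        fields = shlex.split(line)  # keeps a quoted label as one field
        if section == "*vertices":
            vertices[int(fields[0])] = fields[1]
        elif section in ("*arcs", "*edges"):
            src, dst = int(fields[0]), int(fields[1])
            value = float(fields[2]) if len(fields) > 2 else 1.0  # assumed default
            target = arcs if section == "*arcs" else edges
            target.append((src, dst, value))
    return vertices, arcs, edges


vertices, arcs, edges = parse_pajek(sample)
print(vertices)  # {1: 'Ada', 2: 'Cora', 3: 'Louise'}
print(arcs)      # [(1, 3, 2.0)]
print(edges)     # [(1, 2, 1.0)]
```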

Page 40:

Live demo of Pajek

Opening a network
Visualization
Essential measurements

Page 41:

Final project guidelines

Work individually or in groups (up to 4 people)

Important dates
  Feb. 13th: Project proposals due (5%)
    1 page abstract & 5 minute class presentation
  March 20th: Project status report due (5%)
    3-6 pages of
      result summaries (including figures and tables)
      plan of remaining work
  April 17th: in class student presentations of results (5%)
  April 24th: final project reports due (25%)
    6-12 pages of
      related work
      main results
      ‘future’ work/extensions

Page 42:

Final Project

Option 1: Analyze a network

What it should be
  More than just a measurement of the average shortest path, clustering coefficient, and degree distribution
  An interpretation of measurement results
  If applicable:
    discovery of community or other structure
    assortativity
    motifs
    weights, thresholds
    longitudinal data (how the network changes over time)
  Visualizations of all or part of the network that point out a particular feature
  Qualitative comparison with other networks

What it should not be
  a literature review

The data can be artificially generated or a real-world dataset

If you intend to work on data concerning human subjects, you may need to start an IRB application ASAP

Page 43:

Final Project

Option 2: New network model

What it should be
  Method for generating a network, e.g.
    preferential attachment
    optimization wrt. different criteria
  Analysis of resulting network
    comparison with random graphs
    how do attributes change depending on model parameters

What it should not be
  an already thoroughly explored model

Page 44:

Final Project

Option 3: Novel algorithm

What it should be
  An algorithm to analyze the network, e.g.
    clustering or community detection algorithm
    webpage ranking algorithm
  OR a process that is influenced by the network
    gossip spreading
    games such as the prisoner’s dilemma
  Analysis of algorithm on several different networks

What it should not be
  an exact replica of an existing algorithm applied to a network where it has already been studied