Aaron Clauset (@aaronclauset), Assistant Professor of Computer Science, University of Colorado Boulder; External Faculty, Santa Fe Institute. Network Analysis and Modeling. © 2017 Aaron Clauset. lecture 0: what are networks and how do we talk about them?

Network Analysis and Modeling - Santa Fe Institute, tuvalu.santafe.edu/~aaronc/courses/5352/csci5352_2017_L0.pdf. CSCI 5352 Network Analysis and Modeling: learning goals 1. develop a network

Mar 16, 2020

Page 1:


Page 2:

who are network scientists?

Physicists

Computer Scientists

Applied Mathematicians

Statisticians

Biologists

Ecologists

Sociologists

Political Scientists

it’s a big community!

Page 3:

who are network scientists?

Physicists

Computer Scientists

Applied Mathematicians

Statisticians

Biologists

Ecologists

Sociologists

Political Scientists

it’s a big community!

• different traditions

• different tools

• different questions


Page 4:

who are network scientists?

Physicists

Computer Scientists

Applied Mathematicians

Statisticians

Biologists

Ecologists

Sociologists

Political Scientists

it’s a big community!

• different traditions

• different tools

• different questions

increasingly, not ONE community, but MANY, only loosely interacting communities


Page 5:

who are network scientists?

Physicists: phase transitions, universality

Computer Scientists: data / algorithm oriented, predictions

Applied Mathematicians: dynamical systems, diff. eq.

Statisticians: inference, consistency, covariates

Biologists: experiments, causality, molecules

Ecologists: observation, experiments, species

Sociologists: individuals, differences, causality

Political Scientists: rationality, influence, conflict

Page 6:

what are networks?

Page 7:

what are networks?

• an approach

• a mathematical representation

• provide structure to complexity

• structure above individuals / components

• structure below system / population

Page 8:

CSCI 5352 Network Analysis and Modeling : learning goals

1. develop a network intuition for reasoning about how structural patterns are related, and how they influence dynamics in / on networks

2. master basic terminology and concepts

3. master practical tools for analyzing / modeling structure of network data

4. build familiarity with advanced techniques for exploring / testing hypotheses about networks

Page 9:

Course schedule (roughly):

1. network basics
2. centrality measures
3. random graphs (simple)
4. configuration model
5. large-scale structure (communities, hierarchies, etc.)
6. probabilistic generative models (SBMs)
7. metadata, label and link prediction
8. spreading processes (social, biological, SI-type)
9. data wrangling + data sampling (artifacts)
10. role of statistics in hypothesis generation / testing
11. spatial networks
12. citation networks, dynamics, preferential attachment
13. temporal networks
14. student project presentations

themes: building intuition; basic concepts, tools; practical tools; advanced tools

Page 10:

Course webpage: http://santafe.edu/~aaronc/courses/5352/

Page 11:

Network data for assignments

Page 12:

lessons learned from past instances

what’s difficult:

1. students need to know many different things:

• some probability: Erdős–Rényi, configuration, calculations

• some mathematics: physics-style calculations, phase transitions

• some statistics: basic data analysis, correlations, distributions

• some machine learning: prediction, likelihoods, features, estimation algorithms

• some programming: data wrangling, coding up measures and algorithms

2. can’t teach all of these things to all types of students!

• vast amounts of advanced material in each of these directions

• students have little experience / intuition of what makes good science

Page 13:

what works well:

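1. simple mathematical problems: build intuition + practice with concepts

[figure: example exercises: calculate the diameter; closeness centrality; betweenness of A; modularity Q(r) of a line graph split into r and n − r vertices; a network with groups A and B of sizes n_A and n_B]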

lessons learned from past instances
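One way to check an answer to the "modularity of a line graph" exercise numerically is sketched below. It assumes the exercise means a path of n vertices cut into its first r and last n − r vertices, and uses the standard Newman–Girvan modularity Q = Σ_c [e_c/m − (d_c/2m)²]; the function names are hypothetical, not from the course materials.

```python
from collections import defaultdict


def modularity(edges, group):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; group: dict mapping vertex -> group label.
    Q = sum over groups c of [ e_c / m - (d_c / 2m)^2 ], where e_c is the number
    of edges inside c and d_c is the total degree of the vertices in c.
    """
    m = len(edges)
    inside = defaultdict(int)  # e_c: edges with both endpoints in group c
    degree = defaultdict(int)  # d_c: sum of vertex degrees in group c
    for u, v in edges:
        degree[group[u]] += 1
        degree[group[v]] += 1
        if group[u] == group[v]:
            inside[group[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def line_graph_Q(n, r):
    """Q(r) for a path on n vertices cut into its first r and last n - r vertices."""
    edges = [(i, i + 1) for i in range(n - 1)]
    group = {i: 0 if i < r else 1 for i in range(n)}
    return modularity(edges, group)


print(line_graph_Q(4, 2))  # about 0.1667, i.e. 1/6
```

Working the same sum by hand gives, for 1 ≤ r ≤ n − 1, Q(r) = (n − 2)/(n − 1) − [(2r − 1)² + (2(n − r) − 1)²] / [4(n − 1)²], which is largest when the cut is in the middle, r = n/2; the code is a convenient way to check such a derivation.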

Page 14:

what works well:

2. analyze real networks: test understanding + practice with implementing methods

[figure: "mean geodesics and O(log n)": mean geodesic path length vs. network size n (10^2 to 10^5) for USF, Haverford, Caltech, Penn]

[figure: "node centrality vs. configuration model (when is a pattern interesting?)": harmonic centrality vs. vertex label (1 to 34) for the Karate club, real-world network vs. configuration model]

[figure: "Assortativity (gender)": density (×10^-3) of attribute assortativity values, roughly −0.1 to 0.1]

lessons learned from past instances
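Both quantities in the first two panels come from breadth-first search (BFS) on an unweighted graph. The sketch below computes the mean geodesic path length and harmonic centrality, using the common normalization c_i = (1/(n − 1)) Σ_{j≠i} 1/d_ij (an assumption; other normalizations exist), on the 6-vertex "simple network" from the adjacency-matrix slide rather than on the Karate club data. Function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic (shortest-path) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic(adj):
    """Mean distance over all ordered pairs (i, j), i != j, that are connected."""
    total, pairs = 0, 0
    for s in adj:
        d = bfs_distances(adj, s)
        total += sum(d.values())  # d[s] = 0 adds nothing
        pairs += len(d) - 1
    return total / pairs


def harmonic_centrality(adj):
    """c_i = (1 / (n - 1)) * sum over j != i of 1 / d_ij; unreachable j contribute 0."""
    n = len(adj)
    return {
        s: sum(1 / d for v, d in bfs_distances(adj, s).items() if v != s) / (n - 1)
        for s in adj
    }


# the 6-vertex simple network from the representing-networks slides
adj = {1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6], 4: [2, 3], 5: [1, 3], 6: [3]}
print(mean_geodesic(adj))           # 1.6
print(harmonic_centrality(adj)[3])  # 0.9, the most central vertex
```

Comparing a real network against the configuration model, as in the second panel, would mean rerunning harmonic_centrality on many degree-preserving randomizations; that step is not shown here.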

Page 15:

what works well:

3. simple prediction tasks: test intuition + run numerical experiments

[figure: "label prediction via homophily": fraction of correct label predictions vs. fraction of labels observed, f, for malaria genes (HVR5) and Norwegian boards (net1m-2011-08-01)]

[figure: "link prediction via heuristic": AUC vs. fraction of edges observed, f, for the HVR5 malaria genes network, comparing degree product, Jaccard coefficient, shortest path, and baseline (guessing)]

lessons learned from past instances
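A minimal sketch of two of the link-prediction heuristics named in the figure (degree product and Jaccard coefficient) and of label prediction via homophily, here assumed to be a majority vote over a vertex's labeled neighbors. The AUC evaluation and the shortest-path heuristic are omitted, and all function names and the toy labels are hypothetical.

```python
from collections import Counter


def jaccard(adj, i, j):
    """Jaccard coefficient |N(i) & N(j)| / |N(i) | N(j)| of two neighborhoods."""
    ni, nj = set(adj[i]), set(adj[j])
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


def degree_product(adj, i, j):
    """Degree-product score k_i * k_j (favors pairs of high-degree vertices)."""
    return len(adj[i]) * len(adj[j])


def rank_missing_edges(adj, score):
    """Score every non-adjacent pair (i < j) and return the pairs best-first."""
    nodes = sorted(adj)
    candidates = [(i, j) for a, i in enumerate(nodes) for j in nodes[a + 1:]
                  if j not in adj[i]]
    return sorted(candidates, key=lambda p: score(adj, *p), reverse=True)


def predict_label(adj, labels, i):
    """Homophily guess: most common label among i's labeled neighbors (None if none)."""
    votes = Counter(labels[v] for v in adj[i] if v in labels)
    return votes.most_common(1)[0][0] if votes else None


adj = {1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6], 4: [2, 3], 5: [1, 3], 6: [3]}
print(rank_missing_edges(adj, jaccard)[:3])
print(rank_missing_edges(adj, degree_product)[:3])
print(predict_label(adj, {1: "a", 2: "a", 4: "b", 5: "b", 6: "b"}, 3))  # b
```

On this toy graph the two heuristics disagree about the most likely missing edge ((2, 5) for Jaccard, (1, 3) for degree product), which is the kind of difference the AUC curves are meant to expose.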

Page 16:

what works well:

4. simple simulations: explore dynamics vs. structure + numerical experiments

[figure: simulate epidemics (SIR) on planted partitions; heatmap over c_in − c_out and p]

[figure: simulate Price’s model; Pr(K ≥ k_in) vs. in-degree k_in (10^0 to 10^5, log-log) for r = 1, r = 4, and no preferential attachment]

lessons learned from past instances
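A small simulation of Price’s model is sketched below: each new vertex cites up to c distinct earlier vertices, chosen with probability proportional to in-degree plus a constant offset a (a = 1 in Price’s original formulation). The slide’s parameter r is not defined on the slide, so it may not correspond to a; treat that mapping, and the function names, as assumptions. The O(n²) weighted sampling favors clarity over speed.

```python
import random


def price_model(n, c, a=1.0, seed=None):
    """Grow a citation network of n vertices. Each new vertex t cites min(c, t)
    distinct earlier vertices j, chosen with probability proportional to
    in_degree[j] + a (a > 0). A very large a approaches uniform attachment.
    Returns (edges, in_degree), with edges as (citing, cited) pairs."""
    rng = random.Random(seed)
    in_degree, edges = [], []
    for t in range(n):
        k = min(c, t)  # cannot cite more vertices than already exist
        weights = [in_degree[j] + a for j in range(t)]
        targets = set()
        while len(targets) < k:  # redraw until k distinct targets are chosen
            targets.add(rng.choices(range(t), weights=weights)[0])
        for j in targets:
            edges.append((t, j))  # directed edge: t cites j
            in_degree[j] += 1
        in_degree.append(0)  # the new vertex starts with no citations
    return edges, in_degree


def ccdf(values):
    """Pairs (k, Pr(K >= k)) for each distinct value k, as on a log-log CCDF plot."""
    n = len(values)
    return [(k, sum(1 for v in values if v >= k) / n) for k in sorted(set(values))]


edges, k_in = price_model(1000, 3, a=1.0, seed=42)
print(ccdf(k_in)[-5:])  # the heavy tail: a few vertices collect many citations
```

Swapping the weights for a constant (every earlier vertex equally likely) gives the "no preferential attachment" comparison curve.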

Page 17:

what works well:

5. team projects: teamwork + exploring their own ideas

lessons learned from past instances

Page 18:

key takeaways

[figure: pipeline for extracting highly variable regions from aligned sequences: calculate alignment scores; convert to alignment indicators a(t) along alignment position t; remove short aligned regions; extract highly variable regions]

Page 19:

• network intuition is hard to develop! good intuition draws on many skills (probability, statistics, computation, causal dynamics, etc.)

• best results come from
1. exercises to get practice with calculations
2. practice analyzing diverse real-world networks
3. conducting numerical experiments & simulations

• practical tasks are a pedagogical tool (e.g., link and label prediction)

• interpreting the results requires a good intuition

• null models are a key conceptual idea: is a pattern interesting?

• networks are fun!


Page 20:

1. defining a network

2. describing a network

Page 21:

vertices and edges

what is a vertex?

when are two vertices connected?

V: distinct objects (vertices / nodes / actors)

$E \subseteq V \times V$: pairwise relations (edges / links / ties)

Page 22:

network | vertex | edge
Internet (1) | computer | IP network adjacency
Internet (2) | autonomous system (ISP) | BGP connection
software | function | function call
World Wide Web | web page | hyperlink
documents | article, patent, or legal case | citation
power grid transmission | generating or relay station | transmission line
rail system | rail station | railroad tracks
road network (1) | intersection | pavement
road network (2) | named road | intersection
airport network | airport | non-stop flight
friendship network | person | friendship
sexual network | person | intercourse
metabolic network | metabolite | metabolic reaction
protein-interaction network | protein | bonding
gene regulatory network | gene | regulatory effect
neuronal network | neuron | synapse
food web | species | predation or resource transfer

(rows grouped, in order, under: telecommunications, informational, transportation, social, biological)

Page 23:

social networks

[figure: high school friendships]

vertex: a person

edge: friendship, collaborations, sexual contacts, communication, authority, exchange, etc.

Page 24:

Adamic & Glance 2005

political blogs

information networks

vertex: books, blogs, webpages, etc.

edge: citations, hyperlinks, recommendations, similarity, etc.

political books

Page 25:

ISP network

IP-level Internet

Enron email

communication networks

vertex: network router, ISP, email address, mobile phone number, etc.

edge: exchange of information

Page 26:

US Interstates

global shipping

global air traffic

transportation networks

vertex: city, airport, junction, railway station, river confluence, etc.

edge: physical transportation of material

Page 27:


biological networks

vertex: species, metabolite, protein, gene, neuron, etc.

edge: predation, chemical reaction, binding, regulation, activation, etc.

core metabolism

grassland foodweb

Page 28:

what’s a network?

pop quiz

Page 29:

Andromeda galaxy

what’s a network?

Page 30:

cauliflower fractal

what’s a network?

Page 31:

diamond lattice

what’s a network?

Page 32:

representing networks

Page 33:

[figure: a simple network on vertices 1 to 6]

a simple network

undirected, unweighted, no self-loops

Page 34:

[figure: a simple network on vertices 1 to 6]

a simple network

adjacency matrix

A | 1 2 3 4 5 6
1 | 0 1 0 0 1 0
2 | 1 0 1 1 0 0
3 | 0 1 0 1 1 1
4 | 0 1 1 0 0 0
5 | 1 0 1 0 0 0
6 | 0 0 1 0 0 0

adjacency list

1 → {2, 5}
2 → {1, 3, 4}
3 → {2, 4, 5, 6}
4 → {2, 3}
5 → {1, 3}
6 → {3}

undirected, unweighted, no self-loops
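Both representations can be built from the same edge list. The sketch below reads the seven edges off the adjacency matrix of this slide and rebuilds the matrix and the list; the function name is hypothetical. For a simple undirected graph the matrix is symmetric and every edge appears twice in the list, so the degrees sum to 2m.

```python
def to_adjacency(n, edges):
    """Adjacency matrix and adjacency list of a simple undirected graph on vertices 1..n."""
    A = [[0] * (n + 1) for _ in range(n + 1)]  # row/column 0 unused, so A[i][j] matches labels
    adj = {i: [] for i in range(1, n + 1)}
    for i, j in edges:
        A[i][j] = A[j][i] = 1  # undirected: store both directions
        adj[i].append(j)
        adj[j].append(i)
    return A, {i: sorted(nbrs) for i, nbrs in adj.items()}


edges = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6)]
A, adj = to_adjacency(6, edges)
for i in range(1, 7):
    print(i, A[i][1:], adj[i])  # row i of the matrix, then i's neighbor list
```

The matrix always uses n² entries, while the list uses space proportional to n + m, which is why sparse real-world networks are usually stored as adjacency lists.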

Page 35:

[figure: the 6-vertex network with a self-loop, a multi-edge, a weighted edge, a directed edge, and a weighted node]

a less simple network

undirected, unweighted, no self-loops

Page 36:

adjacency list

1 → {(5, 1), (5, 1), (5, 2)}
2 → {(1, 1), (2, 1/2), (3, 2), (3, 1), (4, 1)}
3 → {(2, 2), (2, 1), (4, 2), (5, 4), (6, 4)}
4 → {(2, 1), (3, 2)}
5 → {(1, 1), (1, 1), (1, 2), (3, 4)}
6 → {(3, 4), (6, 2)}

adjacency matrix

A | 1 2 3 4 5 6
1 | 0 0 0 0 {1, 1, 2} 0
2 | 1 1/2 {2, 1} 1 0 0
3 | 0 {2, 1} 0 2 4 4
4 | 0 1 2 0 0 0
5 | {1, 1, 2} 0 4 0 0 0
6 | 0 0 4 0 0 2

[figure: the 6-vertex network with a self-loop, a multi-edge, a weighted edge, a directed edge, and a weighted node]

a less simple network
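The adjacency list above stores (neighbor, weight) pairs, and the matrix cells follow from it directly: a cell holding several weights is a multi-edge, and a pair of cells with A_ij ≠ A_ji marks a directed edge. The sketch below copies the slide's list verbatim and recovers both features; the function names are hypothetical, and which diagonal entry is the self-loop versus the weighted node is left to the figure.

```python
from fractions import Fraction

# Adjacency list from the slide: vertex -> list of (neighbor, weight) pairs.
adj_list = {
    1: [(5, 1), (5, 1), (5, 2)],
    2: [(1, 1), (2, Fraction(1, 2)), (3, 2), (3, 1), (4, 1)],
    3: [(2, 2), (2, 1), (4, 2), (5, 4), (6, 4)],
    4: [(2, 1), (3, 2)],
    5: [(1, 1), (1, 1), (1, 2), (3, 4)],
    6: [(3, 4), (6, 2)],
}


def to_matrix_cells(adj_list):
    """A[i][j] = list of edge weights from row i to column j (missing key = no edge)."""
    A = {i: {} for i in adj_list}
    for i, nbrs in adj_list.items():
        for j, w in nbrs:
            A[i].setdefault(j, []).append(w)
    return A


def cell(A, i, j):
    """Weights stored in cell (i, j), or an empty list if the cell is 0."""
    return A.get(i, {}).get(j, [])


A = to_matrix_cells(adj_list)
multi_edges = sorted((i, j) for i in A for j in A[i] if len(A[i][j]) > 1)
asymmetric = sorted((i, j) for i in A for j in A[i]
                    if sorted(cell(A, i, j)) != sorted(cell(A, j, i)))
print(multi_edges)  # [(1, 5), (2, 3), (3, 2), (5, 1)]
print(asymmetric)   # [(2, 1)]: A_21 = 1 but A_12 = 0
```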

Page 37:

directed networks

directed acyclic graph directed graph

citation networks

foodwebs*

epidemiological

others?

WWW

friendship?

flows of goods, information

economic exchange

dominance

neuronal

transcription

time travelers

$A_{ij} \neq A_{ji}$
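Whether a directed network is a directed acyclic graph can be tested with Kahn's algorithm: repeatedly remove vertices with no incoming edges; the graph is acyclic exactly when every vertex gets removed. In the sketch below an edge u → v means "u cites v", and the toy citation data and function name are hypothetical. A single "time traveler" citation to a later paper creates a cycle.

```python
from collections import deque


def is_dag(adj):
    """Kahn's algorithm on adj: vertex -> list of out-neighbors (every vertex is a key)."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(u for u in adj if indeg[u] == 0)  # start from vertices nobody points to
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in adj[u]:
            indeg[v] -= 1  # "delete" u's outgoing edges
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(adj)  # leftover vertices all lie on or behind a cycle


citations = {"p3": ["p1", "p2"], "p2": ["p1"], "p1": []}
time_traveler = {"p3": ["p1", "p2"], "p2": ["p1"], "p1": ["p3"]}  # p1 cites the future
print(is_dag(citations), is_dag(time_traveler))  # True False
```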

Page 38:

bipartite networks

[figure: a bipartite network with two types of vertices]

authors & papers

actors & movies/scenes

musicians & albums

people & online groups

people & corporate boards

people & locations (checkins)

metabolites & reactions

genes & substrings

words & documents

plants & pollinators

no within-type edges

bipartite network

Page 39:

bipartite networks

[figure: a bipartite network with two types of vertices, and its one-mode projections]

authors & papers

actors & movies/scenes

musicians & albums

people & online groups

people & corporate boards

people & locations (checkins)

metabolites & reactions

genes & substrings

words & documents

plants & pollinators

no within-type edges

bipartite network

one-mode projections: one type only
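A one-mode projection keeps one vertex type and connects two of its vertices whenever they share a neighbor of the other type. The sketch below weights each projected edge by the number of shared neighbors, which is one common choice (an unweighted projection simply ignores the counts). The author and paper names and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(bipartite_edges):
    """One-mode projection onto the first vertex type of (a, b) pairs.
    Returns {(u, v): number of type-2 neighbors shared by u and v}, with u < v."""
    members = defaultdict(set)  # type-2 vertex -> its type-1 neighbors
    for a, b in bipartite_edges:
        members[b].add(a)
    weights = defaultdict(int)
    for group in members.values():
        for u, v in combinations(sorted(group), 2):  # every pair sharing this neighbor
            weights[(u, v)] += 1
    return dict(weights)


# hypothetical authors & papers
edges = [("ann", "p1"), ("bob", "p1"), ("bob", "p2"),
         ("cat", "p2"), ("ann", "p3"), ("bob", "p3")]
print(project(edges))                        # {('ann', 'bob'): 2, ('bob', 'cat'): 1}
print(project([(b, a) for a, b in edges]))   # projection onto papers instead
```

The projection discards information: from the co-authorship counts alone one cannot tell whether ann and bob wrote two papers as a pair or as part of larger teams.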

Page 40:

[figure: snapshots of a 5-vertex network at times t, t + 1, t + 2, t + 3]

temporal networks

any network over time

• discrete time (snapshots), edges $(i, j, t)$

• continuous time, edges $(i, j, t_s, \Delta t)$
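Continuous-time edges can be aggregated into discrete snapshots. The sketch below assumes $t_s$ is the start time and $\Delta t$ the duration of a contact, and places an edge in every snapshot window its active interval overlaps; instantaneous contacts ($\Delta t = 0$) land in the window containing $t_s$. The window width is a modeling choice, and the function name and contacts are hypothetical.

```python
import math


def snapshots(contacts, t_start, t_end, width):
    """Aggregate (i, j, t_s, dt) contacts into snapshots of the given width.
    Snapshot k covers [t_start + k*width, t_start + (k+1)*width); edges are undirected."""
    n_bins = math.ceil((t_end - t_start) / width)
    frames = [set() for _ in range(n_bins)]
    for i, j, ts, dt in contacts:
        for k in range(n_bins):
            lo = t_start + k * width
            hi = lo + width
            # interval [ts, ts + dt) overlaps [lo, hi); the "ts >= lo" clause also
            # keeps instantaneous contacts (dt = 0) that start exactly at lo
            if ts < hi and (ts + dt > lo or ts >= lo):
                frames[k].add((min(i, j), max(i, j)))
    return frames


contacts = [(1, 2, 0.0, 2.5), (2, 3, 1.0, 0.0), (3, 1, 3.2, 0.5)]
for k, frame in enumerate(snapshots(contacts, 0.0, 4.0, 1.0)):
    print(k, sorted(frame))
```

Wider windows merge more contacts into each snapshot and lose their ordering; narrower windows keep the ordering but leave each snapshot sparse.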

Page 41:

describing networks

what networks look like

Page 42:

describing networks

what networks look like

questions:

• how are the edges organized?

• how do vertices differ?

• does network location matter?

• are there underlying patterns?

what we want to know:

• what processes shape these networks?

• how can we tell?

Page 43:

describing networks

a first step : describe its features

Page 44:

describing networks

a first step : describe its features

• degree distributions

• short-loop density (triangles, etc.)

• shortest paths (diameter, etc.)

• vertex positions

• correlations between these

$f : G \to \{x_1, \dots, x_k\}$
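A concrete (and deliberately small) feature map $f : G \to \{x_1, \dots, x_k\}$ is sketched below, covering three items from the list: the degree distribution, the number of triangles as the simplest short loops, and the diameter from shortest paths. It is an illustration on the 6-vertex simple network, not a fixed set of features, and the function name is hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def features(adj):
    """A few summary statistics of a simple undirected graph (adjacency-list dict)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    counts = Counter(len(nbrs) for nbrs in adj.values())
    degree_dist = {k: c / n for k, c in sorted(counts.items())}  # Pr(degree = k)
    # each triangle is seen once at each of its 3 corners, via a pair of neighbors
    triangles = sum(1 for u in adj for v, w in combinations(adj[u], 2) if w in adj[v]) // 3
    # diameter: largest BFS distance over all sources (assumes a connected graph)
    diameter = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        diameter = max(diameter, max(dist.values()))
    return {"n": n, "m": m, "degree_dist": degree_dist,
            "triangles": triangles, "diameter": diameter}


adj = {1: [2, 5], 2: [1, 3, 4], 3: [2, 4, 5, 6], 4: [2, 3], 5: [1, 3], 6: [3]}
print(features(adj))  # 6 vertices, 7 edges, 1 triangle (2-3-4), diameter 3
```

Such summaries are useful for exploration, but, as the next slides argue, a list of statistics is not yet an account of the network's structure.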

Page 45:

describing networks

a first step : describe its features

$f : \text{object} \to \{x_1, \dots, x_k\}$

Page 46:

describing networks

a first step : describe its features

• physical dimensions

• material density, composition

• radius of gyration

• correlations between these

helpful for exploration, but not what we want…

$f : \text{object} \to \{\theta_1, \dots, \theta_k\}$

Page 47:

describing networks

what we want : understand its structure

• what are the fundamental parts?

• how are these parts organized?

• where are the degrees of freedom ?

• how can we define an abstract class?

• structure — dynamics — function?

what does local-level structure look like?

what does large-scale structure look like?

how does structure constrain function?

$f : \text{object} \to \{\theta_1, \dots, \theta_k\}$

$\vec{\theta}$

Page 48:


fin