Networks, Maps, Relations

(Humanities Hackathon 2012, Day 4)


Objects of study: novels, species, philosophers, philosophies, words, concepts, languages, songs….

The problem at hand: describe relationships between the objects. (similarity, influence, equivalence, co-location….)


Graphs

• Simplest case: relations between pairs of objects.

• BINARY: objects are either related or they’re not (no attempt to measure extent or other qualities)


(D.P. Hayes, Social Network Theory and the Claim that Shakespeare of Stratford…)


How I made this graph (not recommended)

• adj <- array(c(0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,0,1,1,0,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,1,1,1,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,0,0,1,0,0,0,1,1,0,1,0,0),c(20,20))

• >PL = graph.adjacency(adj,mode="undirected")


How I made this graph

> Names = c("Beaumont", "Chapman", "Chettle", "Dekker", "Drayton", "Fletcher", "Greene", "Heywood", "Jonson", "Kyd", "Lodge", "Lyly", "Marlowe", "Marston", "Middleton", "Munday", "Nashe", "Peele", "Webster", "SHAKESPEARE")

> V(PL)$name = Names
OR
> V(PL)$name <- Names
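Typing 400 matrix entries by hand is easy to get wrong, which is presumably why the first version is marked "not recommended". One alternative, sketched here in base R, is to list the edges as pairs and fill in the adjacency matrix from them. The vertex names and edges below are hypothetical placeholders, not the playwright data.

```r
# Hypothetical toy data: each row is one undirected edge (v, w).
edges <- rbind(c("a", "b"),
               c("a", "c"),
               c("b", "c"),
               c("c", "d"))
vertices <- sort(unique(as.vector(edges)))

# Start from an all-zero matrix and switch on one entry per edge.
adj <- matrix(0, nrow = length(vertices), ncol = length(vertices),
              dimnames = list(vertices, vertices))
adj[edges] <- 1            # entry (v, w), looked up by row and column name
adj[edges[, 2:1]] <- 1     # entry (w, v): the relation is symmetric

print(adj)
```

A matrix built this way can be handed to graph.adjacency(adj, mode="undirected") just like the hand-typed one.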


Graphs

A graph (or network) consists of:

• A set of vertices (or nodes)
• A set of edges of the form (v,w) where v and w are vertices.
• Two vertices are adjacent if they are joined by an edge.


Directed graphs

Undirected graphs model symmetric relations: A is connected to B means B is connected to A.

(similarity, overlap, blood relation…)

Directed graphs (or digraphs) model non-symmetric relations:

(biological descent, Internet links, phone calls…)


Weighted graphs

In a weighted graph, edges are assigned numbers – typically measuring the strength of a relation, not just whether it is there or not.

(e.g. edge from v to w records number of e-mails from v to w, not just existence of e-mail from v to w.)
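A small base R sketch of the e-mail example: counting messages per ordered pair gives the weighted directed graph, and forgetting the counts gives back the binary one. The message log here is made up for illustration.

```r
# Hypothetical e-mail log: one row per message, sender then recipient.
mail <- data.frame(from = c("v", "v", "v", "w", "u"),
                   to   = c("w", "w", "u", "v", "v"))
people <- sort(unique(c(mail$from, mail$to)))

# table() counts messages per ordered (from, to) pair: a weighted digraph.
weights <- table(factor(mail$from, levels = people),
                 factor(mail$to,   levels = people))
print(weights)

# Dropping the counts recovers the plain (binary) directed graph.
binary <- (weights > 0) * 1
print(binary)
```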


Shakespeare graph (undirected):
• Vertices are Elizabethan playwrights
• Edges are collaborations (or friendships, or co-defendancies)


MORAL: A picture of a graph is not a graph. The graph is the list of adjacencies, nothing more.


ASIDE: why do this?

Oversimplification, BUT

All statements about books are oversimplifications, e.g. “Raymond Carver wrote Cathedral”

Our goal is “distant reading”


Basic notions

• The degree (or valence) of a vertex is the number of edges attached to it. Loose measure of “importance”

> degree(PL)
   Beaumont     Chapman     Chettle      Dekker     Drayton    Fletcher
          2           5           7          10           5           5
…
    Webster SHAKESPEARE
          4           9
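In terms of the adjacency matrix, the degree of a vertex is just the sum of its row. A minimal base R sketch on a hypothetical four-vertex graph (not the playwright data):

```r
# Hypothetical undirected graph: triangle a-b-c with d hanging off c.
v <- c("a", "b", "c", "d")
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE, dimnames = list(v, v))

# Each 1 in row i is one edge at vertex i, so the row sum is the degree.
deg <- rowSums(adj)
print(deg)

# Every edge has two ends, so the degrees add up to twice the edge count.
n_edges <- sum(adj) / 2
```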


• For directed graphs, the in-degree of a vertex x is the number of edges pointing to x, and the out-degree is the number of edges emanating from x.

• Web graph: in-degree = number of links pointing to my page, out-degree = number of outbound links on my page
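For a directed graph the adjacency matrix is no longer symmetric, and rows and columns play different roles. A base R sketch with a hypothetical three-page web site, reading entry (i, j) as "page i links to page j":

```r
# Hypothetical web graph: adj[i, j] = 1 means page i links to page j.
pages <- c("home", "blog", "about")
adj <- matrix(0, 3, 3, dimnames = list(pages, pages))
adj["home", "blog"]  <- 1
adj["home", "about"] <- 1
adj["blog", "home"]  <- 1

out_degree <- rowSums(adj)   # links leaving each page
in_degree  <- colSums(adj)   # links pointing at each page
print(rbind(in_degree, out_degree))
```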


Basic notions

• The distance between two vertices is the length of the shortest chain of adjacencies connecting them.

> shortest.paths(PL,"SHAKESPEARE","Lyly")
            Lyly
SHAKESPEARE    3
> lapply(get.shortest.paths(PL,'SHAKESPEARE','Lyly'),function(x) V(PL)$name[x])
[[1]]
[1] "SHAKESPEARE" "Greene" "Nashe" "Lyly"
(sorry for this ugliness)
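To make "shortest chain of adjacencies" concrete, here is a breadth-first search written out in base R. It is only a sketch of the idea, on a hypothetical five-vertex graph; the helper name bfs_dist is made up for this example.

```r
# Hypothetical graph: a path a - b - c - d, plus an isolated vertex e.
v <- c("a", "b", "c", "d", "e")
adj <- matrix(0, 5, 5, dimnames = list(v, v))
path <- rbind(c("a", "b"), c("b", "c"), c("c", "d"))
adj[path] <- 1
adj[path[, 2:1]] <- 1

# Breadth-first search: returns the number of edges on a shortest path
# from `from` to every vertex (Inf if the vertex cannot be reached).
bfs_dist <- function(adj, from) {
  dist <- rep(Inf, nrow(adj))
  names(dist) <- rownames(adj)
  dist[from] <- 0
  frontier <- from
  while (length(frontier) > 0) {
    nxt <- character(0)
    for (u in frontier) {
      nb <- names(which(adj[u, ] == 1))    # neighbors of u
      new <- nb[is.infinite(dist[nb])]     # those not reached yet
      dist[new] <- dist[u] + 1
      nxt <- c(nxt, new)
    }
    frontier <- unique(nxt)
  }
  dist
}

print(bfs_dist(adj, "a"))
```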


Basic notions

• The diameter of a graph is the greatest distance between any two vertices.

> diameter(PL)
[1] 5
> farthest.nodes(PL)
[1] 1 12 5
> shortest.paths(PL,1,12)
         Lyly
Beaumont    5
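The diameter is the largest entry in the table of all pairwise distances. One way to build that table, sketched below in base R: a walk of k edges joins i to j exactly when entry (i, j) of the k-th power of the adjacency matrix is positive, so the distance is the first k at which that happens. The graph is a hypothetical four-vertex path and the helper name is made up.

```r
# Hypothetical graph: the path a - b - c - d.
v <- c("a", "b", "c", "d")
adj <- matrix(0, 4, 4, dimnames = list(v, v))
path <- rbind(c("a", "b"), c("b", "c"), c("c", "d"))
adj[path] <- 1
adj[path[, 2:1]] <- 1

all_pairs_dist <- function(adj) {
  n <- nrow(adj)
  dist <- matrix(Inf, n, n, dimnames = dimnames(adj))
  diag(dist) <- 0
  reach <- diag(n)                       # walks of length 0
  for (k in seq_len(n - 1)) {
    reach <- (reach %*% adj > 0) * 1     # 1 where a walk of length k exists
    dist[reach == 1 & is.infinite(dist)] <- k   # first k wins
  }
  dist
}

d <- all_pairs_dist(adj)
diam <- max(d[is.finite(d)])   # ignore unreachable pairs, if any
print(d)
print(diam)
```

Repeated matrix multiplication is fine for 20 playwrights but gets slow on large graphs, where a search from each vertex is the usual choice.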


Complete graphs

• Every vertex adjacent to every other

5 vertices
10 edges


Complete graphs

More generally: n vertices, each vertex connected to n-1 others for a total of n(n-1)

This counts each edge twice!

So (n^2-n)/2 edges.

Number of edges scales as number of vertices squared: studying a graph on 10 times as many vertices can take 100 times as long. (Or more, depending on the question asked…)
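A quick numerical check of the count, in base R: n(n-1)/2 agrees with the number of ways to choose 2 vertices out of n, and multiplying n by 10 multiplies the edge count by roughly 100.

```r
n <- c(5, 50, 500)

# Each of the n vertices meets n - 1 others; halve to undo the double count.
edges <- n * (n - 1) / 2

print(data.frame(vertices = n, edges = edges))

# Growth factor when the number of vertices goes up tenfold.
print(edges[-1] / edges[-length(edges)])
```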


Trees

A tree is a graph in which every two vertices are joined by one, but only one, path. Equivalently: connected, with no cycles.


Communities

• A clique is a set of vertices which are all mutually adjacent.

(So: any pair of adjacent vertices is a clique of size 2, any “triangle” is a clique of size 3…)

• e.g. Shakespeare, Dekker, Chettle.

> largest.cliques(PL)
[[1]]
[1] 4 3 16 8 20

(Dekker, Chettle, Munday, Heywood, Shakespeare)
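Checking whether a given set of vertices is a clique only needs the corresponding block of the adjacency matrix: every off-diagonal entry must be 1. A base R sketch on a hypothetical graph; is_clique is a made-up helper name. (Finding the largest clique, as largest.cliques does, is a much harder search problem than checking one.)

```r
# Hypothetical undirected graph: triangle a-b-c with d hanging off c.
v <- c("a", "b", "c", "d")
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE, dimnames = list(v, v))

# TRUE when every pair of distinct members is adjacent.
is_clique <- function(adj, members) {
  sub <- adj[members, members, drop = FALSE]
  all(sub[row(sub) != col(sub)] == 1)
}

print(is_clique(adj, c("a", "b", "c")))
print(is_clique(adj, c("b", "c", "d")))
```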


Communities

A graph is connected if any vertex can be reached from any other by a chain of adjacencies. Every graph breaks up into connected pieces called connected components.
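Connected components can be found by starting at an unlabeled vertex, labeling everything reachable from it, and repeating until no vertex is left. A base R sketch on a hypothetical graph with two pieces; find_components is a made-up helper name.

```r
# Hypothetical graph with two pieces: a - b - c and d - e.
v <- c("a", "b", "c", "d", "e")
adj <- matrix(0, 5, 5, dimnames = list(v, v))
e <- rbind(c("a", "b"), c("b", "c"), c("d", "e"))
adj[e] <- 1
adj[e[, 2:1]] <- 1

find_components <- function(adj) {
  label <- rep(NA_integer_, nrow(adj))
  names(label) <- rownames(adj)
  current <- 0L
  for (start in rownames(adj)) {
    if (!is.na(label[start])) next      # already in some component
    current <- current + 1L
    todo <- start
    while (length(todo) > 0) {
      u <- todo[1]
      todo <- todo[-1]
      if (!is.na(label[u])) next
      label[u] <- current
      nb <- names(which(adj[u, ] == 1))
      todo <- c(todo, nb[is.na(label[nb])])   # unlabeled neighbors
    }
  }
  label
}

comp <- find_components(adj)
print(comp)
```

The graph is connected exactly when every vertex ends up with label 1.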


A geometry of their own

“Really, universally, relations stop nowhere, and the exquisite problem of the artist is eternally but to draw, by a geometry of his own, the circle within which they shall happily appear to do so.” (Henry James, preface to Roderick Hudson)

How to draw this circle?


Clustering

Connected component: a set of vertices which has no connection to the remainder of the graph.

Cluster: a set of vertices which has relatively few connections to the rest of the graph.

(Note that this isn’t a definition…) Many ways to cluster, no “right way”


Clustering in R

> edge.betweenness.community(PL)
Graph community structure calculated with the edge betweenness algorithm
Number of communities (best split): 2
Modularity (best split): 0.2781065
Membership vector:
   Beaumont     Chapman     Chettle      Dekker     Drayton    Fletcher
          1           1           1           1           1           1
     Greene     Heywood      Jonson         Kyd       Lodge        Lyly
          2           1           1           2           2           2
    Marlowe     Marston   Middleton      Munday       Nashe       Peele
          2           1           1           1           2           2
    Webster SHAKESPEARE
          1           1


How the clusters look


“The University Wits were a group of late 16th century English playwrights who were educated at the universities (Oxford or Cambridge) and who became playwrights and popular secular writers. Prominent members of this group were Christopher Marlowe, Robert Greene, and Thomas Nashe from Cambridge, and John Lyly, Thomas Lodge, George Peele from Oxford.” (Wikipedia)


Macbeth


Clusters of characters in Macbeth

> edge.betweenness.community(Macbeth)
Graph community structure calculated with the edge betweenness algorithm
Number of communities (best split): 10
Modularity (best split): 0.06733369
Membership vector:
        MACBETH    LADY MACBETH         MACDUFF         MALCOLM
              1               2               1               1
           ROSS          BANQUO     First Witch          LENNOX
              1               3               4               1
 First Murderer          DUNCAN    Second Witch     Third Witch
              2               5               4               4
            ALL          SIWARD       Messenger Second Murderer
              1               6               7               8
        Servant          SEYTON
              9              10


Breakpoint

When can networks tell us things we don’t already know?


200 names

Vertices: 200 baby names for boys popular in 2011.

For each name, record popularity in WI, TX, PA, CA, MA, GA, OH, MO, FL, CO, NY, IL

Edges: Two names are adjacent if their popularity distributions across states are “very similar”


200 names

> lapply(largest.cliques(MaleNames), function(x) V(MaleNames)$name[x])
[[1]]
[1] "Jacob" "Anthony" "Dylan" "Matthew" "Brian"

(popular in NY, CA, MA, less so in CO, MO, GA)


200 names

> V(MaleNames)$name[neighbors(MaleNames,'Malachi')]
[1] "Ashton" "Ashton" "Kaden" "Kaden" "Malachi" "Malachi"
> V(MaleNames)$name[neighbors(MaleNames,'Owen')]
[1] "Maxwell" "Maxwell" "Brady" "Brady" "Cole" "Cole" "Owen" "Owen"
> V(MaleNames)$name[neighbors(MaleNames,'Patrick')]
[1] "Thomas" "Thomas" "Patrick" "Patrick" "John" "John" "Sean" "Sean" "Ryan" "Ryan" "Peter" "Peter"


edge.betweenness.community finds groups of girls’ names like

• Alaina, Maci, Mackenzie, Lillian, Addison, Alivia
• Piper, Harper, Brooklyn, Brooklynn
• Aubrey, Zoey, Autumn, Ellie
• Lucy, Josephine, Elise, Clara, Eleanor


Density

How likely are two things to be related?

The density of a graph is the probability that two random elements are related: i.e.

[total number of edges]/[total number of pairs of vertices]

> graph.density(MaleNames)
[1] 0.1084846
> graph.density(FemaleNames)
[1] 0.09950159
> graph.density(Macbeth)
[1] 0.2810458
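The ratio can be computed directly from an adjacency matrix. A base R sketch on a hypothetical four-vertex graph, assuming a simple undirected graph (no self-loops, no repeated edges):

```r
# Hypothetical undirected graph: triangle a-b-c with d hanging off c.
v <- c("a", "b", "c", "d")
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE, dimnames = list(v, v))

n <- nrow(adj)
n_edges <- sum(adj[upper.tri(adj)])   # count each undirected edge once
n_pairs <- choose(n, 2)               # all possible pairs of vertices
graph_density <- n_edges / n_pairs
print(graph_density)
```

A complete graph has density 1 and a graph with no edges has density 0.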


Transitivity

• A relation is transitive if “A related to B” and “B related to C” implies “A related to C.”

Transitive: “Is descended from,” “born in same city as”

Non-transitive: “is friends with”, “lived at some point in same city as”


How transitive is a graph?

Some relations are transitive, others are not. But we don’t have to stop at “yes” or “no”.

How frequently are two friends of yours friends with each other?

• Always
• Never
• Something in between


How transitive is a graph?

Transitivity (or “clustering coefficient”) gives the probability that two random neighbors of the same vertex are neighbors to each other.

> transitivity(MaleNames)
[1] 0.4972335
> transitivity(FemaleNames)
[1] 0.4546713
> transitivity(Macbeth)
[1] 0.4545455
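One common way to turn this probability into a formula is the global clustering coefficient: three times the number of triangles, divided by the number of "two neighbors of one vertex" configurations. The base R sketch below computes it on a hypothetical four-vertex graph; treating this as the quantity that transitivity() reports by default is an assumption worth checking against the igraph documentation.

```r
# Hypothetical undirected graph: triangle a-b-c with d hanging off c.
v <- c("a", "b", "c", "d")
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 0,
                1, 1, 0, 1,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE, dimnames = list(v, v))

deg <- rowSums(adj)

# Pairs of neighbors of the same vertex, summed over all vertices.
triples <- sum(deg * (deg - 1) / 2)

# trace(A^3) counts closed walks of length 3; each triangle gives 6 of them.
triangles <- sum(diag(adj %*% adj %*% adj)) / 6

# Each triangle closes 3 of the neighbor pairs counted in `triples`.
trans <- 3 * triangles / triples
print(trans)
```

Here 3 of the 5 neighbor pairs are themselves adjacent, so the value is 0.6.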


How transitive is a graph?

In both name cases, two random neighbors have about a 50% chance of being connected (while two random vertices have about a 10% chance of being connected.) Quite transitive!

Facebook thinks the same is true for “friends” (and makes this so by thinking so!)


Stub: incompletely specified networks

Standard problem: incomplete data. Did X and Y collaborate? Lack of an edge might mean “we know they didn’t” or “we don’t know that they did.”

One idea: use network structure – if graph is highly transitive, and X and Y have many common collaborators, this is evidence that X and Y collaborated.
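A rough base R sketch of the common-collaborator idea: squaring the adjacency matrix counts shared neighbors, and the non-adjacent pairs can then be ranked by that count. The vertex names and edges are hypothetical, and a high count is only a hint that an edge may be missing, not proof.

```r
# Hypothetical collaboration graph; x and y share two collaborators.
v <- c("x", "y", "a", "b", "c")
adj <- matrix(0, 5, 5, dimnames = list(v, v))
known <- rbind(c("x", "a"), c("x", "b"), c("y", "a"),
               c("y", "b"), c("a", "b"), c("b", "c"))
adj[known] <- 1
adj[known[, 2:1]] <- 1

# Entry (i, j) of A %*% A is the number of neighbors i and j have in common.
common <- adj %*% adj

# Pairs with no recorded edge, each pair listed once.
missing <- which(adj == 0 & upper.tri(adj), arr.ind = TRUE)
scores <- data.frame(v1 = v[missing[, "row"]],
                     v2 = v[missing[, "col"]],
                     shared = common[missing])
scores <- scores[order(-scores$shared), ]
print(scores)
```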


Metrics, clustering, trees

Suppose given: a set of objects (e.g. novels) and for each pair of objects a degree of dissimilarity (a number)

(survey data, lexical similarity, voting similarity…)

This data (subject to “triangle inequality”) is called a metric on the set of objects.
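The triangle inequality says that going from i to j directly is never longer than going via a third object k: d(i,j) ≤ d(i,k) + d(k,j). Measured dissimilarities do not always obey it, so it can be worth checking. A base R sketch with made-up numbers; satisfies_triangle is a hypothetical helper name.

```r
# Check d[i, j] <= d[i, k] + d[k, j] for every i, j, k.
satisfies_triangle <- function(d, tol = 1e-12) {
  for (k in seq_len(nrow(d))) {
    via_k <- outer(d[, k], d[k, ], "+")   # d[i, k] + d[k, j] for all i, j
    if (any(d > via_k + tol)) return(FALSE)
  }
  TRUE
}

# Distances between three points on a line: a genuine metric.
good <- as.matrix(dist(c(0, 1, 3)))

# Same table with one dissimilarity inflated: the detour via point 2 is shorter.
bad <- good
bad[1, 3] <- bad[3, 1] <- 10

print(satisfies_triangle(good))
print(satisfies_triangle(bad))
```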


Metrics, clustering, trees

Can we associate each object with a point on the plane so that the distances between points correspond to the dissimilarities between objects?


Metrics, clustering, trees

From City             To City           Distance (km)
Newark                Jersey City                8.02
Paterson              Elizabeth                 28.3
Toms River            Edison                    65.4
Trenton               Camden                    45.55
Clifton               Cherry Hill              126.24
Passaic               East Orange               11.84
Union City            North Bergen               2.92
Irvington             Bayonne                   12.38
South Vineland        Wayne                    176.47
Union                 Vineland                 149.49
New Brunswick         Bloomfield                42.14
Perth Amboy           East Brunswick            15.46
West Orange           Plainfield                23.19
West New York         Hackensack                11.18
Sayreville Junction   Lakewood                  41.97
Atlantic City         Sayreville               121.87
Teaneck               Linden                    36.19
……


Metrics, clustering, trees

Doesn’t always work: 4 objects, each pair at distance 1.

Multidimensional scaling: embeds objects in the plane (or higher-dimensional space) while approximately realizing desired distances.

(e.g. Rosenberg, Nelson, Vivekananthan (1968))
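The four-object example can be tried out with cmdscale, the classical multidimensional scaling routine in base R's stats package (one of several MDS variants). In the plane the pairwise distances can only be approximated, since at most three points in a plane can be mutually equidistant; with three dimensions the four points fit exactly, as the corners of a regular tetrahedron. The helper name max_error is made up for this sketch.

```r
# Four objects, every pair at dissimilarity 1.
d <- matrix(1, 4, 4)
diag(d) <- 0

# Embed in k dimensions and report the worst mismatch between the
# embedded distances and the requested ones.
max_error <- function(k) {
  pts <- cmdscale(d, k = k)
  max(abs(as.matrix(dist(pts)) - d))
}

err_plane <- max_error(2)   # noticeably above zero: no exact picture in the plane
err_space <- max_error(3)   # essentially zero: a regular tetrahedron
print(c(plane = err_plane, space = err_space))
```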


Hierarchical clustering

A clustering of a set is a partition into categories.

A hierarchical clustering is when we partition the categories into subcategories, subcategories into subsubcategories….


A hierarchical clustering on a set of objects is the same as a tree whose leaves are the objects!

Agglomerative clustering, etc. – find hierarchical clustering that best respects measured dissimilarities (analogue of MDS)
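In base R, agglomerative clustering is available as hclust in the stats package. The sketch below runs it on a made-up dissimilarity table for five objects; the choice of "average" linkage is an assumption for illustration, and other linkage rules can give different trees. Cutting the tree at different heights with cutree gives the categories and then the subcategories.

```r
# Hypothetical dissimilarities between five objects.
objs <- c("A", "B", "C", "D", "E")
d <- matrix(c(0, 1, 6, 7, 7,
              1, 0, 6, 7, 7,
              6, 6, 0, 2, 3,
              7, 7, 2, 0, 3,
              7, 7, 3, 3, 0),
            nrow = 5, byrow = TRUE, dimnames = list(objs, objs))

# Repeatedly merge the two closest clusters; the merge history is the tree.
tree <- hclust(as.dist(d), method = "average")

two   <- cutree(tree, k = 2)   # coarse split: {A, B} versus {C, D, E}
three <- cutree(tree, k = 3)   # one level down: E separates from {C, D}
print(rbind(two, three))
```

plot(tree) draws the result as a dendrogram whose leaves are the objects.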


• Desideratum: objects that are very dissimilar should not be in the same subsubsubsubcategory (or: their distance in the tree should be large)


LET US HACK!