(Social) Network Analysis Scott A. Hale Oxford Internet Institute http://www.scotthale.net/ 17 July 2014

Oxford Digital Humanities Summer School

May 06, 2015

Page 1: Oxford Digital Humanities Summer School

(Social) Network Analysis

Scott A. Hale, Oxford Internet Institute

http://www.scotthale.net/

17 July 2014

Page 2: Oxford Digital Humanities Summer School

What are networks?

Networks (graphs) are sets of nodes (vertices) connected by edges (links, ties, arcs).

Additional details

Whole vs. ego: whole networks have all nodes within a natural boundary (platform, organization, etc.). An ego network has one node and all of its immediate neighbors.

Edges can be directed or undirected and weighted or unweighted.

Additionally, networks may be multilayer and/or multimodal.
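
To make the whole/ego and directed/weighted distinctions concrete, here is a minimal sketch using NetworkX (introduced later in these slides). The names and weights are made-up example data, and treating "immediate neighbors" as nodes one step away in either direction is an assumption of this sketch.

```python
import networkx as nx

# A small directed, weighted "whole" network (made-up example data).
whole = nx.DiGraph()
whole.add_weighted_edges_from([
    ("Alan", "Bob", 3),      # e.g. Alan mentioned Bob three times
    ("Bob", "Carol", 1),
    ("Carol", "Alan", 2),
    ("Carol", "Denise", 5),
    ("Denise", "Erin", 1),
])

# Ego network of Alan: Alan plus every node one step away from him.
# undirected=True follows edges in both directions when finding neighbors.
ego = nx.ego_graph(whole, "Alan", radius=1, undirected=True)

print(sorted(ego.nodes()))
print(sorted(ego.edges(data="weight")))
```

The ego network keeps Alan, Bob and Carol together with the edges among them; Denise and Erin fall outside it because they are more than one step from Alan.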

Page 4: Oxford Digital Humanities Summer School

Why?

Characterize network structure

How far apart / well-connected are nodes?

Are some nodes at more important positions?

Is the network composed of communities?

How does network structure affect processes?

Information diffusion

Coordination/cooperation

Resilience to failure/attack

Page 5: Oxford Digital Humanities Summer School

A network

First questions when approaching a network

What are edges? What are nodes?

What kind of network?

Inclusion/exclusion criteria

Page 6: Oxford Digital Humanities Summer School

Network data repositories

http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx

http://datamob.org

http://snap.stanford.edu/data

http://www-personal.umich.edu/~mejn/netdata

Page 7: Oxford Digital Humanities Summer School

Python resources

tweepy: Package for the Twitter stream and search APIs (only Python 2.7 at the moment)

Search and stream API example code, along with code to create a mentions/retweet network, at https://github.com/computermacgyver/twitter-python

Python comes in two versions:

2.7.x – many packages, issues with non-English scripts

3.x – fewer packages, but excellent handling of international scripts (Unicode)
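
As a rough illustration of what "creating a mentions network" can involve, the sketch below builds a directed, weighted graph from tweet text. It is not the code from the repository above: the tweets are made-up (author, text) pairs, the regular expression is a simplification of real Twitter usernames, and retweets are simply treated as mentions.

```python
import re
import networkx as nx

# Made-up (author, text) pairs standing in for tweets returned by the API.
tweets = [
    ("alan", "Great talk by @bob and @carol today"),
    ("bob", "Thanks @alan!"),
    ("carol", "RT @bob: Thanks @alan!"),
    ("alan", "@bob see you tomorrow"),
]

# Simplified pattern: "@" followed by letters, digits or underscores.
mention_pattern = re.compile(r"@(\w+)")

mentions = nx.DiGraph()
for author, text in tweets:
    for mentioned in mention_pattern.findall(text):
        target = mentioned.lower()
        if mentions.has_edge(author, target):
            # Repeated mentions increase the edge weight.
            mentions[author][target]["weight"] += 1
        else:
            mentions.add_edge(author, target, weight=1)

for source, target, weight in mentions.edges(data="weight"):
    print("{0} -> {1} (weight {2})".format(source, target, weight))
```

Each edge points from the author to the account mentioned, so in-degree counts how many different accounts mention a user.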

Page 8: Oxford Digital Humanities Summer School

NetworkX

http://networkx.github.io/

Package to represent networks as Python objects

Convenient functions to add, delete, and iterate over nodes/edges

Functions to calculate network statistics (degree, clustering, etc.)

Easily generate comparison graphs based on statistical models

Visualization

Alternatives include igraph (available for Python and R)
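
One way to read "comparison graphs based on statistical models": generate a random graph with the same number of nodes and edges as the observed network and compare a statistic between the two. The sketch below uses the Zachary karate club network that ships with NetworkX as a stand-in for observed data; the choice of the G(n, m) random graph model and the seed value are assumptions made for illustration.

```python
import networkx as nx

# Built-in example network, standing in for an observed network.
observed = nx.karate_club_graph()
n = observed.number_of_nodes()
m = observed.number_of_edges()

# Random graph with the same number of nodes and edges (G(n, m) model).
# The seed only makes the example reproducible.
random_graph = nx.gnm_random_graph(n, m, seed=42)

print("Observed clustering:", nx.average_clustering(observed))
print("Random clustering:  ", nx.average_clustering(random_graph))
```

Because both graphs have the same density by construction, a difference in clustering reflects how the edges are arranged rather than how many there are.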

Page 9: Oxford Digital Humanities Summer School

Gephi

Open-source, cross-platform GUI

Primary strength is to visualize networks

Basic statistical properties are also available

Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more

Page 10: Oxford Digital Humanities Summer School

Network measures

With many nodes, visualizations are often difficult/impossible to interpret. Statistical measures can be very revealing, however.

Node-level

Degree (in, out): How many incoming/outgoing edges does a node have?

Centrality (next slide)

Constraint

Network-level

Components: Number of disconnected subsets of nodes

Density: observed edges / maximum number of edges possible

Clustering coefficient: closed triplets / connected triplets

Path length distribution

Distributions of node-level measures
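
The density and clustering ratios above can be checked by hand on a small graph. The sketch below uses the four-person undirected example graph from the NetworkX slides that follow and compares the hand calculation with NetworkX. It assumes the ratio of closed to connected triplets is the global clustering coefficient, which NetworkX calls transitivity; nx.clustering and nx.average_clustering use the per-node (local) definition instead.

```python
from itertools import combinations

import networkx as nx

g = nx.Graph()
g.add_edges_from([("Alan", "Bob"), ("Carol", "Denise"),
                  ("Carol", "Bob"), ("Carol", "Alan")])

# Density: observed edges / maximum number of edges possible.
n = g.number_of_nodes()
m = g.number_of_edges()
max_edges = n * (n - 1) / 2          # undirected graph without self-loops
density_by_hand = m / max_edges      # 4 / 6

# A connected triplet is a centre node plus two of its neighbors;
# it is closed when those two neighbors are also connected.
connected = 0
closed = 0
for centre in g:
    for a, b in combinations(g[centre], 2):
        connected += 1
        if g.has_edge(a, b):
            closed += 1
clustering_by_hand = closed / connected   # 3 / 5

print(density_by_hand, nx.density(g))
print(clustering_by_hand, nx.transitivity(g))
```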

Page 11: Oxford Digital Humanities Summer School

Centrality measures

Degree

Closeness: Based on the average geodesic (shortest-path) distance from a node to all other nodes; the smaller the average distance, the higher the closeness. Informally, an indication of the ability of a node to diffuse a property efficiently.

Betweenness: Number of shortest paths the node lies on. Informally, the betweenness is high if a node bridges clusters.

Eigenvector: A weighted degree centrality (inbound links from highly central nodes count more).

PageRank: Not strictly a centrality measure, but similar to eigenvector centrality; modeled as a random walk with a teleportation parameter.
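
A small sketch of how these measures can disagree, using NetworkX on a made-up graph of two triangles joined through a single bridging node D. The graph and node names are illustrative assumptions. PageRank is left out here because nx.pagerank in recent NetworkX releases appears to rely on SciPy being installed.

```python
import networkx as nx

# Two triangles (A-B-C and E-F-G) joined through the bridging node D.
g = nx.Graph()
g.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "E"),
    ("E", "F"), ("E", "G"), ("F", "G"),
])

degree = nx.degree_centrality(g)
closeness = nx.closeness_centrality(g)
betweenness = nx.betweenness_centrality(g)
# max_iter is raised as a precaution so the power iteration has room to converge.
eigenvector = nx.eigenvector_centrality(g, max_iter=1000)

for node in sorted(g):
    print("{0}: degree={1:.2f} closeness={2:.2f} betweenness={3:.2f} "
          "eigenvector={4:.2f}".format(node, degree[node], closeness[node],
                                       betweenness[node], eigenvector[node]))
```

D has fewer edges than C or E, yet it has the highest betweenness because every shortest path between the two triangles passes through it.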

Page 12: Oxford Digital Humanities Summer School

NetworkX: Nodes

import networkx as nx
g=nx.Graph() #A new (empty) undirected graph
g.add_node("Alan") #Add one new node
g.add_nodes_from(["Bob","Carol","Denise"]) #Add three new nodes from a list
#Nodes can have attributes
#(g.nodes[...] is the NetworkX 2.x+ spelling; NetworkX 1.x used g.node[...])
g.nodes["Alan"]["gender"]="M"
g.nodes["Bob"]["gender"]="M"
g.nodes["Carol"]["gender"]="F"
g.nodes["Denise"]["gender"]="F"
#Iterating over a graph yields its node ids
for n in g:
    print("{0} has gender {1}".format(n,g.nodes[n]["gender"]))

Page 13: Oxford Digital Humanities Summer School

NetworkX: Edges

#Interesting graphs have edges
g.add_edge("Alan","Bob") #Add one new edge
#Add two new edges
g.add_edges_from([("Carol","Denise"),("Carol","Bob")])
#Edge attributes
#(g[u][v] is used here because g.edge[u][v] is no longer available in NetworkX 2.0+)
g["Alan"]["Bob"]["relationship"]="Friends"
g["Carol"]["Denise"]["relationship"]="Friends"
g["Carol"]["Bob"]["relationship"]="Married"
#New edge with an attribute
g.add_edges_from([("Carol","Alan",{"relationship":"Friends"})])

Page 14: Oxford Digital Humanities Summer School

NetworkX: Edges

#g.edges() is used here because edges_iter() is no longer available in NetworkX 2.0+
for n1,n2 in g.edges():
    print("{0} and {1} are {2}".format(n1,n2,g[n1][n2]["relationship"]))

Page 15: Oxford Digital Humanities Summer School

NetworkX: Measures

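g.number_of_nodes() #Number of nodes
g.nodes(data=True) #Nodes together with their attributes
g.number_of_edges() #Number of edges
g.edges(data=True) #Edges together with their attributes
print(g) #Short summary of g; nx.info(g) did this before it was removed in NetworkX 3.0
nx.density(g) #Observed edges / maximum number of edges possible
nx.number_connected_components(g) #Undirected graphs only
nx.degree_histogram(g) #List whose entry i is the number of nodes with degree i
nx.betweenness_centrality(g) #Dictionary of node -> betweenness
nx.clustering(g) #Dictionary of node -> local clustering coefficient
nx.clustering(g, nodes=["Bob"]) #Same, restricted to the listed nodes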

Page 16: Oxford Digital Humanities Summer School

NetworkX: Visualize or save

import matplotlib.pyplot as plt
#Save g to the file my_graph.graphml in GraphML format
#prettyprint will make it nice for a human to read
nx.write_graphml(g,"my_graph.graphml",prettyprint=True)
#Lay out g with the Fruchterman-Reingold force-directed
#algorithm and save the result to my_graph.png
#with_labels will label each node with its id
nx.draw_spring(g,with_labels=True)
plt.savefig("my_graph.png")
plt.clf() #Clear plot

Page 17: Oxford Digital Humanities Summer School

NetworkX: Odds and ends

#Read a graph from the file my_graph.graphml in graphml format

g=nx.read_graphml("my_graph.graphml")

#Create an (empty) directed graph

g=nx.DiGraph()

See http://networkx.github.io/documentation/latest/reference/index.html for many more commands. Note that some commands are only available on directed or undirected graphs.
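
A short sketch of what "only available on directed or undirected graphs" looks like in practice. The graph is made-up example data, and the behavior shown assumes a reasonably recent NetworkX release.

```python
import networkx as nx

d = nx.DiGraph()
d.add_edges_from([("Alan", "Bob"), ("Carol", "Bob"), ("Bob", "Denise")])

# In- and out-degree only exist on directed graphs.
in_deg = d.in_degree("Bob")     # 2 (from Alan and Carol)
out_deg = d.out_degree("Bob")   # 1 (to Denise)

# Connected components are defined for undirected graphs only,
# so NetworkX refuses to compute them on a DiGraph.
raised = False
try:
    nx.number_connected_components(d)
except nx.NetworkXNotImplemented:
    raised = True

# Directed graphs offer weakly/strongly connected components instead.
weak = nx.number_weakly_connected_components(d)
strong = nx.number_strongly_connected_components(d)

# Alternatively, drop edge directions first.
undirected_components = nx.number_connected_components(d.to_undirected())

print(in_deg, out_deg, raised, weak, strong, undirected_components)
```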

Page 18: Oxford Digital Humanities Summer School

Resources

Newman, M.E.J., Networks: An Introduction

Kadushin, C., Understanding Social Networks: Theories, Concepts, and Findings

De Nooy, W., et al., Exploratory Social Network Analysis with Pajek

Shneiderman, B., and Smith, M., Analyzing Social Media Networks with NodeXL
