A Perspective on Graph Theory and Network Science


Marko A. Rodriguez

http://markorodriguez.com

http://twitter.com/twarko

http://www.slideshare.net/slidarko

Santa Fe Public School District – Santa Fe, New Mexico – July 6, 2010

July 5, 2010


Abstract

The graph/network domain has been driven by the creativity of numerous individuals from disparate areas of the academic and the commercial sector. Examples of contributing academic disciplines include mathematics, physics, sociology, and computer science. Given the interdisciplinary nature of the domain, it is difficult for any single individual to objectively realize and speak about the space as a whole. Any presentation of the ideas is ultimately biased by the formal training and expertise of the individual. For this reason, I will simply present on the domain from my perspective, that is, from my personal experiences. More specifically, from my perspective biased by cognitive and computer science.

This is an autobiographical lecture on my life (so far) with graphs/networks.

The Graph/Network

The term graph is used primarily in mathematics and the term network is used primarily in physics. Both refer to a type of structure in which there exist vertices (i.e. nodes, dots) and edges (i.e. links, lines). There are numerous types of graphs/networks, which yield more or less expressivity (i.e. more or less structure).
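
To make the vocabulary concrete, here is a minimal sketch in Python of one way to store a directed graph: an adjacency mapping from each vertex to the vertices its edges point to. The vertex names and the helper `build_graph` are made up for illustration; richer graph types (labeled edges, weights, properties) would need a richer structure.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an adjacency mapping: vertex -> set of vertices it points to."""
    adjacency = defaultdict(set)
    for tail, head in edges:
        adjacency[tail].add(head)
        adjacency[head]  # touch the head so vertices without outgoing edges still appear
    return dict(adjacency)


# Hypothetical vertices; each pair is one directed edge (tail, head).
edges = [("a", "b"), ("a", "c"), ("b", "c")]
graph = build_graph(edges)

# One simple structural statistic: the number of outgoing edges per vertex.
out_degree = {vertex: len(neighbors) for vertex, neighbors in graph.items()}
```

Here `out_degree` is the kind of compressed, statistical view of a structure that network analysis tends to produce, while `graph` itself is the abstract object.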

The Purpose of a Graph for Mathematicians

• Mathematicians are concerned with the abstract structure of a graph.

• Mathematicians define operations to analyze and manipulate graphs. Moreover, they develop theorems based upon structural axioms.

The Purpose of a Network for Physicists

• Physicists are concerned with modeling real-world structures with networks.

• Physicists define algorithms that compress the information in a network into simpler values (e.g. statistical analysis).

Much of the World has a Graphical/Network Structure

• Social networks: define how persons interact (collaborators, friends, kin).

• Biological networks: define how biological components interact (protein, food chains, gene regulation).

• Transportation networks: define how cities are joined by air and road routes.

• Dependency networks: define how software modules use each other.

• Communication networks: define the relationships between Internet routers.

• Language networks: define the relationships between words.

The Tour

• University of California at San Diego (1997-2001)

• University of California at Santa Cruz (2001-2007)

• Vrije Universiteit Brussel (2004-2005)

• Los Alamos National Laboratory (2005-2010)

• AT&T Interactive (2010-Present)

Undergrad at the University of California at San Diego

• Studied Cognitive Science (B.S.) and Computer Music (Minor) at the University of California at San Diego. (1997-2001)

Cognitive Science at UCSD

• Neural networks: simplified models of how the brain encodes and processes information.¹

  ⋆ Neural networks exclude seemingly non-relevant aspects of the biological counterpart (e.g. neurotransmitters, axon/soma/dendrite distinctions).

  ⋆ No two signals on the brain are ever the same, yet we perceive a consistent (object-oriented) world.

  ⋆ Can be generally applied to classification irrespective of the signal being “human oriented” (e.g. non-sensory information).

  ⋆ Neural networks are usually trained through experience.

¹ Please see: http://arxiv.org/abs/0811.3584


Cognitive Science at UCSD

[Figure: a signal from the world enters a neural network, which outputs a classification of the signal.]

Mouse cortical networks are grown on multi-electrode arrays in order to study the information properties of the structure through its development (left, done at LANL during my PostDoc). Artificial neural networks are simplified models of the sufficient components needed to process and classify information (right).
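
As a rough illustration of "trained through experience", the sketch below trains a single threshold unit (a perceptron) on labeled examples. This is a textbook toy, not the models studied at UCSD or LANL; the function names, the learning rate, and the logical-AND data are all assumptions chosen to keep the example small.

```python
def train_perceptron(samples, epochs=20, rate=1.0):
    """Learn weights for a single threshold unit from (inputs, label) pairs."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            error = label - (1 if activation > 0 else 0)
            # Perceptron rule: nudge the weights toward the correct answer.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


def classify(weights, bias, inputs):
    """Return 1 if the unit fires for these inputs, else 0."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


# Toy "signals": the unit should fire only when both inputs are on (logical AND).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
```

After training, `classify(weights, bias, (1, 1))` fires and the other three inputs do not. A single unit like this can only separate classes with a straight line; networks of many units are needed for anything richer.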

Computer Music at UCSD

• Spatial compositions: focused on the composition of music which accounted for/represented sound in 3D space.

  ⋆ Amplitude (loud/quiet), pitch (high/low), timbre (guitar/drum), but what about music beyond stereo (left/right)?

  ⋆ Developed algorithms to “trick the ear” into hearing sounds at particular points in space.

  ⋆ Made use of a data flow sound processing language called Max/MSP (see http://cycling74.com/).

    ∗ Data flow languages allow one to define “process graphs” (dependencies between functions represented as a graph).
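
The idea of a process graph can be sketched outside Max/MSP. The Python below is not Max/MSP code; it is a minimal illustration in which each node is a function, each node lists the nodes it depends on, and the standard library's `graphlib.TopologicalSorter` (Python 3.9+) picks an evaluation order. The node names (`source`, `gain`, `delay`, `mix`) and their arithmetic are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical process graph: each node lists the nodes it depends on.
dependencies = {
    "source": [],
    "gain": ["source"],
    "delay": ["source"],
    "mix": ["gain", "delay"],
}

# Each node is a function of the outputs of its dependencies, in listed order.
functions = {
    "source": lambda: 1.0,
    "gain": lambda s: 0.5 * s,
    "delay": lambda s: s,  # placeholder: a real delay would buffer samples
    "mix": lambda g, d: g + d,
}


def run(dependencies, functions):
    """Evaluate every node once, after all of its inputs are available."""
    outputs = {}
    for node in TopologicalSorter(dependencies).static_order():
        inputs = [outputs[dep] for dep in dependencies[node]]
        outputs[node] = functions[node](*inputs)
    return outputs


outputs = run(dependencies, functions)
```

The point of the design is that the programmer states only the dependencies; the order of execution falls out of the graph.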


Computer Music at UCSD

My data flow programs (left) take/generate sound, process it algorithmically, and emit it through a 6-channel circular surround sound system (right). My senior thesis was a live concert using a computer music system I developed called Monkey Space Colony 6.

Graduate at the University of California at Santa Cruz

• Studied Computer Science (M.S. and Ph.D.) at the University of California at Santa Cruz. (2001-2007)

Collective Intelligence at UCSC

• Collective decision making: applications of collective intelligence to the design of techno-government architectures.² (2001-2004)

  ⋆ We do not have the same restrictions as our founding fathers (e.g. communication limited by space).

  ⋆ Is it possible to remove the representative layer of government by leveraging expertise/representation in social networks?

  ⋆ What does a modern day direct democracy look like?

  ⋆ Can any actively participating subset of the population yield an accurate model of the population as a whole?

  ⋆ Maintaining fidelity in that subset model is the point of dynamically distributed democracy.

² Please see: 1.) http://arxiv.org/abs/cs/0412047 2.) http://arxiv.org/abs/cs/0609034 3.) http://arxiv.org/abs/0901.3929 4.) http://escholarship.org/uc/item/04h3h1cr


Collective Intelligence at UCSC

[Figure: two plots against the percentage of active citizens (100 down to 0), each comparing dynamically distributed democracy with direct democracy. First plot: error (0.00 to 0.20). Second plot: proportion of correct decisions (0.50 to 0.95).]

Fig. 5. The relationship between $k$ and $e^{vote}_k$ for direct democracy (gray line) and dynamically distributed democracy (black line). The plot provides the proportion of identical, correct decisions over a simulation that was run with 1000 artificially generated networks composed of 100 citizens each.

As previously stated, let $x \in [0,1]^n$ denote the political tendency of each citizen in this population, where $x_i$ is the tendency of citizen $i$ and, for the purpose of simulation, is determined from a uniform distribution. Assume that every citizen in a population of $n$ citizens uses some social network-based system to create links to those individuals that they believe reflect their tendency the best. In practice, these links may point to a close friend, a relative, or some public figure whose political tendencies resonate with the individual. In other words, representatives are any citizens, not political candidates that serve in public office. Let $A \in [0,1]^{n \times n}$ denote the link matrix representing the network, where the weight of an edge, for the purpose of simulation, is denoted

$$A_{i,j} = \begin{cases} 1 - |x_i - x_j| & \text{if link exists} \\ 0 & \text{otherwise.} \end{cases}$$

In words, if two linked citizens are identical in their political tendency, then the strength of the link is 1.0. If their tendencies are completely opposing, then their trust (and the strength of the link) is 0.0. Note that a preferential attachment network growth algorithm is used to generate a degree distribution that is reflective of typical social networks “in the wild” (i.e. scale-free properties). Moreover, an assortativity parameter is used to bias the connections in the network towards citizens with similar tendencies. The assumption here is that given a system of this nature, it is more likely for citizens to create links to similar-minded individuals than to those whose opinions are quite different. The resultant link matrix $A$ is then normalized to be row stochastic in order to generate a probability distribution over the weights of the outgoing edges of a citizen. Figure 6 presents an example of an $n = 100$ artificially generated trust-based social network, where red denotes a tendency of 0.0, purple a tendency of 0.5, and blue a tendency of 1.0.

Fig. 6. A visualization of a network of trust links between citizens. Each citizen’s color denotes their “political tendency”, where full red is 0, full blue is 1, and purple is 0.5. The layout algorithm chosen is the Fruchterman-Reingold layout.

Given this social network infrastructure, it is possible to better ensure that the collective tendency and vote is appropriately represented through a weighting of the active, participating population. Every citizen, active or not, is initially provided with $\frac{1}{n}$ “vote power” and this is represented in the vector $\pi \in \mathbb{R}^n_+$, such that the total amount of vote power in the population is 1. Let $y \in \mathbb{R}^n_+$ denote the total amount of vote power that has flowed to each citizen over the course of the algorithm. Finally, $a \in \{0,1\}^n$ denotes whether citizen $i$ is participating ($a_i = 1$) in the current decision making process or not ($a_i = 0$). The values of $a$ are biased by an unfair coin that has probability $k$ of making the citizen an active participant and $1 - k$ of making the citizen inactive. The iterative algorithm is presented below, where $\circ$ denotes entry-wise multiplication and $\epsilon \approx 1$.

    $y \leftarrow 0$
    while $\sum_{i=1}^{n} y_i < \epsilon$ do
        $y \leftarrow y + (\pi \circ a)$
        $\pi \leftarrow \pi \circ (1 - a)$
        $\pi \leftarrow A\pi$
    end

In words, active citizens serve as vote power “sinks” in that once they receive vote power, from themselves or from a neighbor in the network, they do not pass it on. Inactive citizens serve as vote power “sources” in that they propagate their vote power over the network links to their neighbors iteratively until all (or $\epsilon$) vote power has reached active citizens. At this point, the tendency in the active population is defined as $\delta^{tend} = x \cdot y$. Figure 4 plots the error incurred using dynamically distributed democracy (black line), where the error is defined as

$$e^{tend}_k = |d^{tend}_{100} - \delta^{tend}_k|.$$

Next, the collective vote $\delta^{vote}_k$ is determined by a weighted majority as dictated by the vote power accumulated by active participants. Figure 5 plots the proportion of votes that are different from what a fully participating population would ...

People do not vote for a representative. Instead, they maintain an ego-network of those whose ideas they respect in certain domains (e.g. health care, military, etc.). People in one’s network can be friends, family members, scientists, public figures, etc. Anyone, through the Internet, can vote on any decision. However, the moment they abstain from voting, their vote power is transferred through their network (according to the domain of decision). Power aggregates at those that participate in the current decision.
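
The flow of vote power described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simulation code behind the plots: the function names, the three-citizen network, the `max_iters` guard, and the default threshold are all hypothetical. It pushes each inactive citizen's power along that citizen's outgoing links, which is my reading of the matrix update in the excerpt.

```python
def distribute_vote_power(links, active, epsilon=0.999, max_iters=1000):
    """Flow vote power from inactive citizens to active ones.

    links[i] maps neighbor j -> weight of the link i -> j (weights sum to 1).
    active[i] is True if citizen i takes part in the current decision.
    """
    n = len(links)
    pi = [1.0 / n] * n  # vote power each citizen currently holds
    y = [0.0] * n       # vote power that has settled on active citizens
    for _ in range(max_iters):
        if sum(y) >= epsilon:
            break
        # Active citizens are sinks: they keep whatever power reaches them.
        for i in range(n):
            if active[i]:
                y[i] += pi[i]
                pi[i] = 0.0
        # Inactive citizens are sources: they pass power along outgoing links.
        # (Power held by an inactive citizen with no links is lost in this sketch.)
        passed = [0.0] * n
        for i in range(n):
            for j, weight in links[i].items():
                passed[j] += pi[i] * weight
        pi = passed
    return y


def collective_tendency(x, y):
    """Tendency of the active population: the dot product of x and y."""
    return sum(xi * yi for xi, yi in zip(x, y))


# Hypothetical population: citizen 2 trusts citizen 1, who trusts citizen 0.
links = [{}, {0: 1.0}, {1: 1.0}]
active = [True, False, False]
x = [0.2, 0.9, 0.4]  # political tendencies in [0, 1]

y = distribute_vote_power(links, active)
tendency = collective_tendency(x, y)
```

In this toy population all vote power ends up with citizen 0, the only participant, so the collective tendency equals that citizen's own tendency. With many participants and weighted links, the same loop yields a weighting of the active population.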

Visiting Researcher at the Vrije Universiteit Brussel

• Studied collective intelligence as a Visiting Researcher at the Center for Evolution, Complexity, and Cognition of the Vrije Universiteit Brussel. (2004-2005)

Collective Intelligence at the Vrije Universiteit Brussel

• Automating the scholarly process: Designed algorithms that exploit bibliographic networks in order to support the scholarly communication process. (2004-2005)³

  ⋆ Can the network of scholars, articles, journals, universities, conferences, funding sources, etc. be leveraged to algorithmically support the scholarly process?

    ∗ Can you find me articles related to my interests?

    ∗ Can you find me collaborators to work with me on my ideas?

    ∗ Can you find me a venue to publish my work in?

    ∗ Can you find me experts to peer-review a submitted article?

    ∗ Can you find me people to talk to (and concepts to talk about) at the conference I’m going to?

³ Please see: 1.) http://arxiv.org/abs/cs/0601121 2.) http://arxiv.org/abs/cs/0605112 3.) http://arxiv.org/abs/0905.1594




Collective Intelligence at the Vrije Universiteit Brussel

Example: Determining experts to peer-review an article can be done automatically and with a sensitivity to conflict of interest situations. The spreading activation algorithm used is analogous, in many ways, to neural networks. Can we think of the networks we (as a society) implicitly create as some sort of “collective neural substrate”? Can we then apply similar algorithms that are found in biological systems? Can our implicitly generated networks serve as a substrate for problem-solving?
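
A spreading activation pass can be sketched as follows. This is a generic, simplified version and not the algorithm from the cited papers: the decay factor, the number of steps, the co-authorship data, and the function names are all assumptions. Energy starts at the scholars cited by a manuscript, spreads over co-authorship links, and the resulting ranking skips anyone flagged as a conflict of interest.

```python
def spread_activation(neighbors, sources, steps=2, decay=0.5):
    """Spread energy outward from the source vertices for a fixed number of steps."""
    energy = {vertex: 1.0 for vertex in sources}
    frontier = dict(energy)
    for _ in range(steps):
        next_frontier = {}
        for vertex, amount in frontier.items():
            targets = neighbors.get(vertex, [])
            if not targets:
                continue
            # Each step loses some energy and splits the rest among neighbors.
            share = decay * amount / len(targets)
            for target in targets:
                next_frontier[target] = next_frontier.get(target, 0.0) + share
        for vertex, amount in next_frontier.items():
            energy[vertex] = energy.get(vertex, 0.0) + amount
        frontier = next_frontier
    return energy


def rank_reviewers(energy, conflicts):
    """Order candidates by accumulated energy, skipping conflicts of interest."""
    candidates = [vertex for vertex in energy if vertex not in conflicts]
    return sorted(candidates, key=lambda vertex: energy[vertex], reverse=True)


# Hypothetical co-authorship network (undirected, so links appear both ways).
neighbors = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob"],
}
energy = spread_activation(neighbors, sources=["ann"])  # "ann" is cited by the manuscript
ranking = rank_reviewers(energy, conflicts={"cat"})     # "cat" co-wrote with the submitter
```

The resemblance to a neural network is in the update itself: each vertex sums what its neighbors send it, and activity fades with distance from the stimulus.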

Graduate Researcher at Los Alamos National Laboratory

• Studied bibliometrics as a graduate student on the Digital Library Research and Prototyping Team of the Los Alamos National Laboratory. (2005-2007)

Bibliometrics at Los Alamos National Laboratory

• Bibliometrics: the study of the scholarly process through the digital footprint left by scholars (“the science of science”). (2005-2007)⁴

  ⋆ Wrote my dissertation while with the Digital Library Research and Prototyping Team (Johan Bollen, Herbert Van de Sompel, and Alberto Pepe). A very fruitful time in my academic career.

  ⋆ Continued my work with problem-solving in scholarly networks.

  ⋆ Studied how scholars use information by studying how they download articles (see http://mesur.org).

⁴ Please see: 1.) http://arxiv.org/abs/cs/0601030 2.) http://arxiv.org/abs/0708.1150 3.) http://arxiv.org/abs/0804.3791 4.) http://arxiv.org/abs/0801.2345 5.) http://arxiv.org/abs/0807.0023 6.) http://dx.doi.org/10.1371/journal.pone.0004803 7.) http://arxiv.org/abs/0911.4223 8.) http://arxiv.org/abs/cs/0605110





Bibliometrics at Los Alamos National Laboratory

Each vertex (node) is a particular journal. Colors denote the journal domain. A directed edge (link) denotes that a scholar read an article in journal A then one in journal B. This map provides us a collectively generated representation of the knowledge transfer between domains (i.e. “folksonomy” of domains).
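
The construction of such a map can be sketched in Python. The session data and journal names below are invented, and ignoring consecutive reads within the same journal is my assumption rather than a documented rule of the original study.

```python
from collections import Counter


def journal_flow(sessions):
    """Count directed journal-to-journal transitions within each reading session."""
    flows = Counter()
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current != following:  # ignore consecutive reads in the same journal
                flows[(current, following)] += 1
    return flows


# Hypothetical sessions: the journals of the articles one reader opened, in order.
sessions = [
    ["Physics A", "Biology B", "Biology B", "Sociology C"],
    ["Physics A", "Biology B"],
]
flows = journal_flow(sessions)
```

Each key of `flows` is a directed edge and each count is its weight, so the result is already the weighted graph the map visualizes.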

Web of Data at Los Alamos National Laboratory

• Web of Data: the representation of the world’s data within the global URI (super class of URL) address space.⁵

  ⋆ For the most part, data is local to a computer with no easy way for data on one computer to reference data on another.

    ∗ The World Wide Web provided a way to link documents across computers, but what about data?

  ⋆ By placing data “on the Web” in a similar manner to how we place documents on the Web, we can turn the Web into a distributed database.

    ∗ This heterogeneous network/graph of data opens the door to new types of problem-solving.
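
One common way to realize this idea is to state data as (subject, predicate, object) triples whose parts are URIs, so that a statement held by one data set can point at a resource described by another. The Python sketch below assumes that triple model; every URI in it is a made-up placeholder under `.example` domains, and the helpers `objects` and `follow` are hypothetical.

```python
# Hypothetical URIs: three data sets hosted on different machines.
AUTHOR = "http://schema.example/author"
WORKS_AT = "http://schema.example/worksAt"

triples = [
    ("http://books.example/item/42", AUTHOR, "http://people.example/marko"),
    ("http://people.example/marko", WORKS_AT, "http://orgs.example/lanl"),
]


def objects(triples, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(triples, start, predicates):
    """Walk a chain of predicates from `start`, crossing data set boundaries."""
    current = [start]
    for predicate in predicates:
        current = [o for subject in current for o in objects(triples, subject, predicate)]
    return current


# Where does the author of book 42 work?
employers = follow(triples, "http://books.example/item/42", [AUTHOR, WORKS_AT])
```

The query crosses from the books data set to the people data set to the organizations data set without any of them knowing about the question in advance, which is the sense in which the Web becomes a distributed database.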

⁵ Please see: 1.) http://arxiv.org/abs/0904.0027 2.) http://arxiv.org/abs/0908.0373 3.) http://arxiv.org/abs/1006.1080 4.) http://arxiv.org/abs/0905.3378 5.) http://arxiv.org/abs/0704.3395 6.) http://arxiv.org/abs/0802.3492 7.) http://arxiv.org/abs/0903.0194









Web of Data at Los Alamos National Laboratory

data set            domain       data set            domain       data set            domain
audioscrobbler      music        govtrack            government   pubguide            books
bbclatertotp        music        homologene          biology      qdos                social
bbcplaycountdata    music        ibm                 computer     rae2001             computer
bbcprogrammes       media        ieee                computer     rdfbookmashup       books
budapestbme         computer     interpro            biology      rdfohloh            social
chebi               biology      jamendo             music        resex               computer
crunchbase          business     laascnrs            computer     riese               government
dailymed            medical      libris              books        semanticweborg      computer
dblpberlin          computer     lingvoj             reference    semwebcentral       social
dblphannover        computer     linkedct            medical      siocsites           social
dblprkbexplorer     computer     linkedmdb           movie        surgeradio          music
dbpedia             general      magnatune           music        swconferencecorpus  computer
doapspace           social       musicbrainz         music        taxonomy            reference
drugbank            medical      myspacewrapper      social       umbel               general
eurecom             computer     opencalais          reference    uniref              biology
eurostat            government   opencyc             general      unists              biology
flickrexporter      images       openguides          reference    uscensusdata        government
flickrwrappr        images       pdb                 biology      virtuososponger     reference
foafprofiles        social       pfam                biology      w3cwordnet          reference
freebase            general      pisa                computer     wikicompany         business
geneid              biology      prodom              biology      worldfactbook       government
geneontology        biology      projectgutenberg    books        yago                general
geonames            geographic   prosite             biology      . . .


[Figure: a network of linked data sets. Vertex labels include dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed, and drugbank.]

Each vertex (node) represents a data set. A directed edge (link) denotes that data set A makes reference to data in data set B.


Web of Data

[Figure: two panels, each showing three hosts (127.0.0.1, 127.0.0.2, 127.0.0.3) running Application 1, Application 2, and Application 3 over their structures and processes.]

Data is currently in silos (left). For example, Amazon.com can only recommend other Amazon.com products. What about recommending a job to take based upon the books you read, the people you know, etc. (right)? Can a collectively generated model of the world help people to find their place in life? (http://bit.ly/cLWL3F)



[Figure: left, a program encoded as an RDF graph in which instruction vertices (Method, Block, Branch, Equals, PushValue, LocalDirect, Return) are joined by edges such as hasBlock, nextInst, trueInst, and falseInst; right, a diagram of a virtual machine (RVM, Fhat) with its programLocation, Frame, FrameVariable, ReturnStack, OperandStack, and BlockStack.]

A more esoteric body of work was developed at this time that dealt with the encoding of not only data into the Web of Data, but also process. This included the distributed representation of computing instructions (left) and virtual machines (right).

PostDoc Researcher at Los Alamos National Laboratory

• Studied graph theory and ethics as a Director’s Fellow PostDoc at the Center for Nonlinear Studies of the Los Alamos National Laboratory. (2007-2010)

Path Algebra at Los Alamos National Laboratory

• Path Algebra: concerned with how to move through a graph in an intelligent, directed manner in order to solve problems using graphs.⁶

  ⋆ The algebra contains a set of elements: vertices and edges.

  ⋆ The algebra contains a set of operations: traverse, filter, clip, merge, split, not, etc.

  ⋆ The algebra provides a theory for how to develop graph traversal engines (i.e. graph processors).

⁶ Please see: 1.) http://arxiv.org/abs/0806.2274 2.) http://arxiv.org/abs/0803.4355 3.) http://gremlin.tinkerpop.com 4.) http://pipes.tinkerpop.com




Path Algebra at Los Alamos National Laboratory

The general theme of controlling how a walker moves through a graph has numerous applications including searching, ranking, scoring, recommendation, etc. within a graph.
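
As a rough sketch of a walker-based recommendation, the Python below chains two kinds of operation, a traverse step and a filter, over a list of labeled edges. It is not the path algebra itself and not the Gremlin or Pipes API; the function names, the `likes` label, and the people and items are all hypothetical.

```python
from collections import Counter


def step(edges, walkers, label, reverse=False):
    """Traverse: advance every walker across the edges carrying `label`."""
    moved = []
    for position in walkers:
        for tail, edge_label, head in edges:
            if edge_label != label:
                continue
            if not reverse and tail == position:
                moved.append(head)
            elif reverse and head == position:
                moved.append(tail)
    return moved


def recommend(edges, user):
    """Rank unseen items by the number of user -> item -> user -> item paths."""
    liked = step(edges, [user], "likes")
    peers = step(edges, liked, "likes", reverse=True)
    peers = [peer for peer in peers if peer != user]  # filter: drop the user
    candidates = step(edges, peers, "likes")
    candidates = [item for item in candidates if item not in liked]  # filter: unseen only
    return Counter(candidates).most_common()


# Hypothetical edges of the form (tail, label, head).
edges = [
    ("ann", "likes", "graphs"),
    ("ann", "likes", "music"),
    ("bob", "likes", "graphs"),
    ("bob", "likes", "music"),
    ("bob", "likes", "ethics"),
    ("cat", "likes", "graphs"),
    ("cat", "likes", "ethics"),
    ("cat", "likes", "robots"),
]
ranking = recommend(edges, "ann")
```

Walkers are kept in a list rather than a set on purpose: the number of walkers that arrive at an item is the number of distinct paths to it, and that count is the score.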

Eudaemonics at Los Alamos National Laboratory

• Eudaemonics: an ethical theory stating that it is everyone’s moral responsibility to be “happy” (i.e. to live engaged in the world). See the work of Aristotle and David L. Norton.⁷

  ⋆ Are recommender systems evolving to become eudaemonic engines?

    ∗ Movies (e.g. Netflix), books (e.g. Goodreads), life partners (e.g. Match.com), careers (e.g. Monster), etc.

    ∗ Can we interrelate all this data and traverse it for problem-solving?

⁷ Please see: 1.) http://arxiv.org/abs/0903.0200 2.) http://arxiv.org/abs/0904.0027



Graph Systems Architect at AT&T Interactive

• Work in theoretical and applied models of problem-solving with graph traversals and graph databases. (2010-present)

Graphs at AT&T Interactive

• Graph Traversal: the development of theories and applications of graph traversals in real-world problem-solving situations.⁸

  ⋆ Continue to work on path algebra (extensions to include a non-matrix based, ring theoretic model and a diffusion model).

  ⋆ Continue to work on open source graph-related technologies to support graph-related efforts at AT&Ti (see http://www.tinkerpop.com).

• Recommender Systems: the development of applications for real-time, “themed” recommendations (i.e. a problem-solving graph engine).

  ⋆ AT&Ti maintains a collection of interesting data sets.

  ⋆ Make use of such data for numerous types of recommendation.

⁸ Please see: 1.) http://arxiv.org/abs/1004.1001 2.) http://arxiv.org/abs/1006.2361




Conclusion

• Graphs/networks touch numerous disciplines.

• Many aspects of the world can be modeled as a graph/network.

• Graph traversal algorithms show promise as a general-purpose style/pattern for computing.
