May 11, 2015
A Perspective on
Graph Theory and Network Science
Marko A. Rodriguez
http://markorodriguez.com
http://twitter.com/twarko
http://www.slideshare.net/slidarko
Santa Fe Public School District – Santa Fe, New Mexico – July 6, 2010
July 5, 2010
Abstract
The graph/network domain has been driven by the creativity of numerous individuals from disparate areas of the academic and commercial sectors. Examples of contributing academic disciplines include mathematics, physics, sociology, and computer science. Given the interdisciplinary nature of the domain, it is difficult for any single individual to objectively realize and speak about the space as a whole. Any presentation of the ideas is ultimately biased by the formal training and expertise of the individual. For this reason, I will simply present on the domain from my perspective, from my personal experiences. More specifically, from my perspective biased by cognitive and computer science.
This is an autobiographical lecture on my life (so far) with graphs/networks.
The Graph/Network
The term graph is used primarily in mathematics and the term network is used primarily
in physics. Both refer to a type of structure in which there exist vertices (i.e. nodes,
dots) and edges (i.e. links, lines). There are numerous types of graphs/networks which
yield more or less expressivity (i.e. more or less structure).
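As a minimal sketch of the definition above, a graph can be held as a set of vertices plus a list of edges. The toy vertices, the edge list, and the helper names `adjacency` and `as_undirected` are illustrative assumptions, not part of the lecture. Dropping edge direction shows one way a graph type can carry less structure than another.

```python
from collections import defaultdict

# Hypothetical toy data: vertices (nodes, dots) and edges (links, lines).
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]  # directed: (tail, head)

def adjacency(edge_list):
    """Index a directed edge list by its tail vertex."""
    out = defaultdict(set)
    for tail, head in edge_list:
        out[tail].add(head)
    return out

def as_undirected(edge_list):
    """Drop direction: a less expressive graph built from the same edges."""
    return {frozenset(edge) for edge in edge_list}

out_neighbors = adjacency(edges)
print(sorted(out_neighbors["c"]))  # vertices one step away from c
print(len(as_undirected(edges)))   # number of undirected edges
```

Adding labels or weights to the edge tuples would move in the other direction, toward a more expressive graph type.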
The Purpose of a Graph for Mathematicians
• Mathematicians are concerned with the abstract structure of a graph.
• Mathematicians define operations to analyze and manipulate graphs. Moreover, they develop theorems based upon structural axioms.
The Purpose of a Network for Physicists
• Physicists are concerned with modeling real-world structures with networks.
• Physicists define algorithms that compress the information in a network to simpler values (e.g. statistical analysis).
Much of the World has a Graphical/Network Structure
• Social networks: define how persons interact (collaborators, friends, kin).
• Biological networks: define how biological components interact (protein, food chains, gene regulation).
• Transportation networks: define how cities are joined by air and road routes.
• Dependency networks: define how software modules use each other.
• Communication networks: define the relationships between Internet routers.
• Language networks: define the relationships between words.
The Tour
• University of California at San Diego (1997-2001)
• University of California at Santa Cruz (2001-2007)
• Vrije Universiteit Brussel (2004-2005)
• Los Alamos National Laboratory (2005-2010)
• AT&T Interactive (2010-Present)
Undergrad at the University of California at San Diego
• Studied Cognitive Science (B.S.) and Computer Music (Minor) at the University of California at San Diego. (1997-2001)
Cognitive Science at UCSD
• Neural networks: simplified models of how the brain encodes and processes information. [1]
  ◦ Neural networks exclude seemingly non-relevant aspects of the biological counterpart (e.g. neurotransmitters, axon/soma/dendrite distinctions).
  ◦ No two signals reaching the brain are ever the same, yet we perceive a consistent (object-oriented) world.
  ◦ Can be generally applied to classification irrespective of the signal being “human oriented” (e.g. non-sensory information).
  ◦ Neural networks are usually trained through experience.
[1] Please see: http://arxiv.org/abs/0811.3584
Cognitive Science at UCSD
[Figure: a signal from the world passes through a neural network, which outputs a classification of the signal.]
Mouse cortical networks are grown on multi-electrode arrays in order to study the
information properties of the structure through its development (left – done at LANL
during my PostDoc). Artificial neural networks are simplified models of the sufficient
components needed to process and classify information (right).
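To make "trained through experience" concrete, here is a minimal sketch of a single threshold unit (a perceptron) that learns a classification from labeled examples. The training data (logical OR), the learning rate, and the function names are assumptions chosen for illustration; this is not the model used in the lecture's research.

```python
def train_perceptron(samples, epochs=20, rate=0.1):
    """Learn weights for one threshold unit from (inputs, target) examples."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the unit is already right
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

def classify(weights, bias, inputs):
    """Map a signal to a class using the learned weights."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0

# Hypothetical "signals from the world": two binary features, label = logical OR.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
predictions = [classify(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```

The unit never sees a rule for OR; it only adjusts its weights when an example is misclassified.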
Computer Music at UCSD
• Spatial compositions: focused on the composition of music which accounted for/represented sound in 3D space.
  ◦ Amplitude (loud/quiet), pitch (high/low), timbre (guitar/drum), but what about music beyond stereo (left/right)?
  ◦ Developed algorithms to “trick the ear” into hearing sounds at particular points in space.
  ◦ Made use of a data flow sound processing language called Max/MSP (see http://cycling74.com/).
    ∗ Data flow languages allow one to define “process graphs” (dependencies between functions represented as a graph).
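The idea of a process graph can be sketched in a few lines: each node is a function, each edge says whose output feeds whose input, and evaluation follows a topological order of the dependencies. This is not Max/MSP code; the node names and the tiny signal below are hypothetical, and the ordering relies on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical process graph: node name -> (function, names of the nodes it reads from).
process_graph = {
    "source": (lambda: [0.0, 0.5, 1.0, 0.5], []),
    "gain": (lambda xs: [2.0 * x for x in xs], ["source"]),
    "invert": (lambda xs: [-x for x in xs], ["source"]),
    "mix": (lambda a, b: [x + y for x, y in zip(a, b)], ["gain", "invert"]),
}

def run(graph):
    """Evaluate every node once, after all of the nodes it depends on."""
    dependencies = {name: set(inputs) for name, (_, inputs) in graph.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        function, inputs = graph[name]
        results[name] = function(*(results[i] for i in inputs))
    return results

outputs = run(process_graph)
print(outputs["mix"])  # gain (2x) plus invert (-x) gives back the source signal
```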
Computer Music at UCSD
My data flow programs (left) take/generate sound, process it algorithmically, and emit it
through a 6-channel circular surround sound system (right). My senior thesis was a live
concert using a computer music system I developed called Monkey Space Colony 6.
Graduate at the University of California at Santa Cruz
• Studied Computer Science (M.S. and Ph.D.) at the University of California at Santa Cruz. (2001-2007)
Collective Intelligence at UCSC
• Collective decision making: applications of collective intelligence to the design of techno-government architectures. [2] (2001-2004)
  ◦ We do not have the same restrictions as our founding fathers (e.g. communication limited by space).
  ◦ Is it possible to remove the representative layer of government by leveraging expertise/representation in social networks?
  ◦ What does a modern day direct democracy look like?
  ◦ Can any actively participating subset of the population yield an accurate model of the population as a whole?
  ◦ Maintaining fidelity in that subset model is the point of dynamically distributed democracy.
[2] Please see: 1.) http://arxiv.org/abs/cs/0412047 2.) http://arxiv.org/abs/cs/0609034 3.) http://arxiv.org/abs/0901.3929 4.) http://escholarship.org/uc/item/04h3h1cr
Collective Intelligence at UCSC
[Plot: error (y-axis, 0.00 to 0.20) against percentage of active citizens (x-axis, 100 down to 0) for dynamically distributed democracy and direct democracy.]
[Plot: proportion of correct decisions (y-axis, 0.50 to 0.95) against percentage of active citizens (x-axis, 100 down to 0) for dynamically distributed democracy and direct democracy.]
Fig. 5. The relationship between $k$ and $e^{vote}_k$ for direct democracy (gray line) and dynamically distributed democracy (black line). The plot provides the proportion of identical, correct decisions over a simulation that was run with 1000 artificially generated networks composed of 100 citizens each.
As previously stated, let $x \in [0,1]^n$ denote the political tendency of each citizen in this population, where $x_i$ is the tendency of citizen $i$ and, for the purpose of simulation, is determined from a uniform distribution. Assume that every citizen in a population of $n$ citizens uses some social network-based system to create links to those individuals that they believe reflect their tendency the best. In practice, these links may point to a close friend, a relative, or some public figure whose political tendencies resonate with the individual. In other words, representatives are any citizens, not political candidates that serve in public office. Let $A \in [0,1]^{n \times n}$ denote the link matrix representing the network, where the weight of an edge, for the purpose of simulation, is denoted
\[ A_{i,j} = \begin{cases} 1 - |x_i - x_j| & \text{if link exists} \\ 0 & \text{otherwise.} \end{cases} \]
In words, if two linked citizens are identical in their political tendency, then the strength of the link is 1.0. If their tendencies are completely opposing, then their trust (and the strength of the link) is 0.0. Note that a preferential attachment network growth algorithm is used to generate a degree distribution that is reflective of typical social networks “in the wild” (i.e. scale-free properties). Moreover, an assortativity parameter is used to bias the connections in the network towards citizens with similar tendencies. The assumption here is that given a system of this nature, it is more likely for citizens to create links to similar-minded individuals than to those whose opinions are quite different. The resultant link matrix $A$ is then normalized to be row stochastic in order to generate a probability distribution over the weights of the outgoing edges of a citizen. Figure 6 presents an example of an $n = 100$ artificially generated trust-based social network, where red denotes a tendency of 0.0, purple a tendency of 0.5, and blue a tendency of 1.0.
Fig. 6. A visualization of a network of trust links between citizens. Each citizen’s color denotes their “political tendency”, where full red is 0, full blue is 1, and purple is 0.5. The layout algorithm chosen is the Fruchterman-Reingold layout.
Given this social network infrastructure, it is possible to better ensure that the collective tendency and vote is appropriately represented through a weighting of the active, participating population. Every citizen, active or not, is initially provided with $\frac{1}{n}$ “vote power” and this is represented in the vector $\pi \in \mathbb{R}^n_+$, such that the total amount of vote power in the population is 1. Let $y \in \mathbb{R}^n_+$ denote the total amount of vote power that has flowed to each citizen over the course of the algorithm. Finally, $a \in \{0,1\}^n$ denotes whether citizen $i$ is participating ($a_i = 1$) in the current decision making process or not ($a_i = 0$). The values of $a$ are biased by an unfair coin that has probability $k$ of making the citizen an active participant and $1 - k$ of making the citizen inactive. The iterative algorithm is presented below, where $\circ$ denotes entry-wise multiplication and $\epsilon \approx 1$.
$y \leftarrow 0$
while $\sum_{i=1}^{n} y_i < \epsilon$ do
    $y \leftarrow y + (\pi \circ a)$
    $\pi \leftarrow \pi \circ (1 - a)$
    $\pi \leftarrow A\pi$
end
In words, active citizens serve as vote power “sinks” in that once they receive vote power, from themselves or from a neighbor in the network, they do not pass it on. Inactive citizens serve as vote power “sources” in that they propagate their vote power over the network links to their neighbors iteratively until all (or $\epsilon$) vote power has reached active citizens. At this point, the tendency in the active population is defined as $\delta^{tend} = x \cdot y$. Figure 4 plots the error incurred using dynamically distributed democracy (black line), where the error is defined as
\[ e^{tend}_k = |d^{tend}_{100} - \delta^{tend}_k|. \]
Next, the collective vote $\delta^{vote}_k$ is determined by a weighted majority as dictated by the vote power accumulated by active participants. Figure 5 plots the proportion of votes that are different from what a fully participating population would
People do not vote for a representative. Instead, they maintain an ego-network of whose ideas they respect in
certain domains (e.g. health care, military, etc.). People in one’s network can be friends, family members,
scientists, public figures, etc. Anyone, through the Internet, can vote on any decision. However, the
moment they abstain from voting, their vote power is transferred through their network (according to the
domain of decision). Power aggregates at those that participate in the current decision.
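The transfer of vote power described above can be sketched as a small iterative procedure. This is an illustrative reading of the excerpt, not the authors' implementation. It assumes `A[i][j]` is the share of citizen i's power passed to citizen j (each row sums to 1), so each round sums power over incoming links; the excerpt itself writes this step as a product of the link matrix and the power vector. The four-citizen network, the value of `epsilon`, and the `max_rounds` guard are hypothetical additions.

```python
def distribute_vote_power(A, active, epsilon=0.999, max_rounds=1000):
    """Propagate vote power until (almost) all of it rests with active citizens.

    A[i][j]: share of citizen i's power passed to citizen j (assumed row stochastic).
    active[i]: 1 if citizen i participates in the decision, else 0.
    """
    n = len(A)
    pi = [1.0 / n] * n  # every citizen starts with 1/n vote power
    y = [0.0] * n       # vote power accumulated by each citizen
    for _ in range(max_rounds):  # guard (an assumption) against unreachable sinks
        if sum(y) >= epsilon:
            break
        # Active citizens are sinks: they keep whatever power they hold.
        y = [y[i] + pi[i] * active[i] for i in range(n)]
        # Only inactive citizens still have power to pass on.
        pi = [pi[i] * (1 - active[i]) for i in range(n)]
        # Inactive citizens push their power along their outgoing links.
        pi = [sum(A[i][j] * pi[i] for i in range(n)) for j in range(n)]
    return y

# Hypothetical network: 0 trusts 1; 1 trusts 2 and 3 equally; 2 and 3 vote.
A = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
active = [0, 0, 1, 1]
x = [0.0, 0.5, 0.25, 0.75]  # hypothetical political tendencies

y = distribute_vote_power(A, active)
tendency = sum(xi * yi for xi, yi in zip(x, y))  # weighted tendency of the active subset
print(y, tendency)
```

In this toy case the two abstaining citizens end with no power and the two voters each end with half of it, so the active subset speaks for the whole population.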
Visiting Researcher at the Vrije Universiteit Brussel
• Studied collective intelligence as a Visiting Researcher at the Center for Evolution, Complexity, and Cognition of the Vrije Universiteit Brussel. (2004-2005)
Collective Intelligence at the Vrije Universiteit Brussel
• Automating the scholarly process: Designed algorithms that exploit bibliographic networks in order to support the scholarly communication process. (2004-2005) [3]
  ◦ Can the network of scholars, articles, journals, universities, conferences, funding sources, etc. be leveraged to algorithmically support the scholarly process?
    ∗ Can you find me articles related to my interests?
    ∗ Can you find me collaborators to work with me on my ideas?
    ∗ Can you find me a venue to publish my work in?
    ∗ Can you find me experts to peer-review a submitted article?
    ∗ Can you find me people to talk to (and concepts to talk about) at the conference I’m going to?
[3] Please see: 1.) http://arxiv.org/abs/cs/0601121 2.) http://arxiv.org/abs/cs/0605112 3.) http://arxiv.org/abs/0905.1594
Collective Intelligence at the Vrije Universiteit Brussel
Example: Determining experts to peer-review an article can be done automatically and
with a sensitivity to conflict of interest situations. The spreading activation algorithm
used is analogous, in many ways, to neural networks. Can we think of the networks we (as
a society) implicitly create as some sort of “collective neural substrate?” Can we then
apply similar algorithms that are found in biological systems? Can our implicitly generated
networks serve as a substrate for problem-solving?
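A rough sketch of the reviewer example: energy starts at the submitted article, spreads to the articles it cites, then to their authors, and scholars with a conflict of interest are dropped before ranking. This is a generic spreading activation scheme under stated assumptions, not the algorithm from the cited papers. The graph, the names, the `decay` value, and the conflict set are all hypothetical.

```python
def spread_activation(graph, sources, steps=2, decay=0.5):
    """Push energy outward from the sources, splitting it evenly over neighbors."""
    energy = {vertex: 1.0 for vertex in sources}
    total = dict(energy)
    for _ in range(steps):
        next_energy = {}
        for vertex, amount in energy.items():
            neighbors = graph.get(vertex, [])
            if not neighbors:
                continue
            share = decay * amount / len(neighbors)
            for neighbor in neighbors:
                next_energy[neighbor] = next_energy.get(neighbor, 0.0) + share
        for vertex, amount in next_energy.items():
            total[vertex] = total.get(vertex, 0.0) + amount
        energy = next_energy
    return total

# Hypothetical bibliographic network: submission -> cited papers -> their authors.
graph = {
    "submission": ["paper1", "paper2", "paper3"],
    "paper1": ["alice", "bob"],
    "paper2": ["bob"],
    "paper3": ["dave"],
}
scholars = {"alice", "bob", "dave"}
conflicts = {"bob"}  # e.g. a recent co-author of the submitter (assumed)

activation = spread_activation(graph, ["submission"])
reviewers = sorted(scholars - conflicts, key=lambda s: -activation.get(s, 0.0))
print(reviewers)
```

Here bob collects the most energy but is excluded, which is the conflict-of-interest sensitivity the example describes.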
Graduate Researcher at Los Alamos National Laboratory
• Studied bibliometrics as a graduate student on the Digital Library Research and Prototyping Team of the Los Alamos National Laboratory. (2005-2007)
Bibliometrics at Los Alamos National Laboratory
• Bibliometrics: the study of the scholarly process through the digital footprint left by scholars (“the science of science”). (2005-2007) [4]
  ◦ Wrote my dissertation while with the Digital Library Research and Prototyping Team (Johan Bollen, Herbert Van de Sompel, and Alberto Pepe). A very fruitful time in my academic career.
  ◦ Continued my work with problem-solving in scholarly networks.
  ◦ Studied how scholars use information by studying how they download articles (see http://mesur.org).
[4] Please see: 1.) http://arxiv.org/abs/cs/0601030 2.) http://arxiv.org/abs/0708.1150 3.) http://arxiv.org/abs/0804.3791 4.) http://arxiv.org/abs/0801.2345 5.) http://arxiv.org/abs/0807.0023 6.) http://dx.doi.org/10.1371/journal.pone.0004803 7.) http://arxiv.org/abs/0911.4223 8.) http://arxiv.org/abs/cs/0605110
Bibliometrics at Los Alamos National Laboratory
Each vertex (node) is a particular journal. Colors denote the journal domain. A directed edge (link) denotes
that a scholar read an article in journal A then one in journal B. This map provides us with a collectively
generated representation of the knowledge transfer between domains (i.e. “folksonomy” of domains).
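The edge rule in this caption (journal A then journal B) can be sketched as a count over consecutive reads. The session data and journal names below are made up for illustration, and skipping consecutive reads within the same journal is an assumption, not something the caption states.

```python
from collections import Counter

def journal_transitions(sessions):
    """Count directed edges (journal A -> journal B) from consecutive reads."""
    edge_weights = Counter()
    for journals in sessions:
        for a, b in zip(journals, journals[1:]):
            if a != b:  # assumption: staying within one journal adds no edge
                edge_weights[(a, b)] += 1
    return edge_weights

# Hypothetical reading sessions: the journal of each article a scholar read, in order.
sessions = [
    ["physics_journal", "network_journal", "biology_journal"],
    ["physics_journal", "network_journal"],
    ["biology_journal", "biology_journal", "network_journal"],
]

edge_weights = journal_transitions(sessions)
for (a, b), weight in sorted(edge_weights.items()):
    print(a, "->", b, weight)
```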
Web of Data at Los Alamos National Laboratory
• Web of Data: the representation of the world’s data within the global URI (super class of URL) address space. [5]
  ◦ For the most part, data is local to a computer with no easy way for data on one computer to reference data on another.
    ∗ The World Wide Web provided a way to link documents across computers, but what about data?
  ◦ By placing data “on the Web” in a similar manner to how we place documents on the Web, we can turn the Web into a distributed database.
    ∗ This heterogeneous network/graph of data opens the door to new types of problem-solving.
[5] Please see: 1.) http://arxiv.org/abs/0904.0027 2.) http://arxiv.org/abs/0908.0373 3.) http://arxiv.org/abs/1006.1080 4.) http://arxiv.org/abs/0905.3378 5.) http://arxiv.org/abs/0704.3395 6.) http://arxiv.org/abs/0802.3492 7.) http://arxiv.org/abs/0903.0194
Web of Data at Los Alamos National Laboratory
data set          domain      data set          domain      data set            domain
audioscrobbler    music       govtrack          government  pubguide            books
bbclatertotp      music       homologene        biology     qdos                social
bbcplaycountdata  music       ibm               computer    rae2001             computer
bbcprogrammes     media       ieee              computer    rdfbookmashup       books
budapestbme       computer    interpro          biology     rdfohloh            social
chebi             biology     jamendo           music       resex               computer
crunchbase        business    laascnrs          computer    riese               government
dailymed          medical     libris            books       semanticweborg      computer
dblpberlin        computer    lingvoj           reference   semwebcentral       social
dblphannover      computer    linkedct          medical     siocsites           social
dblprkbexplorer   computer    linkedmdb         movie       surgeradio          music
dbpedia           general     magnatune         music       swconferencecorpus  computer
doapspace         social      musicbrainz       music       taxonomy            reference
drugbank          medical     myspacewrapper    social      umbel               general
eurecom           computer    opencalais        reference   uniref              biology
eurostat          government  opencyc           general     unists              biology
flickrexporter    images      openguides        reference   uscensusdata        government
flickrwrappr      images      pdb               biology     virtuososponger     reference
foafprofiles      social      pfam              biology     w3cwordnet          reference
freebase          general     pisa              computer    wikicompany         business
geneid            biology     prodom            biology     worldfactbook       government
geneontology      biology     projectgutenberg  books       yago                general
geonames          geographic  prosite           biology     . . .
Web of Data at Los Alamos National Laboratory
[Figure: a directed graph whose vertices are data sets, including dbpedia, freebase, geonames, yago, opencyc, uniprot, pubmed, geneontology, drugbank, musicbrainz, dblpberlin, and foafprofiles.]
Each vertex (node) represents a data set. A directed edge (link) denotes that data set A
makes reference to data in data set B.
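A data set graph of this kind can be derived from the statements themselves. The sketch below treats each statement as a (subject, predicate, object) triple of URIs and, as a simplifying assumption, identifies a data set by the host name of a URI. All URIs are hypothetical `example.org` placeholders.

```python
from urllib.parse import urlparse

# Hypothetical triples: (subject, predicate, object). Objects may be URIs or plain literals.
triples = [
    ("http://music.example.org/artist/1", "http://vocab.example.org/bornIn",
     "http://geo.example.org/place/7"),
    ("http://books.example.org/book/3", "http://vocab.example.org/author",
     "http://music.example.org/artist/1"),
    ("http://music.example.org/artist/1", "http://vocab.example.org/name",
     "Some Artist"),
]

def data_set_of(term):
    """Return the host of an http(s) URI, or None for literals (assumed convention)."""
    parsed = urlparse(term)
    return parsed.netloc if parsed.scheme in ("http", "https") else None

def data_set_links(statements):
    """Directed edges (data set A, data set B): A holds a statement pointing into B."""
    links = set()
    for subject, _, obj in statements:
        source, target = data_set_of(subject), data_set_of(obj)
        if source and target and source != target:
            links.add((source, target))
    return links

links = data_set_links(triples)
print(sorted(links))
```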
Web of Data at Los Alamos National Laboratory
[Figure: left, three machines (127.0.0.1, 127.0.0.2, 127.0.0.3), each running its own application (Application 1, 2, 3) over its own structures and processes; right, the same three applications with their structures and processes placed on a shared Web of Data.]
Data is currently in silos (left). For example, Amazon.com can only recommend other
Amazon.com products. What about recommending a job to take based upon the books
you read, the people you know, etc. (right)? Can a collectively generated model of the
world help people to find their place in life? (http://bit.ly/cLWL3F)
Web of Data at Los Alamos National Laboratory
[Figure: left, an RDF graph of a program’s instructions (Method, Block, Branch, Equals, PushValue, LocalDirect, Return), each identified by a urn:uuid URI and joined by edges such as hasMethod, hasBlock, nextInst, trueInst, falseInst, hasLeft, hasRight, hasValue, and hasURI; right, a class diagram of a virtual machine (RVM, Fhat) with Frame, FrameVariable, Instruction, ReturnStack, OperandStack, and BlockStack.]
A more esoteric body of work was developed at this time that dealt with the encoding of
not only data into the Web of Data, but also process. This included the distributed
representation of computing instructions (left) and virtual machines (right).
PostDoc Researcher at Los Alamos National Laboratory
• Studied graph theory and ethics as a Director’s Fellow PostDoc at the Center for Nonlinear Studies of the Los Alamos National Laboratory. (2007-2010)
Path Algebra at Los Alamos National Laboratory
• Path Algebra: concerned with how to move through a graph in an intelligent, directed manner in order to solve problems using graphs. [6]
  ◦ The algebra contains a set of elements: vertices and edges.
  ◦ The algebra contains a set of operations: traverse, filter, clip, merge, split, not, etc.
  ◦ The algebra provides a theory for how to develop graph traversal engines (i.e. graph processors).
[6] Please see: 1.) http://arxiv.org/abs/0806.2274 2.) http://arxiv.org/abs/0803.4355 3.) http://gremlin.tinkerpop.com 4.) http://pipes.tinkerpop.com
Path Algebra at Los Alamos National Laboratory
The general theme of controlling how a walker moves through a graph has numerous
applications including searching, ranking, scoring, recommendation, etc. within a graph.
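As a hedged illustration of directing a walker, the sketch below chains two kinds of step, traverse and filter, over a small labeled edge list to produce a recommendation. It is not the Gremlin or Pipes API and does not implement the matrix formulation of the path algebra. The function names `out`, `in_`, `exclude` and the "likes" data are hypothetical.

```python
from collections import Counter

# Hypothetical labeled, directed edges: (tail, label, head).
edges = [
    ("marko", "likes", "graphs"), ("marko", "likes", "music"),
    ("peter", "likes", "graphs"), ("peter", "likes", "ethics"),
    ("josh", "likes", "music"), ("josh", "likes", "ethics"),
    ("josh", "likes", "robots"),
]

def out(walkers, label):
    """Traverse: move each walker along its outgoing edges with this label."""
    return [head for w in walkers for tail, l, head in edges if tail == w and l == label]

def in_(walkers, label):
    """Traverse: move each walker backwards along incoming edges with this label."""
    return [tail for w in walkers for tail, l, head in edges if head == w and l == label]

def exclude(walkers, banned):
    """Filter: drop walkers standing on banned vertices."""
    return [w for w in walkers if w not in banned]

liked = out(["marko"], "likes")                         # what marko likes
peers = exclude(in_(liked, "likes"), {"marko"})         # who else likes those things
candidates = exclude(out(peers, "likes"), set(liked))   # what they like that marko does not
ranking = Counter(candidates).most_common()             # more walkers arriving = higher score
print(ranking)
```

The walkers are kept in a list rather than a set on purpose: the number of walkers that arrive at a vertex is what ranks it.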
Eudaemonics at Los Alamos National Laboratory
• Eudaemonics: an ethical theory stating that it is everyone’s moral responsibility to be “happy” (i.e. to live engaged in the world). See the work of Aristotle and David L. Norton. [7]
  ◦ Are recommender systems evolving to become eudaemonic engines?
    ∗ Movies (e.g. Netflix), books (e.g. GoodReads), life partners (e.g. Match.com), careers (e.g. Monster), etc.
    ∗ Can we interrelate all this data and traverse it for problem-solving?
[7] Please see: 1.) http://arxiv.org/abs/0903.0200 2.) http://arxiv.org/abs/0904.0027
Graph Systems Architect at AT&T Interactive
• Work in theoretical and applied models of problem-solving with graph traversals and graph databases. (2010-present)
Graphs at AT&T Interactive
• Graph Traversal: the development of theories and applications of graph traversals in real-world problem-solving situations. [8]
  ◦ Continue to work on path algebra (extensions to include a non-matrix based, ring theoretic model and a diffusion model).
  ◦ Continue to work on open source graph-related technologies to support graph-related efforts at AT&Ti (see http://www.tinkerpop.com).
• Recommender Systems: the development of applications for real-time, “themed” recommendations (i.e. a problem-solving graph engine).
  ◦ AT&Ti maintains a collection of interesting data sets.
  ◦ Make use of such data for numerous types of recommendation.
[8] Please see: 1.) http://arxiv.org/abs/1004.1001 2.) http://arxiv.org/abs/1006.2361
Conclusion
• Graphs/networks touch numerous disciplines.
• Many aspects of the world can be modeled as a graph/network.
• Graph traversal algorithms show promise as a general-purpose style/pattern for computing.