Graphs and networks Wolfgang Huber Vincent Carey Robert Gentleman Seth Falcon Based on "Bioinformatics and Computational Biology Solutions using R and Bioconductor", Gentleman, Carey, Huber, Irizarry, Dudoit. Springer Verlag.

Dec 24, 2015

Arnold Austin

Graphs

Set of nodes and set of edges.

Nodes: objects of interest

Edges: relationships between them

A useful abstraction to talk about relationships and interactions (think of apples, pears, fingers, integer numbers)

Edges may have weights, directions, types

Use of graphs in biology

Knowledge representation

signal transduction, regulatory or metabolic networks; cartoons or more formalized

Gene Ontology

Graph-like data (“interactions”)

Y2H, APMS, ChIP-chip

Statistical models that use (sparse) graphs

Bayesian networks, phylogenetic trees

Graph-like data: true state of nature vs experimental measurement

[Figure: the “real” network and the measured network, connected by arrows labelled "Experiment" and "Statistical inference"]

Graph-like data: uncertainty

Distinguish between the true, underlying property that you want to measure, and the actual result of a measurement (experiment)

1. False positive edges
2. False negative edges (were tested, were not found, but are there in nature)
3. Untested edges (were not tested, are not in your data, but are there in nature)

Uncertainty is not usually considered in mainstream graph theory, but cannot be ignored in bioinformatics applications.

The “real” network

...is a mathematical model of nature, not to be confused with nature itself.

E.g. we can model protein-protein interactions as yes/no relationships, even though at closer look they can have a continuum of affinities, can be dynamic and can depend on the environment.

“All models are wrong, some are useful”

Motivating examples

Transcription factor graphs
Pathway graphs
GO
Literature graphs
Protein-Complex graphs

Transcription factor interactions

Nodes = transcription factors

Directed edge: X regulates transcription of Y

Transcriptional regulatory networks from "ChIP on chip" (chromatin immunoprecipitation)

regulator := a transcription factor (TF) or a ligand of a TF
tag: c-myc epitope

106 microarrays
samples: enriched (tagged-regulator + DNA-promoter)
probes: cDNA of all promoter regions
spot intensity ~ affinity of a promoter to a certain regulator

Lee et al. Science 2002

Transcriptional regulatory network: a bipartite graph

[Figure: bipartite graph between 106 regulators (TFs) and 6270 promoter regions, with entries of 1 marking regulator-promoter edges]

Application: network motifs

A pathway graph

Machine-readable pathway databases

KEGG

reactome

BioCarta (biocarta.com)

National Cancer Institute cMAP

Gene Ontology (GO)

A structured vocabulary to describe molecular function of gene products, biological processes, and cellular components.

Plus

A set of "is a", "is part of" relationships between these terms

Directed acyclic graph

GO graphs

> tfG = GOGraph("GO:0003700", GOMFPARENTS)

Gene-Literature graphs

DKC1

The bipartite gene-literature graph: actor and event size adjustment

actors: genes
actor size: number of papers that a gene appears in
event: paper
event size: number of genes that appear in a paper

Example: R. Strausberg et al. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. PNAS 99:16899–903, 2002

cites 15,000 genes

Closing gene lists with literature

Boundary of gene list L: set of all genes that have co-citation (above threshold weight) with genes in L.

[Figure: graph with nodes Gene 1 to Gene 7]

Graphs: vocabulary

Directed, undirected graphs
Adjacent nodes
Accessible nodes
Self-loop
Multi-edge
Node degree
Walk: alternating sequence of nodes and incident edges
Closed walk
Distance between nodes, shortest walk
Trail: walk with no repeated edges
Path: trail with no repeated nodes (except possibly first/last)
Cycle
Connected graph
Weakly connected directed graph (see next page)

Strong and weak connectivity

Graphs: vocabulary

Cut: remove edges to disconnect a graph
Cut-set: remove nodes to disconnect a graph

Connectivity: minimum number of edges whose removal results in a disconnected graph

Clique: every pair of nodes joined by an edge

Special types of graphs

Bipartite graph

Bipartite graphs

$A_G$: adjacency matrix ($n \times m$) of a bipartite graph $G$ with node sets $U$, $V$

One mode graphs

$$A_U = A_G^{t} A_G, \qquad A_V = A_G A_G^{t}$$

(Boolean algebra)
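As a rough illustration of the one-mode projections, the sketch below uses base R only. The toy matrix, its node names, and the convention that rows index V and columns index U (chosen so that both products have the dimensions implied by the formulas above) are assumptions made for this example.

```r
# Toy bipartite graph: 3 V-nodes (rows) and 4 U-nodes (columns).
# All names and entries are made up for illustration.
AG <- matrix(c(1, 1, 0, 0,
               0, 1, 1, 0,
               0, 0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("v1", "v2", "v3"),
                             c("u1", "u2", "u3", "u4")))

# Boolean matrix product: ordinary product, then test for > 0.
AU <- (t(AG) %*% AG) > 0   # 4 x 4: U-nodes that share at least one V-neighbour
AV <- (AG %*% t(AG)) > 0   # 3 x 3: V-nodes that share at least one U-neighbour

# The diagonal only says that a node has some neighbour; drop it.
diag(AU) <- FALSE
diag(AV) <- FALSE

AU
AV
```

Dropping the threshold (keeping the raw products) would instead give the number of shared neighbours, which can serve as an edge weight in the one-mode graph.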

Multigraphs

Can have different types of edges

Hypergraphs

:= set of Nodes + set of hyperedges

A hyperedge is a set of nodes (can be more than 2)

A directed hyperedge: pair (tail and head) of sets of nodes

Directed acyclic graphs

Useful for representing hierarchies and partial orderings (e.g. in time, from general to special, from cause to effect)

Many applications:
GO
MeSH
Graphical models

Random Edge Graphs

n nodes, m edges

p(i,j) = 1/m

with high probability:

m < n/2: many disconnected components

m > n/2: one giant connected component: size ~ n.

(next biggest: size ~ log(n)).

degrees of separation: log(n).

Erdös and Rényi 1960
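The threshold behaviour around m = n/2 can be explored with a small simulation. This is a base R sketch under stated assumptions: the m edges are drawn uniformly without replacement from all n(n-1)/2 node pairs, and the helper names `componentLabels` and `randomEdges` are hypothetical, not functions from any package.

```r
# Label the connected components of an undirected graph given as an edge
# matrix (two columns of node indices in 1..n). Each node ends up with the
# smallest node index of its component.
componentLabels <- function(n, edgeMat) {
  label <- seq_len(n)
  repeat {
    changed <- FALSE
    for (k in seq_len(nrow(edgeMat))) {
      i <- edgeMat[k, 1]
      j <- edgeMat[k, 2]
      lo <- min(label[i], label[j])
      if (label[i] != lo || label[j] != lo) {
        label[i] <- lo
        label[j] <- lo
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  label
}

# Draw m distinct edges uniformly from all n*(n-1)/2 node pairs.
randomEdges <- function(n, m) {
  allPairs <- t(combn(n, 2))
  allPairs[sample(nrow(allPairs), m), , drop = FALSE]
}

set.seed(1)
n <- 200
for (m in c(50, 100, 200, 400)) {
  sizes <- table(componentLabels(n, randomEdges(n, m)))
  cat("m =", m, " components:", length(sizes),
      " largest:", max(sizes), "\n")
}
```

One would expect the largest component to stay small while m is well below n/2 and to cover a large share of the nodes once m is well above it; the exact numbers depend on the seed.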

Random edge graph

100 nodes 50 edges

degree distribution

Random graphs versus permutation graphs

For statistical inference, one can consider null hypotheses based on aforementioned random graph models; and ones based on node permutation of data graphs.

The second is often more appropriate.

Cohesive subgroups

For data graphs, the concept of clique is usually too restrictive (false negative or untested edges)

n-clique: distance between all members is <=n. (Clique: n=1)

k-plex: maximal subgraph G in which each member is neighbour of at least |G|-k others. (Clique: k=1)

k-core: maximal subgraph G in which each member is neighbour of at least k others. (Clique: k=|G|-1)

After: Social Network Analysis, Wasserman and Faust (1994)
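Of the three relaxations, the k-core is the easiest to compute. A minimal sketch, assuming an undirected graph without self-loops stored as a symmetric 0/1 matrix: repeatedly discard nodes that have fewer than k neighbours among the nodes still kept. The function name `kCore` and the toy graph are hypothetical, and the result is the union of all k-cores, which need not be connected.

```r
# adj: symmetric 0/1 adjacency matrix with dimnames, no self-loops.
# Returns the names of the nodes that survive pruning at level k.
kCore <- function(adj, k) {
  keep <- rep(TRUE, nrow(adj))
  repeat {
    deg <- rowSums(adj[, keep, drop = FALSE])   # neighbours among kept nodes
    tooFew <- keep & deg < k
    if (!any(tooFew)) break
    keep[tooFew] <- FALSE
  }
  rownames(adj)[keep]
}

# Toy graph: triangle a-b-c with a pendant node d attached to c.
nodeNames <- c("a", "b", "c", "d")
adj <- matrix(0, 4, 4, dimnames = list(nodeNames, nodeNames))
edgeList <- rbind(c("a", "b"), c("b", "c"), c("a", "c"), c("c", "d"))
adj[edgeList] <- 1
adj[edgeList[, 2:1]] <- 1   # mirror the entries: undirected graph

kCore(adj, 1)   # every node has at least one neighbour
kCore(adj, 2)   # the pendant node d is pruned, the triangle remains
kCore(adj, 3)   # nothing survives
```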

A Graph Theoretic Algorithm for Estimating Protein Complex Membership using Data from Affinity Purification - Mass Spectrometry Technology

Denise Scholtens and Robert Gentleman

Two Types of Protein Relationship

• AP-MS (Affinity Purification - Mass Spectrometry )

– Measures Complex Comembership

• Gavin, et al. (Nature, 2002)

– TAP : Tandem Affinity Purification

• Ho, et al. (Nature, 2002)

– HMS-PCI: High-throughput Mass Spectrometric Protein Complex Identification

• Y2H (Yeast Two Hybrid)

– Measures Physical Interaction

• Ito, et al. (PNAS, 1998)

• Uetz, et al. (Nature, 2000)

AP-MS data:

Using a bait protein, AP-MS technology finds hit proteins that are comembers of at least one complex with the bait.

Y2H data:

Y2H technology finds pairs of physically interacting proteins.

(one purification)

bait

hits

AP-MS data: Y2H data:

We want to estimate the bipartite protein complex membership graph, A:

Estimation of A requires estimation of k, the number of complexes.

The strategy

1. Some proteins participate in more than one complex

2. In an AP-MS experiment, some proteins are used as baits and some proteins are only ever found as hits

3. Graph theoretic aspects of the model:

• Bipartite graph for complex membership (A)

• Relationship of complex membership (A) to complex comembership (Y) assayed in an AP-MS experiment (Z)

• AP-MS and Y2H are different technologies that measure different relationships between proteins

4. Statistical aspects: three types of errors: false positive, false negative (assayed), false negative (unassayed)

PP2A

Heterotrimeric complex consisting of:

Tpd3 - regulatory A subunit

Rts1 or Cdc55- regulatory B subunits

Pph21 or Pph22- catalytic subunits

Jiang and Broach (1999). EMBO.

1. Some proteins participate in more than one complex

Gavin, et al. (2002): Rgraphviz plot of yTAP C151

Bader & Hogue (2002): portion of Figure 2, overlap of the spoke models of TAP and HMS-PCI

Jansen, et al. (2003): PIT Bayesian Network, LR>600 (http://genecensus.org/intint)

[Figure: graphs with nodes Tpd3, Pph21, Myo5, Cdc55, Cdc11, Pph22, Cdc10]

apComplex algorithm detects:

Zds1 and Zds2 (known cell-cycle regulators) only exist in complexes with the Cdc55-Pph22 trimer!

2. Graph theoretic paradigm to allow for succinct expression of constructs involved

• Bipartite graph for complex membership
• Relationship of complex membership (A) to complex comembership (Y) assayed in an AP-MS experiment (Z)
• AP-MS and Y2H are different technologies that measure different relationships between proteins

We want to estimate PCMG using AP-MS assays of CCG

The Connection: Maximal Complete Subgraphs

Complete Subgraph: set of n nodes for which all n(n-1) directed edges exist

Maximal Complete Subgraph: complete subgraph that is not contained in any other complete subgraph

2. Graph theoretic paradigm to allow for succinct expression of constructs involved

•Relationship of complex membership (A) to complex comembership (Y) assayed in an AP-MS experiment (Z)

Y represents “ideal” complex comembership observations from perfectly sensitive and perfectly specific AP-MS technology. Y depends on the baits that are used in an experiment. Y is assayed by AP-MS technology.

The Connection: Maximal BH-Complete Subgraphs

BH-Complete Subgraph: set of n bait nodes and m hit-only nodes for which all n(n-1)+nm directed edges exist

Maximal BH-Complete Subgraph: BH-complete subgraph that is not contained in any other BH-complete subgraph
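The BH-completeness condition can be checked directly on an adjacency matrix. The sketch below is base R and is not taken from the apComplex package. It assumes that rows are "from" and columns are "to", that the nm edges point from baits to hit-only nodes, and that self-loops are irrelevant. The function name `isBHComplete` and the toy data are hypothetical.

```r
# adj:   0/1 adjacency matrix of a directed graph with dimnames
#        (row = from, column = to); the bait -> hit direction is an assumption.
# baits: names of the bait nodes; hits: names of the hit-only nodes.
isBHComplete <- function(adj, baits, hits) {
  bb <- adj[baits, baits, drop = FALSE]
  diag(bb) <- 1                         # self-loops are not required
  bh <- adj[baits, hits, drop = FALSE]
  all(bb == 1) && all(bh == 1)          # n(n-1) bait-bait plus n*m bait-hit edges
}

# Hypothetical toy data: two baits (B1, B2) and two hit-only proteins (H1, H2).
nodeNames <- c("B1", "B2", "H1", "H2")
adj <- matrix(0, 4, 4, dimnames = list(nodeNames, nodeNames))
adj[rbind(c("B1", "B2"), c("B2", "B1"),
          c("B1", "H1"), c("B2", "H1"),
          c("B1", "H2"))] <- 1

isBHComplete(adj, c("B1", "B2"), "H1")            # all 2*1 + 2*1 edges are present
isBHComplete(adj, c("B1", "B2"), c("H1", "H2"))   # the edge B2 -> H2 is missing
```

A maximality check would additionally try to add each remaining node and confirm that the enlarged set is no longer BH-complete.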

3. Statistical paradigm to allow for false positive and false negative observations

Z represents actual observations using AP-MS technology.

We will look for sets of proteins that form maximal BH-complete subgraphs with an allowance for false positive and false negative observations.

In summary…

AP-MS data for N bait proteins and M hit proteins → noisy directed graph Z

Model: Z = AA' + error term

We want to estimate A.

Start with an initial estimate for A, and then refine that estimate according to a two-component probability measure:

$$P(Z \mid A, \mu, \alpha) = L(Z \mid Y = AA', \mu, \alpha) \cdot C(Z \mid A, \mu, \alpha)$$

L: usual likelihood
C: regularization/penalty term (no. of complexes)

Software for graphs and networks

Here I will focus on what is available from the Bioconductor project, and through it (BGL, graphviz)

There are many other software packages, e.g. LEDA, Cytoscape

graph, RBGL, Rgraphviz

graph: basic class definitions and functionality

RBGL: interface to graph algorithms

Rgraphviz: rendering functionality. Different layout algorithms. Node plotting, line type, color etc. can be controlled by the user.

Representation of graphs:

From-To matrix

     from to
[1,] "a"  "b"
[2,] "b"  "c"
[3,] "c"  "d"
[4,] "d"  "b"

Representation of graphs:

Adjacency matrix (naive)

  a b c d
a 0 1 0 0
b 0 0 1 0
c 0 0 0 1
d 0 1 0 0
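To make the link between the representations concrete, here is a base R sketch that builds the dense adjacency matrix and a node-edge list from the From-To matrix of the same example graph. The variable names are illustrative. The graph package provides its own converters (for example ftM2adjM and ftM2graphNEL); the sketch avoids them only to stay dependency-free.

```r
# From-To matrix of the example graph a -> b -> c -> d -> b.
ft <- cbind(from = c("a", "b", "c", "d"),
            to   = c("b", "c", "d", "b"))

# Dense adjacency matrix: entry [i, j] is 1 if there is an edge i -> j.
nodeNames <- sort(unique(as.vector(ft)))
adjM <- matrix(0, length(nodeNames), length(nodeNames),
               dimnames = list(nodeNames, nodeNames))
adjM[ft] <- 1   # a two-column character matrix indexes by (row name, column name)

# Node-edge list: for each node, the names of the nodes it points to.
edgeL <- split(ft[, "to"], factor(ft[, "from"], levels = nodeNames))

adjM
edgeL
```

The dense matrix needs storage for all node pairs, whereas the From-To matrix and the node-edge list grow only with the number of edges, which is why the choice matters for large sparse graphs.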

Representation of graphs:

Adjacency matrix (sparse)

> library("SparseM")
> aM
  a b c d
a 0 1 0 0
b 0 0 1 0
c 0 0 0 1
d 0 1 0 0

> as.matrix.csr(aM)
An object of class "matrix.csr"
Slot "ra":
[1] 1 1 1 1

Slot "ja":
[1] 2 3 4 2

Slot "ia":
[1] 1 2 3 4 5

Slot "dimension":
[1] 4 4

Representation of graphs:

Node edge list

> class(g)
[1] "graphNEL"
attr(,"package")
[1] "graph"

> nodes(g)
[1] "a" "b" "c" "d"

> edges(g)
$a
[1] "b"

$b
[1] "c"

$c
[1] "d"

$d
[1] "b"

Representation of graphs:

package graph

From-To matrix
Adjacency matrix (naive)
Adjacency matrix (sparse)
Node-edge lists

They are equivalent, but may be different in performance and convenience for different applications.

Can coerce between the representations

Creating our first graph

> library("graph"); library(Rgraphviz)

> myNodes = c("s", "p", "q", "r")

> myEdges = list(s = list(edges = c("p", "q")), p = list(edges = c("p", "q")), q = list(edges = c("p", "r")), r = list(edges = c("s")))

> g = new("graphNEL", nodes = myNodes, edgeL = myEdges, edgemode = "directed")

> plot(g)

Querying nodes, edges, degree

> nodes(g)
[1] "s" "p" "q" "r"

> edges(g)
$s
[1] "p" "q"

$p
[1] "p" "q"

$q
[1] "p" "r"

$r
[1] "s"

> degree(g)
$inDegree
s p q r
1 3 2 1

$outDegree
s p q r
2 2 2 1
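The degree values can be recomputed by hand from the node-edge list, which shows what the two components mean. This is a base R sketch with illustrative variable names, not the implementation used by the graph package.

```r
# Node-edge list of the example graph (from -> to), as created on the previous slide.
edgeL <- list(s = c("p", "q"),
              p = c("p", "q"),
              q = c("p", "r"),
              r = "s")
nodeNames <- names(edgeL)

# Out-degree: number of outgoing edges per node.
outDeg <- lengths(edgeL)

# In-degree: how often each node occurs as the target of an edge.
inDeg <- table(factor(unlist(edgeL), levels = nodeNames))

inDeg
outDeg
```

The self-loop at p contributes one to both its in-degree and its out-degree.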

adjacent and accessible nodes

> adj(g, c("b", "c"))
$b
[1] "b" "c"

$c
[1] "b" "d"

> acc(g, c("b", "c"))
$b
a c d
3 1 2

$c
a b d
2 1 1
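The numbers returned by acc are step counts to each accessible node. The idea can be sketched with a breadth-first search over a node-edge list in base R. The function name `accessible` is hypothetical, the code is not the package's implementation, and the toy graph is the a, b, c, d example from the representation slides rather than the graph queried above.

```r
# Breadth-first search from 'start' over a node-edge list (named list: from -> to).
# Returns, for every other reachable node, the number of steps needed to reach it.
accessible <- function(edgeL, start) {
  dist <- setNames(0, start)
  frontier <- start
  while (length(frontier) > 0) {
    nxt <- character(0)
    for (v in frontier) {
      for (w in edgeL[[v]]) {
        if (!(w %in% names(dist))) {    # first visit gives the shortest distance
          dist[w] <- dist[[v]] + 1
          nxt <- c(nxt, w)
        }
      }
    }
    frontier <- nxt
  }
  dist[names(dist) != start]
}

# Example graph a -> b -> c -> d -> b.
edgeL <- list(a = "b", b = "c", c = "d", d = "b")

accessible(edgeL, "a")
accessible(edgeL, "c")
```

Node a cannot be reached from c because no edge points to a, so it is absent from the second result.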

Graph manipulation

> g1 <- addNode("e", g)

> g2 <- removeNode("d", g)

> ## addEdge(from, to, graph, weights)

> g3 <- addEdge("e", "a", g1, pi/2)

> ## removeEdge(from, to, graph)

> g4 <- removeEdge("e", "a", g3)

> identical(g4, g1)

[1] TRUE

Elementary computations on IMCA pathway

> library("graph")
> data("integrinMediatedCellAdhesion")
> acc(IMCAGraph, "SOS")
 Ha-Ras     Raf     MEK
      1       2       3
    ERK    MYLK     MYO
      4       5       6
F-actin cell proliferation
      7                  5

GXL: graph exchange language

<gxl>
  <graph edgemode="directed" id="G">
    <node id="A"/>
    <node id="B"/>
    <node id="C"/>
    …
    <edge id="e1" from="A" to="C">
      <attr name="weights">
        <int>1</int>
      </attr>
    </edge>
    <edge id="e2" from="B" to="D">
      <attr name="weights">
        <int>1</int>
      </attr>
    </edge>
    …
  </graph>
</gxl>

from graph/GXL/kmstEx.gxl

GXL (www.gupro.de/GXL) is "an XML sublanguage designed to be a standard exchange format for graphs". The graph package provides tools for importing and exporting graphs as GXL.

RBGL: interface to the Boost Graph Library

Connected components
cc = connComp(rg)
table(listLen(cc))
 1  2  3  4 15 18
36  7  3  2  1  1

Choose the largest component
wh = which.max(listLen(cc))
sg = subGraph(cc[[wh]], rg)

Depth first search
dfsres = dfs(sg, node = "N14")
nodes(sg)[dfsres$discovered]
 [1] "N14" "N94" "N40" "N69" "N02" "N67" "N45" "N53"
 [9] "N28" "N46" "N51" "N64" "N07" "N19" "N37" "N35"
[17] "N48" "N09"

[Figure: plot of the graph rg]

depth / breadth first search

dfs(sg, "N14")
bfs(sg, "N14")

connected components

sc = strongComp(g2)

nattrs = makeNodeAttrs(g2, fillcolor="")

for(i in 1:length(sc)) nattrs$fillcolor[sc[[i]]] = myColors[i]

plot(g2, "dot", nodeAttrs=nattrs)

wc = connComp(g2)

shortest path algorithms

Different algorithms for different types of graphs
o all edge weights the same
o positive edge weights
o real numbers

…and different settings of the problem
o single pair
o single source
o single destination
o all pairs

Functions
bfs
dijkstra.sp
sp.between
johnson.all.pairs.sp

shortest path

set.seed(123)
rg2 = randomEGraph(nodeNames, edges = 100)
fromNode = "N43"
toNode = "N81"
sp = sp.between(rg2, fromNode, toNode)

sp[[1]]$path
 [1] "N43" "N08" "N88"
 [4] "N73" "N50" "N89"
 [7] "N64" "N93" "N32"
[10] "N12" "N81"

sp[[1]]$length
[1] 10

connectivity

Consider graph g with single connected component.
Edge connectivity of g: minimum number of edges in g that can be cut to produce a graph with two components.
Minimum disconnecting set: the set of edges in this cut.

> edgeConnectivity(g)
$connectivity
[1] 2

$minDisconSet
$minDisconSet[[1]]
[1] "D" "E"

$minDisconSet[[2]]
[1] "D" "H"

Rgraphviz: the different layout engines

ImageMap

lg = agopen(g, …)

imageMap(lg, con=file("imca-frame1.html", open="w"),
         tags=list(HREF = href,
                   TITLE = title,
                   TARGET = rep("frame2", length(AgNode(nag)))),
         imgname=fpng, width=imw, height=imh)

Show Drosophila interaction network example

Combining R graphics and graphviz: custom node drawing functions

Combining: graphviz layout and R plot

Using GO to interpret gene lists

Packages: GOstats, Rgraphviz

A pathway graph

Acknowledgements

Vince Carey
Seth Falcon
Robert Gentleman
Jeff Gentry
Li Long
Denise Scholtens

Bioconductor developers