Modularity in Biological networks

Hypothesis: Biological functions are carried out by discrete functional modules.

Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999.

Question: Is modularity a myth, or a structural property of biological networks? (Are biological networks fundamentally modular?)

Modularity in Cellular Networks

Traditional view of modularity:

Modularity in cell biology

Definition of a module

• Loosely linked islands of densely connected nodes

• Groups of co-expressed genes

Concept of modules in a network

Computational analysis of modular structures

Data clustering approach

Concept of data clustering analysis

• Partitioning a data set into groups so that points in one group are similar to each other and are as different as possible from the points in other groups.

• The validity of a clustering is often in the eye of the beholder.

• To say whether two data points are similar, we need to define a similarity measure.

• We also need a score function that expresses our objective.

• A clustering algorithm then partitions the data set so that the score function is optimized.
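The three ingredients (similarity measure, score function, algorithm) can be illustrated with a minimal sketch. The choices here are assumptions made only for illustration: Euclidean distance as the similarity measure, the sum of pairwise distances inside each cluster as the score (lower is better), and an exhaustive search as the algorithm, which is only feasible for a handful of points. The names `distance`, `score` and `best_partition` are hypothetical.

```python
import math
from itertools import product

def distance(p, q):
    """Similarity measure (assumed): Euclidean distance, smaller = more similar."""
    return math.dist(p, q)

def score(clusters):
    """Score function (assumed): sum of pairwise distances inside each cluster."""
    total = 0.0
    for cluster in clusters:
        for a in range(len(cluster)):
            for b in range(a + 1, len(cluster)):
                total += distance(cluster[a], cluster[b])
    return total

def best_partition(points, k):
    """Try every assignment of points to k non-empty clusters; keep the best score."""
    best_labels, best_score = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:  # skip assignments that leave a cluster empty
            continue
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
        s = score(clusters)
        if s < best_score:
            best_labels, best_score = labels, s
    return best_labels, best_score

# Toy data: two tight pairs that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels, value = best_partition(points, 2)
```

For realistic data sets the exhaustive search is replaced by a heuristic, which is what the clustering algorithms listed next provide.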

Types of clustering algorithms

• Partition-based clustering algorithms

• Hierarchical clustering algorithms

• Probabilistic model-based clustering algorithms

Partitioning problem

• Given a network of n nodes D = {x(1), x(2), …, x(n)}, our task is to find K clusters C = {C1, C2, …, CK} such that each node x(i) is assigned to a unique cluster Ck and the score function S(C1, C2, …, CK) is optimized.

Community structure of biological network

Community 1

Community 2

Community 3

Score function for network clustering

• Maximize the number of connections within each group and minimize the number of connections between groups.
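As a rough illustration of this objective, the sketch below counts intra-group and inter-group connections for a given assignment of nodes to groups. The edge list, the two candidate assignments and the helper name `edge_counts` are hypothetical.

```python
def edge_counts(edges, community):
    """Return (intra, inter): edges inside a group vs. edges between groups."""
    intra = sum(1 for u, v in edges if community[u] == community[v])
    inter = len(edges) - intra
    return intra, inter

# Toy network: two triangles (0-1-2 and 3-4-5) joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

by_triangle = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one group per triangle
alternating = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # an arbitrary poor split

good = edge_counts(edges, by_triangle)
poor = edge_counts(edges, alternating)
```

The assignment that follows the triangles keeps six edges inside groups and cuts only one, while the alternating assignment cuts five of the seven edges.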

Spectral analysis clustering algorithm

Adjacency Matrix

• Aij = 1 if the i-th protein interacts with the j-th protein

• Aij = 0 otherwise

• Aij = Aji (undirected graph)

• A is a sparse matrix: most of its elements are zero
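A minimal sketch of how such a matrix might be built from a list of interacting protein pairs. Because most entries are zero, a neighbour-set representation is shown alongside the dense matrix. The protein indices and the function names are hypothetical.

```python
def adjacency_matrix(n, interactions):
    """Dense symmetric 0/1 matrix for an undirected interaction network."""
    A = [[0] * n for _ in range(n)]
    for i, j in interactions:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph: Aij = Aji
    return A

def neighbour_sets(n, interactions):
    """Sparse alternative: store only the non-zero entries of each row."""
    neighbours = {i: set() for i in range(n)}
    for i, j in interactions:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return neighbours

# Hypothetical interactions among four proteins labelled 0..3.
interactions = [(0, 1), (1, 2), (2, 3)]
A = adjacency_matrix(4, interactions)
sparse = neighbour_sets(4, interactions)
```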

Spectral analysis

Algorithm (Spectral analysis)

• Start from a randomly assigned vector X = (X1, X2, …, Xn)

• Iterate X(k+1) = A X(k), normalizing X at each step, until it converges

• Repeat with another vector that is perpendicular to the eigenspace found so far
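The iteration can be sketched in plain Python as follows. This is an illustrative power iteration, not necessarily the implementation behind these slides: it runs a fixed number of steps instead of testing convergence, normalizes the vector after every multiplication, and keeps the vector perpendicular to previously found eigenvectors by subtracting their components. All function names are hypothetical.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    return [dot(row, x) for row in A]

def normalize(x):
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]

def remove_components(x, basis):
    """Make x perpendicular to every (unit-length) vector in basis."""
    for b in basis:
        c = dot(x, b)
        x = [xi - c * bi for xi, bi in zip(x, b)]
    return x

def power_iteration(A, found=(), iterations=200, seed=0):
    """Return (eigenvalue, eigenvector) reached from a random start vector."""
    rng = random.Random(seed)
    x = normalize(remove_components([rng.random() for _ in A], found))
    for _ in range(iterations):
        x = normalize(remove_components(matvec(A, x), found))
    eigenvalue = dot(x, matvec(A, x))  # Rayleigh quotient of the unit vector x
    return eigenvalue, x

# Toy example: adjacency matrix of a clique of four nodes.
A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
lam1, v1 = power_iteration(A)
lam2, v2 = power_iteration(A, found=[v1])
```

For this clique the first run finds the positive eigenvalue 3, whose eigenvector weights all four nodes equally, and the second run finds the eigenvalue -1 in the perpendicular subspace. Plain power iteration can fail to settle when two eigenvalues share the largest magnitude, as happens for bipartite graphs.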

Topological Structure

[Figure: original network and its hidden topological structure]

An example

Protein-protein interaction network of Saccharomyces cerevisiae

• Each of the 80000 interactions among 5400 yeast proteins is assigned a confidence value.

• We take the 11855 interactions with high or medium confidence among 2617 proteins, 353 of which have unknown function.

Data source

• Quasi-clique: positive eigenvalue

• Quasi-bipartite: negative eigenvalue

• With the spectral analysis, we obtain 48 quasi-cliques and 6 quasi-bipartite subgraphs.

• A quasi-clique can contain annotated, unannotated, and unknown proteins.

Application: function prediction

Hierarchical clustering algorithm

• A similarity (distance) measure between nodes i and j, d(i,j)

• The similarity measure turns the network into a weighted network Wij.

Types of hierarchical clustering

• Agglomerative hierarchical clustering

• Divisive hierarchical clustering

Properties of similarity measure

• d(i,j)≥0

• d(i,j)=d(j,i)

• d(i,j)≤d(i,k)+d(k,j)

Similarity measure for agglomerative clustering

• Correlation

• Shortest path length

• Edge betweenness

How good is agglomerative clustering?

Hierarchical tree (Dendrogram)

threshold

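Distance between clusters

• Single link: the distance between Cluster 1 and Cluster 2 is the smallest distance between a node of Cluster 1 and a node of Cluster 2.

• Complete link: the distance between Cluster 1 and Cluster 2 is the largest distance between a node of Cluster 1 and a node of Cluster 2.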

[Figure: distance matrix D for five points x1, …, x5 and the single-link dendrogram built from it; leaves in the order x2, x3, x1, x4, x5, with axis values 1.5, 2.0, 2.2, 3.5]
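Agglomerative clustering from a distance matrix can be sketched as below. The 4 × 4 matrix is a made-up example, not the matrix D of the slide, and the function name `agglomerate` is hypothetical. Passing `min` gives single link and passing `max` gives complete link.

```python
def agglomerate(D, link=min):
    """Merge the two closest clusters repeatedly; return the list of merges.

    D    : symmetric distance matrix (list of lists)
    link : min for single link, max for complete link
    """
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = link(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((merged, d))  # members of the new cluster and merge height
        clusters[a] = merged
        del clusters[b]
    return merges

# Hypothetical distances between four points.
D = [[0, 1, 4, 5],
     [1, 0, 2, 6],
     [4, 2, 0, 3],
     [5, 6, 3, 0]]
single = agglomerate(D, link=min)
complete = agglomerate(D, link=max)
```

The merge heights are the values that would be drawn on the dendrogram axis. On this toy matrix the two linkage rules already disagree after the first merge: single link chains point 2 onto the cluster {0, 1}, while complete link pairs it with point 3.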

Divisive hierarchical clustering

M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004)

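Definition of edge betweenness

For a pair of nodes i and j, the contribution of edge k is

$$B_k(i,j) = \frac{\text{number of shortest paths connecting nodes } i \text{ and } j \text{ that pass through edge } k}{\text{number of shortest paths connecting nodes } i \text{ and } j}$$

$$\text{edge betweenness of edge } k = \sum_{i \neq j} B_k(i,j)$$

$$\text{scaled edge betweenness of edge } k = \frac{2}{(N-2)(N-3)} \sum_{i \neq j} B_k(i,j)$$

where N is the number of nodes in the network.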

Calculation of edge betweenness
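One common way to compute edge betweenness on an unweighted network is a breadth-first search from every node followed by a backward accumulation of path fractions (the approach usually attributed to Brandes). The sketch below follows that idea and is not necessarily the procedure used for these slides. It assumes that paths means shortest paths, that each unordered node pair is counted once, and that no scaling is applied. The function name and the toy graph are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """adj: dict mapping each node to the set of its neighbours (undirected)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing out path fractions.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair was visited from both of its ends.
    return {edge: value / 2.0 for edge, value in bet.items()}

# Toy network: two triangles joined by the bridge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bet = edge_betweenness(adj)
bridge = max(bet, key=bet.get)
```

The bridge between the two triangles carries every shortest path that crosses from one triangle to the other, so it gets the highest value. A divisive method removes such edges first.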

Quantitative measurement of network modularity

Modularity Q

$$Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad a_i = \sum_j e_{ij}$$

where $e_{ij}$ is the fraction of edges in the network connecting modules i and j.
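A minimal sketch of how the modularity Q of a given partition might be evaluated, using Q as the sum over modules of (e_ii − a_i²). It assumes the convention in which e_ii is the fraction of edges inside module i and a_i is the fraction of edge ends attached to module i, so that an edge between two different modules contributes half to each of them. The function name and the toy network are hypothetical.

```python
def modularity(edges, community):
    """Q = sum over modules of (e_ii - a_i**2) for an undirected edge list."""
    m = len(edges)
    intra = {}  # number of edges with both ends in the module
    ends = {}   # number of edge ends attached to the module
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)

# Toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
q_whole = modularity(edges, {n: 0 for n in range(6)})
```

Splitting the toy network into its two triangles gives Q = 5/14, while putting every node in one module gives Q = 0. Comparing Q across the levels of a dendrogram is one way to choose where to cut it.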

Threshold selection

Karate club network

Examples of agglomerative hierarchical clustering

Can we identify the modules?

$$O_T(i,j) = \frac{J(i,j)}{\min(k_i, k_j)}$$

J(i,j): number of nodes that both i and j link to, plus 1 if there is a direct (i,j) link; k_i and k_j: degrees of nodes i and j.
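The overlap measure built from J(i,j) can be sketched as follows, assuming it is J(i,j) divided by the smaller of the two node degrees and that both nodes have at least one link. The function name and the toy network are hypothetical.

```python
def topological_overlap(adj, i, j):
    """J(i,j) / min(k_i, k_j) for nodes i and j of an undirected network.

    adj maps each node to the set of its neighbours.
    """
    common = len(adj[i] & adj[j])      # nodes that both i and j link to
    direct = 1 if j in adj[i] else 0   # +1 if i and j are linked directly
    return (common + direct) / min(len(adj[i]), len(adj[j]))

# Toy network: two triangles joined by the bridge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
same_triangle = topological_overlap(adj, 0, 1)
across_bridge = topological_overlap(adj, 2, 3)
far_apart = topological_overlap(adj, 0, 4)
```

Nodes in the same triangle reach the maximum overlap of 1, the two ends of the bridge share no neighbours and score 1/3, and nodes with no link and no common neighbour score 0. The resulting matrix can serve as the similarity measure for agglomerative clustering.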

Modules in the E. coli metabolism

E. Ravasz et al., Science, 2002

Pyrimidine metabolism

Yeast signaling proteins in MIPS

$$A_{ij} = \frac{1}{l_{ij}^2}$$

$l_{ij}$: shortest path between proteins i and j

PNAS, vol.100, pp.1128, (2003).
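A sketch of a path-based association of the form 1/l², where l is the shortest-path length between two proteins, computed by breadth-first search. Two choices here are assumptions: pairs with no connecting path are simply left out (association 0), and the diagonal is not defined. The function names and the toy network are hypothetical.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: number of links on the shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def association(adj):
    """A[(i, j)] = 1 / l_ij**2 for every pair of distinct, connected nodes."""
    A = {}
    for i in adj:
        for j, l in shortest_path_lengths(adj, i).items():
            if i != j:
                A[(i, j)] = 1.0 / l ** 2
    return A

# Toy network: a chain of four proteins 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
A = association(adj)
```

Direct interaction partners get association 1, and the value falls off quickly with distance (1/4 at two links, 1/9 at three), so clustering on this measure emphasizes local neighbourhoods.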

Spotted microarray for Saccharomyces cerevisiae

Similarity measure

Regulatory module network

Genome Biology, 9, R2, (2008).
