Modularity in Biological Networks
Dec 20, 2015
Hypothesis: Biological functions are carried out by discrete functional modules.
Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999.
Question: Is modularity a myth, or is it a structural property of biological networks? (Are biological networks fundamentally modular?)
Modularity in Cellular Networks
Traditional view of modularity:
Definition of a module
• Loosely linked islands of densely connected nodes
• Groups of co-expressed genes
Concept of data clustering analysis
• Partitioning a data set into groups so that points in one group are similar to each other and are as different as possible from the points in other groups.
• The validity of a clustering is often in the eye of the beholder.
• To decide whether two data points are similar, we need to define a similarity measure.
• We also need a score function that quantifies the clustering objective.
• A clustering algorithm then partitions the data set so that the score function is optimized.
Types of clustering algorithms
• Partition-based clustering algorithms
• Hierarchical clustering algorithms
• Probabilistic model-based clustering algorithms
Partitioning problem
• Given a network of n nodes D = {x(1), x(2), ..., x(n)}, our task is to find K clusters C = {C1, C2, ..., CK} such that each node x(i) is assigned to a unique cluster Ck and the score function S(C1, C2, ..., CK) is optimized.
Score function for network clustering
• Maximize the number of intra-group connections and minimize the number of inter-group connections.
Adjacency Matrix
• Aij = 1 if the ith protein interacts with the jth protein
• Aij = 0 otherwise
• Aij = Aji (undirected graph)
• A is a sparse matrix: most elements Aij are zero
Algorithm (Spectral analysis)
• Start from a random vector X = (X1, X2, ..., Xn)
• Iterate X(k+1) = A X(k), normalizing X after each step, until it converges
• Repeat with another starting vector that is perpendicular to the previously found eigenspace
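The iteration above is power iteration. The following is a minimal sketch in plain Python, not the code used for the original analysis. The function name `power_iteration`, the tolerance, the fixed random seed, and the six-node toy graph (two triangles joined by one edge) are all illustrative assumptions. The vector is normalized at every step so it neither overflows nor vanishes, and vectors passed in `orthogonal_to` (assumed orthonormal) are projected out to look for further eigenvectors. Convergence requires that the dominant eigenvalue of the remaining subspace be unique in magnitude.

```python
import math
import random

def power_iteration(A, orthogonal_to=(), tol=1e-10, max_iter=10_000, seed=0):
    """Return (eigenvalue, unit eigenvector) found by iterating X <- A X."""
    n = len(A)
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]  # random starting vector

    def project_out(v):
        # Remove components along previously found (orthonormal) eigenvectors.
        for u in orthogonal_to:
            dot = sum(vi * ui for vi, ui in zip(v, u))
            v = [vi - dot * ui for vi, ui in zip(v, u)]
        return v

    def normalize(v):
        norm = math.sqrt(sum(vi * vi for vi in v))
        return [vi / norm for vi in v]

    x = normalize(project_out(x))
    for _ in range(max_iter):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        y = normalize(project_out(y))
        change = max(abs(yi - xi) for yi, xi in zip(y, x))
        x = y
        if change < tol:
            break
    # Rayleigh quotient x^T A x gives the eigenvalue estimate for unit x.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(xi * axi for xi, axi in zip(x, Ax))
    return eigenvalue, x

# Hypothetical toy network: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1  # undirected graph: symmetric adjacency matrix

lam, vec = power_iteration(A)
print(round(lam, 4))
```

For this toy graph the leading eigenvalue can be checked by hand: symmetry reduces the eigenvalue equation to λ² − 2λ − 1 = 0, so λ = 1 + √2.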
Data source
• Each of the 80000 interactions among 5400 yeast proteins is assigned a confidence value.
• We take the 11855 interactions with high and medium confidence among 2617 proteins, 353 of which have unknown function.
• With the spectral analysis, we obtain 48 quasi-cliques and 6 quasi-bipartites.
• A quasi-clique can contain annotated, unannotated, and unknown proteins.
Hierarchical clustering algorithm
• Define a similarity (distance) measure d(i,j) between nodes i and j
• The similarity measure turns the network into a weighted network Wij.
Types of hierarchical clustering
• Agglomerative hierarchical clustering
• Divisive hierarchical clustering
Similarity measure for agglomerative clustering
• Correlation
• Shortest path length
• Edge betweenness
[Figure: 5 × 5 distance matrix D for points x1, ..., x5 and the corresponding single-link dendrogram, with leaves ordered x2, x3, x1, x4, x5 and the height axis marked 1.5, 2.0, 2.2, 3.5.]
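Single-link agglomerative clustering repeatedly merges the two clusters whose closest members are nearest to each other. The sketch below is a naive illustration of that rule. The function name `single_link` and the 5 × 5 distance matrix are hypothetical: the matrix was chosen only so that merges happen at heights 1.5, 2.0, 2.2 and 3.5, and it is not the matrix from the slide.

```python
def single_link(dist):
    """Single-link agglomerative clustering on a symmetric distance matrix.

    Returns the merges in order as (height, members_a, members_b).
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster distance is the closest pair of members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, sorted(clusters[a]), sorted(clusters[b])))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical distances; index 0..4 stands for x1..x5.
D = [
    [0.0, 2.0, 2.5, 4.0, 5.0],
    [2.0, 0.0, 1.5, 4.5, 5.5],
    [2.5, 1.5, 0.0, 3.5, 4.2],
    [4.0, 4.5, 3.5, 0.0, 2.2],
    [5.0, 5.5, 4.2, 2.2, 0.0],
]

merges = single_link(D)
for height, left, right in merges:
    print(height, left, right)
```

With these distances, x2 and x3 merge first, x1 joins them next, then x4 and x5 merge, and the two groups are joined last.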
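Definition of edge betweenness
[Figure: example network marking nodes i and j; surviving labels: 5 and 2.]
• $B_k(i,j) = \dfrac{\text{number of paths between } i \text{ and } j \text{ passing through edge } k}{\text{number of paths connecting nodes } i \text{ and } j}$
• Edge betweenness of edge $k$: $\sum_{i \neq j} B_k(i,j)$
• Scaled edge betweenness of edge $k$: $\dfrac{2}{(N-2)(N-3)} \sum_{i \neq j} B_k(i,j)$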
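The betweenness of an edge can be computed directly from path counts. The sketch below assumes that "paths" means shortest paths, as in the usual definition, and sums B_k(i,j) over unordered node pairs without any scaling. The function names and the toy graph (two triangles joined by the bridge 2-3) are illustrative assumptions. It is a brute-force version meant for small graphs only.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): shortest-path lengths and path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma

def edge_betweenness(adj):
    """Sum over node pairs of the fraction of shortest paths using each edge."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    score = {e: 0.0 for e in edges}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t are not connected
        for (u, v) in edges:
            through = 0
            # The edge may be crossed in either direction on the way from s to t.
            for a, b in ((u, v), (v, u)):
                if a in dist_s and b in dist_t and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    through += sigma_s[a] * sigma_t[b]
            score[(u, v)] += through / sigma_s[t]
    return score

# Hypothetical toy network: two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
score = edge_betweenness(adj)
print(max(score, key=score.get))
```

In this toy graph every path between the two triangles must cross the bridge, so the bridge receives the highest score. Removing the highest-scoring edge is the basic step of divisive clustering.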
Quantitative measurement of network modularity
Modularity Q
• $Q = \sum_i \left( e_{ii} - a_i^2 \right)$, with $a_i = \sum_j e_{ij}$
• $e_{ij}$ is the fraction of edges in the network connecting modules $i$ and $j$
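As a worked illustration of Q, the sketch below evaluates it for a given assignment of nodes to modules. It assumes the common convention that an edge between two different modules contributes half of its weight to e_ij and half to e_ji, so that all e_ij sum to 1. The function name `modularity` and the toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """Q = sum_i (e_ii - a_i^2) for an undirected edge list and a node -> module map."""
    m = len(edges)
    e = defaultdict(float)  # e[(i, j)]: fraction of edges linking modules i and j
    for u, v in edges:
        cu, cv = module_of[u], module_of[v]
        if cu == cv:
            e[(cu, cu)] += 1.0 / m
        else:
            # Split an inter-module edge evenly so that e stays symmetric.
            e[(cu, cv)] += 0.5 / m
            e[(cv, cu)] += 0.5 / m
    a = defaultdict(float)  # a[i] = sum_j e[(i, j)]
    for (i, _j), frac in e.items():
        a[i] += frac
    return sum(e.get((i, i), 0.0) - a_i ** 2 for i, a_i in a.items())

# Hypothetical toy network: two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}

q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(round(q_two, 4), round(q_one, 4))
```

For the two-triangle split, e_AA = e_BB = 3/7 and a_A = a_B = 1/2, so Q = 2(3/7 − 1/4) = 5/14. Placing every node in a single module gives Q = 0.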
Can we identify the modules?
• Topological overlap: $O_T(i,j) = \dfrac{J(i,j)}{\min(k_i, k_j)}$
• J(i,j): number of nodes that both i and j link to; +1 if there is a direct (i,j) link
• $k_i$: number of links of node $i$
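The overlap measure can be computed directly from neighbor sets. This is a minimal sketch, assuming the network is stored as a dictionary of neighbor sets and that both nodes have at least one link. The function name `topological_overlap` and the toy graph are illustrative.

```python
def topological_overlap(adj, i, j):
    """O_T(i, j) = J(i, j) / min(k_i, k_j) for an undirected graph."""
    common = len(adj[i] & adj[j])    # nodes that both i and j link to
    direct = 1 if j in adj[i] else 0  # +1 for a direct (i, j) link
    return (common + direct) / min(len(adj[i]), len(adj[j]))

# Hypothetical toy network: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

same_triangle = topological_overlap(adj, 0, 1)
across_bridge = topological_overlap(adj, 2, 3)
far_apart = topological_overlap(adj, 0, 5)
print(same_triangle, across_bridge, far_apart)
```

Nodes inside the same triangle overlap fully, the two bridge endpoints overlap only through their direct link, and nodes in different triangles with no shared neighbor have zero overlap. Feeding these values into hierarchical clustering is one way to look for modules.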
Yeast signaling proteins in MIPS
• $A_{ij} = \dfrac{1}{l_{ij}^2}$
• $l_{ij}$: shortest path length between proteins $i$ and $j$
PNAS, vol. 100, p. 1128 (2003).
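The association A_ij = 1 / l_ij² can be sketched with a breadth-first search for the shortest path lengths. The function names and the four-node path graph are hypothetical. The sketch also assumes that the diagonal and unreachable pairs are simply left out, a choice the slide does not specify.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: number of links on the shortest path to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def association_matrix(adj):
    """Return {(i, j): 1 / l_ij**2} for all connected pairs with i != j."""
    A = {}
    for i in adj:
        dist = shortest_path_lengths(adj, i)
        for j, l_ij in dist.items():
            if i != j:
                A[(i, j)] = 1.0 / l_ij ** 2
    return A

# Hypothetical toy network: a path 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
A = association_matrix(adj)
print(A[(0, 1)], A[(0, 2)], A[(0, 3)])
```

Directly linked proteins get an association of 1, and the value falls off quickly with distance, so distant pairs contribute little to the clustering.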