Dynamic Networks: How Networks Change with Time? Vahid Mirjalili CSE 891
Jan 18, 2018
Overview
• Introduction
• Methodology
– DHAC: clustering in a single snapshot
– MATCH-EM: cluster matching across different time frames
• Results
• Discussion
• Further improvement
Motivation
• To infer the dynamic state of a cell in response to physiological changes
• Two algorithms used:
– DHAC: Dynamic Hierarchical Agglomerative Clustering, for clustering time-evolving networks
– MATCH-EM: for matching corresponding clusters across time-points
Background
• Current biological networks are static
• Experimental methods:
– Protein abundance (mass spec.), mainly available for highly abundant proteins
– Transcript abundance (more readily available)
• Previous work: combining transcript abundance and interaction networks to create a moving picture of the cell
Dynamic Networks
• Probabilistic framework
• The number of proteins can increase or decrease at each time-point
• Proteins can switch interacting partners
• Complexes can grow/shrink
• Reveals temporal regulation of the cell's protein state
HAC: Hierarchical Agglomerative Clustering
• Agglomerative = "bottom up" approach
• Divisive = "top down" approach
HAC Features
• Maximizes the likelihood of a hierarchical stochastic block model
• Automatic selection of model size
• Multi-scale networks
• Outperforms other methods in link prediction
Extending HAC to dynamic networks:
• How complexes inferred at one time point correspond to other time points
• Transitions of a protein require dynamic coupling between network snapshots
DHAC:
• Converting likelihood modularity from maximum likelihood to fully Bayesian statistics
• Kernelizing likelihood modularity with an adaptive bandwidth to couple network clusters at different time points
Dynamic Network Clustering
{G(t) = (V(t), E(t)), t = 1 .. T}
V: proteins; E: (undirected, unweighted) protein-protein interactions
• Goal: find the stochastic block models {M(t), t = 1 .. T}, where M(t) is the network generative model for G(t)
• Introducing coupling between time points improves dynamic network clustering
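DHAC: notations
\[ P(G \mid M, \theta, \pi) = \prod_{k} \pi_k^{n_k} \prod_{i \le j} \theta_{ij}^{e_{ij}} (1 - \theta_{ij})^{h_{ij}} \]
Probability of the network G under a structure model M
• $\pi_k$: the probability that a vertex is in cluster k, with $\sum_{k=1}^{K} \pi_k = 1$; $n_k$: number of vertices in cluster k
• $\theta_{ij} \in [0, 1]$: edge probability for $u \in i$, $v \in j$, where u, v are nodes and i, j are clusters
• $e_{uv}$: 1 (or 0) if there is an (or no) edge between nodes u, v
• $e_{ij} = \sum_{u \in i,\, v \in j} e_{uv}$: number of existing edges between clusters i & j
• $h_{ij}$: number of non-existing edges (holes)
• $t_{ij}$: total number of possible edges
\[ e_{ij} + h_{ij} = t_{ij} \]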
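The block counts and the resulting log-likelihood can be checked on a toy graph. The following is a minimal sketch, not the authors' implementation: the function names and the toy data are hypothetical, within-cluster pairs are counted as n(n-1)/2 (an assumption of no self-loops), and the parameters are set to their maximum likelihood values (block edge probability e/t, cluster probability n_k/N).

```python
from collections import Counter
from math import log

def block_counts(edges, labels):
    """Return (e, h, t): existing edges, holes and possible edges per cluster pair (i, j), i <= j."""
    sizes = Counter(labels.values())
    clusters = sorted(sizes)
    t = {}
    for a, i in enumerate(clusters):
        for j in clusters[a:]:
            # Assumption: no self-loops, so a cluster of n vertices has n(n-1)/2 internal pairs.
            t[(i, j)] = sizes[i] * (sizes[i] - 1) // 2 if i == j else sizes[i] * sizes[j]
    e = {pair: 0 for pair in t}
    for u, v in edges:
        i, j = sorted((labels[u], labels[v]))
        e[(i, j)] += 1
    h = {pair: t[pair] - e[pair] for pair in t}
    return e, h, t

def xlog(a, b):
    """a * ln(a / b), with the convention 0 * ln(0) = 0."""
    return a * log(a / b) if a > 0 else 0.0

def ml_log_likelihood(edges, labels):
    """Log-likelihood with edge and cluster probabilities at their ML values."""
    e, h, t = block_counts(edges, labels)
    sizes = Counter(labels.values())
    n_total = len(labels)
    vertex_term = sum(xlog(n_k, n_total) for n_k in sizes.values())
    block_term = sum(xlog(e[p], t[p]) + xlog(h[p], t[p]) for p in t)
    return vertex_term + block_term

# Hypothetical toy network: a triangle {a, b, c}, a pair {d, e}, one bridging edge.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("c", "d")]
toy_labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
toy_e, toy_h, toy_t = block_counts(toy_edges, toy_labels)
toy_ll = ml_log_likelihood(toy_edges, toy_labels)
```

Here the only imperfectly fitted block is the one between the two clusters (1 edge, 5 holes), so it carries the whole block contribution to the log-likelihood.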
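Merging Clusters
• To merge clusters 1 & 2 into 1':
Maximum likelihood
\[ \lambda_{12}^{s} = \ln \frac{n_{1'}^{n_{1'}}}{n_{1}^{n_{1}}\, n_{2}^{n_{2}}} + \sum_{j} \ln \frac{P^{ML}_{1'j}}{P^{ML}_{1j}\, P^{ML}_{2j}} \]
\[ P^{ML}_{ij} = \frac{e_{ij}^{e_{ij}}\, h_{ij}^{h_{ij}}}{t_{ij}^{t_{ij}}} \]
Bayesian
\[ \lambda_{12}^{s} = \ln \left[ \frac{P^{B}_{1'1'}}{P^{B}_{11}\, P^{B}_{12}\, P^{B}_{22}} \prod_{j \ne 1,2} \frac{P^{B}_{1'j}}{P^{B}_{1j}\, P^{B}_{2j}} \right] \]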
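A merging score of the maximum likelihood kind can be illustrated by brute force: compute the ML log-likelihood before and after relabeling one cluster into another and take the difference. This is a sketch under assumptions, not the incremental update used in practice: function names and the toy graph are hypothetical, within-cluster pairs are counted as n(n-1)/2, and the difference should agree with the log-ratio form when the ratio covers every block touching the two merged clusters.

```python
from collections import Counter
from math import log

def xlog(a, b):
    """a * ln(a / b), with the convention 0 * ln(0) = 0."""
    return a * log(a / b) if a > 0 else 0.0

def ml_loglik(edges, labels):
    """ML log-likelihood of a partition: vertex term plus block terms."""
    sizes = Counter(labels.values())
    n_total = len(labels)
    e = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    ll = sum(xlog(n_k, n_total) for n_k in sizes.values())
    clusters = sorted(sizes)
    for a, i in enumerate(clusters):
        for j in clusters[a:]:
            # Assumption: no self-loops inside a cluster.
            t = sizes[i] * (sizes[i] - 1) // 2 if i == j else sizes[i] * sizes[j]
            ll += xlog(e[(i, j)], t) + xlog(t - e[(i, j)], t)
    return ll

def merge_score(edges, labels, a, b):
    """Change in ML log-likelihood when cluster b is merged into cluster a."""
    merged = {u: (a if c == b else c) for u, c in labels.items()}
    return ml_loglik(edges, merged) - ml_loglik(edges, labels)

# Hypothetical toy graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2, 5: 2}
score_same_triangle = merge_score(edges, labels, 0, 1)   # vertex 2 joins {0, 1}
score_across_bridge = merge_score(edges, labels, 1, 2)   # vertex 2 joins {3, 4, 5}
```

On this toy graph, completing the first triangle scores higher than pulling vertex 2 across the bridge, which is the behavior a greedy agglomerative pass relies on.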
Kernelization
• Kernel reweighting: to couple nearby snapshots
\[ \lambda_{12}^{K}(t) = \sum_{s=1}^{T} w_{t,s}\, \lambda_{12}^{s} \]
• $w_{t,s}$: Gaussian radial basis function whose width is the kernel bandwidth
• $\lambda^{s}$: single snapshot; $\lambda^{K}$: kernelized
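The kernel reweighting can be sketched in a few lines. This is an illustration under assumptions: the parameter name `width`, the exact Gaussian form exp(-(t-s)^2 / (2 width^2)), and the normalization of the weights to sum to 1 are choices made here, not details taken from the slides.

```python
from math import exp

def gaussian_weights(t, n_snapshots, width):
    """Weights w_{t,s} for s = 1..T from a Gaussian radial basis function centered at t."""
    raw = [exp(-((t - s) ** 2) / (2.0 * width ** 2)) for s in range(1, n_snapshots + 1)]
    total = sum(raw)
    # Assumption: weights are normalized so that they sum to 1 over s.
    return [w / total for w in raw]

def kernelized_score(t, snapshot_scores, width):
    """Weighted sum over snapshots s of w_{t,s} times the single-snapshot score."""
    weights = gaussian_weights(t, len(snapshot_scores), width)
    return sum(w * score for w, score in zip(weights, snapshot_scores))

# Hypothetical single-snapshot merging scores for one cluster pair at T = 3 time points.
scores = [1.0, 5.0, -3.0]
narrow = kernelized_score(2, scores, width=0.01)   # effectively only snapshot 2 counts
broad = kernelized_score(2, scores, width=100.0)   # close to the plain average
```

A narrow width recovers independent clustering of each snapshot, while a broad width pools all snapshots; a locally adapted bandwidth sits between these extremes.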
DHAC Algorithm
for t = 1:T do
• Set each vertex to be a single cluster
• Let $\lambda_{cum} = 0$ be the cumulative model comparison score
• Compute merging scores $\lambda^{K}_{ij}(t)$ of pairs having an edge or a shared neighbor
• repeat
– Pick a pair i,j of maximum $\lambda^{K}_{ij}(t)$
– Update scores of affected pairs after merging i,j
– Merge i,j to i'
– Compute merging scores i',j for all j with $e_{i'j} > 0$ or $\sum_{k} e_{i'k} e_{kj} > 0$
– Update $\lambda_{cum} \leftarrow \lambda_{cum} + \lambda^{K}_{ij}(t)$
• until no pairs left
• output M(t) at which $\lambda_{cum}(t)$ was maximum
end for
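The greedy loop can be sketched for a single snapshot. Several simplifications are assumptions made for brevity: there is no kernel coupling, all scores are recomputed from scratch instead of updated incrementally, candidate pairs are limited to clusters that share an edge (the shared-neighbor case is skipped), and the Bayesian block term is taken to be the Beta function B(e+1, h+1), which corresponds to a uniform prior on each block's edge probability and is not stated in the slides. All function names are hypothetical.

```python
from collections import Counter
from math import lgamma

def log_evidence(edges, labels):
    """Sum over cluster pairs i <= j of ln B(e_ij + 1, h_ij + 1)."""
    sizes = Counter(labels.values())
    e = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    clusters = sorted(sizes)
    total = 0.0
    for a, i in enumerate(clusters):
        for j in clusters[a:]:
            # Assumption: no self-loops inside a cluster.
            t = sizes[i] * (sizes[i] - 1) // 2 if i == j else sizes[i] * sizes[j]
            e_ij = e[(i, j)]
            total += lgamma(e_ij + 1) + lgamma(t - e_ij + 1) - lgamma(t + 2)
    return total

def merge(labels, a, b):
    """Relabel every vertex of cluster b into cluster a."""
    return {u: (a if c == b else c) for u, c in labels.items()}

def greedy_clustering(nodes, edges):
    labels = {u: u for u in nodes}          # each vertex starts as its own cluster
    cum = best_cum = 0.0                    # cumulative model comparison score
    best = dict(labels)
    while True:
        pairs = {tuple(sorted((labels[u], labels[v])))
                 for u, v in edges if labels[u] != labels[v]}
        if not pairs:                       # until no pairs left
            break
        base = log_evidence(edges, labels)
        score, a, b = max((log_evidence(edges, merge(labels, a, b)) - base, a, b)
                          for a, b in pairs)
        labels = merge(labels, a, b)
        cum += score
        if cum > best_cum:                  # remember the partition with the best score
            best_cum, best = cum, dict(labels)
    return best, best_cum

# Hypothetical toy graph: two triangles joined by the edge 2-3.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
partition, total_score = greedy_clustering(range(6), toy_edges)
```

The loop keeps merging until a single cluster remains, but reports the partition at which the cumulative score peaked, which is how the model size is selected automatically.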
Cluster Matching Algorithm
• Searching through time-frames to see how complexes evolve
• Goal: to find the most probable matching of cluster i to a global index k
Results
• Drosophila development (gene expression data available)
DHAC-local: variable bandwidth
DHAC-const: constant bandwidth
Yeast Metabolic Cycle
Yeast Results
• Yeast results identify protein complexes with asynchronous gene expression
• 31 dynamic protein complexes were recovered
• Many of the complexes have cluster-specific gene-ontology terms with P-value < 0.05
• Some of the complexes disappear and then reappear across time-points
Discussion
• DHAC scales as O(EJ ln(V))
• Networks with 2000 vertices take up to 5 min.
• A full genome network (10000 to 100000 vertices) can be analyzed in a day to a week
• This method permits proteins to switch between complexes over time
• A natural multi-scale description: complexes, sub-complexes and proteins
Further improvement
• Information from pathway to complex to sub-complex to finer structures could be used
• The approach lacks a method to match the dynamically evolving hierarchical structures over snapshots
• The authors focused only on the bottom-level complexes, rather than the hierarchical structure
MATCH-EM
• Goal: match similar groups across time-points
• Find the mapping of each cluster to a global index
The assignment matrix
\[ z_{ik}^{(t)} = \begin{cases} 1 & \text{if cluster } i \text{ is assigned to } k \\ 0 & \text{otherwise} \end{cases} \qquad \sum_{k} z_{ik}^{(t)} = 1 \]
There is one and only one global index for cluster i
• $v_{uk}$: the probability that vertex u is in global index k, with $\sum_{k} v_{uk} = 1$
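The matching probability under consistent indexing
\[ P(\{M\}, \{z\} \mid v, \tau) = \prod_{t=1}^{T} \prod_{i \in S_t} \prod_{u \in C_i} \prod_{k=1}^{K} v_{uk}^{\, z_{ik}^{(t)}} \times \prod_{t=1}^{T-1} \prod_{i \in S_t} \prod_{j \in S_{t+1}} \prod_{k=1}^{K} \prod_{k'=1}^{K} \tau_{kk'}^{\, n_{ij}^{(t)} z_{ik}^{(t)} z_{jk'}^{(t+1)}} \]
• $n_{ij}^{(t)}$: number of shared vertices between cluster i at time t and cluster j at time t+1
• $\tau_{kk'}$: probability that a vertex can make a transition from k to k' between two consecutive snapshots
• $S_t$: the clusters at snapshot t; $C_i$: the vertices in cluster i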
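• Update (with $\tau_{kk'}$ the transition probability from global index k to k'):
\[ v_{uk} \propto \sum_{t=1}^{T} \sum_{i \in S_t} z_{ik}^{(t)}\, I\{u \in C_i\} \]
\[ \tau_{kk'} \propto \sum_{t=1}^{T-1} \sum_{i \in S_t} \sum_{j \in S_{t+1}} n_{ij}^{(t)}\, z_{ik}^{(t)}\, z_{jk'}^{(t+1)} \]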
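The two update rules amount to counting. The sketch below assumes hard assignments (each cluster mapped to exactly one global index), normalizes each vertex's memberships to sum to 1, and normalizes each row of the transition table over the destination index; the row normalization, the data layout and all names are assumptions made for illustration.

```python
from collections import defaultdict

def update_parameters(snapshots, z, n_global):
    """One parameter update given cluster-to-global-index assignments.

    snapshots: list over time of {cluster id: set of vertices}
    z:         list over time of {cluster id: global index}
    """
    # Vertex memberships: count how often vertex u sits in a cluster assigned to k.
    counts = defaultdict(lambda: [0.0] * n_global)
    for clusters, assign in zip(snapshots, z):
        for i, members in clusters.items():
            for u in members:
                counts[u][assign[i]] += 1.0
    v = {u: [c / sum(row) for c in row] for u, row in counts.items()}

    # Transitions: shared vertices between cluster i at t and cluster j at t+1.
    tau = [[0.0] * n_global for _ in range(n_global)]
    for t in range(len(snapshots) - 1):
        for i, c_i in snapshots[t].items():
            for j, c_j in snapshots[t + 1].items():
                tau[z[t][i]][z[t + 1][j]] += len(c_i & c_j)
    # Assumption: each row is normalized over the destination index k'.
    tau = [[c / sum(row) if sum(row) else 0.0 for c in row] for row in tau]
    return v, tau

# Hypothetical example: vertex "b" moves from global index 0 to global index 1.
snapshots = [{0: {"a", "b"}, 1: {"c"}}, {0: {"a"}, 1: {"b", "c"}}]
z = [{0: 0, 1: 1}, {0: 0, 1: 1}]
v, tau = update_parameters(snapshots, z, n_global=2)
```

In a full EM loop this step would alternate with re-estimating the assignments z from the current v and transition table.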
Experimental Data
• Combining gene expression time series with static protein interaction networks
• The presence of a protein is assumed to be related to the abundance of the corresponding transcript at a nearby time
• N x T matrix: transcription levels of N genes across T time points
• The dynamics of the networks are generated from the transcription matrix, under the assumption that proteins in a complex have correlated gene expression profiles
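One simple way to turn a static interaction network plus an expression matrix into snapshots is to keep, at each time point, only the proteins whose transcript level passes a cutoff. This thresholding rule is an assumption chosen for illustration and is not the procedure described in the slides, which only state that protein presence is related to transcript abundance at a nearby time. Gene names, the cutoff and the function name are hypothetical.

```python
def snapshot_networks(expression, static_edges, threshold):
    """Build one (present proteins, edges) pair per time point.

    expression:   {gene: [level at t=0, level at t=1, ...]}
    static_edges: list of (gene, gene) interactions from the static network
    """
    n_times = len(next(iter(expression.values())))
    snapshots = []
    for t in range(n_times):
        # Assumption: a protein is present when its transcript level reaches the threshold.
        present = {g for g, levels in expression.items() if levels[t] >= threshold}
        # Keep an interaction only when both partners are present.
        edges = [(u, v) for u, v in static_edges if u in present and v in present]
        snapshots.append((present, edges))
    return snapshots

# Hypothetical 3-gene, 2-time-point example.
expression = {"A": [5, 1], "B": [4, 6], "C": [0, 7]}
static_edges = [("A", "B"), ("B", "C")]
snaps = snapshot_networks(expression, static_edges, threshold=3)
```

Each snapshot can then be clustered on its own or coupled to its neighbors in time.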
Results: Held-out link prediction
• Randomly select two vertices, and remove the edge between them
• After clustering, vertex u is assigned to cluster i, and vertex v to cluster j
• The maximum likelihood probability that u-v were connected:
\[ \hat{e}_{uv}^{(t)} = \frac{e_{ij}^{(t)}}{e_{ij}^{(t)} + h_{ij}^{(t)}}, \qquad e_{uv} = \begin{cases} 1 & \text{if connected} \\ 0 & \text{if not connected} \end{cases} \]
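Scoring a held-out pair only needs the block counts of the two clusters involved. A minimal sketch, with hypothetical names and toy counts, and with the assumption that an empty block (no possible pairs) is given probability 0:

```python
def link_probability(u, v, labels, e, h):
    """Estimated probability e_ij / (e_ij + h_ij) that the held-out pair (u, v) is linked."""
    i, j = sorted((labels[u], labels[v]))
    total = e[(i, j)] + h[(i, j)]
    # Assumption: a block with no possible pairs gets probability 0.
    return e[(i, j)] / total if total else 0.0

# Hypothetical clustering of three vertices and the matching edge/hole counts.
labels = {"a": 0, "b": 0, "c": 1}
e = {(0, 0): 1, (0, 1): 1, (1, 1): 0}
h = {(0, 0): 0, (0, 1): 1, (1, 1): 0}
p_ac = link_probability("a", "c", labels, e, h)
p_ab = link_probability("a", "b", labels, e, h)
```

Ranking all held-out pairs by this probability gives the scores from which precision-recall and ROC curves are built.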
• AUPRC: area under the precision-recall curve
• AUROC: area under the receiver operating characteristic curve (generated by true-positive rate and false-positive rate)
\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN} \]
\[ TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN} \]
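These quantities are easy to compute from scored pairs. The sketch below uses hypothetical scores and labels, takes a fixed threshold for the confusion counts, assumes every denominator is non-zero, and computes AUROC through its pairwise interpretation (the fraction of positive/negative pairs ranked correctly, ties counted as one half) rather than by integrating a curve.

```python
def confusion_rates(scores, truth, threshold):
    """Precision, recall (= TPR) and FPR when predicting a link for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, truth) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, truth) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, truth) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, truth) if s < threshold and y == 0)
    # Assumption: both classes occur and at least one pair is predicted positive.
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

def auroc(scores, truth):
    """Probability that a random true link outscores a random non-link (ties count 0.5)."""
    pos = [s for s, y in zip(scores, truth) if y == 1]
    neg = [s for s, y in zip(scores, truth) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out pairs: predicted link probabilities and true edge indicators.
scores = [0.9, 0.8, 0.4, 0.2]
truth = [1, 0, 1, 0]
rates = confusion_rates(scores, truth, threshold=0.5)
area = auroc(scores, truth)
```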
Yeast Metabolic Cycle
• Three dominant metabolic states:
1. Reductive Building (RB): 977 genes
2. Reductive Charging (RC): 1510 genes
3. Oxidative (OX): 1023 genes
• 36 snapshots
• Preprocessing: iterative degree cutoff, reducing the number of proteins from 1380 to 480±14
Macro-view of YMC
[Figure: panels for the RB phase, OX phase and RC phase]
Micro-views of YMC dynamics
Cluster #7: mitochondrial ribosome complex
1. RSMs: ribosomal small subunits of mitochondria
2. MRPs: mitochondrial ribosomal proteins
• RSM22 is active at t=9, 20 & 32, while other proteins are not transcribed
• Methylation of the 3'-end of the rRNA of the small mitochondrial subunit is required for the assembly and stability of the mitochondrial ribosome
• Deleting RSM22 yields a viable cell with non-functional mitochondria
• Hypothesis: early expression of RSM22 provides the methylation activity required for the assembly of the small subunits of the mitochondrial ribosome
Cluster #7: mitochondrial ribosomal complex
Average expression levels during the three main phases
Cluster #16: nuclear pore
• Active at t=9, 20 & 32
• Most genes are OX-responsive
• Combines with subunits of other complexes
• The co-expressed cores:
– Nuclear pore complex (NPC)
– Karyopherin proteins (KAP)
Micro-views of YMC dynamics
Cluster #16: nuclear pore complex
During the OX phase, SRP1 and SXM1 are additionally recruited
What did we learn from YMC?
• RRP4 and RRP42 are part of the exosome, which edits RNA molecules; they transition between the nuclear pore and other complexes
• RNA processing is tightly coupled to transport through the nuclear pore to cytoplasm
• Dynamic reorganization of the nuclear pore occurs during the metabolic cycle