Dynamic Networks: How Networks Change with Time? Vahid Mirjalili CSE 891.


Overview
• Introduction
• Methodology
– DHAC: clustering in a single snapshot
– MATCH-EM: cluster matching across time frames
• Results
• Discussion
• Further improvement

Motivation
• To infer the dynamic state of a cell in response to physiological changes
• Two algorithms used:
– DHAC: Dynamic Hierarchical Agglomerative Clustering, for clustering time-evolving networks
– MATCH-EM: for matching corresponding clusters across time-points

Background
• Current biological networks are static
• Experimental methods:
– Protein abundance (mass spec.), mainly available for highly abundant proteins
– Transcript abundance (more readily available)
• Previous work: combining transcript abundance and interaction networks to create a moving cell

Dynamic Networks
• Probabilistic framework
• The number of proteins can increase or decrease at each time-point
• Proteins can switch interacting partners
• Complexes can grow/shrink
• Reveals temporal regulation of cell protein state

HAC: Hierarchical Agglomerative Clustering
• Agglomerative = “bottom up” approach
• Divisive = “top down” approach

HAC Features
• Maximizes the likelihood of a hierarchical stochastic block model
• Automatic selection of model size
• Multi-scale networks
• Outperforms other methods in link prediction

Extending HAC to dynamic networks:

• How complexes inferred at one time point correspond to other time points

• Transitions of a protein require dynamic coupling between network snapshots

DHAC:
• Converting likelihood modularity from maximum likelihood to fully Bayesian statistics
• Kernelizing likelihood modularity with an adaptive bandwidth to couple network clusters at different time points

Dynamic Network Clustering
{G(t) = (V(t), E(t)), t = 1 .. T}
V: proteins
E: (undirected, unweighted) protein-protein interactions
• Goal: find the stochastic block models {M(t), t = 1 .. T}, where M(t) is the network generative model for G(t)
• Introducing coupling between time points improves dynamic network clustering

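DHAC: notations
Probability of the network G under a structure model M:
$$P(G \mid M, \pi, \theta) = \prod_{k=1}^{K} \pi_k^{n_k} \prod_{i \le j} \theta_{ij}^{e_{ij}} (1 - \theta_{ij})^{h_{ij}}$$
$\pi_k$: the probability that a vertex is in cluster k, with $\sum_{k=1}^{K} \pi_k = 1$ ($n_k$: number of vertices in cluster k)
$\theta_{ij} \in [0, 1]$ for $u \in i$, $v \in j$: probability of an edge between a vertex in cluster i and a vertex in cluster j
$e_{ij} = \sum_{u \in i,\, v \in j} e_{uv}$, where u, v: nodes; i, j: clusters
$e_{uv}$ (0 or 1): 1 if there is an edge between nodes u and v, 0 otherwise
$e_{ij}$: number of existing edges between clusters i & j
$h_{ij}$: number of non-existing edges (holes)
$t_{ij} = e_{ij} + h_{ij}$: total number of possible edges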
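The block counts above can be tallied directly from an edge list and a cluster assignment. The following is a minimal Python sketch, not the authors' implementation: the function names `block_counts` and `ml_log_likelihood` are hypothetical, and the likelihood covers only the edge/hole part of the model with each block probability set to its maximum likelihood value (existing edges divided by possible edges).

```python
from itertools import combinations
from math import log

def block_counts(edges, assignment):
    """Return {(i, j): (e, h, t)} for cluster pairs i <= j.

    edges: iterable of (u, v) pairs of an undirected, unweighted graph.
    assignment: dict mapping each vertex to its cluster label.
    e = existing edges, h = holes, t = e + h = possible edges.
    """
    edge_set = {frozenset(p) for p in edges}
    counts = {}
    for u, v in combinations(sorted(assignment), 2):
        i, j = sorted((assignment[u], assignment[v]))
        e, h, t = counts.get((i, j), (0, 0, 0))
        is_edge = frozenset((u, v)) in edge_set
        counts[(i, j)] = (e + is_edge, h + (not is_edge), t + 1)
    return counts

def ml_log_likelihood(counts):
    """Log-likelihood of the edge/hole pattern with each block probability at e/t."""
    total = 0.0
    for e, h, t in counts.values():
        if e:
            total += e * log(e / t)
        if h:
            total += h * log(h / t)
    return total

# Toy example: a triangle (cluster 0) joined by one edge to a pair (cluster 1).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
toy_assignment = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
toy_counts = block_counts(toy_edges, toy_assignment)
toy_ll = ml_log_likelihood(toy_counts)
```

Blocks that are all edges or all holes contribute zero to the log-likelihood, so only the mixed block between the two clusters lowers the score in this toy case.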

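Merging Clusters
• To merge clusters 1 & 2 into 1' (with $n_{1'} = n_1 + n_2$), compute the merging score $\lambda^{s}_{12}$:

Maximum likelihood
$$\lambda^{s}_{12} = \ln \frac{n_{1'}^{\,n_{1'}}}{n_1^{\,n_1}\, n_2^{\,n_2}} + \sum_{j} \ln \frac{P^{ML}_{1'j}}{P^{ML}_{1j}\, P^{ML}_{2j}}, \qquad P^{ML}_{ij} = \left(\frac{e_{ij}}{t_{ij}}\right)^{e_{ij}} \left(\frac{h_{ij}}{t_{ij}}\right)^{h_{ij}}$$

Bayesian
$$\lambda^{s}_{12} = \ln \left[ \frac{P^{B}_{1'1'}}{P^{B}_{11}\, P^{B}_{12}\, P^{B}_{22}} \prod_{j \ne 1,2} \frac{P^{B}_{1'j}}{P^{B}_{1j}\, P^{B}_{2j}} \right]$$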
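The contribution of the other clusters j to the maximum likelihood merging score can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `log_p_ml` and `neighbor_merge_term` are hypothetical, the inputs are the edge and hole counts of clusters 1 and 2 against each other cluster j, and the cluster-size term and the within-cluster blocks are deliberately left out.

```python
from math import log

def log_p_ml(e, h):
    """ln of (e/t)^e (h/t)^h with t = e + h; empty factors contribute 0."""
    t = e + h
    out = 0.0
    if e:
        out += e * log(e / t)
    if h:
        out += h * log(h / t)
    return out

def neighbor_merge_term(counts_1, counts_2):
    """Sum over other clusters j of ln[P_1'j / (P_1j * P_2j)].

    counts_1[j] = (edges, holes) between cluster 1 and cluster j;
    counts_2[j] likewise for cluster 2. After merging, the counts add.
    """
    score = 0.0
    for j in counts_1:
        e1, h1 = counts_1[j]
        e2, h2 = counts_2[j]
        score += log_p_ml(e1 + e2, h1 + h2) - log_p_ml(e1, h1) - log_p_ml(e2, h2)
    return score

# Clusters 1 and 2 connect to j in the same way: merging costs nothing.
same_pattern = neighbor_merge_term({"j": (2, 0)}, {"j": (2, 0)})
# Cluster 1 is fully connected to j, cluster 2 not at all: merging is penalized.
opposite_pattern = neighbor_merge_term({"j": (2, 0)}, {"j": (0, 2)})
```

In this sketch the term is never positive, because a single pooled edge probability cannot fit two blocks better than two separately fitted probabilities. Clusters with matching connectivity patterns are therefore the cheapest to merge.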

Kernelization
• Kernel reweighting: to couple nearby snapshots
$$\lambda^{K}_{12}(t) = \sum_{s=1}^{T} w_{t,s}\, \lambda^{s}_{12}$$
$w_{t,s}$: Gaussian radial basis function with width $\sigma$
$\lambda^{s}$: single snapshot; $\lambda^{K}$: kernelized
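Kernel reweighting amounts to a weighted sum of single-snapshot merging scores, with weights that decay with the distance between time points. The Python sketch below is illustrative: the function names are hypothetical, the Gaussian form exp(-(t - s)^2 / (2 sigma^2)) is the usual radial basis function, and normalizing the weights to sum to 1 is an assumption made here, not something stated on the slide.

```python
from math import exp

def gaussian_weights(t, T, sigma):
    """Weights for snapshots s = 1..T, centered on snapshot t.

    Assumption: weights are normalized to sum to 1.
    """
    raw = [exp(-((t - s) ** 2) / (2.0 * sigma ** 2)) for s in range(1, T + 1)]
    norm = sum(raw)
    return [r / norm for r in raw]

def kernelized_score(snapshot_scores, t, sigma):
    """Weighted sum of per-snapshot merging scores for one cluster pair.

    snapshot_scores[s - 1] is the single-snapshot score at time s.
    """
    weights = gaussian_weights(t, len(snapshot_scores), sigma)
    return sum(w * score for w, score in zip(weights, snapshot_scores))

weights = gaussian_weights(t=3, T=5, sigma=1.0)
narrow = kernelized_score([1.0, -2.0, 3.0], t=2, sigma=0.01)
wide = kernelized_score([1.0, -2.0, 3.0], t=2, sigma=100.0)
```

A very small width recovers the single-snapshot score at time t, while a very large width approaches a plain average over all snapshots. This is the trade-off that a variable bandwidth adjusts.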

DHAC Algorithm
for t=1:T do
• Set each vertex to be a single cluster
• Let $\lambda_{cum}(t, \sigma) = 0$ be the cumulative model comparison score
• Compute merging scores of pairs having an edge or a shared neighbor
• repeat
    • Pick a pair i,j of maximum $\lambda^{K}_{ij}(t, \sigma)$
    • Update scores of affected pairs after merging i,j
    • Merge i,j to i'
    • Compute merging scores i',j for all j with $e_{i'j} \ne 0$ or $\sum_{k} e_{i'k}\, e_{kj} \ne 0$
    • Update $\lambda_{cum}(t, \sigma) \leftarrow \lambda_{cum}(t, \sigma) + \lambda^{K}_{ij}$
• until no pairs left
• output $M(t, \sigma)$ at which $\lambda_{cum}(t, \sigma)$ was maximum
end for
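The greedy loop can be illustrated on a single snapshot with a brute-force Python sketch. Several simplifications are assumptions made for illustration rather than details taken from the slides: each block is scored with a Beta-function evidence B(e + 1, h + 1), which is what a uniform prior on the block edge probability gives; all cluster pairs are candidates, not only those with an edge or shared neighbor; scores are recomputed from scratch instead of updated incrementally; and there is no kernel coupling between time points.

```python
from itertools import combinations
from math import lgamma

def log_beta(a, b):
    """ln B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(clusters, edges):
    """Sum over blocks i <= j of ln B(e_ij + 1, h_ij + 1)."""
    total = 0.0
    for a in range(len(clusters)):
        for b in range(a, len(clusters)):
            if a == b:
                pairs = list(combinations(sorted(clusters[a]), 2))
            else:
                pairs = [(u, v) for u in clusters[a] for v in clusters[b]]
            e = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
            h = len(pairs) - e
            total += log_beta(e + 1, h + 1)
    return total

def greedy_agglomerate(vertices, edge_list):
    """Merge the best-scoring pair until one cluster is left.

    Returns the partition at which the cumulative score peaked, and that score.
    """
    edges = {frozenset(p) for p in edge_list}
    clusters = [frozenset([v]) for v in vertices]
    current = log_evidence(clusters, edges)
    cum, best_cum, best = 0.0, 0.0, list(clusters)
    while len(clusters) > 1:
        candidates = []
        for a, b in combinations(range(len(clusters)), 2):
            merged = [c for k, c in enumerate(clusters) if k not in (a, b)]
            merged.append(clusters[a] | clusters[b])
            candidates.append((log_evidence(merged, edges) - current, merged))
        score, clusters = max(candidates, key=lambda c: c[0])
        current += score
        cum += score
        if cum > best_cum:
            best_cum, best = cum, list(clusters)
    return best, best_cum

# Two disconnected triangles should come out as two clusters.
triangles = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
best_partition, best_score = greedy_agglomerate(range(6), triangles)
```

Merging continues past the optimum on purpose: the final merge into a single cluster lowers the cumulative score, and the partition recorded at the peak is the one returned, mirroring the "output the model at which the cumulative score was maximum" step.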

Cluster Matching Algorithm
• Searching through time-frames to see how complexes evolve
• Goal: to find the most probable matching of cluster i to a global index k

Results
• Drosophila development (gene expression data available)

DHAC-local: variable bandwidth

DHAC-const: constant bandwidth

Yeast Metabolic Cycle

Yeast Results
• Yeast results identify protein complexes with asynchronous gene expression
• 31 dynamic protein complexes were recovered
• Many of the complexes have cluster-specific gene-ontology enrichment with P-value < 0.05
• Some of the complexes disappear and then reappear across time-points

Discussion
• DHAC scales as O(EJ ln(V))
• Networks with 2000 vertices take up to 5 min.
• A full genome network (10000 to 100000 vertices) can be analyzed in a day or a week
• This method permits proteins to switch between complexes over time
• A natural multi-scale description: complexes, sub-complexes and proteins

Further improvement
• Information from pathway to complex to sub-complex to finer structures could be used
• The approach lacks a method to match the dynamically evolving hierarchical structures over snapshots
• The authors focused only on the bottom-level complexes, rather than the hierarchical structure

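MATCH-EM
• Goal: match similar groups across time-points
• Find the mapping of each cluster to a global index

The assignment matrix:
$$z^{(t)}_{ik} = \begin{cases} 1 & \text{if cluster } i \text{ is assigned to } k \\ 0 & \text{otherwise} \end{cases} \qquad \sum_{k} z^{(t)}_{ik} = 1$$
There is one and only one global index for cluster i.

$v_{uk}$: the probability that vertex u is in global index k, with $\sum_{k} v_{uk} = 1$
$n^{(t)}_{ij}$: number of shared vertices between cluster i at time t and cluster j at time t+1
$\tau_{kk'}$: probability that a vertex can make a transition from k to k' between two consecutive snapshots

The matching probability under consistent indexing:
$$P(\{M^{(t)}\}, \{z\} \mid v, \tau) = \prod_{k=1}^{K} \prod_{t=1}^{T} \prod_{i \in S_t} \prod_{u \in C^{(t)}_i} v_{uk}^{\,z^{(t)}_{ik}} \;\times\; \prod_{k=1}^{K} \prod_{k'=1}^{K} \prod_{t=1}^{T-1} \prod_{i \in S_t} \prod_{j \in S_{t+1}} \tau_{kk'}^{\,n^{(t)}_{ij}\, z^{(t)}_{ik}\, z^{(t+1)}_{jk'}}$$
where $S_t$ is the set of clusters at time t and $C^{(t)}_i$ is the set of vertices in cluster i at time t.

• Update:
$$v_{uk} \propto \sum_{t=1}^{T} \sum_{i \in S_t} z^{(t)}_{ik}\, I\{u \in C^{(t)}_i\}, \qquad \tau_{kk'} \propto \sum_{t=1}^{T-1} \sum_{i \in S_t} \sum_{j \in S_{t+1}} n^{(t)}_{ij}\, z^{(t)}_{ik}\, z^{(t+1)}_{jk'}$$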
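The transition update can be sketched in Python for the simple case of hard assignments, where each cluster at each time point has exactly one global index. The function names and the list-of-sets data layout are hypothetical, and normalizing each row of the transition table to sum to 1 is an assumption made here so that each row reads as a probability distribution over destinations.

```python
def shared_counts(clusters_t, clusters_next):
    """n[i][j] = number of vertices shared by cluster i at t and cluster j at t+1."""
    return [[len(a & b) for b in clusters_next] for a in clusters_t]

def transition_update(snapshots, z, K):
    """Accumulate shared-vertex counts into a K x K transition table.

    snapshots[t] is a list of vertex sets (the clusters at time t).
    z[t][i] is the global index assigned to cluster i at time t.
    Rows are normalized to sum to 1 (an assumption of this sketch).
    """
    tau = [[0.0] * K for _ in range(K)]
    for t in range(len(snapshots) - 1):
        n = shared_counts(snapshots[t], snapshots[t + 1])
        for i, k in enumerate(z[t]):
            for j, k_next in enumerate(z[t + 1]):
                tau[k][k_next] += n[i][j]
    for row in tau:
        total = sum(row)
        if total:
            row[:] = [x / total for x in row]
    return tau

# Vertex "c" moves from global cluster 0 to global cluster 1 between snapshots.
toy_snapshots = [
    [{"a", "b", "c"}, {"d", "e"}],
    [{"a", "b"}, {"c", "d", "e"}],
]
toy_z = [[0, 1], [0, 1]]
toy_tau = transition_update(toy_snapshots, toy_z, K=2)
```

In the toy case one of the three members of global cluster 0 moves to cluster 1, so the first row of the table comes out as two thirds staying and one third moving.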

Experimental Data
• Combining gene expression time series with static protein interaction networks
• The presence of a protein is assumed to be related to the transcriptional abundance of the corresponding transcript at a nearby time
• N x T matrix: transcription levels of N genes across T time points
• The dynamics of the networks are generated from the transcription matrix, under the assumption that proteins in a complex have correlated gene expression profiles
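One simple way to picture the combination of an expression matrix with a static interaction network is sketched below in Python. This is only an illustration under a strong assumption: a protein is treated as present at time t when its transcript level is at or above a fixed `threshold`, which is a hypothetical rule chosen for this sketch and not the procedure described in the slides. The function name and data layout are likewise hypothetical.

```python
def snapshots_from_expression(static_edges, expression, threshold):
    """Build one edge list per time point from a static network.

    static_edges: list of (u, v) protein-protein interactions.
    expression: dict mapping each gene to a list of T expression levels.
    threshold: hypothetical presence cutoff (assumption of this sketch).
    """
    n_timepoints = len(next(iter(expression.values())))
    snapshots = []
    for t in range(n_timepoints):
        present = {g for g, levels in expression.items() if levels[t] >= threshold}
        snapshots.append(
            [(u, v) for u, v in static_edges if u in present and v in present]
        )
    return snapshots

toy_static = [("A", "B"), ("B", "C")]
toy_expression = {"A": [1.0, 0.1], "B": [0.9, 0.8], "C": [0.2, 0.7]}
toy_networks = snapshots_from_expression(toy_static, toy_expression, threshold=0.5)
```

Each snapshot keeps only the static interactions whose two partners are both present, so the number of proteins and edges can rise and fall from one time point to the next.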

Results: Held-out link prediction

• Randomly select two vertices, and remove the edge

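• After clustering, vertex u is assigned to cluster i, and vertex v to cluster j

• The maximum likelihood probability that u-v were connected:
$$\hat{e}^{(t)}_{uv} = \frac{e^{(t)}_{ij}}{e^{(t)}_{ij} + h^{(t)}_{ij}}, \qquad e_{uv} = \begin{cases} 1 & \text{if connected} \\ 0 & \text{if not connected} \end{cases}$$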
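The held-out prediction reduces to a ratio of counts in the block that contains the two vertices. A minimal Python sketch follows, with a hypothetical function name; it assumes the clustering was obtained on the training network, that is, the network with the held-out edge already removed.

```python
from itertools import combinations

def link_probability(train_edges, assignment, u, v):
    """Edges divided by (edges + holes) in the block holding the pair (u, v).

    train_edges: edge list with the held-out edge already removed.
    assignment: dict mapping each vertex to its cluster label.
    """
    edge_set = {frozenset(p) for p in train_edges}
    target = frozenset((assignment[u], assignment[v]))
    e = h = 0
    for a, b in combinations(sorted(assignment), 2):
        if frozenset((assignment[a], assignment[b])) == target:
            if frozenset((a, b)) in edge_set:
                e += 1
            else:
                h += 1
    return e / (e + h) if (e + h) else 0.0

# Triangle 0-1-2 with edge (0, 1) held out; vertex 3 sits in its own cluster.
toy_train = [(0, 2), (1, 2)]
toy_clusters = {0: 0, 1: 0, 2: 0, 3: 1}
p_within = link_probability(toy_train, toy_clusters, 0, 1)
p_between = link_probability(toy_train, toy_clusters, 0, 3)
```

The held-out pair inside the dense cluster receives a high score because its block is mostly edges, while a pair spanning two unconnected clusters receives zero.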

AUPRC: area under the precision-recall curve

AUROC: area under the receiver operating characteristic curve (generated by true-positive rate and false-positive rate)

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}$$
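The four rates follow directly from the confusion counts at a given score threshold. The Python sketch below uses hypothetical function names; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, FN, TN when predicting a link for scores >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def safe_div(numerator, denominator):
    """Return 0.0 for an empty denominator (a convention of this sketch)."""
    return numerator / denominator if denominator else 0.0

def rates(tp, fp, fn, tn):
    """Precision, recall (equal to TPR), and FPR from confusion counts."""
    return {
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "tpr": safe_div(tp, tp + fn),
        "fpr": safe_div(fp, fp + tn),
    }

toy_scores = [0.9, 0.8, 0.4, 0.2]
toy_labels = [1, 0, 1, 0]
toy_confusion = confusion_counts(toy_scores, toy_labels, threshold=0.5)
toy_rates = rates(*toy_confusion)
```

Sweeping the threshold over all observed scores and plotting precision against recall, or TPR against FPR, gives the two curves whose areas are reported.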

Yeast Metabolic Cycle
• Three dominant metabolic states:
1. Reductive Building (RB): 977 genes
2. Reductive Charging (RC): 1510 genes
3. Oxidative (OX): 1023 genes
• 36 snapshots
• Preprocessing: iterative degree cutoff, reducing the number of proteins from 1380 to 480±14

Macro-view of YMC
[Figure panels: RB phase, OX phase, RC phase]

Micro-views of YMC dynamics
Cluster #7: mitochondrial ribosome complex
1. RSMs: ribosomal small subunits of mitochondria
2. MRPs: mitochondrial ribosomal proteins
• RSM22 is active at t=9, 20 & 32, while other proteins are not transcribed
• Methylation of the 3’-end of the rRNA of the small mitochondrial subunit is required for the assembly and stability of the mitochondrial ribosome
• Deleting RSM22 yields a viable cell with non-functional mitochondria
• Hypothesis: early expression of RSM22 provides the methylation activity required for the assembly of the small subunits of the mitochondrial ribosome

Cluster #7: mitochondrial ribosomal complex

Average expression levels during the three main phases

Cluster #16: nuclear pore
• Active at t=9, 20 & 32
• Most genes are OX-responsive
• Combines with subunits of other complexes
• The co-expressed cores:
– Nuclear pore complex (NPC)
– Karyopherin proteins (KAP)

Micro-views of YMC dynamics

Cluster #16: nuclear pore complex

During the OX phase, SRP1 and SXM1 are additionally recruited

What did we learn from YMC?

• RRP4 and RRP42 are part of the exosome, which edits RNA molecules; they transition between the nuclear pore and other complexes

• RNA processing is tightly coupled to transport through the nuclear pore to cytoplasm

• Dynamic reorganization of the nuclear pore occurs during the metabolic cycle
