Inference in DBNs with non-disjoint clusters Matthieu Pichené
Introduction
[Figure: overview of the method. The biological system (apoptosis pathway, Mcl1) is linked to a mathematical formalism through simulations and analysis; the method is an (approximate) abstraction of the low-level biochemical model.]
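DBNs
[Figure: DBN unrolled over time points t0, t1, t2, t3 for the species S, E, ES, P of the reaction S + E <-> ES -> P + E (rate constants k+1, k-1, k+2); each species takes values in {1, 2, 3, 4, 5}]
Every species at time point t is a random variable over a discrete number of values.
Number of configurations at each time point: $\mathrm{Values}^{\mathrm{Species}}$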
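DBNs
[Figure: the same DBN over t0, t1, t2, t3 for S + E <-> ES -> P + E (k+1, k-1, k+2), now with one CPT per species]
CPT of S (first column: value of S at the next time point; then the current values of S, E, ES):
S'   S   E   ES   Pr
1    1   1   1    0.1
1    2   1   2    0.2
2    2   3   3    0.1
…
CPT of ES: columns ES', S, E, ES, P, Pr (rows 1, 1, 2, …)
CPT of E: columns E', S, E, ES, Pr (rows 1, 1, 2, …)
CPT of P: columns P', ES, P, Pr (rows 1, 1, 2, …)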
DBNs
Complexity of exact inference: at least $\mathrm{Values}^{\mathrm{Species}}$
• We need an approximation: express the probability of a configuration as a product of probabilities.
• Simplest idea: consider all species independent (Factored Frontier).
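Factored Frontier
[Figure: DBN over t0, t1, t2, t3 for S, E, ES, P with S + E <-> ES -> P + E (k+1, k-1, k+2)]
Hypothesis: all species are independent.
$P_{t_2}(P = h) = f\big(P_{t_1}(P),\, P_{t_1}(ES),\, \mathrm{CPT}\big)$
Complexity of FF inference: $\mathrm{Species} \times \mathrm{Values}^{\mathrm{NbPar}+1}$
Low accuracy.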
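To make the Factored Frontier update concrete, here is a minimal sketch of one time step in Python. The function name `ff_step`, the two-valued toy model and its CPT numbers are assumptions chosen for illustration; they are not the CPTs of the slides.

```python
from itertools import product

def ff_step(marginals, parents, cpt):
    """One Factored Frontier step: parents are treated as independent."""
    new_marginals = {}
    for species, pa in parents.items():
        dist = {v: 0.0 for v in marginals[species]}
        # Enumerate every combination of parent values.
        for combo in product(*(marginals[p] for p in pa)):
            assignment = dict(zip(pa, combo))
            # Independence hypothesis: the weight is a product of marginals.
            weight = 1.0
            for p, val in assignment.items():
                weight *= marginals[p][val]
            for v in dist:
                dist[v] += weight * cpt[species](assignment, v)
        new_marginals[species] = dist
    return new_marginals

# Hypothetical toy model with two values (l = low, h = high).
def cpt_es(pa, v):
    # ES keeps its value with probability 0.8.
    return 0.8 if v == pa["ES"] else 0.2

def cpt_p(pa, v):
    # P stays high once high; otherwise it becomes high half the time if ES is high.
    p_high = 1.0 if pa["P"] == "h" else (0.5 if pa["ES"] == "h" else 0.0)
    return p_high if v == "h" else 1.0 - p_high

parents = {"ES": ["ES"], "P": ["P", "ES"]}
cpt = {"ES": cpt_es, "P": cpt_p}
marginals_t1 = {"ES": {"l": 0.6, "h": 0.4}, "P": {"l": 1.0, "h": 0.0}}
marginals_t2 = ff_step(marginals_t1, parents, cpt)
print(marginals_t2)
```

The inner loops visit every combination of parent values and every value of the child, which is where the Values^(NbPar+1) factor per species comes from.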
Clustered Factored Frontier
• Use clusters containing the species that share the most mutual information.
• Clusters may vary over time.
• All combinations of states of the species in a cluster are computed, which limits the size of clusters.
• Use information theory (Eric) to obtain the important relations.
• We (Eric) chose the tree that minimizes the distance.
• A tree implies clusters of size 2.
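One common way to obtain such a tree is a maximum-weight spanning tree over pairwise mutual information (the Chow-Liu construction). The slides do not say which construction or distance was used, so the sketch below is an assumption, and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (bits) between two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def max_spanning_tree(samples):
    """Kruskal's algorithm on mutual information; each edge is a cluster of size 2."""
    names = list(samples)
    edges = sorted(((mutual_information(samples[a], samples[b]), a, b)
                    for a, b in combinations(names, 2)), reverse=True)
    root = {n: n for n in names}

    def find(n):
        # Follow parent links up to the representative of the component.
        while root[n] != n:
            n = root[n]
        return n

    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # keep the edge only if it joins two components
            root[ra] = rb
            tree.append((a, b))
    return tree

# Hypothetical observations of three species over eight simulations.
samples = {
    "S":  [0, 0, 0, 0, 1, 1, 1, 1],
    "ES": [0, 0, 0, 1, 1, 1, 1, 0],
    "P":  [0, 0, 0, 1, 1, 1, 0, 0],
}
tree = max_spanning_tree(samples)
print(tree)
```

With these samples the weakest relation (S with P) is the one left out of the tree, which is the sense in which relations outside the tree are treated as negligible.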
[Figure: two panels over the species of the apoptosis pathway, color scale from 0 to 3. Panel titles: "Mutual information on the whole graph" and "Mutual information on the tree approximation".]
Species correlations (Eric)
Clustered Factored Frontier
Hypothesis: we assume that relations not in the tree are irrelevant.
$\Pr(S_t = h, ES_t = l, E_t = m, P_t = h) = \dfrac{\Pr(S_t = h, ES_t = l)\,\Pr(ES_t = l, E_t = m)\,\Pr(ES_t = l, P_t = h)}{\Pr(ES_t = l)^2}$
[Figure: tree with ES linked to S, E and P]
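The factorization can be checked numerically. The sketch below builds a joint distribution that really is tree-structured around ES, extracts the three pairwise marginals, and rebuilds the joint from them. All probabilities are hypothetical values chosen for illustration.

```python
from itertools import product

VALUES = ("l", "h")

# Hypothetical tree-structured model: S, E and P each depend only on ES.
p_es = {"l": 0.7, "h": 0.3}
p_s_given_es = {"l": {"l": 0.2, "h": 0.8}, "h": {"l": 0.9, "h": 0.1}}
p_e_given_es = {"l": {"l": 0.4, "h": 0.6}, "h": {"l": 0.5, "h": 0.5}}
p_p_given_es = {"l": {"l": 0.95, "h": 0.05}, "h": {"l": 0.3, "h": 0.7}}

# Full joint, indexed by (s, es, e, p).
joint = {
    (s, es, e, p): p_es[es] * p_s_given_es[es][s]
    * p_e_given_es[es][e] * p_p_given_es[es][p]
    for s, es, e, p in product(VALUES, repeat=4)
}

def marginal(dist, keep):
    """Sum out every position of the configuration not listed in keep."""
    out = {}
    for config, pr in dist.items():
        key = tuple(config[i] for i in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

pr_s_es = marginal(joint, (0, 1))
pr_es_e = marginal(joint, (1, 2))
pr_es_p = marginal(joint, (1, 3))
pr_es = marginal(joint, (1,))

def tree_joint(s, es, e, p):
    """Product of the cluster marginals divided by Pr(ES) squared."""
    return (pr_s_es[(s, es)] * pr_es_e[(es, e)] * pr_es_p[(es, p)]
            / pr_es[(es,)] ** 2)

max_error = max(abs(tree_joint(*c) - joint[c]) for c in joint)
print(max_error)
```

When the true joint is not tree-structured, the same formula still yields a distribution, but it only approximates the joint; that gap is the accuracy cost of the hypothesis.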
Apoptosis pathway
[Figure: graph layout of the apoptosis pathway with 57 numbered nodes; both axes range from −1.5 to 1.5]
Node numbering:
1 R, 2 R*, 3 flip, 4 pC8, 5 C8, 6 Bar, 7 pC3, 8 C3, 9 pC6, 10 C6, 11 XIAP, 12 PARP, 13 cPARP, 14 Bid, 15 tBid, 16 Mcl1, 17 Bax, 18 Bax*, 19 Bax*m,
20 Bax2, 21 Bax4, 22 Bcl2, 23 Pore, 24 Pore*, 25 CyCm, 26 CyCr, 27 CyC, 28 Smacm, 29 Smacr, 30 Smac, 31 Apaf, 32 Apaf*, 33 pC9, 34 Apop, 35 C3U,
36 L:R, 37 R*:flip, 38 R*:pC8, 39 C8:Bar, 40 C8:pC3, 41 C3:pC6, 42 C6:pC8, 43 C3:XIAP, 44 C3:PARP, 45 C8:Bid, 46 tBid:Mcl1, 47 tBid:Bax,
48 Bax*m:Bcl2, 49 Bax2:Bcl2, 50 Bax4:Bcl2, 51 Bax4:M, 52 M*:CyCm, 53 M*:Smacm, 54 CyC:Apaf, 55 Apop:pC3, 56 Apop:XIAP, 57 Smac:XIAP
Clustered Factored Frontier
[Figure: DBN over t0, t1, t2, t3 for S, E, ES, P, with CPTs, for S + E <-> ES -> P + E (k+1, k-1, k+2)]
$P_{t_1}(s', es') = \sum_{s,\, es,\, e} P_{t_0}(s, es, e)\,\mathrm{CPT}(s, es, e, s')\,\mathrm{CPT}(s, es, e, es')$
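The cluster update is a sum over parent configurations of the prior times one CPT factor per species in the cluster. A minimal sketch follows, assuming binary values and hypothetical CPTs; the function name `cluster_step` and all numbers are illustrative, not taken from the program described in the slides.

```python
from itertools import product

def cluster_step(prior, children, cpt, values):
    """Joint over the cluster at the next time point.

    prior maps each parent configuration (a tuple) to its probability;
    cpt[child](config, v) is the probability that child takes value v.
    """
    posterior = {}
    for new_values in product(values, repeat=len(children)):
        total = 0.0
        for config, pr in prior.items():
            term = pr
            # One CPT factor per species of the cluster.
            for child, v in zip(children, new_values):
                term *= cpt[child](config, v)
            total += term
        posterior[new_values] = total
    return posterior

# Hypothetical CPTs over parent configurations (s, es, e), values 0 or 1.
def cpt_s(config, v):
    s, es, e = config
    p_one = 0.3 if (s and e) else float(s)
    return p_one if v == 1 else 1.0 - p_one

def cpt_es(config, v):
    s, es, e = config
    p_one = 0.7 if (s and e) else (0.6 if es else 0.0)
    return p_one if v == 1 else 1.0 - p_one

# Hypothetical prior over (s, es, e) at t0.
prior = {(1, 0, 1): 0.5, (0, 1, 0): 0.5}
posterior = cluster_step(prior, ["S", "ES"], {"S": cpt_s, "ES": cpt_es}, (0, 1))
print(posterior)
```

The cost is the number of parent configurations times the number of cluster configurations, which is why the size of the parent set drives the complexity.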
How our algorithm works
Hypothesis:
$\Pr(S_t = h, ES_t = l, E_t = m, P_t = h) = \dfrac{\Pr(S_t = h, ES_t = l)\,\Pr(ES_t = l, E_t = m)\,\Pr(ES_t = l, P_t = h)}{\Pr(ES_t = l)^2}$
[Figure: tree with ES linked to S, E and P]
How to compute P(parents(Cluster))
Proposition:
$P(X_p = v_p, X_L = v_L, X_R = v_R) = \dfrac{P(X_p = v_p, X_L = v_L) \times P(X_p = v_p, X_R = v_R)}{P(X_p = v_p)}$
[Figure: tree with node p and its two children L and R]
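The proposition follows if X_L and X_R are conditionally independent given X_p, which is what the tree structure is assumed to encode. A short derivation under that assumption:

```latex
\begin{align*}
P(X_p = v_p, X_L = v_L, X_R = v_R)
  &= P(X_p = v_p)\, P(X_L = v_L \mid X_p = v_p)\, P(X_R = v_R \mid X_p = v_p) \\
  &= P(X_p = v_p)\,
     \frac{P(X_p = v_p, X_L = v_L)}{P(X_p = v_p)}\,
     \frac{P(X_p = v_p, X_R = v_R)}{P(X_p = v_p)} \\
  &= \frac{P(X_p = v_p, X_L = v_L)\, P(X_p = v_p, X_R = v_R)}{P(X_p = v_p)}
\end{align*}
```

The first line is the only place where the independence assumption is used; the remaining steps are the definition of conditional probability.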
Parent_Cluster = the set of nodes needed to apply the CPTs of the cluster.
Independence between trees.
Complexity: $\mathrm{Species} \times \mathrm{Values}^{\mathrm{Parents\_Cluster}+1}$
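Algorithm comparison
              FF                                                         Clustered FF                                                        Exact computation
Complexity    $\mathrm{Species} \times \mathrm{Values}^{\mathrm{NbParents}}$    $\mathrm{Species} \times \mathrm{Values}^{\mathrm{Parents\_Cluster}+1}$    $> \mathrm{Values}^{\mathrm{Species}}$
Accuracy      Low                                                        ? (but better than FF)                                              Exact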
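To give a feel for the orders of magnitude, the sketch below evaluates the three complexity expressions of the comparison. It takes 57 species (the node count of the pathway figure) and 5 values per species (as in the DBN figure); the parent counts are hypothetical, since the slides do not give them.

```python
def exact_cost(species, values):
    # Lower bound for exact inference: one entry per joint configuration.
    return values ** species

def ff_cost(species, values, nb_parents):
    # Factored Frontier, as written in the comparison.
    return species * values ** nb_parents

def clustered_ff_cost(species, values, parents_cluster):
    # Clustered Factored Frontier, as written in the comparison.
    return species * values ** (parents_cluster + 1)

species, values = 57, 5
nb_parents = 3        # hypothetical number of parents per species
parents_cluster = 6   # hypothetical size of a cluster's parent set

costs = {
    "FF": ff_cost(species, values, nb_parents),
    "Clustered FF": clustered_ff_cost(species, values, parents_cluster),
    "Exact": exact_cost(species, values),
}
for name, cost in costs.items():
    print(f"{name}: {cost:.3e}")
```

Under these assumptions the clustered variant costs a few hundred times more than FF, while the exact bound is out of reach by dozens of orders of magnitude.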
Conclusion
• Our program is still being written; the results will tell whether its accuracy is good.
• Once the first results are obtained, we will extend it to accept larger clusters and non-tree graphs.
How our algorithm works
Order S x N
• For each time point T, groups of clusters are found.
• The most efficient path to calculate each cluster is found.
• Probabilities are calculated using the CPTs.
• Results are saved; cluster probabilities are kept in memory.
Clustered Factored Frontier
Example: A <-> A*

CPT:
Current state      Next A            Next A*
A = h, A* = l      h: 98%, l: 2%     h: 2%, l: 98%
A = l, A* = h      h: 2%, l: 98%     h: 98%, l: 2%

Initial distribution: 50% (A = h, A* = l), 50% (A = l, A* = h)

Distribution after one step, from each initial state:
Next state         From A = h, A* = l    From A = l, A* = h
A = h, A* = l      96.04%                0.04%
A = l, A* = h      0.04%                 96.04%
A = h, A* = h      1.96%                 1.96%
A = l, A* = l      1.96%                 1.96%

Combined over the initial distribution:
48.04% A = h, A* = l
48.04% A = l, A* = h
1.96% A = h, A* = h
1.96% A = l, A* = l
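The numbers of this example can be recomputed directly from the CPT and the initial distribution. The sketch below also computes the product of the two marginals, which is what a fully factored representation would keep; that comparison is an added illustration, not part of the slides.

```python
# Transition probabilities read off the CPT: for each current state
# (A, A*), the distribution of the next value of A and of A*.
p_a = {("h", "l"): {"h": 0.98, "l": 0.02},
       ("l", "h"): {"h": 0.02, "l": 0.98}}
p_a_star = {("h", "l"): {"h": 0.02, "l": 0.98},
            ("l", "h"): {"h": 0.98, "l": 0.02}}

# Initial distribution over (A, A*).
prior = {("h", "l"): 0.5, ("l", "h"): 0.5}

# Joint over the cluster {A, A*} after one step.
cluster = {}
for config, pr in prior.items():
    for a in "hl":
        for a_star in "hl":
            key = (a, a_star)
            cluster[key] = (cluster.get(key, 0.0)
                            + pr * p_a[config][a] * p_a_star[config][a_star])

# Marginals of A and A*, and the joint they would give if assumed independent.
marg_a = {v: sum(p for (a, _), p in cluster.items() if a == v) for v in "hl"}
marg_a_star = {v: sum(p for (_, s), p in cluster.items() if s == v) for v in "hl"}
independent = {(a, s): marg_a[a] * marg_a_star[s] for a in "hl" for s in "hl"}

for key in cluster:
    print(key, f"cluster={cluster[key]:.4f}", f"independent={independent[key]:.4f}")
```

Keeping A and A* in one cluster preserves the fact that they are almost always in opposite states, whereas the independent product spreads the mass evenly over all four configurations.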