Reconstructing gene regulatory networks with probabilistic models Marco Grzegorczyk Dirk Husmeier
Jan 14, 2016
[Figure: regulatory network unknown; high-throughput experiments produce postgenomic data, analysed with machine learning and statistics]
Overview
• Introduction
• Bayesian networks
• Comparative evaluation
• Integration of biological prior knowledge
• A non-homogeneous Bayesian network for non-stationary processes
• Current work
Elementary molecular biological processes
Description with differential equations
Rates
Concentrations
Kinetic parameters θ
Given: Gene expression time series
Can we infer the correct gene regulatory network?
Parameters θ known: numerically integrate the differential equations for different hypothetical networks
Model selection for known parameters θ
Compare the gene expression time series predicted with different models against the measured gene expression time series
Highest likelihood: best model
Model selection for unknown parameters θ
Compare the gene expression time series predicted with different models against the measured gene expression time series
Highest likelihood: over-fitting
Bayesian model selection
Select the model with the highest posterior probability:
This requires an integration over the whole parameter space:
This integral is usually intractable
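The two statements above can be written out in standard notation. This is a sketch of the usual textbook form, not a transcription of the slide: M denotes a candidate network, D the data and θ the kinetic parameters, and the symbols are assumptions made here.

```latex
% Select the model with the highest posterior probability
M^{*} = \arg\max_{M} P(M \mid D), \qquad P(M \mid D) \propto P(D \mid M)\, P(M)

% The marginal likelihood integrates over the whole parameter space
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```

Integrating over θ is what penalises over-flexible models: a model that fits well only for a narrow range of θ receives a small marginal likelihood.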
Marginal likelihoods for the alternative pathways
Computationally expensive; network reconstruction ab initio is unfeasible
Objective: Reconstruction of regulatory networks ab initio
Higher level of abstraction: Bayesian networks
Bayesian networks
[Figure: directed acyclic graph with nodes A, B, C, D, E, F; labels mark the nodes and edges]
• Marriage between graph theory and probability theory.
• Directed acyclic graph (DAG) representing conditional independence relations.
• It is possible to score a network in light of the data: P(D|M), where D is the data and M the network structure.
• We can infer how well a particular network explains the observed data.
$$P(A,B,C,D,E,F) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B,C)\,P(E \mid D)\,P(F \mid C,D)$$
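As a minimal sketch of what the factorization P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D) means in practice, the following computes the joint probability of a binary assignment as a product of local conditional probabilities. The conditional probability function `p_one` is purely hypothetical and chosen only for illustration.

```python
from itertools import product
from math import isclose

# Parent sets read off the factorization of the six-node example.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"], "F": ["C", "D"]}

def p_one(parent_values):
    """Hypothetical P(node = 1 | parents): rises with the fraction of active parents."""
    if not parent_values:
        return 0.5
    return 0.2 + 0.6 * sum(parent_values) / len(parent_values)

def joint(assignment):
    """Joint probability of a full 0/1 assignment as a product of local terms."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = p_one([assignment[p] for p in parents])
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob

# Summing the joint over all 2^6 assignments must give 1.
total = sum(joint(dict(zip(PARENTS, values))) for values in product([0, 1], repeat=6))
```

Because every local term is a proper conditional distribution, the product is automatically normalised, which the sum in `total` checks.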
Bayes net
ODE model
Linear model: [A] = w1[P1] + w2[P2] + w3[P3] + w4[P4] + noise
[Figure: node A with parents P1, P2, P3, P4 and edge weights w1, w2, w3, w4]
Nonlinear discretized model
[Figure: regulators P1 and P2 acting as activator and repressor; panels show activation and inhibition]
Allow for noise: probabilities
Conditional multinomial distribution
Model parameters θ
Integral analytically tractable!
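To illustrate why the integral becomes tractable, here is a minimal sketch of the closed-form log marginal likelihood for one node with a conditional multinomial distribution. It assumes a Dirichlet prior with the same pseudo-count `alpha` in every cell; the slides do not name the prior, so this choice and all function and variable names are assumptions.

```python
from collections import Counter
from math import isclose, lgamma, log

def log_marginal_likelihood(child, parents, data, n_states=2, alpha=1.0):
    """Closed-form log P(child column | parent columns) under a Dirichlet prior.

    data: list of dicts mapping variable name -> discrete state in range(n_states).
    alpha: assumed Dirichlet pseudo-count per (parent configuration, state) cell.
    """
    counts = Counter()
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
    # Unobserved parent configurations contribute zero, so only observed ones are visited.
    total = 0.0
    for config in {config for config, _ in counts}:
        n_j = sum(counts[(config, k)] for k in range(n_states))
        total += lgamma(n_states * alpha) - lgamma(n_states * alpha + n_j)
        for k in range(n_states):
            total += lgamma(alpha + counts[(config, k)]) - lgamma(alpha)
    return total

# Toy data in which B always copies A.
toy = [{"A": 0, "B": 0}, {"A": 1, "B": 1}] * 4
with_parent = log_marginal_likelihood("B", ["A"], toy)
without_parent = log_marginal_likelihood("B", [], toy)
```

On the toy data the score with A as a parent of B is higher than the score without it, so the marginal likelihood prefers the structure that explains the dependence.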
Example: 2 genes, 16 different network structures
Best network: maximum score
Identify the best network structure
Ideal scenario: Large data sets, low noise
Uncertainty about the best network structure
Limited number of experimental replications, high noise
Sample of high-scoring networks
Feature extraction, e.g. marginal posterior probabilities of the edges
High-confidence edge
High-confidence non-edge
Uncertainty about edges
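The feature extraction step can be sketched as follows: given a sample of networks, the marginal posterior probability of an edge is estimated by the fraction of sampled networks that contain it. Representing each network as a set of (parent, child) pairs is an assumption made for this illustration.

```python
from collections import Counter
from math import isclose

def edge_posteriors(sampled_networks):
    """Estimate marginal edge posteriors as relative frequencies in the sample."""
    counts = Counter(edge for network in sampled_networks for edge in set(network))
    n = len(sampled_networks)
    return {edge: count / n for edge, count in counts.items()}

# Hypothetical sample of four networks over the genes A, B, C.
sample = [
    {("A", "B")},
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    set(),
]
posteriors = edge_posteriors(sample)
```

Edges that never occur in the sample are absent from the result and have an estimated probability of zero.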
Can we generalize this scheme to more than 2 genes?
In principle yes.
However …
[Plot: number of structures against number of nodes]
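To give a feel for the growth, the number of directed acyclic graphs on n labelled nodes can be computed with Robinson's recurrence. This sketch counts static DAGs without self-loops; other model classes (for example, ones that allow feedback over time) give different counts.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labelled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

# Counts for 1 to 6 nodes show the super-exponential growth.
growth = [count_dags(n) for n in range(1, 7)]
```

Already at six nodes there are several million candidate structures, which is why exhaustive scoring quickly stops being an option.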
Complete enumeration unfeasible: hill climbing
Accept a move when the score increases
Configuration space of network structures
Local optimum
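The local-optimum problem can be seen in a deliberately simplified sketch. Here the configuration space is a one-dimensional list of hypothetical scores rather than a space of network structures, and a move goes to the better of the two neighbouring positions.

```python
def hill_climb(score, start):
    """Greedy ascent: move to the best neighbour until no neighbour improves the score."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1) if 0 <= i < len(score)]
        best = max(neighbours, key=lambda i: score[i])
        if score[best] <= score[current]:
            return current
        current = best

# Hypothetical landscape with a local peak at index 1 and the global peak at index 5.
landscape = [1, 3, 2, 1, 4, 6, 5]
stuck = hill_climb(landscape, start=0)
found = hill_climb(landscape, start=3)
```

Starting at index 0 the search stops on the lower peak, while starting at index 3 it reaches the global one, so the result depends on the initialisation.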
Configuration space of network structures
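MCMC: local change
If P(M_new|D) ≥ P(M_old|D): accept
If P(M_new|D) < P(M_old|D): accept with probability P(M_new|D) / P(M_old|D)
Algorithm converges to the posterior distribution P(M|D)
Madigan & York (1995), Giudici & Castelo (2003)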
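A minimal sketch of such a sampler is given below. It is a simplified Metropolis scheme, not the cited authors' implementation: a move toggles one uniformly chosen edge, proposals that create a cycle are rejected, and the log score `toy_log_score` is a hypothetical stand-in for log P(D|M) + log P(M).

```python
import math
import random

NODES = ["A", "B", "C"]
TRUE_EDGES = {("A", "B"), ("B", "C")}  # hypothetical target used only by the toy score

def toy_log_score(edges):
    """Hypothetical log score: rewards the two target edges, penalises all others."""
    return sum(2.0 if e in TRUE_EDGES else -2.0 for e in edges)

def is_acyclic(edges, nodes):
    """Depth-first search; returns False as soon as a directed cycle is found."""
    children = {v: [b for a, b in edges if a == v] for v in nodes}
    state = {v: 0 for v in nodes}  # 0 = unvisited, 1 = on current path, 2 = finished
    def visit(v):
        state[v] = 1
        for w in children[v]:
            if state[w] == 1 or (state[w] == 0 and not visit(w)):
                return False
        state[v] = 2
        return True
    return all(state[v] != 0 or visit(v) for v in nodes)

def structure_mcmc(nodes, log_score, n_iter, rng):
    """Metropolis sampler over DAGs with a symmetric single-edge toggle proposal."""
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    current = frozenset()
    current_score = log_score(current)
    samples = []
    for _ in range(n_iter):
        proposal = current ^ {rng.choice(pairs)}  # add the edge if absent, else delete it
        if is_acyclic(proposal, nodes):
            diff = log_score(proposal) - current_score
            if rng.random() < math.exp(min(0.0, diff)):
                current, current_score = proposal, current_score + diff
        samples.append(current)
    return samples

samples = structure_mcmc(NODES, toy_log_score, 5000, random.Random(0))
kept = samples[500:]  # discard a burn-in phase
freq_ab = sum(("A", "B") in s for s in kept) / len(kept)
```

Because the toggle proposal is symmetric, the acceptance probability reduces to the ratio of scores; proposal schemes with unequal neighbourhood sizes would need an additional correction factor.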
Configuration space of network structures
Problem: local changes mean small steps and slow convergence; it is difficult to cross valleys.
Problem: global changes mean large steps, low acceptance and slow convergence.
Can we make global changes that jump onto other peaks and are likely to be accepted?
[Figure: MCMC trace plots against iteration number; conventional scheme versus new scheme]
Example: Protein signalling pathway
[Figure: cell membrane, phosphorylation, transcription factors (TF) in the nucleus -> cell response]
Evaluation on the Raf signalling pathway
[Figure from Sachs et al., Science 2005. Legend: cell membrane, receptor molecules, phosphorylated protein, interaction in signalling pathway (activation, inhibition)]
Flow cytometry data
• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins
• 5400 cells have been measured under 9 different cellular conditions (cues)
• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments
Simulated data or “gold standard” from the literature
From Perry Sprawls
ROC curve (x-axis: FP, y-axis: TP)
5 FP counts
BN, GGM, RN
Four different evaluation criteria
DGE UGE
TP for fixed FP
Area under the curve (AUC)
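Two of these criteria can be sketched in a few lines. The example assumes that each candidate edge has a score (for instance a marginal posterior probability) and that a gold-standard edge set is available; the scores and the gold standard below are hypothetical, and ties in the ranking are not treated specially in `tp_at_fixed_fp`.

```python
from math import isclose

def auc(scores, truth):
    """Area under the ROC curve via pairwise comparison of true and false edges."""
    pos = [s for edge, s in scores.items() if edge in truth]
    neg = [s for edge, s in scores.items() if edge not in truth]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tp_at_fixed_fp(scores, truth, max_fp):
    """Number of true positives recovered before more than max_fp false positives occur."""
    tp = fp = 0
    for edge in sorted(scores, key=scores.get, reverse=True):
        if edge in truth:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break
    return tp

# Hypothetical edge scores and gold standard.
scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.7,
          ("A", "C"): 0.3, ("C", "B"): 0.2, ("B", "A"): 0.1}
gold = {("A", "B"), ("A", "C")}
area = auc(scores, gold)
```

For an undirected evaluation the same functions can be applied after merging the two directions of each pair into one score.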
Synthetic data, observations
Relevance networks, Bayesian networks, graphical Gaussian models
Synthetic data, interventions
Cytometry data, interventions
Can we complement microarray data with prior knowledge from public databases like KEGG?
KEGG pathway + microarray data
How do we extract prior knowledge from a collection of KEGG pathways?
$$B_{i,j} = \frac{\text{total number of edges } i \to j \text{ that appear in the extracted pathways}}{\text{total number of times the gene pair } [i,j] \text{ is included in the extracted pathways}}$$
Example: extract 20 pathways; 10 contain the gene pair [i,j] and 8 contain the edge i → j:
$$B_{i,j} = 8/10 = 0.8$$
Relative frequency of edge occurrence
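The relative frequency can be computed directly from a list of extracted pathways. In this sketch each pathway is represented as a pair (set of genes, set of directed edges); this data structure is an assumption, as is returning `None` when the gene pair occurs in no pathway.

```python
from math import isclose

def kegg_prior(pathways, i, j):
    """Fraction of pathways containing both genes i and j that also contain the edge i -> j."""
    pair_count = sum(1 for genes, _ in pathways if i in genes and j in genes)
    edge_count = sum(1 for genes, edges in pathways
                     if i in genes and j in genes and (i, j) in edges)
    return edge_count / pair_count if pair_count else None

# Reproduce the example: 20 pathways, 10 contain the pair, 8 contain the edge.
pathways = (
    [({"i", "j"}, {("i", "j")})] * 8      # pair and edge present
    + [({"i", "j"}, set())] * 2           # pair present, edge absent
    + [({"x", "y"}, {("x", "y")})] * 10   # unrelated pathways
)
b_ij = kegg_prior(pathways, "i", "j")
```

With the counts from the example the function returns 0.8, in agreement with the value of B above.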
Prior knowledge from KEGG
Raf network
[Figure: Raf network annotated with KEGG-derived prior edge values between 0 and 1]
Prior distribution over networks
Deviation between the network M and the prior knowledge B:
Prior knowledge: B_{i,j} ∈ [0,1]
Graph: M_{i,j} ∈ {0,1}
Hyperparameter β
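One common way to turn such a deviation into a prior, shown here as an assumption rather than as the formula on the slide, is an energy E(M) equal to the sum over all node pairs of |B_{i,j} - M_{i,j}|, with a prior proportional to exp(-β E(M)). The sketch below evaluates the energy and the unnormalised log prior for small hypothetical matrices.

```python
from math import isclose

def energy(B, M):
    """Assumed deviation: sum of absolute differences between prior matrix B and graph M."""
    n = len(B)
    return sum(abs(B[i][j] - M[i][j]) for i in range(n) for j in range(n))

def log_prior_unnormalised(B, M, beta):
    """Log of exp(-beta * E(M)); the normalisation constant is omitted."""
    return -beta * energy(B, M)

# Hypothetical prior knowledge for two genes: edge 0 -> 1 is well supported.
B = [[0.0, 0.8],
     [0.2, 0.0]]
agrees = [[0, 1], [0, 0]]     # contains the supported edge 0 -> 1
disagrees = [[0, 0], [1, 0]]  # contains the weakly supported edge 1 -> 0
```

With β = 0 both graphs receive the same prior, so only the data matter; as β grows, the graph that agrees with B is increasingly favoured.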
Hyperparameter β trades off data (microarray data) versus prior knowledge (KEGG pathway)
β small: the data dominate
β large: the prior knowledge dominates
Sample networks and hyperparameters from the posterior distribution
Revision
Prior distribution
Marginal likelihood
Integral analytically tractable for Bayesian networks
Application to the Raf pathway:
Flow cytometry data and KEGG
ROC curve (x-axis: FP, y-axis: TP)
Four different evaluation criteria
DGE UGE
TP for fixed FP
Area under the curve (AUC)
β
Example: 4 genes, 10 time points
      t1    t2    t3    t4    t5    t6    t7    t8    t9    t10
X(1)  X1,1  X1,2  X1,3  X1,4  X1,5  X1,6  X1,7  X1,8  X1,9  X1,10
X(2)  X2,1  X2,2  X2,3  X2,4  X2,5  X2,6  X2,7  X2,8  X2,9  X2,10
X(3)  X3,1  X3,2  X3,3  X3,4  X3,5  X3,6  X3,7  X3,8  X3,9  X3,10
X(4)  X4,1  X4,2  X4,3  X4,4  X4,5  X4,6  X4,7  X4,8  X4,9  X4,10
Standard dynamic Bayesian network: homogeneous model
Our new model: heterogeneous dynamic Bayesian network. Here: 2 components
Our new model: heterogeneous dynamic Bayesian network. Here: 3 components
We have to learn from the data:
• Number of different components
• Allocation of time points
Two MCMC strategies
q
k
h
Number of components (here: 3)
Allocation vector
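The role of the allocation vector can be illustrated with a deliberately simple stand-in for the model: each component gets its own mean, and the fit is measured by the sum of squared errors. This is not the authors' model, and the function names and the toy series are assumptions.

```python
def sum_squared_error(values):
    """Squared deviations from the mean of one component."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def heterogeneous_sse(series, allocation):
    """Total squared error when each component in the allocation vector has its own mean."""
    components = {}
    for value, k in zip(series, allocation):
        components.setdefault(k, []).append(value)
    return sum(sum_squared_error(vals) for vals in components.values())

# Toy series of 10 time points whose level changes after time point 5.
series = [0.0] * 5 + [5.0] * 5
homogeneous = heterogeneous_sse(series, [0] * 10)
two_components = heterogeneous_sse(series, [0] * 5 + [1] * 5)
```

Adding components can only lower this error, which is why the number of components and the allocation have to be inferred within a Bayesian scheme instead of by fit alone.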
Synthetic study: posterior probability of the number of components
Circadian clock in Arabidopsis thaliana
Collaboration with the Institute of Molecular Plant Sciences (Andrew Millar)
• Focus on 9 circadian genes.
• 2 time series T20 and T28 of microarray gene expression data from Arabidopsis thaliana.
• Plants entrained with different light:dark cycles: 10h:10h (T20) and 14h:14h (T28)
Collaboration with DPM
[Figure: macrophage; infection with cytomegalovirus (CMV); treatment with interferon gamma (IFNγ)]
12 hour time course measuring total RNA in macrophages (time axis: 0 to 12 hours)
30 min sampling, 24 samples per group:
• Infection with CMV
• Pre-treatment with IFNγ
• IFNγ + CMV
72 Agilent arrays
Time series statistical analysis (using EDGE)
Clustering analysis
Posterior probability of the number of components
Literature: “known” interactions between three cytokines: IRF1, IRF2 and IRF3
Evaluation: average marginal posterior probabilities of the edges versus non-edges
Sample of high-scoring networks (IRF1, IRF2, IRF3)
Gold standard known: posterior probabilities of true interactions
AUROC scores: new model, BGe, BDe
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University
2 time series T20 and T28 of microarray gene expression data from Arabidopsis thaliana.
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Both time series measured under constant light condition at 13 time points: 0h, 2h,…, 24h, 26h
- Plants entrained with different light:dark cycles: 10h:10h (T20) and 14h:14h (T28)
Circadian rhythms in Arabidopsis thaliana
Gene expression time series plots (Arabidopsis data T20 and T28)
T28 T20
Posterior probability of the number of components
Predicted network
Blue: activation
Red: inhibition
Black: mixture
Three line widths: thin = PP > 0.5, medium = PP > 0.75, thick = PP > 0.9
      t1    t2    t3    t4    t5    t6    t7    t8    t9    t10
X(1)  X1,1  X1,2  X1,3  X1,4  X1,5  X1,6  X1,7  X1,8  X1,9  X1,10
X(2)  X2,1  X2,2  X2,3  X2,4  X2,5  X2,6  X2,7  X2,8  X2,9  X2,10
X(3)  X3,1  X3,2  X3,3  X3,4  X3,5  X3,6  X3,7  X3,8  X3,9  X3,10
X(4)  X4,1  X4,2  X4,3  X4,4  X4,5  X4,6  X4,7  X4,8  X4,9  X4,10
Standard dynamic Bayesian network: homogeneous model
Heterogeneous dynamic Bayesian network
Heterogeneous dynamic Bayesian network with node-specific breakpoints
Evaluation on synthetic data
[Figure: network with nodes X, Y(1), Y(2), Y(3)]
f: three phase-shifted sinusoids
AUROC: BGe versus heterogeneous BNet without/with node-specific breakpoints
Four time series for A. thaliana under different experimental conditions (KAY,KDE,T20,T28)
Blue: activation
Red: inhibition
Black: mixture
Three line widths: thin = PP > 0.5, medium = PP > 0.75, thick = PP > 0.9
Network obtained for merged data
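Data sets: KAY_LL, KDE_LL, T20, T28
Monolithic (one network for the merged data) versus separate (one network per data set)
Propose a compromise between the two
[Figure: networks M1, M2, ..., MI learned from data sets D1, D2, ..., DI and coupled through a common network M*]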
Compromise between the two previous ways of combining the data
Original work with Adriano:
Poor convergence and mixing due to overly strong coupling effects.
Marco’s current work:
Improve convergence and mixing by weakening the coupling.
Mean absolute deviation of edge posterior probabilities (independent BN inference)
      KAY   KDE   T20   T28
KAY   ---   0.14  0.15  0.14
KDE   0.14  ---   0.19  0.15
T20   0.15  0.19  ---   0.10
T28   0.14  0.15  0.10  ---
Mean absolute deviation of edge posterior probabilities (coupled BN inference)
      KAY   KDE   T20   T28
KAY   ---   0.11  0.12  0.11
KDE   0.11  ---   0.13  0.11
T20   0.12  0.13  ---   0.06
T28   0.11  0.11  0.06  ---
Mean absolute deviation of edge posterior probabilities (independent BN minus coupled BN)
      KAY   KDE   T20   T28
KAY   ---   0.03  0.03  0.03
KDE   0.03  ---   0.05  0.03
T20   0.03  0.05  ---   0.04
T28   0.03  0.03  0.04  ---
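The entries of such tables can be computed as sketched below, assuming that the deviation between two data sets is the mean, over all candidate edges, of the absolute difference of their marginal edge posterior probabilities. The dictionaries of posteriors are hypothetical.

```python
from math import isclose

def mean_absolute_deviation(post_a, post_b):
    """Mean absolute difference of edge posteriors; missing edges count as probability 0."""
    edges = set(post_a) | set(post_b)
    return sum(abs(post_a.get(e, 0.0) - post_b.get(e, 0.0)) for e in edges) / len(edges)

# Hypothetical edge posteriors inferred from two data sets.
post_1 = {("X", "Y"): 0.9, ("Y", "Z"): 0.2}
post_2 = {("X", "Y"): 0.7, ("Y", "Z"): 0.4}
deviation = mean_absolute_deviation(post_1, post_2)
```

A smaller value means that the two inferred networks agree more closely on which edges are present.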
Summary
• Differential equation models
• Bayesian networks
• Comparative evaluation
• Integration of biological prior knowledge
• A non-homogeneous Bayesian network for non-stationary processes
• Current work
Adriano Werhli
Marco Grzegorczyk
Thank you!
Any questions?