GP Applications
Oliver Stegle and Karsten Borgwardt: Computational Approaches for Analysing Complex Biological Systems
Machine Learning and Computational Biology Research Group,
Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen
O. Stegle & K. Borgwardt GP Applications Tubingen 1
Outline
- Application 1: modelling physiological time series
  - Overview
  - Gaussian process prior for heart rate
  - Results
- Application 2: differential gene expression
  - Overview
  - A Gaussian process two-sample test
  - Experimental Results on Arabidopsis
  - Detecting Temporal Patterns of Differential Expression
- Application 3: Modeling transcriptional regulation using Gaussian processes
- Summary
Application 1: modelling physiological time series
Motivation
- Human heart rate is an important physiological trait.
- Measurement over long periods is only viable with poor sensors.
- This motivates a Gaussian process model for heart rate.
The problem: dataset
[Figure: 4 days of heart data; x-axis t/min (0 to 7000), y-axis hr/bpm (0 to 300)]
Features
- Different noise sources
- Two time scales, 24 h rhythms
- Asymmetric around the mean
- Auxiliary variables indicative of noise
Differential gene expression in time series: challenges
- Time series expression profiles vary smoothly over time.
- Noisy observations with outliers.
- Multiple replicates.
- Few observations.
- Temporal patterns (intervals) of differential gene expression.
[Figure: log expression level against time/h (10 to 40) for treatment and control, in two panels: a non-differentially expressed gene and a differentially expressed gene]
Application 2: differential gene expression
A Gaussian process two-sample test
Gaussian process model: model comparison
- The basic idea is a comparison of two models:
  - The shared model: expression levels are explained by a single process.
  - The independent model: expression levels are explained by two separate processes.
Gaussian process model: Bayesian network, shared model
- Data in conditions A and B are observed at $N$ time points with $R$ replicates.
- A Gaussian process prior incorporates beliefs about smoothness.
- Noise is modeled separately per replicate, $\sigma_r^{A/B}$.
Gaussian process model: Bayesian network, both models
- The independent model follows in an analogous manner.
[Figure: Bayesian networks of the shared model $H_S$ and the independent model $H_I$]
Gaussian process model: inference
- Models are compared using the Bayes factor
  $$\mathrm{Score} = \log \frac{P(D_A, D_B \mid H_I)}{P(D_A, D_B \mid H_S)},$$
  with the independent model in the numerator and the shared model in the denominator.
- Writing out the GP models explicitly leads to
  $$\mathrm{Score} = \log \frac{P(Y_A \mid H_{GP}, T_A, \theta_I)\, P(Y_B \mid H_{GP}, T_B, \theta_I)}{P(Y_A \cup Y_B \mid H_{GP}, T_A \cup T_B, \theta_S)}.$$
  ($Y_{A/B}$: expression levels in conditions A and B; $T_{A/B}$: observation time points)
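The score can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: it assumes Gaussian noise, a squared exponential kernel and fixed hyperparameters (the names `amplitude`, `length_scale`, `noise_std` and the toy data are made up here), whereas in practice the hyperparameters of each model would be fitted.

```python
import numpy as np

def se_kernel(t1, t2, amplitude=1.0, length_scale=10.0):
    """Squared exponential covariance between two vectors of time points."""
    diff = t1[:, None] - t2[None, :]
    return amplitude**2 * np.exp(-0.5 * diff**2 / length_scale**2)

def log_marginal_likelihood(t, y, amplitude=1.0, length_scale=10.0, noise_std=0.3):
    """log P(y | t, theta) for a zero-mean GP with Gaussian observation noise."""
    n = len(t)
    K = se_kernel(t, t, amplitude, length_scale) + noise_std**2 * np.eye(n)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))            # equals -0.5 * log det K
            - 0.5 * n * np.log(2.0 * np.pi))

def two_sample_score(t_a, y_a, t_b, y_b, **theta):
    """Log Bayes factor: independent model versus shared model."""
    independent = (log_marginal_likelihood(t_a, y_a, **theta)
                   + log_marginal_likelihood(t_b, y_b, **theta))
    shared = log_marginal_likelihood(np.concatenate([t_a, t_b]),
                                     np.concatenate([y_a, y_b]), **theta)
    return independent - shared

# Toy data (made up): two conditions observed at the same time points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 40.0, 12)
base = np.sin(t / 8.0)
bump = 2.0 * np.exp(-0.5 * (t - 25.0)**2 / 36.0)    # differential response
y_a = base + 0.1 * rng.standard_normal(t.size)
y_b_same = base + 0.1 * rng.standard_normal(t.size)
y_b_diff = base + bump + 0.1 * rng.standard_normal(t.size)

score_same = two_sample_score(t, y_a, t, y_b_same)
score_diff = two_sample_score(t, y_a, t, y_b_diff)
```

A positive score favours the independent model (differential expression), a negative score the shared model.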
Gaussian process model: shared model
- Given observed data from both conditions, $D = \{Y_{A/B}, T_{A/B}\}$, the posterior distribution over latent function values $f$ is
  $$P(f \mid Y, T, \theta_K, \theta_L) \propto \mathcal{N}\big(f \mid 0, K_T(\theta_K)\big) \times \prod_{c \in \{A,B\}} \prod_{r=1}^{R} \prod_{n=1}^{N} p_L(y^c_{r,t_n} \mid f_{t_n}, \theta_L),$$
- Covariance function (kernel): $K_T(\theta_K)$
- Noise model: $p_L$
- Hyperparameters $\theta_S = \{\theta_K, \theta_L\}$ (length scale, noise levels)
- For Gaussian noise, $p_L(y^c_{r,t} \mid f^c_{r,t}, \theta_L) = \mathcal{N}\big(y^c_{r,t} \mid f^c_{r,t}, (\sigma^c_r)^2\big)$, the model is tractable in closed form.
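For reference, the closed form in the Gaussian noise case is the standard GP marginal likelihood. The symbols $\mathbf{y}$, $\Sigma$ and $M$ are introduced here for illustration and do not appear on the slides: $\mathbf{y}$ stacks all $M = 2RN$ observations, $K$ is the kernel evaluated at the corresponding (repeated) time points, and $\Sigma$ is the diagonal matrix of per-replicate noise variances $(\sigma^c_r)^2$.

```latex
\log P(Y \mid T, \theta_K, \theta_L)
  = -\tfrac{1}{2}\, \mathbf{y}^\top (K + \Sigma)^{-1} \mathbf{y}
    - \tfrac{1}{2} \log \lvert K + \Sigma \rvert
    - \tfrac{M}{2} \log 2\pi
```

The first term rewards data fit, the second penalises model complexity, and the third is a normalisation constant.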
Robustness with respect to outliers
- Outliers in the expression profile can obscure the regression results.
- A mixture noise model accounts for outliers.
[Figure: mixture noise model; x-axis from -6 to 6, y-axis from 0.0 to 0.8]
- Inference in this model is done using Expectation Propagation.
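One common form of such a mixture noise model is a narrow Gaussian for regular observations plus a broad Gaussian for outliers. The sketch below assumes this two-component form; the slides do not spell out the components, and the parameter names and values (`outlier_prob`, `noise_std`, `outlier_std`) are made up for illustration. It only evaluates the likelihood and does not implement Expectation Propagation.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def mixture_likelihood(y, f, outlier_prob=0.1, noise_std=0.5, outlier_std=3.0):
    """p(y | f) = (1 - pi) N(y | f, sigma^2) + pi N(y | f, sigma_out^2)."""
    return ((1.0 - outlier_prob) * gaussian_pdf(y, f, noise_std)
            + outlier_prob * gaussian_pdf(y, f, outlier_std))

# Density on a grid and its area under the trapezoid rule.
y = np.linspace(-6.0, 6.0, 1201)
density = mixture_likelihood(y, f=0.0)
area = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(y))

# Negative log-likelihood of a residual of 4 under both noise models.
nll_gaussian = -np.log(gaussian_pdf(4.0, 0.0, 0.5))
nll_mixture = -np.log(mixture_likelihood(4.0, 0.0))
```

Because the broad component keeps some probability mass far from the mean, a single outlier is penalised much less than under a pure Gaussian and therefore pulls the regression less.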
Experimental Results on Arabidopsis
Illustration of the model comparison: a differentially expressed gene
- Shared model.
- Independent model.
- Model comparison.
Predictive performance (RECOMB09)
- Data:
  - 30,336 Arabidopsis thaliana gene probes
  - Biotic stress: fungus infection
  - 24 time points, 4 biological replicates
- Evaluation of alternative methods on 2000 randomly chosen human-labeled probes:
  - GP without the robust noise model (GP standard)
  - GP robust
  - F-Test (FT) (MAANOVA package)
  - Timecourse (TC) (Tai and Speed)
Predictive performance (RECOMB09): ROC curves
[Figure: ROC curves, true positive rate against false positive rate]
- GP robust: AUC 0.985
- GP standard: AUC 0.944
- FT: AUC 0.859
- TC: AUC 0.869
Detecting transition points in Arabidopsis microarray time series: start times for gene categories
- This distribution can be broken down into gene categories.
- The WRKY family of transcription factors is known to be involved in stress response.
[Figure: distribution of differential start time]
Application 3: Modeling transcriptional regulation using Gaussian processes
Motivation
- This part of the course is inspired by and based on results of a publication by Neil D. Lawrence et al., "Modelling transcriptional regulation using Gaussian processes", ftp://ftp.dcs.shef.ac.uk/home/neil/gpsim.pdf.
- Microarray technologies make it possible to measure mRNA levels.
- The functional proteins and their concentration levels remain unobserved.
- Motivation: can we infer the hidden protein concentrations?
A single gene model
- The change in gene expression abundance $y_i$ for a gene $i$ is approximately described by a differential equation model of the form
  $$\frac{dy_i(t)}{dt} = B_i + S_i f(t) - D_i y_i(t)$$
  - $f(t)$: regulatory transcription factor.
  - $B_i$: basal transcription rate.
  - $S_i$: sensitivity of the gene to the transcription factor.
  - $D_i$: decay rate of the mRNA.
- Goal: infer the unobserved activation $f(t)$ from mRNA measurements of multiple target genes.
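To get a feeling for the forward direction of this model, the ODE can be integrated numerically for a known $f(t)$. The sketch below uses a simple forward-Euler scheme; the parameter values for three target genes and the sigmoidal transcription factor profile are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_expression(f, t, basal, sensitivity, decay, y0):
    """Forward-Euler integration of dy_i/dt = B_i + S_i f(t) - D_i y_i(t)."""
    y = np.empty((len(t), len(basal)))
    y[0] = y0
    for n in range(len(t) - 1):
        dt = t[n + 1] - t[n]
        dydt = basal + sensitivity * f(t[n]) - decay * y[n]
        y[n + 1] = y[n] + dt * dydt
    return y

def tf_activity(time):
    """Made-up transcription factor profile that switches on around t = 5."""
    return 1.0 / (1.0 + np.exp(-(time - 5.0)))

# Hypothetical parameters for three target genes of one transcription factor.
B = np.array([0.2, 0.5, 0.1])   # basal transcription rates
S = np.array([1.0, 0.3, 2.0])   # sensitivities
D = np.array([0.5, 1.0, 0.8])   # mRNA decay rates
t = np.linspace(0.0, 30.0, 3001)

y = simulate_expression(tf_activity, t, B, S, D, y0=B / D)
steady_state = (B + S * 1.0) / D   # limit for f(t) -> 1
```

Each gene starts at its basal level $B_i / D_i$ and relaxes towards $(B_i + S_i) / D_i$ once the transcription factor is fully active; the inference task on the slide runs in the opposite direction, from observed $y_i$ back to $f$.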
Derivative observations
- Derivative observations are the key to solving this problem.
- Given knowledge about the derivative of a function $f$, we would like to infer its function values:
  $$dy = \frac{\partial f(t)}{\partial t}$$
- We wish to find the joint probability of function values and function derivatives:
  $$\mathrm{cov}(dy_i, y_j) = \frac{\partial}{\partial t_i} \mathrm{cov}(y_i, y_j)$$
  $$\mathrm{cov}(dy_i, dy_j) = \frac{\partial^2}{\partial t_i \partial t_j} \mathrm{cov}(y_i, y_j)$$
- Using these covariance functions we can combine function observations and derivatives as training data in GP regression.
Derivative observations: squared exponential kernel
- For the squared exponential kernel with scalar time inputs we obtain:
  $$\mathrm{cov}(y_i, y_j) = k(t_i, t_j) = A^2 \exp\left(-0.5\, \frac{(t_i - t_j)^2}{L^2}\right)$$
  $$\mathrm{cov}(dy_i, y_j) = -\mathrm{cov}(y_i, y_j)\, \frac{t_i - t_j}{L^2}$$
  $$\mathrm{cov}(dy_i, dy_j) = \mathrm{cov}(y_i, y_j)\, \frac{1}{L^2} \left[ 1 - \frac{1}{L^2} (t_i - t_j)^2 \right]$$
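These expressions can be checked against finite differences of the kernel. The sketch below assumes scalar time inputs, for which the leading term in the bracket of the derivative-derivative covariance is 1; the function names and the values of `A` and `L` are chosen here for illustration.

```python
import numpy as np

def k(ti, tj, A=1.0, L=2.0):
    """cov(y_i, y_j): squared exponential kernel."""
    return A**2 * np.exp(-0.5 * (ti - tj)**2 / L**2)

def k_dy_y(ti, tj, A=1.0, L=2.0):
    """cov(dy_i, y_j): derivative of k with respect to t_i."""
    return -k(ti, tj, A, L) * (ti - tj) / L**2

def k_dy_dy(ti, tj, A=1.0, L=2.0):
    """cov(dy_i, dy_j): mixed second derivative with respect to t_i and t_j."""
    return k(ti, tj, A, L) / L**2 * (1.0 - (ti - tj)**2 / L**2)

# Central finite differences as an independent check.
ti, tj, h = 0.7, -0.4, 1e-4
fd_first = (k(ti + h, tj) - k(ti - h, tj)) / (2 * h)
fd_mixed = (k(ti + h, tj + h) - k(ti + h, tj - h)
            - k(ti - h, tj + h) + k(ti - h, tj - h)) / (4 * h**2)

# Joint covariance of the stacked vector [y; dy] at a few time points.
t = np.array([0.0, 1.0, 2.5])
Ti, Tj = np.meshgrid(t, t, indexing="ij")
K_joint = np.block([[k(Ti, Tj),      k_dy_y(Ti, Tj).T],
                    [k_dy_y(Ti, Tj), k_dy_dy(Ti, Tj)]])
```

`K_joint` is the covariance matrix that would be used when function observations and derivative observations are combined as training data in GP regression.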
Derivative observations: example
[Figure from E. Solak et al., "Derivative observations in Gaussian Process models of dynamic systems"]
Back to the ODE model for gene regulation
$$\frac{dy_i(t)}{dt} = B_i + S_i f(t) - D_i y_i(t)$$
- An explicit solution of the ODE system can be derived (standard ODE techniques):
  $$y_i(t) = \frac{B_i}{D_i} + k_i e^{-D_i t} + S_i e^{-D_i t} \int_0^t f(u) \exp(D_i u)\, du$$
  Neglecting the transient term $k_i e^{-D_i t}$ and writing the integral term as $L_i[f](t)$ gives
  $$y_i(t) = \frac{B_i}{D_i} + L_i[f](t)$$
- Realizing that $L_i$ is a linear operator (like taking the derivative), we can again evaluate the covariance between $f(t)$ and $L_i[f](t)$.
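The explicit solution and the linearity of the operator can be verified numerically. The sketch below reads $L_i[f](t)$ as the integral term $S_i e^{-D_i t} \int_0^t f(u) \exp(D_i u)\, du$ and approximates the integral with the trapezoid rule; the parameter values and input functions are hypothetical.

```python
import numpy as np

def linear_operator(f_values, t, sensitivity, decay):
    """L_i[f](t) = S_i exp(-D_i t) * int_0^t f(u) exp(D_i u) du (trapezoid rule)."""
    integrand = f_values * np.exp(decay * t)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    integral = np.concatenate([[0.0], np.cumsum(steps)])
    return sensitivity * np.exp(-decay * t) * integral

def expression(f_values, t, basal, sensitivity, decay, k=0.0):
    """y_i(t) = B_i / D_i + k_i exp(-D_i t) + L_i[f](t)."""
    return (basal / decay + k * np.exp(-decay * t)
            + linear_operator(f_values, t, sensitivity, decay))

t = np.linspace(0.0, 10.0, 2001)
B, S, D = 0.2, 1.5, 0.7          # hypothetical gene parameters
f = np.sin(t) ** 2               # made-up transcription factor activity
g = np.exp(-0.3 * t)             # a second made-up input

# Linearity: L[2 f + 3 g] = 2 L[f] + 3 L[g].
lhs = linear_operator(2 * f + 3 * g, t, S, D)
rhs = 2 * linear_operator(f, t, S, D) + 3 * linear_operator(g, t, S, D)

# Constant input f(t) = 1 has the analytic result S (1 - exp(-D t)) / D.
numeric = linear_operator(np.ones_like(t), t, S, D)
analytic = S * (1.0 - np.exp(-D * t)) / D

# The explicit solution satisfies the ODE (checked by finite differences).
y = expression(f, t, B, S, D, k=0.4)
residual = np.gradient(y, t) - (B + S * f - D * y)
```

Linearity is what makes the GP machinery applicable: a linear operator applied to a Gaussian process again yields a Gaussian process, so the covariances involving $L_i[f]$ follow from the kernel of $f$.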
ODE model for gene regulation: inference results
[Figure from N. D. Lawrence et al., "Modelling transcriptional regulation using Gaussian processes"]
Summary
- The design and choice of covariance functions allows for flexible modeling tasks.
  - Prior on heart rate
  - Derivative observations, ODE systems
- Model comparison using Gaussian processes.
  - Testing for differential gene expression.