Phil. Trans. R. Soc. A (2008) 366, 519–544
doi:10.1098/rsta.2007.2108

Downloaded from rsta.royalsocietypublishing.org on October 16, 2012
Fitting ordinary differential equations to short time course data

By Daniel Brewer1,2, Martino Barenco1,2, Robin Callard1,2, Michael Hubank1,2 and Jaroslav Stark3,4,*

1Institute of Child Health, University College London, 30 Guilford Street, London WC1N 1EH, UK
2CoMPLEX, University College London, 4 Stephenson Way, London NW1 2HE, UK
3Department of Mathematics, and 4Centre for Integrative Systems Biology at Imperial College, Imperial College London, London SW7 2AZ, UK
Ordinary differential equations (ODEs) are widely used to model many systems in physics, chemistry, engineering and biology. Often one wants to compare such equations with observed time course data, and use this to estimate parameters. Surprisingly, practical algorithms for doing this are relatively poorly developed, particularly in comparison with the sophistication of numerical methods for solving both initial and boundary value problems for differential equations, and for locating and analysing bifurcations. A lack of good numerical fitting methods is particularly problematic in the context of systems biology where only a handful of time points may be available. In this paper, we present a survey of existing algorithms and describe the main approaches. We also introduce and evaluate a new efficient technique for estimating ODEs linear in parameters, particularly suited to situations where noise levels are high and the number of data points is low. It employs a spline-based collocation scheme and alternates linear least squares minimization steps with repeated estimates of the noise-free values of the variables. This is reminiscent of expectation–maximization methods widely used for problems with nuisance parameters or missing data.
Keywords: parameter estimation; ordinary differential equation; time series; splines; collocation; systems biology
1. Introduction
Ordinary differential equations (ODEs) are one of the most popular frameworks for describing the temporal evolution of a wide variety of systems in physics, chemistry, engineering and biology (e.g. Gershenfeld 1999). Such models take the form
dx/dt = f(x, t; a),    (1.1)
where x is a vector of variables evolving with time; f is a vector field; and a denotes an (optional) set of parameters. Once an ODE model has been built, a vast array of
Published online 13 August 2007
One contribution of 14 to a Theme Issue 'Experimental chaos II'.
*Author and address for correspondence: Department of Mathematics, Imperial College London, London SW7 2AZ, UK ([email protected]).
powerful analytical and numerical methods exists for exploring its properties. Any such model will usually involve one or more parameters, a. In some cases, particularly in physics where models are based on well-understood physical mechanisms, such parameters may be derived from first principles, or measured directly. Increasingly, however, ODEs are being applied in disciplines such as cell and molecular biology where many parameters cannot be determined by either of these approaches.
In such situations, one can attempt to estimate the unknown parameters fromexperimentally measured data. In most cases, these data consist of time series, ortime courses, of repeated measurements of one or more experimental variables. Incell and molecular biology, these might, for instance, be mRNA or proteinconcentrations measured at several time points throughout an experiment. Onecan fit such data by systematically varying the parameters to determine a set ofparameters which minimize the difference between a solution of the differentialequation and the data (e.g. Gershenfeld 1999). One can also apply the samemethodology to build a ‘black-box’ model by starting with a general set of basisfunctions (such as polynomials or radial basis functions) and then estimatingtheir coefficients by minimizing the discrepancy between model and data. Inprinciple, such an approach can be used to deduce the network of interactionsbetween the variables (Stark et al. 2003a,b) which in turn can lead to morerefined mechanistic models.
A variety of approaches exists to fit data to ODE models. Unfortunately, many of these are poorly documented in the literature, and may only be described in the context of specific applications in specialist publications. We therefore present an overview of the key concepts below. We show that such methods can be classified by whether the ODE is solved using a conventional iterative numerical integrator such as fourth-order Runge–Kutta, or whether a global solution is approximated using splines or related methods. This distinction is similar to that between shooting and collocation methods for boundary value problems (Golub & Ortega 1992). Although global methods are now generally preferred for solving boundary value problems, they are poorly developed in the context of fitting ODEs to data. In such cases, shooting is generally more popular and well known.
Most experiments in cell and molecular biology produce very short time courses,often with very noisy measurements. Shooting-type methods tend to performparticularly poorly in such situations; we show an example of this below. As ODEsbecome more widely applied in this area, particularly with the current rapid growthin systems biology, there is therefore an urgent need to develop alternative methodsmore suited to this type of data. The use of collocation-type algorithms appears tobe particularly attractive. We explain the main ideas behind this approach below,describe existing algorithms and present a new two-stage algorithm for ODEs thatare linear in parameters. To illustrate these ideas, we use a model of the p53 tumour-suppressor network, which is described briefly in appendix A.
2. Estimating parameters in ODEs
(a) General principles of model fitting
Any method for estimating model parameters from data requires two main ingredients. We need to construct an error function E_D(a) that quantifies the difference between a model with parameters a and the data, and we need an
optimization method that finds the value of a that minimizes E_D(a). Except for error functions that have particular features (such as those occurring in linear least squares problems), the minimization stage requires an iterative approach. Typically, the minimization method can be chosen independently of the construction of the error function. In some cases, however, an integrated approach can have advantages. This can occur, for instance, if it is sufficient to compute only a rough estimate of E_D(a) at each iteration of the minimization scheme, rather than calculating E_D(a) exactly.
A wide variety of standard minimization algorithms exist (e.g. Gershenfeld 1999). Where E_D(a) has no, or only a few, local minima apart from the global minimum, then methods that iteratively step downhill, such as the Nelder–Mead simplex method (Nelder & Mead 1965 or see Press et al. 2002) or the Levenberg–Marquardt method (Marquardt 1963 or see Press et al. 2002), work well. If, on the other hand, E_D(a) has a more complex landscape, stochastic search algorithms such as simulated annealing (Gershenfeld 1999 or Press et al. 2002), Markov chain Monte Carlo (e.g. Gilks et al. 1996) or genetic algorithms (e.g. Gershenfeld 1999) are often necessary. These are usually computationally very demanding.
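As a concrete illustration (a minimal Python sketch, not taken from the paper; the exponential model, starting point and tolerances are assumptions), a downhill method such as Nelder–Mead can recover the parameters of a simple model from a least squares error function:

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 1.0, 20)
a_true = np.array([2.0, -1.0])
data = a_true[0] * np.exp(a_true[1] * t)          # noise-free synthetic data

def error(a):
    # least squares discrepancy E_D(a) between model a0*exp(a1*t) and the data
    return np.sum((a[0] * np.exp(a[1] * t) - data) ** 2)

res = minimize(error, x0=np.array([1.0, 0.0]), method="Nelder-Mead",
               options={"xatol": 1e-12, "fatol": 1e-12, "maxiter": 2000})
print(res.x)  # close to the true parameters [2, -1]
```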
Attempting to estimate the parameters a of the ODE in equation (1.1) from observed data presents additional difficulties. These are mainly centred on the construction and efficient computation of a suitable error function. This is because we cannot directly determine how well a given set of data points
D = {x̂(t_i) : i = 1, …, n},
fits the ODE in equation (1.1). We shall describe two common strategies for theconstruction and minimization of appropriate error functions.
(b) Solution-based approaches
By far the better known, and arguably statistically more valid, approach is to solve equation (1.1) numerically to obtain an approximate solution u(t) such that
du/dt ≈ f(u, t; a).    (2.1)
Since this is meant to provide a model for the data, the points x̂(t_i) should be close to the values u(t_i). It is thus natural to base the error function E_D(a) on the difference between u(t_i) and x̂(t_i). The most common choice is to use the least squares error
E_D(a) = Σ_{i=1}^{n} ||u(t_i) − x̂(t_i)||²,    (2.2)
possibly weighted by the reciprocal of the noise level at each data point. The subscript D here highlights that this is an error between the data and the solution u, in order to distinguish this quantity from other error functions defined below. When measurement errors are independently normally distributed, E_D(a) will be, up to a constant and a change of sign, the logarithm of the likelihood of the data, and minimizing equation (2.2) is equivalent to maximum likelihood estimation of the parameters. It is also possible to use this approach even if we cannot measure all of the components of the state vector x(t). In such a case, the norm in equation (2.2) is
just taken over the measured components. As long as we have a sufficient number of data points to ensure that E_D(a) has a non-degenerate minimum, it may still be possible to estimate parameters successfully.
Observe that when f has linear dependence on its parameters, the error function given by equation (2.2) is a quadratic form. The minimum in such a case can be obtained efficiently in one step using algebraic methods such as QR decomposition (e.g. Lawson & Hanson 1974). This is much faster and more robust than the iterative optimization algorithms mentioned above. For ODEs this possibility seems irrelevant, however, since even if the vector field f depends linearly on the parameters a, the solution u(t) will depend nonlinearly on a. Nevertheless, we shall present below a new approach that makes use of linearity in parameters.
The most popular method of solving the differential equation is to use a standard numerical integration scheme such as fourth-order Runge–Kutta, possibly with adaptive step-size control (e.g. Press et al. 2002). Such an approach immediately encounters a major potential obstacle: we very rarely know the correct initial condition x(t₀) for all of the variables. Even if we are able to measure all of the components of x(t₀), such measurements will inevitably be subject to some error. Thus, in practice, we only have x̂(t₀) = x(t₀) + e(t₀), where e(t₀) denotes the experimental error. If the differential equation is globally stable, and e(t₀) is small, the difference between x̂(t₀) and x(t₀) may not be too significant. However, most models in the real world are nonlinear. In such a case, the dynamics can dramatically amplify small differences in the initial conditions. Attempting to estimate parameters for such systems using x̂(t₀) as an initial condition leads to poor results.
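To make this concrete, here is a minimal Python sketch (the decay model, step sizes and initial-condition perturbation are illustrative assumptions, not the paper's setup) of a solution-based error function built on a classical fourth-order Runge–Kutta integrator, showing how an error in x(t₀) inflates E_D:

```python
import numpy as np

def rk4(f, u0, ts, a):
    """Integrate du/dt = f(u, t, a) with classical RK4, returning u at each t in ts."""
    us = [np.asarray(u0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, u = t1 - t0, us[-1]
        k1 = f(u, t0, a)
        k2 = f(u + 0.5 * h * k1, t0 + 0.5 * h, a)
        k3 = f(u + 0.5 * h * k2, t0 + 0.5 * h, a)
        k4 = f(u + h * k3, t1, a)
        us.append(u + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(us)

def f(u, t, a):                      # simple linear decay model, linear in a
    return -a[0] * u

ts = np.linspace(0.0, 2.0, 9)
data = 3.0 * np.exp(-1.5 * ts)       # noise-free data: a = 1.5, x(0) = 3

def E_D(a, u0):
    # least squares error between the RK4 solution and the data points
    u = rk4(f, [u0], ts, a)[:, 0]
    return np.sum((u - data) ** 2)

print(E_D([1.5], 3.0))   # near zero at the true parameters and initial condition
print(E_D([1.5], 3.3))   # a 10% error in x(0) inflates the error substantially
```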
(c) Shooting
A better approach is to regard the initial condition x(t₀) as an additional set of unknown parameters which are incorporated in the minimization scheme. We thus regard the error function E(a, x(t₀)) as depending on both a and x(t₀). This type of approach appears to have first been tried by Bellman et al. (1967) and similar ideas appear in Swartz & Bremermann (1975). It is closely related to shooting methods for boundary value problems, including methods used for finding periodic orbits and other special solutions (e.g. Golub & Ortega 1992; Kuznetsov 1995). It can work well if data are plentiful and noise levels are low. However, if we only have a few time points, as is typical of cell and molecular biology datasets, E(a, x(t₀)) can have a large number of local minima separated by steep peaks and ridges. As we shall see below, it is difficult to find the global minimum in such cases.
One possible extension is to use multiple shooting methods where the solution is broken down into a number of successive segments, with appropriate matching at the joints (e.g. Kuznetsov 1995; Timmer et al. 2000). This can improve results, but our experience with even moderately complex models is that it can suffer from similar problems to simple shooting.
(d) Collocation methods
An alternative to using iterative numerical integration is to represent the solution globally using a set of convenient basis functions B_j : j = 0, …, p. A suitable choice is usually given by piecewise polynomials, usually of low order.
If we match values and derivatives at the joins between adjacent polynomials, globally smooth functions can be obtained, called splines. We can represent these as linear combinations of the form
u(t) = Σ_{j=0}^{p} b_j B_j(t).    (2.3)
A variety of different types of spline are available, but often the most convenient are B-splines (De Boor 1978; Gershenfeld 1999). Such splines have minimal support with respect to degree, smoothness and partition, and any other spline function of a given degree, smoothness and domain partition can be represented as a linear combination of B-splines of that same degree and smoothness over the same partition. B-splines have the great advantage that each B_j has compact support, so that evaluating u(t) requires only summing over a small number of B_j's. More precisely, if we partition the domain using the knots {s_j : j = 0, …, p}, we can recursively define basis functions of increasing order by
φ_j^(0)(t) = 1 if s_j < t ≤ s_{j+1}, and 0 otherwise,

and

φ_j^(i)(t) = ((t − s_j)/(s_{j+i} − s_j)) φ_j^(i−1)(t) + ((s_{j+i+1} − t)/(s_{j+i+1} − s_{j+1})) φ_{j+1}^(i−1)(t).
Note that the basis function at a given degree is obtained by interpolating appropriate combinations of basis functions of one degree less. It is easy to see that φ_j^(i)(t) is 0 outside the interval [s_j, s_{j+i+1}]. In applications, typically cubic B-splines are used (with i = 3). We shall restrict ourselves to the case of uniform spacing of the knots and define B_j = φ_{j−2}^(3) in order to give a more symmetric form.
An explicit formula is given by
B_j(t) =
    0                                                                                      t ≤ s_{j−2},
    (1/(6h³)) (t − s_{j−2})³                                                               s_{j−2} ≤ t ≤ s_{j−1},
    1/6 + (1/(2h))(t − s_{j−1}) + (1/(2h²))(t − s_{j−1})² − (1/(2h³))(t − s_{j−1})³        s_{j−1} ≤ t ≤ s_j,
    1/6 + (1/(2h))(s_{j+1} − t) + (1/(2h²))(s_{j+1} − t)² − (1/(2h³))(s_{j+1} − t)³        s_j ≤ t ≤ s_{j+1},
    (1/(6h³)) (s_{j+2} − t)³                                                               s_{j+1} ≤ t ≤ s_{j+2},
    0                                                                                      t ≥ s_{j+2},
where h is the spacing between the knots s_j. Observe that for s_j ≤ t ≤ s_{j+1} the sum in equation (2.3) reduces to
u(t) = b_{j−1}B_{j−1}(t) + b_j B_j(t) + b_{j+1}B_{j+1}(t) + b_{j+2}B_{j+2}(t).
Furthermore, evaluating at the knot point s_j, we have

u(s_j) = b_{j−1}B_{j−1}(s_j) + b_j B_j(s_j) + b_{j+1}B_{j+1}(s_j) = (1/6)b_{j−1} + (2/3)b_j + (1/6)b_{j+1}.
This can be written in matrix form as
( u(s_{−1}) )         ( 4 1 0 … 0 ) ( b_{−1} )
( u(s_0)    )         ( 1 4 1 … 0 ) ( b_0    )
( u(s_1)    ) = 1/6   ( 0 1 4 … 0 ) ( b_1    )    (2.4)
(     ⋮     )         (     ⋮     ) (    ⋮   )
( u(s_p)    )         ( 0 … 1 4 1 ) ( b_p    )
( u(s_{p+1}))         ( 0 … 0 1 4 ) ( b_{p+1})
where u(s_{−1}) and u(s_{p+1}) are dummy values required to give a well-determined system. These determine the behaviour of u at the boundaries and can, for instance, be chosen to set the second derivatives of u at the boundaries to 0 (a so-called natural cubic B-spline). The matrix in (2.4) is explicitly invertible (though in practice one would solve equation (2.4) using a specialized LR decomposition). This shows that we can either parametrize the spline u(t) using the coefficients b_j, or using its values at the knot points u(s_0), u(s_1), …, u(s_p). This makes it straightforward to obtain a spline that interpolates any particular set of points. It is also possible to generalize this derivation to irregularly spaced knot points, which can have significant advantages in some applications.
The B_j are polynomials and hence using equation (2.3), we can easily differentiate u and substitute the derivative into equation (1.1). In general, since we only take a finite number of basis functions, we cannot expect this to be satisfied exactly everywhere, i.e. for every t. Instead, we choose a finite number of so-called collocation points r_k : k = 1, …, q, and require (1.1) to hold at these points, so that
Σ_{j=−1}^{p+1} b_j (dB_j/dt)(r_k) = f( Σ_{j=−1}^{p+1} b_j B_j(r_k), r_k; a ),    (2.5)
for all k = 1, …, q. Note that the derivatives dB_j/dt can be easily pre-computed for any given set of basis functions and collocation points, so that equation (2.5) represents a system of algebraic equations for b = (b_{−1}, b_0, …, b_{p+1}), or for the values u(s_0), u(s_1), …, u(s_p) via equation (2.4). Note that equation (2.5) is independent of the particular choice of spline, or indeed other basis functions.
Given an appropriate choice of splines and collocation points, the system (2.5) is well determined and can be solved to yield the coefficients of the approximate solution of the differential equation. It turns out that this is equivalent to an appropriate implicit Runge–Kutta integration scheme. This approach is particularly useful for boundary value problems (Villadsen & Stewart 1967) and today forms the basis of standard packages such as AUTO for finding periodic orbits and bifurcation points. The basic principle is to use the Newton method or a similar root finder to solve equation (2.5), together with appropriate side
conditions. These ensure that u is periodic (or homo-/heteroclinic for certain global bifurcations) and, in the case of finding local bifurcations, has particular eigenvalues. In the case of bifurcations, it is necessary for the root finder to simultaneously vary both the coefficients b_j and the parameters a. This bears close resemblance to parameter estimation from data and suggests that collocation-based methods may also be useful in the latter problem.
However, there is an important difference between the two problems. When finding bifurcations, we simply want to solve a set of (nonlinear) equations made up of equation (2.5) and whatever side conditions we need to specify the bifurcation. In the case of parameter fitting, the side conditions are replaced by the minimization of E_D(a), which has to be carried out simultaneously with the solution of equation (2.5). There are a number of possible approaches to this. Observe that if u is given by equation (2.3) then E_D has no direct dependence on a but rather is a function of b. From now on, we shall thus denote it by
E_D(b) = Σ_{i=1}^{n} || Σ_{j=−1}^{p+1} b_j B_j(t_i) − x̂(t_i) ||².
(i) Nested minimization and collocation
This is closely related to shooting, except that instead of integrating the differential equation using a method such as fourth-order Runge–Kutta, we use the Newton method to solve equation (2.5) at each candidate value of the parameters a. This Newton solver is then nested inside an optimization method that iteratively minimizes E_D(b, a). This is potentially very slow, and we are not aware of this method appearing in the published literature.
(ii) Simultaneous minimization and collocation
It is possible to construct combinations of gradient-based minimization and Newton root solving which essentially simultaneously linearize both equation (2.5) and E_D(b). Given a reasonable initial guess, such an algorithm will rapidly converge to a minimum of E_D(b) that satisfies equation (2.5). The first such method appears to have been published by Baden & Villadsen (1982). Biegler (1984) independently initiated the application of collocation-based methods to dynamic optimization, which includes parameter estimation as a special case. His method uses sequential quadratic programming, a common algorithm for minimizing an objective function subject to equality constraints without requiring satisfaction of the constraints at each iteration. This scheme solves the exact constraint such as equation (2.5) once, and then at subsequent iterations, it uses only the linearization of the constraint. This is combined with a quadratic approximation to the objective function E_D(b) which is easily minimized subject to the linearized constraint. It can be shown that such an iteration converges quadratically to the desired minimum. Biegler's algorithm has stimulated a range of variations and extensions (e.g. Tjoa & Biegler 1991; Esposito & Floudas 2000; Wang 2000; Li et al. 2005) and is widely used in the chemical engineering community.
(iii) Dual minimization
Observe that instead of using a root finder to solve equation (2.5), we could minimize the difference between the r.h.s. and l.h.s.
E_M(b, a) = Σ_{k=1}^{q} || Σ_{j=−1}^{p+1} b_j (dB_j/dt)(r_k) − f( Σ_{j=−1}^{p+1} b_j B_j(r_k), r_k; a ) ||²    (2.6)
with respect to b = (b_{−1}, b_0, …, b_{p+1}). We could also give a different weight to each component of the ODE (e.g. Ramsay et al. 2007). We can think of E_M(b, a) as a measure of how well the spline u(t), defined by b, satisfies the differential equation (1.1). This approach has the advantage that we can take a larger number q of collocation points. In that case, equation (2.5) will be over-determined, and no longer have a solution, but it is still possible to find a minimum of E_M(b, a). Indeed, in the limit q → ∞, we have
E_M(b, a) = ∫ || Σ_{j=−1}^{p+1} b_j (dB_j/dt)(r) − f( Σ_{j=−1}^{p+1} b_j B_j(r), r; a ) ||² dr.    (2.7)
The minimization of such a function is often used in the derivation of various collocation schemes (e.g. Golub & Ortega 1992). Given either version of E_M(b, a), our problem is now to simultaneously minimize both E_D(b) and E_M(b, a) with respect to both b and a. This is most easily achieved by minimizing
Ẽ(b, a) = E_D(b) + λ E_M(b, a).
Here λ is a weighting factor that determines how much emphasis we place on the data and the model, respectively. This is a particular attraction of this approach, since if we have high confidence in the model but the data are very noisy, we can take λ large, and conversely if the measurement error for the data is low but the model is suspect, we can use a low λ.
Brewer (2006) compares the Nelder–Mead simplex algorithm and simulated annealing in the minimization of Ẽ(b, a), with λ = 1 and E_M(b, a) defined by equation (2.6). A more sophisticated optimization scheme has recently been presented by Ramsay et al. (2007) for the case of equation (2.7). Ramsay et al. (2007) also analyse the behaviour of their method in the limit λ → ∞.
An alternative algorithm, applicable when f is linear in parameters, was introduced in Brewer (2006) and is presented in §2e. This is motivated by the observation that if the data are observed without error, so that x̂(t_i) = x(t_i), then we know that the rate of change of a solution at t_i is precisely f(x̂(t_i), t_i; a). In such a case, it would be reasonable to replace E_M(b, a) by
Ē_M(b, a) = Σ_{i=1}^{n} || Σ_{j=−1}^{p+1} b_j (dB_j/dt)(t_i) − f(x̂(t_i), t_i; a) ||².

The advantage of this is that Ē_M(b, a) is a quadratic form (i.e. a linear least squares problem) in (b, a). Since E_D(b) is also a quadratic form, the overall error E_D(b) + λĒ_M(b, a) is a linear least squares function that can be minimized using a single QR decomposition.
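The one-step minimization of such a combined quadratic form can be sketched as follows (illustrative Python with random stand-in matrices; in the actual fitting problem the two blocks would be built from the B_j, their derivatives and the data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, ndim = 12, 8, 5
A_D, y_D = rng.normal(size=(n, ndim)), rng.normal(size=n)   # data block (E_D)
A_M, y_M = rng.normal(size=(m, ndim)), rng.normal(size=m)   # model block (E_M bar)
lam = 2.0

# combined objective ||A_D z - y_D||^2 + lam * ||A_M z - y_M||^2:
# stack the residual blocks, scaling the model block by sqrt(lam),
# and solve once with a QR-based least squares routine
A = np.vstack([A_D, np.sqrt(lam) * A_M])
y = np.concatenate([y_D, np.sqrt(lam) * y_M])
z, *_ = np.linalg.lstsq(A, y, rcond=None)

# the gradient of the combined quadratic vanishes at the minimizer
grad = 2 * A_D.T @ (A_D @ z - y_D) + 2 * lam * A_M.T @ (A_M @ z - y_M)
print(np.max(np.abs(grad)))  # essentially zero
```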
Of course, in general, the data are subject to measurement error. We can, however, still use a modified error function of the above type, with our best estimate û(t_i) of the real measurement replacing x̂(t_i) in the above formula. Motivated by the expectation–maximization methods used in the case of nuisance parameters or missing data (Dempster et al. 1977; Moon 1996), we alternately minimize E_D(b) + λĒ_M(b, a) and generate a new estimate of the data. Full details are given below.
(e) Derivative approximation methods
All of the methods presented so far use the least squares error function in equation (2.2), which determines the discrepancy between the observed data and a numerically computed solution of the differential equation. An alternative is to minimize the discrepancy between the r.h.s. and l.h.s. of the differential equation at a selected number of data points. This gives an error function like
Ê_D(a) = Σ_{k=1}^{q} || (du/dt)(r_k) − f(u(r_k), r_k; a) ||².    (2.8)
The relationship between minimizing Ê_D(a) and E_D(a) under suitable conditions is discussed by Baden & Villadsen (1982). Observe the close similarity between Ê_D(a) and E_M(b, a) in equation (2.6). In particular, if we choose an approximate solution u given by equation (2.3), then Ê_D(a) = E_M(b, a). In such circumstances, minimizing Ê_D(a) will be equivalent to minimizing Ẽ(b, a) in the limit λ → ∞.
Note, however, that Ê_D(a) has no direct dependence on the observed data x̂(t_i). In the case of Ẽ(b, a), for a finite λ, such dependence is obtained through E_D(b). Alternatively, one can incorporate the data directly into Ê_D(a) by choosing the points u(r_k) in equation (2.8) to be exactly or approximately the data points x̂(t_i). The advantage of this is that it can be done without the need to solve the differential equation (1.1). Instead, we simply need to estimate the derivative du/dt at the points of interest. This can be done by smoothing the data x̂(t_i) using an appropriate local polynomial (or globally using a spline), and then differentiating the result.
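A minimal Python sketch of this derivative approximation strategy (the decay model, noise level and smoothing parameter are assumptions; the paper's examples use different models): smooth the data with a spline, differentiate it, and estimate the parameter by linear least squares:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
ts = np.linspace(0.0, 2.0, 15)
a_true = 1.3
data = 2.0 * np.exp(-a_true * ts) + rng.normal(scale=0.005, size=ts.size)

# global smoothing spline fitted to the noisy data, then differentiated
spl = UnivariateSpline(ts, data, k=3, s=15 * 0.005 ** 2)
u, dudt = spl(ts), spl.derivative()(ts)

# model du/dt = -a*u is linear in a, so the least squares estimate is direct
a_hat = -np.sum(dudt * u) / np.sum(u * u)
print(a_hat)  # close to the true value 1.3
```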
This approach appears to have first been employed by van den Bosch & Hellinckx (1974), possibly inspired by collocation-based methods for parameter estimation in partial differential equations (Seinfeld 1969). Baden & Villadsen (1982) suggested an improvement and compared both schemes with the simultaneous minimization of E_D(b) and solution of equation (2.5), as described above. Swartz & Bremermann (1975) independently published an algorithm of this type and compared it with a shooting method. In contrast to van den Bosch & Hellinckx (1974) and Baden & Villadsen (1982), who used a global spline to smooth the data, Swartz & Bremermann (1975) employed local polynomials to either approximate or interpolate several successive data points, without any attempt to match such local fits at their intersections. Varah (1982) presented and evaluated an algorithm based on their ideas but again using a global spline smoothing method.
Finally, in a modern systems biology context, where observations are onlyavailable for a very restricted number of time points, Barenco et al. (2006) usedLagrangian interpolation between nearby data points to estimate the required
derivative. This allowed them to estimate both parameters in a simple model ofgene expression and to estimate the activity of an unobserved transcriptionfactor (p53) driving the system. This, in turn, allowed the prediction ofpreviously unknown targets of p53.
3. A new efficient method for ODEs linear in parameters
(a) Linearity in parameters
Our experience is that direct minimization of Ẽ(b, a) can be time-consuming (Brewer 2006). One would thus like to use specific features of a particular class of models to develop a more efficient algorithm. In particular, many models in physics, chemistry, engineering and biology are linear in their parameters. This is true both for models derived from underlying principles, such as mass–action models in chemistry and cell biology, and for data-driven 'black-box' models built with a general set of basis functions (e.g. radial basis functions). Recall that if a model is linear in its parameters, then the objective function is a quadratic form and a least squares estimate can be obtained in one step (i.e. non-iteratively) using standard linear algebra techniques such as QR decomposition (e.g. Lawson & Hanson 1974). This is much faster than the iterative minimization routines required for models that are nonlinear in parameters.
It seems difficult, however, to make use of this for ODE models, since even if the vector field f(x, t; a) depends linearly on the parameters a, the solution u(t) will typically exhibit nonlinear dependence. This makes it difficult to benefit from the linearity of f for solution-based approaches employing E_D or Ẽ.
On the other hand, if f(x, t; a) is linear in a then for fixed u(r_k) in equation (2.8) the error function Ê_D(a) is a quadratic form that can be minimized in the usual way using QR decomposition. In other words, equation (2.8) is a linear least squares problem in a. This makes methods based on Ê_D(a) particularly attractive for ODEs that are linear in parameters. However, as Baden & Villadsen (1982) point out, Ê_D(a) is the wrong objective function from a likelihood point of view. Our aim here, therefore, is to use the similarity between Ê_D(a) and E_M(b, a) highlighted above to derive an algorithm that approximately minimizes Ẽ(b, a), making full use of the linearity of f(x, t; a) with respect to a.
(b) Overview of the new algorithm
As already indicated above, if the data are known precisely, we can replace E_M(b, a) by Ē_M(b, a), which is a linear least squares objective function. We can employ the same principle even with noisy data if we replace x̂(t_i) in Ē_M(b, a) by our best available estimate of the noise-free data values. More generally, we do not need to restrict to just the observed time points t_i. In particular, if we have estimates û(r_k) of the variables at the collocation points r_k, we obtain the modified model error function
Ē_M(b, a; û) = Σ_{k=1}^{q} || Σ_{j=−1}^{p+1} b_j (dB_j/dt)(r_k) − f(û(r_k), r_k; a) ||²,    (3.1)
which yields the overall error function

Ē(b, a; û) = E_D(b) + λ Ē_M(b, a; û).    (3.2)

Motivated by expectation–maximization methods used in the case of nuisance parameters or missing data (Dempster et al. 1977; Moon 1996), we can attempt to fit the model by generating a sequence of better and better estimates û^(m). At each stage, this estimate is substituted into Ē, which is minimized with respect to (b, a). The resulting b is used to generate a new estimate û^(m+1) using equation (2.3). We thus alternate estimating the expected value of the nuisance parameter with a minimization over the parameters of interest. Note, however, that our algorithm differs from a conventional expectation–maximization method, where we would minimize only over a; here we simultaneously minimize over both a and b.
(c) Description of the algorithm
We iteratively generate a sequence of estimates (b^(m), a^(m)), with û^(m) given by

û^(m)(t) = Σ_{j=−1}^{p+1} b_j^(m) B_j(t).    (3.3)
To do this, we proceed as follows.
(i) The iteration is initialized with a spline uð0Þ obtained by smoothing thedata. More precisely,uð0Þ is given by equation (3.3) with b(0) the minimumof EDðbÞ.
(ii) For each mZ1; 2;.; we obtain ðbðmÞ;aðmÞÞ from uðmK1Þ by minimizingEðb;a; uðmK1ÞÞ with respect to ðb;aÞ. This is a linear least squares problemwhich is carried out in one step using QR decomposition.
(iii) We define u^(m) from b^(m) using equation (3.3).
(iv) If, for some preset tolerance δ, we have ||b^(m) - b^(m-1)|| < δ, then terminate; otherwise, return to step (ii).
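Steps (i)–(iv) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: a monomial basis stands in for the B-splines of equation (3.3), the scalar model du/dt = -a u (linear in its single parameter a) replaces the p53 system, and numpy's least squares solver replaces an explicit QR decomposition.

```python
import numpy as np

# Toy instance of steps (i)-(iv): du/dt = -a*u is linear in its single
# parameter a. A monomial basis B_j(t) = t^j stands in for the B-splines of
# equation (3.3), and numpy's lstsq replaces an explicit QR decomposition.

def basis(t, p):
    """Return B[i, j] = t_i^j and its t-derivative, for j = 0..p."""
    B = np.vander(t, p + 1, increasing=True)
    dB = np.zeros_like(B)
    dB[:, 1:] = B[:, :-1] * np.arange(1, p + 1)      # d/dt t^j = j t^(j-1)
    return B, dB

def fit(t_data, y_data, p=5, q=41, lam=1.0, delta=1e-8, max_iter=100):
    r = np.linspace(t_data[0], t_data[-1], q)        # collocation points r_k
    Bd, _ = basis(t_data, p)
    Br, dBr = basis(r, p)
    b = np.linalg.lstsq(Bd, y_data, rcond=None)[0]   # step (i): smooth the data
    for m in range(max_iter):
        u_prev = Br @ b                              # current estimate u(r_k)
        # Step (ii): minimise E_D(b) + lam*E_M(b, a; u_prev), linear in (b, a);
        # the model rows impose sum_j b_j B'_j(r_k) + a*u_prev(r_k) = 0.
        top = np.hstack([Bd, np.zeros((len(t_data), 1))])
        bot = np.sqrt(lam) * np.hstack([dBr, u_prev[:, None]])
        sol = np.linalg.lstsq(np.vstack([top, bot]),
                              np.concatenate([y_data, np.zeros(q)]),
                              rcond=None)[0]
        b_new, a = sol[:-1], sol[-1]
        if np.linalg.norm(b_new - b) < delta:        # step (iv): terminate
            return a, m + 1
        b = b_new                                    # step (iii): update spline
    return a, max_iter

a_true = 1.3
t = np.linspace(0.0, 2.0, 21)
a_est, iters = fit(t, np.exp(-a_true * t))           # noise-free test data
print(a_est, iters)
```

With noise-free data the estimate lands close to a = 1.3. The same loop structure carries over to systems of ODEs with several linear parameters, with one block of model rows per equation and one design column per parameter.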
(d) Properties of the solution
When the iteration terminates, we have b^(m) = b^(m-1) to some pre-specified numerical tolerance. Thus

E(b^(m), a; u^(m-1)) = Ẽ(b^(m), a).

Since (b^(m), a^(m)) minimizes E(b, a; u^(m-1)) with respect to (b, a), we see that a^(m) is a minimum of Ẽ(b^(m), a) with respect to a. In general, it does not appear to be possible to ensure that Ẽ(b, a) is also minimized with respect to b, but in practice the distinction between E_M and Ē_M appears to have a negligible effect.
(e) Implementation
The algorithm was implemented in C++, using the TNT library (Pozo 2004). A weight of λ = 1 and a stopping tolerance of δ = 10^-8 were used, unless otherwise stated. In evaluating how accurately the final spline satisfies the model, we used the
integral form E_∫ = Ē_M(b, a), as defined in equation (2.7). This was evaluated using Simpson's rule with 10 000 steps (Press et al. 2002).
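The quadrature step can be sketched as a composite Simpson's rule with the 10 000 steps quoted above. It is shown here on a known integral rather than the model error E_∫, so the result can be checked against a closed form.

```python
import numpy as np

# Composite Simpson's rule, as used to evaluate the integral-form error E_int.
# Demonstrated on integral of sin(x) over [0, pi], whose exact value is 2.

def simpson(f, a, b, steps=10_000):
    """Composite Simpson's rule; `steps` must be even."""
    x = np.linspace(a, b, steps + 1)
    y = f(x)
    h = (b - a) / steps
    return h / 3 * (y[0] + y[-1] + 4 * y[1:-1:2].sum() + 2 * y[2:-1:2].sum())

approx = simpson(np.sin, 0.0, np.pi)
print(approx)
```

At this step count the quadrature error is far below the tolerances used elsewhere in the algorithm, so it contributes negligibly to the reported E_∫ values.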
4. Numerical results
(a) A test model
As a specific example, we consider a simple four-component model of the core of the p53 gene regulatory network; see appendix A and Brewer (2006). This ODE system describes the behaviour of active ATM (a), p53 (z), active p53 (x) and MDM2 (y) after a cell experiences DNA damage:
da/dt = -D_ATM a,
dz/dt = p_p53 - D_p53 z - k_1 y z - k_2 a z,
dx/dt = -D_p53 x - k_1 y x + k_2 a z and
dy/dt = p_MDM2 + k_3 x - D_MDM2 y - k_4 a y.
In total, the model depends linearly on its nine parameters D_ATM, D_MDM2, D_p53, p_MDM2, p_p53, k_1, k_2, k_3 and k_4. We generated a simulated dataset for the parameter values given in table 4a using a fourth-order Runge–Kutta scheme with a step size of 10^-8 implemented in C++ (Press et al. 2002). Different sized datasets were created by sampling from this set at fixed intervals.
To simulate measurement noise, we added an independent Gaussian error to each data point. This had mean 0 and variance σ², with σ constant across all data points. A number of different noise levels were used, ranging from σ = 0 to σ = 0.06. For each noise level, we report results averaged over 1000 independent realizations, each of which was fitted to the model.
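The dataset-generation procedure above can be sketched as follows. The parameter values, initial conditions and time grid are illustrative placeholders (the table 4a values are not reproduced here), and the step size is far coarser than the 10^-8 quoted above.

```python
import numpy as np

# Synthetic dataset from the four-component p53 model: fixed-step RK4
# integration, sampling at fixed intervals, then additive Gaussian noise.
# All numerical values below are hypothetical, not the paper's table 4a.

P = dict(D_ATM=0.2, D_MDM2=0.3, D_p53=0.1, p_MDM2=0.1, p_p53=0.5,
         k1=0.5, k2=1.0, k3=0.8, k4=0.4)          # placeholder parameters

def rhs(s, p):
    a, z, x, y = s                                 # active ATM, p53, active p53, MDM2
    return np.array([
        -p['D_ATM'] * a,
        p['p_p53'] - p['D_p53'] * z - p['k1'] * y * z - p['k2'] * a * z,
        -p['D_p53'] * x - p['k1'] * y * x + p['k2'] * a * z,
        p['p_MDM2'] + p['k3'] * x - p['D_MDM2'] * y - p['k4'] * a * y,
    ])

def rk4(s0, p, t_end, h):
    """Classical fourth-order Runge-Kutta with fixed step h."""
    steps = int(round(t_end / h))
    traj = np.empty((steps + 1, 4))
    traj[0] = s = np.asarray(s0, float)
    for i in range(steps):
        k1 = rhs(s, p)
        k2 = rhs(s + 0.5 * h * k1, p)
        k3 = rhs(s + 0.5 * h * k2, p)
        k4 = rhs(s + h * k3, p)
        s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i + 1] = s
    return traj

rng = np.random.default_rng(0)
clean = rk4([1.0, 1.0, 0.0, 0.5], P, t_end=10.0, h=0.01)
sampled = clean[::100]                             # sample at fixed intervals
noisy = sampled + rng.normal(0.0, 0.06, sampled.shape)  # sigma = 0.06
print(sampled.shape)
```

The active ATM component decays exponentially from its 'kicked' value, as assumed in appendix A, and the noisy samples play the role of the measured time course.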
(b) Results: shooting method using Nelder–Mead optimization
We implemented a shooting algorithm approach in C++ using a Runge–Kutta integrator with adaptive step size control (Fehlberg 1968; Cash & Karp 1990) and a Nelder–Mead simplex method as the optimization routine (Nelder & Mead 1965; Press et al. 2002). A starting simplex was constructed around an initial parameter estimate P_0 by

P_i = P_0 + ζ e_i,

where e_i is the unit vector in the ith coordinate direction and ζ is a scale constant (ζ = 10 in table 2). Four different choices of initial guess were tried, as shown in table 1.
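The simplex construction above can be sketched directly; ζ = 10 matches table 2, while the nine-dimensional zero vector used for P_0 here is purely illustrative.

```python
import numpy as np

# Starting simplex around an initial guess P0: the i-th extra vertex is
# P0 + zeta*e_i, giving d+1 vertices in d dimensions (d = 9 parameters here).

def starting_simplex(P0, zeta):
    P0 = np.asarray(P0, float)
    return np.vstack([P0] + [P0 + zeta * e for e in np.eye(len(P0))])

S = starting_simplex(np.zeros(9), zeta=10.0)   # illustrative P0
print(S.shape)
```

The choice of ζ sets the initial search scale: too small and the simplex collapses into the nearest local minimum, too large and early steps overshoot the region of interest.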
We used a stopping criterion based on the fractional range of the simplex,

2(l_h - l_l) / (|l_h| + |l_l|) < η,
Table 1. The points in parameter space used as the initial starting point P_0 for the Nelder–Mead optimization. (The first of these (A) consists of the true parameter values used to generate the data and was used as a stability check for the algorithm.)
Table 2. Parameter estimates obtained using a Nelder–Mead-based shooting method. The first column indicates the starting point P_0 for the simplex, as in table 1. (The dataset consisted of all four variables sampled at 1052 time points with no noise added, so that σ = 0. The notation 'it' indicates the number of iterations before convergence and 'LSQ' the final least squares error E_D. Implementation constants were ζ = 10 and η = 10^-10.)
point  D_ATM  D_MDM2  D_p53  p_MDM2  p_p53  k_1  k_2  k_3  k_4  it  LSQ
where η is the required accuracy, l_h is the highest value of the objective function among the vertices of the simplex and l_l is the lowest value.
Table 2 shows the results of this method for the four different initial starting points from table 1. This demonstrates that the parameter space contains many local minima: unless the algorithm is started close to the true value, it is difficult to recover the correct parameter estimates. Using the Powell minimization method (Acton 1990) instead of Nelder–Mead produced similar results (data not shown). Multiple local minima are common when applying parameter estimation to ODE models (Esposito & Floudas 2000), especially when the model is nonlinear and there are a large number of parameters. In this case, any kind of local method, including other approaches such as gradient- or Hessian-based methods, will not be sufficient unless the starting parameter estimates are close to their true values.
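The single-shooting error function underlying these results can be sketched on the toy model du/dt = -a u (an assumption made here in place of the p53 system): the candidate ODE is integrated forward and scored against the data, and any simplex or gradient routine can then be run on shooting_error. A coarse grid scan stands in for the optimizer below.

```python
import numpy as np

# Single-shooting objective on the toy model du/dt = -a*u with u(0) = 1:
# integrate the candidate ODE forward with RK4 and score the least-squares
# mismatch E_D against the data. The true parameter is a = 1.3.

def integrate(a, t):
    u = np.empty_like(t)
    u[0] = 1.0
    for i in range(len(t) - 1):
        h, ui = t[i + 1] - t[i], u[i]
        k1 = -a * ui
        k2 = -a * (ui + 0.5 * h * k1)
        k3 = -a * (ui + 0.5 * h * k2)
        k4 = -a * (ui + h * k3)
        u[i + 1] = ui + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return u

t = np.linspace(0.0, 2.0, 21)
data = np.exp(-1.3 * t)

def shooting_error(a):
    return np.sum((integrate(a, t) - data) ** 2)

# Crude grid scan in place of a simplex search; the minimum sits at a = 1.3.
grid = np.linspace(0.5, 2.0, 151)
best = grid[np.argmin([shooting_error(a) for a in grid])]
print(best)
```

In one dimension the landscape is well behaved; the difficulties described in the text arise once many parameters interact through the integrated trajectory, producing the multiple local minima seen in table 2.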
(c) Results: shooting method using simulated annealing
An alternative is to use a global minimization method such as simulated annealing (Metropolis et al. 1953; Kirkpatrick et al. 1983; Kirkpatrick 1984; Gershenfeld 1999; Press et al. 2002), Markov chain Monte Carlo (e.g. Gilks et al. 1996) or genetic algorithms (e.g. Gershenfeld 1999). Such methods all have some possibility of moving to a worse solution during a systematic search of the parameter space and hence are able to escape from local minima. Simulated annealing is probably the oldest and best established of these methods. This global minimization method slowly 'cools' the system, where the
Table 3. Parameter estimation using shooting with simplex simulated annealing. (Initial temperatures of 10, 100 and 1000 correspond to initial acceptance percentages of 52, 68 and 71%, based on 1000 proposed steps after an initial transient of 100 steps. The total number of steps was chosen for its computational feasibility: 10^6 steps can require several days of computer time.)
'temperature' determines the probability that a step in parameter space that worsens the error is accepted. Generally, the step is determined by drawing a random point from a Gaussian distribution centred at the current point, but here a refinement is implemented that uses the Nelder–Mead algorithm to propose the step (Cardoso et al. 1996; Kvasnicka & Pospichal 1997; Torres et al. 1997; Press et al. 2002). The cooling scheme is an important factor in the effectiveness of the algorithm. Here we employ the one recommended by Press et al. (2002), taking T = T_0 (1 - k/K)^4, where T_0 is the initial temperature, k is the total number of moves so far and K is the estimated number of moves required.
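The cooling schedule T = T_0 (1 - k/K)^4 is straightforward to state in code; the T_0 = 100 and K = 10 below are illustrative, not the run settings of table 3.

```python
# Cooling schedule recommended by Press et al. (2002) and used in the text:
# T = T0 * (1 - k/K)**4, with k the moves made so far and K the move budget.

def temperature(k, K, T0):
    """Annealing temperature after k of K planned moves."""
    return T0 * (1.0 - k / K) ** 4

schedule = [temperature(k, K=10, T0=100.0) for k in range(11)]
print(schedule[0], schedule[5], schedule[10])
```

The quartic exponent keeps the temperature high for a large fraction of the run, then drops it rapidly to zero, so late moves are almost purely downhill.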
The above scheme was implemented in C++. A number of different choices of cooling parameters (starting temperature and rate of cooling) and initial conditions were applied to the 1052 time point dataset with no measurement error (σ = 0), using a number of different G5-based computers with processor speeds between 1.6 and 2.0 GHz (tables 3 and 5). Convergence to the global minimum occurred in only approximately one-quarter of the runs, and these successful runs all started with the same initial parameters. The duration of the
Table 4. Evaluation of the algorithm presented in §3. (a) True parameter values. (b) Percentage error for parameter estimates from a 1052 time point dataset with p = 19, q = 1052, λ = 1 and ω = 0.464. (c) Parameter estimates from a six time point dataset when p = 19, q = 29 and ω = 0.464, where ω = λq/n.
runs was consistently long, almost always taking more than a day to converge. In order to provide the best chance of success, the test was performed with a large dataset; a reduction in the dataset would render the parameter space more complex and make convergence to the optimal solution harder.
We conclude that simulated annealing can give good parameter estimates, but requires a lot of time spent tuning the minimization algorithm for success. Even when it does work, it is slow and is at the limit of practicality on current popular hardware. These conclusions will apply to any global minimization method that relies on single shooting to determine the error function. Multiple shooting (e.g. Kuznetsov 1995; Timmer et al. 2000) can help to regularize the parameter space and generally provides a more robust algorithm, but in our experience still retains many of the same problems as ordinary shooting.
(d) Results: novel collocation scheme for ODEs linear in parameters
Finally, we applied the algorithm presented in §3 to a variety of combinations of noise levels and dataset sizes. When the dataset is relatively large (1052 time points), we obtain accurate parameter estimates using 22 splines (p = 19). In this case, all estimated parameters except D_p53 are within 0.1% of their true values (table 4b). The solution splines are virtually indistinguishable from the true time course (data not shown) and satisfy the model to a high degree of accuracy (E_∫ = 2.38 × 10^-5). The algorithm took approximately 2 min on a 2.0 GHz G5 Power Macintosh when p = 19, n = 1052, q = n, λ = 1 and σ = 0.06. This is significantly faster than the shooting methods above. Furthermore, the speed improves considerably as the amount of data is reduced (12 s were required when n = 106).
As the size of the dataset is decreased, the accuracy of the estimates also declines (figure 1). This occurs because the larger the dataset, the more constrained the spline is, and hence the closer it will be to the 'true' solution and the more accurate the estimates will be. However, the loss of accuracy is minimal down to approximately n = 150. Even with very small amounts of data, the estimates are still reasonably accurate; when n = 14 the error is less than 7%, which is perfectly usable in many systems biology contexts. This behaviour was consistent across the parameter estimates, but there were orders of magnitude differences in the error, ranging from less than 10^-5 to 15% when n = 14 (see figure 7 in appendix B). This may reflect the relative contribution a
Figure 1. The relationship between the amount of data and the accuracy of the parameter estimates. This plot shows the accuracy when the method is applied to datasets with between 14 (the minimum possible in this situation) and 1052 time points, with p = 19, q = n and λ = 1. Plots for individual parameters are given in figure 7 in appendix B.
parameter has on the model solution. We used q = n throughout, as preliminary experiments showed that, when there was no error in the data, this consistently produced good results in the minimum amount of processing time.
We next added varying levels of noise to the data and found that the algorithm continues to perform well (figure 2). The mean estimate generally lies close to the true value and is always within one standard error. As σ increases, the estimates shift away from the true parameter value, and this shift is significantly greater for some parameters than for others. As the error in the data increases, the positional constraints on the spline (i.e. the data points) less resemble the true solution, so the optimal spline is likely to deviate, which in turn produces less accurate estimates.
The performance of the algorithm depends on a number of adjustable factors, including p, q and λ. It is beyond the scope of this paper to examine these factors in detail (Brewer 2006), but here we briefly consider two situations where their optimization is beneficial. When the amount of error in the data is large, it is no longer appropriate to give equal weight to the spline being close to the data and to the spline satisfying the model, because the solution spline is then unable to represent the model solution accurately and poor parameter estimates result. More weight can be placed on the spline satisfying the model by increasing the number of collocation points and/or increasing λ. This improves the spline quality (figure 3) and hence the parameter estimates. Increasing the number of collocation points has the additional benefit of spreading out the positions where the model needs to be satisfied, which is important when the amount of data is small. There is a limit to how much additional weight can be used: if it is too large, the procedure fails to converge. In this case, the solution spline moves away from the data points, becoming a worse estimate of the model solution at each iteration. This occurs because the data no longer have a strong enough
Figure 2. (a,b) The effect of increasing measurement error on the accuracy of parameter estimates. Results are based on 1000 independent noise realizations with p = 19, q = n = 106 and λ = 1. The error bars indicate the standard error of the parameter estimates. A σ of 0.06 corresponds to approximately 75% relative error for active p53 and 12% for the other components. For results for the remaining parameters, see figure 6 in appendix B.
constraining influence on the spline. When there is a small amount of data, it is difficult to get reasonable parameter estimates, but with appropriate optimization of the factors it is possible to get good results (table 4c and figure 4). At such low amounts of data, convergence is extremely sensitive to the factor values.
5. Discussion
Estimation of parameters in ODE models is of considerable importance in manymodelling fields. Increasingly in systems biology, this needs to be done for veryshort time courses with high levels of noise on the data. Traditional algorithmsare poorly suited to such problems.
Figure 3. The relationship between the model error E_∫ and (a) the number of collocation points q and (b) the ratio ω = λq/n. Results are based on 1000 independent noise realizations with p = 19, σ = 0.06 and n = 106. In (a) we have λ = 1 and in (b) q = 500. Error bars indicate standard deviations.
We have presented a summary of the various approaches to parameter fittingin ODEs. We have then introduced a new algorithm for the case of modelslinear in parameters. It employs a spline-based collocation scheme andalternates linear least squares minimization steps with repeated estimates ofthe noise-free values of the variables. This has the advantage that fast,established linear algebra solvers can be used to provide optimal parametervalues. The proposed procedure also avoids the problems of the model becomingstiff which can hamper shooting-based methods, in particular, when applied tomodels that are linear in their parameters. Additionally, the proposedprocedure is effective at dealing with large-scale models and does not requirethe estimation of initial conditions.
Figure 4. The solution spline for active p53 produced by the algorithm on a small amount of data, with n = 6 and either p = 19, q = 29 and ω = 0.464, or p = 11, q = 14 and ω = 1.09.
We have shown that the proposed procedure produces reasonable estimates,even at low amounts of data and quite high amounts of error. The accuracy of theproposed procedure depends on how close the intermediary spline is to the truemodel solution. At low amounts of data and high error, there is less validinformation to constrain the spline and so the spline accuracy and hence theestimates suffer. By optimizing the algorithm factors, in particular the weightplaced on the spline satisfying the model, it is possible to significantly improvethe accuracy of the spline and hence the estimates.
In comparison with single shooting methods, our new algorithm has several advantages. Firstly, it converges more reliably to reasonable parameter estimates across a wide range of dataset sizes and amounts of error; many single shooting global optimization methods rely on probabilistic steps, which can produce inconsistent results. Secondly, our procedure is considerably faster. This is because the problem is linearized, so that efficient linear algebra solvers can be used, and because the splines effectively discretize the model, removing the need for costly integration. Also, our procedure is simple, with only three key factors that can be varied; in comparison, simulated annealing requires numerous algorithm parameters to be tuned. Finally, this technique requires neither parameter estimates to seed the optimization nor the initial conditions of the dynamical system. This simplifies the requirements for obtaining reasonable parameter estimates, which is of particular concern at low amounts of data or when little is known about the system in advance.
Despite being limited to models that are linear in their parameters, our procedure is a fast and useful tool that is applicable in many areas of modelling. Furthermore, there is potential for this method to be applied to simplifications of more complex problems to provide good initial estimates, which can then be refined using more sophisticated optimization methods applied to the full model.
Figure 5. A schematic of a simple model of the core of the p53 gene regulatory network (equation (A 1)). The k_i are interaction rate constants that indicate the strength of the interaction between the two components joined by the arrow.
D.M.B. held a CHRAT studentship from the ICH and M.B. was supported by UK Biotechnology and Biological Sciences Research Council (BBSRC) Exploiting Genomics Initiative grant (39/EGM16102). J.S. is supported by the BBSRC via the Centre for Integrative Systems Biology at Imperial College (CISBIC), BB/C519670/1.
Appendix A. p53 model
p53 is a tumour suppressor and has been described as the 'guardian of the genome' (Lane 1992). It is part of a complex and extensive gene regulatory network that integrates a variety of stress signals to produce a range of effects, including apoptosis, growth arrest and DNA damage repair. Of particular importance is p53's role in the decision to commence apoptosis, which is not well understood. p53 is known to play a vital role in preventing cancer; p53 is dysfunctional in the majority of cancer types (Soussi et al. 2000) and more than 18 000 different p53 mutations have been found in cancers (Bode & Dong 2004).
A simple model is proposed that includes the main protein regulatory interactions of the p53 network (figure 5). The following interactions are modelled: through phosphorylation, active ATM enables the stabilization and hence activation of p53 (k_2); ATM phosphorylates MDM2, compromising MDM2's ability to ubiquitinate and bind p53, hence ATM increases the rate at which MDM2 is inactivated/degraded (k_4); active p53 transcribes MDM2 (k_3); and MDM2 encourages the degradation of both forms of p53 through ubiquitination and also prevents p53 acting as a transcription factor by binding it (k_1). Active ATM drives the system, and it is assumed that at time t = 0 the active ATM level has been 'kicked' to a value away from equilibrium; this decays exponentially according to the rate constant D_ATM. The model ODEs are
da/dt = -D_ATM a,
dz/dt = p_p53 - D_p53 z - k_1 y z - k_2 a z,
dx/dt = -D_p53 x - k_1 y x + k_2 a z and
dy/dt = p_MDM2 + k_3 x - D_MDM2 y - k_4 a y,    (A 1)
where

a = active ATM concentration,
z = inactive p53 concentration,
x = active p53 concentration,
y = MDM2 concentration,
k_i = interaction rate constant i,
p_q = basal production rate of q and
D_q = basal degradation rate of q.
The following simplifications have been made.

— The path containing CHK2 was removed. This is because ATM → CHK2 → p53 duplicates the behaviour of the more direct ATM → p53, so the effect of the CHK2 pathway can be included in the interaction between active ATM and p53.
— Once MDM2 has bound to ARF or become inactivated through phosphorylation, it is removed from the system (in effect degraded). This means that only one component of MDM2 (the active form) is required.
— Active ATM is the only protein that can convert p53 into its active state and there is no 'basal' rate of activation.
— If MDM2 interacts with both inactive p53 and active p53, then it degrades them at an equal rate.
— p53 forms a tetramer when activated before it can perform its function as a transcription factor. The details of this mechanism are ignored.
Table 5. The results from simulated annealing with downhill simplex parameter estimation. (A range of initial temperatures, estimated move counts and initial points were used.)
Figure 6. The effect of increasing measurement error on the accuracy of parameter estimates. Results are based on 1000 independent noise realizations with p = 19, q = n = 106 and λ = 1. The error bars indicate the standard error of the parameter estimates. A σ of 0.06 corresponds to approximately 75% relative error for active p53 and 12% for the other components. (a) D_ATM, (b) D_MDM2, (c) D_p53, (d) p_MDM2, (e) p_p53, (f) k_1, (g) k_2, (h) k_3 and (i) k_4.
Figure 7. The relationship between the parameter estimates produced by the method in §3 and the amount of data (p = 19, q = n and λ = 1). (a) D_ATM, (b) D_MDM2, (c) D_p53, (d) p_MDM2, (e) p_p53, (f) k_1, (g) k_2, (h) k_3 and (i) k_4.
Lane, D. P. 1992 Cancer. p53, guardian of the genome. Nature 358, 15–16. (doi:10.1038/358015a0)
Lawson, C. L. & Hanson, R. J. 1974 Solving least squares problems. Upper Saddle River, NJ: Prentice-Hall.
Li, Z., Osborne, M. R. & Prvan, T. 2005 Parameter estimation of ordinary differential equations. IMA J. Numer. Anal. 25, 264–285. (doi:10.1093/imanum/drh016)
Marquardt, D. W. 1963 An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. & Teller, E. 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1091. (doi:10.1063/1.1699114)
Moon, T. 1996 The expectation–maximization algorithm. IEEE Signal Process. Mag. 13, 47–60.
Nelder, J. A. & Mead, R. 1965 A simplex method for function minimization. Comput. J. 7, 308–313.
Pozo, R. 2004 Template numerical toolkit. See http://math.nist.gov/tnt.
Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. 2002 Numerical recipes in C++: the art of scientific computing, 2nd edn. Cambridge, UK: Cambridge University Press.
Ramsay, J. O., Hooker, G., Campbell, D. & Cao, J. 2007 Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc. B 69, 741–796. (doi:10.1111/j.1467-9868.2007.00610.x)
Seinfeld, J. H. 1969 Identification of parameters in partial differential equations. Chem. Eng. Sci. 24, 65–74. (doi:10.1016/0009-2509(69)80009-6)
Soussi, T., Dehouche, K. & Beroud, C. 2000 p53 website and analysis of p53 gene mutations in human cancer: forging a link between epidemiology and carcinogenesis. Hum. Mutat. 15, 105–113. (doi:10.1002/(SICI)1098-1004(200001)15:1<105::AID-HUMU19>3.0.CO;2-G)
Stark, J., Callard, R. & Hubank, M. 2003a From the top down: towards a predictive biology of gene networks. Trends Biotechnol. 21, 290–293. (doi:10.1016/S0167-7799(03)00140-9)
Stark, J., Brewer, D., Barenco, M., Tomescu, D., Callard, R. & Hubank, M. 2003b Reconstructing gene networks: what are the limits? Biochem. Soc. Trans. 31, 1519–1525.
Swartz, J. & Bremermann, H. 1975 Discussion of parameter estimation in biological modelling: algorithms for estimation and evaluation of the estimates. J. Math. Biol. 1, 241–257. (doi:10.1007/BF01273746)
Timmer, J., Rust, H., Horbelt, W. & Voss, H. U. 2000 Parametric, nonparametric and parametric modelling of a chaotic circuit time series. Phys. Lett. A 274, 123–134. (doi:10.1016/S0375-9601(00)00548-X)
Tjoa, I. B. & Biegler, L. T. 1991 Simultaneous solution and optimization strategies for parameter estimation of differential-algebraic equation systems. Ind. Eng. Chem. Res. 30, 376–385. (doi:10.1021/ie00050a015)
Torres, F. M., Agichtein, E., Grinberg, L., Yu, G. W. & Topper, R. Q. 1997 A note on the application of the "Boltzmann simplex"–simulated annealing algorithm to global optimizations of argon and water clusters. Theochem.: J. Mol. Struct. 419, 85–95. (doi:10.1016/S0166-1280(97)00195-4)
van den Bosch, B. & Hellinckx, L. J. 1974 A new method for estimation of parameters in differential equations. AIChE J. 20, 250–256. (doi:10.1002/aic.690200207)
Varah, J. M. 1982 A spline least squares method for numerical parameter estimation in differential equations. SIAM J. Sci. Stat. Comput. 3, 28–46. (doi:10.1137/0903003)
Villadsen, J. V. & Stewart, W. E. 1967 Solution of boundary-value problems by orthogonal collocation. Chem. Eng. Sci. 22, 1483–1501. (doi:10.1016/0009-2509(67)80074-5)
Wang, F. S. 2000 A modified collocation method for solving differential-algebraic equations. Appl. Math. Comput. 116, 257–278. (doi:10.1016/S0096-3003(99)00138-1)