Biometrika, pp. 1–20
© 2007 Biometrika Trust
Printed in Great Britain

Inferring Stochastic Dynamics from Functional Data

By Nicolas Verzelen
Institut National de recherche Agronomique, 2, place Pierre Viala, F-34060 Montpellier, France

Wenwen Tao and Hans-Georg Müller
Department of Statistics, University of California, Davis, One Shields Avenue, Davis, California, 95616, U.S.A.

Summary
In most current data modelling for time-dynamic systems, one works with a pre-specified differential equation and attempts to fit its parameters. In contrast, we demonstrate that in the case of functional data, the equation itself can be inferred from the data. Assuming only that the dynamics are described by a first order nonlinear differential equation with a random component, we obtain data-adaptive dynamic equations from the observed data via a simple smoothing-based procedure. We prove consistency and introduce diagnostics to ascertain the fraction of variance that is explained by the deterministic part of the equation. This approach is shown to yield useful insights into the time-dynamic nature of human growth.
Some key words: Empirical Dynamics, Functional Data Analysis, Goodness of Fit, Growth Curves, Smoothing
1. Introduction
In recent years, there has been increasing interest in fitting nonlinear differential equations to data arising in engineering, economics or biology. A major motivation is to understand the dynamics underlying physical or biological processes (Holte et al., 2006; Perelson et al., 1997) or to predict the future behavior of such systems from current observations. These challenges arise in growth studies (Gasser et al., 1984), where, in addition to scientific interest in understanding the dynamics of human growth by studying how growth velocity relates to current age and current height, differential equation models can also be used to assess clinical aspects of a child's growth patterns. A differential equation model that fits the data can be applied to predict the size of the derivative of growth for a healthy child who is short for current age. This predicted derivative can then be checked against the observed derivative for monitoring purposes.
Substantial work has been devoted to parametric estimation procedures for dynamic systems (Bellman & Roth, 1971; Brunel, 2008; Liang & Wu, 2008; Ramsay et al., 2007). These, and also recent semiparametric approaches (Chen & Wu, 2008; Paul et al., 2011) for modelling dynamic systems, rely on the assumption that a pre-specified non-random differential equation actually applies to the data. However, this is often not the case, particularly in the study of dynamics that are repeatedly observed for many subjects or experiments. There are two major reasons for discrepancies between stipulated dynamic models and actual behavior of systems. First, differential equation models have traditionally been accepted based on their inherent plausibility and concordance with presumed underlying mechanisms. All too often, this leads to models that actually do not fit the data well (Hooker, 2009), because the presumed underlying mechanisms that the model reflects are not well understood or do not provide good approximations to the actual mechanisms. Second, deterministic models rarely provide satisfactory fits to phenomena that are inherently stochastic in nature, because the dynamics vary across subjects or experiments. Dynamics of viral level in HIV studies (Miao et al., 2009) that are subject-specific and the dynamics of auction price trajectories (Reddy & Dass, 2006; Wang et al., 2008) provide examples of this difficulty. In such cases, subject-specific effects come into play that cannot be controlled for, and it is then not reasonable to expect a deterministic dynamic equation to provide a good fit across subjects.
All of this motivates an alternative bottom-up approach, namely to directly obtain information about underlying dynamic systems from repeated observations of the trajectories that result from the dynamics, in contrast to the customary top-down approach of a priori postulating what the dynamic equations should be. Our aim thus is to derive differential equations from functional data, i.e., learning these equations from observing many realizations of the trajectories that they generate. To allow for random variation between subjects, it is necessary to add stochastic elements to a deterministic equation. For this, inclusion of an additional stochastic drift process is expedient.
Nonparametric analysis of stochastic differential equations has been previously studied for diffusion processes (Hoffmann, 1999; Jacod, 2000), with solutions that are versions of Brownian motion and have non-differentiable trajectories. As growth and many other dynamic phenomena are usually considered to be quite smooth, the stochastic differential equation approach is not useful for most non-financial data. Recently, Müller & Yao (2010) have investigated an empirical dynamic approach, where one determines linear dynamics empirically from a sample of trajectories. Specifically, each trajectory of a differentiable Gaussian process is shown to satisfy a first order linear differential equation, which can be determined for various types of longitudinal data by suitable estimation procedures. However, this approach does not extend to nonlinear dynamic systems or non-Gaussian processes.
Here we show that each trajectory of a smooth stochastic process $X$ satisfies a first order nonlinear differential equation with a random component, where the stochastic part is an additive smooth drift process $Z$. We call this representation of the process the data-driven differential equation. The variance of the process $Z$ determines to what extent the process $X$ is driven by the deterministic part of the differential equation. Whenever the variance of the drift $Z$ is small in comparison to the variance of $X$, a deterministic version of the differential equation explains most of the observed behavior of the process. Obtaining data-driven dynamics reveals underlying mechanisms generating the observed functional data and provides diagnostic tools for assessing the linearity of the dynamics or the quality of a parametric fit. Implementation proceeds via a two-step kernel estimation procedure, which we show to be consistent. We illustrate the method by constructing the data-driven differential equation governing the growth of children for the Berkeley Growth Study.
We conclude this section by describing the data structure of the available observations from which the dynamics will be learned. Given $n$ realizations $X_i$ of the underlying process $X$ on a domain $T$, we assume that $N_i$ measurements $Y_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N_i)$, where $N = \inf_{i=1,\ldots,n} N_i$, are obtained at times $t_{ij}$ according to
$$Y_{ij} = Y_i(t_{ij}) = X_i(t_{ij}) + \epsilon_{ij}. \tag{1}$$
Here $\epsilon_{ij}$ are zero mean independent identically distributed measurement errors with finite and constant variance $\mathrm{var}(\epsilon_{ij}) = \sigma^2$, independent of all other random components. The design points $t_{ij}$ are considered deterministic and densely spaced. This model reflects typical measurements obtained in growth studies.
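To make the data structure in (1) concrete, the following minimal sketch simulates noisy measurements on a common dense grid. The trajectory shape, the amplitude distribution and all numerical constants are hypothetical choices for illustration only; they are not taken from the Berkeley Growth Study.

```python
import numpy as np

def simulate_noisy_curves(n=20, N=31, sigma=0.5, seed=0):
    """Simulate Y_ij = X_i(t_ij) + eps_ij as in model (1), on a common grid."""
    rng = np.random.default_rng(seed)
    t = np.linspace(1.0, 18.0, N)                  # deterministic design points
    # Hypothetical subject-specific amplitude; not part of the paper's model.
    a = rng.normal(1.0, 0.1, size=(n, 1))
    X = 80.0 + 85.0 * a * (1.0 - np.exp(-0.15 * t))  # smooth toy trajectories
    eps = rng.normal(0.0, sigma, size=(n, N))      # i.i.d. errors, variance sigma^2
    Y = X + eps
    return t, X, Y
```

The function returns the grid, the latent smooth curves and the noisy observations, so that an estimator can be checked against the known `X`.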
2. Data-driven differential equation
In the following we consider a differentiable stochastic process $X(t)$ such that $X$ and its derivative $X'$ are square integrable. A simple representation of the derivative process is to decompose it into a mean function $\mu_{X'}$ and a mean zero stochastic process $Z_1$,
$$X'(t) = \mu_{X'}(t) + Z_1(t). \tag{2}$$
Nonparametric estimation of individual derivative trajectories and of $\mu_{X'}$ provides data-driven descriptions (Gasser & Müller, 1984; Gasser et al., 1984; Mas & Pumo, 2009).
Considering a dynamic equation that captures the relationship between the process $X(t)$ and its derivative $X'(t)$, the simplest such relation is a linear relationship between $X'$ and $X$. The corresponding linear empirical dynamics is a natural approach for Gaussian processes, since the joint Gaussianity of $X$ and $X'$ implies that there exists a deterministic function $\beta$ with
$$X'(t) = \mu_{X'}(t) + \beta(t)\{X(t) - \mu_X(t)\} + Z_2(t). \tag{3}$$
Here $Z_2$ is a zero mean drift process with $\mathrm{cov}\{Z_2(t), X(t)\} = 0$, implying independence between $X$ and $Z_2$ in the Gaussian case (Müller & Yao, 2010).
Many complex biological processes, including growth, cannot be expected to be adequately represented by linear dynamics. For more complex dynamics, it is therefore of interest to model the dynamics of $X$ with a nonlinear differential equation. There always exists a function $f$ with
$$E\{X'(t) \mid X(t)\} = f\{t, X(t)\}, \qquad X'(t) = f\{t, X(t)\} + Z(t), \tag{4}$$
with $E\{Z(t) \mid X(t)\} = 0$ almost surely. When $f$ is unknown and is determined from the data, (4) is a data-driven nonlinear differential equation. The function $f$ and the properties of the drift process $Z$ determine the underlying nonlinear dynamics. In some applications, comparisons with the special case of a simpler autonomous system
$$E\{X'(t) \mid X(t)\} = f_1\{X(t)\}, \tag{5}$$
for a function $f_1$, which is time-independent, are of interest.
Parametric differential equations with random effects provide alternatives to modelling with Equation (4). Upon integration, these become nonlinear random effects models, which are difficult to fit, especially if they contain many random effects. A typical example is the nonlinear Preece–Baines model (Preece & Baines, 1978) for human growth, which can be derived from a non-autonomous differential equation. Such nonlinear models are nearly always fitted by least squares separately for each child, not taking advantage of the availability of a sample of growth curves and not including any random effects. These model fits are usually not efficient and have been shown to be inferior to nonparametric smoothing and differentiation methods in Gasser et al. (1984). These parametric growth models can be expressed in the form of the proposed general equation $X'(t) = f\{t, X(t)\}$, which thus provides a general and flexible framework that is informed by all data in the sample. As is typical for the life sciences, for growth data the nature of the underlying dynamics is largely unknown. The popular Preece–Baines model and related models have been derived purely based on data fitting considerations, while the model parameters are not interpretable (Hansen et al., 2003).
Models (2), (3) and (4) are characterized by increasing complexity, as
$$\mathrm{var}\{Z(t)\} \le \mathrm{var}\{Z_2(t)\} \le \mathrm{var}\{Z_1(t)\} = \mathrm{var}\{X'(t)\},$$
by definition of these drift processes. This means that the dynamic behavior of the process $X$ is better predicted by the data-driven nonlinear differential equation (4) than by the empirical linear differential equation (3). If $\mathrm{var}\{Z(t)\} = \mathrm{var}\{Z_2(t)\}$, there is no gain in adopting a nonlinear as compared to a simpler linear differential equation, but there can be substantial gains when the variance of $Z(t)$ is strictly smaller than the variance of $Z_2(t)$. Thus, the estimation of a data-driven nonlinear differential equation also can be used to assess the linearity of the underlying dynamics.
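The variance ordering can be verified in two lines from standard properties of conditional expectation and of linear projection. The derivation below is a sketch that assumes $\mathrm{var}\{X(t)\} > 0$ and that $\beta(t)$ is the least squares slope, which is what the condition $\mathrm{cov}\{Z_2(t), X(t)\} = 0$ implies.

```latex
% Z is the residual of the best predictor of X'(t) among all
% square integrable functions of X(t), so any other such function,
% in particular the linear one in (3), has at least the same error:
\mathrm{var}\{Z(t)\}
  = E\bigl[X'(t) - E\{X'(t) \mid X(t)\}\bigr]^2
  \le E\bigl[X'(t) - \mu_{X'}(t) - \beta(t)\{X(t) - \mu_X(t)\}\bigr]^2
  = \mathrm{var}\{Z_2(t)\}.

% Z_2 is the residual of the best linear predictor; the constant
% predictor \mu_{X'}(t) is the linear predictor with slope zero:
\mathrm{var}\{Z_2(t)\}
  \le E\{X'(t) - \mu_{X'}(t)\}^2
  = \mathrm{var}\{Z_1(t)\}
  = \mathrm{var}\{X'(t)\}.
```

Equality in the first line holds exactly when the conditional mean $E\{X'(t) \mid X(t)\}$ is linear in $X(t)$, which is the case for Gaussian processes.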
3. Estimating the components of data-driven non-linear dynamics
3·1. Estimation of the deterministic component
We adopt a two-step kernel smoothing approach to obtain an estimator $\hat f$ of the deterministic part of the nonlinear differential equation (4), corresponding to the function $f$, which from now on we assume to be a smooth function. This two-step procedure builds on the same ideas as the method of Ellner et al. (2002) for autonomous dynamics.
Step 1: Obtaining the trajectories of $X(t)$ and $X'(t)$.
For any $i = 1, \ldots, n$, we estimate the trajectory $X_i(t)$ and its derivative $X_i'(t)$ by a convolution kernel smoothing method (Gasser et al., 1984). Using a nonnegative symmetric kernel function $K$ and an antisymmetric kernel function with one sign change $K_2$ for derivative estimation, such that $\int K(u)\,du = 1$, $\int K_2(u)\,du = 0$ and $\int K_2(u)\,u\,du = 1$, these estimates are
$$\hat X_i(t) = \frac{1}{h_X} \sum_{j=1}^{N_i} \int_{s_{j-1}}^{s_j} Y_{ij}\, K\!\left(\frac{u - t}{h_X}\right) du, \tag{6}$$
$$\hat X_i'(t) = \frac{1}{h_{X'}^2} \sum_{j=1}^{N_i} \int_{s_{j-1}}^{s_j} Y_{ij}\, K_2\!\left(\frac{u - t}{h_{X'}}\right) du, \tag{7}$$
where $s_j = (t_{ij} + t_{i,j+1})/2$ and $h_X > 0$ and $h_{X'} > 0$ are smoothing bandwidths.
Step 2: Estimation of $f$.
Trajectory estimates $\hat X_i(t)$ and $\hat X_i'(t)$ from Step 1 are combined to obtain a Nadaraya–Watson kernel estimator of $f$,
$$\hat f(t, x) = \frac{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\} \hat X_i'(t)}{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\}}, \tag{8}$$
utilizing a bandwidth $b_X > 0$.
When estimators (6), (7) are supplemented with suitably chosen boundary kernels for estimating the regression function near endpoints of the domain of $X$ (Jones & Foster, 1996; Müller, 1991), these convolution kernel estimates are equivalent to fitting local linear estimates for $X_i(t)$, taking the intercept as estimator, and to fitting local quadratic estimates for $X'(t)$, taking the linear term as estimator (Fan & Gijbels, 1996; Müller, 1987). Thus, one can conveniently implement these estimators by local polynomial fitting.
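The two steps can be sketched with local polynomial fitting, as suggested above. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses a Gaussian kernel instead of a compactly supported one, a common time grid for all curves, and no boundary correction. The function names `local_poly`, `nadaraya_watson` and `estimate_f` are hypothetical.

```python
import numpy as np

def local_poly(t_obs, y_obs, t0, h, degree):
    """Kernel-weighted least-squares polynomial fit centred at t0.

    Returns coefficients (c0, c1, ...): c0 estimates the level at t0,
    c1 estimates the first derivative at t0.
    """
    d = t_obs - t0
    w = np.exp(-0.5 * (d / h) ** 2)                 # Gaussian kernel weights (assumption)
    B = np.vander(d, degree + 1, increasing=True)   # columns 1, d, d^2, ...
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(B * sw[:, None], y_obs * sw, rcond=None)
    return coef

def nadaraya_watson(x_hat, dx_hat, x0, b):
    """Step 2: kernel-weighted average of derivative estimates at level x0."""
    w = np.exp(-0.5 * ((x_hat - x0) / b) ** 2)
    return np.sum(w * dx_hat) / np.sum(w)

def estimate_f(t_obs, Y, t0, x0, h_x, h_dx, b_x):
    """Two-step estimate of f(t0, x0) from an (n x N) array of noisy curves."""
    # Step 1: local linear intercept for X_i(t0), local quadratic slope for X_i'(t0).
    x_hat = np.array([local_poly(t_obs, y, t0, h_x, 1)[0] for y in Y])
    dx_hat = np.array([local_poly(t_obs, y, t0, h_dx, 2)[1] for y in Y])
    return nadaraya_watson(x_hat, dx_hat, x0, b_x)
```

For example, for noise-free toy curves $X_i(t) = a_i e^{-t/2}$, which satisfy $X'(t) = -0.5\,X(t)$, the call `estimate_f(t, Y, 2.0, 0.7, 0.2, 0.3, 0.05)` should return a value near $-0.35$.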
3·2. Decomposition of variance
By definition (4) of the differential equation, we have the following decomposition of variance,
$$\mathrm{var}\{X'(t)\} = \mathrm{var}[f\{t, X(t)\}] + \mathrm{var}\{Z(t)\}. \tag{9}$$
Therefore, on subdomains where the variance of the drift process $\mathrm{var}\{Z(t)\}$ is small, the solution of (4) will not deviate much from the solution that is obtained with the deterministic approximation
$$X'(t) = f\{t, X(t)\} \quad (t \in T), \tag{10}$$
that corresponds to the population equation. In this situation, the future changes of individual trajectories are easily predictable. This motivates considering the fraction of the variance of $X'(t)$ that is explained by the deterministic part of the data-driven differential equation as a key quantity for assessing the predictability of the process, leading to a coefficient of determination
$$R^2(t) = \frac{\mathrm{var}[f\{t, X(t)\}]}{\mathrm{var}\{X'(t)\}} = 1 - \frac{\mathrm{var}\{Z(t)\}}{\mathrm{var}\{X'(t)\}}. \tag{11}$$
It is of interest to locate subdomains of $T$ where $R^2(t)$ is large. On such subdomains, the drift process is small compared to $X'(t)$. An obvious estimate for the coefficient of determination $R^2(t)$ is obtained by plugging in estimates of the unknown quantities, yielding
$$\hat R^2(t) = 1 - \frac{\sum_{i=1}^{n} [\hat X_i'(t) - \hat f\{t, \hat X_i(t)\}]^2}{\sum_{i=1}^{n} \{\hat X_i'(t) - \bar{\hat X}'(t)\}^2}, \tag{12}$$
where $\bar{\hat X}'(t)$ denotes the sample average of the estimated derivatives $\hat X_i'(t)$.
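The plug-in estimate (12) is a ratio of two sums of squares at a fixed time $t$. The sketch below assumes that the derivative estimates $\hat X_i'(t)$ and the fitted values $\hat f\{t, \hat X_i(t)\}$ have already been computed; the function name `r2_hat` is hypothetical.

```python
import numpy as np

def r2_hat(dx_hat, f_hat_at_x):
    """Plug-in coefficient of determination at one time point, as in (12).

    dx_hat     : estimated derivatives, one per subject
    f_hat_at_x : fitted values of the deterministic part, one per subject
    """
    dx_hat = np.asarray(dx_hat, dtype=float)
    resid = dx_hat - np.asarray(f_hat_at_x, dtype=float)
    total = dx_hat - dx_hat.mean()
    return 1.0 - np.sum(resid ** 2) / np.sum(total ** 2)
```

A perfect fit gives 1, predicting every derivative by the sample mean gives 0, and a fit that is worse than the sample mean gives a negative value, which is why the estimate is not constrained to $[0, 1]$.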
The coefficient of determination $R^2(t)$ assesses the fraction of the variance of $X'(t)$ explained by the deterministic part of the differential equation at a given time $t$. However, for some processes the predictability of the process may depend on the time $t$ and on the position $x$ of the process. Considering the nonlinear regression model (4), we define the dynamic signal over noise ratio $S(t, x)$ by
$$S(t, x) = \frac{f^2(t, x)}{E\{X'^2(t) \mid X(t) = x\}} = \frac{f^2(t, x)}{f^2(t, x) + \mathrm{var}\{Z(t) \mid X(t) = x\}}. \tag{13}$$
Obviously, $S(t, x)$ lies between 0 and 1. When $S(t, x)$ is close to one, then $f^2(t, x)$ is large compared to $\mathrm{var}\{Z(t) \mid X(t) = x\}$ and the process is well predictable when $X(t) = x$. In contrast, small values of $S(t, x)$ indicate that the variability of $Z(t)$ given $X(t) = x$ is large. The function $S$ quantifies the predictability of $X$ as a function of the level of the process at time $t$.
By plugging in the estimate $\hat f(t, x)$ for $f(t, x)$, one obtains the estimator given by
$$\hat S(t, x) = \frac{\hat f^2(t, x)}{\hat E\{X'^2(t) \mid X(t) = x\}}, \qquad \hat E\{X'^2(t) \mid X(t) = x\} = \frac{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\} \hat X_i'^2(t)}{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\}}. \tag{14}$$
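The estimator (14) reuses the kernel weights of the Nadaraya–Watson step, once for the first and once for the second conditional moment. The following sketch assumes a Gaussian kernel and precomputed trajectory estimates; `s_hat` is a hypothetical name.

```python
import numpy as np

def s_hat(x_hat, dx_hat, x0, b):
    """Plug-in dynamic signal over noise ratio at level x0, as in (14)."""
    x_hat = np.asarray(x_hat, dtype=float)
    dx = np.asarray(dx_hat, dtype=float)
    w = np.exp(-0.5 * ((x_hat - x0) / b) ** 2)    # Gaussian kernel weights (assumption)
    f_hat = np.sum(w * dx) / np.sum(w)            # estimate of f(t, x0)
    m2_hat = np.sum(w * dx ** 2) / np.sum(w)      # estimate of E{X'^2(t) | X(t) = x0}
    return f_hat ** 2 / m2_hat
```

Because both moments use the same nonnegative weights, the squared weighted mean cannot exceed the weighted mean of squares, so the estimate stays in $[0, 1]$ like its target.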
3·3. Applying data-driven nonlinear dynamics for goodness-of-fit
It is of interest to determine whether linear dynamics, implied by Gaussianity of the underlying processes, suffices to describe the dynamics, or whether a more complex nonlinear model is needed. A simple diagnostic can be obtained by comparing the variance of the drift process $Z(t)$ of the nonlinear dynamic model (4) with that of the drift process $Z_2(t)$ of the linear dynamic model (3), as follows.
For the coefficient of determination for the linear empirical dynamic model (3),
$$R_L^2(t) = \frac{\mathrm{var}\{\beta(t) X(t)\}}{\mathrm{var}\{X'(t)\}} = 1 - \frac{\mathrm{var}\{Z_2(t)\}}{\mathrm{var}\{X'(t)\}}, \tag{15}$$
one expects that $R^2(t) \ge R_L^2(t)$. Similar to equation (12), the estimate is
$$\hat R_L^2(t) = 1 - \frac{\sum_{i=1}^{n} \{\hat X_i'(t) - \hat\beta(t) \hat X_i(t)\}^2}{\sum_{i=1}^{n} \{\hat X_i'(t) - \bar{\hat X}'(t)\}^2}, \tag{16}$$
where we note that both $\hat R^2(t)$ in (12) and $\hat R_L^2(t)$ in (16) might be negative when the fits are bad. On subdomains of $T$ where $R^2(t)$ is close to $R_L^2(t)$, $\mathrm{var}\{Z(t)\}$ is close to $\mathrm{var}\{Z_2(t)\}$ and one may infer that the data-driven differential equation is almost linear, so Equation (3) provides a simpler description.
On subdomains where the diagnostic function $R^2(t) - R_L^2(t)$ is large, the linear differential equation (3) is probably insufficient to provide a good description of the underlying dynamics, and one would then choose the data-driven nonlinear dynamic model (4). Equation (4) can be written as an integral equation, and a solution can be obtained by numerical integration of the equation, given an initial value and a realization of the drift process $Z$.
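Numerical integration of (4) from an initial value can be sketched with a forward Euler scheme. The scheme, the function name `integrate_euler` and the representation of the drift as values on the time grid are illustrative assumptions; the text does not prescribe a particular integrator.

```python
import numpy as np

def integrate_euler(f, x0, t_grid, z=None):
    """Forward-Euler solution of X'(t) = f(t, X(t)) + Z(t) on t_grid.

    f      : callable f(t, x) for the deterministic part
    x0     : initial value X(t_grid[0])
    z      : drift values Z(t_grid[k]); zeros give the deterministic equation (10)
    """
    t_grid = np.asarray(t_grid, dtype=float)
    z = np.zeros_like(t_grid) if z is None else np.asarray(z, dtype=float)
    x = np.empty_like(t_grid)
    x[0] = x0
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        x[k + 1] = x[k] + dt * (f(t_grid[k], x[k]) + z[k])
    return x
```

With `z=None` the function integrates the deterministic approximation; passing a drift realization shows how far an individual trajectory departs from it.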
4. Asymptotic properties
4·1. Assumptions
In the following, we describe consistency results for the estimation of the smooth bivariate function $f$ that determines the deterministic part of the proposed data-driven dynamic model (4) and for the estimate (12) of the fraction of variance explained at time $t$. In the sequel, $g(t, x)$ denotes the density of the random variable $X(t)$ at $x$. The assumptions C.1–7 are listed below.
C.1 The kernels $K$ and $K_2$ have compact support $[-1, 1]$ and are Lipschitz continuous with respective constants $\mu_K$ and $\mu_{K'}$. Moreover, $K$ is positive and satisfies $\int_{-1}^{1} K(u)\,du = 1$, $\int_{-1}^{1} K(u)\,u\,du = 0$ and $\int_{-1}^{1} K(u)\,u^2\,du \ne 0$. The kernel $K_2$ satisfies $\int_{-1}^{1} K_2(u)\,du = 0$, $\int_{-1}^{1} K_2(u)\,u\,du = 1$, $\int_{-1}^{1} K_2(u)\,u^2\,du = 0$ and $\int_{-1}^{1} K_2(u)\,u^3\,du \ne 0$.
C.2 The random function $X$ is almost surely three times continuously differentiable and for all $t \in T$, $|X(t)| \le C_0$, $|X'(t)| \le C_1$, $|X^{(2)}(t)| \le C_2$ and $|X^{(3)}(t)| \le C_3$ almost surely.
C.3 The random variables $\epsilon_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N)$ are centered and have a finite moment of order 8.
C.4 The functions $f(t, \cdot)$ and $g(t, \cdot)$ are Lipschitz with constants $\mu_f$ and $\mu_g$, twice continuously differentiable and have compact support.
C.5 The conditional variance $s(t, u) = \mathrm{var}\{X'(t) \mid X(t) = u\}$ is continuous and is non-zero.
C.6 We have $(N, n) \to \infty$ and $(b_X, h_X, h_{X'}) \to 0$ such that $n b_X \ge \log^2 n \to \infty$, $N h_X b_X^4 \ge 1$, $N h_{X'}^3 \to \infty$ and $h_X \le b_X$.
C.7 There exists a constant $C > 0$ such that $g(t, x) > C$ for any $x \in [x_1; x_2]$.
4·2. Results
Theorem 1. Under assumptions C.1–6, for any $t \in T$ and $x$ such that $g(t, x) \ne 0$,
$$E\left[\{\hat f(t, x) - f(t, x)\}^2\right] = O\left(b_X^4 + \frac{h_X^4}{b_X^2} + h_{X'}^4 + \frac{\sigma^2}{N h_X b_X^2} + \frac{1}{n b_X} + \frac{\sigma^2}{N h_{X'}^3}\right). \tag{17}$$
With suitable choices of the bandwidths $b_X$, $h_X$, and $h_{X'}$, one obtains
$$E\{\hat f(t, x) - f(t, x)\}^2 = O\left\{\max\left(N^{-8/15}, n^{-4/5}\right)\right\}. \tag{18}$$
If $n \le N^{2/3}$, the classical convergence rate $n^{-4/5}$ for nonparametric regression is obtained. Conversely, when $n \ge N^{2/3}$, the estimation error in $\hat X_i$ is no longer negligible and the lower bound $N$ on the number of measurements per curve becomes the limiting quantity for the convergence rate.
Regarding $\hat R^2(t)$, the rate of convergence of $\hat R^2(t)$ depends on that of $\hat f(t, \cdot)$ near the boundary of the support of $g(t, \cdot)$, where there are few observations. Therefore, we consider bounded domains for asymptotic study. For positive numbers $x_1$ and $x_2$ in the support of $g(t, \cdot)$, define
$$R^2_{x_1,x_2}(t) = \frac{\mathrm{var}[f\{t, X(t)\} \mid x_1 \le X(t) \le x_2]}{\mathrm{var}\{X'(t) \mid x_1 \le X(t) \le x_2\}} = 1 - \frac{\mathrm{var}\{Z(t) \mid x_1 \le X(t) \le x_2\}}{\mathrm{var}\{X'(t) \mid x_1 \le X(t) \le x_2\}}, \tag{19}$$
so that $R^2_{x_1,x_2}(t)$ quantifies the ratio of these variances when $X(t)$ is conditioned to lie between $x_1$ and $x_2$. With $n_{x_1,x_2} = \#\{i : x_1 \le X_i(t) \le x_2\}$, we estimate $R^2$
Corollary 1. Under assumptions C.1–6, for the dynamic signal/noise ratio (13),
$$\hat S(t, x) - S(t, x) = O_p\left(b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}}\right).$$
5. Nonlinear concurrent model
Our methodology provides an estimation procedure for a nonlinear version of the concurrent model, also known as varying-coefficient model (Chiang et al., 2001). We aim at investigating the relationship between two stochastic processes $X(t)$ and $U(t)$ at each time $t \in T$. The linear concurrent model captures a linear relationship between $X$ and $U$ through a deterministic function $\beta(t)$,
$$U(t) = \mu_U(t) + \beta(t)\{X(t) - \mu_X(t)\} + Z_2(t), \tag{21}$$
where $Z_2$ is a zero mean drift process with $\mathrm{cov}\{Z_2(t), X(t)\} = 0$. Versions of this functional linear varying coefficient model were mentioned in Ramsay & Silverman (2005), and estimators and asymptotics were studied in Şentürk & Müller (2010).
Our methodology covers the more general situation where the link between $U(t)$ and $X(t)$ is nonlinear, i.e., where one has a smooth function $f(\cdot, \cdot)$ and a drift process $Z(t)$ such that
$$U(t) = f\{t, X(t)\} + Z(t), \tag{22}$$
with $E\{Z(t) \mid X(t)\} = 0$ almost surely and $f\{t, X(t)\} = E\{U(t) \mid X(t)\}$. This nonlinear varying coefficient model can be studied with the methods that we have developed for the nonlinear dynamic model (4).
Given $n$ realizations $X_i$ and $U_i$ of the underlying processes $X$ and $U$ on a domain $T$, we assume that $N$ noisy measurements $Y_{ij}$ and $V_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N)$ have been obtained at times $t_{ij}$ analogously to (1). Following the arguments of Section 3·1, we propose a two-step estimator. For any $i = 1, \ldots, n$, we first estimate the trajectories $X_i(t)$ and $U_i(t)$ with a convolution kernel $K$ with bandwidths $h_X$ and $h_U$. Then, using another bandwidth $b_X$, these trajectory estimates $\hat X_i(t)$ and $\hat U_i(t)$ are combined to obtain
$$\hat f(t, x) = \frac{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\} \hat U_i(t)}{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\}}.$$
Arguing as for the estimation of the nonlinear dynamics, we obtain the rate of convergence for $\hat f$.
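Corollary 2. Suppose that assumptions D.1–6 in the Appendix hold. For any $t \in T$ and any $x$ such that $g(t, x) \ne 0$,
$$E\left[\{\hat f(t, x) - f(t, x)\}^2\right] = O\left(b_X^4 + \frac{h_X^4}{b_X^2} + h_U^4 + \frac{\sigma^2}{N h_X b_X^2} + \frac{1}{n b_X} + \frac{\sigma^2}{N h_U^3}\right).$$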
With suitable choices of the bandwidths bX , hX , and hU , one obtains
As before, one can compute a coefficient of determination
$$R^2(t) = \frac{\mathrm{var}[f\{t, X(t)\}]}{\mathrm{var}\{U(t)\}} = 1 - \frac{\mathrm{var}\{Z(t)\}}{\mathrm{var}\{U(t)\}},$$
to decompose the variance of $U(t)$ into a part explained by the model and a part left unexplained.
6. Nonlinear dynamics of human growth data
The proposed model and estimation procedures can be used to illuminate the dynamics of human growth. We illustrate the nonlinear differential equation in (4) using the Berkeley Growth Study (Jones & Bayley, 1941), in which the heights of 54 girls and 39 boys from 1–18 years were recorded. Since male and female growth patterns differ substantially, with girls entering puberty much earlier than boys (Tanner et al., 1966), we focus on girls only. For each of the 54 girls in the study, 31 measurements are available, which were recorded at different time intervals, ranging from three months (from 1 to 2 years old), six months (from 8 to 18 years old), to one year (from 3 to 8 years old). The purpose of characterizing the dynamics of human growth, and especially the time domains where the dynamics are nonlinear, is twofold. First, it allows us to gain a better understanding of the growth process. Second, it is of clinical interest to distinguish between normal and pathological patterns of development.
In order to estimate the data-driven differential equation, we apply the two-step procedure described in Section 3·1, which is implemented through local weighted least-squares methods (Fan & Gijbels, 1996) with a Gaussian kernel $K$. For $t \in [0, 18]$, we obtain estimates $\hat X_i(t) = \hat a_{i0}(t)$, where
$$(\hat a_{i0}, \hat a_{i1})(t) = \mathop{\mathrm{argmin}}_{a \in \mathbb{R}^2} \frac{1}{h_X} \sum_{j=1}^{N} K\!\left(\frac{t_{ij} - t}{h_X}\right) \{X_{ij} - a_0 - a_1 (t_{ij} - t)\}^2, \tag{24}$$
with $N = 31$. The growth velocities $X_i'(t)$ are estimated analogously by taking the slope of weighted local quadratic fits, $\hat X_i'(t) = \hat b_{i1}(t)$, where
$$(\hat b_{i0}, \hat b_{i1}, \hat b_{i2})(t) = \mathop{\mathrm{argmin}}_{b \in \mathbb{R}^3} \frac{1}{h_{X'}} \sum_{j=1}^{N_i} K\!\left(\frac{t_{ij} - t}{h_{X'}}\right) \{X_{ij} - b_0 - b_1 (t_{ij} - t) - b_2 (t_{ij} - t)^2\}^2. \tag{25}$$
In a second step, $\hat f(t, x)$ is obtained by another local linear estimator based on $\hat X_i(t)$ and $\hat X_i'(t)$, setting $\hat f(t, x) = \hat d_0(t, x)$, where
$$\{\hat d_0(t, x), \hat d_1(t, x)\} = \mathop{\mathrm{argmin}}_{d \in \mathbb{R}^2} \sum_{i=1}^{n} \frac{1}{b_X} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\} [\hat X_i'(t) - d_0 - d_1 \{\hat X_i(t) - x\}]^2. \tag{26}$$
A practically relevant feature is that for given $t$ the function $\hat f(t, \cdot)$ is only defined on the domain $(\min_i \hat X_i(t), \max_i \hat X_i(t))$. A second implementation issue is the choice of the smoothing bandwidths $h_X$, $h_{X'}$, and $b_X$ that are needed for local polynomial estimators (24), (25) and (26). We select these tuning parameters by generalized cross-validation (Golub et al., 1979).
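Generalized cross-validation for a linear smoother divides the mean squared residual by $(1 - \mathrm{tr}(L)/n)^2$, where $L$ is the smoother matrix. The sketch below applies this to the local linear fit (24) for a single curve with a Gaussian kernel. It is an illustration under these assumptions, not the authors' code; how the criterion was pooled across subjects or applied to (25) and (26) is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def local_linear_hat_matrix(t, h):
    """Smoother matrix L of a Gaussian-kernel local linear fit: y_hat = L @ y."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    L = np.empty((n, n))
    for k in range(n):
        d = t - t[k]
        w = np.exp(-0.5 * (d / h) ** 2)
        B = np.column_stack([np.ones(n), d])
        BtW = B.T * w                               # B' W, shape (2, n)
        # First row of (B' W B)^{-1} B' W gives the weights of the intercept.
        L[k] = np.linalg.solve(BtW @ B, BtW)[0]
    return L

def gcv_score(t, y, h):
    """GCV criterion: mean squared residual over (1 - trace(L)/n)^2."""
    L = local_linear_hat_matrix(t, h)
    resid = y - L @ y
    n = len(y)
    return np.mean(resid ** 2) / (1.0 - np.trace(L) / n) ** 2

def select_bandwidth(t, y, candidates):
    """Return the candidate bandwidth with the smallest GCV score."""
    scores = [gcv_score(t, y, h) for h in candidates]
    return candidates[int(np.argmin(scores))]
```

A call such as `select_bandwidth(t, y, [0.1, 0.3, 1.0])` scans a user-supplied grid; the grid itself is a tuning choice that has to be adapted to the spacing of the measurement times.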
Estimated growth curves and estimated growth velocities for the sample of girls are depicted in Figure 1. The estimated function $\hat f(t, x)$, corresponding to the deterministic part of the data-driven nonlinear differential equation, is displayed as a surface in the left panel of Figure 2 and as a contour plot in the right panel. Growth velocity has a tendency to decrease with age, with the exception of the pubertal growth spurt between ages 10 and 13.
A more detailed study of the function $\hat f$, considering $\hat f(t, \cdot)$ as a function of current height $x$ for ages $t = 2, 4, 6, 8, 12$ or $16$, as shown in Figure 3, reveals that at earlier ages (e.g., at age 2), there is a sizeable difference between the fits of the linear and the nonlinear differential equation and furthermore that an autonomous differential equation is inadequate. The clearly more appropriate proposed nonlinear non-autonomous model shows that at these earlier ages there is only a weak relationship between growth velocity and height, while between ages 4 and 8, taller girls also tend to have a higher growth velocity, which can be interpreted as a manifestation of an inherent growth momentum in this age range. In contrast, for ages between 12 and 16, $\hat f(t, \cdot)$ is no longer monotone. At age 12, the relationship is weak, likely because the taller girls already had their pubertal growth peak prior to this age and their growth velocity is then decreasing during the post-pubertal growth deceleration, while the smaller girls have not yet entered the pubertal spurt with its growth acceleration. At age 16, all girls are growing much more slowly; however, both shorter and taller girls grow relatively faster than medium sized girls, indicating a strongly nonlinear relationship.
The nonlinear dynamic coefficient of determination $R^2(t)$ defined in Equation (11) quantifies to what extent the deterministic part of the nonlinear differential equation (4) explains the variance of $X'(t)$. When estimating this coefficient with $R^2_{x_1(t),x_2(t)}(t)$ defined in Equation (19), we chose $x_1(t)$, respectively $x_2(t)$, as the third smallest, respectively largest, value among $\hat X_i(t)$ $(i = 1, \ldots, n)$. We also estimated the linear dynamic coefficient of determination $R_L^2(t)$ defined in Equation (15) for the linear dynamic model (3). A comparison of the two coefficients of determination $R^2(t)$ and $R_L^2(t)$ is shown in Figure 4, and bootstrap confidence bands for the nonlinear version $R^2(t)$ are shown in the right panel.

Figure 2. Estimation of $f(t, x)$. Left panel: Estimated surface $\hat f(t, x)$ on a curved domain, characterizing the deterministic part of the nonlinear dynamic model in (4). Right panel: Contour plot of the surface $\hat f(t, x)$.
For the proposed nonlinear dynamic model, $\hat R^2(t)$ is seen to be close to 0.5 from approximately age 4 to 8. This implies that the deterministic part of the data-driven differential equation captures the behavior of the growth curves during these periods quite well. In contrast, $\hat R^2(t)$ decays sharply from around age 11, as growth velocities are difficult to predict during this period, likely due to time variation in the occurrence of menarche and pubertal growth spurts. For the simpler linear dynamic model, the corresponding $\hat R_L^2(t)$ is always smaller than the corresponding $\hat R^2(t)$ for the proposed model, but comes closest during ages 8 to 10, where the discrepancy between the fits from the linear and the nonlinear systems is relatively small. In conclusion, growth dynamics around the pubertal growth spurt are highly nonlinear.
Acknowledgements
We wish to thank several reviewers for helpful comments. The third author acknowledges support from the U.S. National Science Foundation.
Figure 3. Comparison between nonlinear and linear dynamic estimation. Each of the panels, arranged for ages $t = 2, 4, 6, 8, 12$ years from left to right and top to bottom, respectively, illustrates the estimates $\hat f(t, \cdot)$ of the deterministic part of the nonlinear dynamic model (4) (solid), the linear estimates (3) (dashed) and the scatterplot of observed data pairs $\{\hat X_i(t), \hat X_i'(t)\}$.
Appendix 1
Assumptions for Corollary 2
In these assumptions, $g(t, \cdot)$ stands for the density of $X(t)$.
D.1 The kernel $K$ has compact support $[-1, 1]$ and is Lipschitz continuous with constant $\mu_K$. Moreover, $K$ is positive and satisfies $\int_{-1}^{1} K(u)\,du = 1$, $\int_{-1}^{1} K(u)\,u\,du = 0$ and $\int_{-1}^{1} K(u)\,u^2\,du \ne 0$.
D.2 The random functions $X$ and $U$ are almost surely two times continuously differentiable. For $t \in T$, $|X(t)| \le C_0$, $|X'(t)| \le C_1$, $|X^{(2)}(t)| \le C_2$, $|U(t)| \le C_3$, $|U'(t)| \le C_4$, $|U^{(2)}(t)| \le C_5$.
D.3 The random variables $\epsilon_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N)$ and $\zeta_{ij}$ $(i = 1, \ldots, n;\ j = 1, \ldots, N)$ are centered and have a finite moment of order 8.
D.4 The functions $f(t, \cdot)$ and $g(t, \cdot)$ are Lipschitz with constants $\mu_f$ and $\mu_g$, twice continuously differentiable and have compact support.
D.5 The conditional variance $s(t, x) = \mathrm{var}\{U(t) \mid X(t) = x\}$ is continuous and is nonzero.
D.6 We have $(N, n) \to \infty$ and $(b_X, h_X, h_U) \to 0$, such that $n b_X \ge \log^2 n \to \infty$, $N h_X b_X^4 \ge 1$, $N h_U \to \infty$ and $h_X \le b_X$.
Figure 4. Coefficients of determination. Left panel: Estimated coefficients of determination $\hat R^2(t)$ (12), corresponding to the fraction of variance explained by the deterministic part of the nonlinear dynamic model (4) (solid), in comparison with the corresponding fractions of variance $\hat R_L^2(t)$ (15) explained by the linear dynamics (3) (dot-dashed). Right panel: 95% bootstrap confidence interval for $R^2(t)$.
Appendix 2
Proofs
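Proof of Theorem 1. We decompose the difference $\hat f(t, x) - f(t, x)$ into the sum of two terms,
$$A = \frac{\sum_{i=1}^{n} K\!\left\{\frac{X_i(t) - x}{b_X}\right\} X_i'(t)}{\sum_{i=1}^{n} K\!\left\{\frac{X_i(t) - x}{b_X}\right\}} - f(t, x),$$
$$B = \frac{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\} \hat X_i'(t)}{\sum_{i=1}^{n} K\!\left\{\frac{\hat X_i(t) - x}{b_X}\right\}} - \frac{\sum_{i=1}^{n} K\!\left\{\frac{X_i(t) - x}{b_X}\right\} X_i'(t)}{\sum_{i=1}^{n} K\!\left\{\frac{X_i(t) - x}{b_X}\right\}}.$$
The term $A$ is simply the difference between a Nadaraya–Watson estimator and its target. Under Assumptions C.1–2, 4–6, the pointwise risk of this estimator is known (Schimek, 2000, pages 43–70) to be equivalent to
$$\left[ b_X^2 \int_{-1}^{1} u^2 K(u)\,du \left\{ \frac{1}{2} \frac{d^2 f(t, x)}{dx^2} + \frac{\frac{df(t, x)}{dx} \frac{dg(t, x)}{dx}}{g(t, x)} \right\} \right]^2 + \frac{s(t, x)}{g(t, x)\, n b_X} \int_{-1}^{1} K^2(u)\,du, \tag{A1}$$
if the quantities involved in the last expression are nonzero. Hence, we have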
It then follows from Assumption C.6, (A2) and (A3) that
\[
E(B^2) = O\left( h_{X'}^4 + \frac{h_X^4}{b_X^2} + \frac{\sigma^2}{N h_X b_X^2} + \frac{\sigma^2}{N h_{X'}^3} + \frac{1}{n} \right). \tag{A9}
\]
Combining this last bound with (A1) proves the first part of the theorem. Setting $h_X = N^{-1/5}$, $h_{X'} = N^{-1/7}$ and $b_X = N^{-2/15}$ if $n \ge N^{2/3}$, while $b_X = n^{-1/5}$ if $n \le N^{2/3}$, Assumption C.6 is satisfied and one obtains
\[
E\{\hat f(t, x) - f(t, x)\}^2 = O\{\max(N^{-8/15}, n^{-4/5})\}.
\]
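The arithmetic behind this rate can be checked by comparing exponents. The sketch below treats $\sigma$ as a constant and writes each term of (A1) and (A9) as a power of $N$ (case $n \ge N^{2/3}$) or of $n$ (case $n \le N^{2/3}$). The dictionary labels and variable names are illustrative only.

```python
from fractions import Fraction as F

# Bandwidth exponents in the case n >= N**(2/3): h_X = N**a_hX, and so on.
a_hX, a_hXp, a_bX = F(-1, 5), F(-1, 7), F(-2, 15)

# Exponent of N for each N-dependent term of (A9), plus the squared bias of (A1).
terms_in_N = {
    "h_X'^4": 4 * a_hXp,
    "h_X^4 / b_X^2": 4 * a_hX - 2 * a_bX,
    "1 / (N h_X b_X^2)": -1 - a_hX - 2 * a_bX,
    "1 / (N h_X'^3)": -1 - 3 * a_hXp,
    "b_X^4": 4 * a_bX,
}
slowest_in_N = max(terms_in_N.values())  # largest exponent decays most slowly

# Variance term 1 / (n b_X) of (A1), evaluated at the boundary n = N**(2/3).
boundary_variance = -F(2, 3) - a_bX

# Case n <= N**(2/3): b_X = n**(-1/5); exponents of n for bias^2 and variance.
bias_sq_in_n = 4 * F(-1, 5)
variance_in_n = -1 - F(-1, 5)

for label, exponent in terms_in_N.items():
    print(f"{label}: N^({exponent})")
print("slowest N-term:", slowest_in_N, "| boundary variance:", boundary_variance)
print("n-case bias^2 and variance:", bias_sq_in_n, variance_in_n)
```

The remaining $1/n$ term of (A9) is never larger than $n^{-4/5}$, so it does not affect the maximum.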
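Proof of Theorem 2. We first consider the denominator of (20) divided by $n_{x_1,x_2}$ and then the numerator of (20) divided by $n_{x_1,x_2}$.
We note that
\[
\mathrm{var}_{x_1,x_2}\{X'(t)\} = \frac{\sum_{i=1}^{n} X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}} - \left\{ \sum_{i=1}^{n} X_i'(t)\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}} \right\}^2 / n_{x_1,x_2}}{n_{x_1,x_2}}.
\]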
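For concreteness, a windowed empirical variance of this kind can be computed as in the sketch below. It assumes the usual divisor-$n$ form of the variance, restricted to the curves whose value at time $t$ lies in $[x_1, x_2]$. The function name and the toy inputs are hypothetical.

```python
def windowed_variance(x_t, dx_t, x1, x2):
    """Empirical variance of the derivatives dx_t[i] over curves with x1 <= x_t[i] <= x2.

    x_t  : values X_i(t) of the n curves at a fixed time t
    dx_t : values X_i'(t) of the derivatives at the same time
    """
    selected = [d for x, d in zip(x_t, dx_t) if x1 <= x <= x2]
    n_window = len(selected)  # number of curves falling in the window
    if n_window == 0:
        raise ValueError("no curve falls in the window [x1, x2]")
    sum_sq = sum(d * d for d in selected)
    sum_lin = sum(selected)
    return (sum_sq - sum_lin ** 2 / n_window) / n_window


# Toy data: the last curve lies outside the window and is ignored.
example = windowed_variance([1.0, 2.0, 3.0, 10.0], [0.5, 1.5, 2.5, 100.0], 0.0, 5.0)
print(example)
```

In practice the inputs would be the smoothed trajectories and derivative estimates rather than the unobserved true values.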
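In the sequel, $n_{x_1,x_2}$ stands for $\#\{i : x_1 \le X_i(t) \le x_2\}$. The difference $\widehat{\mathrm{var}}_{x_1,x_2}\{X'(t)\} - \mathrm{var}_{x_1,x_2}\{X'(t)\}$ behaves like
\[
O_p(n^{-1/2}) + \frac{\sum_{i=1}^{n} \hat X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}}}{n_{x_1,x_2}} - \frac{\sum_{i=1}^{n} X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}}{n_{x_1,x_2}} + \left\{ \frac{\sum_{i=1}^{n} X_i'(t)\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}}{n_{x_1,x_2}} \right\}^2 - \left\{ \frac{\sum_{i=1}^{n} \hat X_i'(t)\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}}}{n_{x_1,x_2}} \right\}^2. \tag{A10}
\]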
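Consider the following upper bound of $|\hat X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}|$:
\[
|\hat X_i^{\prime 2}(t) - X_i^{\prime 2}(t)| + X_i^{\prime 2}(t)\, |\mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}|.
\]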
Since $\hat X'(t)$ is a kernel estimator of $X'(t)$, we have
\[
E\left[ \{\hat X^{\prime 2}(t) - X^{\prime 2}(t)\}^2 \mid X \right] = O_p\left( h_{X'}^4 + \frac{1}{N h_{X'}^3} \right).
\]
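To bound the expectation of the term $|\mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}|$, we use the rate of convergence (A2) of $\hat X_i(t)$. Since $X'(t)$ is uniformly bounded, we get
\[
E\left[ X_i^{\prime 2}(t)\, |\mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}| \right] = O\{ h_X^2 + (N h_X)^{-1/2} \}.
\]
Combining these two bounds yields
\[
E\left| \hat X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - X_i^{\prime 2}(t)\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}} \right| = O\left\{ h_{X'}^2 + h_X^2 + \frac{1}{N^{1/2} h_{X'}^{3/2}} + (N h_X)^{-1/2} \right\}.
\]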
From the rate of convergence (A2) of X(t), we derive that
therefore the expectation of the second difference in (A13) is
\[
O\left\{ b_X^2 + (n b_X)^{-1/2} + \frac{h_X^2}{b_X} + \frac{\sigma}{(N h_X)^{1/2} b_X} + h_{X'}^2 + \frac{\sigma}{N^{1/2} h_{X'}^{3/2}} \right\}. \tag{A14}
\]
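In order to control the difference $\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}$ in (A13), we observe that $\hat f\{t, \hat X_i(t)\}$ decomposes as
\[
\frac{K(0)\, \hat X_i'(t)}{K(0) + \sum_{j \neq i} K[\{\hat X_j(t) - \hat X_i(t)\}/b_X]} + \hat f^{(-i)}\{t, \hat X_i(t)\} \left( 1 - \frac{K(0)}{K(0) + \sum_{j \neq i} K[\{\hat X_j(t) - \hat X_i(t)\}/b_X]} \right).
\]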
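We write $\beta = K(0) / (K(0) + \sum_{j \neq i} K[\{\hat X_j(t) - \hat X_i(t)\}/b_X])$. Thus, the difference $E[|\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}}]$ is of the form
\[
O(1)\, E\{ \beta\, |\hat X_i^{\prime 2}(t)|\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} \} + O(1)\, E\left( \beta\, [\hat f^{(-i)}\{t, \hat X_i(t)\}]^2\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} \right).
\]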
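The decomposition used here is an exact algebraic identity for a Nadaraya–Watson estimator evaluated at one of its own design points, and this can be confirmed numerically. The sketch below uses made-up design points and responses, an Epanechnikov kernel and a bandwidth of 0.3, all of which are assumptions for illustration. It also checks that the difference of squares is bounded by a constant multiple of $\beta$ times the sum of the two squared quantities (the constant 4 is one valid choice).

```python
def kernel(u):
    """Epanechnikov kernel on [-1, 1] (illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x, xs, ys, bandwidth, skip=None):
    """Nadaraya-Watson estimate at x; index `skip` is left out if given."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == skip:
            continue
        w = kernel((xj - x) / bandwidth)
        num += w * yj
        den += w
    return num / den


xs = [0.10, 0.25, 0.30, 0.42, 0.55]  # hypothetical design points (trajectory values at time t)
ys = [1.0, 0.7, 0.9, 0.4, 0.2]       # hypothetical responses (derivative values at time t)
b, i = 0.3, 2

full = nadaraya_watson(xs[i], xs, ys, b)         # estimator using all curves
loo = nadaraya_watson(xs[i], xs, ys, b, skip=i)  # leave-one-out estimator

others = sum(kernel((xs[j] - xs[i]) / b) for j in range(len(xs)) if j != i)
beta = kernel(0.0) / (kernel(0.0) + others)

# Weighted combination of the i-th response and the leave-one-out estimate.
recombined = beta * ys[i] + (1.0 - beta) * loo
print(full, recombined, beta)
```

Because $\beta$ is of order $1/(n b_X)$ when many curves fall near the evaluation point, leaving one curve out changes the estimate only slightly.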
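Applying the Bernstein inequality as in the proof of Theorem 1, we upper bound $\beta$ by $O[g\{t, \hat X_i(t)\}/(n b_X)]$ with overwhelming probability. We control the random variable on the complementary event by applying the Cauchy–Schwarz inequality. All in all, we get
\[
E_{-i}\left[ |\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} \right] \le O_p\left[ \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}}\, \frac{\max\{1, \hat X_i^{\prime 2}(t)\}}{n b_X\, g\{t, \hat X_i(t)\}} \right].
\]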
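Integrating with respect to $X_i$, we conclude that
\[
E\left[ |\hat f^{(-i)2}\{t, \hat X_i(t)\} - \hat f^2\{t, \hat X_i(t)\}|\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} \right] = O\left( \frac{1}{n b_X} \right). \tag{A15}
\]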
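In order to control the third difference in (A13), we upper bound $|f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - f^2\{t, X_i(t)\}\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}}|$ by $2 \mu_f \|f\|_\infty\, |\hat X_i(t) - X_i(t)|$ if $x_1 \le \hat X_i(t) \le x_2$ and $x_1 \le X_i(t) \le x_2$, by 0 if $\hat X_i(t) \notin [x_1, x_2]$ and $X_i(t) \notin [x_1, x_2]$, and by $\|f\|_\infty^2$ otherwise. From Equation (A2), we derive
\[
E\left| f^2\{t, \hat X_i(t)\}\, \mathbf{1}_{\{x_1 \le \hat X_i(t) \le x_2\}} - f^2\{t, X_i(t)\}\, \mathbf{1}_{\{x_1 \le X_i(t) \le x_2\}} \right| = O\{ h_X^2 + (N h_X)^{-1/2} \}. \tag{A16}
\]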
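The three-case bound above can be sanity-checked by simulation. The sketch below takes $f = \sin$ as a stand-in, so that $\mu_f = 1$ and $\|f\|_\infty = 1$, and perturbs each true value by Gaussian noise to mimic an estimated trajectory value. The choice of $f$, the window, the noise level and all names are assumptions for illustration only.

```python
import math
import random

MU_F, SUP_F = 1.0, 1.0  # Lipschitz constant and sup norm of f = sin (stand-in for f)


def case_bound(x_hat, x, x1, x2):
    """Upper bound from the three cases: both inside, both outside, or mixed."""
    in_hat = x1 <= x_hat <= x2
    in_true = x1 <= x <= x2
    if in_hat and in_true:
        return 2.0 * MU_F * SUP_F * abs(x_hat - x)
    if not in_hat and not in_true:
        return 0.0
    return SUP_F ** 2


def windowed_square_gap(x_hat, x, x1, x2):
    """|f^2(x_hat) 1{x_hat in window} - f^2(x) 1{x in window}| for f = sin."""
    a = math.sin(x_hat) ** 2 if x1 <= x_hat <= x2 else 0.0
    b = math.sin(x) ** 2 if x1 <= x <= x2 else 0.0
    return abs(a - b)


rng = random.Random(0)
x1, x2 = 0.5, 2.0
violations = 0
for _ in range(10000):
    x = rng.uniform(0.0, 2.5)          # true value
    x_hat = x + rng.gauss(0.0, 0.1)    # noisy proxy for the smoothed estimate
    if windowed_square_gap(x_hat, x, x1, x2) > case_bound(x_hat, x, x1, x2) + 1e-12:
        violations += 1
print("violations:", violations)
```

The mixed case, where exactly one of the two values falls in the window, is the only one that does not shrink with the estimation error, which is why its probability has to be controlled through the rate (A2).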
Combining (A14), (A15), and (A16) with (A12) and (A13), we obtain
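Arguing similarly, we obtain the rate of convergence of the two remaining terms in (A12). We conclude that $\widehat{\mathrm{var}}_{x_1,x_2}[\hat f\{t, \hat X(t)\}] - \mathrm{var}_{x_1,x_2}[f\{t, X(t)\}]$ behaves like
\[
O_p\left\{ b_X^2 + \frac{h_X^2}{b_X} + h_{X'}^2 + (n b_X)^{-1/2} + \frac{1}{(N h_X)^{1/2} b_X} + \frac{1}{N^{1/2} h_{X'}^{3/2}} \right\}.
\]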
Proof of Corollary 1. We only need to observe that the rate of convergence of $\hat E\{X^{\prime 2}(t) \mid X(t) = x\}$ towards $E\{X^{\prime 2}(t) \mid X(t) = x\}$ is the same as that of $\hat f(t, x)$ towards $f(t, x)$. Indeed, $\hat E\{X^{\prime 2}(t) \mid X(t) = x\}$ is a Nadaraya–Watson estimator based on $\{\hat X_i(t), \hat X_i^{\prime 2}(t)\}$ $(i = 1, \ldots, n)$. Combining this remark with Theorem 1 completes the proof.
Proof of Corollary 2. The arguments are the same as in the proof of Theorem 1, the only difference being that the rate of convergence of $\hat X'(t)$ is replaced by the rate of convergence of $\hat U(t)$.
Bibliography
Bellman, R. & Roth, R. (1971). The use of splines with unknown end points in the identification of systems. J. Math. Anal. Appl. 34 26–33.
Brunel, N. (2008). Parameter estimation of ODE’s via nonparametric estimators. Electron. J. Stat. 2 1242–1267. URL http://dx.doi.org/10.1214/07-EJS132.
Chen, J. & Wu, H. (2008). Efficient local estimation for time-varying coefficients in deterministic dynamic models with applications to HIV-1 dynamics. Ann. Stat 103 369–384.
Chiang, C., Rice, J. & Wu, C. (2001). Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. Journal of the American Statistical Association 96 605–619.
Ellner, S., Seifu, Y. & Smith, R. (2002). Fitting population dynamic models to time-series data by gradient matching. Ecology 83 2256–2270.
Fan, J. & Gijbels, I. (1996). Local polynomial modelling and its applications, vol. 66 of Monographs on Statistics and Applied Probability. London: Chapman & Hall.
Gasser, T. & Müller, H.-G. (1984). Estimating regression functions and their derivatives by the kernel method. Scand. J. Statist. 11 171–185.
Gasser, T., Müller, H.-G., Kohler, W., Molinari, L. & Prader, A. (1984). Nonparametric regression analysis of growth curves. Ann. Statist. 12 210–229.
Golub, G., Heath, M. & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21 215–223.
Hansen, B., Cortina-Borja, M. & Ratcliffe, S. (2003). Assessing non-linear estimation procedures for human growth models. Ann Hum Biol 30 80–96.
Hoffmann, M. (1999). Adaptive estimation in diffusion processes. Stochastic Process. Appl. 79 135–163. URL http://dx.doi.org/10.1016/S0304-4149(98)00074-X.
Holte, S., Melvin, A., Mullins, J., Tobin, N. & Frenkel, L. (2006). Density-dependent decay in HIV-1 dynamics. J. Acquired Immune Deficiency Syndromes 41 266–276.
Hooker, G. (2009). Empirical Dynamics for Longitudinal Data. Biometrics 65 928–936.
Jacod, J. (2000). Non-parametric kernel estimation of the coefficient of a diffusion. Scand. J. Statist. 27 83–96. URL http://dx.doi.org/10.1111/1467-9469.00180.
Jones, H. & Bayley, N. (1941). The Berkeley Growth Study. Child Development 12 167–173.
Jones, M. C. & Foster, P. J. (1996). A simple nonnegative boundary correction method for kernel density estimation. Statistica Sinica 6 1005–1013.
Liang, H. & Wu, H. (2008). Parameter estimation for differential equation models using a framework of measurement error in regression models. J. Amer. Statist. Assoc. 103 1570–1583. URL http://dx.doi.org/10.1198/016214508000000797.
Mas, A. & Pumo, B. (2009). Functional linear regression with derivatives. J. Nonparametr. Stat. 21 19–40.
Miao, H., Dykes, C., Demeter, L. M. & Wu, H. (2009). Differential equation modeling of HIV viral fitness experiments: Model identification, model selection, and multimodel inference. Biometrics 65 292–300.
Müller, H.-G. (1987). Weighted local regression and kernel methods for nonparametric curve fitting. J. Am. Stat. Assoc. 82 231–238.
Müller, H.-G. (1991). Smooth optimum kernel estimators near endpoints. Biometrika 78 521–530.
Müller, H.-G. & Yao, F. (2010). Empirical Dynamics for Longitudinal Data. Ann. Statist. 38 3458–3486.
Paul, D., Peng, J. & Burman, P. (2011). Semiparametric modeling of autonomous nonlinear dynamical systems with applications. Ann. Appl. Stat. 5 2078–2108.
Perelson, A., Essunger, P., Cao, Y., Vesanen, M., Hurley, A., Saksela, K., Markowitz, M. & Ho, D. (1997). Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387 188–191.
Preece, M. & Baines, M. (1978). A new family of mathematical models describing the human growth curve. Ann. Hum. Biol. 5 1–24.
Ramsay, J. O., Hooker, G., Campbell, D. & Cao, J. (2007). Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 741–796. With discussions and a reply by the authors, URL http://dx.doi.org/10.1111/j.1467-9868.2007.00610.x.
Ramsay, J. O. & Silverman, B. W. (2005). Functional Data Analysis. New York: Springer, 2nd ed.
Reddy, S. K. & Dass, M. (2006). Modeling on-line art auction dynamics using functional data analysis. Stat. Sci. 21 179–193.
Schimek, M., ed. (2000). Smoothing and regression. New York: John Wiley & Sons Inc.
Senturk, D. & Müller, H.-G. (2010). Functional varying coefficient models for longitudinal data. Journal of the American Statistical Association 105 1256–1264.
Tanner, J., Whitehouse, R. & Takaishi, M. (1966). Standards from birth to maturity for height, weight, height velocity, and weight velocity: British children. Arch. Dis. Child. 41 613–635.
Wang, S., Jank, W., Shmueli, G. & Smith, P. (2008). Modeling price dynamics in ebay auctions using principal differential analysis. J. Am. Stat. Assoc. 103 1100–1118.