Adaptive inference for the mean of a Gaussian process
in functional data
Florentina Bunea1, Marten H. Wegkamp1 and Andrada E. Ivanescu2
Florida State University and East Carolina University
November 5, 2010
Abstract
This paper proposes and analyzes fully data driven methods for inference about the
mean function of a Gaussian process from a sample of independent trajectories of the
process, observed at random time points and corrupted by additive random error. The
proposed method uses thresholded least squares estimators relative to an approximat-
ing function basis. The variable threshold levels are estimated from the data and the
resulting estimates adapt to the unknown sparsity of the mean function relative to the
selected approximating basis. These results are based on novel oracle inequalities that
are used to derive the rates of convergence of our estimates. In addition, we construct
confidence balls that adapt to the unknown regularity of the mean function. They are
easy to compute since they do not require explicit estimation of the covariance operator
of the process. The simulation study shows that the new method performs very well in
practice, and is robust against large variations introduced by the random error terms.
Keywords: Stochastic processes; nonparametric mean estimation; thresholded estimators; functional data; oracle inequalities; adaptive inference; confidence balls.
Acknowledgements: The authors thank the Associate Editor and a referee for their con-
structive remarks. The research of Florentina Bunea and Marten Wegkamp was supported in
part by NSF Grants DMS-0706829 and DMS-1007444. Part of the research was done while
1 Department of Statistics, Florida State University, Tallahassee, FL 32306-4330.
2 Department of Biostatistics, East Carolina University, Greenville, NC 27858-4353.
the authors were visiting the Isaac Newton Institute for Mathematical Sciences (Statistical
Theory and Methods for Complex, High-Dimensional Data Programme) at Cambridge Uni-
versity during Spring 2008.
1 Introduction
In this paper we develop and analyze new methodology for inference about the mean of a
Gaussian process from data that consists of independent realizations of this process observed
at discrete times, where each observation is contaminated by an additive error term. For-
mally, let X(t), 0 ≤ t ≤ 1 be a Gaussian process with mean function f(t) = E[X(t)] and
stochastic part Z(t) = X(t) − f(t). We denote the covariance function of X (and Z) by
Γ(s, t) = Cov(X(s), X(t)), for all 0 ≤ s, t ≤ 1. We observe Yij at times tij, for 1 ≤ i ≤ n,
1 ≤ j ≤ m, that are of the form
Y_{ij} = X_i(t_{ij}) + ε_{ij}     (1)
where Xi(t), with mean f(t), are random independent realizations of the process X(t).
Although we could allow for different sample sizes m_i per curve – the conditions on m
imposed below would then be replaced by conditions on min_i m_i – we treat m_i = m for ease
of notation and clarity of exposition. We assume that εij are independent across i and j
with E[ε_{ij}] = 0 and E[ε_{ij}^2] = σ_ε^2 < ∞, and ε is independent of X.
Although the estimation of f has received considerable attention over the last decade, the the-
oretical study of data-adaptive estimators in model (1) is still open to investigation. In
contrast with the abundance of methods for estimating f , methods for constructing confi-
dence sets for f are very limited. This motivates our two-fold contribution to the existing
literature:
(1) construction of computationally efficient and fully data-driven estimators and confi-
dence balls for f ;
(2) theoretical assessment of the quality of our data adaptive estimates and proof that the
estimators and the confidence balls adapt to the unknown regularity of f and Z.
We begin by reviewing the existing results in the literature, which provides further moti-
vation for the procedure set forth in this article. The problem of estimating f from data
generated from (1) has been considered by a large number of authors, starting with Ramsay
and Silverman (2002, 2005) and Ruppert, Wand and Carroll (2003). The existing methods are
either based on kernel smoothers as in Zhang and Chen (2007), Yao (2007), Benko, Härdle
and Kneip (2009), penalized splines, as in, for instance, Ramsay and Silverman (2005),
free-knot splines as in Gervini (2006), or ridge-type least squares estimates as in Rice and
Silverman (1991). All resulting estimates depend on tuning parameters that are method
specific. Theoretical properties of these estimates of f are still emerging, and have only been
established for non-adaptive choices of the respective tuning parameters which require prior
knowledge of the smoothness of f , see, for instance, Zhang and Chen (2007) and Gervini
(2006). Although guidelines for data-driven choices of these parameters are offered in all
these works, the theoretical properties of the resulting estimates are still open to investi-
gation. In contrast, we suggest in Section 2 below a computationally simple method based
on thresholded projection estimators, with variable threshold level. Our method does not
require any specification of the regularity of f(t) or Z(t) prior to estimation. We show via
oracle inequalities that our estimators adapt to this unknown regularity.
Whereas the estimation of the mean f(t) of the process X(t) is well understood, modulo
the technical and possibly computational issues raised above, the construction of uniform
confidence sets for f has not been investigated in this context and in general the construction
of confidence sets for f in model (1) seems to have received little attention. An exception
is Degras (2009). Although his procedure is attractive, his theoretical analysis ignores the
bias term when applying a classical result by Landau and Shepp (1970) on the supremum
of a Gaussian process. Therefore his confidence bands do not attain the nominal coverage.
We propose and analyze a number of alternative procedures for constructing confidence sets.
In particular, we offer a computationally simple procedure that leads to adaptive confidence
balls.
The paper is organized as follows. In Section 2.2 below we discuss thresholded projection
estimators in the functional data setting. In Section 2.3 we establish oracle inequalities for the
fit of the estimators which show that the estimates adapt to the unknown sparsity of the mean
f . Under appropriate conditions on the mean f(t) and the covariance function Γ(s, t) of the
process Z(t), we derive rates of convergence for our estimates in Section 2.4. In Section 2.5 we
construct confidence balls for f and prove that they have the desired asymptotic coverage
probability uniformly over large classes of functions. Moreover, we suggest a number of
methods for constructing confidence bands. Section 3 contains a comprehensive simulation
study that indicates that our methods compare favorably with existing methods. The net
merit of the proposed methods is especially visible when the variance of the random noise ε
is at the same level as that of the stochastic process Z(t), and we discuss this in detail in
Sections 3.2 and 3.3.
2 Methodology
2.1 Preliminaries
In this section we introduce notation and assumptions that will be used throughout the
paper. As explained in the introduction, the aim of this paper is
(a) to estimate the mean f(t) of the process X(t) and
(b) to construct confidence sets for the mean f(t), t ∈ [0, 1].
We assume throughout that f ∈ L2([0, 1], dt) and is bounded; dt denotes the Lebesgue mea-
sure on [0, 1] and in what follows we will write L2 for the space L2([0, 1], dt). We will also
make the following standard assumption on the process:
Assumption 1. The paths of the Gaussian process X(t), t ∈ [0, 1], are L2-functions almost
surely, and the covariance kernel Γ is continuous and satisfies ∫_0^1 Γ(t, t) dt < ∞.
Remark. We notice, for further reference, that Assumption 1 guarantees that Γ ≥ 0 (positive semi-definite); see, for instance, Shorack and Wellner (1986, page 208). Also, by Mercer's theorem, all continuous Γ have an eigen-decomposition

Γ(s, t) = ∑_{j=1}^∞ λ_j f_j(s) f_j(t)

in terms of eigenvalues λ_1 ≥ λ_2 ≥ ··· and (orthonormal) eigenfunctions f_1, f_2, . . .. Moreover, λ_j ≥ 0 and ∑_{j=1}^∞ λ_j < ∞; see, e.g., Shorack and Wellner (1986, page 210).
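For a concrete check of this expansion, the kernel can be discretized on a grid and its eigenvalues computed numerically. A minimal R sketch, using the Brownian bridge kernel Γ(s, t) = min(s, t) − st (one of the processes used in the simulations of Section 3) purely as an illustration:

## Numerical check of Mercer's expansion for the Brownian bridge kernel
## Gamma(s, t) = min(s, t) - s t, whose eigenvalues are 1/(pi j)^2.
## Purely an illustration; not part of the estimation procedure.
m.grid <- 200
t.grid <- (1:m.grid) / m.grid
Gamma  <- outer(t.grid, t.grid, function(s, t) pmin(s, t) - s * t)
eig    <- eigen(Gamma / m.grid, symmetric = TRUE)   # discretized integral operator
round(eig$values[1:4], 5)                           # close to 1/(pi 1)^2, ..., 1/(pi 4)^2
round(1 / (pi * (1:4))^2, 5)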
Our approach uses thresholded projection estimates which are obtained relative to bases
φ1, φ2, . . . that are orthonormal in L2 and are known to have good approximation properties
over a large scale of smoothness classes to which the target f may belong. Examples include
the Fourier, local trigonometric and wavelet bases.
Assumption 2. The mean f(t) = E[X(t)] of the Gaussian process X(t) is in L2 and may
be written as
f(t) = ∑_{k=1}^∞ µ_k φ_k(t)     (2)

where the convergence is uniform and absolute on [0, 1]. The coefficients µ_k are given by

µ_k = ∫_0^1 f(t) φ_k(t) dt.     (3)
Assumption 3. The observation “times” tij are independent and uniformly distributed on
the interval [0, 1].
Assumption 4. The errors εij are independent N(0, σ2) random variables.
Assumption 5. The basis functions φk are uniformly bounded.
Remark. The Gaussian assumption on the process X(t) and errors εij in Assumptions 1
and 4, and the bounded basis assumption (Assumption 5) may be relaxed at the cost of
rather technical proofs. These assumptions are used in the proof of Proposition 2 of Section
2.3.1 below.
2.2 Threshold-type estimators for functional data
Our procedure falls between two of the currently used strategies for the estimation of f :
averaging estimated individual trajectories and applying various smoothing methods to the
entire data set. Our initial estimator of f is a projection estimator onto a space generated
by a large set of basis functions and can be viewed as an average (over n) of weighted values
of the observations Yij. Our final estimator will be a truncated version of the projection
estimator, with data dependent truncation levels determined from the entire data set. We
describe the details in what follows.
Given a family of basis functions {φ_k} and a large integer d (cut-off point), which can grow polynomially in n, our initial estimator of f is f̂(t) = ∑_{k=1}^d µ̂_k φ_k(t), where

µ̂_k = (1/n) ∑_{i=1}^n µ̂_{i,k}     (4)

is the average of the projection estimators

µ̂_{i,k} = (1/m) ∑_{j=1}^m Y_{ij} φ_k(t_{ij}).     (5)
The variance of the initial estimator f̂(t) = ∑_{k=1}^d µ̂_k φ_k(t) may be unnecessarily inflated by the presence of, possibly many, very small estimates µ̂_k. This drawback can be remedied
by truncating the coefficients at a level rk that takes into account both the variability of
the measurement errors εij and the variability of the stochastic processes Zi(t). This is the
essential difference between truncated estimators based on data generated as in (1) and their
counterpart based only on independent data in a standard nonparametric regression setting.
We will focus on hard threshold estimators of the coefficients µ_k and of the function f. They are, respectively,

µ̂_k(r_k) = µ̂_k 1{|µ̂_k| > r_k}

and

f̂(r) = ∑_{k=1}^d µ̂_k(r_k) φ_k,
where here and in what follows we will use the notation r = (r_1, . . . , r_d). In the next section we discuss the goodness-of-fit of these estimates in terms of the L2 norm, where for any g ∈ L2 we denote by ‖g‖ its L2 norm, with ‖g‖^2 = ∫_0^1 g^2(t) dt.
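For illustration, a minimal R sketch of the estimators in displays (4)-(5) and of the hard-threshold rule follows; the cosine-type basis, the cut-off d and the threshold levels r are illustrative inputs rather than the paper's exact implementation (the data-driven choice of the r_k is the subject of Proposition 2 in Section 2.3.1):

## Sketch of the projection and hard-threshold estimators of Section 2.2.
## Inputs: Y  -- n x m matrix of noisy observations Y_ij,
##         tt -- n x m matrix of observation times t_ij in [0, 1],
##         d  -- cut-off point, r -- vector of d threshold levels r_k.
## The cosine-type basis below is one convenient orthonormal choice.
phi <- function(k, t) if (k == 1) t * 0 + 1 else sqrt(2) * cos(pi * (k - 1) * t)

fit.threshold <- function(Y, tt, d, r) {
  ## mu.ik[i, k] = (1/m) sum_j Y_ij phi_k(t_ij), as in display (5)
  mu.ik <- sapply(1:d, function(k) rowMeans(Y * phi(k, tt)))
  mu.k  <- colMeans(mu.ik)                  # display (4)
  mu.r  <- mu.k * (abs(mu.k) > r)           # hard thresholding at the levels r_k
  f.hat <- function(t) {                    # thresholded estimator of f
    Phi <- sapply(1:d, function(k) phi(k, t))
    drop(Phi %*% mu.r)
  }
  list(mu.ik = mu.ik, mu.hat = mu.k, mu.thresholded = mu.r, f.hat = f.hat)
}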
2.3 Oracle inequalities
Define, for each 1 ≤ k ≤ d,
µ_k(r_k) = µ_k 1{|µ_k| > r_k}

and write

f_{(r)}(t) = ∑_{k=1}^d µ_k(r_k) φ_k(t) = ∑_{k=1}^d µ_k φ_k(t) 1{|µ_k| > r_k}.
The function f_{(r)} can be regarded as a sparse approximation of f relative to a given basis;
of course, since the function f is unknown so is its sparse approximation. In this section we
show that the truncated estimators introduced above, constructed without any prior knowl-
edge of such sparse representations, mimic the bias-variance decomposition of estimators
that would use such information in their construction. Therefore our estimates adapt to
the unknown sparsity of f and we call the corresponding results sparsity oracle inequalities.
They permit us to determine, as a consequence, the rates of convergence of our estimators.
We discuss them in detail in the next section.
We begin by establishing Theorem 1, which is an oracle inequality for the hard threshold
estimator. The result holds on the event
Ω_n = ⋂_{k=1}^d {|µ̂_k − µ_k| ≤ r_k}     (6)
and is valid for any given threshold levels rk. This clearly shows what will drive the choice of
the threshold levels rk: they have to be chosen such that Ωn holds with probability arbitrarily
close to one. In Proposition 2 below we propose levels r_k for which lim inf_{n→∞} P(Ω_n) ≥ 1 − α, for any given 0 < α < 1. In particular, for α = 1/n, this guarantees that lim_{n→∞} P(Ω_n) = 1.
Theorem 1. For all d ≥ 1, on the event Ω_n,

‖f̂(2r) − f‖ ≤ ‖f − f_{(r)}‖ + 3 √(∑_{k=1}^d r_k^2 1{|µ_k| > r_k}).

Moreover, for d ≤ n, we have for some finite constant C,

E[‖f̂(2r) − f‖] ≤ ‖f − f_{(r)}‖ + 3 √(∑_{k=1}^d r_k^2 1{|µ_k| > r_k}) + C √(1 − P(Ω_n)).
Proof. We first observe that

‖f̂(2r) − f_{(r)}‖^2 = ∑_{k=1}^d (µ̂_k(2r_k) − µ_k(r_k))^2.

For the first claim, it suffices to show that on the event Ω_n,

|µ̂_k(2r_k) − µ_k(r_k)| ≤ 3 r_k 1{|µ_k| > r_k}     (7)

holds for all 1 ≤ k ≤ d, and any d ≥ 1, since this bound at the coefficient level implies

‖f̂(2r) − f_{(r)}‖ ≤ 3 √(∑_{k=1}^d r_k^2 1{|µ_k| > r_k}),

and the claim of Theorem 1 follows from the triangle inequality. We now prove (7). Indeed,
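on the event Ω_n we have |µ̂_k − µ_k| ≤ r_k for every 1 ≤ k ≤ d. If |µ_k| ≤ r_k, then µ_k(r_k) = 0 and |µ̂_k| ≤ |µ_k| + r_k ≤ 2r_k, so that µ̂_k(2r_k) = 0 and both sides of (7) are zero. If |µ_k| > r_k, the right-hand side of (7) equals 3r_k; for the left-hand side, either |µ̂_k| > 2r_k, in which case it equals |µ̂_k − µ_k| ≤ r_k, or |µ̂_k| ≤ 2r_k, in which case it equals |µ_k| ≤ |µ̂_k| + r_k ≤ 3r_k. This proves (7).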
We found in our simulations that these bands are somewhat wide, though not excessively so. We
briefly indicate a few other possible bands based on our estimators and we report on their
empirical behavior in the next section. The theoretical analysis of these bands is the subject
of future work.
First we consider a confidence interval based on the truncated series estimator f̂(t) = ∑_{k=1}^d µ̂_k φ_k(t). Recognizing that f̂(t) can be written as an average n^{-1} ∑_{i=1}^n W_i(t) of independent components W_i(t) = ∑_{k=1}^d µ̂_{ik} φ_k(t), 1 ≤ i ≤ n, the distribution of f̂(t) is asymptotically Gaussian. Thus, provided that (i) the bias √n {E[f̂(t)] − f(t)} is asymptotically negligible, and (ii) the sample variance S_n^2(t) based on W_1(t), . . . , W_n(t) consistently estimates v^2(t) = Var(W_1(t)), the random intervals [f̂(t) ± n^{-1/2} S_n(t) z_{α/2}] contain f(t) with probability 1 − α, asymptotically, as n → ∞. First we address the bias issue (i). For a large class of functions f ∈ W^β with β ≥ 1, the bias disappears for d large enough since
√n |E[f̂(t)] − f(t)| ≲ √n ∑_{k=d+1}^∞ |µ_k| ≲ √n ∑_{k=d+1}^∞ k^{-β-1/2} ≲ n^{1/2} d^{-β+1/2} → 0

for d ≫ n^{1/(2β−1)}, under the bounded basis assumption (Assumption 5). We now address
(ii). Each W_i(t) = ∑_{k=1}^d µ̂_{ik} φ_k(t) is a sum of d iid components with mean µ_k and variance of order O(1/m). Hence, W_i(t) is asymptotically Gaussian with mean ∑_{k=1}^d µ_k φ_k(t) and variance v^2(t) = O(d^2 m^{-1}). Writing W_i(t) = v(t) U_i(t), we find that

S_n^2(t) − v^2(t) = v^2(t) [ (n − 1)^{-1} ∑_{i=1}^n {U_i(t) − Ū(t)}^2 − 1 ]
is of stochastic order O_p(v^2(t) n^{-1/2}), and the sample variance S_n^2(t) converges to v^2(t) in probability if d^2/(m√n) → 0. A band can now be obtained by taking the regular grid t_j = j/m and computing

f̂(t_j) ± n^{-1/2} S_n(t_j) z_{α/(2m)}     (32)
based on Bonferroni’s inequality. This band is easy to compute and has reasonably good
coverage as shown in our simulations.
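A minimal R sketch of the band in display (32) follows; it reuses the coefficient matrix mu.ik (the µ̂_{ik}) returned by the sketch at the end of Section 2.2 and the same illustrative basis function phi, and is not the paper's exact implementation:

## Sketch of the Bonferroni band in display (32) on the grid t_j = j/m.
## mu.ik: n x d matrix of the coefficients; phi: basis function as before.
bonferroni.band <- function(mu.ik, phi, m, alpha = 0.05) {
  n      <- nrow(mu.ik); d <- ncol(mu.ik)
  t.grid <- (1:m) / m
  Phi    <- sapply(1:d, function(k) phi(k, t.grid))  # m x d basis matrix
  W      <- mu.ik %*% t(Phi)                         # n x m matrix of W_i(t_j)
  f.hat  <- colMeans(W)                              # estimator at the grid points
  S.n    <- apply(W, 2, sd)                          # sample sd of W_1(t_j), ..., W_n(t_j)
  z      <- qnorm(1 - alpha / (2 * m))               # Bonferroni-adjusted quantile z_{alpha/(2m)}
  list(t = t.grid, lower = f.hat - z * S.n / sqrt(n), upper = f.hat + z * S.n / sqrt(n))
}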
We found in our simulations, reported in the next section, that replacing each W_i(t) by W_i(t) = ∑_{k∈ŝ} µ̂_{ik} φ_k(t), with ŝ = {k : |µ̂_k| > r_k} the set of selected indices, yields much smaller bands with good asymptotic coverage. We notice that this approach ignores the uncertainty due to selecting the set ŝ, resulting in coverage less than 100(1 − α)%.
A superior way to obtain confidence bands is to analyze the process
E_n(t) = √n {f̂(t) − E[f̂(t)]} = n^{-1/2} ∑_{i=1}^n ∑_{k=1}^d (µ̂_{ik} − µ_k) φ_k(t).
For the trigonometric basis φ_k used in our simulations, it is not hard to show that E_n converges weakly to a Gaussian limit. As before, for f ∈ W^β, β ≥ 1, provided d ≫ n^{1/(2β−1)} is large enough, the bias √n ‖E[f̂] − f‖_∞ → 0 and

f̂(t) ± n^{-1/2} S_n(t) q_α     (33)

based on the α-upper point q_α of the distribution of the supremum of the normalized process E_n(t)/S_n(t), constitutes an asymptotic 1 − α confidence band. For an easily implementable approximation q*_α of q_α, we rely on the following resampling procedure.
1. Draw with replacement n vectors (µ̂*_{i1}, . . . , µ̂*_{id}) from (µ̂_{11}, . . . , µ̂_{1d}), . . . , (µ̂_{n1}, . . . , µ̂_{nd}).

2. Compute C*_i(t) = ∑_{k=1}^d φ_k(t) (µ̂*_{ik} − µ̂_k), S*_n(t) = (n − 1)^{-1/2} [∑_{i=1}^n {C*_i(t) − C̄*(t)}^2]^{1/2}, and E*_n(t) = n^{-1/2} ∑_{i=1}^n C*_i(t).

3. Approximate sup_t |E*_n(t)/S*_n(t)|.

4. Repeat the previous steps B times, and obtain the α-upper point q*_α.
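A minimal R sketch of this resampling scheme follows; mu.ik (the n x d matrix of the µ̂_{ik}) and Phi (the basis functions evaluated on the grid) are assumed to come from the earlier sketches, and the function name is illustrative:

## Sketch of the bootstrap approximation q*_alpha of q_alpha.
## mu.ik: n x d matrix of coefficients; Phi: m x d basis matrix on the grid.
boot.sup.quantile <- function(mu.ik, Phi, B = 100, alpha = 0.05) {
  n      <- nrow(mu.ik)
  mu.bar <- colMeans(mu.ik)                        # original-sample coefficients
  sups   <- replicate(B, {
    idx <- sample(n, n, replace = TRUE)            # step 1: resample curves
    C   <- sweep(mu.ik[idx, , drop = FALSE], 2, mu.bar) %*% t(Phi)  # C*_i(t) on the grid
    E   <- sqrt(n) * colMeans(C)                   # E*_n(t) = n^{-1/2} sum_i C*_i(t)
    S   <- apply(C, 2, sd)                         # S*_n(t)
    max(abs(E / S))                                # step 3: sup_t |E*_n(t)/S*_n(t)|
  })
  quantile(sups, 1 - alpha)                        # step 4: the alpha-upper point q*_alpha
}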
3 Numerical results
3.1 Simulation design
We conducted our simulations for a combination of types of mean zero stochastic processes,
stationary and non-stationary, and differentiable and non-differentiable mean functions.
Specifically, we consider two stationary processes, AR(1) and ARMA(1,1), and two non-
stationary processes, the Brownian Bridge (BB) and the Brownian Motion (BM) on [0,1].
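For illustration, a minimal R sketch that generates data from model (1) with Brownian motion trajectories, uniform time points and Gaussian noise follows; the mean function and the parameter values below are placeholders rather than the exact simulation settings used here:

## Sketch: generate Y_ij = f(t_ij) + Z_i(t_ij) + eps_ij as in model (1),
## with Z_i Brownian motion; f, n, m and sigma.eps are placeholders.
set.seed(1)
n <- 300; m <- 64; sigma.eps <- 0.5
f <- function(t) sin(2 * pi * t)                    # placeholder mean function

tt <- t(apply(matrix(runif(n * m), n, m), 1, sort)) # uniform design, sorted within curves
## Brownian motion at the sorted times: cumulated independent Gaussian increments
Z  <- t(apply(tt, 1, function(ti) cumsum(rnorm(m, sd = sqrt(c(ti[1], diff(ti)))))))
Y  <- f(tt) + Z + matrix(rnorm(n * m, sd = sigma.eps), n, m)

## thresholded estimator of Section 2.2 (illustrative cut-off and thresholds)
fit <- fit.threshold(Y, tt, d = 30, r = rep(0.05, 30))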
Equally Sp. t_j    0.051    0.100    0.96
Uniform t_ij       0.074    0.107    0.88
3.3.2 Confidence bands
Next, we investigate the finite sample coverage of the confidence bands (Methods 1 - 4)
proposed in Section 2.5, again using the Fourier basis. Method 1 is based on display (31).
Method 2 is the band obtained using display (32) with W_i(t) = ∑_{k∈ŝ} µ̂_{ik} φ_k(t). Method 3 is the band obtained using display (32) with W_i(t) = ∑_{k=1}^d µ̂_{ik} φ_k(t). Method 4 is implemented as in display (33).
We consider the following scenario for our simulations for evaluating confidence bands:
n = 300, m = 64 and d = 30. The signal-to-noise ratio was set to 2.2. Both the uni-
form and equally spaced design are considered for the discrete time points. We evaluate our
bands at the fine grid consisting of equally spaced time points in [0,1], which we construct
by taking tj = j/m, 1 ≤ j ≤ m. We compare these methods with simultaneous confidence
bands (SCB) of the form: fave(t)± zα/(2m)n−1/2Γ1/2(t, t), based on Zhang and Chen (2007).
Here fave(t) = (1/n)∑n
i=1 floc,i(t) is the average of local linear kernel estimators floc,i for
each curve with local bandwidth (built-in bandwidth choice lokerns in R). The estimated
covariance is computed using the sample variance of the kernel estimators at each t. Table
4 shows, for equally spaced design, that these SCB are not optimal.
Table 4 summarizes the results for the AR(1) and BB processes and Signal 1. The entries
in these tables are the average widths of the confidence bands followed, in parentheses, by
their empirical coverage over S = 300 simulations. Since we chose for this simulation study
α = 0.05, we expect the empirical coverage of the proposed bands to be around the 0.95
nominal level. The results presented in Table 4 below support the following conclusions on
the proposed confidence bands.
1. Method 1 yields an adaptive band that is conservative and necessarily wider than
the other methods we analyze. The coverage is close to, and oftentimes exceeds, the nominal level.
2. Method 2 has coverage close to the nominal level for equally spaced design, and its
width is narrowest among all methods considered. Methods 2 and 3 have similar
coverage in the case of equally spaced design with Method 2 having smaller width. In
the uniform design case, however, Method 2 has slightly lower coverage than Method
3.
3. Methods 3 and 4 are centered at the same estimator; the difference in their widths comes from the quantile used. Method 4, employing a quantile chosen via bootstrap, produces
bands with narrower width on average, while maintaining coverage close to the nominal
level. It has slightly lower coverage than Method 3.
4. For the same (n, m, d) combination, the scenario σ∗ = 10 yields narrower bands than σ∗ = 1. Also, the bands for the uniform design are slightly wider than those for the equally spaced design. Across all scenarios we consider, the coverage of the proposed bands lies between 0.84 and 1.00 for the AR(1) process and between 0.91 and 1.00 for the BB process.
4 Application to daily temperature curves
We use the tempkent dataset, which is part of the functional datasets fds package in R.
It consists of many temperature curves recorded over the course of a day in Kent Town,
Australia. We consider the daily temperature curves corresponding to the time period 2003
- 2007. Temperature curves are observed at equally spaced (half-hour) m = 48 time points
each day. Our goal is to estimate the mean temperature curve and to provide confidence
bands. We used d = 40 Fourier basis functions. The temperature curves contained in this
dataset, together with the confidence bands we obtained, are displayed in Figure 4. The con-
fidence band for the mean temperature curve has a smooth appearance, and indicates a cyclic
behavior for the mean temperature curve, with an expected maximum of approximately 22.3
degrees Celsius, and a minimum of approximately 12.4 degrees Celsius.
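For reference, a minimal R sketch of this analysis follows; it assumes that the tempkent object has the usual fds layout, a list with components $x (the 48 half-hour time points) and $y (a matrix with one column per daily curve), which should be verified with str(tempkent), and it reuses the illustrative phi() and fit.threshold() from the sketch in Section 2.2:

## Sketch of the temperature application; the structure of tempkent is assumed
## (a list with $x = half-hour time points and $y = matrix of daily curves,
## one column per day) and should be checked with str(tempkent).
library(fds)
data(tempkent)

Y.days <- t(tempkent$y)                      # one row per daily curve
## (restriction to the 2003 - 2007 period by column selection omitted here)
n <- nrow(Y.days); m <- ncol(Y.days); d <- 40
tt <- matrix(rep((1:m) / m, each = n), n, m) # equally spaced times rescaled to (0, 1]

## projection estimator with d = 40 basis functions, reusing phi() and
## fit.threshold() from Section 2.2 (zero thresholds, i.e. no coefficient dropped)
fit        <- fit.threshold(Y.days, tt, d = d, r = rep(0, d))
mean.curve <- fit$f.hat((1:m) / m)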
References
[1] Yannick Baraud. Confidence balls in Gaussian regression. Annals of Statistics 32, 528–551, 2004.
[2] Peter Bickel and Yaacov Ritov. Non and semi-parametric statistics compared and contrasted. Journal of Statistical Planning and Inference 91, 209–228, 2000.
(a) Method 2; (b) Method 3.

Figure 3: Scenario: AR(1), σ∗ = 1, SNR = 2.2, n = 300, m = 64, d = 30, equally spaced t_j, α = 0.05. The plots show simulated data curves in black, the true f as a white solid line, and confidence bands as white dashed lines.
Figure 4: Daily temperature curves for Kent Town, Australia, for the years 2003 - 2007 are shown as black lines (x-axis: half-hour time points; y-axis: temperature in degrees Celsius). Confidence bands for the mean temperature curve for this time period are shown as white lines: Method 1 band (pronounced dash) and Method 2 band (fine dash).
[3] Lucien Birgé and Pascal Massart. An adaptive compression algorithm in Besov spaces. Constructive Approximation 16(1), 1–36, 2000.
[4] Michal Benko, Wolfgang Härdle and Alois Kneip. Common functional principal components. Annals of Statistics 37(1), 1–34, 2009.
[5] Rudolph Beran and Lutz Dümbgen. Modulation of estimators and confidence sets. Annals of Statistics 26, 1826–1856, 1998.
[6] P. Laurie Davies, Arne Kovac and Monika Meise. Nonparametric regression, confidence
regions and regularisation. Annals of Statistics 37, 2597–2625, 2009.
[7] David Degras. Nonparametric estimation of a trend based upon sampled continuous
processes. C. R. Acad. Sci. Paris Ser. I 347, 191-194, 2009.
[8] David Donoho and Iain Johnstone. Adapting to the unknown smoothness via wavelet
shrinkage. Journal of the American Statistical Association 90, 1200–1224, 1995.
[9] David Donoho and Iain Johnstone. Minimax estimation via wavelet shrinkage. Annals
of Statistics 26, 789–921, 1998.
[10] David Donoho, Iain Johnstone, Gerard Kerkyacharian, and Dominique Picard. Density
estimation by wavelet thresholding. Annals of Statistics 24(2), 508–539, 1996.
[11] Christopher Genovese and Larry Wasserman. Adaptive confidence bands. Annals of
Statistics 36(2), 875–905, 2008.
[12] Daniel Gervini. Free-knot spline smoothing for functional data. Journal of the Royal
Statistical Society, Series B 68(4), 671–687, 2006.
[13] Evarist Giné and Richard Nickl. Confidence bands in density estimation. Annals of
Statistics 38(2), 1122–1170, 2010.
[14] Henry Landau and Lawrence Shepp. On the supremum of a Gaussian process. Sankhya,
32, 369–378, 1970.
[15] Mark Low. On nonparametric confidence intervals. The Annals of Statistics, 25 (6),
2547–2554, 1997.
[16] Pascal Massart. Concentration Inequalities and Model Selection. École d'Été de Probabilités de Saint-Flour XXXIII – 2003, Lecture Notes in Mathematics, volume 1896. Springer, 2007.
[17] Hans-Georg Müller. Functional modeling and classification of longitudinal data. Scandinavian Journal of Statistics 32, 223–240, 2005.
[18] Dominique Picard and Karine Tribouley. Adaptive confidence interval for pointwise
curve estimation. Annals of Statistics 28(1), 298–335, 2000.
[19] R Development Core Team (2009). R: A language and environment for statistical com-
puting. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0,
URL http://www.R-project.org.
[20] James Ramsay and Bernard Silverman. Functional data analysis, 2nd Edition. Springer,
New York, 2005.
[21] James Ramsay and Bernard Silverman. Applied functional data analysis. Springer, New
York, 2002.
[22] John Rice and Bernard Silverman. Estimating the mean and covariance structure non-
parametrically when the data are curves. Journal of the Royal Statistical Society, Series
B 53(1), 233–243, 1991.
[23] Jamie Robins and Aad van der Vaart. Adaptive nonparametric confidence sets. Annals
of Statistics, 34, 229-253, 2006.
[24] David Ruppert, Matthew Wand and Raymond Carroll. Semiparametric regression.
Cambridge University Press, Cambridge, 2003.
[25] Burkhart Seifert, Michael Brockmann, Joachim Engel and Theo Gasser. Fast algorithms
for nonparametric curve estimation. Journal of Computational and Graphical Statistics
3(2), 192–213, 1994.
[26] Galen Shorack. Probability for Statisticians. Springer, 2000.
[27] Galen Shorack and Jon Wellner. Empirical Processes with Applications to Statistics,
Wiley, 1986.
[28] Alexandre Tsybakov. Introduction to nonparametric estimation. Springer, New York,
2009.
[29] Larry Wasserman. All of nonparametric statistics. Springer, New York, 2006.
[30] Fang Yao. Asymptotic distributions of nonparametric regression estimators for longitu-
dinal and functional data. Journal of Multivariate Analysis 98, 40–56, 2007.
[31] Fang Yao, Hans-Georg Müller and Jane-Ling Wang. Functional data analysis for sparse
longitudinal data. Journal of the American Statistical Association 100(470), 577–590,
2005.
[32] Jin-Ting Zhang and Jianwei Chen. Statistical inferences for functional data. Annals of
Statistics 35(3), 1052–1079, 2007.
Table 4: Table entry is ave. width (coverage) over S = 300 simulations. Scenario: Signal 1, AR(1), α = 0.05, Fourier φ_k basis functions, SNR = 2.2, B = 100, n = 300, m = 64, d = 30.

                                        AR(1) process    BB process
σ∗ = 10, Equally Sp. t_j
Bands based on asympt. normality
  SCB                                   0.16 (0.78)      0.15 (0.84)