Re-sampling techniques for estimating the distribution of descriptive statistics of functional data

Han Lin Shang 1
ESRC Centre for Population Change, University of Southampton, Southampton, SO17 1BJ, United Kingdom

Abstract

Re-sampling methods for estimating the distribution of descriptive statistics of functional data are considered. Through Monte-Carlo simulations, we compare the performance of several re-sampling methods commonly used for estimating the distribution of descriptive statistics. We introduce two re-sampling methods that rely on functional principal component analysis, where the scores are randomly drawn from a multivariate normal distribution or from a Stiefel manifold. Illustrated by one-dimensional Canadian weather station data and two-dimensional bone shape data, the re-sampling methods provide a way of visualizing the distribution of descriptive statistics for functional data.

Keywords: bootstrap validity, functional mean, trimmed functional mean, functional median, functional variance, smoothed bootstrap

1. Introduction

Recent advances in computer technology have made functional data increasingly common; their graphical representations take the form of curves (Hyndman and Shang, 2010), images (Locantore et al., 1999), or shapes (Epifanio and Ventura-Campos, 2011). The monographs by Ramsay and Silverman (2002, 2005) have greatly popularized functional data analysis (FDA), offering a number of appealing case studies and presenting many parametric statistical methods. The book by Ferraty and Vieu (2006) is an excellent reference on the introduction

Email address: [email protected] (Han Lin Shang)
1 Tel: +44 (0) 2380 595 796; Fax: +44 (0) 2380 593 858
Preprint submitted to Communications in Statistics–Simulation and Computation, March 4, 2013
References

Davidson, R. and MacKinnon, J. G. (2007). Improving the reliability of bootstrap tests with
the fast double bootstrap. Computational Statistics and Data Analysis, 51(7):3259–3281.
Delaigle, A. and Hall, P. (2012). Methodology and theory for partial least squares applied to
functional data. The Annals of Statistics, 40(1):322–352.
Delaigle, A., Hall, P., and Bathia, N. (2012). Componentwise classification and clustering of
functional data. Biometrika, 99(2):299–313.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics,
7(1):1–26.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall,
New York.
Epifanio, I. and Ventura-Campos, N. (2011). Functional data analysis in shape analysis.
Computational Statistics and Data Analysis, 55(9):2758–2773.
Febrero, M., Galeano, P., and González-Manteiga, W. (2007). A functional analysis of
NOx levels: location and scale estimation and outlier detection. Computational Statistics,
22(3):411–427.
Febrero, M., Galeano, P., and González-Manteiga, W. (2008). Outlier detection in functional
data by depth measures, with application to identify abnormal NOx levels. Environmetrics,
19(4):331–345.
Ferraty, F. and Romain, Y., editors (2011). The Oxford Handbook of Functional Data Analysis.
Oxford University Press, Oxford.
Ferraty, F., Van Keilegom, I., and Vieu, P. (2010). On the validity of the bootstrap in
non-parametric functional regression. Scandinavian Journal of Statistics, 37(2):286–306.
Ferraty, F., Van Keilegom, I., and Vieu, P. (2012). Regression when both response and
predictor are functions. Journal of Multivariate Analysis, 109:10–28.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and
Practice. Springer, New York.
Ferraty, F. and Vieu, P. (2011). Kernel regression estimation for functional data. In Ferraty,
F. and Romain, Y., editors, The Oxford Handbook of Functional Data Analysis. Oxford
University Press, Oxford.
Fraiman, R. and Muniz, G. (2001). Trimmed mean for functional data. Test, 10(2):419–440.
Gervini, D. (2012). Outlier detection and trimmed estimation for general functional data.
Statistica Sinica, 22(4):1639–1660.
Giné, E. and Zinn, J. (1990). Bootstrapping general empirical measures. The Annals of
Probability, 18(2):851–869.
González-Manteiga, W., González-Rodríguez, G., Martínez-Calvo, A., and García-Portugués,
E. (2012). Bootstrap independence test for functional linear models. Working paper,
University of Santiago de Compostela. http://arxiv.org/pdf/1210.1072v2.pdf.
González-Manteiga, W. and Martínez-Calvo, A. (2011). Bootstrap in functional linear
regression. Journal of Statistical Planning and Inference, 141(1):453–461.
Hall, P. (1986). On the bootstrap and confidence intervals. The Annals of Statistics,
14(4):1431–1452.
Hall, P. (2011). Principal component analysis for functional data: methodology, theory, and
discussion. In Ferraty, F. and Romain, Y., editors, The Oxford Handbook of Functional
Data Analysis. Oxford University Press, Oxford.
Hall, P. and Horowitz, J. (2007). Methodology and convergence rates for functional linear
regression. The Annals of Statistics, 35(1):70–91.
Hall, P. and Hosseini-Nasab, M. (2006). On properties of functional principal components
analysis. Journal of the Royal Statistical Society: Series B, 68(1):109–126.
Hall, P. and Hosseini-Nasab, M. (2009). Theory for high-order bounds in functional principal
components analysis. Mathematical Proceedings of the Cambridge Philosophical Society,
146(1):225–256.
Hall, P., Lee, Y. K., Park, B. U., and Paul, D. (2009). Tie-respecting bootstrap methods for
estimating distributions of sets and functions of eigenvalues. Bernoulli, 15(2):380–401.
Hall, P., Poskitt, D. S., and Presnell, B. (2001). A functional data-analytic approach to signal
discrimination. Technometrics, 43(1):1–9.
Hall, P. and Vial, C. (2006). Assessing the finite dimensionality of functional data. Journal
of the Royal Statistical Society: Series B, 68(4):689–705.
Hall, P. G. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Hartigan, J. A. (1969). Using subsample values as typical values. Journal of the American
Statistical Association, 64(328):1303–1317.
Hoff, P. D. (2009). Simulation of the matrix Bingham-von Mises-Fisher distribution, with
applications to multivariate and relational data. Journal of Computational and Graphical
Statistics, 18(2):438–456.
Hyndman, R. J. and Shang, H. L. (2009). Forecasting functional time series (with discussions).
Journal of the Korean Statistical Society, 38(3):199–211.
Hyndman, R. J. and Shang, H. L. (2010). Rainbow plots, bagplots, and boxplots for functional
data. Journal of Computational and Graphical Statistics, 19(1):29–45.
James, G. M., Wang, J., and Zhu, J. (2009). Functional linear regression that’s interpretable.
The Annals of Statistics, 37(5A):2083–2108.
James, I. M. (1976). The Topology of Stiefel Manifolds. Number 24 in Lecture Note Series.
Cambridge University Press, London Mathematical Society.
Kargin, V. and Onatski, A. (2008). Curve forecasting by functional autoregression. Journal
of Multivariate Analysis, 99(10):2508–2526.
Kreiss, J. and Paparoditis, E. (2011). Bootstrap methods for dependent data: a review (with
discussion). Journal of the Korean Statistical Society, 40(4):357–395.
Locantore, N., Marron, J., Simpson, D., Tripoli, N., Zhang, J., and Cohen, K. (1999). Robust
principal component analysis for functional data. Test, 8(1):1–73.
López-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data. Journal
of the American Statistical Association, 104(486):718–734.
Mahalanobis, P. C. (1946). On large-scale sample surveys. Philosophical Transactions of the
Royal Society of London, Series B, 231(584):329–451.
Mas, A. and Pumo, B. (2011). Linear processes for functional data. In Ferraty, F. and Romain,
Y., editors, The Oxford Handbook of Functional Data Analysis. Oxford University Press,
Oxford.
McMurry, T. and Politis, D. N. (2011). Resampling methods for functional data. In Ferraty,
F. and Romain, Y., editors, The Oxford Handbook of Functional Data Analysis. Oxford
University Press, Oxford.
Müller, H. and Yao, F. (2008). Functional additive models. Journal of the American Statistical
Association, 103(484):1534–1544.
Politis, D. (2003). The impact of bootstrap methods on time series analysis. Statistical
Science, 18(2):219–230.
Politis, D. and Romano, J. (1994). Limit theorems for weakly dependent Hilbert space
valued random variables with application to the stationary bootstrap. Statistica Sinica,
4(2):461–476.
Poskitt, D. S. and Sengarapillai, A. (2013). Description length and dimensionality reduction
in functional data analysis. Computational Statistics and Data Analysis, 58:98–113.
Preda, C. and Saporta, G. (2005). Clusterwise PLS regression on a stochastic process.
Computational Statistics and Data Analysis, 49(1):99–108.
R Development Core Team (2012). R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/.
Ramsay, J. O., Hooker, G., and Graves, S. (2009). Functional Data Analysis with R and
MATLAB. Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2002). Applied Functional Data Analysis: Methods and
Case Studies. Springer, New York.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer, New York,
2nd edition.
Ramsay, J. O., Wickham, H., Graves, S., and Hooker, G. (2012). fda: Functional Data
Analysis. R package version 2.3.2. http://CRAN.R-project.org/package=fda.
Reiss, P. and Ogden, T. (2007). Functional principal component regression and functional
partial least squares. Journal of the American Statistical Association, 102(479):984–996.
Simon, J. (1993). Resampling: the New Statistics. Duxbury, Belmont, CA.
Tarpey, T. and Kinateder, K. K. J. (2003). Clustering functional data. Journal of Classification,
20(1):93–114.
Yao, F., Müller, H.-G., and Wang, J.-L. (2005). Functional data analysis for sparse longitudinal
data. Journal of the American Statistical Association, 100(470):577–590.
Table 1: Abbreviations of the bootstrap methods. St and Sm are described in Section 2.2, while the rest are described in Section 2.4.
St = Standard function bootstrap
Sm = Smoothed function bootstrap
StU = Standard score bootstrap
SmU = Smoothed score bootstrap
StGU = Standard Gaussian-distributed score bootstrap
StiefelU = Random Stiefel manifold score bootstrap
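The two function-level methods in Table 1 can be illustrated with a minimal Python sketch. Section 2.2 is not reproduced here, so the exact form of both schemes is an assumption: St is taken to resample whole curves with replacement, and Sm is taken to add Gaussian noise whose covariance is beta times the sample covariance of the curves. The function names, the toy sine data, and the default values are hypothetical; the score-based methods (StU, SmU, StGU, StiefelU) are not covered.

```python
import math
import random


def mean_curve(curves):
    """Pointwise mean of curves discretised on a common grid."""
    n = len(curves)
    return [sum(col) / n for col in zip(*curves)]


def standard_bootstrap(curves, rng):
    """St (assumed form): draw n whole curves with replacement."""
    return [rng.choice(curves) for _ in curves]


def smoothed_bootstrap(curves, beta, rng):
    """Sm (assumed form): resampled curve plus Gaussian noise with
    covariance beta * (sample covariance, 1/n normalisation)."""
    n = len(curves)
    centre = mean_curve(curves)
    resid = [[x - m for x, m in zip(c, centre)] for c in curves]
    scale = math.sqrt(beta / n)
    sample = []
    for _ in range(n):
        base = rng.choice(curves)
        # A random Gaussian combination of the centred curves has
        # covariance proportional to the sample covariance.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        noise = [scale * sum(gi * r[j] for gi, r in zip(g, resid))
                 for j in range(len(centre))]
        sample.append([b + z for b, z in zip(base, noise)])
    return sample


def bootstrap_means(curves, B=150, beta=None, seed=1):
    """B bootstrap replicates of the functional mean (St if beta is None)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        if beta is None:
            sample = standard_bootstrap(curves, rng)
        else:
            sample = smoothed_bootstrap(curves, beta, rng)
        reps.append(mean_curve(sample))
    return reps


# Hypothetical toy data: 30 noisy sine curves on a 20-point grid.
data_rng = random.Random(0)
grid = [j / 19 for j in range(20)]
curves = [[math.sin(2 * math.pi * t) + data_rng.gauss(0.0, 0.3) for t in grid]
          for _ in range(30)]

st_means = bootstrap_means(curves)             # St replicates
sm_means = bootstrap_means(curves, beta=0.05)  # Sm replicates
```

The collection of replicated mean curves is what approximates the sampling distribution of the functional mean; swapping `mean_curve` for another statistic would give the corresponding distribution for that statistic.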
Table 2: Coverage probabilities for the bootstrapped CIs based on B = 150 repetitions and 200 replications, at the nominal coverage probability of 95%.
Estimator and bootstrap | Model (a): Distance | Model (b): Distance
Figure 1: Histograms of the widths for the CIs with the L∞ metric, based on the sample mean (left) and the 5%-trimmed mean (right).
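To make the notions of coverage probability and CI width under the L∞ metric concrete, the following Python sketch shows one way such a simulation could be organised. It is not the simulation design behind Table 2 or Figure 1: the confidence region is assumed to be a ball around the sample mean whose radius is the 95% quantile of the bootstrap distances, the radius is used as a stand-in for the width, and the true mean curve, the noise level, and the sample sizes are all hypothetical choices kept small so the example runs quickly.

```python
import math
import random


def mean_curve(curves):
    """Pointwise mean of curves discretised on a common grid."""
    n = len(curves)
    return [sum(col) / n for col in zip(*curves)]


def sup_distance(f, g):
    """Discretised L-infinity distance between two curves."""
    return max(abs(a - b) for a, b in zip(f, g))


def ball_radius(curves, B, rng, level=0.95):
    """Sample mean and the radius within which roughly `level` of the
    bootstrap means fall (standard curve bootstrap, an assumption)."""
    centre = mean_curve(curves)
    dists = sorted(
        sup_distance(centre, mean_curve([rng.choice(curves) for _ in curves]))
        for _ in range(B)
    )
    return centre, dists[math.ceil(level * B) - 1]


def simulate_coverage(replications=40, n=20, B=100, seed=0):
    """Fraction of replications in which the ball contains the true mean."""
    rng = random.Random(seed)
    grid = [j / 9 for j in range(10)]
    truth = [math.sin(2 * math.pi * t) for t in grid]  # hypothetical true mean
    hits, radii = 0, []
    for _ in range(replications):
        # Hypothetical model: true mean plus independent Gaussian noise.
        curves = [[m + rng.gauss(0.0, 0.5) for m in truth] for _ in range(n)]
        centre, radius = ball_radius(curves, B, rng)
        radii.append(radius)
        hits += sup_distance(centre, truth) <= radius
    return hits / replications, radii


coverage, radii = simulate_coverage()
print("empirical coverage:", coverage)
print("mean radius:", sum(radii) / len(radii))
```

Comparing the empirical coverage with the nominal 95% level, and inspecting the spread of the radii, mirrors the kind of comparison that a coverage table and a histogram of widths summarise.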
[Plot omitted: x-axis Day (0 to 300); y-axis Celsius (−30 to 20).]
Figure 2: Averaged daily temperatures from 1960 to 1994 observed at 35 Canadian weather stations. Note that each curve represents the averaged daily temperatures at a weather station, not in a particular year.
[Plot omitted: four panels (Mean, Median, 5% Trimmed mean, Variance); x-axis Day (0 to 300); y-axis Celsius.]
Figure 3: 95% CIs of the descriptive statistics for the Canadian weather station data, based on the smoothed bootstrap principal component score method with smoothing parameter β = 0.05 and B = 150 repetitions. The sample estimates are shown in red and their 95% CIs in blue.
[Plot omitted: two panels, (a) 16 eburnated femora and (b) 52 non-eburnated femora; axes x-coordinate and y-coordinate, each from −200 to 200.]
Figure 4: Raw data for the 68 bone shapes.
[Plot omitted: four panels, (a) 95% CI for the mean shape of eburnated bones, (b) 95% CI for the mean shape of non-eburnated bones, (c) 95% CI for the median shape of eburnated bones, (d) 95% CI for the median shape of non-eburnated bones; axes x-coordinate and y-coordinate, each from −200 to 200.]
Figure 5: 95% CIs of the descriptive statistics for the bone shape data, based on the standard bootstrap principal component score method. The mean or median shape is shown in red, while the 95% pointwise CI of the mean or median shape based on B = 150 repetitions is shown by the black circles. The variability associated with the mean shape is smaller than the variability associated with the median shape.
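A pointwise CI of the kind mentioned in the caption can be read off the bootstrap replicates by taking empirical quantiles at each grid point. The Python sketch below illustrates this for a one-dimensional curve rather than a two-dimensional shape, and it uses a plain curve bootstrap instead of the principal component score bootstrap; both simplifications, the percentile rule, and the toy cosine data are assumptions made for brevity.

```python
import math
import random


def mean_curve(curves):
    """Pointwise mean of curves discretised on a common grid."""
    n = len(curves)
    return [sum(col) / n for col in zip(*curves)]


def pointwise_percentile_ci(replicates, level=0.95):
    """Lower and upper pointwise bounds from bootstrap replicates."""
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for values in zip(*replicates):  # all replicates at one grid point
        ordered = sorted(values)
        last = len(ordered) - 1
        lower.append(ordered[math.floor(tail * last)])
        upper.append(ordered[math.ceil((1.0 - tail) * last)])
    return lower, upper


# Hypothetical toy data: 35 noisy cosine curves on a 25-point grid.
rng = random.Random(42)
grid = [j / 24 for j in range(25)]
curves = [[math.cos(2 * math.pi * t) + rng.gauss(0.0, 0.4) for t in grid]
          for _ in range(35)]

B = 150  # number of bootstrap repetitions, as in the caption
replicates = [mean_curve([rng.choice(curves) for _ in curves])
              for _ in range(B)]
lower, upper = pointwise_percentile_ci(replicates)
estimate = mean_curve(curves)
```

Plotting `estimate` together with `lower` and `upper` against `grid` would give a band analogous to those shown in the figures. Because the bounds are computed separately at each grid point, the band carries no simultaneous coverage guarantee for the whole curve.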