Submitted to Statistica Sinica
Efficient Estimation of Partially Linear Models
for Data on Complicated Domains
by Bivariate Penalized Splines over Triangulations
Li Wang¹, Guannan Wang², Ming-Jun Lai³ and Lei Gao¹
¹Iowa State University, Ames, IA 50011, USA.
²College of William & Mary, Williamsburg, VA 23185, USA.
³The University of Georgia, Athens, GA 30602, USA.
Abstract:
In this paper, we study the estimation of partially linear models for spatial data distributed
over complex domains. We use bivariate splines over triangulations to represent the nonparametric
component on an irregular two-dimensional domain. The proposed method is formulated as a
constrained minimization problem which does not require constructing finite elements or locally
supported basis functions. Thus, it allows an easier implementation of piecewise polynomial
representations of various degrees and various smoothness over an arbitrary triangulation. Moreover, the
constrained minimization problem is converted into an unconstrained minimization via a QR
decomposition of the smoothness constraints, which allows for the development of a fast and efficient
penalized least squares algorithm to fit the model. The estimators of the parameters are proved to
be asymptotically normal under some regularity conditions. The estimator of the bivariate function
is consistent, and its rate of convergence is also established. The proposed method enables us to
construct confidence intervals and permits inference for the parameters. The performance of the
estimators is evaluated by two simulation examples and by a real data analysis.
Key words and phrases: Bivariate splines, Penalty, Semiparametric regression, Spatial data,
Triangulation.
1 Introduction
In many geospatial studies, spatially distributed covariate information is available. For example,
geographic information systems may contain measurements obtained from satellite images at
some locations. These spatially explicit data can be useful in the construction and estimation
of regression models; however, the domain over which variables of interest are defined is often
found to be complicated, such as stream networks, islands and mountains. For example, Figure
1 (a) and (b) show the largest estuary in New Hampshire together with the locations of 97 sites
where mercury concentrations in sediment were surveyed in the years 2000, 2001 and 2003; see
Wang and Ranalli (2007). It is well known that many conventional smoothing tools based on
the Euclidean distance between observations suffer from the problem of “leakage” across
complex domains, which refers to poor estimation over difficult regions caused by inappropriately
linking parts of the domain that are separated by physical barriers; see the excellent discussions in Ramsay
(2002) and Wood et al. (2008). In this paper, we propose to use bivariate splines (smooth
piecewise polynomial functions over a triangulation of the domain of interest) to model spatially
explicit datasets, which enables us to overcome the “leakage” problem and provide more accurate
Figure 6: Prediction maps of mercury concentrations over the estuaries in New Hampshire.
6 Concluding Remarks
In this paper, we have considered PLMs for modeling spatial data with complicated domain
boundaries. We have introduced a framework of bivariate penalized splines defined on triangulations
for semiparametric estimation. Our BPST method has demonstrated competitive performance
compared to existing methods, while providing a number of possible advantages.
First, the proposed method greatly enhances the application of non/semiparametric methods
to spatial data analysis. It solves the problem of “leakage” across complex domains,
from which many conventional smoothing tools suffer. The numerical results from the simulation
studies and the application show that our method accounts effectively for complex domain
boundaries. Our method does not require the data to be evenly distributed or to lie on regularly spaced
grids, as tensor product smoothing methods do. In regions of sparse data, bivariate
penalized splines provide a more convenient tool for data fitting than unpenalized splines,
since the roughness penalty helps regularize the estimation. Relative to the conventional FEM,
our method provides a more flexible way to use piecewise polynomials of various degrees and
various smoothness over an arbitrary triangulation for spatial data analysis.
Second, we provide new statistical theory for estimating the PLM for data distributed
on complex spatial domains. We show that our estimates of both the parametric and
nonparametric parts of the model enjoy excellent asymptotic properties. In particular, we have
shown that our estimates of the coefficients in the parametric part are asymptotically normal,
and we have derived the convergence rate of the nonparametric component under regularity conditions.
We have also provided a standard error formula for the estimated parameters, and our simulation
studies show that the standard errors are estimated with good accuracy. The theoretical results
provide measures of the effect of covariates after adjusting for the location effect. In addition,
they give valuable insights into the accuracy of our estimate of the PLM and permit joint
inference for the parameters.
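To make the link between asymptotic normality and interval estimation concrete, a generic Wald-type interval for a single coefficient can be written as below. The notation ($\hat{\beta}_k$, $\widehat{\mathrm{se}}$, $z_{1-\alpha/2}$, $p$) is illustrative and is an assumption of this sketch; the paper's own standard error formula appears in its main text and is not reproduced here.

```latex
% Approximate 100(1 - \alpha)% interval for the k-th parametric coefficient,
% where z_{1-\alpha/2} is the standard normal quantile.
\hat{\beta}_k \;\pm\; z_{1-\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_k),
\qquad k = 1, \dots, p .
```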
Finally, our proposed method is much more computationally efficient than other
approaches such as kriging and GLTPS. Specifically, for model fitting with $n$ locations, the
computational complexity of ordinary kriging and GLTPS is $O(n^3)$, while the computational
complexity of our method is only $O(nN^2)$, where $N$ is the number of triangles in the triangulation
and is usually much smaller than $n$, as suggested in Condition (C4).
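The abstract states that the constrained minimization is converted into an unconstrained one via a QR decomposition of the smoothness constraints. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: `B` is a stand-in for a spline basis matrix (random here, not an actual Bernstein basis), `H` is a hypothetical constraint matrix assumed to have full row rank, `P` is a hypothetical positive semidefinite penalty matrix, and the function name `fit_constrained_pls` is made up for illustration.

```python
import numpy as np

def fit_constrained_pls(Y, Z, B, H, P, lam):
    """Minimize ||Y - Z beta - B gamma||^2 + lam * gamma' P gamma  s.t.  H gamma = 0.

    Assumes H (k x m) has full row rank. Returns (beta, gamma).
    """
    k = H.shape[0]
    # Full QR of H^T: the trailing m - k columns of Q are orthogonal to the
    # rows of H, so gamma = Q2 @ theta satisfies H gamma = 0 for any theta.
    Q, _ = np.linalg.qr(H.T, mode="complete")
    Q2 = Q[:, k:]

    # Unconstrained problem in (beta, theta) with design [Z, B Q2].
    D = np.hstack([Z, B @ Q2])
    p = Z.shape[1]
    pen = np.zeros((D.shape[1], D.shape[1]))
    pen[p:, p:] = lam * (Q2.T @ P @ Q2)   # penalty acts on theta only

    coef = np.linalg.solve(D.T @ D + pen, D.T @ Y)
    beta = coef[:p]
    gamma = Q2 @ coef[p:]                 # map back to constrained coefficients
    return beta, gamma

# Synthetic demonstration (all data below are made up for illustration).
rng = np.random.default_rng(0)
n, p, m, k = 200, 2, 12, 4
Z = rng.normal(size=(n, p))               # linear covariates
B = rng.normal(size=(n, m))               # stand-in for a spline basis matrix
H = rng.normal(size=(k, m))               # stand-in for smoothness constraints
A = rng.normal(size=(m, m))
P = A.T @ A                               # positive semidefinite penalty

beta_true = np.array([1.0, -2.0])
Q_full, _ = np.linalg.qr(H.T, mode="complete")
gamma_true = Q_full[:, k:] @ rng.normal(size=m - k)   # satisfies H gamma = 0
Y = Z @ beta_true + B @ gamma_true + 0.1 * rng.normal(size=n)

beta_hat, gamma_hat = fit_constrained_pls(Y, Z, B, H, P, lam=1e-3)
print("beta_hat:", beta_hat)
print("max |H gamma_hat|:", np.abs(H @ gamma_hat).max())
```

In this sketch the dominant cost is forming `D.T @ D`, which grows linearly in the number of observations for a fixed number of free coefficients. The normal equations are solved directly for brevity; a QR or Cholesky solve of the stacked system would be a more numerically careful choice.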
Supplementary Materials
The online supplement (Wang et al., 2018) contains details on implementing the proposed
methods, additional simulation and application results, and the proofs of Lemma 1 and Theorems
1 and 2.
Acknowledgment
The first author’s research was supported in part by National Science Foundation grants
DMS-1106816 and DMS-1542332; the second author’s research was supported in part by a College
of William & Mary Faculty Summer Research Grant; and the third author’s research was supported
in part by National Science Foundation grant DMS-1521537 and Simons collaboration
grant #280646. The authors would like to thank Haonan Wang and M. Giovanna Ranalli for
providing the New Hampshire estuary data. This paper has not been formally reviewed by the
EPA. The views expressed here are solely those of the authors. The EPA does not endorse any
products or commercial services mentioned in this report. Finally, the authors would like to
thank the editor, the associate editor and the reviewers for their valuable comments and suggestions
to improve the quality of the paper.
Bibliography
Abbott, M. L., Lin, C.-J., Martian, P., and Einerson, J. J. (2008), “Atmospheric mercury near Salmon falls creek reservoir in southern Idaho,” Applied Geochemistry, 23, 438–453.
Awanou, G., Lai, M. J., and Wenston, P. (2005), “The multivariate spline method for scattered data fitting and numerical solutions of partial differential equations,” Wavelets and splines: Athens 2005, 24–74.
Brown, L. E., Chen, C. Y., Voytek, M. A., and Amirbahman, A. (2015), “The effect of sediment mixing on mercury dynamics in two intertidal mudflats at Great Bay Estuary, New Hampshire, USA,” Marine chemistry, 177, 731–741.
Chen, R., Liang, H., and Wang, J. (2011), “On determination of linear components in additive models,” Journal of Nonparametric Statistics, 23, 367–383.
Eilers, P. (2006), P-spline smoothing on difficult domains, [online] Available at http://www.statistik.lmu.de/sfb386/workshop/smcs2006/slides/eilers.pdf.
Furrer, R., Nychka, D., and Sain, S. (2011), Package ‘fields’. R package version 6.6.1., [online] Available at http://cran.r-project.org/web/packages/fields/fields.pdf.
Green, P. J. and Silverman, B. W. (1993), Nonparametric regression and generalized linear models: a roughness penalty approach, CRC Press.
— (1994), Nonparametric regression and generalized linear models, Chapman and Hall, London.
Hardle, W., Liang, H., and Gao, J. T. (2000), Partially linear models, Heidelberg: Springer Physica-Verlag.
He, X. and Shi, P. (1996), “Bivariate tensor-product B-splines in a partly linear model,” Journal of Multivariate Analysis, 58, 162–181.
Huang, J. (2003), “Asymptotics for polynomial spline regression under weak conditions,” Statistics & Probability Letters, 65, 207–216.
Huang, J. Z., Zhang, L., and Zhou, L. (2007), “Efficient estimation in marginal partially linear models for longitudinal/clustered data using splines,” Scandinavian Journal of Statistics, 34, 451–477.
Lai, M. J. (2008), “Multivariate splines for data fitting and approximation,” Conference Proceedings of the 12th Approximation Theory, 210–228.
Lai, M. J. and Schumaker, L. L. (1998), “Approximation power of bivariate splines,” Advances in Computational Mathematics, 9, 251–279.
— (2007), Spline functions on triangulations, Cambridge University Press.
Lai, M. J. and Wang, L. (2013), “Bivariate penalized splines for regression,” Statistica Sinica, 23, 1399–1417.
Li, Y. and Ruppert, D. (2008), “On the asymptotics of penalized splines,” Biometrika, 95, 415–436.
Liang, H., Hardle, W., and Carroll, R. J. (1999), “Estimation in a semiparametric partially linear errors-in-variables model,” The Annals of Statistics, 27, 1519–1535.
Liang, H. and Li, R. (2009), “Variable selection for partially linear models with measurement errors,” Journal of the American Statistical Association, 104, 234–248.
Liu, X., Wang, L., and Liang, H. (2011), “Estimation and variable selection for semiparametric additive partial linear models,” Statistica Sinica, 21, 1225–1248.
Ma, S., Song, Q., and Wang, L. (2013), “Simultaneous variable selection and estimation in semiparametric modeling of longitudinal/clustered data,” Bernoulli, 19, 252–274.
Ma, Y., Chiou, J.-M., and Wang, N. (2006), “Efficient semiparametric estimator for heteroscedastic partially linear models,” Biometrika, 93, 75–84.
Mammen, E. and van de Geer, S. (1997), “Penalized quasi-likelihood estimation in partial linear models,” The Annals of Statistics, 1014–1035.
Marx, B. and Eilers, P. (2005), “Multidimensional penalized signal regression,” Technometrics, 47, 13–22.
Miller, D. L. and Wood, S. N. (2014), “Finite area smoothing with generalized distance splines,” Environmental and ecological statistics, 21, 715–731.
Ramsay, T. (2002), “Spline smoothing over difficult regions,” Journal of the Royal Statistical Society, Series B, 64, 307–319.
Sangalli, L., Ramsay, J., and Ramsay, T. (2013), “Spatial spline regression models,” Journal of the Royal Statistical Society, Series B, 75, 681–703.
Speckman, P. (1988), “Kernel smoothing in partial linear models,” Journal of the Royal Statistical Society. Series B (Methodological), 413–436.
von Golitschek, M. and Schumaker, L. L. (2002), “Bounds on projections onto bivariate polynomial spline spaces with stable local bases,” Constructive approximation, 18, 241–254.
Wang, H. and Ranalli, M. G. (2007), “Low-rank smoothing splines on complicated domains,” Biometrics, 63, 209–217.
Wang, L., Liu, X., Liang, H., and Carroll, R. (2011), “Estimation and variable selection for generalized additive partial linear models,” Annals of Statistics, 39, 1827–1851.
Wang, L., Wang, G., Lai, M., and Gao, L. (2018), “Efficient estimation of partially linear models for data on complicated domains via bivariate penalized splines over triangulations,” Statistica Sinica, Supplementary Materials.
Wang, L., Xue, L., Qu, A., and Liang, H. (2014), “Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates,” Annals of Statistics, 42, 592–624.
Wood, S. N. (2003), “Thin plate regression splines,” Journal of the Royal Statistical Society, Series B, 65, 95–114.
Wood, S. N., Bravington, M. V., and Hedley, S. L. (2008), “Soap film smoothing,” Journal of the Royal Statistical Society, Series B, 70, 931–955.
Xiao, L., Li, Y., and Ruppert, D. (2013), “Fast bivariate P-splines: the sandwich smoother,” Journal of the Royal Statistical Society, Series B, 75, 577–599.
Zhang, H., Cheng, G., and Liu, Y. (2011), “Linear or nonlinear? Automatic structure discovery for partially linear models,” Journal of the American Statistical Association, 106, 1099–1112.
Zhou, L. and Pan, H. (2014), “Smoothing noisy data for irregular regions using penalized bivariate splines on triangulations,” Computational Statistics, 29, 263–281.