OPTIMAL MODEL AVERAGING ESTIMATION FOR
PARTIALLY LINEAR MODELS
Xinyu Zhang¹ and Wendun Wang²
¹Academy of Mathematics and Systems Science, Chinese Academy of Sciences
²Econometric Institute, Erasmus University Rotterdam, and Tinbergen Institute
Abstract: This article studies optimal model averaging for partially linear models with
heteroscedasticity. A Mallows-type criterion is proposed to choose the weight. The resulting model
averaging estimator is proved to be asymptotically optimal under some regularity conditions.
Simulation experiments show that the proposed model averaging method is superior to other
commonly used model selection and averaging methods. The proposed procedure is further applied
to study Japan’s sovereign credit default swap spreads.
Key words and phrases: Asymptotic optimality, Heteroscedasticity, Model averaging, Partially
linear model
1. Introduction
Linear regression models have been predominantly popular in a variety of applications,
including biology, economics, psychology, and machine learning. One important reason may
be their simplicity and the clear interpretation of the estimation results. However, an increasing
number of studies have noted that the relationship between the response variable and
covariates is not always linear. To list a few examples, Barro (1996) found that democracy can
influence economic development in a nonlinear pattern. Henderson et al. (2012) and Su & Lu
(2013) found a nonlinear effect of initial state on the economic growth rate. Liang et al.
(2007), in a study on the effectiveness of antiretroviral medicines, showed that the HIV
viral load depends nonlinearly on treatment time. Ignoring nonlinearity can result in incorrect
estimates and inferences, further resulting in misleading explanations and bad decisions. For
example, ignoring the nonlinear effect of global stock markets on the local market may lead
to a lack of awareness of financial contagion; simply estimating a linear relationship between
inflation and economic growth may lead to inappropriate inflation-targeting policies.
To avoid overlooking nonlinearity, partially linear models (PLMs) have received
extensive attention in theoretical and applied statistics due to their flexible specification.
They allow for both linear and nonparametric relations between covariates and the
response variable. This type of specification is also frequently used when the primary interest
is in the linear component, whereas the relation between the mean response and additional
covariates is not easily parameterized. The advantage of the partially linear model over the
standard linear model is that it does not require a parametric assumption for all covariates
and allows us to capture potential nonlinear effects. This model is sometimes preferred over
fully nonparametric models since it preserves the advantages of linear models, e.g., an
easy interpretation of the linear covariates, and suffers less from the dimensionality curse.
PLMs are used in a wide range of applications in the literature; see, for example, Engle et al.
(1986) for an economic application and Liang et al. (2007) for a medical application.
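
For concreteness, the specification described in this paragraph can be written in generic notation. The symbols below (y_i, x_i, z_i, β, g, ε_i, σ_i²) are illustrative choices assumed here and are not necessarily the paper's own notation.

```latex
y_i = x_i^{\top}\beta + g(z_i) + \varepsilon_i, \qquad
E(\varepsilon_i \mid x_i, z_i) = 0, \qquad
E(\varepsilon_i^{2} \mid x_i, z_i) = \sigma_i^{2}, \qquad i = 1, \dots, n.
```

Here x_i collects the covariates that enter linearly through β, z_i collects the covariates that enter through the unknown smooth function g, and letting σ_i² differ across observations corresponds to the heteroscedasticity mentioned in the abstract.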
Various methods have been proposed to estimate PLMs, for example, smoothing splines
this consideration may cause a dimensionality problem by including too many determinants
in the nonlinear component. Thus, we assign determinants to the nonlinear component only
when necessary. Based on the PLM analysis in the previous subsection, it seems reasonable
to presume a linear relationship between Japan’s CDS spreads and the global default risk
premium, the domestic stock market return, and its volatility. It is also clear that the foreign
exchange rate and global stock returns have a nonlinear impact on Japan’s CDS spread; thus
it is necessary to include these two determinants in the nonlinear component when they are
included in the model. As for the US treasury yield, since its effect exhibits only a moderate
degree of nonlinearity and the formal linearity test is not informative, we are less certain
whether to assign this variable to the linear or nonlinear component. Allowing this ambiguous
determinant to enter the nonlinear component leads to a more complete model space but
may also result in the dimensionality curse. There is no a priori knowledge of how to make
an appropriate tradeoff between a more complete model space and the dimensionality curse.

Table 4: Mean square prediction error of Japan’s CDS spreads

              Prediction sample   MAPLM    SAIC     SBIC     AIC      BIC
Scenario I     5%                 0.8608   0.9360   0.9278   0.9403   0.9253
              10%                 0.8490   1.0162   1.0181   1.0256   1.0190
              15%                 0.9708   1.0950   1.0830   1.1007   1.1106
              20%                 0.9927   1.0933   1.1111   1.0751   1.1107
Scenario II    5%                 0.8865   0.9723   0.9264   0.9673   0.9253
              10%                 0.7903   0.9410   1.0175   0.9308   1.0190
              15%                 0.8119   0.9814   0.8542   0.9652   0.7770
              20%                 0.8697   0.9695   1.1073   0.9530   1.1107

Therefore, we compare the prediction performance of six methods in two scenarios. In
Scenario I, we allow only the foreign exchange rate and global stock return to be in the nonlinear
component. In other words, the foreign exchange rate and global stock return can either
not be included in the model or be in the nonlinear component of the model. The remaining
determinants are either not in the model or in the linear component. Scenario II differs
from Scenario I in that we also allow the US treasury yield to enter the nonlinear component.
Hence, there are three possibilities for the uncertain determinant of the US treasury yield:
not included in the model, included in the linear component, or included in the nonlinear
component. We split the sample into two sub-samples, one for estimation and the other for
prediction and evaluation. We consider the estimation sample varying from 80% to 95% of
the whole period; thus the prediction sample ranges from 20% to 5% correspondingly.
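
The two scenarios can be read as rules on which role each determinant may take. The following Python sketch enumerates the combinations those rules allow. It is an illustration only: the variable labels are made up here from the determinants named in the text, the count includes every combination (including the one with all determinants excluded), and whether the authors drop any of these candidates is not stated. The split helper also assumes that the estimation sample is the earlier part of the period, which the text suggests but does not spell out.

```python
from itertools import product

# Illustrative labels for the determinants named in the text (not the authors' identifiers).
LINEAR_ONLY = ("global_default_risk_premium",
               "domestic_stock_return",
               "domestic_stock_volatility")
NONLINEAR_ONLY = ("foreign_exchange_rate", "global_stock_return")
AMBIGUOUS = "us_treasury_yield"


def allowed_roles(scenario):
    """Map each determinant to the roles it may take in the given scenario."""
    roles = {name: ("excluded", "linear") for name in LINEAR_ONLY}
    roles.update({name: ("excluded", "nonlinear") for name in NONLINEAR_ONLY})
    if scenario == "I":
        # Scenario I: the US treasury yield may only be excluded or linear.
        roles[AMBIGUOUS] = ("excluded", "linear")
    elif scenario == "II":
        # Scenario II: it may additionally enter the nonlinear component.
        roles[AMBIGUOUS] = ("excluded", "linear", "nonlinear")
    else:
        raise ValueError("scenario must be 'I' or 'II'")
    return roles


def enumerate_models(scenario):
    """Return every role assignment permitted by the scenario rules."""
    roles = allowed_roles(scenario)
    names = list(roles)
    return [dict(zip(names, combo))
            for combo in product(*(roles[name] for name in names))]


def split_sample(n_obs, prediction_percent):
    """Return (estimation size, prediction size) for an integer percentage.

    Assumption: the first part of the period is used for estimation and
    the remaining part for prediction.
    """
    n_pred = n_obs * prediction_percent // 100
    return n_obs - n_pred, n_pred


if __name__ == "__main__":
    for scenario in ("I", "II"):
        print(scenario, len(enumerate_models(scenario)))
    for pct in (5, 10, 15, 20):
        print(pct, split_sample(200, pct))
```

Under these assumptions the rules give 64 combinations in Scenario I and 96 in Scenario II, which illustrates how admitting one ambiguous determinant to the nonlinear component enlarges the model space.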
Table 4 presents the mean square prediction error (MSPE) of five PLM methods. All
values are normalized by dividing by the MSPE of the linear model averaging method. We
see that our MAPLM produces the lowest MSPE for all prediction samples in Scenario I.
In Scenario II, MAPLM is the best in most cases, except when the prediction sample is
15%. In all cases, MAPLM outperforms the linear Mallows averaging, demonstrating that
incorporating the necessary nonlinearity improves the prediction performance. Since the
performance of linear model averaging is invariant to the scenario, we can also compare the
predictability of MAPLM in the two scenarios. Interestingly, we observe that allowing the
US treasury yield to enter the nonlinear component improves the prediction performance for
all methods when the prediction sample is larger than 5%. However, when we have a small
prediction sample, a smaller model space is better. One possible explanation is that averaging
over a larger model space may offset the additional noise by better diversification. When the
prediction sample is large, the diversification gain from averaging over a larger model space is
substantial and dominates the estimation inaccuracy due to the dimensionality curse. This is,
however, not the case when the prediction sample is small (or, equivalently, when the training
sample is large) because the predicted values obtained from different candidate models are
more accurate and more similar to each other; thus, the diversification gain is smaller.
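
The normalization behind Table 4 can be sketched as follows: compute the MSPE of each method on the prediction sample and divide it by the MSPE of the benchmark (linear model averaging), so that values below 1 indicate an improvement over the benchmark. The function names and the toy numbers are hypothetical and serve only to illustrate the arithmetic; they are not the authors' code or data.

```python
def mspe(actual, predicted):
    """Mean square prediction error over the prediction sample."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def normalized_mspe(actual, predictions_by_method, benchmark):
    """Divide each method's MSPE by the MSPE of the benchmark method."""
    base = mspe(actual, predictions_by_method[benchmark])
    return {method: mspe(actual, preds) / base
            for method, preds in predictions_by_method.items()
            if method != benchmark}


if __name__ == "__main__":
    # Toy numbers, made up purely for illustration.
    actual = [1.0, 2.0, 3.0, 4.0]
    predictions = {
        "linear_MA": [1.5, 2.5, 2.5, 3.5],    # benchmark, errors of 0.5
        "MAPLM": [1.25, 2.25, 2.75, 3.75],    # errors of 0.25
    }
    print(normalized_mspe(actual, predictions, benchmark="linear_MA"))
```

In this toy case the benchmark MSPE is 0.25 and the second method's MSPE is 0.0625, so the normalized value is 0.25.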
5. Concluding remarks
Partially linear models have become popular in applied econometrics and statistics
because they allow a more flexible specification compared to linear models and provide more
interpretable estimates compared to fully nonparametric models. Estimation of partially
linear models is subject to at least two types of uncertainty: the uncertainty of which variables
to include in the model and the uncertainty of whether a covariate should be in the linear or
nonlinear component given that it is in the model. Typical model testing and selection
methods do not appropriately address these two types of uncertainty simultaneously, especially
when the research interest is to estimate the parameters or to make predictions. In this paper,
we propose an optimal model averaging procedure for PLMs that jointly incorporates the
two types of model uncertainty. The extension from linear model averaging to partially
linear models is by no means straightforward or routine because it involves kernel smoothing,
which complicates the proof of optimality. We demonstrate the advantages of our methods by
examining the determinants of Japan’s sovereign CDS spreads. Our empirical study suggests
that there exists a large degree of nonlinearity in the effects of macroeconomic determinants,
such as the global stock return and exchange rate. Conventional linear models do not capture
such nonlinearity, and ignoring the nonlinearity can result in a lack of awareness of financial
contagion, which may further lead to inappropriate policies and investment decisions.
At least three issues deserve future research. First, the computational burden of our
method would be substantial when the number of candidate models is large; therefore, a
model screening step prior to model averaging is desirable. Second, although the dimension
p_s is allowed to increase with the sample size n, it must be smaller than n and its increasing
rate is restricted by the second part of Condition 6. How to develop an optimal model
averaging method for high- or ultrahigh-dimensional PLMs is an interesting open question. Finally,
if the research interest is to consistently estimate the linear and/or nonlinear component rather
than to make predictions, a consistent model averaging estimator and post-model-averaging
inference are desired. See, for example, Hjort & Claeskens (2003), Zhang & Liang (2011)
and Xu et al. (2014). In these studies, a crucial assumption of local misspecification is
required, and the weights also need to have an explicit form. By contrast, we do not utilize
the local misspecification framework, and our weight estimates do not have an explicit form.
Therefore, the development of model averaging estimators for the linear and nonlinear
components without local misspecification and analytical weights warrants further investigation.
Acknowledgments: The authors are grateful to Co-Editor Zhiliang Ying, the Associate Editor
and two referees for their constructive comments, and to Dr. Na Li for providing code for
the nonlinearity test. Zhang’s research was supported by the National Natural Science Foundation of
China (Grant nos. 71522004 and 11471324) and a grant from the Ministry of Education of
China (Grant no. 17YJC910011).
Online Supplement: OnlineSupp.pdf describes the technical proofs and provides more
explanations of the conditions, as well as additional simulation studies.
References
ANDO, T. & LI, K.-C. (2014). A model-averaging approach for high-dimensional regression. Journal of the American Statistical Association 109, 254–265.
ANDREWS, D. (1991). Asymptotic optimality of generalized C_L, cross-validation, and generalized cross-validation in regression with heteroskedastic errors. Journal of Econometrics 47, 359–377.
BAE, K.-H., KAROLYI, G. A. & STULZ, R. M. (1996). A new approach to measuring financial contagion. The Review of Financial Studies 16, 717–763.
BARRO, R. J. (1996). Democracy and growth. Journal of Economic Growth 1, 1–27.
BUCKLAND, S. T., BURNHAM, K. P. & AUGUSTIN, N. H. (1997). Model selection: An integral part of inference. Biometrics 53, 603–618.
BUNEA, F. (2004). Consistent covariate selection and post model selection inference in semiparametric regression. The Annals of Statistics 32, 898–927.
DANILOV, D. & MAGNUS, J. R. (2004). On the harm that ignoring pretesting can cause. Journal of Econometrics 122, 27–46.
DETTE, H. & MUNK, A. (1998). Validation of linear regression models. The Annals of Statistics 26, 778–800.
DIECKMANN, S. & PLANK, T. (2012). Default risk of advanced economies: An empirical analysis of credit default swaps during the financial crisis. Review of Finance 16, 903–934.
EICHENGREEN, B., ROSE, A. & WYPLOSZ, C. (1996). Contagious currency crises. Scandinavian Journal of Economics 98, 463–484.
ENGLE, R. F., GRANGER, C. W., RICE, J. & WEISS, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association 81, 310–320.
HAMILTON, S. A. & TRUONG, Y. K. (1997). Local linear estimation in partly linear models. Journal of Multivariate Analysis 60, 1–19.
HANSEN, B. E. (2007). Least squares model averaging. Econometrica 75, 1175–1189.
HANSEN, B. E. (2014). Model averaging, asymptotic risk, and regressor groups. Quantitative Economics 5, 495–530.
HANSEN, B. E. & RACINE, J. (2012). Jackknife model averaging. Journal of Econometrics 167, 38–46.
HARDLE, W., LIANG, H. & GAO, J. (2000). Partially linear models. Springer.
HARDY, G. H., LITTLEWOOD, J. E. & POLYA, G. (1952). Inequalities. Cambridge University Press.
HECKMAN, N. E. (1986). Spline smoothing in a partly linear model. Journal of the Royal Statistical Society. Series B (Methodological) 48, 244–248.
HENDERSON, D. J., PAPAGEORGIOU, C. & PARMETER, C. F. (2012). Growth empirics without parameters. The Economic Journal 122, 125–154.
HJORT, N. L. & CLAESKENS, G. (2003). Frequentist model average estimators. Journal of the American Statistical Association 98, 879–899.
HOETING, J. A., MADIGAN, D., RAFTERY, A. E. & VOLINSKY, C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science 14, 382–417.
LI, N., XU, X. & JIN, P. (2010). Testing the linearity in partially linear models. Journal of Nonparametric Statistics 23, 99–114.
LIANG, H., WANG, S. & CARROLL, R. J. (2007). Partially linear models with missing response variables and error-prone covariates. Biometrika 94, 185–198.
LIANG, H., ZOU, G., WAN, A. T. K. & ZHANG, X. (2011). Optimal weight choice for frequentist model average estimators. Journal of the American Statistical Association 106, 1053–1066.
LIU, Q. & OKUI, R. (2013). Heteroskedasticity-robust C_p model averaging. The Econometrics Journal 16, 463–472.
LONGFORD, N. T. (2005). Editorial: Model selection and efficiency—is ‘which model ...?’ the right question? Journal of the Royal Statistical Society. Series A (Statistics in Society) 168, 469–472.
LONGSTAFF, F. A., PAN, J., PEDERSEN, L. H. & SINGLETON, K. J. (2011). How sovereign is sovereign credit risk? American Economic Journal: Macroeconomics 3, 75–103.
LU, X. & SU, L. (2015). Jackknife model averaging for quantile regressions. Journal of Econometrics 188, 40–58.
MAGNUS, J. R., WANG, W. & ZHANG, X. (2016). Weighted average least square prediction. Econometric Reviews 35, 1040–1074.
NI, X., ZHANG, H. H. & ZHANG, D. (2009). Automatic model selection for partially linear models. Journal of the American Statistical Association 100, 2100–2111.
QIAN, Z., WANG, W. & JI, K. (2017). Sovereign credit risk, macroeconomic dynamics, and financial contagion: Evidence from Japan. Macroeconomic Dynamics, forthcoming.
ROBINSON, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica 56, 931–954.
RUPPERT, D., WAND, M. P. & CARROLL, R. J. (2003). Semiparametric Regression. Cambridge, New York: Cambridge University Press.
SPECKMAN, P. (1988). Kernel smoothing in partial linear models. Journal of the Royal Statistical Society. Series B (Methodological) 50, 413–436.
SU, L. & LU, X. (2013). Nonparametric dynamic panel data models: Kernel estimation and specification testing. Journal of Econometrics 176, 112–133.
WAN, A. T. K., ZHANG, X. & ZOU, G. (2010). Least squares model averaging by Mallows criterion. Journal of Econometrics 156, 277–283.
WHITTLE, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theory of Probability & Its Applications 5, 302–305.
XIE, H. & HUANG, J. (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics 37, 673–696.
XU, G., WANG, S. & HUANG, J. (2014). Focused information criterion and model averaging based on weighted composite quantile regression. Scandinavian Journal of Statistics 41, 365–381.
XU, X. & LI, G. (2006). Fiducial inference in the pivotal family of distributions. Science in China: Series A 49, 410–432.
YUAN, Z. & YANG, Y. (2005). Combining linear regression models: When and how? Journal of the American Statistical Association 100, 1202–1214.
ZHANG, H. H., CHENG, G. & LIU, Y. (2011). Linear or nonlinear? Automatic structure discovery for partially linear models. Journal of the American Statistical Association 106, 1099–1112.
ZHANG, X. & LIANG, H. (2011). Focused information criterion and model averaging for generalized additive partial linear models. The Annals of Statistics 39, 174–200.
ZHANG, X., WAN, A. T. K. & ZHOU, S. Z. (2012). Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. Journal of Business & Economic Statistics 30, 132–142.
ZHANG, X., ZOU, G. & CARROLL, R. (2015). Model averaging based on Kullback-Leibler distance. Statistica Sinica 25, 1583–1598.
ZHANG, X., ZOU, G. & LIANG, H. (2014). Model averaging and weight choice in linear mixed-effects models. Biometrika 101, 205–218.
ZHAO, T., CHENG, G. & LIU, H. (2016). A partially linear framework for massive heterogeneous data. The Annals of Statistics 44, 1400–1437.
Xinyu Zhang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
E-mail: [email protected]
Wendun Wang, Econometric Institute, Erasmus University Rotterdam, and Tinbergen Institute
E-mail: [email protected]