Goodness–of–fit tests for regression models: the functional data case
Wenceslao González Manteiga
Joint work with Juan Cuesta–Albertos, Eduardo García–Portugués and Manuel Febrero–Bande
Department of Statistics and O.R., University of Santiago de Compostela, Spain
ICMAT, Madrid, 29/01/2015
Goodness–of–fit tests for regression models
Our research group: “Optimization Models, Decision, Statistics and Applications” (MODESTYA)
The group is formed by 25 people distributed as follows: 15 professors, 5 PhD students and 5 contracted staff working on transfer projects; there are also 12 collaborators from other universities.
ICMAT 2015 – W. González-Manteiga et al. Goodness–of–fit tests for regression models 2/46
Different research fields:
- Statistical consulting and data analysis, interpretation of statistical results
- Time series forecasting
- Spatial statistics and mapping
- Statistical applications in industry
- Financial modeling
- Environmental statistics
- Bio-statistics
- Tourism statistics
- Design of experiments
- Application of mathematical optimization techniques to different aspects of industrial processes: production, logistics, ...
- Statistical software support (R, SPSS, Gurobi, ...)
- Training courses, ...
In recent years our group has made major efforts to:
1. Create the Statistical Consulting Service.
2. Participate, nationally and internationally, in important technology transfer projects with other groups and networks.
3. Carry out important methodological development.
Statistical Consulting Service (SCS)
SCS is a scientific and technical service of the University of Santiago de Compostela, established in 2006, whose main objectives are to coordinate and promote activities related to Statistics and Operations Research, and to provide support to researchers and practitioners in these fields.
Participation in research networks: regional, national and international.
Transfer of technology: Itmati(1) (regional), Math-in(2) (national), EU-Maths-IN(3) (international).

(1) Technical Institute for Industrial Mathematics
(2) Spanish Network for Mathematics & Industry
(3) Stichting European Service Network of Mathematics for Industry and Innovation
(4) Technologies and data analysis language
(5) Developing crucial Statistical methods for Understanding major complex Dynamic Systems in natural, biomedical and social sciences
Examples of collaboration: Public institutions
Examples of collaboration: Private companies
1. Introduction
2. GOF for regression models in Functional Data
3. The test
4. Simulation study
5. Real data application
Introduction
Goodness of fit
The term goodness-of-fit (GOF) was introduced by Pearson at the beginning of the 20th century; it refers to tests that check, in an omnibus way, how well a distribution fits a data set.
The basic idea consists in comparing a nonparametric pilot estimator of the unknown distribution function F or density f with a consistent parametric estimator under the null hypothesis.
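A minimal sketch of this comparison for a univariate sample, assuming a normal null family (an illustrative choice, not one made in the talk): the ecdf plays the role of the nonparametric pilot estimator, and the normal cdf with estimated mean and standard deviation plays the role of the parametric one. The function name `ks_estimated_normal` is hypothetical.

```python
import math
import numpy as np

def ks_estimated_normal(sample):
    """sup_x |F_n(x) - F_theta_hat(x)|: ecdf versus the normal cdf whose mean
    and standard deviation are estimated from the same sample (H0: normality)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    mu, sigma = x.mean(), x.std(ddof=1)                      # parametric fit under H0
    z = (x - mu) / (sigma * math.sqrt(2.0))
    f0 = 0.5 * (1.0 + np.array([math.erf(v) for v in z]))    # fitted cdf at the data
    above = np.arange(1, n + 1) / n - f0    # ecdf right after each jump minus F0
    below = f0 - np.arange(0, n) / n        # F0 minus ecdf right before each jump
    return float(max(above.max(), below.max()))
```

Because the parameters are estimated from the same data, the null distribution of this distance is not the classical Kolmogorov–Smirnov one, so in practice it would be calibrated, for example, by a parametric bootstrap.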
Two basic references
Durbin (1973) and Bickel and Rosenblatt (1973) laid the foundations of the mathematical development of GOF tests based on the estimation of the cdf and of the density function, respectively.
Extension
In the nineties these ideas were extended to the general regression setting (cf. Härdle and Mammen (1993), GM and Cao (1993)).
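Given a regression model (fixed or random design):
$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \dots, n,$$
where $E(Y_i \mid X_i) = m(X_i)$, the goal is to test
$$H_0: m \in \mathcal{M} = \{m_\theta\}_{\theta \in \Theta \subset \mathbb{R}^q} \quad \text{vs.} \quad H_a: m \notin \mathcal{M},$$
with $m(x) = E(Y \mid X = x)$ the regression function of $Y$ on $X$, $\sigma^2(x) = \mathrm{Var}(Y \mid X = x)$ and $f$ the density of the explanatory variable (if it exists).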
The integrated regression function
In a similar way to the tests for the distribution function $F(x) = \int_{-\infty}^{x} f(t)\,dt$, we may consider the integrated regression function:
$$I(x) = \int_{-\infty}^{x} m(t)\,dF(t) = E\big(Y \cdot \mathbb{1}(X \le x)\big).$$
The integrated regression function can be estimated nonparametrically by
$$I_n(x) = \frac{1}{n}\sum_{i=1}^{n} Y_i \cdot \mathbb{1}(X_i \le x),$$
with associated empirical process
$$R_n(x) = \sqrt{n}\,\big(I_n(x) - E_\theta(I_n(x))\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \mathbb{1}(X_i \le x)\,\varepsilon_i.$$
This empirical process is the basis of a broad class of test statistics (see Stute (1997) and more recent references).
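The estimator $I_n$ and the process $R_n$ can be sketched for a scalar covariate, assuming (for illustration only) the linear null family $m_\theta(x) = \theta_0 + \theta_1 x$ fitted by least squares, so that each $\varepsilon_i$ is replaced by a fitted residual. The Cramér–von Mises functional averages $R_n^2$ over the observed $X_i$, i.e. integrates it against the ecdf of $X$. Both function names are hypothetical.

```python
import numpy as np

def integrated_regression_process(x, y, x_grid):
    """I_n and R_n on x_grid, with eps_i replaced by residuals from a
    least-squares fit of the (assumed) null family m(x) = theta0 + theta1 x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    design = np.column_stack([np.ones(n), x])
    theta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)      # fit under H0
    resid = y - design @ theta_hat
    grid = np.asarray(x_grid, dtype=float)
    ind = (x[None, :] <= grid[:, None]).astype(float)           # 1(X_i <= x)
    i_n = ind @ y / n                  # I_n(x) = n^{-1} sum_i Y_i 1(X_i <= x)
    r_n = ind @ resid / np.sqrt(n)     # R_n(x) = n^{-1/2} sum_i 1(X_i <= x) eps_i
    return i_n, r_n

def stute_cvm(x, y):
    """Cramér–von Mises norm: mean of R_n^2 over the observed X_i (ecdf of X)."""
    _, r_n = integrated_regression_process(x, y, x)
    return float(np.mean(r_n ** 2))
```

Since $\theta$ is estimated, the null distribution of the statistic depends on the fitted model; it is usually approximated by a (wild) bootstrap rather than tabulated.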
More generally, in the last fifteen years semiparametric and nonparametric null hypotheses have also been considered:
- Partial linear models:
$$H_0: Y_i = X_i^t\theta + m(Z_i) + \varepsilon_i, \quad i = 1, \dots, n$$
- Generalized partial linear models:
$$H_0: E(Y_i \mid X_i, Z_i) = G(X_i^t\theta + m(Z_i)),$$
with $G$ a known link function.
- Significance test:
$$H_0: E(Y_i \mid X_i, Z_i) = E(Y_i \mid X_i)$$
- Testing additivity:
$$H_0: E(Y_i \mid X_{i1}, \dots, X_{ip}) = \sum_{j=1}^{p} m_j(X_{ij})$$
Even more generally, “complex models” have also been considered recently:
- Goodness–of–Fit Tests for Interest Rate Models
- Goodness–of–Fit Tests for Directional Data
- Goodness–of–Fit Tests for Functional Data
(see GM and Crujeiras (2013) for an extensive review of the topic)
Figure: SP500 index. Each day is a functional datum.
Functional data
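Objective
Propose a goodness–of–fit test for the null hypothesis of the functional linear model,
$$Y = \langle \mathcal{X}, \beta \rangle + \varepsilon = \int \mathcal{X}(t)\beta(t)\,dt + \varepsilon,$$
with $\varepsilon$ a centred r.v. independent of $\mathcal{X}$. This is equivalent to testing $H_0: m(\cdot) \in \{\langle \cdot, \beta \rangle : \beta \in \mathcal{H}\}$, where $\mathcal{H} = L^2[0, T]$ is the Hilbert space of square-integrable functions.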
- Let $\mathcal{X}(t)$ be a functional r.v. taking values in $\mathcal{H} = L^2[0, T]$.
- Let $\{\Psi_j\}_{j=1}^{\infty}$ be a basis of $\mathcal{H}$. Then each observation $\mathcal{X}_i$, $i = 1, \dots, n$, of $\mathcal{X}$ can be expressed as
$$\mathcal{X}_i = \sum_{j=1}^{\infty} x_{ij}\Psi_j.$$
- Let $\{\Psi_j\}_{j=1}^{p}$ be the p-truncated basis formed by the first $p$ elements of $\{\Psi_j\}_{j=1}^{\infty}$. The representation of $\mathcal{X}_i$ in this truncated basis is denoted by
$$\mathcal{X}_i^{(p)} = \sum_{j=1}^{p} x_{ij}\Psi_j.$$
- The choice of basis and the number of elements of the truncated basis are crucial to capture correctly the information carried by the functional process.
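To make the truncation concrete, the following sketch computes $x_{ij} = \langle \mathcal{X}_i, \Psi_j \rangle$ and $\mathcal{X}_i^{(p)}$ for curves observed on a uniform grid. It assumes an orthonormal Fourier basis of $L^2[0, T]$ (B-splines or other bases would be handled similarly) and Riemann-sum inner products; the function names are hypothetical.

```python
import numpy as np

def fourier_basis(t, p, T=1.0):
    """First p elements of the orthonormal Fourier basis of L2[0, T],
    evaluated on the grid t; returns an array of shape (p, len(t))."""
    t = np.asarray(t, dtype=float)
    basis = [np.full_like(t, 1.0 / np.sqrt(T))]
    k = 1
    while len(basis) < p:
        w = 2.0 * np.pi * k * t / T
        basis.append(np.sqrt(2.0 / T) * np.sin(w))
        if len(basis) < p:
            basis.append(np.sqrt(2.0 / T) * np.cos(w))
        k += 1
    return np.vstack(basis)

def truncated_coefficients(curves, t, p, T=1.0):
    """x_ij = <X_i, Psi_j> for j <= p (Riemann sum on a uniform grid) and the
    p-truncated curves X_i^(p) = sum_j x_ij Psi_j evaluated on the same grid."""
    psi = fourier_basis(t, p, T)
    dt = t[1] - t[0]
    coef = curves @ psi.T * dt
    return coef, coef @ psi
```

For instance, the curve $1 + \sin(2\pi t)$ on $[0, 1]$ has coefficients $(1, 1/\sqrt{2}, 0, 0, 0)$ for $p = 5$ and is recovered exactly by its truncated representation; a rougher curve would need a larger $p$.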
Functional linear model
- The functional linear model states that
$$Y = \langle \mathcal{X}, \beta \rangle + \varepsilon = \int \mathcal{X}(t)\beta(t)\,dt + \varepsilon,$$
with $\varepsilon$ a centred r.v. independent of $\mathcal{X}$.
- The functional parameter $\beta$ is estimated by minimising the RSS:
$$\hat\beta = \arg\min_{\beta \in \mathcal{H}} \sum_{i=1}^{n} \big(Y_i - \langle \mathcal{X}_i, \beta \rangle\big)^2.$$
- Different methods have been proposed to search for the $\beta$ that minimises the RSS (see Ferraty and Romain (2011)):
  1. Using a basis representation with B-splines or Fourier functions.
  2. Using Principal Components (PC).
  3. Using Partial Least Squares (PLS).
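The PC approach might be sketched as follows for curves on a uniform grid: the discretised covariance operator is diagonalised through an SVD of the centred data, the centred response is regressed on the first $p$ PC scores, and $\hat\beta$ is rebuilt from the eigenfunctions. This is an illustrative implementation under those discretisation assumptions, not the fda.usc code; the function name is hypothetical.

```python
import numpy as np

def flm_pc_estimate(curves, y, t, p):
    """Estimate beta in Y = <X, beta> + eps by regressing the centred response
    on the first p functional principal component scores."""
    dt = t[1] - t[0]
    xc = curves - curves.mean(axis=0)
    yc = y - y.mean()
    # Right singular vectors of the weighted data = eigenvectors of the covariance
    _, _, vt = np.linalg.svd(xc * np.sqrt(dt), full_matrices=False)
    phi = vt[:p] / np.sqrt(dt)                 # eigenfunctions with unit L2 norm
    scores = xc @ phi.T * dt                   # <X_i - Xbar, phi_k>
    coef, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    beta_hat = coef @ phi                      # beta_hat evaluated on the grid
    fitted = y.mean() + scores @ coef
    return beta_hat, fitted
```

Truncating at $p$ components is what regularises the otherwise ill-posed minimisation over $\mathcal{H}$; the sign ambiguity of each eigenfunction is harmless because the regression coefficient changes sign with it.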
The test
- Let $(\mathcal{X}, Y)$ be r.v.'s in $\mathcal{H} \times \mathbb{R}$ and $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ a sample.
- We want to test the composite null hypothesis
$$H_0: m \in \{\langle \cdot, \beta \rangle : \beta \in \mathcal{H}\}$$
versus the general alternative
$$H_1: \mathbb{P}\{m \notin \{\langle \cdot, \beta \rangle : \beta \in \mathcal{H}\}\} > 0.$$
- The simple hypothesis, i.e. checking a specific functional linear model, is also of interest:
$$H_0: m(\mathcal{X}) = \langle \mathcal{X}, \beta_0 \rangle, \quad \text{for a fixed } \beta_0 \in \mathcal{H},$$
and it includes the important case of no interaction between the functional covariate and the scalar response ($\beta_0 = 0$).
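Lemma
Let $\beta \in \mathcal{H}$. The following statements are equivalent:
I. $m(\mathcal{X}) = \langle \mathcal{X}, \beta \rangle$, $\forall \mathcal{X} \in \mathcal{H}$.
II. $E[Y - \langle \mathcal{X}, \beta \rangle \mid \mathcal{X} = x] = 0$, for a.e. $x \in \mathcal{H}$.
III. $E[Y - \langle \mathcal{X}, \beta \rangle \mid \langle \mathcal{X}, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathcal{H}}$.
III'. $E[Y - \langle \mathcal{X}, \beta \rangle \mid \langle \mathcal{X}, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathcal{H}}^{p}$, $\forall p \ge 1$.
IV. $E\big[(Y - \langle \mathcal{X}, \beta \rangle)\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le u\}}\big] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathcal{H}}$.
IV'. $E\big[(Y - \langle \mathcal{X}, \beta \rangle)\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le u\}}\big] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathcal{H}}^{p}$, $\forall p \ge 1$.
Lemma based on the results of Patilea et al. (2012).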
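- A possible way to measure the deviation of the data from $H_0$ (following the ideas of Stute (1997) in the scalar case) is through the projected empirical process
$$R_n(u, \gamma) = n^{-1/2} \sum_{i=1}^{n} \big(Y_i - \langle \mathcal{X}_i, \hat\beta \rangle\big)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}}.$$
- To measure the distance of the empirical process from zero, we use the Cramér–von Mises and Kolmogorov–Smirnov norms, adapted to the projected space $\Pi = \mathbb{R} \times \mathbb{S}_{\mathcal{H}}$:
$$\mathrm{PCvM}_n = \int_{\Pi} R_n(u, \gamma)^2 \, F_{n,\gamma}(du)\,\omega(d\gamma), \qquad \mathrm{PKS}_n = \sup_{(u, \gamma) \in \Pi} |R_n(u, \gamma)|,$$
where $F_{n,\gamma}$ is the ecdf of $\{\langle \mathcal{X}_i, \gamma \rangle\}_{i=1}^{n}$ and $\omega$ represents a functional measure on $\mathbb{S}_{\mathcal{H}}$.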
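Neither norm can be evaluated exactly over all of $\Pi$ in general, and PKS in particular has no closed form. A rough Monte Carlo sketch in a p-truncated orthonormal basis draws directions uniformly on the unit sphere of $\mathbb{R}^p$ (an assumed choice of $\omega$, normalised to a probability measure, so this PCvM differs from a surface-measure version by a constant factor) and evaluates $R_n$ at the observed projections, where it jumps. The function name is hypothetical.

```python
import numpy as np

def projected_norms_mc(x, resid, n_dir=1000, seed=None):
    """Monte Carlo approximation of PCvM_n and PKS_n from p-truncated
    coefficients x (n, p) in an orthonormal basis and residuals resid (n,).
    omega is taken as the uniform probability measure on the unit sphere."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    g = rng.normal(size=(n_dir, p))
    g /= np.linalg.norm(g, axis=1, keepdims=True)     # directions uniform on the sphere
    proj = x @ g.T                                     # (n, n_dir): <X_i, gamma>
    order = np.argsort(proj, axis=0, kind="stable")
    r = np.cumsum(resid[order], axis=0) / np.sqrt(n)   # R_n(u, gamma) at sorted projections
    pcvm = np.mean(r ** 2)     # ecdf average in u, then average over directions
    pks = np.max(np.abs(r))    # maximum over the sampled directions only
    return float(pcvm), float(pks)
```

With finitely many directions the supremum over $\gamma$ is only approximated from below, which is one motivation for the closed-form CvM expression.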
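- A p-truncated version of this process is
$$R_{n,p}\big(u, \gamma^{(p)}\big) = n^{-1/2} \sum_{i=1}^{n} \big(Y_i - \mathbf{x}_{i,p}^T \boldsymbol{\Psi} \hat{\mathbf{b}}_p\big)\mathbb{1}_{\{\mathbf{x}_{i,p}^T \boldsymbol{\Psi} \mathbf{g}_p \le u\}} = R_{n,p}(u, \mathbf{g}_p),$$
where $\hat{\mathbf{b}}_p$ are the coefficients of $\hat\beta$ in the p-truncated basis $\{\Psi_j\}_{j=1}^{p}$, $\boldsymbol{\Psi} = (\langle \Psi_i, \Psi_j \rangle)_{ij}$ ($\mathbf{I}_p$ if the basis is orthonormal) and $\mathbf{g}_p$ are the coefficients of $\gamma$ in the truncated basis.
- A simplified version of the statistic $\mathrm{PCvM}_n$, taking $\omega$ as the uniform distribution on the sphere, is
$$\mathrm{PCvM}_{n,p} = \int_{\mathbb{S}^{p} \times \mathbb{R}} |\mathbf{R}|^{-1} R_{n,p}\big(u, \mathbf{R}^{-1}\mathbf{g}_p\big)^2 \, F_{n,\mathbf{R}^{-1}\mathbf{g}_p}(du)\,d\mathbf{g}_p,$$
where $\mathbf{R}$ is the $p \times p$ matrix such that $\boldsymbol{\Psi} = \mathbf{R}^T\mathbf{R}$.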
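By calculations analogous to those of Escanciano (2006),
$$\mathrm{PCvM}_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \varepsilon_i\varepsilon_j A_{ijr},$$
where
$$A_{ijr} = A_{ijr}^{(0)} \frac{\pi^{p/2-1}}{\Gamma\left(\frac{p}{2}+1\right)} |\mathbf{R}|^{-1},$$
$$A_{ijr}^{(0)} = \begin{cases} 2\pi, & \mathbf{x}'_{i,p} = \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p},\\ \pi, & \mathbf{x}'_{i,p} = \mathbf{x}'_{j,p},\ \mathbf{x}'_{i,p} = \mathbf{x}'_{r,p} \text{ or } \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p},\\ \left|\pi - \arccos\left(\dfrac{(\mathbf{x}'_{i,p}-\mathbf{x}'_{r,p})^T(\mathbf{x}'_{j,p}-\mathbf{x}'_{r,p})}{\|\mathbf{x}'_{i,p}-\mathbf{x}'_{r,p}\| \cdot \|\mathbf{x}'_{j,p}-\mathbf{x}'_{r,p}\|}\right)\right|, & \text{else}.\end{cases}$$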
E. García-Portugués, W. González-Manteiga and M. Febrero-Bande (2014).A Goodness–of–Fit test for the functional linear model with scalar response.Journal of Computational and Graphical Statistics, Vol. 23 Issue 3, 761–778.
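The closed form above can be evaluated directly. The sketch below assumes an orthonormal basis, so that $\boldsymbol{\Psi} = \mathbf{R} = \mathbf{I}_p$, $|\mathbf{R}| = 1$ and $\mathbf{x}'_{i,p}$ reduces to the coefficient vector $\mathbf{x}_{i,p}$; it stores the full $n \times n \times n$ array, which is only practical for moderate $n$. The function names are hypothetical.

```python
import numpy as np
from math import gamma, pi

def pcvm_kernel(x):
    """A_ijr for coefficient vectors x of shape (n, p), assuming an
    orthonormal basis (R = I_p, |R| = 1, so x' = x)."""
    n, p = x.shape
    a0 = np.empty((n, n, n))
    for r in range(n):
        d = x - x[r]                               # x'_i - x'_r for every i
        norm = np.linalg.norm(d, axis=1)
        same = norm == 0.0                         # x'_i == x'_r
        safe = np.where(same, 1.0, norm)
        cos = (d @ d.T) / np.outer(safe, safe)
        # general case; it also gives pi when x'_i == x'_j != x'_r
        a = np.abs(pi - np.arccos(np.clip(cos, -1.0, 1.0)))
        a[same, :] = pi                            # x'_i == x'_r
        a[:, same] = pi                            # x'_j == x'_r
        a[np.ix_(same, same)] = 2.0 * pi           # all three coincide
        a0[:, :, r] = a
    return a0 * pi ** (p / 2 - 1) / gamma(p / 2 + 1)

def pcvm_statistic(eps, a):
    """PCvM_{n,p} = n^{-2} sum_{i,j,r} eps_i eps_j A_ijr."""
    n = eps.size
    return float(np.einsum("i,j,ijr->", eps, eps, a) / n ** 2)
```

For $p = 2$ the normalising constant equals 1 and $A_{ijr}$ is the arc length of the set of unit directions $\gamma$ with $\gamma^T(\mathbf{x}_i - \mathbf{x}_r) \le 0$ and $\gamma^T(\mathbf{x}_j - \mathbf{x}_r) \le 0$, which allows an easy Monte Carlo sanity check.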
Simple hypothesis
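Calibration of the test procedure for the simple hypothesis
Let $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ be an iid sample:
1. Express $\mathcal{X}_i(\cdot)$ (and optionally $\beta_0(\cdot)$) in a p-truncated basis.
2. Construct $\varepsilon_i = Y_i - \langle \mathcal{X}_i, \beta_0 \rangle$.
3. Compute $\mathrm{PCvM}_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \varepsilon_i\varepsilon_j A_{ijr}$.
4. Bootstrap procedure (wild bootstrap), repeated for $b = 1, \dots, B$:
   - Construct $\varepsilon_i^* = V_i\varepsilon_i$, with $E[V_i] = 0$, $E[V_i^2] = 1$.
   - Compute $\mathrm{PCvM}_{n,p}^* = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \varepsilon_i^*\varepsilon_j^* A_{ijr}$.
5. p-value $\approx \#\{\mathrm{PCvM}_{n,p} \le \mathrm{PCvM}_{n,p}^*\}/B$.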
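Steps 3 to 5 take a few lines once the $A_{ijr}$ are available. This sketch uses Rademacher weights ($V_i = \pm 1$ with probability 1/2), one common choice satisfying $E[V_i] = 0$ and $E[V_i^2] = 1$ (the procedure above does not fix the distribution), and the identity that the triple sum equals $\boldsymbol\varepsilon^T \mathbf{M} \boldsymbol\varepsilon / n^2$ with $M_{ij} = \sum_r A_{ijr}$. The function name is hypothetical.

```python
import numpy as np

def wild_bootstrap_pvalue(eps, a, n_boot=500, seed=None):
    """Wild-bootstrap p-value for the simple hypothesis.
    eps: residuals Y_i - <X_i, beta_0>; a: (n, n, n) array of A_ijr."""
    rng = np.random.default_rng(seed)
    n = eps.size
    m = a.sum(axis=2)                          # M_ij = sum_r A_ijr
    stat = eps @ m @ eps / n ** 2              # PCvM_{n,p}
    v = rng.choice([-1.0, 1.0], size=(n_boot, n))   # Rademacher weights V_i
    eps_star = v * eps                         # eps*_i = V_i eps_i, one row per replicate
    stat_star = np.einsum("bi,ij,bj->b", eps_star, m, eps_star) / n ** 2
    return float(stat), float(np.mean(stat <= stat_star))
```

Precomputing $\mathbf{M}$ means each bootstrap replicate costs $O(n^2)$ instead of $O(n^3)$.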
Other competing methods for the simple hypothesis
- Delsol et al. (2011) propose a test statistic for $H_0: m(\mathcal{X}) = m_0(\mathcal{X})$ using the ideas of Härdle and Mammen (1993):
$$T_n = \frac{1}{n}\sum_{j=1}^{n}\left(\sum_{i=1}^{n} \varepsilon_i K\left(\frac{d(\mathcal{X}_j, \mathcal{X}_i)}{h}\right)\right)^2 \omega(\mathcal{X}_j),$$
where $K$ is a kernel function, $d$ is a semimetric ($L^2$), $h$ is the bandwidth (0.25, 0.50, 0.75 and 1.00) and $\omega$ is a weight function (uniform).
- González-Manteiga et al. (2012) extend the ideas of the classical F-test to the functional framework, resulting in a statistic to test the null hypothesis of no interaction in the functional linear model:
$$D_n = \left\| \frac{1}{n}\sum_{i=1}^{n} \big(\mathcal{X}_i - \bar{\mathcal{X}}\big)\big(Y_i - \bar{Y}\big) \right\|_{\mathcal{H}}.$$
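Both competing statistics are straightforward to evaluate for curves on a uniform grid. In this sketch the semimetric is the $L^2$ distance approximated by a Riemann sum and $\omega \equiv 1$; the Gaussian choice of $K$ is an assumption not stated above, and `delsol_tn` and `fanova_dn` are hypothetical names.

```python
import numpy as np

def l2_norm(f, dt):
    """L2[0, T] norm of curves sampled on a uniform grid (Riemann sum)."""
    return np.sqrt(np.sum(f ** 2, axis=-1) * dt)

def delsol_tn(curves, eps, dt, h):
    """T_n with d = L2 distance, a Gaussian kernel K (assumed) and omega = 1."""
    d = l2_norm(curves[:, None, :] - curves[None, :, :], dt)   # d(X_j, X_i)
    k = np.exp(-0.5 * (d / h) ** 2)                              # K(d / h)
    return float(np.mean((k @ eps) ** 2))   # n^{-1} sum_j (sum_i eps_i K_ji)^2

def fanova_dn(curves, y, dt):
    """D_n = || n^{-1} sum_i (X_i - Xbar)(Y_i - Ybar) ||_H."""
    cross = (curves - curves.mean(axis=0)).T @ (y - y.mean()) / y.size
    return float(l2_norm(cross, dt))
```

As with PCvM, neither statistic has a convenient null distribution in finite samples, so both are typically calibrated by resampling.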
Composite hypothesis
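Calibration of the test procedure for the composite hypothesis
Let $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ be an iid sample:
1. Set $\mathcal{X}_i(\cdot) := \mathcal{X}_i(\cdot) - \bar{\mathcal{X}}(\cdot)$ and $Y_i := Y_i - \bar{Y}$.
2. Express $\mathcal{X}_i(\cdot)$ and $\beta(\cdot)$ in a p-truncated basis (B-spline, PC or PLS).
3. Construct $\hat\varepsilon_i = Y_i - \langle \mathcal{X}_i, \hat\beta \rangle$.
4. Compute $\mathrm{PCvM}_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \hat\varepsilon_i\hat\varepsilon_j A_{ijr}$.
5. Bootstrap procedure (wild bootstrap), repeated for $b = 1, \dots, B$:
   - Construct $Y_i^* = \langle \mathcal{X}_i, \hat\beta \rangle + \varepsilon_i^*$, where $\varepsilon_i^* = V_i\hat\varepsilon_i$, with $E[V_i] = 0$, $E[V_i^2] = 1$.
   - Estimate $\hat\beta^*(\cdot)$ by basis representation, PC or PLS with $k_n$ elements.
   - Construct $\hat\varepsilon_i^{**} = Y_i^* - \langle \mathcal{X}_i, \hat\beta^* \rangle$.
   - Compute $\mathrm{PCvM}_{n,p}^* = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \hat\varepsilon_i^{**}\hat\varepsilon_j^{**} A_{ijr}$.
6. p-value $\approx \#\{\mathrm{PCvM}_{n,p} \le \mathrm{PCvM}_{n,p}^*\}/B$.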
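The composite calibration differs from the simple one mainly through the refit inside the bootstrap loop. A sketch under several assumptions: an orthonormal basis (so $\langle \mathcal{X}_i, \beta \rangle = \mathbf{x}_i^T\mathbf{b}$), a least-squares fit on the same $p$ coefficients (i.e. $k_n = p$), Rademacher weights, and a precomputed matrix $M_{ij} = \sum_r A_{ijr}$. The function name is hypothetical.

```python
import numpy as np

def composite_pvalue(x, y, m, n_boot=500, seed=None):
    """Wild-bootstrap p-value for the composite hypothesis (a sketch).

    x: (n, p) coefficients of X_i in an orthonormal basis (assumed);
    y: (n,) responses; m: (n, n) matrix with m[i, j] = sum_r A_ijr.
    beta is refitted by least squares on the same p coefficients (k_n = p).
    """
    rng = np.random.default_rng(seed)
    n = y.size
    xc = x - x.mean(axis=0)                  # step 1: centre X and Y
    yc = y - y.mean()

    def fit(resp):
        return np.linalg.lstsq(xc, resp, rcond=None)[0]

    b_hat = fit(yc)                          # step 2: coefficients of beta_hat
    eps = yc - xc @ b_hat                    # step 3: residuals
    stat = eps @ m @ eps / n ** 2            # step 4: PCvM_{n,p}
    stat_star = np.empty(n_boot)
    for b in range(n_boot):                  # step 5: wild bootstrap with refit
        v = rng.choice([-1.0, 1.0], size=n)  # Rademacher weights
        y_star = xc @ b_hat + v * eps
        eps_star2 = y_star - xc @ fit(y_star)
        stat_star[b] = eps_star2 @ m @ eps_star2 / n ** 2
    return float(stat), float(np.mean(stat <= stat_star))   # step 6
```

The $A_{ijr}$ depend only on differences $\mathbf{x}_i - \mathbf{x}_r$, so centring the coefficients in step 1 does not change $\mathbf{M}$.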
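Establishing the weak convergence of the process $R_n(u, \gamma)$ indexed in $\Pi = \mathbb{R} \times \mathbb{S}_{\mathcal{H}}$ is a very difficult task!
A related process is given by
$$R_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(Y_i - \langle \mathcal{X}_i, \hat\beta \rangle\big)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}},$$
with a fixed but randomly chosen $\gamma \sim \omega$.
If $\omega$ is a non-degenerate Gaussian measure, then, for $\omega$-almost every $\gamma$:
$$H_0 \iff E\big[Y - \langle \mathcal{X}, \beta \rangle \mid \langle \mathcal{X}, \gamma \rangle\big] = 0 \text{ for some } \beta \in \mathcal{H}.$$
This makes it possible to obtain the weak convergence of $R_n$, now indexed by $u \in \mathbb{R}$, and to build statistical tests for $H_0$.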
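A sketch of $R_n(u)$ for one random direction, assuming curves on a uniform grid, residuals from any fit under $H_0$, and $\omega$ taken as the law of a standard Brownian motion path (one non-degenerate Gaussian measure; the choice is an assumption). Since $R_n$ is a step function that jumps at the projections, the KS norm is the maximum over those points and the CvM norm against the ecdf of the projections is their mean of squares. The function name is hypothetical.

```python
import numpy as np

def projected_process_norms(curves, resid, t, rng):
    """KS and CvM norms of R_n(u) = n^{-1/2} sum_i resid_i 1{<X_i, gamma> <= u}
    for a single direction gamma drawn as a Brownian-motion path (assumed omega)."""
    dt = t[1] - t[0]
    gamma = np.cumsum(rng.normal(scale=np.sqrt(dt), size=t.size))   # gamma ~ omega
    proj = curves @ gamma * dt                             # <X_i, gamma> (Riemann sum)
    order = np.argsort(proj, kind="stable")
    r_n = np.cumsum(resid[order]) / np.sqrt(resid.size)    # R_n at the sorted projections
    ks = np.max(np.abs(r_n))                               # sup over u of |R_n(u)|
    cvm = np.mean(r_n ** 2)                                # integral of R_n^2 against the ecdf
    return float(ks), float(cvm)
```

Each call draws a new $\gamma$, so repeated calls give the $k$ projections whose p-values are combined below.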
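Testing significance: $H_0: m(\mathcal{X}) = c$
Under regularity conditions,
$$R_n(u) = n^{-1/2}\sum_{i=1}^{n}\big(Y_i - \bar{Y}\big)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}} \xrightarrow{d} G,$$
where $G$ is a centred Gaussian process with covariance function
$$K(s, t) = \Gamma(s \wedge t) + \mathrm{Var}(Y) F_\gamma(s)F_\gamma(t) - F_\gamma(t)E\big[\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le s\}}(Y - c)^2\big] - F_\gamma(s)E\big[\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le t\}}(Y - c)^2\big],$$
with $\Gamma(x) = \int_{-\infty}^{x} \mathrm{Var}(Y \mid \langle \mathcal{X}, \gamma \rangle = u)\,dF_\gamma(u)$ and $F_\gamma$ the distribution function of $\langle \mathcal{X}, \gamma \rangle$.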
- The big advantage of using random projections is that the empirical process is indexed by real values.
- The disadvantage is a possible loss of power.
- This can be alleviated by choosing several projections $\gamma_1, \dots, \gamma_k$ and an appropriate way to combine the resulting p-values, for example the FDR method proposed in Benjamini and Yekutieli (2001).
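A sketch of the combination step, assuming the global decision is "reject if the Benjamini–Yekutieli procedure rejects at least one of the $k$ projected hypotheses", which is equivalent to comparing the smallest BY-adjusted p-value with the nominal level. The function name is hypothetical.

```python
import numpy as np

def by_combined_pvalue(pvalues):
    """Smallest Benjamini–Yekutieli adjusted p-value among k p-values:
    min_i k * c(k) * p_(i) / i, capped at 1, where c(k) = sum_{j=1}^k 1/j."""
    p = np.sort(np.asarray(pvalues, dtype=float))
    k = p.size
    ranks = np.arange(1, k + 1)
    c_k = np.sum(1.0 / ranks)
    return float(min(1.0, np.min(k * c_k * p / ranks)))
```

For example, with the two p-values 0.01 and 0.5 we have $c(2) = 1.5$, and the combined p-value is $2 \cdot 1.5 \cdot 0.01 = 0.03$. The factor $c(k)$ makes the procedure valid under arbitrary dependence, which matters here because all projections use the same sample.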
Calibration of the test procedure for the functional linear model with PCA
Let $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ be an iid sample:
- Estimate $\beta$ by PCA with a chosen number $p = p_n$ of basis elements and obtain the fitted residuals $\hat\varepsilon_i = Y_i - \langle \mathcal{X}_i, \hat\beta \rangle$, $i = 1, \dots, n$.
- Compute $\|R_n\|$ with $\|\cdot\| = \|\cdot\|_{KS}$ or $\|\cdot\| = \|\cdot\|_{CvM}$.
- Resample using the wild bootstrap and obtain estimates of $\beta$ from the bootstrap samples, $b = 1, \dots, B$.
- Compute $\|R_{n,b}^*\|$, $b = 1, \dots, B$.
- Approximate the p-value by $\frac{1}{B}\sum_{b=1}^{B} \mathbb{1}_{\{\|R_{n,b}^*\| \ge \|R_n\|\}}$.
Repeat the procedure k times (one per random projection $\gamma_1, \dots, \gamma_k$) and apply the FDR correction to the k p-values.
Simulation study
Composite hypothesis
Table: Simulation scenarios and deviations from the null hypothesis (columns: Scenario, Coefficient β(t), Process X, Deviation).
Real data application
Computation time
[Figure: time (seconds) versus sample size (up to 1000), with curves for PCvM and for CvM and KS.]
Conclusions
- We have proposed a goodness–of–fit test for the functional linear model based on an empirical process of random projections, as an alternative to the previous test of García–Portugués et al. (2014), with quite promising results.
- The test is easy to compute and simple to calibrate using the wild bootstrap, and it will be available through the R package fda.usc of Febrero–Bande and Oviedo–De la Fuente (2011).
- Natural extensions of this test include other forms of the regression function (several covariates, nonlinear forms, ...).
Thanks for your attention!