Goodness–of–fit tests for regression models: the functional data case Wenceslao González Manteiga Joint work with Juan Cuesta–Albertos, Eduardo García–Portugués and Manuel Febrero–Bande Department of Statistics and O.R., University of Santiago de Compostela, Spain ICMAT, Madrid, 29/01/2015

Jul 31, 2015

Page 1: Goodness–of–fit tests for regression models: the functional data case

Goodness–of–fit tests for regression models: the functional data case

Wenceslao González Manteiga

Joint work with Juan Cuesta–Albertos, Eduardo García–Portugués and Manuel Febrero–Bande

Department of Statistics and O.R., University of Santiago de Compostela, Spain

ICMAT, Madrid, 29/01/2015

Page 2: Goodness–of–fit tests for regression models: the functional data case

Goodness–of–fit tests for regression models

Our research group: “Optimization Models, Decision, Statistics and Applications (MODESTYA)”

The group is formed by 25 people, distributed as follows: 15 professors, 5 PhD students and 5 people contracted under transfer projects. It also has 12 collaborators from other universities.

ICMAT 2015 – W. González-Manteiga et al. Goodness–of–fit tests for regression models 2/46

Page 3: Goodness–of–fit tests for regression models: the functional data case


Different research fields:

Statistical consulting and data analysis, interpretation of statistical results,

Time series forecasting,

Spatial statistics and mapping,

Statistical applications in industry,

Financial modeling,

Environmental statistics,

Bio-statistics,

Tourism statistics,

Design of experiments,

Application of mathematical optimization techniques to different aspects of industrial processes (production, logistics, . . .),

Statistical software support (R, SPSS, Gurobi, . . .),

Training courses, . . .

Page 4: Goodness–of–fit tests for regression models: the functional data case

In recent years our group has made major efforts to:

1 Create the Statistical Consulting Service.
2 Participate, nationally and internationally, in important technology transfer projects with other groups and networks.
3 Carry out important methodological development.

Page 5: Goodness–of–fit tests for regression models: the functional data case

Statistical Consulting Service (SCS)

SCS is a scientific and technical service of the University of Santiago de Compostela, established in 2006, whose main objectives are to coordinate and promote activities related to Statistics and Operations Research, and to provide support to researchers and practitioners in these fields.

Page 6: Goodness–of–fit tests for regression models: the functional data case

Participation in research networks

Area | Regional | National | International
Transfer of technology | Itmati (1) | Math-in (2) | EU-Maths-IN (3)
Methodological developments | Tecandali (4) | Biostatnet | StuDys (5)

(1) Technical Institute for Industrial Mathematics
(2) Spanish Network for Mathematics & Industry
(3) Stichting European Service Network of Mathematics for Industry and Innovation
(4) Technologies and data analysis language
(5) Developing crucial Statistical methods for Understanding major complex Dynamic Systems in natural, biomedical and social sciences

Page 7: Goodness–of–fit tests for regression models: the functional data case

Examples of collaboration: Public institutions

Page 8: Goodness–of–fit tests for regression models: the functional data case

Examples of collaboration: Private companies

Page 9: Goodness–of–fit tests for regression models: the functional data case

1 Introduction
2 GOF for regression models in functional data
3 The test
4 Simulation study
5 Real data application

Page 10: Goodness–of–fit tests for regression models: the functional data case

Introduction

Goodness of fit

The term goodness-of-fit (GOF) was introduced by Pearson at the beginning of the 20th century; it refers to tests that check, in an omnibus way, how well a distribution fits a data set.

The basic idea consists in comparing a nonparametric pilot estimator of the unknown distribution F or density f with a parametric estimator that is consistent under the null hypothesis.
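As a minimal illustration of this idea for a distribution (a sketch, not code from the talk), the snippet below takes the empirical cdf as the nonparametric pilot estimator and a normal cdf with estimated mean and standard deviation as the parametric estimator under $H_0$, and measures their Kolmogorov–Smirnov distance. The normal family and the function names are assumptions made only for this example.

```python
import math
import statistics
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cdf of N(mu, sigma^2) written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_distance_to_normal(sample: Sequence[float]) -> float:
    """sup_x |F_n(x) - F_theta_hat(x)|, with theta_hat = (sample mean, sample sd)."""
    xs = sorted(sample)
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The ecdf jumps from i/n to (i+1)/n at x: check both sides of the jump.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d
```

Since the parameters are estimated from the same sample, the null distribution of this distance is not the classical Kolmogorov one, which is one reason why resampling calibrations are used in practice.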

Page 11: Goodness–of–fit tests for regression models: the functional data case

Introduction

Two basic references

Durbin (1973) and Bickel and Rosenblatt (1973) laid the foundations of the mathematical development of GOF tests based on the estimation of the cdf and of the density function, respectively.

Page 12: Goodness–of–fit tests for regression models: the functional data case

Introduction

Extension

These ideas were extended in the nineties to the general case of a regression model (cf. Härdle and Mammen (1993), GM and Cao (1993)).

Given a regression model (fixed or random design):

$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $E(Y_i \mid X_i) = m(X_i)$, the goal is to test

$$H_0: m \in \mathcal{M} = \{m_\theta\}_{\theta \in \Theta \subset \mathbb{R}^q} \quad \text{vs.} \quad H_a: m \notin \mathcal{M},$$

with $m(x) = E(Y \mid X = x)$ the regression function of $Y$ on $X$, $\sigma^2(x) = \mathrm{Var}(Y \mid X = x)$ the conditional variance and $f$ the density of the explanatory variable (if it exists).

Page 13: Goodness–of–fit tests for regression models: the functional data case

Introduction

The integrated regression function

In a similar way to the tests for the distribution function $F(x) = \int_{-\infty}^{x} f(t)\,dt$, we may consider the integrated regression function:

$$I(x) = \int_{-\infty}^{x} m(t)\,dF(t) = E\left(Y \cdot \mathbb{I}(X \le x)\right).$$

Page 14: Goodness–of–fit tests for regression models: the functional data case

Introduction

The integrated regression function can be estimated nonparametrically as follows:

$$I_n(x) = \frac{1}{n}\sum_{i=1}^{n} Y_i \cdot \mathbb{I}(X_i \le x),$$

with associated empirical process:

$$R_n(x) = \sqrt{n}\left(I_n(x) - E_\theta(I_n(x))\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \mathbb{I}(X_i \le x)\,\varepsilon_i.$$

This empirical process has been the basis for a broad class of test statistics (see Stute, 1997, and more recent references).
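A minimal sketch of $R_n$ for a scalar covariate. It assumes, for illustration only, the parametric family $m_\theta(x) = \theta_0 + \theta_1 x$ fitted by ordinary least squares, so that the $\varepsilon_i$ are replaced by fitted residuals, and that there are no ties among the $X_i$. Because $R_n$ is a step function that jumps only at the design points, Cramér–von Mises and Kolmogorov–Smirnov type summaries only need its values there. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares intercept and slope for y = a + b x (the assumed null family)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_marked_process(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """R_n(X_(k)) = n^{-1/2} sum_i 1(X_i <= X_(k)) eps_i for k = 1..n (no ties assumed)."""
    a, b = ols_line(x, y)
    eps = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    values, running = [], 0.0
    for i in order:
        running += eps[i]  # cumulative sum of residuals up to the k-th ordered design point
        values.append(running / math.sqrt(n))
    return values


def cvm_and_ks(values: Sequence[float]) -> Tuple[float, float]:
    """CvM = (1/n) sum_k R_n(X_(k))^2 (integral w.r.t. F_n); KS = max_k |R_n(X_(k))|."""
    n = len(values)
    return sum(v * v for v in values) / n, max(abs(v) for v in values)
```

With an intercept in the null family the residuals sum to zero, so the process returns to 0 at the largest design point; systematic departures from the null show up as long excursions away from zero.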

Page 15: Goodness–of–fit tests for regression models: the functional data case

Introduction

More generally, in the last fifteen years other (semiparametric or nonparametric) “null hypotheses” have been considered:

- Partial linear models:

$$H_0: Y_i = X_i^t\theta + m(Z_i) + \varepsilon_i, \quad i = 1, \ldots, n$$

- Generalized partial linear models:

$$H_0: E(Y_i \mid X_i, Z_i) = G(X_i^t\theta + m(Z_i)),$$

with $G$ a known link function.

- Significance test:

$$H_0: E(Y_i \mid X_i, Z_i) = E(Y_i \mid X_i)$$

- Testing additivity:

$$H_0: E(Y_i \mid X_{i1}, \ldots, X_{ip}) = \sum_{j=1}^{p} m_j(X_{ij})$$

Page 16: Goodness–of–fit tests for regression models: the functional data case

Introduction

Even more generally, “complex models” have also been considered recently:

- Goodness–of–Fit Tests for Interest Rate Models
- Goodness–of–Fit Tests for Directional Data
- Goodness–of–Fit Tests for Functional Data

(see GM and Crujeiras (2013) for an extensive review of the topic)

Page 17: Goodness–of–fit tests for regression models: the functional data case

Functional data

Functional data

Each observation is a curve! Real examples in:

Meteorology: temperature, precipitation, wind speed, . . .

Finance: evolution of asset prices/returns.

Spectrometry

. . .

Page 18: Goodness–of–fit tests for regression models: the functional data case

Functional data

[Plot: absorbance curves against wavelength (nm), from 850 to 1050 nm.]

Figure : Tecator dataset. Absorbance coloured by fat content.

Page 19: Goodness–of–fit tests for regression models: the functional data case

Functional data

[Plot: temperature (ºC) against day of the year, January to December.]

Figure : Yearly evolution of temperatures in 76 AEMET monitoring stations.

Page 20: Goodness–of–fit tests for regression models: the functional data case

Functional data

[Plot: SP500 index against sessions, 11/08/2008 to 22/08/2008.]

Figure : SP500 index. Each day is a functional datum.

Page 21: Goodness–of–fit tests for regression models: the functional data case

Functional data

Objective

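Propose a goodness–of–fit test for the null hypothesis of the functional linear model,

$$Y = \langle \mathcal{X}, \beta \rangle + \varepsilon = \int_0^T \mathcal{X}(t)\beta(t)\,dt + \varepsilon,$$

with $\varepsilon$ a centred r.v. independent of $\mathcal{X}$. This is equivalent to testing $H_0: m(\cdot) \in \{\langle \cdot, \beta \rangle : \beta \in \mathcal{H}\}$, where $\mathcal{H} = L^2[0, T]$ is the Hilbert space of square-integrable functions.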

Page 22: Goodness–of–fit tests for regression models: the functional data case

Functional data

- Let $\mathcal{X}(t)$ be a functional r.v. taking values in $\mathcal{H} = L^2[0, T]$.

- Let $\{\Psi_j\}_{j=1}^{\infty}$ be a basis of $\mathcal{H}$. Then each observation $\mathcal{X}_i$, $i = 1, \ldots, n$, of $\mathcal{X}$ can be expressed as

$$\mathcal{X}_i = \sum_{j=1}^{\infty} x_{ij}\Psi_j.$$

- Let $\{\Psi_j\}_{j=1}^{p}$ be the $p$-truncated basis formed by the first $p$ elements of $\{\Psi_j\}_{j=1}^{\infty}$. The representation of $\mathcal{X}_i$ in this truncated basis is denoted by

$$\mathcal{X}_i^{(p)} = \sum_{j=1}^{p} x_{ij}\Psi_j.$$

- The choice of the basis and of the number of elements of the truncated basis is crucial to correctly capture the information given by the functional process.
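Assuming the curves are observed on an equispaced grid of $[0, T]$, and taking $\{\Psi_j\}$ to be the orthonormal Fourier basis (only one possible choice; B-splines or PCs are others), the coefficients $x_{ij} = \langle \mathcal{X}_i, \Psi_j \rangle$ can be approximated by numerical integration. The sketch below uses the trapezoidal rule; all function names are illustrative.

```python
import math
from typing import List, Sequence


def fourier_basis(j: int, t: float, T: float = 1.0) -> float:
    """j-th element (j = 1, 2, ...) of the orthonormal Fourier basis of L2[0, T]."""
    if j == 1:
        return 1.0 / math.sqrt(T)
    k = j // 2
    trig = math.sin if j % 2 == 0 else math.cos  # j = 2, 3 -> sin, cos of frequency 1, and so on
    return math.sqrt(2.0 / T) * trig(2.0 * math.pi * k * t / T)


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    """Trapezoidal approximation of the integral of a function sampled on grid."""
    return sum((grid[k + 1] - grid[k]) * (values[k] + values[k + 1]) / 2.0
               for k in range(len(grid) - 1))


def truncated_coefficients(curve: Sequence[float], grid: Sequence[float], p: int,
                           T: float = 1.0) -> List[float]:
    """x_ij = <X_i, Psi_j> for j = 1..p: the p-truncated representation of one curve."""
    return [trapezoid([x * fourier_basis(j, t, T) for x, t in zip(curve, grid)], grid)
            for j in range(1, p + 1)]


def reconstruct(coefs: Sequence[float], grid: Sequence[float], T: float = 1.0) -> List[float]:
    """X^(p)(t) = sum_j x_ij Psi_j(t), evaluated on the grid."""
    return [sum(c * fourier_basis(j, t, T) for j, c in enumerate(coefs, start=1)) for t in grid]
```

Comparing `reconstruct(...)` with the observed curve for increasing `p` is a simple way to see how much of the functional information a given truncation keeps.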

Page 23: Goodness–of–fit tests for regression models: the functional data case

Functional data

Functional linear model

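- The functional linear model states that

$$Y = \langle \mathcal{X}, \beta \rangle + \varepsilon = \int_0^T \mathcal{X}(t)\beta(t)\,dt + \varepsilon,$$

with $\varepsilon$ a centred r.v. independent of $\mathcal{X}$.

- The functional parameter $\beta$ is estimated by minimising the RSS:

$$\hat\beta = \arg\min_{\beta \in \mathcal{H}} \sum_{i=1}^{n} \left(Y_i - \langle \mathcal{X}_i, \beta \rangle\right)^2.$$

- Since this minimisation is ill-posed in the infinite-dimensional space $\mathcal{H}$, $\beta$ is searched for in a finite-dimensional subspace, and different methods have been proposed to do so (see Ferraty and Romain (2011)):

1 Using a basis representation of B-splines or Fourier functions.
2 Using principal components (PC).
3 Using partial least squares (PLS).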
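A sketch of the basis-representation approach under the simplifying assumption of an orthonormal basis, so that $\langle \mathcal{X}_i, \beta \rangle = \sum_{j=1}^{p} x_{ij} b_j$ and minimising the RSS over the span of $\Psi_1, \ldots, \Psi_p$ becomes an ordinary least squares problem in the coefficients. With functional PCs the $x_{ij}$ would be the PC scores; PLS builds its components differently and is not covered here. A small Gaussian-elimination solver keeps the example dependency-free; names are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a x = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def flm_basis_fit(coefs: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Minimise sum_i (Y_i - x_i^T b)^2 over b in R^p through the normal equations."""
    p = len(coefs[0])
    xtx = [[sum(c[j] * c[k] for c in coefs) for k in range(p)] for j in range(p)]
    xty = [sum(c[j] * yi for c, yi in zip(coefs, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)
```

Here the truncation level $p$ is what regularises the problem: increasing it lowers the approximation bias but makes the normal equations worse conditioned.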

Page 24: Goodness–of–fit tests for regression models: the functional data case

The test

- Let $(\mathcal{X}, Y)$ be r.v.'s in $\mathcal{H} \times \mathbb{R}$ and $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ a sample.

- We want to test the composite null hypothesis

$$H_0: m \in \{\langle \cdot, \beta \rangle : \beta \in \mathcal{H}\},$$

versus the general alternative

$$H_1: P\{m \notin \{\langle \cdot, \beta \rangle : \beta \in \mathcal{H}\}\} > 0.$$

- The simple hypothesis, i.e. checking for a specific functional linear model, is also of interest:

$$H_0: m(\mathcal{X}) = \langle \mathcal{X}, \beta_0 \rangle, \text{ for a fixed } \beta_0 \in \mathcal{H},$$

and it includes the important case of no interaction between the functional covariate and the scalar response ($\beta_0 = 0$).

Page 25: Goodness–of–fit tests for regression models: the functional data case

The test

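Lemma

Let $\beta \in \mathcal{H}$. The following statements are equivalent:

I $m(\mathcal{X}) = \langle \mathcal{X}, \beta \rangle$, $\forall \mathcal{X} \in \mathcal{H}$.

II $E[Y - \langle \mathcal{X}, \beta \rangle \mid \mathcal{X} = x] = 0$, for a.e. $x \in \mathcal{H}$.

III $E[Y - \langle \mathcal{X}, \beta \rangle \mid \langle \mathcal{X}, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathcal{H}}$.

III' $E[Y - \langle \mathcal{X}, \beta \rangle \mid \langle \mathcal{X}, \gamma \rangle = u] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}^p_{\mathcal{H}}$, $\forall p \ge 1$.

IV $E\left[(Y - \langle \mathcal{X}, \beta \rangle)\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le u\}}\right] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}_{\mathcal{H}}$.

IV' $E\left[(Y - \langle \mathcal{X}, \beta \rangle)\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le u\}}\right] = 0$, for a.e. $u \in \mathbb{R}$ and $\forall \gamma \in \mathbb{S}^p_{\mathcal{H}}$, $\forall p \ge 1$.

Here $\mathbb{S}_{\mathcal{H}}$ is the unit sphere of $\mathcal{H}$ and $\mathbb{S}^p_{\mathcal{H}}$ the set of its elements spanned by $\Psi_1, \ldots, \Psi_p$. The lemma is based on the results of Patilea et al. (2012).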

Page 26: Goodness–of–fit tests for regression models: the functional data case

The test

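- A possible way to measure the deviation of the data from $H_0$ (following the ideas of Stute (1997) in the scalar case) is the projected empirical process:

$$R_n(u, \gamma) = n^{-1/2}\sum_{i=1}^{n} \left(Y_i - \langle \mathcal{X}_i, \hat\beta \rangle\right)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}}.$$

- To measure the distance of the empirical process from zero, the Cramér–von Mises and Kolmogorov–Smirnov norms are adapted to the projected space $\Pi = \mathbb{R} \times \mathbb{S}_{\mathcal{H}}$:

$$\mathrm{PCvM}_n = \int_{\Pi} R_n(u, \gamma)^2 \, F_{n,\gamma}(du)\,\omega(d\gamma), \qquad \mathrm{PKS}_n = \sup_{(u, \gamma) \in \Pi} |R_n(u, \gamma)|,$$

where $F_{n,\gamma}$ is the ecdf of $\{\langle \mathcal{X}_i, \gamma \rangle\}_{i=1}^{n}$ and $\omega$ is a measure on $\mathbb{S}_{\mathcal{H}}$.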
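A Monte Carlo sketch of both statistics, assuming the curves are already represented by their coefficient vectors in an orthonormal $p$-truncated basis and taking $\omega$ to be the uniform probability measure on the unit sphere (directions drawn as normalised Gaussian vectors). Both the integral over $\omega$ and the supremum over $\gamma$ are approximated with a finite number of random directions, so the returned PKS value is only a lower bound for the exact supremum. Names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def random_direction(p: int, rng: random.Random) -> List[float]:
    """Uniform draw on the unit sphere of R^p (normalised standard Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def projected_process(coefs: Sequence[Sequence[float]], resid: Sequence[float],
                      gamma: Sequence[float]) -> List[float]:
    """R_n(u, gamma) evaluated at u = <X_r, gamma>, r = 1..n (ties handled by the indicator)."""
    n = len(resid)
    proj = [sum(c * g for c, g in zip(x, gamma)) for x in coefs]
    return [sum(e for e, t in zip(resid, proj) if t <= u) / math.sqrt(n) for u in proj]


def pcvm_pks_monte_carlo(coefs: Sequence[Sequence[float]], resid: Sequence[float],
                         n_dir: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo PCvM_n (average over gamma) and a lower bound for PKS_n."""
    rng = random.Random(seed)
    p = len(coefs[0])
    cvm_total, ks_max = 0.0, 0.0
    for _ in range(n_dir):
        values = projected_process(coefs, resid, random_direction(p, rng))
        cvm_total += sum(v * v for v in values) / len(values)  # integral w.r.t. F_{n,gamma}
        ks_max = max(ks_max, max(abs(v) for v in values))    # sup over u, attained at data points
    return cvm_total / n_dir, ks_max
```

Each direction costs $O(n^2)$ here, which is one motivation for the closed-form expression of $\mathrm{PCvM}_{n,p}$ and for the single-projection tests discussed later.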

Page 27: Goodness–of–fit tests for regression models: the functional data case

The test

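- A $p$-truncated version of this process is:

$$R_{n,p}(u, \gamma^{(p)}) = n^{-1/2}\sum_{i=1}^{n} \left(Y_i - \mathbf{x}_{i,p}^T \Psi \hat{\mathbf{b}}_p\right)\mathbb{1}_{\{\mathbf{x}_{i,p}^T \Psi \mathbf{g}_p \le u\}} = R_{n,p}(u, \mathbf{g}_p),$$

where $\hat{\mathbf{b}}_p$ are the coefficients of $\hat\beta$ in the $p$-truncated basis $\{\Psi_j\}_{j=1}^{p}$, $\Psi = (\langle \Psi_i, \Psi_j \rangle)_{ij}$ ($I_p$ if the basis is orthonormal) and $\mathbf{g}_p$ are the coefficients of $\gamma$ in the truncated basis.

- A simplified version of the statistic $\mathrm{PCvM}_n$, taking $\omega$ as the uniform distribution on the sphere, is:

$$\mathrm{PCvM}_{n,p} = \int_{\mathbb{S}^{p} \times \mathbb{R}} |R|^{-1} R_{n,p}(u, R^{-1}\mathbf{g}_p)^2 \, F_{n, R^{-1}\mathbf{g}_p}(du)\,d\mathbf{g}_p,$$

where $R$ is the $p \times p$ matrix such that $\Psi = R^T R$.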

Page 28: Goodness–of–fit tests for regression models: the functional data case

The test

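By calculations analogous to those of Escanciano (2006),

$$\mathrm{PCvM}_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \hat\varepsilon_i \hat\varepsilon_j A_{ijr},$$

where

$$A_{ijr} = A^{(0)}_{ijr}\,\frac{\pi^{p/2-1}}{\Gamma\left(\frac{p}{2} + 1\right)}\,|R|^{-1},$$

$$A^{(0)}_{ijr} = \begin{cases} 2\pi, & \mathbf{x}'_{i,p} = \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p}, \\ \pi, & \mathbf{x}'_{i,p} = \mathbf{x}'_{j,p}, \ \mathbf{x}'_{i,p} = \mathbf{x}'_{r,p} \text{ or } \mathbf{x}'_{j,p} = \mathbf{x}'_{r,p}, \\ \left|\pi - \arccos\left(\dfrac{(\mathbf{x}'_{i,p} - \mathbf{x}'_{r,p})^T(\mathbf{x}'_{j,p} - \mathbf{x}'_{r,p})}{\|\mathbf{x}'_{i,p} - \mathbf{x}'_{r,p}\| \cdot \|\mathbf{x}'_{j,p} - \mathbf{x}'_{r,p}\|}\right)\right|, & \text{else}. \end{cases}$$

E. García-Portugués, W. González-Manteiga and M. Febrero-Bande (2014). A Goodness–of–Fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, Vol. 23, Issue 3, 761–778.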
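A direct $O(n^3)$ sketch of this closed form, under two labelled assumptions: the vectors $\mathbf{x}'_{i,p}$ are taken as given (for an orthonormal basis they are simply the coefficients), and $\omega$ is normalised to a probability measure on the sphere, in which case each $A^{(0)}_{ijr}$ is divided by $2\pi$ instead of being multiplied by the constant above. Because that constant depends only on $p$ and $R$, which are shared by the observed and the bootstrap statistics, this rescaling leaves bootstrap p-values unchanged. Function names are hypothetical.

```python
import math
from typing import Sequence


def a0(xi: Sequence[float], xj: Sequence[float], xr: Sequence[float]) -> float:
    """A^(0)_{ijr}: 2*pi, pi, or |pi - angle between x_i - x_r and x_j - x_r|."""
    eq_ij, eq_ir, eq_jr = list(xi) == list(xj), list(xi) == list(xr), list(xj) == list(xr)
    if eq_ij and eq_ir:
        return 2.0 * math.pi
    if eq_ij or eq_ir or eq_jr:
        return math.pi
    a = [u - v for u, v in zip(xi, xr)]
    b = [u - v for u, v in zip(xj, xr)]
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    cos = sum(u * v for u, v in zip(a, b)) / (norm_a * norm_b)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding slightly outside [-1, 1]
    return abs(math.pi - math.acos(cos))


def pcvm_exact(coefs: Sequence[Sequence[float]], resid: Sequence[float]) -> float:
    """n^{-2} sum_{i,j,r} e_i e_j A_ijr with A_ijr = A^(0)_ijr / (2 pi) (omega a probability)."""
    n = len(resid)
    total = 0.0
    for r in range(n):
        for i in range(n):
            for j in range(n):
                total += resid[i] * resid[j] * a0(coefs[i], coefs[j], coefs[r])
    return total / (2.0 * math.pi * n * n)
```

With this normalisation the result matches the Monte Carlo average over uniformly distributed directions, since $A^{(0)}_{ijr}/(2\pi)$ is the probability that a random direction puts both $\mathbf{x}'_{i,p}$ and $\mathbf{x}'_{j,p}$ at or below $\mathbf{x}'_{r,p}$.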

Page 29: Goodness–of–fit tests for regression models: the functional data case

The test

Simple hypothesis

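Calibration of the test procedure for the simple hypothesis

Let $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ be an iid sample:

1 Express $\mathcal{X}_i(\cdot)$ (and optionally $\beta_0(\cdot)$) in a $p$-truncated basis.
2 Construct $\hat\varepsilon_i = Y_i - \langle \mathcal{X}_i, \beta_0 \rangle$.
3 Compute $\mathrm{PCvM}_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \hat\varepsilon_i \hat\varepsilon_j A_{ijr}$.
4 Bootstrap procedure (wild bootstrap), repeated for $b = 1, \ldots, B$:
 - Construct $\varepsilon^*_i = V_i \hat\varepsilon_i$, with $E[V_i] = 0$, $E[V_i^2] = 1$.
 - Compute $\mathrm{PCvM}^*_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \varepsilon^*_i \varepsilon^*_j A_{ijr}$.
5 p-value $\approx \#\{\mathrm{PCvM}_{n,p} \le \mathrm{PCvM}^*_{n,p}\}/B$.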
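Since the design is kept fixed in the wild bootstrap, the triple sum can be precomputed as a matrix $M_{ij} = \sum_{r} A_{ijr}$, so that $\mathrm{PCvM}_{n,p} = n^{-2}\hat{\boldsymbol\varepsilon}^T M \hat{\boldsymbol\varepsilon}$ and each bootstrap replicate costs $O(n^2)$ instead of $O(n^3)$. The sketch below assumes $M$ is already available and uses Rademacher weights ($V_i = \pm 1$ with probability 1/2), one common choice satisfying $E[V_i] = 0$ and $E[V_i^2] = 1$; the talk does not specify the distribution of $V_i$. Names are hypothetical.

```python
import random
from typing import Sequence


def quadratic_stat(m: Sequence[Sequence[float]], e: Sequence[float]) -> float:
    """n^{-2} e^T M e, i.e. n^{-2} sum_{i,j,r} e_i e_j A_ijr with M_ij = sum_r A_ijr."""
    n = len(e)
    return sum(e[i] * m[i][j] * e[j] for i in range(n) for j in range(n)) / n ** 2


def wild_bootstrap_pvalue_simple(m: Sequence[Sequence[float]], resid: Sequence[float],
                                 n_boot: int = 1000, seed: int = 0) -> float:
    """p-value ~ #{T_n <= T_n^*} / B for H0: m(X) = <X, beta_0> (no re-estimation needed)."""
    rng = random.Random(seed)
    t_obs = quadratic_stat(m, resid)
    count = 0
    for _ in range(n_boot):
        v = [rng.choice((-1.0, 1.0)) for _ in resid]  # Rademacher: E V = 0, E V^2 = 1
        t_star = quadratic_stat(m, [vi * ei for vi, ei in zip(v, resid)])
        count += t_obs <= t_star
    return count / n_boot
```

Under the simple hypothesis $\beta_0$ is known, so the bootstrap residuals are just the sign-perturbed $\hat\varepsilon_i$; the composite case additionally requires re-estimating $\beta$ on each bootstrap sample.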

Page 30: Goodness–of–fit tests for regression models: the functional data case

The test

Simple hypothesis

Other competing methods for the simple hypothesis

- Delsol et al. (2011) propose a test statistic for $H_0: m(\mathcal{X}) = m_0(\mathcal{X})$ using ideas of Härdle and Mammen (1993):

$$T_n = \frac{1}{n}\sum_{j=1}^{n}\left(\sum_{i=1}^{n} \hat\varepsilon_i K\left(\frac{d(\mathcal{X}_j, \mathcal{X}_i)}{h}\right)\right)^2 \omega(\mathcal{X}_j),$$

where $K$ is a kernel function, $d$ is a semimetric (here the $L^2$ distance), $h$ is the bandwidth (values 0.25, 0.50, 0.75 and 1.00) and $\omega$ is a weight function (here uniform).

- González-Manteiga et al. (2012) extend the ideas of the classical F-test to the functional framework, resulting in a statistic to test the null hypothesis of no interaction in the functional linear model:

$$D_n = \left\| \frac{1}{n}\sum_{i=1}^{n} \left(\mathcal{X}_i - \bar{\mathcal{X}}\right)\left(Y_i - \bar{Y}\right) \right\|_{\mathcal{H}}.$$
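A sketch of both competing statistics for curves discretised on a common grid, assuming the $L^2$ distance as semimetric $d$, a uniform weight $\omega \equiv 1$ and, purely for illustration, the kernel $K(u) = 1 - u^2$ on $[0, 1]$ (zero elsewhere); Delsol et al. (2011) do not necessarily use this kernel. Integrals are approximated with the trapezoidal rule, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    return sum((grid[k + 1] - grid[k]) * (values[k] + values[k + 1]) / 2.0
               for k in range(len(grid) - 1))


def l2_distance(f: Sequence[float], g: Sequence[float], grid: Sequence[float]) -> float:
    return math.sqrt(trapezoid([(a - b) ** 2 for a, b in zip(f, g)], grid))


def kernel(u: float) -> float:
    """Illustrative kernel supported on [0, 1] (an assumption, not taken from the talk)."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def delsol_tn(curves, resid: Sequence[float], grid: Sequence[float], h: float) -> float:
    """T_n = (1/n) sum_j (sum_i e_i K(d(X_j, X_i)/h))^2 * omega(X_j), with omega = 1."""
    total = 0.0
    for xj in curves:
        inner = sum(e * kernel(l2_distance(xj, xi, grid) / h) for xi, e in zip(curves, resid))
        total += inner ** 2
    return total / len(curves)


def f_test_dn(curves, y: Sequence[float], grid: Sequence[float]) -> float:
    """D_n = || (1/n) sum_i (X_i - Xbar)(Y_i - Ybar) ||_H, computed on the grid."""
    n = len(curves)
    ybar = sum(y) / n
    xbar = [sum(c[k] for c in curves) / n for k in range(len(grid))]
    cov: List[float] = [sum((c[k] - xbar[k]) * (yi - ybar) for c, yi in zip(curves, y)) / n
                        for k in range(len(grid))]
    return math.sqrt(trapezoid([v * v for v in cov], grid))
```

$D_n$ is the norm of an empirical cross-covariance function, so it is close to zero when $Y$ and $\mathcal{X}$ are uncorrelated; $T_n$ instead smooths the residuals over neighbouring curves and can react to nonlinear departures.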

Page 31: Goodness–of–fit tests for regression models: the functional data case

The test

Composite hypothesis

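Calibration of the test procedure for the composite hypothesis

Let $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ be an iid sample:

1 Centre the data: $\mathcal{X}_i(\cdot) := \mathcal{X}_i(\cdot) - \bar{\mathcal{X}}(\cdot)$ and $Y_i := Y_i - \bar{Y}$.
2 Express $\mathcal{X}_i(\cdot)$ and $\hat\beta(\cdot)$ in a $p$-truncated basis (B-spline, PC or PLS).
3 Construct $\hat\varepsilon_i = Y_i - \langle \mathcal{X}_i, \hat\beta \rangle$.
4 Compute $\mathrm{PCvM}_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \hat\varepsilon_i \hat\varepsilon_j A_{ijr}$.
5 Bootstrap procedure (wild bootstrap), repeated for $b = 1, \ldots, B$:
 - Construct $Y^*_i = \langle \mathcal{X}_i, \hat\beta \rangle + \varepsilon^*_i$, where $\varepsilon^*_i = V_i \hat\varepsilon_i$, with $E[V_i] = 0$, $E[V_i^2] = 1$.
 - Estimate $\hat\beta^*(\cdot)$ from $\{(\mathcal{X}_i, Y^*_i)\}_{i=1}^{n}$ by basis representation, PC or PLS with $k_n$ elements.
 - Construct $\hat\varepsilon^{**}_i = Y^*_i - \langle \mathcal{X}_i, \hat\beta^* \rangle$.
 - Compute $\mathrm{PCvM}^*_{n,p} = n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{r=1}^{n} \hat\varepsilon^{**}_i \hat\varepsilon^{**}_j A_{ijr}$.
6 p-value $\approx \#\{\mathrm{PCvM}_{n,p} \le \mathrm{PCvM}^*_{n,p}\}/B$.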
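The key difference from the simple-hypothesis calibration is that $\beta$ is re-estimated on every bootstrap sample. A sketch, assuming the curves are given by their coefficients in an orthonormal $p$-truncated basis, that $\beta$ is estimated by least squares in that basis (instead of PC or PLS), Rademacher weights $V_i$, and a user-supplied statistic of the residuals (for example the quadratic form $n^{-2}\hat{\boldsymbol\varepsilon}^T M \hat{\boldsymbol\varepsilon}$). All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_and_residuals(coefs: Sequence[Sequence[float]],
                      y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Least squares b in the truncated basis; returns fitted values and residuals."""
    p = len(coefs[0])
    xtx = [[sum(c[j] * c[k] for c in coefs) for k in range(p)] for j in range(p)]
    xty = [sum(c[j] * yi for c, yi in zip(coefs, y)) for j in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(cj * bj for cj, bj in zip(c, b)) for c in coefs]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


def wild_bootstrap_pvalue_composite(coefs: Sequence[Sequence[float]], y: Sequence[float],
                                    stat: Callable[[Sequence[float]], float],
                                    n_boot: int = 500, seed: int = 0) -> float:
    rng = random.Random(seed)
    n, p = len(y), len(coefs[0])
    cbar = [sum(c[j] for c in coefs) / n for j in range(p)]
    ybar = sum(y) / n
    xc = [[c[j] - cbar[j] for j in range(p)] for c in coefs]  # step 1: centre the curves
    yc = [yi - ybar for yi in y]                              # step 1: centre the response
    fitted, resid = fit_and_residuals(xc, yc)                 # steps 2-3
    t_obs = stat(resid)                                       # step 4
    count = 0
    for _ in range(n_boot):                                   # step 5
        y_star = [f + rng.choice((-1.0, 1.0)) * e for f, e in zip(fitted, resid)]
        _, resid_star = fit_and_residuals(xc, y_star)         # re-estimate beta on Y*
        count += t_obs <= stat(resid_star)
    return count / n_boot                                     # step 6
```

Re-fitting on each $Y^*$ reproduces, in the bootstrap world, the effect of estimating $\beta$ on the residuals, which a plain sign-flipping of $\hat\varepsilon_i$ would ignore.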

Page 32: Goodness–of–fit tests for regression models: the functional data case

The test

Composite hypothesis

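The weak convergence of the process $R_n(u, \gamma)$ indexed by $\Pi = \mathbb{R} \times \mathbb{S}_{\mathcal{H}}$ is very difficult to establish!

A related process is given by

$$R_n(u) = n^{-1/2}\sum_{i=1}^{n} \left(Y_i - \langle \mathcal{X}_i, \hat\beta \rangle\right)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}},$$

with a fixed but randomly chosen $\gamma \sim \omega$.

If $\omega$ is a non-degenerate Gaussian measure, then:

$$H_0 \iff E\left[Y - \langle \mathcal{X}, \beta \rangle \mid \langle \mathcal{X}, \gamma \rangle\right] = 0, \text{ for some } \beta \in \mathcal{H}.$$

In this way it is possible to obtain the weak convergence of $R_n$ indexed by $u \in \mathbb{R}$ and to derive statistical tests for $H_0$.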

Page 33: Goodness–of–fit tests for regression models: the functional data case

The test

Composite hypothesis

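Testing significance: $H_0: m(\mathcal{X}) = c$

Under regularity conditions:

$$R_n(u) = n^{-1/2}\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}} \xrightarrow{d} G,$$

where $G$ is a centred Gaussian process with covariance function

$$K(s, t) = \Gamma(s \wedge t) + \mathrm{Var}(Y) F^{\gamma}(s) F^{\gamma}(t) - F^{\gamma}(t) E\left[\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le s\}}(Y - c)^2\right] - F^{\gamma}(s) E\left[\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le t\}}(Y - c)^2\right],$$

with $\Gamma(x) = \int_{-\infty}^{x} \mathrm{Var}(Y \mid \langle \mathcal{X}, \gamma \rangle = u)\,dF^{\gamma}(u)$ and $F^{\gamma}$ the distribution function of $\langle \mathcal{X}, \gamma \rangle$.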

Page 34: Goodness–of–fit tests for regression models: the functional data case

The test

Composite hypothesis

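Testing the functional linear model: $H_0: m(\mathcal{X}) = \langle \mathcal{X}, \beta \rangle$

Under regularity conditions,

$$R_n(u) = n^{-1/2}\sum_{i=1}^{n} \left(Y_i - \langle \mathcal{X}_i, \hat\beta \rangle\right)\mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}}$$

has asymptotically the same distribution as

$$n^{-1/2}\sum_{i=1}^{n} \varepsilon_i \mathbb{1}_{\{\langle \mathcal{X}_i, \gamma \rangle \le u\}} + n^{1/2}\left\langle E\left[\mathbb{1}_{\{\langle \mathcal{X}, \gamma \rangle \le u\}}\mathcal{X}\right], \beta - \hat\beta \right\rangle,$$

since $Y_i - \langle \mathcal{X}_i, \hat\beta \rangle = \varepsilon_i + \langle \mathcal{X}_i, \beta - \hat\beta \rangle$ under $H_0$.

- One important case: estimation with principal component analysis (PCA) (work in progress!).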

Page 35: Goodness–of–fit tests for regression models: the functional data case

The test

Composite hypothesis

- Using the process $R_n(u)$ we can define different tests:

$\|R_n\|_{KS} = \sup_u |R_n(u)|$ (Kolmogorov–Smirnov),

$\|R_n\|_{CvM} = \int_{\mathbb{R}} R_n(u)^2\,dF^{\gamma}_n(u)$ (Cramér–von Mises), with $F^{\gamma}_n$ the empirical estimate of $F^{\gamma}$,

. . .

- The big advantage of using random projections is that the empirical process is indexed by real values.
- The disadvantage is a possible loss of power.
- This can be alleviated by choosing several projections $\gamma_1, \ldots, \gamma_k$ and combining the resulting p-values in an appropriate way, for example with the FDR method proposed in Benjamini and Yekutieli (2001).
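A sketch of the single-projection statistics, assuming coefficients in an orthonormal basis and, as one possible non-degenerate Gaussian measure $\omega$, a direction $\gamma$ with independent coefficients $g_j \sim N(0, j^{-2})$; this decay is a hypothetical choice made only for illustration. Since $R_n$ is a step function that vanishes to the left of the smallest projection, the supremum is attained at the projected sample points, and the integral with respect to $F^{\gamma}_n$ is an average over them.

```python
import math
import random
from typing import List, Sequence, Tuple


def gaussian_direction(p: int, rng: random.Random) -> List[float]:
    """gamma with independent coefficients g_j ~ N(0, 1/j^2): an assumed Gaussian omega."""
    return [rng.gauss(0.0, 1.0 / j) for j in range(1, p + 1)]


def projected_ks_cvm(coefs: Sequence[Sequence[float]], resid: Sequence[float],
                     gamma: Sequence[float]) -> Tuple[float, float]:
    """(||R_n||_KS, ||R_n||_CvM) for one fixed projection gamma."""
    n = len(resid)
    proj = [sum(c * g for c, g in zip(x, gamma)) for x in coefs]
    # R_n(u) at u = <X_r, gamma>: residuals whose projection does not exceed u
    values = [sum(e for e, t in zip(resid, proj) if t <= u) / math.sqrt(n) for u in proj]
    ks = max(abs(v) for v in values)
    cvm = sum(v * v for v in values) / n  # integral w.r.t. the ecdf F_n^gamma
    return ks, cvm
```

In a calibration of the kind described on these slides, the same `gamma` would be reused on every wild-bootstrap sample, and the whole procedure repeated for each of the $k$ projections.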

Page 36: Goodness–of–fit tests for regression models: the functional data case

The test

Composite hypothesis

Calibration of the test procedure for the functional linear model with PCA

Let $\{(\mathcal{X}_i, Y_i)\}_{i=1}^{n}$ be an iid sample:

- Estimate $\beta$ by PCA with a chosen number $p = p_n$ of basis elements and obtain the fitted residuals $\hat\varepsilon_i = Y_i - \langle \mathcal{X}_i, \hat\beta \rangle$, $i = 1, \ldots, n$.
- Compute $\|R_n\|$ with $\|\cdot\| = \|\cdot\|_{KS}$ or $\|\cdot\| = \|\cdot\|_{CvM}$.
- Resample using the wild bootstrap and obtain estimates of $\beta$ from the bootstrap samples, $b = 1, \ldots, B$.
- Compute $\|R^*_{n,b}\|$, $b = 1, \ldots, B$.
- Approximate the p-value by $\frac{1}{B}\sum_{b=1}^{B} \mathbb{1}_{\{\|R_n\| \le \|R^*_{n,b}\|\}}$.

Repeat the procedure $k$ times (once per projection) and apply FDR to the $k$ p-values.
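One common way to combine the $k$ p-values with the Benjamini–Yekutieli correction, which remains valid under arbitrary dependence (the p-values are dependent here because all projections use the same sample): sort them, compute $\min_i k\,c(k)\,p_{(i)}/i$ with $c(k) = \sum_{j=1}^{k} 1/j$, and reject $H_0$ at level $\alpha$ if this combined value is at most $\alpha$. This is a sketch; the function name is hypothetical.

```python
from typing import Sequence


def by_combined_pvalue(pvalues: Sequence[float]) -> float:
    """min_i k c(k) p_(i) / i, capped at 1, with c(k) = sum_{j<=k} 1/j (Benjamini-Yekutieli)."""
    k = len(pvalues)
    c_k = sum(1.0 / j for j in range(1, k + 1))
    ordered = sorted(pvalues)
    return min(1.0, min(k * c_k * p / i for i, p in enumerate(ordered, start=1)))
```

The global null is rejected exactly when the BY step-up procedure rejects at least one of the $k$ projected hypotheses, so the combination inherits its level under the null.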

Page 37: Goodness–of–fit tests for regression models: the functional data case

Simulation study

Composite hypothesis

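Simulation scenarios and deviations from the null hypothesis.

Scenario | Coefficient $\beta(t)$ | Process $\mathcal{X}$ | Deviation
S1 | $(2\psi_2(t) + 4\psi_2(t) + 5\psi_3(t))/\sqrt{2}$ | BM | $\delta_k\Delta_1$, $\delta = (0, \frac{1}{4}, \frac{3}{4})$
S2 | $(2\psi_3(t) + 4\psi_5(t) + 5\psi_7(t))/\sqrt{2}$ | BM | $\delta_k\Delta_1$, $\delta = (0, \frac{1}{20}, \frac{1}{5})$
S3 | $(2\psi_2(t) + 4\psi_3(t) + 5\psi_7(t))/\sqrt{2}$ | BM | $-\delta_k\Delta_1$, $\delta = (0, \frac{1}{5}, \frac{1}{2})$
S4 | $\log(15t^2 + 10) + \cos(4\pi t)$ | BM | $\delta_k\Delta_1$, $\delta = (0, \frac{1}{5}, 1)$
S5 | $\sum_{j=1}^{20} 2^{3/2}(-1)^j j^{-2}\phi_j(t)$ | HHN ($l = 1$) | $-\delta_k\Delta_2$, $\delta = (0, 1, 3)$
S6 | $\sum_{j=1}^{20} 2^{3/2}(-1)^j j^{-2}\phi_j(t)$ | HHN ($l = 2$) | $-\delta_k\Delta_2$, $\delta = (0, 1, 3)$
S7 | $(2\psi_2(t) + 4\psi_2(t) + 5\psi_3(t))/\sqrt{2}$ | BB | $-\delta_k\Delta_2$, $\delta = (0, 2, \frac{15}{2})$
S8 | $\sin(2\pi t) - \cos(2\pi t)$ | OU | $-\delta_k\Delta_2$, $\delta = (0, \frac{1}{4}, 1)$
S9 | $t - (t - \frac{3}{4})^2$ | OU | $-\delta_k\Delta_3$, $\delta = (0, \frac{1}{100}, \frac{1}{10})$
S10 | $t + \cos(2\pi t)$ | OU | $\delta_k\Delta_3$, $\delta = (0, \frac{1}{100}, \frac{1}{20})$
S11 | $\log(15t^2 + 10) + \cos(4\pi t)$ | GBM ($s_0 = 1$) | $\delta_k\Delta_3$, $\delta = (0, \frac{1}{2}, 2)$
S12 | $\pi^2(t^2 - \frac{1}{3})$ | GBM ($s_0 = 2$) | $\delta_k\Delta_3$, $\delta = (0, \frac{1}{2}, \frac{5}{2})$

$H_{k,d}: Y = \langle \mathcal{X}, \beta_k \rangle + \delta_d \Delta(\mathcal{X}) + \varepsilon$, where $\Delta$ is the deviation listed for the scenario and $\delta_d$ a value of $\delta$, with $\Delta_1(\mathcal{X}) = \|\mathcal{X}\|$, $\Delta_2(\mathcal{X}) = 25\int_0^1\int_0^1 \sin(2\pi t s)\, s(1 - s)\, t(1 - t)\, \mathcal{X}(s)\mathcal{X}(t)\,ds\,dt$ and $\Delta_3(\mathcal{X}) = \langle e^{-\mathcal{X}}, \mathcal{X}^2 \rangle$. The noise $\varepsilon$ is distributed as $\mathcal{N}(0, \sigma^2)$, where $\sigma^2$ is chosen such that $R^2$ under $H_0$ is 0.95.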
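A sketch of two ingredients of this design, under assumptions of our own rather than the exact simulation code: Brownian motion paths on an equispaced grid of $[0, 1]$ generated from independent Gaussian increments, and a noise variance chosen from the (population) variance $s^2$ of the signal $\langle \mathcal{X}_i, \beta \rangle$ so that $R^2 = s^2/(s^2 + \sigma^2)$ equals the target, i.e. $\sigma^2 = s^2(1 - R^2)/R^2$. The deviation $\Delta_1(\mathcal{X}) = \|\mathcal{X}\|$ is computed with the trapezoidal rule. Names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence


def brownian_motion(grid: Sequence[float], rng: random.Random) -> List[float]:
    """BM path on the grid (assumed to start at t = 0), with N(0, dt) increments."""
    path = [0.0]
    for t0, t1 in zip(grid, grid[1:]):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(t1 - t0)))
    return path


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    return sum((grid[k + 1] - grid[k]) * (values[k] + values[k + 1]) / 2.0
               for k in range(len(grid) - 1))


def delta1(path: Sequence[float], grid: Sequence[float]) -> float:
    """Deviation Delta_1(X) = ||X||, the L2[0, 1] norm of the path."""
    return math.sqrt(trapezoid([v * v for v in path], grid))


def noise_variance_for_r2(signal: Sequence[float], r2: float) -> float:
    """sigma^2 such that var(signal) / (var(signal) + sigma^2) = r2 (population variance)."""
    s2 = statistics.pvariance(signal)
    return s2 * (1.0 - r2) / r2
```

For a target $R^2 = 0.95$ the noise variance is $s^2/19$, i.e. about 5.3% of the signal variance.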

Page 38: Goodness–of–fit tests for regression models: the functional data case

Simulation study

Composite hypothesis

[Plots: twelve panels, one per scenario, each showing the functional coefficient β(t) on [0, 1] and sample paths of the underlying process.]

Figure : From right to left and top to bottom, functional coefficients β and underlying processes for the twelve different scenarios, labelled S1 to S12.

Page 39: Goodness–of–fit tests for regression models: the functional data case

Simulation study

Composite hypothesis

Hk,δ | KS 1 | KS 5 | KS 10 | KS 25 | KS 50 | CvM 1 | CvM 5 | CvM 10 | CvM 25 | CvM 50 | PCvM | Mean p
H1,0 | 0.055 | 0.046 | 0.035 | 0.028 | 0.033 | 0.055 | 0.049 | 0.044 | 0.041 | 0.039 | 0.046 | 3.417
H2,0 | 0.066 | 0.061 | 0.063 | 0.050 | 0.039 | 0.066 | 0.049 | 0.046 | 0.048 | 0.042 | 0.071 | 8.783
H3,0 | 0.056 | 0.057 | 0.052 | 0.044 | 0.043 | 0.056 | 0.048 | 0.052 | 0.054 | 0.045 | 0.062 | 8.243
H1,1 | 0.554 | 0.500 | 0.495 | 0.496 | 0.477 | 0.554 | 0.534 | 0.502 | 0.490 | 0.484 | 0.696 | 3.396
H2,1 | 0.339 | 0.299 | 0.299 | 0.286 | 0.279 | 0.339 | 0.323 | 0.303 | 0.300 | 0.285 | 0.447 | 8.748
H3,1 | 0.939 | 0.952 | 0.955 | 0.957 | 0.950 | 0.939 | 0.948 | 0.948 | 0.955 | 0.954 | 0.990 | 8.109

Table : Part I. Empirical sizes and powers for the competing tests with α = 0.05, n = 100, B = 5000, M = 1000 and estimation of β by data-driven PCs. Hk,δ represents the k-th model with δ deviation (δ = 0 for the null hypothesis). KS and CvM tests are computed with 1, 5, 10, 25 and 50 projections.

Page 40: Goodness–of–fit tests for regression models: the functional data case

Simulation study

Composite hypothesis

Hk,δ | KS 1 | KS 5 | KS 10 | KS 25 | KS 50 | CvM 1 | CvM 5 | CvM 10 | CvM 25 | CvM 50 | PCvM | Mean p
H1,2 | 0.991 | 0.998 | 0.999 | 1.000 | 1.000 | 0.991 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 3.275
H2,2 | 0.995 | 0.997 | 0.998 | 0.997 | 0.996 | 0.995 | 0.998 | 1.000 | 0.998 | 0.998 | 1.000 | 8.496
H3,2 | 0.993 | 1.000 | 1.000 | 1.000 | 1.000 | 0.993 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 7.673
H1,3 | 0.993 | 0.999 | 1.000 | 1.000 | 1.000 | 0.993 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 3.181
H2,3 | 0.995 | 0.999 | 1.000 | 1.000 | 1.000 | 0.995 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 7.871
H3,3 | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 0.997 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 6.131

Table : Part II. Empirical sizes and powers for the competing tests with α = 0.05, n = 100, B = 5000, M = 1000 and estimation of β by data-driven PCs. Hk,δ represents the k-th model with δ deviation (δ = 2 and δ = 3). KS and CvM tests are computed with 1, 5, 10, 25 and 50 projections.

Page 41: Goodness–of–fit tests for regression models: the functional data case

Real data application

[Plots: left, temperature (ºC) against day of the year; right, estimated β(t) against day of the year.]

Figure : AEMET temperatures and β. Response: average wind speed.

- $H_0: Y = \langle \mathcal{X}, \beta \rangle + \varepsilon$.

Number of projections | 10 | 25 | 50 | 100 | PCvM
p-value (CvM) | 0.512 | 0.580 | 0.700 | 0.719 | 0.119
p-value (KS) | 0.443 | 0.440 | 0.382 | 0.442 | -

- Simple hypothesis $H_0: \beta = 0$:

Test | RPCvM(50) | RPKS(50) | PCvM | F-test | Delsol et al.
p-value | 0.00 | 0.00 | 0.062 | 0.002 | 0.000

Page 42: Goodness–of–fit tests for regression models: the functional data case

Real data application

[Plots: absorbances, first derivative d(Absorbances, 1) and second derivative d(Absorbances, 2) against wavelength (nm), from 850 to 1050 nm.]

Figure : Tecator dataset. Absorbance, and first derivative by fat content.

- $H_0: Y = \langle \mathcal{X}, \beta \rangle + \varepsilon$: the p-value is 0.0000 (10 to 100 projections), evidence for rejecting the FLM.

- Significant dependence between $Y$ and $\mathcal{X}$, but not a linear one.

- Same results for the first and second derivatives ($\mathcal{X}'$, $\mathcal{X}''$).
- Same conclusions with the previous alternative test.

Page 43: Goodness–of–fit tests for regression models: the functional data case

Real data application

[Plots: left, SP500 index against sessions, 11/08/2008 to 04/08/2011; right, estimated β(t) against intraday time, 09:30 to 15:30.]

Figure : SP500 index and estimated functional coefficient β with 3 PLS components. Response: mean index for the next session.

- Idea: predict the mean of tomorrow's session ($Y_{t+1} = \bar{\mathcal{X}}_{t+1}$) with today's curve, $\mathcal{X}_t$.

Number of projections | 10 | 25 | 50 | 100 | PCvM
p-value (CvM) | 0.748 | 0.809 | 0.820 | 0.803 | 0.662
p-value (KS) | 0.530 | 0.627 | 0.644 | 0.604 | -

Page 44: Goodness–of–fit tests for regression models: the functional data case

Real data application

Computation time

[Plot: computation time (seconds) against sample size, 0 to 1000, for PCvM and for CvM and KS.]

Page 45: Goodness–of–fit tests for regression models: the functional data case

Real data application

Conclusions

- We have proposed a goodness–of–fit test for the functional linear model, based on an empirical process of random projections, as an alternative to the previous test of García–Portugués et al. (2014), with quite promising results.

- The test is easy to compute and simple to calibrate using the wild bootstrap, and it will be available through the R package fda.usc of Febrero–Bande and Oviedo–De la Fuente (2011).

- Natural extensions of this test include other forms of the regression function (several covariates, nonlinear forms, . . . ).

Page 46: Goodness–of–fit tests for regression models: the functional data case

Real data application

Thanks for your attention!
