Transcript
Page 1: Functional Regression Analysis

Functional Regression Analysis

Manuel Febrero-Bande

Dpt. de Estadística e Inv. Operativa, Univ. de Santiago de Compostela

Page 2: Functional Regression Analysis

Table of Contents

1 Linear Models: Basis representation, Principal Components, Partial Least Squares, Examples

2 Non Linear and Semi Linear Models: Non Linear, Semi Linear Model

3 Generalized Models: Generalized Linear Models, Generalized Additive Models

4 Examples: Tecator


Page 4: Functional Regression Analysis

Introduction

Suppose that X ∈ L²(T) and y ∈ ℝ. Assume also that E[X(t)] = 0 for all t ∈ T and E[y] = 0. The functional linear regression model states that

y = ⟨X, β⟩ + ε = ∫_T X(t)β(t)dt + ε

where β ∈ L²(T) and ε is the error term. One way of estimating β is to represent the parameter (and optionally the Xᵢ) in an L²-basis in the following way:

β(t) = ∑_k β_k θ_k(t),   Xᵢ(t) = ∑_k c_{i,k} ψ_k(t)

Page 5: Functional Regression Analysis

Representation in a basis

Given the observed sample {(X₁, y₁), …, (X_n, y_n)}, we can approximate Xᵢ and β using a finite sum of basis elements:

Xᵢ(t) = ∑_{k=1}^{K_x} c_{ik} ψ_k(t),   β(t) = ∑_{k=1}^{K_β} b_k θ_k(t)

X = CΨ(t),   β = θ′b

so that y = ⟨X, β⟩ + ε ≈ C J_{ψθ} b + ε = Zb + ε, with

b̂ = (Z′Z)⁻¹Z′y,   ŷ = C J_{ψθ} b̂ = Zb̂ = Z(Z′Z)⁻¹Z′y = Hy

where J_{ψθ} = (⟨ψᵢ, θⱼ⟩)_{ij}. The choice of an appropriate basis now becomes a crucial step.
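As a quick illustration, here is a minimal fda.usc sketch of this fixed-basis fit, assuming a hypothetical fdata object X and a scalar response y; the basis sizes are arbitrary, and the beta.est field is assumed to hold the estimated coefficient function as in the package's regression fits.

library(fda.usc)
# Fixed B-spline bases for X and for beta (sizes chosen only for illustration)
basis.x <- create.bspline.basis(rangeval = X$rangeval, nbasis = 21)
basis.b <- create.bspline.basis(rangeval = X$rangeval, nbasis = 7)
# Least-squares fit of y = <X, beta> + eps in the chosen bases
res.b <- fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b)
plot(res.b$beta.est)  # estimated beta(t)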

Page 6: Functional Regression Analysis

Estimation of β

Fixed basis: B-splines, Wavelets, Fourier. Ramsay and Silverman (2005), Ramsay and Silverman (2002), Cardot (2000), Cardot et al. (2003), Antoniadis and Sapatinas (2003), …

Functional Principal Components (FPC). Silverman (1996), Cardot et al. (1999), Cardot and Sarda (2005), Hall et al. (2006), Cardot et al. (2007), Yao and Lee (2005), …

Partial Least Squares (FPLS). Preda and Saporta (2005), Krämer et al. (2008), …

Page 7: Functional Regression Analysis

Principal components (PC)

The principal components of X are linear combinations given by the eigenfunctions {v_k}_{k≥1} of the covariance operator of X:

X(t) = ∑_k c_k v_k(t),   c_k = ⟨X, v_k⟩

where the v_k are the solutions of the eigenvalue equation

∫_T Σ(t, s) v_k(s) ds = λ_k v_k(t),   ⟨v_k, v_l⟩ = 1_{k=l},

and Σ(t, s) = Cov(X(s), X(t)) for all t, s ∈ T. As in the classical multivariate setting, the process X and the set of its principal eigenfunctions {v_k}_{k≥1} span the same linear space, so the PCs constitute an orthonormal basis of L².

Page 8: Functional Regression Analysis

Fitted, Residuals, Leverage

Once a Functional Linear Model is estimated, then

ŷᵢ = ⟨Xᵢ, β̂_(k_n)⟩ = ∑_{k=1}^{k_n} v_{ik} β̂_k = ∑_{k=1}^{k_n} v_{ik} (v′_{·k}Y)/(nλ_k)  ⟹  Ŷ = H_(k_n) Y

where H_(k_n) is the n × n hat matrix, given by:

H_(k_n) = (1/n) ( v_{·1}v′_{·1}/λ₁ + ⋯ + v_{·k_n}v′_{·k_n}/λ_{k_n} ).

So Cov(Ŷ | X₁, …, X_n) = σ²H_(k_n). The leverage (0 ≤ H_(k_n),ii ≤ 1) is an a priori measure of the influence of a given observation on prediction. As Tr(H_(k_n)) = k_n, we can flag those observations (Xᵢ, yᵢ) whose leverage is much larger than the average (k_n/n). The residuals can now be written in matrix form:

e = Y − Ŷ = (I_n − H_(k_n)) Y = v_(k_n+1:n) β_(k_n+1:n) + (I_n − H_(k_n)) ε,
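To make the leverage rule concrete, here is a base-R sketch (an illustration under stated assumptions, not the fda.usc code) that builds H_(k_n) from the first k_n principal component scores of the hypothetical n × T matrix Xc of centered, discretized curves:

pc <- prcomp(Xc, center = FALSE)
kn <- 5
n  <- nrow(Xc)
S  <- pc$x[, 1:kn, drop = FALSE]              # score vectors v_{.k}
H  <- S %*% diag(1 / colSums(S^2)) %*% t(S)   # colSums(S^2) plays the role of n*lambda_k
lev <- diag(H)                                # leverages; sum(lev) equals kn
which(lev > 3 * kn / n)                       # flag curves with leverage much larger than kn/n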

Page 9: Functional Regression Analysis

Residual Variance

Following Cardot et al. (2003) and Hall et al. (2006), the term v_(k_n+1:n) β_(k_n+1:n) can be neglected if n is large enough and k_n has been chosen suitably. Moreover, as Tr(I_n − H_(k_n)) = n − k_n, it is not difficult to see that:

E[e′e | X₁, …, X_n] = n ( β²_{k_n+1} λ_{k_n+1} + ⋯ + β²_n λ_n ) + (n − k_n)σ²,

which suggests that the error variance σ² may be estimated by the functional residual variance estimate, s²_R, given by:

s²_R = e′e / (n − k_n).
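A minimal base-R sketch of this estimate, reusing the hypothetical score matrix S from the previous sketch; the centered response is regressed on the k_n scores without intercept so that the residual degrees of freedom are n − k_n:

yc  <- y - mean(y)                       # centered response
fit <- lm(yc ~ S - 1)                    # regression on the first kn PC scores
e   <- residuals(fit)
s2R <- sum(e^2) / (length(y) - ncol(S))  # functional residual variance e'e / (n - kn)
s2R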

Page 10: Functional Regression Analysis

Measures of influence

The functional Cook's measure for prediction:

CPᵢ = ( ŷ − ŷ_(−i,k_n) )′ ( ŷ − ŷ_(−i,k_n) ) / s²_R,

The functional Cook's measure for estimation:

CEᵢ = ||β̂_(k_n) − β̂_(−i,k_n)||² / ( (s²_R/n) ∑_{k=1}^{k_n} 1/λ_k ),

The functional Peña's measure for prediction:

Pᵢ = s′ᵢ sᵢ / ( s²_R H_(k_n),ii ),

where sᵢ = ( ŷᵢ − ŷ_(−1,k_n),i, …, ŷᵢ − ŷ_(−n,k_n),i )′.

Page 11: Functional Regression Analysis

Example with PC's

t = seq(0, 1, length = nt <- 51)
covexp = function(t1, t2) {
    3 * exp(-abs(t1 - t2)/0.5)
}
Sigma = outer(t, t, covexp)
X = rproc2fdata(n <- 200, t, sigma = Sigma)
plot(X)

[Figure: sample of n = 200 Gaussian process trajectories X(t) on t ∈ [0, 1].]

Page 12: Functional Regression Analysis

Example with PC's cont'ed

res = eigen(Sigma)
pc5teo = fdata(t(res$vector[, 1:5]), argvals = t)  # Theo. PC's
pc5teo[["data"]] = sweep(pc5teo[["data"]], 1, norm.fdata(pc5teo), "/")
res.est = fdata2pc(X, ncomp = 5)  # Estimated PC's
pc5est = res.est$rotation

[Figure: the first five theoretical PCs (left) and the estimated PCs (right) as functions of t.]

Page 13: Functional Regression Analysis

FLM with PC's I

betaf = t + log(t + 0.1)
betaf = fdata(betaf, argvals = t)  # Theo. Beta
vteo = inprod.fdata(pc5teo, betaf)  # Theo. Coefs
vest = inprod.fdata(pc5est, betaf)  # Estim. coefs
comb.func = function(X, coefs) {
    t = X$argvals
    Xnew = sweep(X$data, 1, coefs, "*")
    Xnew = fdata(apply(Xnew, 2, sum), argvals = t, rangeval = X$rangeval, names = X$names)
    return(Xnew)
}
betapc5t = comb.func(pc5teo, vteo)
betapc5e = comb.func(pc5est, vest)
y = 4 + drop(inprod.fdata(X, betaf)) + rnorm(n, sd = 0.5)  # Simulated response
res.pc = fregre.pc(X, y, l = 1:5)

Page 14: Functional Regression Analysis

FLM with PC's II

[Figure: the theoretical β(t) and its projections (oracle theoretical PC(5), oracle estimated PC(5)), together with the estimate obtained from the data.]

Page 15: Functional Regression Analysis


FLM with PC's III

summary(res.pc)

> *** Summary Functional Data Regression with Principal Components ***

>

> Call:

> fregre.pc(fdataobj = X, y = y, l = 1:5)

>

> Residuals:

> Min 1Q Median 3Q Max

> -1.46463 -0.34188 -0.00754 0.36205 1.48351

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 4.03876 0.03721 108.553 < 2e-16 ***

> PC1 -0.12819 0.02836 -4.520 1.08e-05 ***

> PC2 -0.84670 0.04904 -17.265 < 2e-16 ***

> PC3 0.30974 0.08688 3.565 0.000458 ***

> PC4 -0.35799 0.10170 -3.520 0.000538 ***

> PC5 -0.11690 0.15306 -0.764 0.445917

> ---

> Signif. codes:

....

Page 16: Functional Regression Analysis

FLM with PC's IV

[Figure: diagnostic plots for res.pc: observed vs. fitted values (R-squared = 0.63), residuals vs. fitted values, scale-location, leverage of the curves, and a normal Q-Q plot and boxplot of the residuals.]

Page 17: Functional Regression Analysis


FLM with PC's V

....

>

> Residual standard error: 0.5262 on 194 degrees of freedom

> Multiple R-squared: 0.6349, Adjusted R-squared: 0.6255

> F-statistic: 67.46 on 5 and 194 DF, p-value: < 2.2e-16

>

>

> -With 5 Principal Components is explained 91.31 %

> of the variability of explicative variables.

>

> -Variability for each principal components -PC- (%):

> PC1 PC2 PC3 PC4 PC5

> 58.79 19.68 6.26 4.57 2.02

> -Names of possible atypical curves: No atypical curves

> -Names of possible influence curves:

Page 18: Functional Regression Analysis

FLM with PC's VI

[Figure: the same diagnostic plots as in FLM with PC's IV, shown again alongside the rest of the summary output.]

Page 19: Functional Regression Analysis

Choice of k_n I

To avoid a perfect fit, Cardot et al. (1999) proposed to estimate β by taking β̂_k = 0 for k ≥ k_n + 1, with 0 < k_n < n and λ_{k_n} > 0, and minimizing the residual sum of squares given by:

RSS(β_(1:k_n)) = ∑_{i=1}^n ( yᵢ − ∑_{k=1}^{k_n} c_{ik} β_k )² = || Y − c_(1:k_n) β_(1:k_n) ||²,

where Y = (y₁, …, y_n)′, β_(1:k_n) = (β₁, …, β_{k_n})′ and c_(1:k_n) is the n × k_n matrix whose k-th column is the vector c_{·k} = (c_{1k}, …, c_{nk})′, the k-th principal component score, which verifies c′_{·k}c_{·k} = nλ_k and c′_{·k}c_{·l} = 0 for k ≠ l. So,

β̂_(1:k_n) = ( c′_{·1}Y/(nλ₁), …, c′_{·k_n}Y/(nλ_{k_n}) )′,   β̂_(k_n) = ∑_{k=1}^{k_n} β̂_k v_k = ∑_{k=1}^{k_n} (c′_{·k}Y)/(nλ_k) v_k.

Page 20: Functional Regression Analysis

Choice of k_n II

The optimal k_n should be chosen taking into account the result by Hall et al. (2006), which establishes:

Hall et al. (2006)

E[ ||β − β̂_(k_n)||² | X ] = (σ²/n) ∑_{k=1}^{k_n} 1/λ_k + ∑_{k=k_n+1}^{∞} ⟨β, v_k⟩²

Predictive Cross-Validation (a leave-one-out sketch follows below):

PCV(k) = (1/n) ∑_{i=1}^n ( yᵢ − ⟨Xᵢ, β̂_(−i,k)⟩ )²,

Model Selection Criteria:

MSC(k) = log( (1/n) ∑_{i=1}^n ( yᵢ − ⟨Xᵢ, β̂_(k)⟩ )² ) + p_n k_n,

with p_n = 2 (AIC), p_n = 2n/(n − k − 2) (AICc), p_n = log(n)/n (SIC).
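As an illustration of the predictive cross-validation criterion, here is a base-R leave-one-out sketch over k, where S.full is a hypothetical n × k_max matrix of PC scores; fda.usc users would normally call fregre.pc.cv instead, as shown a few slides below.

pcv <- function(S.full, y, kmax = ncol(S.full)) {
  n <- length(y)
  sapply(1:kmax, function(k) {
    err <- sapply(1:n, function(i) {
      Si  <- S.full[, 1:k, drop = FALSE]
      fit <- lm(y[-i] ~ Si[-i, , drop = FALSE])             # fit without observation i
      y[i] - (coef(fit)[1] + sum(coef(fit)[-1] * Si[i, ]))  # leave-one-out prediction error
    })
    mean(err^2)                                             # PCV(k)
  })
}
# which.min(pcv(S.full, y)) selects the number of components minimizing PCV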

Page 21: Functional Regression Analysis

Example

res.pc3 = fregre.pc(X, y, l = 1:3)
res.pc7 = fregre.pc(X, y, l = 1:7)
basis.x = create.bspline.basis(c(0, 1), nbasis = 21)
basis.b5 = create.bspline.basis(c(0, 1), nbasis = 5)
basis.b7 = create.bspline.basis(c(0, 1), nbasis = 11)
res.basis5 = fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b5)
res.basis7 = fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b7)

[Figure: PC's-Basis example; the theoretical β(t) and the estimates with PC(3), PC(7), Spl(5) and Spl(11).]

Page 22: Functional Regression Analysis

PC Ridge Regression

Cardot et al. (2007) proposed to modify the estimation of β in order to improve its stability when terms corresponding to small eigenvalues are added to the model:

β̂_RR,(k_n) = ∑_{k=1}^{k_n} Cov(c_{·k}, y)/(λ_k + r_n) v_k,

where r_n > 0 is the ridge parameter.

E[ ||β − β̂_RR,(k_n)||² | X ] = (σ²/n) ∑_{k=1}^{k_n} λ_k/(λ_k + r_n)² + r_n² ∑_{k=1}^{k_n} ⟨β, v_k⟩²/(λ_k + r_n)² + ∑_{k=k_n+1}^{∞} ⟨β, v_k⟩²
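A base-R sketch of this ridge-regularized estimator, under the same assumptions as the earlier sketches (Xc are the centered discretized curves, y the response, and r_n a value chosen only for illustration); it simply applies the displayed formula to the empirical eigen-elements:

pc   <- prcomp(Xc, center = FALSE)
kn   <- 10
rn   <- 0.1                                              # ridge parameter r_n > 0
scr  <- pc$x[, 1:kn, drop = FALSE]                       # scores c_{.k}
covk <- drop(cov(scr, y))                                # Cov(c_{.k}, y)
lam  <- apply(scr, 2, var)                               # empirical eigenvalues lambda_k
beta.rr <- pc$rotation[, 1:kn] %*% (covk / (lam + rn))   # discretized beta_RR(t)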

Page 23: Functional Regression Analysis

Partial Least Squares (PLS) [Preda and Saporta (2005)]

The basic idea of the PLS approach is to construct a set of orthogonal random variables {ν_i}_{i≥1} in the linear space spanned by X, taking into account the covariance between y and X. The PLS components are obtained in the following iterative way (a minimal fda.usc sketch follows below):

1 Define y₀ = y − ȳ and X₀ = X − X̄, and let l = 0.
2 Let t_{l+1} = ⟨X_l, w_{l+1}⟩, where w_{l+1} ∈ L² is such that Cov(y_l, t_{l+1})² is maximal. Then w_{l+1} = Cov(y_l, X_l) / ||Cov(y_l, X_l)||.
3 Let y_{l+1} = y_l − u_{l+1}t_{l+1}, where u_{l+1} = Cov(y_l, t_{l+1})/Var[t_{l+1}], and X_{l+1} = X_l − ν_{l+1}t_{l+1}, where ν_{l+1} = Cov(X_l, t_{l+1})/Var[t_{l+1}].
4 Let l = l + 1 and go back to step 2.

Finally, X = X̄ + ∑_l t_l ν_l and y = ȳ + ∑_l u_l t_l + e.
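A minimal fda.usc sketch of the PLS fit for the simulated (X, y) used in the earlier example; the argument l and the returned beta.est field are assumed to behave as in fregre.pc.

library(fda.usc)
res.pls <- fregre.pls(X, y, l = 1:3)   # functional linear model on the first 3 PLS components
summary(res.pls)
plot(res.pls$beta.est)                 # estimated beta(t)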

Page 24: Functional Regression Analysis

MV PLS estimation I

Let X = (Xᵢ(τⱼ)) be the (n × T) matrix with the evaluations of the functional data at the discretization points {τⱼ}_{j=1}^T, and y the response matrix (n × p).

1 Select a non-zero weight vector w of length T (for example a row of X or the first PC) and normalize it.
2 Compute a score vector t = Xw; t is (n × 1).
3 Compute a y-loading vector q = y′t; q is (p × 1).
4 Compute a y-score vector u = yq; u is (n × 1).
5 Compute a new weight vector w₁ = X′u and normalize it.
6 If ||w − w₁|| < ε, convergence is reached; otherwise set w = w₁ and go to step 2.

The pair (t, u) are the scores for X and y, respectively. These six steps can be summarized as obtaining the first eigenvector of the matrices X′yy′X and XX′yy′. The components (p, b) for X and y are then computed in the following way (continued on the next slide):

Page 25: Functional Regression Analysis

MV PLS estimation II

7 Compute the loading vector p = X′t/(t′t).
8 Deflate X, computing X₁ = X − tp′.
9 Compute the regression of y onto t: b = y′t/(t′t).
10 Adjust y using b: y₁ = y − tb′.
11 If more components are needed, set X = X₁ and y = y₁ and go to step 1 (a base-R sketch of one such step follows below).
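The steps above can be condensed into a short base-R sketch for a single scalar response (p = 1); here X is a hypothetical n × T matrix of centered curve evaluations and y the centered response.

pls1.step <- function(X, y, tol = 1e-8, maxit = 100) {
  w <- X[1, ]; w <- w / sqrt(sum(w^2))                         # step 1: initial weight vector
  for (it in 1:maxit) {
    t.sc <- X %*% w                                            # step 2: X-scores
    q    <- drop(crossprod(y, t.sc))                           # step 3: y-loading
    u    <- y * q                                              # step 4: y-scores
    w1   <- drop(crossprod(X, u)); w1 <- w1 / sqrt(sum(w1^2))  # step 5: new weights
    if (sum((w - w1)^2) < tol) break                           # step 6: convergence check
    w <- w1
  }
  t.sc <- drop(X %*% w)
  p <- drop(crossprod(X, t.sc)) / sum(t.sc^2)                  # step 7: X-loading
  b <- sum(y * t.sc) / sum(t.sc^2)                             # step 9: regression of y on t
  list(w = w, t = t.sc, p = p, b = b,
       X.deflated = X - tcrossprod(t.sc, p),                   # step 8: deflate X
       y.deflated = y - t.sc * b)                              # step 10: adjust y
}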

Page 26: Functional Regression Analysis

Best selection of components I

res.pc.cv = fregre.pc.cv(X, y, 5)
res.pc.cv2 = fregre.pc.cv(X, y, 5, rn = seq(0, 0.5, len = 11), criteria = "CV")
res.basis.cv = fregre.basis.cv(X, y, basis.x = 13:17, basis.b = 5:11)
res.pls.cv = fregre.pls.cv(X, y, 4, criteria = "CV")

> Opt. PC: 2 1 4
> PCRR: 2 1 4 3 -
> Basis X 13 Basis B: 5
> PLS 1

Page 27: Functional Regression Analysis

Best selection of components II

[Figure: the theoretical β(t) together with the estimates selected by cross-validation: PC, PC ridge (PCRR), PLS and B-spline (Spl).]

Page 28: Functional Regression Analysis

Prediction

[Figure: observed vs. predicted response for the three fits. PC: r² = 0.655, s² = 0.236; PLS: r² = 0.548, s² = 0.306; B-spline: r² = 0.665, s² = 0.231.]

Page 29: Functional Regression Analysis

Remarks on FLM

Penalized versions of PC or PLS can also be applied, simply substituting {Xᵢ}_{i=1}^n by {X̃ᵢ}_{i=1}^n with X̃ᵢ = (I + λP)⁻¹ Xᵢ, where P is a penalization matrix.

Bootstrap methods can be adapted to test or study different aspects of the FLM:

res.boot = fregre.bootstrap(res.pc3, nb = 500, wild = FALSE)
lines(betaf, lwd = 2)

[Figure: bootstrap replicates of the estimated β(t) around the theoretical β.]

Page 30: Functional Regression Analysis

Bootstrap on Regression Models I

Fit the functional linear model to the dataset and obtain β̂, ŷᵢ, êᵢ, ….

Consider the statistic θ̂ you want to replicate.

If θ̂ depends only on the model and the errors are homoskedastic (β̂, r², s²_R, …) ⇒ obtain B standard bootstrap samples of size n from the dataset of sample curves (denoted by X^b_1, …, X^b_n, where X^b_i = X_{i*}). (Optional) Smooth the bootstrap samples of both sets of curves and residuals: obtain X^b_i = X^b_i + Z^b_i, where Z^b_i is a Gaussian process with zero mean and covariance operator γ_X Γ_X (0 ≤ γ_X ≤ 1).

If θ̂ depends on the model and on the i-th element, or the errors are heteroskedastic (ŷᵢ, IFᵢ, …) ⇒ fix X^b_i = Xᵢ.

Obtain B standard bootstrap samples of size n from the residuals (denoted by e^b = (e^b_1, …, e^b_n)′):

Homoskedasticity. Naive bootstrap (e^b_i = ê_{i*}) or smoothed bootstrap (e^b_i = ê_{i*} + z^b_i, where z^b_i is normally distributed with mean 0 and variance γ_e s²_R, 0 ≤ γ_e ≤ 1).

Page 31: Functional Regression Analysis

Bootstrap on Regression Models II

Heteroskedasticity. Wild bootstrap: e^b_i = f(êᵢ) v*_i, with

f(êᵢ) = êᵢ √(n/(n − k_n))  (Opt. 1),   êᵢ/√(1 − h_ii)  (Opt. 2),   êᵢ/(1 − h_ii)  (Opt. 3)

and

v*_i = −(√5 − 1)/2 with prob. (√5 + 1)/(2√5),   v*_i = (√5 + 1)/2 with prob. (√5 − 1)/(2√5)

(the golden rule).

Let {θ̂^b}_{b=1}^B be the statistic computed for each bootstrap dataset. The final estimates are:

Confidence interval: consider the (1 − α)-quantile c_{1−α} of {||θ̂^b − θ̂||}_{b=1}^B and define CI(1 − α) = { θ : ||θ − θ̂|| ≤ c_{1−α} }.

Hypothesis testing: p_θ = ∑_{b=1}^B 1{ ||θ̂^b|| ≤ ||θ̂|| } / B.
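A base-R sketch of the golden-rule multipliers used in the wild bootstrap (the two-point distribution displayed above), to be combined with one of the f(êᵢ) options:

rgolden <- function(n) {
  a <- (sqrt(5) - 1) / 2                   # value -a taken with probability (sqrt(5)+1)/(2*sqrt(5))
  b <- (sqrt(5) + 1) / 2                   # value  b taken with probability (sqrt(5)-1)/(2*sqrt(5))
  sample(c(-a, b), n, replace = TRUE, prob = c(b, a) / sqrt(5))
}
# e.boot <- f.e * rgolden(length(f.e))     # f.e: transformed residuals f(e_i), Opt. 1-3 above
mean(rgolden(1e5)); var(rgolden(1e5))      # approximately 0 and 1, as required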

Page 32: Functional Regression Analysis

Tecator dataset

Figure: Tecator example. From left to right: absorbances, first and second derivative of the spectrometric curves against wavelength, coloured by the content of fat (blue = low, red = high).

Page 33: Functional Regression Analysis


Tecator example

data(tecator)

ab = tecator$absorp.fdata

ab2 = fdata.deriv(ab, 2)

dataf = as.data.frame(tecator$y) # Fat, Protein, Water

tt = ab[["argvals"]]

b.pc0 = create.pc.basis(ab, 1:4)

b.pc2 = create.pc.basis(ab2, 1:4)

basis.x = list(ab = b.pc0, ab2 = b.pc2)

f = Fat ~ ab + ab2

ldata = list(df = dataf, ab = ab, ab2 = ab2)

res = fregre.lm(f, ldata, basis.x = basis.x)

Page 34: Functional Regression Analysis


Tecator results

summary(res)

>

> Call:

> lm(formula = pf, data = XX, x = TRUE)

>

> Residuals:

> Min 1Q Median 3Q Max

> -10.8067 -1.9219 0.2561 1.8306 9.0273

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 18.14233 0.20772 87.342 < 2e-16 ***

> ab.PC1 0.15511 0.08402 1.846 0.06633 .

> ab.PC2 4.70801 1.52557 3.086 0.00231 **

> ab.PC3 -13.37410 4.58308 -2.918 0.00391 **

> ab.PC4 0.26779 2.46191 0.109 0.91349

> ab2.PC1 3437.06617 386.05052 8.903 2.85e-16 ***

> ab2.PC2 2688.52106 1525.50024 1.762 0.07949 .

> ab2.PC3 932.68030 432.69736 2.156 0.03228 *

> ab2.PC4 628.03681 767.97070 0.818 0.41442

> ---

> Signif. codes:

> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

>

> Residual standard error: 3.046 on 206 degrees of freedom

> Multiple R-squared: 0.945, Adjusted R-squared: 0.9428

....

Page 35: Functional Regression Analysis


Tecator results II

summary(res)

....

> ab2.PC4 628.03681 767.97070 0.818 0.41442

> ---

> Signif. codes:

> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

>

> Residual standard error: 3.046 on 206 degrees of freedom

> Multiple R-squared: 0.945, Adjusted R-squared: 0.9428

....

Page 36: Functional Regression Analysis

Tecator Diagnosis I

[Figure: standard lm diagnostic plots for the Tecator fit (residuals vs. fitted, normal Q-Q and scale-location), with observations 7, 43 and 44 highlighted.]

Page 37: Functional Regression Analysis

Tecator Diagnosis II

[Figure: the spectrometric curves and their second derivative, together with the estimated functional coefficients: β̂ for ab (r² = 0.218) and β̂ for ab2 (r² = 0.707).]

Page 38: Functional Regression Analysis

Table of Contents

1 Linear Models: Basis representation, Principal Components, Partial Least Squares, Examples

2 Non Linear and Semi Linear Models: Non Linear, Semi Linear Model

3 Generalized Models: Generalized Linear Models, Generalized Additive Models

4 Examples: Tecator

Page 39: Functional Regression Analysis

Non Linear Model [Ferraty and Vieu (2006)]

Suppose (X, y) is a pair of r.v. with y ∈ ℝ and X ∈ E, where E is a semi-metric space. To predict the response y from X, the natural estimator is the conditional expectation:

m(X) = E(Y | X = X),

estimated by the Nadaraya-Watson (NW) estimator:

m̂(X) = ∑_{i=1}^n Yᵢ K(h⁻¹ d(X, Xᵢ)) / ∑_{i=1}^n K(h⁻¹ d(X, Xᵢ)),

where K is an asymmetric kernel function and h is the bandwidth parameter. The bandwidth can be chosen by cross-validation, h_opt = argmin CV(h), with

CV(h) = ∑_{i=1}^n ( yᵢ − m̂_(−i)(Xᵢ) )²,

or by any of the GCV methods (MSC).
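A base-R sketch of the NW estimator for a fixed bandwidth, assuming D is a precomputed n × n matrix of semi-metric distances d(Xᵢ, Xⱼ) (e.g. obtained with fda.usc's metric.lp) and using a Gaussian-type kernel on [0, ∞):

nw.fit <- function(D, y, h) {
  K <- exp(-0.5 * (D / h)^2)       # kernel evaluated at the nonnegative argument d/h
  drop(K %*% y) / rowSums(K)       # fitted values m.hat(X_i)
}
# Leave-one-out cross-validation over a grid of bandwidths
cv.h <- function(D, y, hs) {
  sapply(hs, function(h) {
    K <- exp(-0.5 * (D / h)^2); diag(K) <- 0     # drop the i-th observation from its own fit
    mean((y - drop(K %*% y) / rowSums(K))^2)
  })
}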

Page 40: Functional Regression Analysis

Conditional distribution

Another alternative is to use the conditional cumulative distribution

F_{Y|X=X}(y) = F^X_Y(y) = P(Y ≤ y | X = X)

and to compute from it, for example, the median or the quantiles:

med(X) = inf{ y ∈ ℝ : F_{Y|X=X}(y) ≥ 1/2 },   t_α(X) = inf{ y ∈ ℝ : F_{Y|X=X}(y) ≥ α }.

Page 41: Functional Regression Analysis

Asymptotics

Conditions for the regression function:

m : E → ℝ,   lim_{d(X′,X)→0} m(X′) = m(X),
m : E → ℝ,   |m(X) − m(X′)| < C d(X′, X)^β

Conditions for the conditional distribution:

F : E × ℝ → ℝ,   lim_{d(X′,X)→0} F^{X′}_Y(y) = F^X_Y(y),   lim_{d(y′,y)→0} F^X_Y(y′) = F^X_Y(y)
F : E × ℝ → ℝ,   |F^{X′}_Y(y′) − F^X_Y(y)| < C( d(X′, X)^β + d(y′, y)^β )

In addition, the small ball probability condition is needed, P(X ∈ B(X, ε)) = φ_X(ε) > 0, together with the existence of conditional moments of order greater than 2.

Page 42: Functional Regression Analysis

Semi Linear Model [Aneiros-Pérez and Vieu (2006)]

Let (X, Z, y) with y ∈ ℝ (response), X ∈ E (functional) and Z ∈ ℝ^p (multivariate covariates):

y = Zβ + m(X) + ε

The parameters of the model are estimated by:

β̂_h = (Z̃′_h Z̃_h)⁻¹ Z̃′_h ỹ_h,   m̂_h(X) = ∑_{i=1}^n W_{nh}(X, Xᵢ)(yᵢ − Z′ᵢ β̂_h)

where Z̃_h = (I − W_h)Z, ỹ_h = (I − W_h)y, W_h = (W_{nh}(Xᵢ, Xⱼ))_{ij} and

W_{nh}(X, Xᵢ) = K(d(X, Xᵢ)/h) / ∑_{j=1}^n K(d(X, Xⱼ)/h).
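A minimal fda.usc sketch of this semi-functional partial linear fit on the Tecator data used later in these slides; the formula and argument usage are assumed to follow the package's fregre.plm interface, with Water as the scalar covariate Z and the second derivative ab2 as the functional covariate X.

library(fda.usc)
data(tecator)
ab2   <- fdata.deriv(tecator$absorp.fdata, 2)
ldata <- list(df = as.data.frame(tecator$y), ab2 = ab2)
res.plm <- fregre.plm(Fat ~ Water + ab2, data = ldata)   # y = Z beta + m(X) + eps
res.plm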

Page 43: Functional Regression Analysis

Diagnosis, Residuals, Influence

Fitted values: Ŷ = H_X Y, where H_X is the projection or smoothing matrix (n × n).
Residuals: e = (I − H_X)Y.
Equivalent degrees of freedom: df(H_X) = tr(H_X).
Cov(Ŷ | X₁, …, X_n) = σ²H_X.
Residual variance: s²_R = e′e / (n − df(H_X)).
Influence: 0 ≤ H_X,ii ≤ 1, so we can label those observations (Xᵢ, yᵢ) with more influence than the average, e.g. H_X,ii > 3 df(H_X)/n.

Page 44: Functional Regression Analysis


Tecator example I

fat = tecator$y$Fat

res.np = fregre.np(ab2, fat, h = 5e-04)

summary(res.np)

> *** Summary Functional Non-linear Model ***

>

> -Call: fregre.np(fdataobj = ab2, y = fat, h = 5e-04)

>

> -Bandwidth (h): 5e-04

> -R squared: 0.9928937

> -Residual variance: 1.626762 on 151.737 degrees of freedom

> -Names of possible atypical curves: No atypical curves

> -Names of possible influence curves: 5 6 7 10 11 31 33 34 35 43

> It prints only the 10 most influence curves

Page 45: Functional Regression Analysis

Tecator example II

[Figure: diagnostic plots for the non-parametric fit: observed vs. fitted values (R-squared = 0.99), residuals vs. fitted values, scale-location, leverage of the curves with the most influential curves labelled, and a normal Q-Q plot and boxplot of the residuals.]

Page 46: Functional Regression Analysis

Table of Contents

1 Linear Models: Basis representation, Principal Components, Partial Least Squares, Examples

2 Non Linear and Semi Linear Models: Non Linear, Semi Linear Model

3 Generalized Models: Generalized Linear Models, Generalized Additive Models

4 Examples: Tecator

Page 47: Functional Regression Analysis

Generalized Linear Models

Let y belong to an exponential family with density:

f(y; θ, τ) = h(y; τ) exp( (b(θ)T(y) − A(θ)) / d(τ) )

where h(y; τ), b(θ), T(y), A(θ) and d(τ) are known. In this case, E(Y) = μ = A′(θ) and Var(Y) = A′′(θ)d(τ). The response y is related to a covariate X (or X) through a linear predictor η = Xβ (resp. ⟨X, β⟩) and a link function g such that E(y) = μ = g⁻¹(η).

Distribution | Link function | Mean | Variance
Normal | Identity: η = μ | μ = η | 1
Binomial | Logit: η = ln(μ/(1 − μ)) | μ = 1/(1 + exp(−η)) | μ(1 − μ)
Poisson | Log: η = ln(μ) | μ = exp(η) | μ
Gamma | Inverse: η = 1/μ | μ = 1/η | μ²

Page 48: Functional Regression Analysis

Estimation of η

Typically, to estimate η, project X and β onto a finite number of elements of a functional basis:

η = ⟨X, β⟩ ≈ ∑_{i=1}^{p_X} ∑_{j=1}^{p_β} xᵢ ⟨φᵢ, ψⱼ⟩ βⱼ = x′Jβ

with X(t) = ∑_{i=1}^{p_X} xᵢ φᵢ(t) and β(t) = ∑_{j=1}^{p_β} βⱼ ψⱼ(t).

Fixed basis: B-splines, Wavelets, Fourier. James (2002), …

Functional Principal Components (FPC). Cardot and Sarda (2005); Escabias et al. (2004, 2005); Müller and Stadtmüller (2005), …

Partial Least Squares (FPLS). Preda and Saporta (2005), Escabias et al. (2007), …

Page 49: Functional Regression Analysis

Estimation of Generalized Linear Models

Iteratively Reweighted Least Squares (IRLS); a base-R sketch follows below.

Let η₀ = Xβ₀ (resp. ⟨X, β₀⟩) be the initial or current estimate of the linear predictor, with fitted value μ₀ = g⁻¹(η₀).
Form the adjusted dependent variate z₀ = η₀ + (y − μ₀)g′(μ₀).
Define the weights W₀ = 1/(Var[μ₀] g′(μ₀)²).
Regress z₀ on the covariates X with weights W₀ to obtain new estimates β₀ (and hence η₀, μ₀).
Repeat until the changes in the parameters and/or the deviance are small.
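A base-R sketch of the IRLS loop for a binomial (logit) functional GLM acting on a design matrix Z made of an intercept plus k_n PC scores (hypothetical inputs; in practice fregre.glm, shown later, does this internally via glm):

irls.logit <- function(Z, yb, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(Z))
  for (it in 1:maxit) {
    eta <- drop(Z %*% beta)
    mu  <- 1 / (1 + exp(-eta))                  # inverse logit link
    W   <- mu * (1 - mu)                        # weights 1 / (Var(mu) g'(mu)^2)
    z   <- eta + (yb - mu) / W                  # adjusted dependent variate
    beta.new <- solve(crossprod(Z * W, Z), crossprod(Z * W, z))  # weighted LS step
    if (max(abs(beta.new - beta)) < tol) break
    beta <- drop(beta.new)
  }
  beta
}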

Page 50: Functional Regression Analysis

Generalized Additive Models (MV)

As in the GLM, the response variable y is modelled through a sum of smooth functions of the covariates X and a link function g:

E(y) = μ = g⁻¹( β₀ + ∑_{j=1}^K fⱼ(Xⱼ) )

with Xⱼ the columns of X and E(fⱼ(Xⱼ)) = 0.

ESTIMATION: IRLS combined with BACKFITTING steps (a minimal backfitting sketch follows after this list).

Let η₀ = β₀ + ∑_{j=1}^K fⱼ(Xⱼ) be the initial or current estimate of the linear predictor, with fitted value μ₀ = g⁻¹(η₀).
Form the adjusted dependent variate z₀ = η₀ + (y − μ₀)g′(μ₀).
Define the weights W₀ = 1/(V(μ₀)g′(μ₀)²).
Regress z₀ on the covariates X with weights W₀ using backfitting steps.
Repeat until the changes in the functions and/or the deviance are small.
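To illustrate the backfitting step in isolation, here is a base-R sketch for a Gaussian response (identity link) and two smooth terms fitted with loess; the smoother and span are arbitrary choices for illustration only.

backfit2 <- function(x1, x2, y, span = 0.5, maxit = 20) {
  b0 <- mean(y)
  f1 <- rep(0, length(y)); f2 <- rep(0, length(y))
  for (it in 1:maxit) {
    f1 <- fitted(loess(y - b0 - f2 ~ x1, span = span)); f1 <- f1 - mean(f1)  # update f1 on partial residuals
    f2 <- fitted(loess(y - b0 - f1 ~ x2, span = span)); f2 <- f2 - mean(f2)  # update f2 on partial residuals
  }
  list(b0 = b0, f1 = f1, f2 = f2)
}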

Page 51: Functional Regression Analysis

Functional Spectral Additive Models [Müller and Yao (2008)]

Consider the PC representation of X:

X(t) = μ(t) + ∑_k x_k v_k(t)

where v_k(t) is the k-th eigenfunction and the x_k are the scores. Then the Functional Spectral Additive Model is defined as:

Y = β₀ + ∑_{k=1}^K f_k(x_k) + ε

with E(ε) = 0, Var[ε] = σ² and E(f_k(x_k)) = 0 for all k = 1, 2, …, K.

Page 52: Functional Regression Analysis

Functional Generalized Spectral Additive Models

Consider (again) the PC representation of X (or another representation):

X(t) = μ(t) + ∑_k x_k v_k(t)

where v_k(t) is the k-th eigenfunction and the x_k are the scores. Then the Functional Generalized Spectral Additive Model is defined to verify:

E(y) = g⁻¹( β₀ + ∑_{k=1}^K f_k(x_k) )

with E(f_k(x_k)) = 0 for all k = 1, 2, …, K.

Page 53: Functional Regression Analysis

Functional Generalized Kernel Additive Models [Febrero-Bande and González-Manteiga (2013)]

Given several functional variables (X¹, d₁), …, (X^p, d_p) (where each dⱼ is a semi-metric), the Functional Generalized Kernel Additive Model is defined to verify:

E(y) = μ = g⁻¹( β₀ + ∑_{k=1}^p f_k(X^k) )

with E(f_k(X^k)) = 0 for all k = 1, 2, …, p. In the backfitting step, the functional non-parametric (kernel) estimator is used:

f̂_k(X^k_0) = ∑_{i=1}^N ( yᵢ − β̂₀ − ∑_{j≠k} f̂ⱼ(X^j_i) ) K(d_k(X^k_0, X^k_i)/h_k) / ∑_{j=1}^N K(d_k(X^k_0, X^k_j)/h_k).

Page 54: Functional Regression Analysis

Practical considerations

Our model only uses distances between data → spaces other than L² can be handled.

How to avoid concurvity in FDA? The distance correlation proposed by Székely et al. (2007) works in practice, although it has not yet been proved for FDA.

Avoiding overfitting: control the global amount of smoothing at each step (GCV).

Convergence: using Buja et al. (1989), global convergence is ensured, as well as the oracle property.

The boundary effect in FDA is closely related to small ball probabilities. Are your data closely surrounded by neighbours under your chosen semi-metrics?

Page 55: Functional Regression Analysis

Table of Contents

1 Linear Models: Basis representation, Principal Components, Partial Least Squares, Examples

2 Non Linear and Semi Linear Models: Non Linear, Semi Linear Model

3 Generalized Models: Generalized Linear Models, Generalized Additive Models

4 Examples: Tecator

Page 56: Functional Regression Analysis

fda.usc [Febrero-Bande and Oviedo de la Fuente (2012)]

Let fat, ab, ab1 and ab2 be the response and the covariates.

ldata=list(df=data.frame(fat=fat), ab=ab, ab1=ab1, ab2=ab2)
b.pc0=create.pc.basis(ab,1:4)
b.pc1=create.pc.basis(ab1,1:4)
b.pc2=create.pc.basis(ab2,1:4)
basis.x=list(ab=b.pc0, ab1=b.pc1, ab2=b.pc2)

Correlation distances [Székely et al. (2007)]:

R        | d2(fat) | d2(X)  | d2(X′) | d2(X′′)
d2(fat)  | 1.000   | 0.454  | 0.886  | 0.956
d2(X)    | 0.454   | 1.000  | 0.669  | 0.497
d2(X′)   | 0.886   | 0.669  | 1.000  | 0.930
d2(X′′)  | 0.956   | 0.497  | 0.930  | 1.000

Page 57: Functional Regression Analysis

FGLM

res.glm=fregre.glm(fat~ab+ab2, data=ldata, basis.x=basis.x)

              Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)    18.14233     0.20772   87.342   < 2e-16 ***
ab.PC1          0.15511     0.08402    1.846   0.06633 .
ab.PC2          4.70801     1.52557    3.086   0.00231 **
ab.PC3        -13.37410     4.58308   -2.918   0.00391 **
ab.PC4          0.26779     2.46191    0.109   0.91349
ab2.PC1      3437.06617   386.05052    8.903  2.85e-16 ***
ab2.PC2      2688.52106  1525.50024    1.762   0.07949 .
ab2.PC3       932.68030   432.69736    2.156   0.03228 *
ab2.PC4       628.03681   767.97070    0.818   0.41442

Residual standard error: 3.046 on 206 d.f.
Multiple R-squared: 0.945, Adjusted R-squared: 0.9428
F-statistic: 442.3 on 8 and 206 DF, p-value: < 2.2e-16

cor(fat, ⟨β̂₁, ab⟩)² = 21.8%,   cor(fat, ⟨β̂₂, ab2⟩)² = 70.7%

Page 58: Functional Regression Analysis

Figure: Tecator example. Estimation of the beta parameters: spectrometric curves (absorbances and second derivative) and the corresponding estimated β(t) for ab and ab2.

Page 59: Functional Regression Analysis

GSAM

res.gsam=fregre.gsam(fat~s(ab)+s(ab2), data=ldata, basis.x=basis.x)

Parametric coefficients:  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)               18.14233     0.05041    359.9    <2e-16 ***

Approximate significance of smooth terms:
              edf   Ref.df        F   p-value
s(ab.PC1)   5.548    6.654    4.696  0.000111 ***
s(ab.PC2)   1.000    1.000   27.491  4.40e-07 ***
s(ab.PC3)   1.980    2.536   17.891  8.23e-09 ***
s(ab.PC4)   7.127    8.126    4.471  5.38e-05 ***
s(ab2.PC1)  7.115    8.110  242.865   < 2e-16 ***
s(ab2.PC2)  7.381    8.305    5.004  1.03e-05 ***
s(ab2.PC3)  8.276    8.797    5.052  5.61e-06 ***
s(ab2.PC4)  5.986    7.130    7.532  4.52e-08 ***

R-sq.(adj) = 0.997   Deviance explained = 99.7%
GCV score = 0.6927   Scale est. = 0.54638   n = 215

cor(fat, ∑_{k=1}^K f_k(x^ab_k))² = 35.2%,   cor(fat, ∑_{k=1}^K f_k(x^ab2_k))² = 89.6%

Page 60: Functional Regression Analysis

GKAM

res.gkam=fregre.gkam(fat~s(ab)+s(ab2), data=ldata)

alpha = 18.2   n = 215   Converged? Yes   Iterations: 4

Smoothed terms:
           h          cor(f(X), eta)   edf
f(ab2)     0.000371   1.000            88.7
f(ab)      9.410000   0.409             1.6

Residual deviance = 116.361   Null deviance = 34735.44
AIC = 662.88   Deviance explained = 99.7%
R-sq. = 0.997   R-sq.(adj) = 0.994

cor(fat, f₁(ab))² = 16.9%,   cor(fat, f₂(ab2))² = 99.6%

Page 61: Functional Regression Analysis

Figure: Diagnostic plots for the Tecator example: residuals and absolute residuals against Fat for the GLM, GSAM and GKAM fits.

Page 62: Functional Regression Analysis

Figure: Fitted values for the Tecator example: scatterplot matrix of Fat against the fitted values of the three models (FGLM: Est. 5.5%, Pr. CV 1.91; GSAM: Est. 0.3%, Pr. CV 0.59; FGKAM: Est. 0.007%, Pr. CV 0.83).

Page 63: Functional Regression Analysis

Bernoulli response: I(Fat ≥ 15%). 165 random observations as training set (50 for testing).

Method  Sample   Min.     1st Qu.  Median   Mean    3rd Qu.   Max.
GLM     Train.   100%     100%     100%     100%    100%      100%
        Test      88.0%    96.0%    98.0%    97.5%   98.0%    100%
GSAM    Train.   100.0%   100.0%   100.0%   100%    100%      100%
        Test      54.0%    92.0%    94.0%    93.8%   98.0%    100%
GKAM    Train.    97.58%   98.18%   98.8%    98.7%   98.8%    100%
        Test      90.0%    96.0%    98.0%    97.9%  100.0%    100%

Table: Statistics for the percentage of correct classification over 500 replications.

Page 64: Functional Regression Analysis

Figure: Estimation of the partial effects: μ against η, and η against f₁(abs) and f₂(abs″) (gray = I(Fat ≥ 15%)).

Page 65: Functional Regression Analysis

References I

Aneiros-Pérez, G. and Vieu, P. (2006). Semi-functional partial linear regression. Statistics & Probability Letters, 76(11):1102-1110.

Antoniadis, A. and Sapatinas, T. (2003). Wavelet methods for continuous-time prediction using Hilbert-valued autoregressive processes. Journal of Multivariate Analysis, 87(1):133-158.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. The Annals of Statistics, 17(2):453-510.

Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics, 12(4):503-538.

Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional linear model. Statistics & Probability Letters, 45(1):11-22.

Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13(3):571-592.

Cardot, H., Mas, A., and Sarda, P. (2007). CLT in functional linear regression models. Probability Theory and Related Fields, 138(3):325-361.

Cardot, H. and Sarda, P. (2005). Estimation in generalized linear models for functional data via penalized likelihood. Journal of Multivariate Analysis, 92(1):24-41.

Page 66: Functional Regression Analysis

References II

Escabias, M., Aguilera, A., and Valderrama, M. (2004). Principal component estimation of functional logistic regression: discussion of two different approaches. Journal of Nonparametric Statistics, 16(3-4):365-384.

Escabias, M., Aguilera, A., and Valderrama, M. (2005). Modeling environmental data by functional principal component logistic regression. Environmetrics, 16(1):95-107.

Escabias, M., Aguilera, A., and Valderrama, M. (2007). Functional PLS logit regression model. Computational Statistics & Data Analysis, 51(10):4891-4902.

Febrero-Bande, M. and González-Manteiga, W. (2013). Generalized additive models for functional data. TEST, pages 1-15. doi:10.1007/s11749-012-0308-0.

Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). fda.usc: Functional Data Analysis. Utilities for Statistical Computing. R package version 1.0.0.

Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer.

Hall, P., Müller, H., and Wang, J. (2006). Properties of principal component methods for functional and longitudinal data analysis. The Annals of Statistics, 34(3):1493-1517.

James, G. (2002). Generalized linear models with functional predictors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):411-432.

Krämer, N., Boulesteix, A., and Tutz, G. (2008). Penalized partial least squares with applications to B-spline transformations and functional data. Chemometrics and Intelligent Laboratory Systems, 94(1):60-69.

Page 67: Functional Regression Analysis

References III

Müller, H. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics, 33(2):774-805.

Müller, H. and Yao, F. (2008). Functional additive models. Journal of the American Statistical Association, 103(484):1534-1544.

Preda, C. and Saporta, G. (2005). PLS regression on a stochastic process. Computational Statistics & Data Analysis, 48(1):149-158.

Ramsay, J. and Silverman, B. (2002). Applied Functional Data Analysis: Methods and Case Studies, volume 77. Springer, New York.

Ramsay, J. and Silverman, B. (2005). Functional Data Analysis. Springer.

Silverman, B. (1996). Smoothed functional principal components analysis by choice of norm. The Annals of Statistics, 24(1):1-24.

Székely, G., Rizzo, M., and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769-2794.

Yao, F. and Lee, T. (2005). Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):3-25.