Functional Regression Analysis


Manuel Febrero-Bande

Dpt. de Estadística e Inv. Operativa, Univ. de Santiago de Compostela


Table of Contents

1 Linear Models: Basis representation, Principal Components, Partial Least Squares, Examples

2 Non Linear and Semi Linear Models: Non Linear, Semi Linear Model

3 Generalized Models: Generalized Linear Models, Generalized Additive Models

4 Examples: Tecator




Introduction

Suppose that $X \in L^2(T)$ and $y \in \mathbb{R}$. Assume also that $E[X(t)] = 0$ for all $t \in [0,T]$ and $E[y] = 0$. The functional linear regression model states that

$$y = \langle X, \beta\rangle + \varepsilon = \int_T X(t)\beta(t)\,dt + \varepsilon$$

where $\beta \in L^2(T)$ and $\varepsilon$ is the error term. One way of estimating $\beta$ is to represent the parameter (and optionally $X_i$) in an $L^2$-basis in the following way:

$$\beta(t) = \sum_k \beta_k\theta_k(t), \qquad X_i(t) = \sum_k c_{i,k}\psi_k(t)$$


Representation in a basis

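Given the observed sample $\{(X_1,y_1),\dots,(X_n,y_n)\}$, we can approximate $X_i$ and $\beta$ using a finite sum of basis elements:

$$X_i(t) = \sum_{k=1}^{K_x} c_{ik}\psi_k(t), \qquad \beta(t) = \sum_{k=1}^{K_\beta} b_k\theta_k(t)$$

In matrix form, $X = C\Psi(t)$ and $\beta = \theta' b$, so that

$$y = \langle X,\beta\rangle + \varepsilon \approx C J_{\psi\theta}\, b + \varepsilon = Zb + \varepsilon$$

$$\hat b = (Z'Z)^{-1}Z'y, \qquad \hat y = C J_{\psi\theta}\hat b = Z\hat b = Z(Z'Z)^{-1}Z'y = Hy$$

with $J_{\psi\theta} = (\langle\psi_i,\theta_j\rangle)_{ij}$ and $Z = CJ_{\psi\theta}$. The choice of an appropriate basis thus becomes a crucial step.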
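The least-squares step above can be sketched in base R. This is only an illustration under stated assumptions: the curves are simulated Brownian-like paths on a common grid, inner products are approximated by a Riemann sum, the curves are used directly on the grid (without a basis for $X$), and the cosine basis for $\beta$ is an arbitrary choice. All object names are hypothetical.

```r
set.seed(1)
n  <- 100
tt <- seq(0, 1, length.out = 101)
dt <- tt[2] - tt[1]                          # Riemann-sum quadrature weight

# Hypothetical sample: Brownian-like paths evaluated on the grid
X <- t(apply(matrix(rnorm(n * length(tt)), n), 1, cumsum)) * sqrt(dt)

# Basis for beta: first Kb cosine functions (illustrative choice only)
Kb    <- 5
Theta <- sapply(0:(Kb - 1), function(k) cos(k * pi * tt))   # grid x Kb

beta.true <- 2 * cos(pi * tt) - cos(2 * pi * tt)            # lies in span(Theta)
y <- drop(X %*% beta.true) * dt + rnorm(n, sd = 0.05)

Z        <- (X %*% Theta) * dt               # Z[i, j] approximates <X_i, theta_j>
b.hat    <- drop(solve(crossprod(Z), crossprod(Z, y)))      # (Z'Z)^{-1} Z'y
beta.hat <- drop(Theta %*% b.hat)            # estimated beta(t) on the grid
H        <- Z %*% solve(crossprod(Z), t(Z))  # hat matrix, y.hat = H y
y.hat    <- drop(H %*% y)
```

Here `H` is a projection matrix whose trace equals the number of basis functions used for $\beta$, which is what makes the choice of basis size the key tuning decision.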


Estimation of β

Fixed basis: B-spline, Wavelets, Fourier. Ramsay and Silverman (2005), Ramsay and Silverman (2002), Cardot (2000), Cardot et al. (2003), Antoniadis and Sapatinas (2003), ...

Functional Principal Components (FPC). Silverman (1996), Cardot et al. (1999), Cardot and Sarda (2005), Hall et al. (2006), Cardot et al. (2007), Yao and Lee (2005), ...

Partial Least Squares (FPLS). Preda and Saporta (2005), Krämer et al. (2008), ...


Principal components (PC)

The principal components of $X$ are linear combinations given by the eigenfunctions $\{v_k\}_{k\ge 1}$ of the covariance operator of $X$:

$$X(t) = \sum_k c_k v_k(t), \qquad c_k = \langle X, v_k\rangle$$

where the $v_k$ are the solutions of the eigenvalue equation

$$\int_T \Sigma(t,s)v_k(s)\,ds = \lambda_k v_k(t), \qquad \langle v_k, v_l\rangle = 1_{\{k=l\}},$$

and $\Sigma(t,s) = \mathrm{Cov}(X(s),X(t))$ for all $t,s\in[0,T]$. As in the classical multivariate setting, the process $X$ and the set of its principal eigenfunctions $\{v_k\}_{k\ge 1}$ span the same linear space. So the PCs constitute an orthonormal basis of $L^2$.


Fitted, Residuals, Leverage

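Once a Functional Linear Model is estimated, then

$$\hat y_i = \langle X_i, \hat\beta_{(k_n)}\rangle = \sum_{k=1}^{k_n} v_{ik}\hat\beta_k = \sum_{k=1}^{k_n} v_{ik}\frac{v_{\cdot k}'Y}{n\lambda_k} \;\longrightarrow\; \hat Y = H_{(k_n)}Y$$

where $v_{\cdot k} = (v_{1k},\dots,v_{nk})'$ is the vector of scores of the sample on the $k$-th principal component and $H_{(k_n)}$ is the $n \times n$ hat matrix, given by:

$$H_{(k_n)} = \frac{1}{n}\left(\frac{v_{\cdot 1}v_{\cdot 1}'}{\lambda_1} + \cdots + \frac{v_{\cdot k_n}v_{\cdot k_n}'}{\lambda_{k_n}}\right).$$

So $\mathrm{Cov}(\hat Y \mid X_1,\dots,X_n) = \sigma^2 H_{(k_n)}$. The leverage ($0 \le H_{(k_n),ii} \le 1$) measures the a priori influence of a given observation on prediction. As $\mathrm{Tr}(H_{(k_n)}) = k_n$, we can flag those observations $(X_i,y_i)$ whose leverage is much larger than the average ($k_n/n$). The residuals can now be written in matrix form:

$$e = Y - \hat Y = (I_n - H_{(k_n)})Y = v_{(k_n+1:n)}\beta_{(k_n+1:n)} + (I_n - H_{(k_n)})\varepsilon.$$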
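As a rough illustration of the hat matrix and the leverages, the following base R sketch builds $H_{(k_n)}$ from the empirical principal component scores of simulated curves. The data, the Riemann-sum quadrature and the cut-off of twice the average leverage are all assumptions made for the example, not part of the original slides.

```r
set.seed(2)
n  <- 60
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
X  <- t(apply(matrix(rnorm(n * length(tt)), n), 1, cumsum)) * sqrt(dt)
Xc <- sweep(X, 2, colMeans(X))               # centred curves

# Discretized covariance operator and its eigen-decomposition
eig    <- eigen(crossprod(Xc) / n * dt, symmetric = TRUE)
kn     <- 4
lambda <- eig$values[1:kn]                   # eigenvalues lambda_k
v      <- eig$vectors[, 1:kn] / sqrt(dt)     # eigenfunctions with unit L2 norm
scores <- (Xc %*% v) * dt                    # scores[i, k] = <X_i, v_k>

# H = (1/n) * sum_k scores_k scores_k' / lambda_k
H   <- scores %*% diag(1 / lambda) %*% t(scores) / n
lev <- diag(H)                               # leverages
flagged <- which(lev > 2 * kn / n)           # cut-off factor 2 is arbitrary
```

The scores satisfy `crossprod(scores)` equal to `n * diag(lambda)`, so `H` is the orthogonal projector onto the first `kn` score vectors and its trace is `kn`.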


Residual Variance

Using Cardot et al. (2003) and Hall et al. (2006), the term $v_{(k_n+1:n)}\beta_{(k_n+1:n)}$ can be neglected if $n$ is large enough and $k_n$ has been chosen suitably. Moreover, as $\mathrm{Tr}(I_n - H_{(k_n)}) = n - k_n$, it is not difficult to see that:

$$E[e'e \mid X_1,\dots,X_n] = n\left(\beta_{k_n+1}^2\lambda_{k_n+1} + \cdots + \beta_n^2\lambda_n\right) + (n-k_n)\sigma^2,$$

which suggests that the error variance $\sigma^2$ may be estimated by the functional residual variance estimate, $s_R^2$, given by:

$$s_R^2 = \frac{e'e}{n-k_n}.$$


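Measures of influence

The functional Cook's measure for prediction:

$$CP_i = \frac{(\hat y - \hat y_{(-i,k_n)})'(\hat y - \hat y_{(-i,k_n)})}{s_R^2},$$

The functional Cook's measure for estimation:

$$CE_i = \frac{\left\|\hat\beta_{(k_n)} - \hat\beta_{(-i,k_n)}\right\|^2}{\frac{s_R^2}{n}\sum_{k=1}^{k_n}\frac{1}{\lambda_k}},$$

The functional Peña's measure for prediction:

$$P_i = \frac{s_i's_i}{s_R^2\,H_{(k_n),ii}},$$

where $s_i = (\hat y_i - \hat y_{(-1,k_n),i},\dots,\hat y_i - \hat y_{(-n,k_n),i})'$.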


Example with PC's

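library(fda.usc)  # provides rproc2fdata, fdata, fdata2pc, fregre.pc, ...

t = seq(0, 1, length = nt <- 51)  # evaluation grid on [0, 1]

# Exponential covariance function
covexp = function(t1, t2) {
  3 * exp(-abs(t1 - t2)/0.5)
}

Sigma = outer(t, t, covexp)  # covariance matrix on the grid

X = rproc2fdata(n <- 200, t, sigma = Sigma)  # 200 simulated Gaussian curves

plot(X)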

[Figure: "Gaussian process", simulated curves X(t) against t on [0, 1].]


Example with PC's cont'ed

res = eigen(Sigma)

pc5teo = fdata(t(res$vectors[, 1:5]), argvals = t) # Theo. PC's

pc5teo[["data"]] = sweep(pc5teo[["data"]], 1, norm.fdata(pc5teo),

"/")

res.est = fdata2pc(X, ncomp = 5) # Estimated PC's

pc5est = res.est$rotation

[Figure: two panels on [0, 1]: "Theo. PC's" (X(t) against t) and "Estimated PC's" (rotation against t).]


FLM with PC's I

betaf = t + log(t + 0.1)

betaf = fdata(betaf, argvals = t) #Theo. Beta

vteo = inprod.fdata(pc5teo, betaf) # Theo. Coefs

vest = inprod.fdata(pc5est, betaf) # Estim. coefs

comb.func = function(X, coefs) {

t = X$argvals

Xnew = sweep(X$data, 1, coefs, "*")

Xnew = fdata(apply(Xnew, 2, sum), argvals = t, rangeval = X$rangeval,

names = X$names)

return(Xnew)

}

betapc5t = comb.func(pc5teo, vteo)

betapc5e = comb.func(pc5est, vest)

y = 4 + drop(inprod.fdata(X, betaf)) + rnorm(n, sd = 0.5) # Simulated response

res.pc = fregre.pc(X, y, l = 1:5)


FLM with PC's II

[Figure: beta functions against t on [0, 1]. Legend: Theor.; Oracle Theo. (5); Oracle Est. PC(5); Estim. from data.]


FLM with PC's III

summary(res.pc)

> *** Summary Functional Data Regression with Principal Components ***

>

> Call:

> fregre.pc(fdataobj = X, y = y, l = 1:5)

>

> Residuals:

> Min 1Q Median 3Q Max

> -1.46463 -0.34188 -0.00754 0.36205 1.48351

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 4.03876 0.03721 108.553 < 2e-16 ***

> PC1 -0.12819 0.02836 -4.520 1.08e-05 ***

> PC2 -0.84670 0.04904 -17.265 < 2e-16 ***

> PC3 0.30974 0.08688 3.565 0.000458 ***

> PC4 -0.35799 0.10170 -3.520 0.000538 ***

> PC5 -0.11690 0.15306 -0.764 0.445917

> ---

> Signif. codes:

....


FLM with PC's IV

[Figure: diagnostic panels for res.pc: y against fitted values (R-squared = 0.63); residuals against fitted values; Scale-Location; Leverage against curve index; normal Q-Q plot of the residuals; residuals.]


FLM with PC's V

....

>

> Residual standard error: 0.5262 on 194 degrees of freedom

> Multiple R-squared: 0.6349, Adjusted R-squared: 0.6255

> F-statistic: 67.46 on 5 and 194 DF, p-value: < 2.2e-16

>

>

> -With 5 Principal Components is explained 91.31 %

> of the variability of explicative variables.

>

> -Variability for each principal components -PC- (%):

> PC1 PC2 PC3 PC4 PC5

> 58.79 19.68 6.26 4.57 2.02

> -Names of possible atypical curves: No atypical curves

> -Names of possible influence curves:


FLM with PC's VI

[Figure: diagnostic panels for res.pc: y against fitted values (R-squared = 0.63); residuals against fitted values; Scale-Location; Leverage against curve index; normal Q-Q plot of the residuals; residuals.]


Choice of kn I

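To avoid a perfect fit, Cardot et al. (1999) proposed to estimate $\beta$ by taking $\beta_k = 0$ for $k \ge k_n + 1$, with $0 < k_n < n$ and $\lambda_{k_n} > 0$, and minimizing the residual sum of squares given by:

$$RSS(\beta_{(1:k_n)}) = \sum_{i=1}^n\left(y_i - \sum_{k=1}^{k_n} c_{ik}\beta_k\right)^2 = \left\|Y - c_{(1:k_n)}\beta_{(1:k_n)}\right\|^2,$$

where $Y = (y_1,\dots,y_n)'$, $\beta_{(1:k_n)} = (\beta_1,\dots,\beta_{k_n})'$ and $c_{(1:k_n)}$ is the $n \times k_n$ matrix whose $k$-th column is the vector $c_{\cdot k} = (c_{1k},\dots,c_{nk})'$, the $k$-th principal component score, which verifies $c_{\cdot k}'c_{\cdot k} = n\lambda_k$ and $c_{\cdot k}'c_{\cdot l} = 0$ for $k \ne l$. So,

$$\hat\beta_{(1:k_n)} = \left(\frac{c_{\cdot 1}'Y}{n\lambda_1},\dots,\frac{c_{\cdot k_n}'Y}{n\lambda_{k_n}}\right)', \qquad \hat\beta_{(k_n)} = \sum_{k=1}^{k_n}\hat\beta_k v_k = \sum_{k=1}^{k_n}\frac{c_{\cdot k}'Y}{n\lambda_k}v_k.$$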
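The closed form for the truncated estimator can be checked numerically. The base R sketch below, on simulated curves with hypothetical names and a Riemann-sum quadrature, computes $\hat\beta_k = c_{\cdot k}'Y/(n\lambda_k)$ and compares it with ordinary least squares on the scores; the two agree because the score vectors are orthogonal.

```r
set.seed(3)
n  <- 80
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
X  <- t(apply(matrix(rnorm(n * length(tt)), n), 1, cumsum)) * sqrt(dt)
Xc <- sweep(X, 2, colMeans(X))                       # centred curves
y  <- drop(Xc %*% sin(2 * pi * tt)) * dt + rnorm(n, sd = 0.05)
yc <- y - mean(y)                                    # centred response

eig    <- eigen(crossprod(Xc) / n * dt, symmetric = TRUE)
kn     <- 3
lambda <- eig$values[1:kn]
v      <- eig$vectors[, 1:kn] / sqrt(dt)             # eigenfunctions v_k on the grid
cs     <- (Xc %*% v) * dt                            # n x kn matrix of scores c_{.k}

b.hat    <- drop(crossprod(cs, yc)) / (n * lambda)   # c'_{.k} Y / (n lambda_k)
beta.hat <- drop(v %*% b.hat)                        # sum_k b.hat_k v_k(t)

b.lm <- unname(coef(lm(yc ~ cs - 1)))                # least squares on the scores
```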


Choice of kn II

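The optimal $k_n$ should be chosen taking into account the work by Hall et al. (2006), which establishes:

$$E\left[\left\|\beta - \hat\beta_{(k_n)}\right\|^2 \,\middle|\, X\right] = \frac{\sigma^2}{n}\sum_{k=1}^{k_n}\frac{1}{\lambda_k} + \sum_{k=k_n+1}^{\infty}\langle\beta, v_k\rangle^2$$

Predictive Cross-Validation:

$$PCV(k) = \frac{1}{n}\sum_{i=1}^n\left(y_i - \langle X_i, \hat\beta_{(-i,k)}\rangle\right)^2,$$

Model Selection Criteria:

$$MSC(k) = \log\left(\frac{1}{n}\sum_{i=1}^n\left(y_i - \langle X_i, \hat\beta_{(k)}\rangle\right)^2\right) + p_n\frac{k}{n},$$

with $p_n = 2$ (AIC), $p_n = 2n/(n-k-2)$ (AICc), $p_n = \log(n)/n$ (SIC).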
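A model selection criterion of this kind can be sketched in a few lines of base R. The example below uses simulated curves, keeps the principal components in their natural order, and applies the AIC-type penalty $p_n = 2$; the data and all object names are assumptions made for illustration.

```r
set.seed(4)
n  <- 80
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
X  <- t(apply(matrix(rnorm(n * length(tt)), n), 1, cumsum)) * sqrt(dt)
Xc <- sweep(X, 2, colMeans(X))
y  <- drop(Xc %*% sin(2 * pi * tt)) * dt + rnorm(n, sd = 0.05)
yc <- y - mean(y)

K   <- 8                                             # largest model considered
eig <- eigen(crossprod(Xc) / n * dt, symmetric = TRUE)
cs  <- (Xc %*% eig$vectors[, 1:K]) * sqrt(dt)        # scores on the first K PCs

# Residual sum of squares of the model with the first k components
rss <- sapply(1:K, function(k) {
  sum(resid(lm(yc ~ cs[, 1:k, drop = FALSE] - 1))^2)
})
msc   <- log(rss / n) + 2 * (1:K) / n                # MSC(k) with p_n = 2
k.opt <- which.min(msc)                              # selected number of components
```

Predictive cross-validation follows the same pattern, except that each residual is computed from a fit that leaves the corresponding curve out.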


Example

res.pc3 = fregre.pc(X, y, l = 1:3)

res.pc7 = fregre.pc(X, y, l = 1:7)

basis.x = create.bspline.basis(c(0, 1), nbasis = 21)

basis.b5 = create.bspline.basis(c(0, 1), nbasis = 5)

basis.b7 = create.bspline.basis(c(0, 1), nbasis = 11)

res.basis5 = fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b5)

res.basis7 = fregre.basis(X, y, basis.x = basis.x, basis.b = basis.b7)

[Figure: "PC's-Basis Example", beta estimates against t on [0, 1]. Legend: Beta; PC(3); PC(7); Spl(5); Spl(11).]


PC Ridge Regression

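Cardot et al. (2007) proposed modifying the estimator of $\beta$ to improve its stability when terms corresponding to small eigenvalues are added to the model:

$$\hat\beta^{RR}_{(k_n)} = \sum_{k=1}^{k_n}\frac{\mathrm{Cov}(c_{\cdot k}, y)}{\lambda_k + r_n}v_k,$$

where $r_n > 0$ is the ridge parameter, and

$$E\left[\left\|\beta - \hat\beta^{RR}_{(k_n)}\right\|^2 \,\middle|\, X\right] = \frac{\sigma^2}{n}\sum_{k=1}^{k_n}\frac{\lambda_k}{(\lambda_k + r_n)^2} + r_n^2\sum_{k=1}^{k_n}\frac{\langle\beta, v_k\rangle^2}{(\lambda_k + r_n)^2} + \sum_{k=k_n+1}^{+\infty}\langle\beta, v_k\rangle^2$$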
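The effect of the ridge parameter on the coefficients can be seen in a small base R sketch. It assumes simulated curves, a Riemann-sum quadrature and a hand-picked value of $r_n$; the names are hypothetical. Each coefficient is the empirical covariance between the scores and the response divided by $\lambda_k + r_n$, so the components with the smallest eigenvalues are shrunk the most.

```r
set.seed(5)
n  <- 80
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
X  <- t(apply(matrix(rnorm(n * length(tt)), n), 1, cumsum)) * sqrt(dt)
Xc <- sweep(X, 2, colMeans(X))
y  <- drop(Xc %*% sin(2 * pi * tt)) * dt + rnorm(n, sd = 0.05)
yc <- y - mean(y)

eig    <- eigen(crossprod(Xc) / n * dt, symmetric = TRUE)
kn     <- 6
lambda <- eig$values[1:kn]
v      <- eig$vectors[, 1:kn] / sqrt(dt)
cs     <- (Xc %*% v) * dt                            # PC scores

rn     <- 0.01                                       # ridge parameter (arbitrary)
cov.cy <- drop(crossprod(cs, yc)) / n                # empirical Cov(c_{.k}, y)
b.pc   <- cov.cy / lambda                            # plain PC estimator (r_n = 0)
b.rr   <- cov.cy / (lambda + rn)                     # ridge version
shrink <- lambda / (lambda + rn)                     # shrinkage factor per component
beta.rr <- drop(v %*% b.rr)                          # ridge estimate of beta(t)
```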


Partial Least Squares (PLS) [Preda and Saporta (2005)]

The basic idea of the PLS approach is to construct a set of orthogonal random variables $\{\nu_i\}_{i\ge 1}$ in the linear space spanned by $X$, taking into account the covariance between $Y$ and $X$. The PLS components are obtained in the following iterative way:

1 Define $y_0 = y - \bar y$ and $X_0 = X - \bar X$, and let $l = 0$.

2 Let $t_{l+1} = \langle X_l, w_{l+1}\rangle$, where $w_{l+1} \in L^2$ is such that $\mathrm{Cov}(y_l, t_{l+1})^2$ is maximal. Then $w_{l+1} = \mathrm{Cov}(y_l, X_l)/\|\mathrm{Cov}(y_l, X_l)\|$.

3 Let $y_{l+1} = y_l - u_{l+1}t_{l+1}$, where $u_{l+1} = \mathrm{Cov}(y_l, t_{l+1})/\mathrm{Var}[t_{l+1}]$, and $X_{l+1} = X_l - \nu_{l+1}t_{l+1}$, where $\nu_{l+1} = \mathrm{Cov}(X_l, t_{l+1})/\mathrm{Var}[t_{l+1}]$.

4 Let $l = l + 1$ and go back to step 2.

Finally, $X = \bar X + \sum_l t_l\nu_l$ and $y = \bar y + \sum_l u_l t_l + e$.


MV PLS estimation I

Let $\mathbf{X} = (X_i(\tau_j))$ be the $(n \times T)$ matrix with the evaluations of the functional data at the discretization points $\{\tau_j\}_{j=1}^T$, and $\mathbf{y}$ the $(n \times p)$ response matrix.

1 Select a non-zero weight vector w of length T (for example a row of X or the PC1) and normalize it.

2 Compute a score vector t = Xw, t is (n × 1)

3 Compute a y-loading vector q = y′t, q is (p × 1)

4 Compute a y-score vector u = yq, u is (n × 1)

5 Compute a new weight vector w1 = X′u and normalize it.

6 If ||w − w1|| < ε, convergence is reached; otherwise set w = w1 and go to step 2.

The pair (t, u) are the scores for X and y, respectively. These six steps amount to obtaining the first eigenvector of the matrices X′YY′X and XX′YY′.

The components (p, b) for X and y are computed in the following way:


MV PLS estimation II

7 Compute the loading vector p = X′t/(t′t)

8 Deflate X by computing X1 = X − tp′

9 Compute the regression of y onto t: b = y′t/(t′t)

10 Adjust y using b: y1 = y − tb′

11 If more components are needed, set X = X1 and y = y1 and go to step 1.
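Steps 1 to 10 can be sketched in base R for a single component. The data below are simulated (a centred $n \times T$ matrix and a two-column response), the tolerance and the iteration cap are arbitrary, and the object names are hypothetical; the loop is a plain power iteration, so the resulting weight vector should match the first eigenvector of X′yy′X up to sign.

```r
set.seed(6)
n <- 50; Tn <- 30
X <- scale(matrix(rnorm(n * Tn), n, Tn), scale = FALSE)      # centred n x T matrix
y <- scale(cbind(3 * X[, 1], X[, 2]) + matrix(rnorm(n * 2, sd = 0.1), n, 2),
           scale = FALSE)                                     # centred n x p, p = 2

w <- X[1, ] / sqrt(sum(X[1, ]^2))            # step 1: a row of X, normalized
for (it in 1:200) {
  t.sc <- drop(X %*% w)                      # step 2: X-score (n x 1)
  q    <- drop(crossprod(y, t.sc))           # step 3: y-loading (p x 1)
  u    <- drop(y %*% q)                      # step 4: y-score (n x 1)
  w1   <- drop(crossprod(X, u))              # step 5: new weight vector
  w1   <- w1 / sqrt(sum(w1^2))
  conv <- sqrt(sum((w - w1)^2)) < 1e-10      # step 6: convergence check
  w    <- w1
  if (conv) break
}
t.sc <- drop(X %*% w)                        # final X-score
p.ld <- drop(crossprod(X, t.sc)) / sum(t.sc^2)   # step 7: loading vector p
X1   <- X - tcrossprod(t.sc, p.ld)               # step 8: deflated X
b    <- drop(crossprod(y, t.sc)) / sum(t.sc^2)   # step 9: regression of y on t
y1   <- y - tcrossprod(t.sc, b)                  # step 10: adjusted y
```

After deflation, both `X1` and `y1` are orthogonal to the extracted score `t.sc`, which is what allows the next component to be computed by restarting from step 1.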


Best selection of components I

res.pc.cv = fregre.pc.cv(X, y, 5)

res.pc.cv2 = fregre.pc.cv(X, y, 5, rn = seq(0, 0.5, len = 11),

criteria = "CV")

res.basis.cv = fregre.basis.cv(X, y, basis.x = 13:17, basis.b = 5:11)

res.pls.cv = fregre.pls.cv(X, y, 4, criteria = "CV")

> Opt. PC: 2 1 4

> PCRR: 2 1 4 3 -

> Basis X 13 Basis B: 5

> PLS 1


Best selection of components II

[Figure: "Beta", beta estimates against t on [0, 1]. Legend: Beta; PC; PCRR; PLS; Spl.]


Prediction

[Figure: three prediction panels. PC: r^2 = 0.655, s^2 = 0.236. PLS: r^2 = 0.548, s^2 = 0.306. B-Spline: r^2 = 0.665, s^2 = 0.231.]


Remarks on FLM

Penalized versions of PC or PLS can also be applied, simply substituting $\{X_i\}_{i=1}^n$ by $\{\tilde X_i\}_{i=1}^n$ with $\tilde X_i = (I + \lambda P)^{-1}X_i$ and $P$ a penalization matrix.

Bootstrap methods can be adapted to test or study different aspects of the FLM:

res.boot = fregre.bootstrap(res.pc3, nb = 500, wild = FALSE)

lines(betaf, lwd = 2)

[Figure: "beta.est bootstrap", bootstrap replicates of the beta estimate against t on [0, 1].]


Bootstrap on Regression Models I

Fit the functional linear model to the dataset and obtain $\hat\beta$, $\hat y_i$, $e_i$, ...

Consider the statistic $\theta$ you want to replicate:

It depends on the model and the model is homoskedastic ($\beta$, $r^2$, $s_R^2$, ...) ⇒ Obtain $B$ standard bootstrap samples of size $n$ from the dataset of sample curves (denoted by $X_1^b,\dots,X_n^b$, where $X_i^b = X_{i^*}$). Optionally, smooth the bootstrap samples of both sets of curves and residuals: obtain $\tilde X_i^b = X_i^b + Z_i^b$, where $Z_i^b$ is a Gaussian process with zero mean and covariance operator $\gamma_X\Gamma_X$ ($0 \le \gamma_X \le 1$).

It depends on the model and on the $i$-th element, or the model is heteroskedastic ($\hat y_i$, $IF_i$, ...) ⇒ Fix $X_i^b = X_i$.

Obtain $B$ standard bootstrap samples of size $n$ from the residuals (denoted by $e^b = (e_1^b,\dots,e_n^b)'$).

Homoskedasticity. Naive bootstrap ($e_i^b = e_{i^*}$) or smoothed bootstrap ($\tilde e_i^b = e_i^b + z_i^b$, where $z_i^b$ is normally distributed with mean 0 and variance $\gamma_e s_R^2$, $0 \le \gamma_e \le 1$).


Bootstrap on Regression Models II

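Heteroskedasticity. Wild bootstrap: $e_i^b = f(e_i)v_i^*$ with

$$f(e_i) = \begin{cases} e_i\sqrt{\dfrac{n}{n-k_n}} & \text{(Opt1)}\\[4pt] e_i/\sqrt{1-h_{ii}} & \text{(Opt2)}\\[4pt] e_i/(1-h_{ii}) & \text{(Opt3)}\end{cases}$$

and

$$v_i^* = \begin{cases} -(\sqrt5-1)/2 & \text{with prob. } (\sqrt5+1)/(2\sqrt5)\\[4pt] (\sqrt5+1)/2 & \text{with prob. } (\sqrt5-1)/(2\sqrt5)\end{cases} \qquad \text{(golden rule).}$$

Let $\{\hat\theta^b\}_{b=1}^B$ be the statistic computed on each bootstrap dataset. The final estimates are:

Confidence interval: consider the $(1-\alpha)$-quantile $c_{1-\alpha}$ of $\{\|\hat\theta^b - \hat\theta\|\}_{b=1}^B$ and define $IC(1-\alpha) = \{\theta : \|\theta - \hat\theta\| \le c_{1-\alpha}\}$.

Hypothesis testing: $p_\theta = \sum_{b=1}^B 1\{\|\hat\theta^b\| \le \|\hat\theta\|\}/B$.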
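The golden-rule weights have mean 0, variance 1 and third moment 1, which the base R sketch below verifies before using them in a wild bootstrap. The regression is a toy scalar-on-scalar example with simulated heteroskedastic errors rather than a functional model, the residuals are rescaled with Opt2, and the slope plays the role of the statistic $\theta$; all of this is an assumption made to keep the example short.

```r
set.seed(7)
s5    <- sqrt(5)
vals  <- c(-(s5 - 1) / 2, (s5 + 1) / 2)              # two-point distribution
probs <- c((s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5))
moments <- c(sum(vals * probs), sum(vals^2 * probs), sum(vals^3 * probs))

# Toy heteroskedastic regression (hypothetical data)
n <- 100
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2 + x)
fit <- lm(y ~ x)
e <- resid(fit)
h <- hatvalues(fit)                                  # leverages h_ii

B <- 200
slope.b <- replicate(B, {
  v.star <- sample(vals, n, replace = TRUE, prob = probs)
  y.b <- fitted(fit) + e / sqrt(1 - h) * v.star      # covariates fixed, Opt2 rescaling
  coef(lm(y.b ~ x))[2]
})
c.95 <- quantile(abs(slope.b - coef(fit)[2]), 0.95)  # radius of the 95% region
```

The confidence region for the slope is then the set of values within `c.95` of the fitted slope, following the construction of $IC(1-\alpha)$ above.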


Tecator dataset

[Figure: Tecator example ("Spectrometric curves"). From left to right: absorbances, first derivative and second derivative against wavelength (850 to 1050), coloured by fat content (blue = low, red = high).]


Tecator example

data(tecator)

ab = tecator$absorp.fdata

ab2 = fdata.deriv(ab, 2)

dataf = as.data.frame(tecator$y) # Fat, Protein, Water

tt = ab[["argvals"]]

b.pc0 = create.pc.basis(ab, 1:4)

b.pc2 = create.pc.basis(ab2, 1:4)

basis.x = list(ab = b.pc0, ab2 = b.pc2)

f = Fat ~ ab + ab2

ldata = list(df = dataf, ab = ab, ab2 = ab2)

res = fregre.lm(f, ldata, basis.x = basis.x)


Tecator results

summary(res)

>

> Call:

> lm(formula = pf, data = XX, x = TRUE)

>

> Residuals:

> Min 1Q Median 3Q Max

> -10.8067 -1.9219 0.2561 1.8306 9.0273

>

> Coefficients:

> Estimate Std. Error t value Pr(>|t|)

> (Intercept) 18.14233 0.20772 87.342 < 2e-16 ***

> ab.PC1 0.15511 0.08402 1.846 0.06633 .

> ab.PC2 4.70801 1.52557 3.086 0.00231 **

> ab.PC3 -13.37410 4.58308 -2.918 0.00391 **

> ab.PC4 0.26779 2.46191 0.109 0.91349

> ab2.PC1 3437.06617 386.05052 8.903 2.85e-16 ***

> ab2.PC2 2688.52106 1525.50024 1.762 0.07949 .

> ab2.PC3 932.68030 432.69736 2.156 0.03228 *

> ab2.PC4 628.03681 767.97070 0.818 0.41442

> ---

> Signif. codes:

> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

>



....




Tecator Diagnosis I

[Figure: three diagnostic panels for the Tecator linear model: Residuals vs Fitted, Normal Q-Q and Scale-Location. Observations 7, 43 and 44 are labelled in each panel.]


Tecator Diagnosis II

[Figure: four panels against wavelength (850 to 1050): absorbances; second derivative of the absorbances; estimated beta for ab (r^2: 0.218); estimated beta for ab2 (r^2: 0.707).]




Non Linear Model [Ferraty and Vieu (2006)]

Suppose $(\mathcal{X}, y)$ is a pair of random variables with $y \in \mathbb{R}$ and $\mathcal{X} \in E$, where $E$ is a semi-metric space. To predict the response $Y$ from $\mathcal{X}$, the natural estimator is the conditional expectation:

$$m(X) = E(Y \mid \mathcal{X} = X),$$

where the Nadaraya-Watson (NW) estimator is given by:

$$\hat m(X) = \frac{\sum_{i=1}^n Y_i K(h^{-1}d(X, X_i))}{\sum_{i=1}^n K(h^{-1}d(X, X_i))},$$

where $K$ is an asymmetric kernel function and $h$ is the bandwidth parameter. The bandwidth is chosen by cross-validation, $h_{opt} = \arg\min_h CV(h)$, with

$$CV(h) = \sum_{i=1}^n\left(y_i - \hat m_{(-i)}(X_i)\right)^2,$$

or by any of the GCV methods (MSC).
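The kernel estimator and the leave-one-out choice of $h$ can be sketched in base R. In this illustration the curves are simulated as $a_i\sin(2\pi t)$, the semi-metric is the $L^2$ distance approximated on the grid, the kernel is an asymmetric quadratic kernel supported on $[0, 1]$, and the bandwidth grid is taken from quantiles of the pairwise distances; these are all assumptions of the example, with hypothetical names.

```r
set.seed(8)
n  <- 100
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
a  <- runif(n)
X  <- outer(a, sin(2 * pi * tt))             # hypothetical curves a_i * sin(2 pi t)
y  <- a^2 + rnorm(n, sd = 0.05)              # response, non-linear in the curve

D <- as.matrix(dist(X)) * sqrt(dt)           # approximate L2 distances d(X_i, X_j)
K <- function(u) ifelse(u >= 0 & u <= 1, 1 - u^2, 0)   # kernel supported on [0, 1]

cv <- function(h) {
  W <- K(D / h)
  diag(W) <- 0                               # leave-one-out: drop own weight
  rs <- rowSums(W)
  if (any(rs == 0)) return(Inf)              # some curve has no neighbour within h
  sum((y - drop(W %*% y) / rs)^2)
}
h.grid <- quantile(D[upper.tri(D)], seq(0.05, 0.5, by = 0.05))
cv.val <- sapply(h.grid, cv)
h.opt  <- h.grid[which.min(cv.val)]          # cross-validated bandwidth

W     <- K(D / h.opt)
m.hat <- drop(W %*% y) / rowSums(W)          # NW fit at the sample curves
```

Only the distance matrix `D` depends on the functional nature of the data, so a different semi-metric (for instance one based on derivatives) can be swapped in without changing the rest.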


Conditional distribution

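Another alternative is to use the conditional cumulative distribution function

$$F_{Y\mid\mathcal{X}=X}(y) = F_Y^X(y) = P(Y \le y \mid \mathcal{X} = X)$$

and to compute from it, for example, the median or the quantiles:

$$\mathrm{med}(X) = \inf\{y \in \mathbb{R} : F_{Y\mid\mathcal{X}=X}(y) \ge 1/2\}$$

$$t_\alpha(X) = \inf\{y \in \mathbb{R} : F_{Y\mid\mathcal{X}=X}(y) \ge \alpha\}$$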


Asymptotics

Conditions for the regression function:

$$m : E \to \mathbb{R}, \quad \lim_{d(X',X)\to 0} m(X') = m(X),$$

$$m : E \to \mathbb{R}, \quad |m(X) - m(X')| < C\,d(X',X)^\beta$$

Conditions for conditional distributions:

$$F : E\times\mathbb{R} \to \mathbb{R}, \quad \lim_{d(X',X)\to 0} F_Y^{X'}(y) = F_Y^X(y), \quad \lim_{d(y',y)\to 0} F_Y^X(y') = F_Y^X(y)$$

$$F : E\times\mathbb{R} \to \mathbb{R}, \quad |F_Y^{X'}(y') - F_Y^X(y)| < C\left(d(X',X)^\beta + d(y',y)^\beta\right)$$

In addition, the small ball probability condition $P(\mathcal{X} \in B(X,\epsilon)) = \varphi_X(\epsilon) > 0$ is needed, together with the existence of conditional moments of order greater than 2.


Semi Linear Model [Aneiros-Pérez and Vieu (2006)]

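Let $(\mathcal{X}, Z, y)$ with $y \in \mathbb{R}$ (response), $\mathcal{X} \in E$ (functional) and $Z \in \mathbb{R}^p$ (MV covariates):

$$y = Z\beta + m(\mathcal{X}) + \varepsilon$$

The parameters of the model are estimated by:

$$\hat\beta_h = \left(\tilde Z_h^t\tilde Z_h\right)^{-1}\tilde Z_h^t\tilde y_h, \qquad \hat m_h(X) = \sum_{i=1}^n W_{nh}(X, X_i)\left(y_i - Z_i^t\hat\beta_h\right)$$

where $\tilde Z_h = (I - W_h)Z$, $\tilde y_h = (I - W_h)y$, $W_h = (W_{nh}(X_i, X_j))_{ij}$ and

$$W_{nh}(X, X_i) = \frac{K(d(X, X_i)/h)}{\sum_{j=1}^n K(d(X, X_j)/h)}$$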
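The two estimation steps can be written out directly in base R. The sketch assumes simulated curves $a_i\sin(2\pi t)$, two scalar covariates with true coefficients (1, -2), an approximate $L^2$ distance, an asymmetric quadratic kernel and a bandwidth fixed by hand; none of these choices comes from the slides, and the names are hypothetical.

```r
set.seed(10)
n  <- 150
tt <- seq(0, 1, length.out = 51)
dt <- tt[2] - tt[1]
a  <- runif(n)
X  <- outer(a, sin(2 * pi * tt))             # functional covariate on the grid
Z  <- matrix(rnorm(n * 2), n, 2)             # multivariate covariates
beta.true <- c(1, -2)
y  <- drop(Z %*% beta.true) + a^2 + rnorm(n, sd = 0.1)

D  <- as.matrix(dist(X)) * sqrt(dt)          # approximate L2 distances
K  <- function(u) ifelse(u >= 0 & u <= 1, 1 - u^2, 0)
h  <- 0.1                                    # bandwidth fixed by hand
Wh <- K(D / h)
Wh <- Wh / rowSums(Wh)                       # Wh[i, j] = W_nh(X_i, X_j)

Zt <- Z - Wh %*% Z                           # (I - W_h) Z
yt <- y - drop(Wh %*% y)                     # (I - W_h) y
beta.h <- drop(solve(crossprod(Zt), crossprod(Zt, yt)))   # linear part
m.h    <- drop(Wh %*% (y - Z %*% beta.h))    # non-parametric part at the sample curves
```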


Diagnosis, Residuals, Influence

Fitted values: $\hat Y = H_X Y$, where $H_X$ is the $(n \times n)$ projection or smoothing matrix

Residuals: $e = (I - H_X)Y$

Equivalent degrees of freedom: $df(H) = \mathrm{tr}(H)$

$\mathrm{Cov}(\hat Y \mid X_1,\dots,X_n) = \sigma^2 H_X$.

Residual variance: $s_R^2 = \dfrac{e'e}{n - df(H_X)}$.

Influence: $0 \le H_{X,ii} \le 1$. So we can label those observations $(X_i, y_i)$ with much more influence than the average, for example above $3\,df(H_X)/n$.


Tecator example I

fat = tecator$y$Fat

res.np = fregre.np(ab2, fat, h = 5e-04)

summary(res.np)

> *** Summary Functional Non-linear Model ***

>

> -Call: fregre.np(fdataobj = ab2, y = fat, h = 5e-04)

>

> -Bandwidth (h): 5e-04

> -R squared: 0.9928937

> -Residual variance: 1.626762 on 151.737 degrees of freedom

> -Names of possible atypical curves: No atypical curves

> -Names of possible influence curves: 5 6 7 10 11 31 33 34 35 43

> It prints only the 10 most influence curves


Tecator example II

[Figure: diagnostic panels for res.np: y against fitted values (R-squared = 0.99); residuals against fitted values; Scale-Location; Leverage against curve index, with the most influential curves labelled; normal Q-Q plot of the residuals; residuals.]




Generalized Linear Models

Let $y$ follow a distribution from the exponential family, with density:

$$f(y;\theta,\tau) = h(y;\tau)\exp\left(\frac{b(\theta)T(y) - A(\theta)}{d(\tau)}\right)$$

where $h(y;\tau)$, $b(\theta)$, $T(y)$, $A(\theta)$ and $d(\tau)$ are known. In this case, $E(Y) = \mu = A'(\theta)$ and $\mathrm{Var}(Y) = A''(\theta)d(\tau)$. The response $y$ is related to a covariate $X$ ($\mathcal{X}$) through a linear predictor $\eta = X\beta$ ($\langle\mathcal{X},\beta\rangle$) and a link function $g$ such that $E(y) = \mu = g^{-1}(\eta)$.

Distribution   Link function               Mean                   Variance
Normal         Identity: η = µ             µ = η                  1
Binomial       Logit: η = ln(µ/(1 − µ))    µ = 1/(1 + exp(−η))    µ(1 − µ)
Poisson        Log: η = ln(µ)              µ = exp(η)             µ
Gamma          Inverse: η = 1/µ            µ = 1/η                µ²


Estimation of η

Typically, to estimate $\eta$, $X$ and $\beta$ are projected onto a finite number of elements of a functional basis:

$$\eta = \langle X,\beta\rangle \approx \sum_{i=1}^{p_X}\sum_{j=1}^{p_\beta} x_i\langle\phi_i,\psi_j\rangle\beta_j = x^TJ\beta$$

with $X(t) = \sum_{i=1}^{p_X} x_i\phi_i(t)$ and $\beta(t) = \sum_{j=1}^{p_\beta}\beta_j\psi_j(t)$.

Fixed basis: B-spline, Wavelets, Fourier. James (2002), ...

Functional Principal Components (FPC). Cardot and Sarda (2005); Escabias et al. (2004, 2005); Müller and Stadtmüller (2005), ...

Partial Least Squares (FPLS). Preda and Saporta (2005), Escabias et al. (2007), ...


Estimation of Generalized Linear Models

Iteratively Reweighted Least Squares (IRLS)

Let $\eta_0 = X\beta_0$ ($\langle\mathcal{X},\beta_0\rangle$) be the initial or current estimate of the linear predictor, with fitted value $\mu_0 = g^{-1}(\eta_0)$.

Form the adjusted dependent variate $z_0 = \eta_0 + (y - \mu_0)g'(\mu_0)$.

Define the weights $W_0 = 1/(\mathrm{Var}[\mu_0]\,g'(\mu_0)^2)$.

Regress $z_0$ on the covariates $X$ with weights $W_0$ to obtain new estimates of $\beta$ (and hence of $\eta$ and $\mu$).

Repeat until changes in parameters and/or deviance are small.
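The IRLS steps above can be followed line by line in base R for a Bernoulli response with logit link. In this sketch the design matrix stands for an intercept plus two basis coefficients or scores of the curves; the simulated data, the starting value and the stopping rule are assumptions of the example. The result is compared with `glm()` as a sanity check.

```r
set.seed(11)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), n, 2))    # intercept + two scores (hypothetical)
beta.true <- c(-0.5, 1, -1)
y <- rbinom(n, 1, plogis(drop(X %*% beta.true)))

beta <- rep(0, ncol(X))                      # starting value
for (it in 1:25) {
  eta    <- drop(X %*% beta)                 # current linear predictor
  mu     <- plogis(eta)                      # g^{-1}(eta) for the logit link
  gprime <- 1 / (mu * (1 - mu))              # g'(mu)
  z      <- eta + (y - mu) * gprime          # adjusted dependent variate
  W      <- 1 / (mu * (1 - mu) * gprime^2)   # weights 1 / (Var(mu) g'(mu)^2)
  beta.new <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
  done <- max(abs(beta.new - beta)) < 1e-10  # stop when the parameters settle
  beta <- beta.new
  if (done) break
}

fit <- glm(y ~ X - 1, family = binomial)     # reference fit
```

For the logit link the weights simplify to $\mu(1-\mu)$; they are written in the general form here so that another link and variance function can be substituted.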


Generalized Additive Models (MV)

As in GLM, the response variable $y$ is modelled through a sum of smooth functions of the covariates $X$ and a link function $g$:

$$E(y) = \mu = g^{-1}\left(\beta_0 + \sum_{j=1}^K f_j(X_j)\right)$$

with $X_j$ the columns of $X$ and $E(f_j(X_j)) = 0$.

ESTIMATION: IRLS mixed with BACKFITTING steps

Let $\eta_0 = \beta_0 + \sum_{j=1}^K f_j(X_j)$ be the initial or current estimate of the linear predictor, with fitted value $\mu_0 = g^{-1}(\eta_0)$.

Form the adjusted dependent variate $z_0 = \eta_0 + (y - \mu_0)g'(\mu_0)$.

Define the weights $W_0 = 1/(V(\mu_0)\,g'(\mu_0)^2)$.

Regress $z_0$ on the covariates $X$ with weights $W_0$, using backfitting steps.

Repeat until changes in functions and/or deviance are small.


Functional Spectral Additive Models Müller and Yao (2008)

Consider the PC representation of $X$:

$$X(t) = \mu(t) + \sum_k x_k v_k(t)$$

where $v_k(t)$ is the $k$-th eigenfunction and $x_k$ the scores. Then, the Functional Spectral Additive Model is defined as:

$$Y = \beta_0 + \sum_{k=1}^K f_k(x_k) + \varepsilon$$

with $E(\varepsilon) = 0$, $\mathrm{Var}[\varepsilon] = \sigma^2$ and $E(f_k(x_k)) = 0$ for all $k = 1, 2, \dots, K$.


Functional Generalized Spectral Additive Models

Consider (again) the PC representation of $X$ (or another representation):

$$X(t) = \mu(t) + \sum_k x_k v_k(t)$$

where $v_k(t)$ is the $k$-th eigenfunction and $x_k$ the scores. Then, the Functional Generalized Spectral Additive Model is defined to verify:

$$E(y) = g^{-1}\left(\beta_0 + \sum_{k=1}^K f_k(x_k)\right)$$

with $E(f_k(x_k)) = 0$ for all $k = 1, 2, \dots, K$.


Functional Generalized Kernel Additive Models [Febrero-Bande and González-Manteiga (2013)]

Given several functional variables $(\mathcal{X}^1, d_1),\dots,(\mathcal{X}^p, d_p)$ ($d_j$ is a semi-metric), the Functional Generalized Kernel Additive Model is defined to verify:

$$E(y) = \mu = g^{-1}\left(\beta_0 + \sum_{k=1}^p f_k(\mathcal{X}^k)\right)$$

with $E(f_k(\mathcal{X}^k)) = 0$ for all $k = 1, 2, \dots, p$. In the backfitting step, the functional non-parametric method is used:

$$\hat f_k(X_0^k) = \frac{\sum_{i=1}^N\left(y_i - \hat\beta_0 - \sum_{j\ne k}\hat f_j(X_i^j)\right)K\left(d_k(X_0^k, X_i^k)/h_k\right)}{\sum_{j=1}^N K\left(d_k(X_0^k, X_j^k)/h_k\right)}$$


Practical considerations

Our model only uses distances between data, so it can be applied in spaces other than $L^2$.

How to avoid concurvity in FDA? The distance correlation proposed by Székely et al. (2007) works, although it is not yet proved for FDA.

Avoiding overfitting: control the global amount of smoothing at each step (GCV).

Convergence: using Buja et al. (1989), global convergence is ensured, and also the oracle property.

The boundary effect in FDA is closely related to small ball probabilities. Are your data closely surrounded under your chosen semi-metrics?




fda.usc Febrero-Bande and Oviedo de la Fuente (2012)

Let fat, ab, ab1 and ab2 be the response and the covariates.

ldata=list(df=data.frame(fat=fat),

ab=ab,ab1=ab1,ab2=ab2)

b.pc0=create.pc.basis(ab,1:4)

b.pc1=create.pc.basis(ab1,1:4)

b.pc2=create.pc.basis(ab2,1:4)

basis.x=list(ab=b.pc0,ab1=b.pc1,ab2=b.pc2)

Distance correlation [Székely et al. (2007)]

R          d2(fat)  d2(X)   d2(X′)  d2(X′′)
d2(fat)    1.000    0.454   0.886   0.956
d2(X)      0.454    1.000   0.669   0.497
d2(X′)     0.886    0.669   1.000   0.930
d2(X′′)    0.956    0.497   0.930   1.000


FGLM

res.glm = fregre.glm(fat ~ ab + ab2, data = ldata, basis.x = basis.x)

              Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)   18.14233     0.20772      87.342    < 2e-16 ***
ab.PC1        0.15511      0.08402      1.846     0.06633 .
ab.PC2        4.70801      1.52557      3.086     0.00231 **
ab.PC3        -13.37410    4.58308      -2.918    0.00391 **
ab.PC4        0.26779      2.46191      0.109     0.91349
ab2.PC1       3437.06617   386.05052    8.903     2.85e-16 ***
ab2.PC2       2688.52106   1525.50024   1.762     0.07949 .
ab2.PC3       932.68030    432.69736    2.156     0.03228 *
ab2.PC4       628.03681    767.97070    0.818     0.41442

Residual standard error: 3.046 on 206 d.f.

Multiple R-squared: 0.945, Adjusted R-squared: 0.9428

F-statistic: 442.3 on 8 and 206 DF, p-value: < 2.2e-16

$\mathrm{cor}(fat, \langle\beta_1, ab\rangle)^2 = 21.8\%$, $\mathrm{cor}(fat, \langle\beta_2, ab2\rangle)^2 = 70.7\%$


[Figure: Tecator example. Estimation of beta parameters. Four panels against wavelength (850 to 1050): absorbances; second derivative of the absorbances; beta.est for ab; beta.est for ab2.]


GSAM

res.gsam = fregre.gsam(fat ~ s(ab) + s(ab2), data = ldata, basis.x = basis.x)

Parametric coefficients:
              Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)   18.14233   0.05041      359.9     <2e-16 ***

Approximate significance of smooth terms:
             edf     Ref.df   F         p-value
s(ab.PC1)    5.548   6.654    4.696     0.000111 ***
s(ab.PC2)    1.000   1.000    27.491    4.40e-07 ***
s(ab.PC3)    1.980   2.536    17.891    8.23e-09 ***
s(ab.PC4)    7.127   8.126    4.471     5.38e-05 ***
s(ab2.PC1)   7.115   8.110    242.865   < 2e-16 ***
s(ab2.PC2)   7.381   8.305    5.004     1.03e-05 ***
s(ab2.PC3)   8.276   8.797    5.052     5.61e-06 ***
s(ab2.PC4)   5.986   7.130    7.532     4.52e-08 ***

R-sq.(adj) = 0.997   Deviance explained = 99.7%

GCV score = 0.6927   Scale est. = 0.54638   n = 215

$\mathrm{cor}\left(fat, \sum_{k=1}^K f_k(x_k^{ab})\right)^2 = 35.2\%$, $\mathrm{cor}\left(fat, \sum_{k=1}^K f_k(x_k^{ab2})\right)^2 = 89.6\%$


GKAM

res.gkam = fregre.gkam(fat ~ s(ab) + s(ab2), data = ldata)

alpha = 18.2   n = 215   Converged? Yes   Iterations: 4

Smoothed terms:
          h          cor(f(X),eta)   edf
f(ab2)    0.000371   1.000           88.7
f(ab)     9.410000   0.409           1.6

Residual deviance = 116.361   Null deviance = 34735.44

AIC = 662.88   Deviance explained = 99.7 %

R-sq. = 0.997   R-sq.(adj) = 0.994

$\mathrm{cor}(fat, f_1(ab))^2 = 16.9\%$, $\mathrm{cor}(fat, f_2(ab2))^2 = 99.6\%$


[Figure: Diagnostic plots for Tecator example. Panels: residuals against Fat and |residuals| against Fat, for GLM, GSAM and GKAM.]


[Figure: Fitted values for Tecator example. Panels compare Fat with the fitted values of each model. Panel labels: FGLM, Est. 5.5%, Pr.CV 1.91; GSAM, Est. 0.3%, Pr.CV 0.59; FGKAM, Est. 0.007%, Pr.CV 0.83.]


Bernoulli response: I(Fat ≥ 15%). 165 random observations as training set (50 for testing).

Method  Sample  Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
GLM     Train   100%    100%     100%    100%    100%     100%
GLM     Test    88.0%   96.0%    98.0%   97.5%   98.0%    100%
GSAM    Train   100.0%  100.0%   100.0%  100%    100%     100%
GSAM    Test    54.0%   92.0%    94.0%   93.8%   98.0%    100%
GKAM    Train   97.58%  98.18%   98.8%   98.7%   98.8%    100%
GKAM    Test    90.0%   96.0%    98.0%   97.9%   100.0%   100%

Table: Statistics for percentage of good classification in 500 replications.
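The replication scheme behind the table (random 165/50 splits, repeated 500 times, then summarised by quantiles) can be sketched as follows. This is only an illustration under loud assumptions: the data are synthetic, not the Tecator spectra, and the nearest-centroid rule is a placeholder standing in for the GLM, GSAM and GKAM fits, which are not reproduced here. The function names are hypothetical.

```python
import numpy as np


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Placeholder classifier (assumption): assign each test row to the closest class mean."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
    # Squared Euclidean distance from every test row to both class centroids.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((dists.argmin(axis=1) == y_test).mean())


def replicate_splits(X, y, n_train=165, n_rep=500, seed=0):
    """Test accuracy over n_rep random train/test partitions of the rows."""
    rng = np.random.default_rng(seed)
    acc = np.empty(n_rep)
    for r in range(n_rep):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        acc[r] = nearest_centroid_accuracy(X[train], y[train], X[test], y[test])
    return acc


def summarize(acc):
    """Six-number summary in the same layout as the table columns."""
    q1, med, q3 = np.percentile(acc, [25, 50, 75])
    return {"Min.": acc.min(), "1st Qu.": q1, "Median": med,
            "Mean": acc.mean(), "3rd Qu.": q3, "Max.": acc.max()}


# Synthetic stand-in data: 215 = 165 + 50 rows to match the split sizes above.
rng = np.random.default_rng(1)
n, p = 215, 20
y = (rng.random(n) < 0.4).astype(int)            # binary response, like I(Fat >= 15%)
X = rng.normal(size=(n, p)) + 1.5 * y[:, None]   # class 1 shifted in every coordinate

acc = replicate_splits(X, y)
summary = summarize(acc)
for name, value in summary.items():
    print(f"{name:8s} {100 * value:6.1f}%")
```

Because each test set has 50 observations, every replication's accuracy is a multiple of 2%, which is why the test rows of the table only take values such as 88.0%, 96.0% or 98.0%.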


Figure: Estimation of the partial effects (gray = I(Fat ≥ 15%)). Panels: µ against η; η against f1(abs); η against f2(abs″).
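Read together, the three panels are consistent with an additive predictor on the link scale. As a hedged sketch only: the logit link and the intercept symbol α are assumptions made here because the response is Bernoulli, and abs is presumably the absorbance curve, with abs″ its second derivative.

```latex
\[
\eta = \alpha + f_1(\mathrm{abs}) + f_2(\mathrm{abs}''),
\qquad
\mu = P\bigl(\mathrm{Fat} \ge 15\% \mid \mathrm{abs}\bigr) = \frac{e^{\eta}}{1 + e^{\eta}} .
\]
```

Under this reading, the first panel shows µ as an S-shaped function of η, and the other two panels show how each partial effect moves η.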

References

Aneiros-Pérez, G. and Vieu, P. (2006). Semi-functional partial linear regression. Statistics & Probability Letters, 76(11):1102–1110.

Antoniadis, A. and Sapatinas, T. (2003). Wavelet methods for continuous-time prediction using Hilbert-valued autoregressive processes. Journal of Multivariate Analysis, 87(1):133–158.

Buja, A., Hastie, T., and Tibshirani, R. (1989). Linear smoothers and additive models. The Annals of Statistics, 17(2):453–510.

Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics, 12(4):503–538.

Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional linear model. Statistics & Probability Letters, 45(1):11–22.

Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13(3):571–592.

Cardot, H., Mas, A., and Sarda, P. (2007). CLT in functional linear regression models. Probability Theory and Related Fields, 138(3):325–361.

Cardot, H. and Sarda, P. (2005). Estimation in generalized linear models for functional data via penalized likelihood. Journal of Multivariate Analysis, 92(1):24–41.


Escabias, M., Aguilera, A., and Valderrama, M. (2004). Principal component estimation of functional logistic regression: discussion of two different approaches. Journal of Nonparametric Statistics, 16(3-4):365–384.

Escabias, M., Aguilera, A., and Valderrama, M. (2005). Modeling environmental data by functional principal component logistic regression. Environmetrics, 16(1):95–107.

Escabias, M., Aguilera, A., and Valderrama, M. (2007). Functional PLS logit regression model. Computational Statistics & Data Analysis, 51(10):4891–4902.

Febrero-Bande, M. and González-Manteiga, W. (2013). Generalized additive models for functional data. TEST, pages 1–15. doi:10.1007/s11749-012-0308-0.

Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). fda.usc: Functional Data Analysis. Utilities for Statistical Computing. R package version 1.0.0.

Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis: theory and practice. Springer.

Hall, P., Müller, H., and Wang, J. (2006). Properties of principal component methods for functional and longitudinal data analysis. The Annals of Statistics, 34(3):1493–1517.

James, G. (2002). Generalized linear models with functional predictors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3):411–432.

Krämer, N., Boulesteix, A., and Tutz, G. (2008). Penalized partial least squares with applications to B-spline transformations and functional data. Chemometrics and Intelligent Laboratory Systems, 94(1):60–69.


Müller, H. and Stadtmüller, U. (2005). Generalized functional linear models. The Annals of Statistics, 33(2):774–805.

Müller, H. and Yao, F. (2008). Functional additive models. Journal of the American Statistical Association, 103(484):1534–1544.

Preda, C. and Saporta, G. (2005). PLS regression on a stochastic process. Computational Statistics & Data Analysis, 48(1):149–158.

Ramsay, J. and Silverman, B. (2002). Applied functional data analysis: methods and case studies, volume 77. Springer, New York.

Ramsay, J. and Silverman, B. (2005). Functional Data Analysis. Springer.

Silverman, B. (1996). Smoothed functional principal components analysis by choice of norm. The Annals of Statistics, 24(1):1–24.

Székely, G., Rizzo, M., and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6):2769–2794.

Yao, F. and Lee, T. (2005). Penalized spline models for functional principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):3–25.
