Functional Data Analysis in Matlab and R

Jan 09, 2016

James Ramsay, Professor, McGill U., Montreal

Hadley Wickham, Grad student, Iowa State, Ames, IA

Spencer Graves, Statistician, PDF Solutions, San José, CA

Outline

• What is Functional Data Analysis?
• FDA and Differential Equations
• Examples:
  – Squid Neurons
  – Continuously Stirred Tank Reactor (CSTR)
• Conclusions
• References

What is FDA?

• Functional data analysis (FDA) is a collection of techniques for modeling data from dynamic systems, possibly governed by differential equations, in terms of a set of basis functions.

• The ‘fda’ package supports 8 types of basis functions: constant, monomial, polynomial, polygonal, B-spline, power, exponential, and Fourier.
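As a concrete instance of one of these basis types, the Python sketch below evaluates a B-spline basis with the Cox-de Boor recursion and builds a curve as a weighted sum of basis functions. It is an illustration under stated assumptions, not the ‘fda’ code (in R, ‘fda’ creates such a basis with create.bspline.basis): the function names are hypothetical, knots are assumed non-decreasing, and intervals are half-open, so the right end point of the range evaluates to 0.

```python
def bspline(knots, degree, i, t):
    """Value at t of the i-th B-spline of the given degree (Cox-de Boor recursion).

    Degree-0 pieces use half-open intervals [knots[j], knots[j + 1]), so at
    t equal to the last knot every basis function evaluates to 0.
    """
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left_span = knots[i + degree] - knots[i]
    if left_span > 0:  # a zero span (repeated knots) contributes nothing
        value += (t - knots[i]) / left_span * bspline(knots, degree - 1, i, t)
    right_span = knots[i + degree + 1] - knots[i + 1]
    if right_span > 0:
        value += ((knots[i + degree + 1] - t) / right_span
                  * bspline(knots, degree - 1, i + 1, t))
    return value


def basis_values(knots, degree, t):
    """All len(knots) - degree - 1 basis functions evaluated at t."""
    return [bspline(knots, degree, i, t) for i in range(len(knots) - degree - 1)]


def evaluate_curve(coefs, knots, degree, t):
    """A functional observation x(t) = sum_k c_k * phi_k(t)."""
    return sum(c * phi for c, phi in zip(coefs, basis_values(knots, degree, t)))


# Cubic B-splines on [0, 3] with clamped (repeated) end knots: 6 basis functions.
knots = [0, 0, 0, 0, 1, 2, 3, 3, 3, 3]
values = basis_values(knots, 3, 1.5)
print(len(values))             # 6
print(round(sum(values), 12))  # 1.0: the basis sums to one inside [0, 3)
```

A curve is then stored as its 6 coefficients rather than as raw observations.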

Observations of different lengths

• Observation vectors of different lengths can be mapped to coordinates in a fixed basis set.

• All examples in the ‘fda’ package have the same number of observations per curve.

• There is no conceptual obstacle to handling observation vectors of different lengths.
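A minimal Python sketch of that mapping, assuming a Fourier basis and plain least squares (no roughness penalty): each observation vector, whatever its length, is reduced to the same number of basis coefficients by solving the normal equations. The helper names and the test curve are hypothetical, not part of ‘fda’.

```python
import math


def fourier_row(t, nbasis, period):
    """Basis values 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ... at time t."""
    omega = 2 * math.pi / period
    row, k = [1.0], 1
    while len(row) < nbasis:
        row.append(math.sin(k * omega * t))
        if len(row) < nbasis:
            row.append(math.cos(k * omega * t))
        k += 1
    return row


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_coefficients(times, values, nbasis, period):
    """Least-squares coefficients from the normal equations (Phi'Phi) c = Phi'y."""
    phi = [fourier_row(t, nbasis, period) for t in times]
    ptp = [[sum(r[i] * r[j] for r in phi) for j in range(nbasis)] for i in range(nbasis)]
    pty = [sum(r[i] * y for r, y in zip(phi, values)) for i in range(nbasis)]
    return solve(ptp, pty)


def true_curve(t):
    # Hypothetical smooth curve: 2 + 3 sin(wt) - cos(wt) over a 365-day period.
    w = 2 * math.pi / 365
    return 2 + 3 * math.sin(w * t) - math.cos(w * t)


# Two records of different lengths (7 and 12 observations) ...
short_t = [i * 365 / 7 for i in range(7)]
long_t = [i * 365 / 12 for i in range(12)]
# ... both become 3 coefficients in the same basis.
c_short = fit_coefficients(short_t, [true_curve(t) for t in short_t], 3, 365)
c_long = fit_coefficients(long_t, [true_curve(t) for t in long_t], 3, 365)
print([round(c, 6) for c in c_short], [round(c, 6) for c in c_long])
```

Both coefficient vectors come out as approximately [2.0, 3.0, -1.0], so the two records can be compared directly even though one has 7 observations and the other 12.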

Time Warping

• “Start” and “stop” are sometimes determined by certain transitions.

• Example: growth spurts in the life cycle of various species do not occur at exactly the same ages in different individuals, even within the same species.

10 Girls: Berkeley Growth Study

• Tuddenham, R. D., and Snyder, M. M. (1954) "Physical growth of California boys and girls from birth to age 18", University of California Publications in Child Development, 1, 183-364.

[Figure: Height (cm.) vs. age for each of the 10 girls]

Acceleration

• Growth spurts occur at different ages
• The average shows the basic trend, but features are damped by improper registration

[Figure panels: Height (cm.) vs. age; Growth acceleration (cm/year^2) vs. age]
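The damping can be reproduced in a few lines of Python. The bell-shaped curves and peak ages are hypothetical stand-ins, not the Berkeley data: every individual curve has the same peak height, but because the peaks fall at different ages, the pointwise (cross-sectional) mean has a lower, broader peak.

```python
import math


def spurt(age, peak_age, width=1.0, height=2.0):
    """Hypothetical bell-shaped growth-spurt feature centred at peak_age."""
    return height * math.exp(-((age - peak_age) / width) ** 2)


ages = [8 + 0.05 * k for k in range(201)]  # ages 8 to 18 in steps of 0.05
peak_ages = [11.0, 12.0, 13.0, 14.0]       # same shape, different timing
curves = [[spurt(a, p) for a in ages] for p in peak_ages]

# Pointwise mean across individuals, computed without any registration.
mean_curve = [sum(c[k] for c in curves) / len(curves) for k in range(len(ages))]

print(max(max(c) for c in curves))  # peak height of each individual curve
print(max(mean_curve))              # peak of the mean: less than half as tall
```

Here the mean peaks at about 0.88, against an individual peak height of 2.0.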

Registration

• register.fd: register all curves to the mean
• Not perfect, but better

[Figure panels: Growth acceleration (cm/year^2) vs. age; Growth acceleration (cm/year^2) vs. warped age]
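register.fd carries out a continuous registration toward a target curve. A simpler technique, landmark registration, conveys the same idea and is sketched below in Python: each curve is time-warped with a piecewise-linear, increasing function h chosen so that its landmark (here, the age of peak acceleration) moves to the mean landmark age, and the registered curve is x(h(t)). The curves, landmark ages, and helper names are hypothetical; this is not the algorithm used by register.fd.

```python
import math


def spurt(age, peak_age, width=1.0, height=2.0):
    """Hypothetical bell-shaped growth-spurt feature centred at peak_age."""
    return height * math.exp(-((age - peak_age) / width) ** 2)


def interp(x, xs, ys):
    """Linear interpolation of the points (xs, ys) at x; xs must be increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            w = (x - xs[k]) / (xs[k + 1] - xs[k])
            return (1 - w) * ys[k] + w * ys[k + 1]


def warp(t, lo, hi, target, landmark):
    """Piecewise-linear, increasing h with h(lo)=lo, h(target)=landmark, h(hi)=hi."""
    if t <= target:
        return lo + (t - lo) * (landmark - lo) / (target - lo)
    return landmark + (t - target) * (hi - landmark) / (hi - target)


def register(ages, curve, landmark, target):
    """Registered curve x*(t) = x(h(t)): the landmark now occurs at the target age."""
    lo, hi = ages[0], ages[-1]
    return [interp(warp(t, lo, hi, target, landmark), ages, curve) for t in ages]


def pointwise_mean(curves):
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(curves[0]))]


ages = [8 + 0.05 * k for k in range(201)]
peak_ages = [11.0, 12.0, 13.0, 14.0]
curves = [[spurt(a, p) for a in ages] for p in peak_ages]
target = sum(peak_ages) / len(peak_ages)  # register to the mean landmark age
registered = [register(ages, c, p, target) for c, p in zip(curves, peak_ages)]

print(max(pointwise_mean(curves)))      # damped peak without registration
print(max(pointwise_mean(registered)))  # close to the individual peak height
```

After registration the mean keeps essentially the full peak height of 2.0, whereas the unregistered mean reaches less than half of it.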

A Stroll Along the Beach

• Light intensity over 365 days at each of 190 × 143 = 27,170 pixels was:
  – smoothed
  – analyzed with functional principal components

• http://www.stat.berkeley.edu/~wickham/userposter.pdf
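A rough Python sketch of the principal-components step, assuming the smoothed curves have already been evaluated on a common time grid: the leading component is the top eigenvector of the pointwise covariance, found here by power iteration. This grid-based approximation, its helper names, and the simulated "pixels" are assumptions for illustration; the R ‘fda’ package works with basis coefficients instead (pca.fd).

```python
import math
import random


def leading_component(curves, iters=500, seed=0):
    """Mean curve, first principal component (unit vector), and its eigenvalue.

    curves: list of equal-length lists, each a smoothed curve on a common grid.
    """
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centred = [[c[j] - mean[j] for j in range(m)] for c in curves]
    # m x m sample covariance of the centred curves.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration from a random start (a constant start vector can be
    # orthogonal to the leading eigenvector, e.g. for sine-shaped modes).
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m)) for i in range(m))
    return mean, v, eigenvalue


# Hypothetical "pixels": yearly light curves on a 24-point grid, with a
# dominant sine-shaped seasonal mode and a weaker cosine-shaped one.
m = 24
grid = [2 * math.pi * j / m for j in range(m)]
gen = random.Random(1)
pixels = []
for _ in range(50):
    s1, s2 = gen.gauss(0.0, 3.0), gen.gauss(0.0, 0.5)
    pixels.append([10 + s1 * math.sin(g) + s2 * math.cos(g) for g in grid])

mean, pc1, var1 = leading_component(pixels)
```

Up to sign, pc1 should match the normalized sine pattern, since that mode carries most of the variation across these simulated pixels.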

Other fda capabilities

• Correlations, even with series of different lengths!
• Phase-plane plots: good estimates of derivatives

[Figure: Montreal average daily temperature, deviation from average (C). Panels: Mean Temperature vs. Month (Jan to Dec); phase-plane plot of Acceleration vs. Temperature (C), with points labeled by month]
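Phase-plane plots rely on good derivative estimates, and a basis expansion can be differentiated exactly. The Python sketch below assumes a Fourier representation with hypothetical coefficients (not the Montreal fit) and returns value, velocity, and acceleration directly from the coefficients; the phase-plane points pair each day's deviation from the mean level with its acceleration. The function names are illustrative only.

```python
import math


def fourier_with_derivatives(c0, harmonics, period, t):
    """x(t), x'(t), x''(t) for x(t) = c0 + sum_k a_k sin(k w t) + b_k cos(k w t).

    harmonics: list of (a_k, b_k) pairs for k = 1, 2, ...
    """
    omega = 2 * math.pi / period
    x, dx, d2x = c0, 0.0, 0.0
    for k, (a, b) in enumerate(harmonics, start=1):
        kw = k * omega
        s, c = math.sin(kw * t), math.cos(kw * t)
        x += a * s + b * c
        dx += kw * (a * c - b * s)
        d2x += -kw * kw * (a * s + b * c)
    return x, dx, d2x


def phase_plane(c0, harmonics, period, days):
    """(deviation from mean level, acceleration) pairs, one per day."""
    points = []
    for t in days:
        x, _, d2x = fourier_with_derivatives(c0, harmonics, period, t)
        points.append((x - c0, d2x))
    return points


# Hypothetical daily temperature model: mean 6 C, annual plus semi-annual cycle.
points = phase_plane(6.0, [(-4.0, -12.0), (0.5, 1.0)], 365.0, range(365))
```

For a single harmonic the acceleration is exactly -w^2 times the deviation (w = 2π/period), so the phase plane collapses to a straight line; additional harmonics bend it away from that line.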

Script files for fda books

• Ramsay and Silverman
  – (2002) Applied Functional Data Analysis (Springer)
  – (2006) Functional Data Analysis, 2nd ed. (Springer)

• ~R\library\fda\scripts (e.g., afda-ch03.R, fda-ch01.R, fda-ch02.R)
  – Some, but not all, data sets discussed in the books are in the ‘fda’ package
  – Script files are available to reproduce some, but not all, of the analyses in the books
  – Plus a CSTR demo

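FDA and Differential Equations

• Many dynamic systems are believed to follow processes in which the rate of change of the outputs, x, is a function of the outputs themselves, the inputs, u, and unknown parameters, θ:

$$\dot{\mathbf{x}}(t) = \mathbf{f}\big(\mathbf{x}(t), \mathbf{u}(t) \mid \boldsymbol{\theta}\big), \qquad t \in [0, T]$$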

• Matlab was designed in part for these types of models

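Squid Neurons

• FitzHugh (1961) and Nagumo et al. (1962) equations. Estimate a, b, and c in:

$$\dot{V} = c\left(V - \frac{V^3}{3} + R\right), \qquad \dot{R} = -\frac{1}{c}\left(V - a + bR\right)$$

where V is the voltage across the axon membrane and R is recovery via outward currents.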
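To see the dynamics these equations describe, the Python sketch below integrates them with a fixed-step fourth-order Runge-Kutta scheme. The values a = 0.2, b = 0.2, c = 3 and the starting point V = -1, R = 1 are illustrative assumptions, and the sketch only simulates the system; it does not estimate a, b, or c.

```python
def fitzhugh_nagumo(state, a, b, c):
    """Right-hand side of the FitzHugh-Nagumo equations for state (V, R)."""
    v, r = state
    dv = c * (v - v ** 3 / 3 + r)
    dr = -(v - a + b * r) / c
    return dv, dr


def rk4(f, state, t_end, h, *params):
    """Fixed-step classical Runge-Kutta; returns the list of states from t = 0."""
    path = [state]
    for _ in range(int(round(t_end / h))):
        k1 = f(state, *params)
        k2 = f(tuple(s + h / 2 * k for s, k in zip(state, k1)), *params)
        k3 = f(tuple(s + h / 2 * k for s, k in zip(state, k2)), *params)
        k4 = f(tuple(s + h * k for s, k in zip(state, k3)), *params)
        state = tuple(s + h / 6 * (p + 2 * q + 2 * u + w)
                      for s, p, q, u, w in zip(state, k1, k2, k3, k4))
        path.append(state)
    return path


# Illustrative parameter values and initial condition (assumptions).
path = rk4(fitzhugh_nagumo, (-1.0, 1.0), 20.0, 0.01, 0.2, 0.2, 3.0)
voltages = [v for v, _ in path]
print(min(voltages), max(voltages))  # V swings to large negative and positive values
```

With these settings the voltage keeps oscillating between strongly negative and strongly positive values, the spiking behavior the model is meant to capture.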

Tank Reactions

• Continuously Stirred Tank Reactor (CSTR)

[Figure panels: Temperature; Concentration]

Functional Data Analysis Process

1. Select a basis set
2. Select a smoothing operator
   – e.g., a differential equation
   – equivalent to a Bayesian prior over the coefficients to be estimated
3. Estimate coefficients to optimize some objective function
4. Model criticism: residual plots, etc.
5. Hypothesis testing
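Steps 1 to 3 can be illustrated with a deliberately simple stand-in, sketched in Python with hypothetical names: the basis is one coefficient per observation time, the smoothing operator is a squared second-difference penalty (a discrete analogue of a roughness penalty on the second derivative), and the coefficients minimize ||y - c||^2 + λ ||D2 c||^2, which leads to the linear system (I + λ D2'D2) c = y. Under a Gaussian model the penalty acts as an (improper) Gaussian prior on c and the minimizer is the posterior mode. This is a Whittaker-type smoother, not the ‘fda’ machinery, which pairs a basis with a linear differential operator.

```python
def second_difference_matrix(n):
    """(n - 2) x n matrix D2 whose rows are [..., 1, -2, 1, ...]."""
    return [[1.0 if j in (i, i + 2) else -2.0 if j == i + 1 else 0.0
             for j in range(n)] for i in range(n - 2)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def penalized_smooth(y, lam):
    """Minimise ||y - c||^2 + lam * ||D2 c||^2 by solving (I + lam D2'D2) c = y."""
    n = len(y)
    d = second_difference_matrix(n)
    a = [[(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    return solve(a, list(y))


def roughness(c):
    """Sum of squared second differences: the discrete roughness being penalised."""
    return sum((c[i] - 2 * c[i + 1] + c[i + 2]) ** 2 for i in range(len(c) - 2))


# Hypothetical noisy observations of a smooth trend.
y = [0.0, 1.3, 1.7, 3.4, 3.9, 5.2, 5.8, 7.3, 7.7, 9.1]
smooth = penalized_smooth(y, lam=10.0)
print(roughness(y), roughness(smooth))  # larger lam trades fit for smoothness
```

Straight lines have zero second differences, so they pass through this smoother unchanged for any λ; only curvature is penalized.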

Inputs to Tank Reaction Simulation

[Model equations (not recoverable from the extracted text): coupled nonlinear ODEs for concentration, dC/dt, and temperature, dT/dt, driven by inflow and coolant inputs (subscripts "in" and "co"), including an exponential term; 4 parameters: kref, EoverR, a, b]

Computations: Nonlinear ODE

• Compute input vectors
• Define functions
• Call differential equation solver
• Summarize, plot

[Figure panels: Temperature; Concentration]

Estimate parameters (kref, EoverR, a, b)
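The four computational steps listed above can be sketched in Python for a deliberately simplified, hypothetical one-state tank model, dT/dt = k (T_in(t) - T), with a step change in the inlet temperature T_in. This is not the CSTR model; the names, numbers, and the fixed-step Runge-Kutta solver (used instead of a library's adaptive solver) are illustrative assumptions.

```python
import math


def inlet_temperature(t):
    """Step 1 (hypothetical input): inlet temperature, stepping down at t = 10."""
    return 350.0 if t < 10.0 else 340.0


def rhs(t, temp, k):
    """Step 2: right-hand side f(x, u | theta) = k * (u(t) - x), with theta = k."""
    return k * (inlet_temperature(t) - temp)


def solve_ode(f, x0, t_end, h, *theta):
    """Step 3: fixed-step classical Runge-Kutta (RK4) solver."""
    n_steps = int(round(t_end / h))
    times, xs, x = [0.0], [x0], x0
    for i in range(n_steps):
        t = i * h
        k1 = f(t, x, *theta)
        k2 = f(t + h / 2, x + h / 2 * k1, *theta)
        k3 = f(t + h / 2, x + h / 2 * k2, *theta)
        k4 = f(t + h, x + h * k3, *theta)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        times.append((i + 1) * h)
        xs.append(x)
    return times, xs


# Step 4: summarise (a plot would go here; this just compares with the exact answer).
times, temps = solve_ode(rhs, 350.0, 60.0, 0.01, 0.5)
exact_20 = 340.0 + 10.0 * math.exp(-0.5 * (20.0 - 10.0))  # analytic solution for t > 10
print(f"T(20): numerical {temps[2000]:.4f}, analytic {exact_20:.4f}")
```

Keeping the inputs, the model function, and the solver separate makes it easy to swap in a different input series or parameter value before re-solving.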

Three problems

• Estimate (kref, EoverR, a, b) to minimize SSE in Temperature only:

Function                  SSE       SSE - min SSE
Matlab lsqnonlin          5.09888   0.00236
R nls                     5.09652   0
R optim, Nelder-Mead      5.09652   0
R optim, BFGS             5.09652   0
R optim, CG               5.09900   0.00248
R optim, SANN             5.17504   0.07852
R nlminb                  5.09652   0
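The table compares optimizers on one least-squares problem. The underlying pattern (simulate the ODE for trial parameters, compare with the observed series, minimize the sum of squared errors) can be sketched in Python on a hypothetical one-parameter decay model dx/dt = -θ x, with golden-section search standing in for lsqnonlin, nls, optim, or nlminb. The data, names, and search bracket are assumptions.

```python
import math


def simulate(theta, x0, times, h=0.01):
    """Integrate dx/dt = -theta * x from t = 0 with RK4; return x at each time."""
    def f(x):
        return -theta * x

    out, x, t = [], x0, 0.0
    for target in times:  # times must be increasing
        steps = int(round((target - t) / h))
        for _ in range(steps):
            k1 = f(x)
            k2 = f(x + h / 2 * k1)
            k3 = f(x + h / 2 * k2)
            k4 = f(x + h * k3)
            x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += steps * h
        out.append(x)
    return out


def sse(theta, x0, times, observed):
    """Sum of squared errors between the simulated and observed series."""
    return sum((s - o) ** 2 for s, o in zip(simulate(theta, x0, times), observed))


def golden_section(f, a, b, tol=1e-8):
    """Minimise a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical data: x0 = 5, true theta = 0.3, plus small fixed perturbations.
times = [float(t) for t in range(1, 11)]
noise = [0.02, -0.01, 0.015, -0.02, 0.01, 0.0, -0.015, 0.01, -0.005, 0.005]
observed = [5.0 * math.exp(-0.3 * t) + e for t, e in zip(times, noise)]

theta_hat = golden_section(lambda th: sse(th, 5.0, times, observed), 0.01, 2.0)
print(theta_hat, sse(theta_hat, 5.0, times, observed))
```

With several parameters, as in the CSTR problem, the one-dimensional search would be replaced by a multivariate optimizer; the SSE function stays the same shape.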

[Figure panels: C(t), Concentration (red = true, blue = estimated); T(t), Temperature; time axis 0 to 60]

SSE(Temp, Conc)

• Matlab: lsqnonlin
• R: nls

[Figure panels: C(t), Concentration (red = true, blue = estimate); Temperature; time axis 0 to 60]

Median absolute relative error

                 Matlab      R
Concentration    1.149E-03   1.145E-03
Temperature      2.640E-04   2.636E-04
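For reference, an error summary like the one in the table can be computed as below. This Python sketch assumes the usual definition, the median over time points of |estimate - true| / |true|, which the slide does not spell out; the two short series are hypothetical.

```python
import statistics


def median_abs_relative_error(estimated, true):
    """Median over time points of |estimated - true| / |true| (assumed definition)."""
    return statistics.median(abs(e - t) / abs(t) for e, t in zip(estimated, true))


# Hypothetical true and fitted temperature values at five time points.
true_temp = [340.0, 345.0, 350.0, 355.0, 360.0]
fitted_temp = [340.1, 344.9, 350.07, 355.0, 359.8]
print(f"{median_abs_relative_error(fitted_temp, true_temp):.3E}")
```

Using the median rather than the mean keeps a few badly fitted time points from dominating the comparison.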

R vs. Matlab

• Gave comparable answers
• The R code for the CSTR was slightly more accurate but required much more compute time; the two versions were coded by different people
• R has helper functions not easily replicated in Matlab:
  – summary.nls
  – confint.nls
  – profile.nls

          Estimate  StdErr      t  Pr(>|t|)
kref         0.466   0.004  113.0   < 2e-16  ***
EoverR       0.840   0.009   94.7   < 2e-16  ***
a            1.720   0.232    7.4   8.2e-13  ***
b            0.496   0.050   10.0   < 2e-16  ***

confint.nls

• Likelihood-based confidence intervals are generally more accurate than Wald intervals:
  – Wald intervals are subject to parameter-effects curvature
  – Likelihood intervals are affected only by intrinsic curvature

> confint(NlsFit)
         2.5%   97.5%
kref     0.458  0.474
EoverR   0.823  0.858
a        1.300  2.222
b        0.401  0.599
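The contrast between the two kinds of interval shows up in the numbers on these slides. This Python sketch assumes large-sample Wald intervals, estimate ± 1.96 × StdErr (summary.nls would use a t quantile with the residual degrees of freedom, which the slides do not report), and sets them beside the likelihood-based intervals from the confint output.

```python
# Estimates and standard errors from the summary.nls output (R vs. Matlab slide).
summary = {"kref": (0.466, 0.004), "EoverR": (0.840, 0.009),
           "a": (1.720, 0.232), "b": (0.496, 0.050)}
# Likelihood-based (profile) intervals from the confint output above.
profile_ci = {"kref": (0.458, 0.474), "EoverR": (0.823, 0.858),
              "a": (1.300, 2.222), "b": (0.401, 0.599)}

Z = 1.96  # assumption: normal quantile in place of the unreported t quantile


def wald_interval(estimate, stderr, z=Z):
    """Symmetric Wald interval: estimate -/+ z * stderr."""
    return estimate - z * stderr, estimate + z * stderr


for name, (est, se) in summary.items():
    lo, hi = wald_interval(est, se)
    p_lo, p_hi = profile_ci[name]
    print(f"{name:7s} Wald ({lo:.3f}, {hi:.3f})  profile ({p_lo:.3f}, {p_hi:.3f})  "
          f"profile half-widths {est - p_lo:.3f} / {p_hi - est:.3f}")
```

For kref and EoverR the two intervals nearly coincide. For a, the profile interval extends 0.420 below and 0.502 above the estimate, an asymmetry a Wald interval cannot represent.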


plot.profile.nls

• For a plot showing sqrt(log(LR)) against each parameter

[Figure: profile plots for kref, EoverR, a, and b, with reference levels at 50%, 80%, 90%, 95%, and 99% confidence]
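What these profile plots show can be sketched for a one-parameter model, where profiling needs no re-optimization over other parameters. This Python sketch assumes the profile-t definition τ(θ) = sign(θ - θ̂) · sqrt(SSE(θ) - SSE(θ̂)) / s, with s² = SSE(θ̂) / (n - 1), and reads off an approximate 95% interval where |τ| reaches 1.96 (a normal rather than t quantile, as an assumption). The model x(t) = 5 exp(-θ t), its data, and the helper names are hypothetical.

```python
import math

# Hypothetical data from x(t) = 5 exp(-theta t) with theta = 0.3, plus small fixed errors.
times = [float(t) for t in range(1, 11)]
noise = [0.02, -0.01, 0.015, -0.02, 0.01, 0.0, -0.015, 0.01, -0.005, 0.005]
observed = [5.0 * math.exp(-0.3 * t) + e for t, e in zip(times, noise)]


def sse(theta):
    return sum((5.0 * math.exp(-theta * t) - y) ** 2 for t, y in zip(times, observed))


def golden_section(f, a, b, tol=1e-9):
    """Minimise a unimodal f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def bisect(f, lo, hi, iters=100):
    """Root of f in [lo, hi], assuming f changes sign on the interval."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


theta_hat = golden_section(sse, 0.01, 2.0)
s = math.sqrt(sse(theta_hat) / (len(times) - 1))  # residual standard error, 1 parameter


def tau(theta):
    """Signed square root of the SSE increase, scaled by s (profile t)."""
    diff = max(sse(theta) - sse(theta_hat), 0.0)
    return math.copysign(math.sqrt(diff), theta - theta_hat) / s


# Approximate 95% interval: where |tau| reaches 1.96 on each side (assumption).
lower = bisect(lambda th: tau(th) + 1.96, 0.01, theta_hat)
upper = bisect(lambda th: tau(th) - 1.96, theta_hat, 2.0)
print(theta_hat, (lower, upper))
```

Plotting |τ| over a grid of θ values gives the V-shaped curves in profile plots; a straight V indicates that a Wald interval would be adequate, while curvature signals asymmetry.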

Conclusions

• R and Matlab give comparable answers
• R's nls has helper functions absent from Matlab's lsqnonlin
• Functional data analysis tools are key for:
  – estimating derivatives
  – working with differential operators

References

• www.functionaldata.org
• Ramsay, J. O., and Silverman, B. W. (2006) Functional Data Analysis, 2nd ed. (Springer)
• Ramsay, J. O., and Silverman, B. W. (2002) Applied Functional Data Analysis (Springer)
• Ramsay, J. O., Hooker, G., Cao, J., and Campbell, D. (2007) Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society, Series B. To appear.

NOT free-knot splines

• For these, see the DierckxSpline package:
  – companion to Dierckx, P. (1993). Curve and Surface Fitting with Splines. Oxford Science Publications, New York.

• R package by Sundar Dorai-Raj
  – links to Fortran code by Dierckx, available from www.netlib.org/dierckx

• Soon to appear on CRAN