Data mining and statistical learning - lecture 6 1 Overview Basis expansion Splines (Natural) cubic splines Smoothing splines Nonparametric logistic regression.

Data mining and statistical learning - lecture 6

1

Overview

• Basis expansion

• Splines

• (Natural) cubic splines

• Smoothing splines

• Nonparametric logistic regression

• Multidimensional splines

• Wavelets


2

Linear basis expansion (1)

Linear regression

True model: $y = f(x) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

Question: How to find $\hat{f}$?

Answer: Solve a system of linear equations to obtain $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3$

x1 x2 x3 y

1 -3 6 12

… … … …


3


Nonlinear model

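True model: $y = \beta_1 x_1 x_2 + \beta_2 x_2 e^{x_3} + \beta_3 \sin x_3 + \beta_4 x_1^2$

Question: How to find $\hat{f}$?

Answer: A) Introduce new variables

$u_1 = x_1 x_2, \quad u_2 = x_2 e^{x_3}, \quad u_3 = \sin x_3, \quad u_4 = x_1^2$

x1 x2 x3 y

1 -3 -1 12

… … … …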


4


Nonlinear model

B) Transform the data set

u1 u2 u3 u4 y

-3 -1.1 -0.84 1 12

… … … … …

True model: $y = \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3 + \beta_4 u_4$

C) Apply linear regression to obtain $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$
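Steps A to C can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the transformation `expand` uses u1 = x1·x2, u2 = x2·exp(x3), u3 = sin(x3), u4 = x1², which is consistent with the example row in the tables, while the synthetic data, the coefficient vector `beta_true`, and the noise level are assumptions made up for the demonstration.

```python
import numpy as np

def expand(X):
    """Map rows (x1, x2, x3) to the new variables (u1, u2, u3, u4)."""
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([x1 * x2, x2 * np.exp(x3), np.sin(x3), x1 ** 2])

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 3))        # synthetic inputs (assumption)
beta_true = np.array([1.5, -2.0, 0.5, 3.0])      # made-up coefficients (assumption)
y = expand(X) @ beta_true + 0.01 * rng.standard_normal(200)

U = expand(X)                                     # step B: transform the data set
beta_hat, *_ = np.linalg.lstsq(U, y, rcond=None)  # step C: ordinary least squares
print(np.round(beta_hat, 2))

# The example row from the slides: (x1, x2, x3) = (1, -3, -1)
print(np.round(expand(np.array([[1.0, -3.0, -1.0]])), 2))
```

The model is nonlinear in x but linear in the coefficients, which is why ordinary least squares on the transformed columns is sufficient.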


5


Conclusion:

We can easily fit any model of the type

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$$

i.e., we can easily undertake a linear basis expansion in X

Example: If the model is known to be nonlinear, but the exact form is unknown, we can try to introduce interaction terms

$$f(X) = \beta_1 X_1 + \dots + \beta_p X_p + \beta_{11} X_1^2 + \beta_{12} X_1 X_2 + \dots$$


6

Piecewise polynomial functions

Assume X is one-dimensional.

Def. Assume the domain [a, b] of X is split into intervals [a, ξ1], [ξ1, ξ2], ..., [ξn, b]. Then f(X) is said to be piecewise polynomial if f(X) is represented by separate polynomials in the different intervals.

Note. The points ξ1, ..., ξn are called knots.


7

Piecewise polynomials

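Example. Continuous piecewise linear function

Alternative A. Introduce linear functions on each interval and a set of continuity constraints (6 parameters, 2 constraints: 4 free parameters)

$$y_1 = \alpha_1 x + \beta_1, \quad y_2 = \alpha_2 x + \beta_2, \quad y_3 = \alpha_3 x + \beta_3$$

$$y_1(\xi_1) = y_2(\xi_1), \quad y_2(\xi_2) = y_3(\xi_2)$$

INSERT FIG 5.1 (lower left)

Alternative B. Use a basis expansion (4 free parameters)

$$h_1(X) = 1, \quad h_2(X) = X, \quad h_3(X) = (X - \xi_1)_+, \quad h_4(X) = (X - \xi_2)_+$$

Theorem. The given formulations are equivalent.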


8

Splines

Definition. A piecewise polynomial of order M (degree M-1) is called an order-M spline if it has continuous derivatives up to order M-2 at the knots.

Alternative definition. An order-M spline is a function which can be represented by the basis functions (K = number of knots)

$$h_j(X) = X^{j-1}, \quad j = 1, \dots, M$$

$$h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \dots, K$$

Theorem. The definitions above are equivalent.

Terminology. An order-4 spline is called a cubic spline. INSERT FIG 5.2 (lower right)

(look at the basis and compare the number of free parameters)

Note. Cubic splines: the discontinuity at the knots is not visible.
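The truncated power basis of an order-M spline (powers 1, X, ..., X^(M-1) plus one term (X - ξ_l)_+^(M-1) per knot) can be built directly and used for a regression spline fit. The sketch below is illustrative only: the function name `truncated_power_basis`, the knot positions, and the noise-free sine target are assumptions, and the helper is meant for M ≥ 2.

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """Columns h_1..h_{M+K}: 1, x, ..., x^(M-1), then (x - xi_l)_+^(M-1). Assumes M >= 2."""
    x = np.asarray(x, dtype=float)
    powers = [x ** j for j in range(M)]
    truncated = [np.maximum(x - k, 0.0) ** (M - 1) for k in knots]
    return np.column_stack(powers + truncated)

x = np.linspace(0.0, 1.0, 200)
knots = [1 / 3, 2 / 3]                       # hypothetical knot positions
H = truncated_power_basis(x, knots, M=4)     # cubic spline: M + K = 6 columns
y = np.sin(2 * np.pi * x)                    # noise-free toy target (assumption)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fit = H @ beta
print(H.shape, float(np.max(np.abs(fit - y))))
```

With M = 2 the same function returns the four columns 1, X, (X - ξ1)_+, (X - ξ2)_+ of a continuous piecewise linear fit. The truncated power basis is easy to write down but can be numerically ill-conditioned with many knots.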


9

Variance of spline estimators – boundary effects

INSERT FIG 5.3


10

Natural cubic spline

Def. A cubic spline f is called a natural cubic spline if its 2nd and 3rd derivatives are zero at a and b.

Note. This implies that f is linear on the extreme intervals.

Basis functions of natural cubic splines

$$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X), \quad k = 1, \dots, K-2$$

where

$$d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}$$
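As a rough check of the natural cubic spline basis N_1 = 1, N_2 = X, N_{k+2} = d_k - d_{K-1}, the sketch below evaluates it beyond the last knot and confirms numerically that every basis function is linear there. The function name `natural_cubic_basis` and the knot values are assumptions chosen for illustration.

```python
import numpy as np

def natural_cubic_basis(x, knots):
    """N_1 = 1, N_2 = x, N_{k+2} = d_k - d_{K-1} for k = 1..K-2 (K columns in total)."""
    x = np.asarray(x, dtype=float)
    xi = np.asarray(knots, dtype=float)
    K = len(xi)

    def d(k):  # k is a 0-based knot index here
        num = np.maximum(x - xi[k], 0.0) ** 3 - np.maximum(x - xi[K - 1], 0.0) ** 3
        return num / (xi[K - 1] - xi[k])

    cols = [np.ones_like(x), x] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

knots = np.array([0.1, 0.3, 0.5, 0.7, 0.9])     # hypothetical knots
x_out = np.linspace(1.0, 2.0, 11)               # equally spaced points beyond the last knot
N_out = natural_cubic_basis(x_out, knots)
print(N_out.shape)
# Second differences of a linear function on an equally spaced grid vanish.
print(np.abs(np.diff(N_out, n=2, axis=0)).max())
```

Below the first knot all truncated terms are zero, so only the constant and linear columns are active there as well.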


11

Fitting smooth functions to data

Minimize a penalized sum of squared residuals

$$RSS(f, \lambda) = \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda \int \left( f''(t) \right)^2 dt$$

where λ is the smoothing parameter.

λ = 0: any function interpolating the data

λ = +∞: least squares line fit


12

Optimality of smoothing splines

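Theorem. The function f minimizing RSS for a given λ is a natural cubic spline with knots at all unique values of x_i (NOTE: N knots!)

The optimal spline can be computed as follows.

$$f(x) = \sum_{j=1}^{N} N_j(x) \theta_j$$

$$RSS(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda \theta^T \Omega_N \theta$$

$$\{N\}_{ij} = N_j(x_i), \quad \{\Omega_N\}_{ij} = \int N_i''(t) N_j''(t) \, dt$$

$$\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y$$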


13

A smoothing spline is a linear smoother

The fitted function

$$\hat{f} = N (N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y$$

is linear in the response values.


14

Degrees of freedom of smoothing splines

The effective degrees of freedom is

dfλ = trace(Sλ)

i.e., the sum of the diagonal elements of S.


15

Smoothing splines and eigenvectors

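It can be shown that

$$S_\lambda = (I + \lambda K)^{-1}$$

where K is the so-called penalty matrix.

Furthermore, the eigen-decomposition is

$$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda) u_k u_k^T, \quad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k}$$

Note: d_k and u_k are eigenvalues and eigenvectors, respectively, of K.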


16

Smoothing splines and shrinkage

• A smoothing spline decomposes the vector y with respect to the basis of eigenvectors and shrinks the respective contributions

• The eigenvectors, ordered by decreasing $\rho_k(\lambda)$, increase in complexity. The higher the complexity, the more the contribution is shrunk.

$$S_\lambda y = \sum_{k=1}^{N} u_k \, \rho_k(\lambda) \, \langle u_k, y \rangle$$
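The shrinkage view can be checked numerically for any symmetric positive semidefinite penalty matrix K. In the sketch below a discrete second-difference penalty K = DᵀD stands in for the smoothing-spline penalty matrix; this substitution, the value of `lam`, and the random response vector are assumptions made purely for illustration.

```python
import numpy as np

N = 50
D = np.diff(np.eye(N), n=2, axis=0)      # (N-2) x N second-difference operator
K = D.T @ D                               # stand-in penalty matrix (assumption)
lam = 5.0

d, U = np.linalg.eigh(K)                  # eigenvalues d_k (ascending) and eigenvectors u_k of K
rho = 1.0 / (1.0 + lam * d)               # eigenvalues rho_k(lambda) of S_lambda
S = np.linalg.inv(np.eye(N) + lam * K)    # smoother matrix (I + lambda K)^{-1}

rng = np.random.default_rng(1)
y = rng.standard_normal(N)
shrunk = U @ (rho * (U.T @ y))            # sum_k u_k rho_k(lambda) <u_k, y>
print(np.allclose(S @ y, shrunk))
print(np.trace(S), rho.sum())             # effective degrees of freedom, computed two ways
```

The two smallest eigenvalues of this K are zero (constant and linear vectors), so the corresponding components of y are not shrunk at all, whatever the value of λ.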


17

Smoothing splines and local curve fitting

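• The eigenvalues $\rho_k(\lambda) = 1/(1 + \lambda d_k)$ are decreasing functions of λ. The higher λ, the stronger the penalization.

• The smoother matrix has a banded nature -> local fitting method

• INSERT FIG 5.8

$$df_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \frac{1}{1 + \lambda d_k}$$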


18

Fitting smoothing splines in practice (1)

Reinsch form:

$$S_\lambda = (I + \lambda K)^{-1}$$

Theorem. If f is a natural cubic spline with values f at the knots and second derivatives γ at the knots, then

$$Q^T f = R \gamma$$

where Q and R are band matrices, dependent on ξ only.

Theorem.

$$K = Q R^{-1} Q^T$$


19

Fitting smoothing splines in practice (2)

Reinsch algorithm

• Evaluate $Q^T y$

• Compute $R + \lambda Q^T Q$ and find its Cholesky decomposition (in linear time!)

• Solve the matrix equation $(R + \lambda Q^T Q)\gamma = Q^T y$ for γ (in linear time!)

• Obtain $f = y - \lambda Q \gamma$
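The four steps above can be sketched as follows. The entries of Q and R follow the standard construction for natural cubic splines (with h_i the spacing between consecutive knots), which the slides do not spell out, so treat them as an assumption. For readability the sketch uses dense matrices and a dense Cholesky factorization; the linear-time claim relies on exploiting the band structure, for example with a banded solver such as `scipy.linalg.solveh_banded`. The data and `lam` are made up.

```python
import numpy as np

def reinsch_matrices(x):
    """Band matrices Q (N x (N-2)) and R ((N-2) x (N-2)) for sorted, distinct knots x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.diff(x)                            # knot spacings
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):                    # column j belongs to interior knot j + 1
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q, R

def reinsch_fit(x, y, lam):
    """Return fitted values f and second derivatives gamma at the interior knots."""
    Q, R = reinsch_matrices(x)
    A = R + lam * Q.T @ Q                     # symmetric positive definite, banded
    L = np.linalg.cholesky(A)                 # dense Cholesky, for clarity only
    z = np.linalg.solve(L, Q.T @ y)
    gamma = np.linalg.solve(L.T, z)           # solves (R + lam Q^T Q) gamma = Q^T y
    return y - lam * Q @ gamma, gamma

rng = np.random.default_rng(2)
x = np.cumsum(rng.uniform(0.5, 1.5, 40))      # sorted, distinct knots (synthetic)
y = np.sin(x / 5.0) + 0.1 * rng.standard_normal(40)
lam = 2.0
f_hat, gamma = reinsch_fit(x, y, lam)
print(f_hat[:5])
```

The result agrees with applying the smoother matrix (I + λK)⁻¹ with K = Q R⁻¹ Qᵀ directly, and data lying exactly on a straight line are returned unchanged because Qᵀ annihilates linear functions.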


20

Automated selection of smoothing parameters (1)

What can be selected:

Regression splines

• Degree of spline

• Placement of knots

->MARS procedure

Smoothing spline

• Penalization parameter


21


Fixing the degrees of freedom

• If we fix $df_\lambda$, then we can find λ by solving the equation below numerically

• One could try two different $df_\lambda$ and choose one based on F-tests, residual plots etc.

$$df_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \frac{1}{1 + \lambda d_k}$$
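Because df_λ = Σ_k 1/(1 + λ d_k) decreases monotonically in λ, a fixed target value can be matched by simple bisection. The sketch below is one possible way to do this, not a prescribed method: the function names, the bracketing interval, the target of 8 degrees of freedom, and the second-difference penalty used to generate eigenvalues d_k are all assumptions.

```python
import numpy as np

def df_of_lambda(lam, d):
    """Effective degrees of freedom for penalty eigenvalues d."""
    return float(np.sum(1.0 / (1.0 + lam * d)))

def lambda_for_df(target_df, d, lo=1e-10, hi=1e10, iters=200):
    """Bisection on log(lambda); df is monotonically decreasing in lambda."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                # midpoint on the log scale
        if df_of_lambda(mid, d) > target_df:
            lo = mid                          # too many degrees of freedom: penalize more
        else:
            hi = mid
    return float(np.sqrt(lo * hi))

N = 50
D = np.diff(np.eye(N), n=2, axis=0)           # stand-in second-difference penalty (assumption)
d = np.clip(np.linalg.eigvalsh(D.T @ D), 0.0, None)   # clip tiny negative rounding errors

lam_star = lambda_for_df(8.0, d)
print(lam_star, df_of_lambda(lam_star, d))
```

The target must lie strictly between the number of zero eigenvalues of the penalty (2 here) and N for a solution to exist.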


22


The bias-variance trade off

INSERT FIG. 5.9

EPE: integrated squared prediction error, CV: cross-validation

$$df_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \frac{1}{1 + \lambda d_k}$$


23

Nonparametric logistic regression

Logistic regression model

$$\log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x)$$

Note: X is one-dimensional

What is f:

• Linear -> ordinary logistic regression (Chapter 4)

• Sufficiently smooth -> nonparametric logistic regression (splines + others)

• Other choices are possible


24


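Problem formulation:

Maximize the penalized log-likelihood

$$l_p(f, \lambda) = l_u(f) - \frac{1}{2} \lambda \int \left( f''(t) \right)^2 dt$$

where $l_u$ is the unpenalized log-likelihood.

Good news: The solution is still a natural cubic spline.

Bad news: There is no analytic expression for that spline function.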


25


How to proceed?

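Use Newton-Raphson to compute the spline numerically, i.e.

• Compute (analytically)

$$\frac{\partial l_p}{\partial \theta}, \quad \frac{\partial^2 l_p}{\partial \theta \, \partial \theta^T}$$

1. Compute the Newton direction using the current value of the parameter and the derivative information

2. Compute the new value of the parameter using the old value and the update formula

$$\theta^{new} = \theta^{old} - \left( \frac{\partial^2 l_p}{\partial \theta \, \partial \theta^T} \right)^{-1} \frac{\partial l_p}{\partial \theta}$$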
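A minimal sketch of the Newton-Raphson iteration for a penalized logistic model f = Bθ is given below. To stay short it uses a cubic polynomial basis 1, x, x², x³ on [-1, 1] in place of the natural spline basis; for that basis the roughness penalty ∫(f'')² dt over [-1, 1] equals θᵀΩθ with Ω = diag(0, 0, 8, 24). The basis choice, the synthetic data, the value of `lam`, and the fixed iteration count are all assumptions for illustration.

```python
import numpy as np

def fit_penalized_logistic(B, y, Omega, lam, iters=25):
    """Newton-Raphson for l_p = sum[y f - log(1 + e^f)] - 0.5 lam theta' Omega theta, f = B theta."""
    theta = np.zeros(B.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(B @ theta)))          # fitted probabilities
        grad = B.T @ (y - p) - lam * Omega @ theta      # first derivative of l_p
        W = p * (1.0 - p)
        hess = -(B.T * W) @ B - lam * Omega             # second derivative of l_p
        theta = theta - np.linalg.solve(hess, grad)     # Newton update
    return theta

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 300)
prob = 1.0 / (1.0 + np.exp(-2.0 * np.sin(3.0 * x)))    # made-up true probabilities
y = (rng.uniform(size=300) < prob).astype(float)

B = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])   # stand-in basis (assumption)
Omega = np.diag([0.0, 0.0, 8.0, 24.0])                      # exact roughness penalty for this basis
lam = 0.1
theta_hat = fit_penalized_logistic(B, y, Omega, lam)
print(theta_hat)
```

A production implementation would add a convergence test and step-size control instead of a fixed number of iterations.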

12


26

Multidimensional splines

How to fit data smoothly in higher dimensions?

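A) Use bases of one-dimensional functions and produce a multidimensional basis by tensor product

$$g_{jk}(X) = h_{1j}(X_1) \, h_{2k}(X_2)$$

$$g(X) = \sum_{j} \sum_{k} \theta_{jk} \, g_{jk}(X)$$

Problem: Exponential growth of the basis with the dimension. INS FIG. 6.10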
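The tensor product construction g_jk(X) = h_1j(X_1) h_2k(X_2) amounts to multiplying every column of one basis matrix with every column of the other. The sketch below illustrates this with small polynomial bases; the function name `tensor_product_basis`, the choice of polynomial bases, and the sizes are assumptions.

```python
import numpy as np

def tensor_product_basis(H1, H2):
    """Rows g_jk(x_i) = h_1j(x_i1) * h_2k(x_i2), flattened to M1 * M2 columns."""
    n = H1.shape[0]
    return np.einsum("ij,ik->ijk", H1, H2).reshape(n, -1)

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, size=(100, 2))
H1 = np.vander(X[:, 0], 4, increasing=True)     # 1, x1, x1^2, x1^3
H2 = np.vander(X[:, 1], 4, increasing=True)     # 1, x2, x2^2, x2^3
G = tensor_product_basis(H1, H2)
print(G.shape)

# With M basis functions per coordinate, the tensor basis has M**dim functions.
for dim in (1, 2, 3, 5, 10):
    print(dim, 4 ** dim)
```

Column j * M2 + k of `G` holds the product of column j of `H1` and column k of `H2`, so the number of coefficients multiplies with every added dimension.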


27

Multidimensional splines

How to fit data smoothly in higher dimensions?

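B) Formulate a new problem

$$\min_f \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda J[f]$$

• The solution is thin-plate splines

• The properties for λ = 0 are similar to those in the one-dimensional case

• The solution in 2 dimensions is essentially a sum of radial basis functions

$$f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j \, \eta(\lVert x - x_j \rVert)$$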


28

Wavelets

Introduction

• The idea: to fit a bumpy function by removing noise

• Application areas: signal processing, compression

• How it works: The function is represented in a basis of bumpy functions. The small coefficients are filtered out.
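The idea can be illustrated with the simplest wavelet basis, the Haar basis. The sketch below implements an orthonormal Haar transform by hand, sets small detail coefficients to zero (hard thresholding, chosen here as an assumption; soft thresholding is a common alternative), and transforms back. The test signal, noise level, and threshold value are made up.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a signal whose length is a power of two."""
    x = np.asarray(x, dtype=float).copy()
    details = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        diff = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        details.append(diff)
        x = avg
    return x, details                 # coarsest average, details ordered fine to coarse

def haar_inverse(approx, details):
    """Invert haar_forward."""
    x = np.asarray(approx, dtype=float)
    for diff in reversed(details):
        out = np.empty(2 * len(x))
        out[0::2] = (x + diff) / np.sqrt(2.0)
        out[1::2] = (x - diff) / np.sqrt(2.0)
        x = out
    return x

def haar_denoise(y, threshold):
    """Zero out detail coefficients with magnitude at or below the threshold."""
    approx, details = haar_forward(y)
    kept = [np.where(np.abs(d) > threshold, d, 0.0) for d in details]
    return haar_inverse(approx, kept)

rng = np.random.default_rng(5)
t = np.arange(256) / 256.0
signal = np.where(t < 0.5, 1.0, -1.0)            # bumpy (piecewise constant) toy signal
y = signal + 0.2 * rng.standard_normal(256)
denoised = haar_denoise(y, threshold=0.6)        # threshold chosen by hand (assumption)
print(np.mean((y - signal) ** 2), np.mean((denoised - signal) ** 2))
```

Because the transform is orthonormal, noise spreads evenly over all coefficients while a bumpy signal concentrates in a few large ones, which is why discarding small coefficients removes mostly noise.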


29

Wavelets

Basis functions (Haar Wavelets, Symmlet-8 Wavelets)

INSERT FIG 5.13


30

Wavelets

Example

Insert FIG 5.14
