

Generalized Cross Validation

Mårten Marcus

26 November 2009



Plan

1 Generals

2 What Regularization Parameter λ is Optimal?

Examples

Defining the Optimal λ

3 Generalized Cross-Validation

Cross Validation

Generalized Cross Validation (GCV)

Convergence Result

4 Discussion






References

Grace Wahba, Spline Models for Observational Data (1990), Chapter 4

Behzad Shahraray, David Anderson, Optimal Estimation of Contour Properties by Cross-Validated Regularization (1989)

Peter Craven, Grace Wahba, Smoothing Noisy Data with Spline Functions (1979)



Conditional Expectations

From probability theory, we have for $X \in L^2(\Omega, \mathcal{F}, P)$

$$E[X] = \arg\min_{\theta \in \mathbb{R}} E[(X - \theta)^2] \tag{1}$$

The least squares estimate therefore gives a discretized (sample-based) estimate of $\theta$, which makes the squared error a natural norm to choose when the data are disturbed by white noise.
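As a small numerical check of (1), the sketch below draws a sample and scans candidate values of $\theta$. The distribution, sample size and grid are arbitrary choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)  # illustrative sample of X

# Empirical version of E[(X - theta)^2] on a grid of candidate theta values.
thetas = np.linspace(0.0, 6.0, 601)
mse = np.array([np.mean((x - th) ** 2) for th in thetas])

theta_best = thetas[np.argmin(mse)]
print(theta_best, x.mean())  # the two values agree up to the grid spacing
```

The empirical squared error is a parabola in $\theta$ centred at the sample mean, so the grid minimizer is the grid point closest to it.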



Ill-posedness

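Consider the model $y_i = g(t_i) + \varepsilon_i$, $i = 1, 2, \ldots, n$, $t_i \in [0, 1]$, where

$$g \in W_2^{(m)} = \{ f \mid f, f', \ldots, f^{(m-1)} \text{ abs. cont.},\ f^{(m)} \in L^2[0, 1] \}$$

and $\varepsilon_i \sim WN(0, \sigma^2)$, where $\sigma$ is unknown.

The least squares estimate gives

$$\min_{g \in W_2^{(m)}} \sum_{i=1}^{n} (y_i - g(t_i))^2 = 0 \quad \forall \{y_i\}_{i=1}^{n} \tag{2}$$

The minimum value does not depend on the data: any function in $W_2^{(m)}$ that interpolates the data attains it, so the minimizer is not unique and the problem is clearly ill-posed.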



Regularization

We may choose to regularize, or smooth, the data as

$$g_{n,\lambda} = \arg\min_{g \in W_2^{(m)}} \sum_{i=1}^{n} (y_i - g(t_i))^2 + \lambda \int_0^1 (g^{(m)}(t))^2 \, dt, \quad \lambda \ge 0 \tag{3}$$

which has a unique solution.
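A minimal sketch of the same idea in discrete form may help. It is not the smoothing spline itself: as an assumption made here, the integral penalty in (3) is replaced by the sum of squared $m$-th order differences of the fitted values on an equidistant grid, so $\lambda$ is on a different scale. The names `smooth` and `roughness` are hypothetical.

```python
import numpy as np

def smooth(y, lam, m=2):
    """Minimize sum (y_i - g_i)^2 + lam * sum (m-th difference of g)^2 over g in R^n."""
    n = len(y)
    D = np.diff(np.eye(n), n=m, axis=0)          # (n - m) x n difference operator
    # Normal equations of the penalized least squares problem.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def roughness(v, m=2):
    """Discrete stand-in for the integral of (g^(m))^2."""
    return np.sum(np.diff(v, n=m) ** 2)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * t) + 0.2 * rng.normal(size=t.size)  # illustrative data

g_hat = smooth(y, lam=10.0)
print(roughness(y), roughness(g_hat))  # the fit is smoother than the raw data
```

The linear system has a unique solution for every $\lambda \ge 0$ because the matrix $I + \lambda D^T D$ is positive definite.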






Properties of gn,λ

The reason for the term smoothing spline is the following:

For $\lambda = 0$, $g_{n,\lambda}$ can be seen as an interpolating spline.

For $\lambda = \infty$, $g_{n,\lambda}$ is a single polynomial of degree $m - 1$ (optimal in the least squares sense).

For $0 < \lambda < \infty$, it can be shown that $g_{n,\lambda}$ is composed of polynomials of degree at most $2m - 1$ on the intervals $[t_i, t_{i+1}]$, $i \in \{1, 2, \ldots, n - 1\}$, such that the function and its derivatives up to and including order $2m - 2$ are continuous at the knots, and $g_{n,\lambda}^{(k)}(t_1) = g_{n,\lambda}^{(k)}(t_n) = 0$ for $k = m, m + 1, \ldots, 2m - 2$, i.e. natural conditions at the end points.

Since $\lambda$ moves the estimate between these two extremes, the quality of the result depends strongly on a good choice of $\lambda$.
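The two limiting cases can be observed numerically. The sketch assumes a discrete stand-in for the spline problem (squared $m$-th differences on an equidistant grid instead of the integral penalty, with $m = 2$); the function name `smooth` and the data are illustrative.

```python
import numpy as np

def smooth(y, lam, m=2):
    """Discrete penalized least squares fit with an m-th difference penalty."""
    n = len(y)
    D = np.diff(np.eye(n), n=m, axis=0)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * t) + 0.2 * rng.normal(size=t.size)

nearly_interpolating = smooth(y, 1e-12)   # lambda close to 0: reproduces the data
nearly_polynomial = smooth(y, 1e9)        # very large lambda: degree m - 1 = 1

# Ordinary least squares straight line for comparison.
ls_line = np.polyval(np.polyfit(t, y, deg=1), t)
print(np.max(np.abs(nearly_polynomial - ls_line)))
```

Intermediate values of $\lambda$ trade fidelity to the data against smoothness, which is why its choice matters.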




Examples

[Figure: impact of different $\lambda$ on $g_{n,\lambda}$ and $g'_{n,\lambda}$ for $y_i = g(t_i) + \varepsilon_i$, where $g(t) = \sin(\pi t / 180)$, $t \in [0, 360]$]








True Mean Square Error R(λ) and Optimal λ

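Using the notation above we define the true mean square error, $R(\lambda)$, as

$$R(\lambda) := \frac{1}{n} \sum_{i=1}^{n} (g_{n,\lambda}(t_i) - g(t_i))^2 \tag{4}$$

The optimal $\lambda$ is then defined as

$$\lambda^* = \arg\min_{\lambda \in \mathbb{R}^+} R(\lambda) \tag{5}$$

Because $R(\lambda)$ depends on the unknown $g$, it cannot be evaluated from the data alone.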
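In a simulation, where $g$ is known, $R(\lambda)$ and $\lambda^*$ can be computed directly. The sketch below does this on a grid, assuming a discrete stand-in for the smoothing spline (second-difference penalty on an equidistant grid); the helper name `influence_matrix`, the test function and the noise level are illustrative choices.

```python
import numpy as np

def influence_matrix(n, lam, m=2):
    """Matrix mapping data to fitted values for the discrete smoother."""
    D = np.diff(np.eye(n), n=m, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

rng = np.random.default_rng(2)
n = 60
t = np.linspace(0.0, 1.0, n)
g_true = np.sin(2 * np.pi * t)                 # known only because we simulate
y = g_true + 0.3 * rng.normal(size=n)

def true_mse(lam):
    """R(lam) as in (4), evaluated at the sample points."""
    g_hat = influence_matrix(n, lam) @ y
    return np.mean((g_hat - g_true) ** 2)

lams = 10.0 ** np.arange(-3, 6, 0.25)          # candidate values of lambda
R = np.array([true_mse(lam) for lam in lams])
lam_star = lams[np.argmin(R)]                  # grid version of (5)
print(lam_star, R.min())
```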






Cross Validation

Intuition

Given our sample of $n$ measurements, we leave one measurement out and predict that data point using the $n - 1$ remaining measurements.

The idea is that the best model for the measurements is the one that best predicts each measurement as a function of the others.



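The Ordinary Cross-Validation (OCV)

Definition

Let $g_{n,\lambda}^{[k]}$, $k \in \{1, 2, \ldots, n\}$, be defined as

$$g_{n,\lambda}^{[k]} = \arg\min_{g \in W_2^{(m)}} \sum_{\substack{i=1 \\ i \ne k}}^{n} (y_i - g(t_i))^2 + \lambda \int_0^1 (g^{(m)}(t))^2 \, dt, \quad \lambda \ge 0 \tag{6}$$

Then we define the OCV mean square error as

$$V_0(\lambda) = \frac{1}{n} \sum_{i=1}^{n} (g_{n,\lambda}^{[i]}(t_i) - y_i)^2 \tag{7}$$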



Definition

Let $V_0(\lambda)$ be defined as in the previous definition. The Ordinary Cross-Validation Estimate of $\lambda$ is

$$\lambda_0 = \arg\min_{\lambda \in \mathbb{R}^+} V_0(\lambda) \tag{8}$$
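A brute-force sketch of the OCV estimate follows: each point is left out in turn, the model is refitted, and the left-out point is predicted. As an assumption, the spline problem is replaced by a discrete analogue with a second-difference penalty on an equidistant grid; `loo_fit` and `ocv` are hypothetical helper names.

```python
import numpy as np

def loo_fit(y, k, lam, m=2):
    """Fit with observation k left out (discrete analogue of (6))."""
    n = len(y)
    D = np.diff(np.eye(n), n=m, axis=0)
    W = np.eye(n)
    W[k, k] = 0.0                      # drop the k-th residual from the sum
    return np.linalg.solve(W + lam * D.T @ D, W @ y)

def ocv(y, lam):
    """V_0(lam) as in (7): mean squared error of the left-out predictions."""
    return np.mean([(loo_fit(y, k, lam)[k] - y[k]) ** 2 for k in range(len(y))])

rng = np.random.default_rng(3)
t = np.arange(1, 41) / 40
y = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=t.size)

lams = 10.0 ** np.linspace(-2, 4, 25)  # candidate values of lambda
V0 = np.array([ocv(y, lam) for lam in lams])
lam_0 = lams[np.argmin(V0)]            # grid version of (8)
print(lam_0)
```

This costs $n$ refits per value of $\lambda$, which is the main practical drawback of the direct definition.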



Why we need more

The OCV estimate has proven accurate for certain periodic functions, but in general it lacks a feature shown by the following:

Consider a function $g \in W_2^{(m)}$ with $g(t) = g(t + 1)$, sampled equidistantly, e.g. at $t_j = j/n$, $j \in \{1, 2, \ldots, n\}$, with some uniform noise added.

In this case all data points are treated symmetrically. If, on the other hand, we drop the conditions of periodicity and equidistant sampling, the points interact differently, which gives rise to the need to weight the samples differently.








Rewriting V0

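There exists an $n \times n$ matrix $A(\lambda)$, the influence matrix, with the property

$$\begin{pmatrix} g_{n,\lambda}(t_1) \\ g_{n,\lambda}(t_2) \\ \vdots \\ g_{n,\lambda}(t_n) \end{pmatrix} = A(\lambda) y \tag{9}$$

such that $V_0(\lambda)$ can be rewritten as

$$V_0(\lambda) = \frac{1}{n} \sum_{k=1}^{n} \frac{\left( \sum_{j=1}^{n} a_{kj} y_j - y_k \right)^2}{(1 - a_{kk})^2} \tag{10}$$

where $a_{kj}$, $k, j \in \{1, 2, \ldots, n\}$, is element $(k, j)$ of $A(\lambda)$.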
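The rewriting in (10) can be checked numerically against the leave-one-out definition. The sketch uses a discrete stand-in for the spline problem (second-difference penalty on an equidistant grid, an assumption made here), for which the same identity holds; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, lam = 30, 5.0
t = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)

D = np.diff(np.eye(n), n=2, axis=0)
K = D.T @ D                                   # discrete roughness penalty
A = np.linalg.inv(np.eye(n) + lam * K)        # influence matrix: fitted = A @ y

# Brute force: refit n times, each time without observation k.
brute = 0.0
for k in range(n):
    W = np.eye(n)
    W[k, k] = 0.0
    g_k = np.linalg.solve(W + lam * K, W @ y)
    brute += (g_k[k] - y[k]) ** 2
brute /= n

# Shortcut (10): a single fit, residuals rescaled by 1 - a_kk.
shortcut = np.mean(((A @ y - y) / (1.0 - np.diag(A))) ** 2)
print(brute, shortcut)
```

The shortcut needs only one fit per value of $\lambda$ instead of $n$.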




The Generalized Cross Validation (GCV)

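Definition

Let $A(\lambda)$ be the influence matrix defined above. Then the GCV function is defined as

$$V(\lambda) = \frac{\frac{1}{n} \| (I - A(\lambda)) y \|^2}{\left[ \frac{1}{n} \operatorname{tr}(I - A(\lambda)) \right]^2} \tag{11}$$

We say that the Generalized Cross-Validation Estimate of $\lambda$ is

$$\hat{\lambda} = \arg\min_{\lambda \in \mathbb{R}^+} V(\lambda) \tag{12}$$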
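A minimal sketch of (11) and (12) on a grid of candidate values. As an assumption, the influence matrix is that of a discrete smoother with a second-difference penalty on an equidistant grid rather than of the smoothing spline; `influence_matrix` and `gcv` are hypothetical names.

```python
import numpy as np

def influence_matrix(n, lam, m=2):
    """Matrix A(lam) with fitted values = A(lam) @ y for the discrete smoother."""
    D = np.diff(np.eye(n), n=m, axis=0)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def gcv(y, lam):
    """GCV function V(lam) as in (11)."""
    n = len(y)
    A = influence_matrix(n, lam)
    resid = y - A @ y
    return (resid @ resid / n) / (np.trace(np.eye(n) - A) / n) ** 2

rng = np.random.default_rng(6)
n = 50
t = np.arange(1, n + 1) / n
y = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=n)

lams = 10.0 ** np.linspace(-2, 4, 49)          # candidate values of lambda
V = np.array([gcv(y, lam) for lam in lams])
lam_hat = lams[np.argmin(V)]                   # grid version of (12)
print(lam_hat)
```

Note that neither $g$ nor $\sigma$ is needed: only the data $y$ and the matrix $A(\lambda)$ enter the computation.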




Similarity to OCV

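By rewriting $V(\lambda)$ we get

$$V(\lambda) = \frac{1}{n} \sum_{k=1}^{n} (g_{n,\lambda}^{[k]}(t_k) - y_k)^2 \, w_k(\lambda) \tag{13}$$

where the weights $w_k(\lambda)$ are given by

$$w_k(\lambda) = \left( \frac{1 - a_{kk}(\lambda)}{\frac{1}{n} \operatorname{tr}(I - A(\lambda))} \right)^2 \tag{14}$$

The weights $w_k(\lambda)$ have been shown to adjust $V_0(\lambda)$ for the lack of periodicity and for non-equidistant samples.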
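The rewriting can be verified in a few lines, assuming the leaving-out-one identity that underlies (10), namely $g_{n,\lambda}^{[k]}(t_k) - y_k = \left( \sum_{j} a_{kj} y_j - y_k \right) / (1 - a_{kk})$. Inserting this and the weights (14) into the weighted sum gives back (11):

```latex
\begin{align*}
\frac{1}{n}\sum_{k=1}^{n} \left(g_{n,\lambda}^{[k]}(t_k) - y_k\right)^2 w_k(\lambda)
&= \frac{1}{n}\sum_{k=1}^{n}
   \frac{\left(\sum_{j=1}^{n} a_{kj} y_j - y_k\right)^2}{(1 - a_{kk})^2}
   \cdot
   \frac{(1 - a_{kk})^2}{\left[\frac{1}{n}\operatorname{tr}(I - A(\lambda))\right]^2} \\
&= \frac{\frac{1}{n}\sum_{k=1}^{n} \left(\sum_{j=1}^{n} a_{kj} y_j - y_k\right)^2}
        {\left[\frac{1}{n}\operatorname{tr}(I - A(\lambda))\right]^2}
 = \frac{\frac{1}{n}\,\|(I - A(\lambda))\,y\|^2}
        {\left[\frac{1}{n}\operatorname{tr}(I - A(\lambda))\right]^2}
 = V(\lambda).
\end{align*}
```

When all diagonal elements $a_{kk}$ are equal, $1 - a_{kk} = \frac{1}{n} \operatorname{tr}(I - A(\lambda))$, so every weight equals 1 and $V(\lambda) = V_0(\lambda)$.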




Comparison

[Figure: comparison of the true mean square error $R(\lambda)$ with the GCV function $V(\lambda)$ for the example $y_i = g(t_i) + \varepsilon_i$, where $g(t) = \sin(\pi t / 180)$, $t \in [0, 360]$]




Convergence Result

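Theorem

Let $g(\cdot) \in W_2^{(m)}$ and let $\{t_i\}_{i=1}^{n}$ satisfy $\int_0^{t_i} w(u) \, du = \frac{i}{n}$, where $w(u)$ is a strictly positive continuous weight function. Then there exist sequences $\{\lambda_n\}_{n=1}^{\infty}$ and $\{\lambda_n^*\}_{n=1}^{\infty}$ of minimizers of $E[V(\lambda)]$ and $E[R(\lambda)]$, respectively, such that the expectation inefficiency $I^*$ satisfies

$$I^* = \frac{E[R(\lambda_n)]}{E[R(\lambda_n^*)]} \to 1, \quad \text{as } n \to \infty \tag{15}$$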
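The inefficiency can be explored numerically without simulation, because under white noise both expectations have closed forms: $E[R(\lambda)] = \frac{1}{n} \|(I - A)g\|^2 + \frac{\sigma^2}{n} \operatorname{tr}(A^2)$, and $E[V(\lambda)]$ is obtained from (11) by replacing the numerator with $\frac{1}{n} \|(I - A)g\|^2 + \frac{\sigma^2}{n} \operatorname{tr}((I - A)^2)$. The sketch below is an illustration under assumptions and not a verification of the theorem: it uses a discrete smoother with a second-difference penalty in place of the spline, $w \equiv 1$ (so $t_i = i/n$), an arbitrary test function and noise level, and a finite grid of $\lambda$ values. The function name `expected_curves` is hypothetical.

```python
import numpy as np

def expected_curves(n, lams, sigma=0.3, m=2):
    """Return E[R(lam)] and E[V(lam)] on the grid lams for the discrete smoother."""
    t = np.arange(1, n + 1) / n
    g = np.sin(2 * np.pi * t)                 # illustrative true function
    D = np.diff(np.eye(n), n=m, axis=0)
    mu, U = np.linalg.eigh(D.T @ D)           # penalty matrix = U diag(mu) U^T
    mu = np.clip(mu, 0.0, None)               # remove tiny negative round-off
    c2 = (U.T @ g) ** 2                       # squared coefficients of g
    ER, EV = [], []
    for lam in lams:
        a = 1.0 / (1.0 + lam * mu)            # eigenvalues of A(lam)
        bias2 = np.sum((1.0 - a) ** 2 * c2) / n
        ER.append(bias2 + sigma ** 2 * np.mean(a ** 2))
        EV.append((bias2 + sigma ** 2 * np.mean((1.0 - a) ** 2))
                  / np.mean(1.0 - a) ** 2)
    return np.array(ER), np.array(EV)

lams = 10.0 ** np.linspace(-2, 10, 61)
inefficiency = {}
for n in (25, 100, 400):
    ER, EV = expected_curves(n, lams)
    # E[R] at the minimizer of E[V], relative to the smallest E[R] on the grid.
    inefficiency[n] = ER[np.argmin(EV)] / ER.min()
print(inefficiency)
```

By construction each ratio is at least 1; values close to 1 indicate that minimizing $E[V(\lambda)]$ loses little compared with minimizing $E[R(\lambda)]$.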




Discussion



GCV rests on an implicit assumption that the true function $g(t)$ is smooth.

GCV has been applied effectively to problems such as:

Finding the right order of splines in regression

Regularized solution of Fredholm integral equations of the first kind

It provides an estimate of the regularization parameter from general assumptions on the data.

