
Lecture 17: Wavelets

Pratheepa Jeganathan

05/10/2019

Recall

- One-sample sign test, Wilcoxon signed rank test, large-sample approximation, median, Hodges-Lehmann estimator, distribution-free confidence interval.
- Jackknife for bias and standard error of an estimator.
- Bootstrap samples, bootstrap replicates.
- Bootstrap standard error of an estimator.
- Bootstrap percentile confidence interval.
- Hypothesis testing with the bootstrap (one-sample problem).
- Assessing the error in bootstrap estimates.
- Example: inference on the ratio of heart attack rates in the aspirin-intake group to the placebo group.
- The exhaustive bootstrap distribution.
- Discrete data problems (one-sample and two-sample proportion tests, test of homogeneity, test of independence).
- Two-sample problems (location problem with equal or unequal variance, exact test or Monte Carlo, large-sample approximation, H-L estimator, dispersion problem, general distribution).
- Permutation tests (permutation test for continuous data, different test statistics, accuracy of permutation tests).
- Permutation tests (discrete data problems, exchangeability).
- Rank-based correlation analysis (Kendall and Spearman correlation coefficients).
- Rank-based regression (straight line, multiple linear regression, statistical inference about the unknown parameters; nonparametric procedures do not depend on the distribution of the error term).
- Smoothing (density estimation, bias-variance trade-off, curse of dimensionality).
- Nonparametric regression (local averaging, local regression, kernel smoothing, local polynomial, penalized regression).
- Cross-validation, variance estimation, confidence bands, bootstrap confidence bands.

Wavelets

Spatially inhomogeneous functions

Example

- Doppler function

library(ggplot2)
r = function(x){
  sqrt(x*(1-x))*sin(2.1*pi/(x+.05))
}
ep = rnorm(1000)
y = r(seq(1, 1000, by = 1)/1000) + .1 * ep
df = data.frame(x = seq(1, 1000, by = 1)/1000, y = y)
ggplot(df) +
  geom_point(aes(x = x, y = y))

Example

(Figure: scatter plot of the simulated Doppler data, y versus x.)

Example

- The Doppler function is spatially inhomogeneous (its smoothness varies over x).
- Estimate it by local linear regression.

library(np)
doppler.npreg <- npreg(bws = .005,
  txdat = df$x,
  tydat = df$y,
  ckertype = "epanechnikov")
doppler.npreg.fit = data.frame(x = df$x,
  y = df$y,
  kernel.fit = fitted(doppler.npreg))
p = ggplot(doppler.npreg.fit) +
  geom_point(aes(x = x, y = y)) +
  geom_line(aes(x = x, y = kernel.fit), color = "red")

Example

(Figure: Doppler data with the local linear regression fit overlaid in red.)

Example

- Doppler function fit using local linear regression.
- Effective degrees of freedom: 166.
- The fitted function is very wiggly.
- If we smooth more, the right-hand side of the fit would look better, at the cost of missing structure near x = 0.

Introduction

- Construct basis functions that are
  - multiscale.
  - spatially/locally adaptive.
- Find a sparse set of coefficients for a given basis.
- The function f belongs to a class of functions F possessing more general characteristics, such as a certain level of smoothness.
- Estimate f by representing the function in another domain.
- Use an orthogonal series representation of the function f.
- Estimate a set of scalar coefficients that represent f in the orthogonal series domain.
- Tool: wavelets
  - ability to estimate both global and local features in the underlying function.

Sparseness

- W 2006, Chapter 9.
- A function f = Σ_j β_j φ_j is sparse in a basis φ_1, φ_2, … if most of the β_j's are zero.
- Sparseness generalizes smoothness: smooth functions are sparse, but there are also non-smooth functions that are sparse.
- Sparseness is not captured by the L2 norm.
- Example: a = (1, 0, …, 0) and b = (1/√n, 1/√n, …, 1/√n).
  - a is sparse.
  - The L2 norms are ||a||_2 = ||b||_2 = 1.
  - The L1 norms are ||a||_1 = 1 and ||b||_1 = √n.
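The L1/L2 contrast above is easy to check numerically. A minimal sketch in Python (the lecture's code is R, but the arithmetic is language-agnostic):

```python
import numpy as np

n = 10_000
a = np.zeros(n); a[0] = 1.0        # sparse vector (1, 0, ..., 0)
b = np.full(n, 1.0 / np.sqrt(n))   # dense vector (1/sqrt(n), ..., 1/sqrt(n))

# Both vectors have unit L2 norm, so the L2 norm cannot distinguish them.
print(np.linalg.norm(a, 2), np.linalg.norm(b, 2))  # 1.0, ~1.0

# The L1 norm separates them: ||a||_1 = 1 while ||b||_1 = sqrt(n) = 100.
print(np.linalg.norm(a, 1), np.linalg.norm(b, 1))  # 1.0, ~100.0
```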

Wavelets

- Data: there are n pairs of observations (x_1, Y_1), (x_2, Y_2), …, (x_n, Y_n).
- Assumptions
  - Y_i = f(x_i) + ε_i.
  - The ε_i are IID.
  - ∫ f² < ∞ and f is defined on a closed interval [a, b]. For simplicity, we will consider [a, b] = [0, 1].

Wavelet representation of a function

Basis functions

- ψ = {ψ_1, ψ_2, …} is called a basis for a class of functions F if, for every f ∈ F,

  f(x) = Σ_{i=1}^∞ θ_i ψ_i(x).

- The θ_i are scalar constants (coefficients).
- The basis functions are orthogonal if <ψ_i, ψ_j> = 0 for i ≠ j.
- The basis functions are orthonormal if they are orthogonal and <ψ_i, ψ_i> = 1.
- How do we construct the basis functions ψ_i?

Basis functions

- If ψ is a wavelet function, then the collection of functions

  Ψ = {ψ_{jk} : j, k integers},

  where

  ψ_{jk}(x) = 2^{j/2} ψ(2^j x − k),

  forms a basis for square-integrable functions.
- Ψ is a collection of translations (shifts) and dilations (scalings) of ψ.
- ψ can be defined on any range of the real line.
- ∫ ψ = 0 (the wavelet has zero mean).
- The value of ψ is near 0 except over a small range.
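The orthonormality of the translated/dilated family can be verified directly for the Haar wavelet. A short Python sketch (a numerical check on a fine grid, not part of the lecture's R code):

```python
import numpy as np

def haar_mother(x):
    # Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    return np.where((x >= 0) & (x < 0.5), 1.0,
           np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def psi(j, k, x):
    # translated/dilated family: psi_{jk}(x) = 2^{j/2} psi(2^j x - k)
    return 2 ** (j / 2) * haar_mother(2 ** j * x - k)

x = np.linspace(0, 1, 2 ** 16, endpoint=False)
dx = x[1] - x[0]

# inner products approximated by Riemann sums on [0, 1)
ip = lambda f, g: np.sum(f * g) * dx
print(round(ip(psi(1, 0, x), psi(1, 0, x)), 6))  # ~1: unit norm
print(round(ip(psi(1, 0, x), psi(1, 1, x)), 6))  # ~0: orthogonal shifts
print(round(ip(psi(1, 0, x), psi(2, 1, x)), 6))  # ~0: orthogonal across scales
```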

Some examples for wavelets

I Haar wavelets (1910)

Some examples for wavelets

I Daubechies wavelets (1992)

Multiresolution analysis (MRA)

- Carefully construct the wavelet function ψ.
- MRA: interpretation of the wavelet representation of f in terms of location and scale.
- Translation and dilation of ψ gives

  f(x) = Σ_{j∈Z} Σ_{k∈Z} θ_{jk} ψ_{jk}(x),

  where Z is the set of integers.
- Scale corresponds to frequency.
- For fixed j, k represents the behavior of f at resolution scale j and a particular location.
- Viewing the function f at differing resolution (scale, frequency) levels j and locations k is the MRA.

Multiresolution analysis (MRA)

- Cumulative approximation of f using j < J:

  f_J(x) = Σ_{j<J} Σ_{k∈Z} θ_{jk} ψ_{jk}(x).

- As J increases, f_J models smaller scales (higher frequencies) of f: changes that occur in small intervals of x.
- As J decreases, f_J models the larger-scale (lower-frequency) behavior of f.
- A complete representation of f is the limit of f_J.

Multiresolution analysis (MRA)

- Write f_J(x) as follows:

  f_J(x) = Σ_{k∈Z} ξ_{j0 k} φ_{j0 k}(x) + Σ_{j0 ≤ j < J} Σ_{k∈Z} θ_{jk} ψ_{jk}(x),

  where f_{j0} = Σ_{k∈Z} ξ_{j0 k} φ_{j0 k}(x).
- Adding the second term to f_{j0} allows for modeling higher scale (frequency) behavior of f.
- f_{j0} is the approximation at the smooth resolution level.
- Each of the remaining resolution-level series is a "detail" level.
- φ: scaling function (father wavelet).
- ψ: wavelet function (mother wavelet).

MRA Using the Haar Wavelet (Example)

- Approximate f(x) = x, x ∈ (0, 1).
- Define the Haar wavelet function

  ψ(x) = 1 for x ∈ [0, 1/2),  ψ(x) = −1 for x ∈ [1/2, 1),    (1)

  and the scaling function

  φ(x) = 1, x ∈ [0, 1].    (2)

- The Haar wavelet allows exact determination of the wavelet coefficients θ_{jk}.
- Source: HWC.
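For f(x) = x the Haar coefficients can be written in closed form: substituting u = 2^j x − k and using ∫ψ = 0, a short calculation gives θ_{jk} = ∫ x ψ_{jk}(x) dx = −2^{−3j/2}/4 for every k at level j, so the coefficients decay geometrically in the scale j. A quick numerical check in Python (the lecture itself uses R's waveslim; this is only a sketch):

```python
import numpy as np

def psi_jk(j, k, x):
    # Haar wavelet psi_{jk}(x) = 2^{j/2} psi(2^j x - k)
    u = 2 ** j * x - k
    return 2 ** (j / 2) * (np.where((u >= 0) & (u < 0.5), 1.0, 0.0)
                           - np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0))

# midpoint grid on [0, 1]: exact for piecewise-linear integrands
N = 2 ** 18
x = (np.arange(N) + 0.5) / N
dx = 1.0 / N

for j in (0, 1, 2, 3):
    # numerical theta_{j0} vs the closed form -2^{-3j/2}/4
    theta = np.sum(x * psi_jk(j, 0, x)) * dx
    print(j, round(theta, 6), round(-2 ** (-1.5 * j) / 4, 6))
```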

MRA Using the D2 wavelet

I Source HWC

MRA Using the D2 wavelet

- To avoid boundary issues when using D2:
  - specify the MRA using reflection at the boundaries, rather than periodicity.
  - this increases the number of indices k that must be considered at each resolution level j.

Discrete wavelet transform

- The cascade algorithm provides the MRA (Mallat 1989).
- Some restrictions:
  - n must be a power of 2, so that J = log_2(n).
  - The number of resolution levels in the wavelet series is truncated both above and below in practice, resulting in J − j0 + 1 series, each representing a resolution level.
- Commands in R that make use of the DWT are dwt, idwt, and mra in package waveslim (Whitcher (2010)).
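The cascade (pyramid) algorithm is especially simple in the Haar case: at each pass, normalized pairwise averages become the next smooth approximation and normalized pairwise differences become the detail coefficients. A minimal from-scratch Python analogue of what waveslim's dwt/idwt do (a sketch, not the package's implementation):

```python
import numpy as np

def haar_dwt(y):
    """Full Haar pyramid: returns (smooth, details ordered coarsest->finest)."""
    s = np.asarray(y, dtype=float)
    details = []
    while len(s) > 1:
        # pairwise averages (smooth) and differences (detail), normalized
        smooth = (s[0::2] + s[1::2]) / np.sqrt(2)
        detail = (s[0::2] - s[1::2]) / np.sqrt(2)
        details.append(detail)
        s = smooth
    return s, details[::-1]

def haar_idwt(smooth, details):
    """Inverse pyramid: rebuild the signal from smooth + detail levels."""
    s = np.asarray(smooth, dtype=float)
    for d in details:  # coarsest -> finest
        up = np.empty(2 * len(s))
        up[0::2] = (s + d) / np.sqrt(2)
        up[1::2] = (s - d) / np.sqrt(2)
        s = up
    return s

n = 2 ** 10
y = np.arange(n) / n          # a simple dyadic-length test signal
s0, det = haar_dwt(y)
y_rec = haar_idwt(s0, det)
print(np.allclose(y, y_rec))  # True: perfect reconstruction
```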

Discrete wavelet transform (Example)

- y_i = x_i = (i − 1)/n, i = 1, 2, …, n.

n = 2^12
xi = (seq(1, n, by = 1) - 1)/n
yi = xi
library(waveslim)

- Haar basis.
- Number of resolution levels J = 12.
- Decompose the sample data y.

dwt.fit = mra(yi, method="dwt", wf="haar", J=12)

- The output is a list of 13 vectors.
- The first vector is the change necessary to go from the approximation f12 to f13, the approximation at the highest detail resolution level.
- The next-to-last vector is f1 − f0.
- The final, thirteenth vector is the smooth approximation f0.
- Summing the thirteenth vector and the twelfth vector results in f1.
- In order: f13 − f12, f12 − f11, …, f1 − f0, f0.

f0 = dwt.fit[[13]]
f1 = dwt.fit[[13]] + dwt.fit[[12]]
df = data.frame(x = xi, y = yi, f0 = f0, f1 = f1)
p1 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red")

(Figure: the data with the step-function approximations f0 (blue) and f1 (red) overlaid.)

Wavelet representation with resolution J = 0 and 1

f2 = dwt.fit[[13]] + dwt.fit[[12]] + dwt.fit[[11]]
df = data.frame(x = xi, y = yi, f0 = f0, f1 = f1, f2 = f2)
p2 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red") +
  geom_line(data = df, aes(x = xi, y = f2), color = "brown")

(Figure: the data with the approximations f0 (blue), f1 (red), and f2 (brown) overlaid.)

Wavelet representation with resolution J = 0, 1, 2

f5 = dwt.fit[[13]] + dwt.fit[[12]] + dwt.fit[[11]] + dwt.fit[[10]] +
  dwt.fit[[9]] + dwt.fit[[8]]
df = data.frame(x = xi, y = yi, f0 = f0, f1 = f1, f2 = f2, f5 = f5)
p5 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red") +
  geom_line(data = df, aes(x = xi, y = f2), color = "brown") +
  geom_line(data = df, aes(x = xi, y = f5), color = "darkgreen")

(Figure: the data with the approximations f0 (blue), f1 (red), f2 (brown), and f5 (dark green) overlaid.)

Wavelet representation with increasing resolution J = 0, 1, 2, 5

- What if we choose J less than 12 for this example?
- Set J = 3.
- Then j0 > 0; for example, when J = 3, j0 = 9.

dwt.fit.J3 = mra(yi, method="dwt", wf="haar", J=3)

length(dwt.fit.J3)

## [1] 4

f9 = dwt.fit.J3[[4]]                                       # f0
f10 = dwt.fit.J3[[4]] + dwt.fit.J3[[3]]                    # f1
f11 = dwt.fit.J3[[4]] + dwt.fit.J3[[3]] + dwt.fit.J3[[2]]  # f2
df = data.frame(x = xi, y = yi, f0 = f9, f1 = f10, f2 = f11)
p.J3 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red") +
  geom_line(data = df, aes(x = xi, y = f2), color = "brown")

(Figure: the data with the J = 3 approximations f0 (blue), f1 (red), and f2 (brown) overlaid.)

- dwt determines the wavelet coefficients at each resolution level.
- n.levels: the number of resolution levels to determine.
- Read page 637 for more detail.

y.dwt <- dwt(yi, wf="haar", n.levels=12)

- The resulting R list of coefficients may be used to reconstruct the original vector of sampled data y.

reconstruct.y = idwt(y.dwt)
plot(xi, reconstruct.y)

(Figure: reconstruct.y plotted against xi; the reconstruction reproduces the line y = x.)

Wavelet Thresholding

- We saw how a function f may be represented in a wavelet basis.
- With the DWT, a sample of length n from f may be decomposed into n wavelet coefficients making up a single smooth approximation and up to J = log_2(n) detail resolution levels.
- Sparsity: the ability of wavelets to represent a function by concentrating (compressing) the information about f into a few large-magnitude coefficients and many small-magnitude coefficients.
- Compression (thresholding) is applied to the wavelet coefficients of a sampled function f prior to its reconstruction.
- Thresholding provides a significant level of data reduction for the problem.

Sparsity of the Wavelet Representation

- HWC Example 13.3.
- Monthly sunspot numbers from 1749 to 1983.
- Sunspots are temporary phenomena on the photosphere of the sun that appear visibly as dark spots compared to surrounding regions.
- Sunspots correspond to concentrations of magnetic field flux that inhibit convection and result in reduced surface temperature compared to the surrounding photosphere.
- The original data has length 2820, but only the first 2048 values are used here to make the length a dyadic number.
- So the filtered data are monthly sunspot numbers from January 1749 through July 1919.

library(datasets)
data(sunspots)
plot.ts(sunspots[1:2048],
  ylab = "Number of sunspots",
  xlab = "Months")

(Figure: time series of the first 2048 monthly sunspot numbers.)

- The DWT is applied to this data, resulting in 2048 coefficients.

dwt.sunspot = dwt(sunspots[1:2048], n.levels = 4, wf = "la8")

- These coefficients are sorted in magnitude and the smallest 50% (1024) are set to 0.
- The reconstruction is nearly indistinguishable from the original data.

dwt.sunspot.coeff = unlist(dwt.sunspot)
dwt.sunspot.coeff = sort(dwt.sunspot.coeff,
  decreasing = T)
val = as.numeric(quantile(dwt.sunspot.coeff, p = .5))
manual.thresholding = manual.thresh(dwt.sunspot, value = val)

- The inverse DWT is applied to this compressed (50% thresholded) set of coefficients, resulting in the reconstruction.

y.idwt.manual.thresholding = idwt(manual.thresholding)
plot(y.idwt.manual.thresholding,
  ylab = "Number of sunspots",
  xlab = "Months",
  main = "50% Thresholding")

(Figure: "50% Thresholding" — the reconstructed sunspot series, nearly indistinguishable from the original.)

- Set the smallest 95% of the coefficients to 0 prior to reconstruction.
- The reconstruction retains the basic shape of the original data, but with the very localized variability mostly removed.

(Figure: "95% Thresholding" — the reconstruction preserves the basic shape while localized variability is removed.)

Thresholding

- A drawback of compression: the amount of reduction must be specified.
- Thresholding specifies a data-driven compression.
- Many methods of thresholding are based on assuming that the errors are normally distributed.

Thresholding

- Let θ̃ be a coefficient estimated with the DWT and λ a specified threshold value.
- Hard thresholding sets a coefficient to 0 if it has small magnitude and leaves the coefficient unmodified otherwise:

  θ̂ = θ̃ · 1(|θ̃| > λ).

- Soft thresholding sets small coefficients to 0 and shrinks the larger ones by λ toward 0:

  θ̂ = sign(θ̃) (|θ̃| − λ)_+.

- The DWT operation may be represented as a matrix operator W:

  θ̃ = Wf + Wε.

- θ = Wf represents the wavelet coefficients of the unobserved sampled function f.
- ε̃ = Wε represents the coefficients of the errors.
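The two rules take only a few lines of code. A minimal Python sketch (the coefficient values are illustrative):

```python
import numpy as np

def hard_thresh(theta, lam):
    # keep coefficients with |theta| > lam, zero out the rest
    return np.where(np.abs(theta) > lam, theta, 0.0)

def soft_thresh(theta, lam):
    # zero small coefficients, shrink the rest toward 0 by lam
    return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)

theta = np.array([-3.0, -0.5, 0.2, 1.5])
print(hard_thresh(theta, 1.0))  # [-3.   0.   0.   1.5]
print(soft_thresh(theta, 1.0))  # [-2.   0.   0.   0.5]
```

Note that soft thresholding is continuous in θ̃, while hard thresholding jumps at |θ̃| = λ.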

Thresholding

Thresholding - VisuShrink (Donoho and Johnstone (1994))

- Applies a single threshold λ to all coefficients (Donoho and Johnstone 1994).

y = sunspots[1:2048]
y.dwt = dwt(sunspots[1:2048])
y.visuShrink = universal.thresh(y.dwt, hard = TRUE)
y.idwt.visuShrink = idwt(y.visuShrink)
plot(y.idwt.visuShrink,
  ylab = "Number of sunspots",
  xlab = "Months",
  main = "VisuShrink-Hard Thresholding")
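VisuShrink's single threshold is the universal threshold λ = σ̂ √(2 log n), where σ̂ is commonly the median absolute deviation of the finest-level detail coefficients divided by 0.6745. A hedged Python sketch of just this threshold computation (the simulated coefficients and variable names are mine, standing in for the output of a DWT):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2048
# stand-in for the finest-level detail coefficients: pure noise with sd = 0.1
finest_detail = rng.normal(0.0, 0.1, n // 2)

# robust noise-scale estimate via the median absolute deviation (MAD)
sigma_hat = np.median(np.abs(finest_detail)) / 0.6745

# universal threshold of Donoho and Johnstone (1994)
lam = sigma_hat * np.sqrt(2 * np.log(n))
print(round(sigma_hat, 3), round(lam, 3))  # sigma_hat ~ 0.1
```

For IID Gaussian noise, λ exceeds the maximum of the n noise coefficients with high probability, so pure-noise coefficients are set to 0.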

(Figure: "VisuShrink-Hard Thresholding" — reconstruction of the sunspot series.)

y.visuShrink.soft = universal.thresh(y.dwt, hard = FALSE)
y.idwt.visuShrink.soft = idwt(y.visuShrink.soft)
plot(y.idwt.visuShrink.soft,
  ylab = "Number of sunspots",
  xlab = "Months",
  main = "VisuShrink-Soft Thresholding")

(Figure: "VisuShrink-Soft Thresholding" — reconstruction of the sunspot series.)

Thresholding - SureShrink (Donoho and Johnstone (1995))

- Uses a different threshold at each resolution level of the wavelet decomposition of f (Donoho and Johnstone 1995).
- SureShrink is actually a hybrid threshold method:
  - certain resolution levels can be too sparse.
  - at such levels, SureShrink reverts to the universal threshold of VisuShrink.

y.sureshrink = hybrid.thresh(y.dwt, max.level = 4)
y.sureshrink.idwt = idwt(y.sureshrink)
plot(y.sureshrink.idwt,
  ylab = "Number of sunspots",
  xlab = "Months",
  main = "SureShrink-Soft Thresholding")

(Figure: "SureShrink-Soft Thresholding" — reconstruction of the sunspot series.)

Other uses of wavelets

- Nonparametric density estimation (Vidakovic (1999)).
- Understanding the properties of time series and random processes.

Notes

- Thresholding can be done without strong distributional assumptions on the errors using cross-validation (Nason 1996).
- Practical, simultaneous confidence bands for wavelet estimators are not available (Wasserman 2006).
- Standard wavelet basis functions are not invariant to translations and rotations.
- Recent work by Mallat (2012) and Bruna and Mallat (2013) extends wavelets to handle these kinds of invariances.
- This is a promising new direction for the theory of convolutional neural networks.

References for this lecture

HWC Chapter 13 (Wavelets)

W Chapter 9

Bruna, Joan, and Stéphane Mallat. 2013. "Invariant Scattering Convolution Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8): 1872–86.

Donoho, David L., and Iain M. Johnstone. 1995. "Adapting to Unknown Smoothness via Wavelet Shrinkage." Journal of the American Statistical Association 90 (432): 1200–1224.

Donoho, David L., and Iain M. Johnstone. 1994. "Ideal Spatial Adaptation by Wavelet Shrinkage." Biometrika 81 (3): 425–55.

Mallat, Stéphane. 2012. "Group Invariant Scattering." Communications on Pure and Applied Mathematics 65 (10): 1331–98.

Nason, Guy P. 1996. "Wavelet Shrinkage Using Cross-Validation." Journal of the Royal Statistical Society: Series B (Methodological) 58 (2): 463–79.
