Lecture 17: Wavelets
Pratheepa Jeganathan
05/10/2019
Recall
- One-sample sign test, Wilcoxon signed rank test, large-sample approximation, median, Hodges-Lehmann estimator, distribution-free confidence interval.
- Jackknife for bias and standard error of an estimator.
- Bootstrap samples, bootstrap replicates.
- Bootstrap standard error of an estimator.
- Bootstrap percentile confidence interval.
- Hypothesis testing with the bootstrap (one-sample problem).
- Assessing the error in bootstrap estimates.
- Example: inference on the ratio of heart attack rates in the aspirin-intake group to the placebo group.
- The exhaustive bootstrap distribution.
- Discrete data problems (one-sample, two-sample proportion tests, test of homogeneity, test of independence).
- Two-sample problems (location problem: equal variance, unequal variance, exact test or Monte Carlo, large-sample approximation, H-L estimator, dispersion problem, general distribution).
- Permutation tests (permutation test for continuous data, different test statistics, accuracy of permutation tests).
- Permutation tests (discrete data problems, exchangeability).
- Rank-based correlation analysis (Kendall and Spearman correlation coefficients).
- Rank-based regression (straight line, multiple linear regression, statistical inference about the unknown parameters, nonparametric procedures that do not depend on the distribution of the error term).
- Smoothing (density estimation, bias-variance trade-off, curse of dimensionality).
- Nonparametric regression (local averaging, local regression, kernel smoothing, local polynomials, penalized regression).
- Cross-validation, variance estimation, confidence bands, bootstrap confidence bands.
Wavelets
Spatially inhomogeneous functions
Example
- Doppler function

```r
library(ggplot2)
r = function(x) {
  sqrt(x * (1 - x)) * sin(2.1 * pi / (x + .05))
}
ep = rnorm(1000)
y = r(seq(1, 1000, by = 1) / 1000) + .1 * ep
df = data.frame(x = seq(1, 1000, by = 1) / 1000, y = y)
ggplot(df) +
  geom_point(aes(x = x, y = y))
```
Example

[Figure: scatter plot of the simulated noisy Doppler data, y against x.]
Example
- The Doppler function is spatially inhomogeneous (its smoothness varies over x).
- Estimate it by local linear regression:

```r
library(np)
doppler.npreg <- npreg(bws = .005,
                       txdat = df$x,
                       tydat = df$y,
                       ckertype = "epanechnikov")
doppler.npreg.fit = data.frame(x = df$x,
                               y = df$y,
                               kernel.fit = fitted(doppler.npreg))
p = ggplot(doppler.npreg.fit) +
  geom_point(aes(x = x, y = y)) +
  geom_line(aes(x = x, y = kernel.fit), color = "red")
```
Example

[Figure: Doppler data with the local linear regression fit overlaid in red.]
Example

- Doppler function fit using local linear regression.
- Effective degrees of freedom: 166.
- The fitted function is very wiggly.
- If we smooth more, the right-hand side of the fit would look better, at the cost of missing structure near x = 0.
Introduction
- Construct basis functions that are
  - multiscale.
  - spatially/locally adaptive.
- Find a sparse set of coefficients for a given basis.
- The function f belongs to a class of functions F possessing more general characteristics, such as a certain level of smoothness.
- Estimate f by representing the function in another domain.
  - Use an orthogonal series representation of the function f.
  - Estimate a set of scalar coefficients that represent f in the orthogonal series domain.
- Tool: wavelets
  - the ability to estimate both global and local features in the underlying function.
Sparseness
- W 2006, Chapter 9.
- A function f = Σ_j β_j φ_j is sparse in a basis φ_1, φ_2, … if most of the β_j's are zero.
- Sparseness generalizes smoothness: smooth functions are sparse, but there are also non-smooth functions that are sparse.
- Sparseness is not captured by the L2 norm.
- Example: a = (1, 0, …, 0) and b = (1/√n, 1/√n, …, 1/√n).
  - a is sparse.
  - The L2 norms are ||a||_2 = ||b||_2 = 1.
  - The L1 norms are ||a||_1 = 1 and ||b||_1 = √n.
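The norm comparison above is easy to verify numerically. A pure-Python sketch (the lecture's own code is in R; `l1` and `l2` are hypothetical helper names):

```python
# Numeric check of the sparseness example: a is sparse, b is dense,
# both have L2 norm 1, but the L1 norm separates them.
import math

def l1(v):
    # L1 norm: sum of absolute values
    return sum(abs(x) for x in v)

def l2(v):
    # L2 norm: square root of the sum of squares
    return math.sqrt(sum(x * x for x in v))

n = 100
a = [1.0] + [0.0] * (n - 1)       # sparse: a single nonzero entry
b = [1.0 / math.sqrt(n)] * n      # dense: every entry nonzero

# ||a||_2 = ||b||_2 = 1, while ||a||_1 = 1 and ||b||_1 = sqrt(n) = 10
```

So the L2 norm cannot tell a sparse vector from a dense one, but the L1 norm grows like √n for the dense vector.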
Wavelets
- Data: there are n pairs of observations (x_1, Y_1), (x_2, Y_2), …, (x_n, Y_n).
- Assumptions
  - Y_i = f(x_i) + ε_i.
  - The ε_i are IID.
  - ∫ f² < ∞ and f is defined on a closed interval [a, b]. For simplicity, we will consider [a, b] = [0, 1].
Wavelet representation of a function
Basis functions
- ψ = {ψ_1, ψ_2, …} is called a basis for a class of functions F. Then, for f ∈ F,

  f(x) = Σ_{i=1}^∞ θ_i ψ_i(x).

- The θ_i are scalar constants/coefficients.
- Basis functions are orthogonal if ⟨ψ_i, ψ_j⟩ = 0 for i ≠ j.
- Basis functions are orthonormal if they are orthogonal and ⟨ψ_i, ψ_i⟩ = 1.
- How do we construct the basis functions ψ_i?
Basis functions
- If ψ is a wavelet function, then the collection of functions

  Ψ = {ψ_{jk} ; j, k integers},

  where

  ψ_{jk}(x) = 2^(j/2) ψ(2^j x − k),

  forms a basis for square-integrable functions.
- Ψ is a collection of translations (shifts) and dilations (scalings) of ψ.
- ψ can be defined on any range of the real line.
- ∫ ψ = 0 (a wavelet has zero mean).
- The value of ψ is near 0 except over a small range.
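These properties can be checked numerically. A pure-Python sketch (not part of the lecture, which uses R), taking the Haar wavelet from the upcoming slides as the mother wavelet:

```python
# Check orthonormality and zero mean of the Haar family
# psi_jk(x) = 2^(j/2) psi(2^j x - k) by crude numeric integration.
import math

def psi(x):
    # Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    if 0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1:
        return -1.0
    return 0.0

def psi_jk(j, k, x):
    # dilation by 2^j, translation by k
    return 2 ** (j / 2) * psi(2 ** j * x - k)

def inner(f, g, n=100000):
    # Riemann-sum approximation of <f, g> on [0, 1]
    h = 1.0 / n
    return h * sum(f(i * h) * g(i * h) for i in range(n))

norm00 = math.sqrt(inner(lambda x: psi_jk(0, 0, x), lambda x: psi_jk(0, 0, x)))
cross = inner(lambda x: psi_jk(0, 0, x), lambda x: psi_jk(1, 0, x))
mean = inner(lambda x: psi_jk(0, 0, x), lambda x: 1.0)
# norm00 ≈ 1 (unit norm), cross ≈ 0 (orthogonal across scales), mean ≈ 0
```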
Some examples for wavelets
- Haar wavelets (1910)
Some examples for wavelets
- Daubechies wavelets (1992)
Multiresolution analysis (MRA)
- Carefully construct the wavelet function ψ.
- MRA: an interpretation of the wavelet representation of f in terms of location and scale.
- Translation and dilation of ψ gives

  f(x) = Σ_{j∈Z} Σ_{k∈Z} θ_{jk} ψ_{jk}(x),

  where Z is the set of integers.
- Scale corresponds to frequency.
- For fixed j, the index k represents the behavior of f at resolution scale j and a particular location.
- Representing f at differing resolution (scale, frequency) levels j and locations k is the MRA.
Multiresolution analysis (MRA)
- Cumulative approximation of f using j < J:

  f_J(x) = Σ_{j<J} Σ_{k∈Z} θ_{jk} ψ_{jk}(x).

- As J increases, f_J models smaller scales (higher frequencies) of f: changes that occur over small intervals of x.
- As J decreases, f_J models the larger-scale (lower-frequency) behavior of f.
- A complete representation of f is the limit of f_J.
Multiresolution analysis (MRA)
- Write f_J(x) as follows:

  f_J(x) = Σ_{k∈Z} ξ_{j0,k} φ_{j0,k}(x) + Σ_{j0 ≤ j < J} Σ_{k∈Z} θ_{jk} ψ_{jk}(x),

  where f_{j0} = Σ_{k∈Z} ξ_{j0,k} φ_{j0,k}(x).
- Adding the second term to f_{j0} allows for modeling higher scale (frequency) behavior of f.
- f_{j0} is the approximation at the smooth resolution level.
- Each of the remaining resolution-level series is a "detail" level.
- φ: scaling function (father wavelet).
- ψ: wavelet function (mother wavelet).
MRA Using the Haar Wavelet (Example)
- Approximate f(x) = x, x ∈ (0, 1).
- Define the Haar wavelet function

  ψ(x) = 1 for x ∈ [0, 1/2), −1 for x ∈ [1/2, 1), (1)

  and

  φ(x) = 1, x ∈ [0, 1]. (2)

- The Haar wavelet allows exact determination of the wavelet coefficients θ_{jk}.
- Source: HWC.
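For f(x) = x the coefficients θ_{jk} = ⟨f, ψ_{jk}⟩ can be worked out exactly; for j = k = 0, θ_{00} = ∫₀^{1/2} x dx − ∫_{1/2}^1 x dx = 1/8 − 3/8 = −1/4. A pure-Python sketch confirming this by numeric integration (hedged helper names, not lecture code):

```python
# Haar coefficient theta_jk = <f, psi_jk> for f(x) = x via a Riemann sum.
def psi_jk(j, k, x):
    # Haar wavelet, dilated by 2^j and translated by k
    t = 2 ** j * x - k
    if 0 <= t < 0.5:
        s = 1.0
    elif 0.5 <= t < 1:
        s = -1.0
    else:
        s = 0.0
    return 2 ** (j / 2) * s

def theta(j, k, f, n=200000):
    # left Riemann-sum approximation of <f, psi_jk> on [0, 1]
    h = 1.0 / n
    return h * sum(f(i * h) * psi_jk(j, k, i * h) for i in range(n))

theta00 = theta(0, 0, lambda x: x)  # ≈ -0.25, the exact value -1/4
```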
MRA Using the D2 wavelet
- Source: HWC.
MRA Using the D2 wavelet
- To avoid boundary issues when using D2:
  - specify using reflection at the boundaries, rather than periodicity.
  - increase the number of indices k that must be considered at each resolution level j.
Discrete wavelet transform
- The cascade algorithm provides the MRA (Mallat 1989).
- Some restrictions:
  - J = log_2(n).
  - The number of resolution levels in the wavelet series is truncated both above and below in practice, resulting in J − j_0 + 1 series, each representing a resolution level.
- Commands in R that make use of the DWT are dwt, idwt, and mra in the package waveslim (Whitcher (2010)).
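One cascade step can be sketched in pure Python with the Haar filter pair (hypothetical helper names; the lecture relies on waveslim's dwt/idwt/mra in R): scaled pairwise sums give the next-coarser smooth coefficients, scaled pairwise differences give the details, and the step is exactly invertible.

```python
# One step of the cascade (pyramid) algorithm with Haar filters.
import math

def haar_step(y):
    # smooth = scaled pairwise sums, detail = scaled pairwise differences
    # (len(y) must be even)
    s = [(y[2 * i] + y[2 * i + 1]) / math.sqrt(2) for i in range(len(y) // 2)]
    d = [(y[2 * i] - y[2 * i + 1]) / math.sqrt(2) for i in range(len(y) // 2)]
    return s, d

def haar_unstep(s, d):
    # invert one cascade step exactly
    y = []
    for si, di in zip(s, d):
        y.append((si + di) / math.sqrt(2))
        y.append((si - di) / math.sqrt(2))
    return y

s, d = haar_step([1.0, 2.0, 3.0, 4.0])
y = haar_unstep(s, d)   # recovers [1.0, 2.0, 3.0, 4.0]
```

Repeating `haar_step` on the smooth output until a single coefficient remains produces the full decomposition with J = log₂(n) detail levels.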
Discrete wavelet transform (Example)
- y_i = x_i = (i − 1)/n, i = 1, 2, …, n.

```r
n = 2^12
xi = (seq(1, n, by = 1) - 1) / n
yi = xi
library(waveslim)
```

- Haar basis.
- Number of resolution levels J = 12.
- Decompose the sample data y:

```r
dwt.fit = mra(yi, method = "dwt", wf = "haar", J = 12)
```

- The output is a list of 13 vectors.
- The first vector is the change necessary to go from the approximation f_12 to f_13, the approximation at the highest detail resolution level.
- The next-to-last vector is f_1 − f_0.
- The final, thirteenth vector is the smooth approximation f_0.
- Summing the thirteenth vector and the twelfth vector results in f_1.
- The list is ordered: f_13 − f_12, f_12 − f_11, …, f_1 − f_0, f_0.
```r
f0 = dwt.fit[[13]]
f1 = dwt.fit[[13]] + dwt.fit[[12]]
df = data.frame(x = xi, y = yi,
                f0 = f0, f1 = f1)
p1 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red")
```
[Figure: sampled data with the Haar approximations f_0 (blue) and f_1 (red).]
Wavelet representation with resolution J = 0 and 1
```r
f2 = dwt.fit[[13]] + dwt.fit[[12]] + dwt.fit[[11]]
df = data.frame(x = xi, y = yi,
                f0 = f0, f1 = f1, f2 = f2)
p2 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red") +
  geom_line(data = df, aes(x = xi, y = f2), color = "brown")
```
[Figure: sampled data with the approximations f_0 (blue), f_1 (red), and f_2 (brown).]
Wavelet representation with resolution J = 0, 1, 2
```r
f5 = dwt.fit[[13]] + dwt.fit[[12]] + dwt.fit[[11]] + dwt.fit[[10]] +
  dwt.fit[[9]] + dwt.fit[[8]]
df = data.frame(x = xi, y = yi,
                f0 = f0, f1 = f1, f2 = f2, f5 = f5)
p5 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red") +
  geom_line(data = df, aes(x = xi, y = f2), color = "brown") +
  geom_line(data = df, aes(x = xi, y = f5), color = "darkgreen")
```
[Figure: sampled data with the approximations f_0 (blue), f_1 (red), f_2 (brown), and f_5 (dark green).]
Wavelet representation with increasing resolution J = 0, 1, 2, 5
- What if we choose J less than 12 for this example?
- Set J = 3.
- Then j_0 > 0; for example, when J = 3, j_0 = 9.

```r
dwt.fit.J3 = mra(yi, method = "dwt", wf = "haar", J = 3)
length(dwt.fit.J3)
```

```
## [1] 4
```

```r
f9 = dwt.fit.J3[[4]]                                       # smooth approximation
f10 = dwt.fit.J3[[4]] + dwt.fit.J3[[3]]
f11 = dwt.fit.J3[[4]] + dwt.fit.J3[[3]] + dwt.fit.J3[[2]]
df = data.frame(x = xi, y = yi,
                f0 = f9, f1 = f10, f2 = f11)
p.J3 = ggplot() +
  geom_point(data = df, aes(x = xi, y = yi)) +
  geom_line(data = df, aes(x = xi, y = f0), color = "blue") +
  geom_line(data = df, aes(x = xi, y = f1), color = "red") +
  geom_line(data = df, aes(x = xi, y = f2), color = "brown")
```
[Figure: sampled data with the approximations f_9 (blue), f_10 (red), and f_11 (brown) from the J = 3 decomposition.]
- dwt determines the wavelet coefficients at each resolution level.
- n.levels: the number of resolution levels to determine.
- Read page 637 for more detail.

```r
y.dwt <- dwt(yi, wf = "haar", n.levels = 12)
```

- The resulting R list of coefficients may be used to reconstruct the original vector of sampled data y:

```r
reconstruct.y = idwt(y.dwt)
plot(xi, reconstruct.y)
```
[Figure: reconstruct.y plotted against xi; the reconstruction recovers the original line y = x.]
Wavelet Thresholding
- We saw how a function f may be represented with a wavelet basis.
- With the DWT, a sample of length n from f may be decomposed into n wavelet coefficients making up a single smooth approximation and up to J = log_2(n) detail resolution levels.
- Sparsity: the ability of wavelets to represent a function by concentrating or compressing the information about f into a few large-magnitude coefficients and many small-magnitude coefficients.
- Compression (thresholding) is applied to the wavelet coefficients of a sampled function f prior to its reconstruction.
- Thresholding provides a significant level of data reduction for the problem.
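The compress-then-reconstruct idea can be sketched end to end in pure Python with a Haar DWT (the lecture uses waveslim in R; the signal, the 25% keep-rate, and the helper names below are illustrative assumptions):

```python
# Decompose, zero the smallest-magnitude detail coefficients, reconstruct.
import math

def dwt_haar(y):
    # full Haar decomposition; len(y) must be a power of 2
    details, s = [], list(y)
    while len(s) > 1:
        nxt = [(s[2*i] + s[2*i + 1]) / math.sqrt(2) for i in range(len(s) // 2)]
        d = [(s[2*i] - s[2*i + 1]) / math.sqrt(2) for i in range(len(s) // 2)]
        details.append(d)
        s = nxt
    return s, details  # one smooth coefficient plus log2(n) detail levels

def idwt_haar(s, details):
    # invert the decomposition level by level
    s = list(s)
    for d in reversed(details):
        out = []
        for si, di in zip(s, d):
            out.append((si + di) / math.sqrt(2))
            out.append((si - di) / math.sqrt(2))
        s = out
    return s

n = 256
y = [math.sin(8 * math.pi * i / n) for i in range(n)]
s, details = dwt_haar(y)

# keep only the 25% largest-magnitude detail coefficients
mags = sorted(abs(c) for level in details for c in level)
cut = mags[(3 * len(mags)) // 4]
kept = [[c if abs(c) >= cut else 0.0 for c in level] for level in details]

y_hat = idwt_haar(s, kept)
err = max(abs(a - b) for a, b in zip(y, y_hat))
# err stays small relative to the signal's amplitude of 1: most of the
# information sits in a few large coefficients
```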
Sparsity of the Wavelet Representation
- HWC Example 13.3.
- Monthly sunspot numbers from 1749 to 1983.
- Sunspots are temporary phenomena on the photosphere of the sun that appear visibly as dark spots compared to surrounding regions.
- Sunspots correspond to concentrations of magnetic field flux that inhibit convection and result in reduced surface temperature compared to the surrounding photosphere.
- The original data have length 2820, but only the first 2048 are used here to make the length dyadic.
- So the filtered data are monthly sunspot numbers from January 1749 through July 1919.
```r
library(datasets)
data(sunspots)
plot.ts(sunspots[1:2048],
        ylab = "Number of sunspots",
        xlab = "Months")
```
[Figure: time series of the first 2048 monthly sunspot numbers.]
- The DWT is applied to this data, resulting in 2048 coefficients.

```r
dwt.sunspot = dwt(sunspots[1:2048], n.levels = 4, wf = "la8")
```

- These coefficients are sorted in magnitude and the smallest 50% (1024) are set to 0.
- The reconstruction is nearly indistinguishable from the original data.

```r
dwt.sunspot.coeff = unlist(dwt.sunspot)
# threshold at the median magnitude of the coefficients
val = as.numeric(quantile(abs(dwt.sunspot.coeff), p = .5))
manual.thresholding = manual.thresh(dwt.sunspot, value = val)
```

- The inverse DWT is applied to this compressed (50% thresholding) set of coefficients, resulting in the reconstruction.

```r
y.idwt.manual.thresholding = idwt(manual.thresholding)
plot(y.idwt.manual.thresholding,
     ylab = "Number of sunspots",
     xlab = "Months",
     main = "50% Thresholding")
```
[Figure: reconstruction after 50% thresholding; nearly indistinguishable from the original series.]
- Set the smallest 95% of the coefficients to 0 prior to reconstruction.
- The reconstruction retains the basic shape of the original data, but with the very localized variability mostly removed.
[Figure: reconstruction after 95% thresholding; the broad shape remains but the localized variability is removed.]
Thresholding
- A drawback of compression: the need to specify the amount of reduction.
- Thresholding specifies a data-driven compression.
- Many methods of thresholding are based on assuming that the errors are normally distributed.
Thresholding
- Let θ be a coefficient estimated with the DWT and λ a specified threshold value.
- Hard thresholding
  - sets a coefficient to 0 if it has small magnitude and leaves the coefficient unmodified otherwise: θ̂ = θ · 1(|θ| > λ).
- Soft thresholding
  - sets small coefficients to 0 and shrinks the larger ones by λ toward 0: θ̂ = sign(θ)(|θ| − λ)₊.
- The DWT operation may be represented as a matrix operator W:

  θ̃ = Wf + Wε.

- θ = Wf represents the wavelet coefficients of the unobserved sampled function f.
- ε̃ = Wε represents the coefficients of the errors.
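The two rules can be written as one-line functions (a minimal pure-Python sketch; the lecture applies them through waveslim in R):

```python
# Hard thresholding is keep-or-kill; soft thresholding is kill-then-shrink.
def hard_threshold(theta, lam):
    # theta * 1(|theta| > lambda)
    return theta if abs(theta) > lam else 0.0

def soft_threshold(theta, lam):
    # sign(theta) * max(|theta| - lambda, 0)
    if abs(theta) <= lam:
        return 0.0
    return theta - lam if theta > 0 else theta + lam
```

For example, with λ = 1 a coefficient of 2 survives hard thresholding unchanged but is shrunk to 1 by soft thresholding, while a coefficient of 0.5 is set to 0 by both.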
Thresholding - VisuShrink (Donoho and Johnstone (1994))
- Apply a single threshold λ to all coefficients (Donoho and Johnstone 1994).

```r
y = sunspots[1:2048]
y.dwt = dwt(sunspots[1:2048])
y.visuShrink = universal.thresh(y.dwt, hard = TRUE)
y.idwt.visuShrink = idwt(y.visuShrink)
plot(y.idwt.visuShrink,
     ylab = "Number of sunspots",
     xlab = "Months",
     main = "VisuShrink-Hard Thresholding")
```
[Figure: VisuShrink hard-thresholded reconstruction.]
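The single threshold behind universal.thresh is the universal threshold λ = σ√(2 log n) of Donoho and Johnstone (1994); σ is conventionally estimated from the finest-level detail coefficients via the median absolute deviation divided by 0.6745 (that estimation step is an assumption here, not shown in the lecture). A pure-Python sketch of the formula:

```python
# Universal (VisuShrink) threshold: lambda = sigma * sqrt(2 * log n).
import math

def universal_threshold(n, sigma):
    # n: sample length, sigma: noise standard deviation (or an estimate)
    return sigma * math.sqrt(2.0 * math.log(n))

lam = universal_threshold(2048, 1.0)  # threshold for n = 2048, sigma = 1
```

The threshold grows only logarithmically in n, which is what lets VisuShrink kill essentially all pure-noise coefficients while keeping the few large signal coefficients.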
```r
y.visuShrink.soft = universal.thresh(y.dwt, hard = FALSE)
y.idwt.visuShrink.soft = idwt(y.visuShrink.soft)
plot(y.idwt.visuShrink.soft,
     ylab = "Number of sunspots",
     xlab = "Months",
     main = "VisuShrink-Soft Thresholding")
```
[Figure: VisuShrink soft-thresholded reconstruction.]
Thresholding - SureShrink (Donoho and Johnstone (1995))
- Uses a different threshold at each resolution level of the wavelet decomposition of f (Donoho and Johnstone 1995).
- SureShrink is actually a hybrid threshold method:
  - certain resolution levels can be too sparse.
  - at such levels, SureShrink reverts to the universal threshold of VisuShrink.
```r
y.sureshrink = hybrid.thresh(y.dwt, max.level = 4)
y.sureshrink.idwt = idwt(y.sureshrink)
plot(y.sureshrink.idwt,
     ylab = "Number of sunspots",
     xlab = "Months",
     main = "SureShrink-Soft Thresholding")
```
[Figure: SureShrink soft-thresholded reconstruction.]
Other use of wavelets
- Nonparametric density estimation (Vidakovic (1999)).
- Understanding the properties of time series and random processes.
Notes
- Thresholding can be done without strong distributional assumptions on the errors by using cross-validation (Nason 1996).
- Practical, simultaneous confidence bands for wavelet estimators are not available (Wasserman 2006).
- Standard wavelet basis functions are not invariant to translations and rotations.
- Recent work by Mallat (2012) and Bruna and Mallat (2013) extends wavelets to handle these kinds of invariance.
- This is a promising new direction for the theory of convolutional neural networks.
References for this lecture

HWC Chapter 13 (Wavelets)

W Chapter 9
Bruna, Joan, and Stéphane Mallat. 2013. "Invariant Scattering Convolution Networks." IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8): 1872–86.

Donoho, David L, and Iain M Johnstone. 1995. "Adapting to Unknown Smoothness via Wavelet Shrinkage." Journal of the American Statistical Association 90 (432): 1200–1224.

Donoho, David L, and Iain M Johnstone. 1994. "Ideal Spatial Adaptation by Wavelet Shrinkage." Biometrika 81 (3): 425–55.

Mallat, Stéphane. 2012. "Group Invariant Scattering." Communications on Pure and Applied Mathematics 65 (10): 1331–98.

Nason, Guy P. 1996. "Wavelet Shrinkage Using Cross-Validation." Journal of the Royal Statistical Society: Series B (Methodological) 58 (2): 463–79.