Image Priors and the Sparse-Land Model
Jan 21, 2016
Let's Start with a Virtual Experiment …
Suppose that we take a VERY LARGE set of small images: say that we have accumulated 1e12 patches, each of size 20×20 pixels.
Clearly, every such image is a point in $\mathbb{R}^{400}$. Let's put these points in this 400-dimensional Euclidean space, in the cube $[0,1]^{400}$.
Now, LET'S STEP INTO THIS SPACE and look at the cloud of points we have just generated.
What are we Expected to See?
1. Deserts! Vast emptiness! Why?
2. Concentration of points in some regions.
3. Different densities from one place to another.
4. Filaments, manifold structure …
In this experiment we have actually created an empirical estimate of the Probability Density Function (PDF) of … images – Call it P(x)
So, Let's Talk about This
We “experimented” with small images, but actually the same phenomena will be found in audio, seismic data, financial data, text-files, … and practically any source of information you are familiar with.
Nevertheless, we will stick to images for the discussion.
Imagine this: a function that is given an image and returns its chances of existing! Amazing, no?
Well, what could you do with such a function?
Answer: EVERYTHING
Signal/Image Prior P(x)
What is it good for?
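Denoising: The measurement is $y = x_0 + v$, with $\|v\|_2 \le \epsilon$, and we are trying to recover $x_0$.
[Figure: the region where P(x) is high, containing $x_0$, and the ball $\|y - x\|_2 \le \epsilon$ around the measurement $y$.]
Recall that for random noise we have $E\{(y - x_0)^T x_0\} = 0$.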
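Denoising, two options for recovering $x_0$ from $y = x_0 + v$, $\|v\|_2 \le \epsilon$:
Option 1: MAP
$$\hat{x} = \operatorname{ArgMax}_x P(x) \quad \text{s.t.} \quad \|y - x\|_2 \le \epsilon$$
Option 2: MMSE
$$\hat{x} = E\{x \mid y\} = \int_{\|x - y\|_2 \le \epsilon} x\,P(x)\,dx$$
(with $P(x)$ normalized over this ball).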
Inverse Problems: The measurement is $y = Hx_0 + v$, with $\|v\|_2 \le \epsilon$, and we are trying to recover $x_0$, as before.
H could be blur, projection, downscaling, subsampling, …
$$\hat{x} = \operatorname{ArgMax}_x P(x) \quad \text{s.t.} \quad \|y - Hx\|_2 \le \epsilon$$
Compression: We are given x and a budget of B bits. Our goal is to get the best possible compression (i.e. minimize the error). The approach we take is to divide the whole domain into $2^B$ disjoint sets $S_k$ (Voronoi) and minimize the error w.r.t. the representation vectors $x_k$ (VQ):
$$\min_{\{x_k\}_{k=1}^{2^B}} \; \sum_{k=1}^{2^B} \int_{x \in S_k} \|x - x_k\|_2^2 \, P(x)\,dx$$
Sampling: Our goal is to propose sampling and reconstruction strategies, each of which (or just the first) is parameterized, and to optimize the parameters for the smallest possible error:
$$\min \int_x \left\| x - \mathrm{Reconst}\{\mathrm{Sample}\{x\}\} \right\|_2^2 \, P(x)\,dx$$
where the minimization is over the parameters of the Sample and Reconst operators.
Separation: We are given
$$y = x_1 + x_2 + v, \quad \|v\|_2 \le \epsilon$$
where $x_1$ and $x_2$ are two different signals from two different distributions, $P_1$ and $P_2$, and our goal is to separate the signal into its ingredients:
$$\operatorname{ArgMax}_{x_1, x_2} P_1(x_1)\,P_2(x_2) \quad \text{s.t.} \quad \|y - x_1 - x_2\|_2 \le \epsilon$$
Anomaly Detection: We are given x and we are supposed to say if it is an anomaly. This is done by testing whether $P(x) \le T$ for a threshold $T$: a low probability flags an anomaly.
Signal/Image Prior P(x)
Question: What is it good for?
Answer: Many great things.
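The Evolution of Priors for Images
P(x) = ? for images
[Timeline figure; time axis: 70's, 80's, 90's, 00's. The priors shown along it:]
- Smoothness: $P(x) \sim \exp\{-\lambda\|Lx\|_2^2\}$
- WLS: $P(x) \sim \exp\{-\lambda\|WLx\|_2^2\}$
- Wavelet: $P(x) \sim \exp\{-\lambda\|Tx\|_1\}$
- Learn: $P(x) \sim \exp\{-x^T R^{-1} x\}$
- FoE (Fields of Experts): an exponent that sums, over $k$, penalties on the filter outputs $L_k x$
- Robust statistics: $P(x) \sim \exp\{-\lambda\|Lx\|_1\}$
- PDE: $P(x) \sim \exp\{-\lambda\,TV(x)\}$
- Transform: $P(x) \sim \exp\{-\lambda\|Tx\|_2^2\}$
- Sparse: $P(x) \sim \exp\{-\lambda\|\alpha\|_0^0\}$ where $x = D\alpha$
- GMM, Co-Sparse Analysis, Low-Rank, …
Major themes:
- L2 → L1
- Linear vs. Non-Linear Approx.
- Training on examples
- Random generator for x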
Signal/Image Prior P(x)
Here is an untold secret:
The vast literature in image processing over the past 4-5 decades is
NOTHING BUT
an evolution of ideas on the identity of P(x), and ways to use it in actual tasks
By the way, the same is true for many other data sources and signals …
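Linear Versus Non-Linear Approximation
Suppose that our prior is the following (T is unitary, and constant factors are absorbed into $\Lambda$):
$$P(x) \sim \exp\{-\|\Lambda T x\|_2^2\}$$
The matrix $\Lambda$ weights the transform elements:
$$\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_k, \ldots, \lambda_n)$$
Our goal: Denoising a signal with this prior by solving
$$\min_x \; \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{1}{2}\|\Lambda T x\|_2^2$$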
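The solution of
$$\min_x \; \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{1}{2}\|\Lambda T x\|_2^2$$
is given by
$$\hat{x}_{opt} = \left(I + T^T \Lambda^2 T\right)^{-1} y = T^T \left(I + \Lambda^2\right)^{-1} T y = T^T \operatorname{diag}\!\left(\frac{1}{1+\lambda_1^2}, \ldots, \frac{1}{1+\lambda_n^2}\right) T y$$
Implication: We keep the transform coefficients with small weights and remove the ones with high weights. The decision about which coefficients survive is fixed in advance by $\Lambda$, regardless of the signal. This is Linear Approximation.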
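The closed form above can be checked numerically. The following is a minimal sketch, assuming NumPy; the random orthogonal matrix standing in for T and the particular weights in `lam` are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Hypothetical unitary transform T: the orthonormal factor of a random matrix.
T, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Hypothetical weights: small for the first coefficients, large for the last.
lam = np.array([0.0, 0.0, 0.1, 0.1, 10.0, 10.0, 100.0, 100.0])
Lam = np.diag(lam)

y = rng.standard_normal(n)

# Closed form: scale each transform coefficient by 1 / (1 + lam_k^2).
gains = 1.0 / (1.0 + lam**2)
x_hat = T.T @ (gains * (T @ y))

# Direct solution of (I + T^T Lam^2 T) x = y, for comparison.
x_direct = np.linalg.solve(np.eye(n) + T.T @ Lam**2 @ T, y)

print(np.allclose(x_hat, x_direct))
```

The vector `gains` does not depend on `y`, which is exactly why this filtering is a linear approximation.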
Suppose now that our prior is the following (T is unitary):
$$P(x) \sim \exp\{-\lambda\|Tx\|_1\}$$
Our goal: Denoising a signal with this prior by solving
$$\min_x \; \tfrac{1}{2}\|x - y\|_2^2 + \lambda\|Tx\|_1$$
We have seen that the solution for this problem is given by soft shrinkage:
$$\hat{x}_{opt} = T^T S_\lambda\{Ty\}$$
Implications: Just like before, we filter the signal in the transform domain. This time we keep the dominant coefficients and discard the small ones, so the choice depends on the signal itself. This is known as Nonlinear Approximation.
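Soft shrinkage in the transform domain can be sketched in a few lines. This is an illustrative sketch, assuming NumPy; the function names `soft_shrink` and `denoise_l1`, the random orthogonal T, and the threshold value 0.5 are assumptions made for the example, not part of the slides.

```python
import numpy as np

def soft_shrink(z, lam):
    """Element-wise soft shrinkage: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def denoise_l1(y, T, lam):
    """Minimizer of 0.5*||x - y||_2^2 + lam*||T x||_1 for a unitary T."""
    return T.T @ soft_shrink(T @ y, lam)

rng = np.random.default_rng(0)
# Hypothetical unitary transform.
T, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# A signal whose transform coefficients are two large entries plus small ones.
coeffs = np.array([5.0, -3.0, 0.2, -0.1, 0.05, 0.0])
y = T.T @ coeffs

x_hat = denoise_l1(y, T, lam=0.5)
kept = T @ x_hat  # transform coefficients that survive the shrinkage
print(np.round(kept, 3))
```

Which coefficients survive is decided by their magnitudes in `T @ y`, not by their position, which is the distinction from the linear case.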
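Sparse-Land Signal Generation
M: a generator of signals x from P(x), working as follows:
1. Draw $k_0$, the cardinality of the representation: $P(k) \sim e^{-\lambda k}$
2. Draw $k_0$ non-zero values: $P(v) \sim e^{-\mu\|v\|_2^2}$
3. Draw $k_0$ locations, and generate the representation $\alpha$
4. Multiply by the dictionary $D$
5. Add random iid (model) noise $e$
The result is $x = D\alpha + e$, so that
$$P(x) \sim \exp\{-\lambda\|\alpha\|_0^0 - \mu\|\alpha\|_2^2\} \quad \text{where} \quad x = D\alpha + e, \;\; e \sim \mathcal{N}(0, \sigma^2 I)$$
for positive constants $\lambda$ and $\mu$.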
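The generation steps can be sketched as a short sampler. This is only an illustration, assuming NumPy: the function name `generate_sparseland`, the constants `lam`, `mu` and `sigma`, the range of allowed cardinalities, the uniform choice of locations, and the random unit-norm dictionary are all assumptions made for the example.

```python
import numpy as np

def generate_sparseland(D, rng, lam=1.0, mu=0.5, sigma=0.01):
    """Draw one signal x = D @ alpha + e; returns (x, alpha)."""
    n, m = D.shape
    # Cardinality k0 with P(k) ~ exp(-lam * k), restricted to k = 1..n (assumption).
    ks = np.arange(1, n + 1)
    pk = np.exp(-lam * ks)
    k0 = rng.choice(ks, p=pk / pk.sum())
    # k0 non-zero values with P(v) ~ exp(-mu * ||v||_2^2): iid Gaussian, variance 1/(2*mu).
    values = rng.normal(0.0, np.sqrt(1.0 / (2.0 * mu)), size=k0)
    # k0 distinct locations, chosen uniformly (assumption), define the representation.
    support = rng.choice(m, size=k0, replace=False)
    alpha = np.zeros(m)
    alpha[support] = values
    # Multiply by the dictionary and add iid model noise.
    e = sigma * rng.standard_normal(n)
    return D @ alpha + e, alpha

rng = np.random.default_rng(0)
n, m = 20, 50
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms (assumption)

x, alpha = generate_sparseland(D, rng)
print(x.shape, np.count_nonzero(alpha))
```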
Sparse-Land vs. Earlier Models
Sparse-Land: $P(x) \sim \exp\{-\lambda\|\alpha\|_0^0\}$ where $x = D\alpha$
Assume no noise in the model and that D is square and invertible. Then, with $T = D^{-1}$:
$$P(x) \sim \exp\{-\lambda\|D^{-1}x\|_0^0\} = \exp\{-\lambda\|Tx\|_0^0\}$$
The Sparse-Land model generalizes the previous method by (1) adopting over-completeness***, and (2) daring to work with true sparsity and L0.
*** What about redundant T? This will be addressed later!
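Geometrical Insight
Take a point $x_0$ in the cloud together with $N$ of its neighbors $x_1, \ldots, x_N$, and stack the differences as the columns of
$$E = [\, x_1 - x_0 \;|\; x_2 - x_0 \;|\; x_3 - x_0 \;|\; \cdots \;|\; x_N - x_0 \,] \in \mathbb{R}^{n \times N}$$
The effective rank d of E (found by SVD) is expected to be very low: $d \ll n$.
This is universally true for signals we operate on.
The orientation and dimension of this subspace change from one point to another (smoothly?).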
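The effective rank of E can be estimated from its singular values. The sketch below assumes NumPy and uses synthetic data in place of real patches; the helper name `effective_rank`, the 99% energy criterion, and the planted dimension `d_true` are assumptions made for illustration.

```python
import numpy as np

def effective_rank(E, energy=0.99):
    """Smallest d whose top-d singular values hold `energy` of the squared energy."""
    s = np.linalg.svd(E, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

rng = np.random.default_rng(0)
n, N, d_true = 400, 200, 5

# Synthetic stand-in for real patches: neighbors of x0 lie close to a
# d_true-dimensional subspace through x0, up to a small perturbation.
basis, _ = np.linalg.qr(rng.standard_normal((n, d_true)))
x0 = rng.random(n)
neighbors = (x0[:, None]
             + basis @ rng.standard_normal((d_true, N))
             + 1e-3 * rng.standard_normal((n, N)))

E = neighbors - x0[:, None]  # columns are x_i - x0
d = effective_rank(E)
print(d, "<<", n)
```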
Geometrical Insight – Implications
Given a noisy version of $x_0$, $y = x_0 + v$, how shall we denoise it?
By projecting onto the subspace around $x_0$ (chicken and egg).
How come y is not on the subspace itself?
1. The relative volume of the subspace is negligible
2. Recall that $E\{(y - x_0)^T x_0\} = 0$
Geometrical Insight – Denoising in Practice
Given a noisy version of $x_0$, $y = x_0 + v$, how shall we denoise it?
1. Non-parametric: Nearest Neighbor (NN), or K-NN
2. Local-Parametric: Group neighbors, estimate the subspace and project
3. Parametric: Cluster the database into K subgroups, and estimate a subspace for each. When a signal is to be denoised, assign it to the closest subgroup, and then project onto the corresponding subspace (K=1: PCA)
4. Sparse-Land: one dictionary encapsulates many such clusters
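The local-parametric option (group neighbors, estimate the subspace, project) can be sketched as follows. This is an illustrative sketch, assuming NumPy and a synthetic database lying on a single low-dimensional subspace; the function name `local_subspace_denoise` and the values of `num_neighbors` and `dim` are assumptions, not prescribed by the slides.

```python
import numpy as np

def local_subspace_denoise(y, database, num_neighbors=50, dim=5):
    """Project y onto an affine subspace estimated from its nearest neighbors."""
    # Group the nearest examples (one example per row of `database`).
    dists = np.linalg.norm(database - y, axis=1)
    nearest = database[np.argsort(dists)[:num_neighbors]]
    center = nearest.mean(axis=0)
    # Principal directions of the group are the leading rows of Vt.
    _, _, Vt = np.linalg.svd(nearest - center, full_matrices=False)
    U = Vt[:dim].T
    # Orthogonal projection onto center + span(U).
    return center + U @ (U.T @ (y - center))

rng = np.random.default_rng(0)
n, dim = 100, 5
basis, _ = np.linalg.qr(rng.standard_normal((n, dim)))
database = rng.standard_normal((2000, dim)) @ basis.T  # clean examples

x0 = basis @ rng.standard_normal(dim)      # clean signal
y = x0 + 0.1 * rng.standard_normal(n)      # noisy measurement
x_hat = local_subspace_denoise(y, database)

print(np.linalg.norm(y - x0), np.linalg.norm(x_hat - x0))
```

The projection removes the noise components orthogonal to the estimated subspace, so the error drops roughly in proportion to the square root of `dim / n`.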
Union of Subspaces (UoS)
We said that with Sparse-Land, one dictionary encapsulates many such clusters.
Consider all the signals x that emerge from the same k atoms in D: all of them reside in the same subspace, spanned by these columns.
Thus, every possible support (and there are m-choose-k of them) represents one such subspace to which the signal could belong.
The pursuit task: Given a noisy signal, we search for the “closest subspace” and project onto it. The task is hard because of the number of subspaces involved in this union.
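For tiny problems the pursuit task can be carried out literally, by trying every support. The sketch below assumes NumPy; the function name `closest_subspace`, the random dictionary, and the sizes are hypothetical. It is an exhaustive search, feasible only because m-choose-k is 105 here, and is not a practical pursuit algorithm.

```python
import numpy as np
from itertools import combinations

def closest_subspace(y, D, k):
    """Try every k-atom support; return the one whose span is closest to y."""
    best_err, best_support, best_proj = np.inf, None, None
    for support in combinations(range(D.shape[1]), k):
        Ds = D[:, list(support)]
        # Least-squares projection of y onto the span of the chosen atoms.
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        proj = Ds @ coef
        err = np.linalg.norm(y - proj)
        if err < best_err:
            best_err, best_support, best_proj = err, support, proj
    return best_support, best_proj

rng = np.random.default_rng(0)
n, m, k = 10, 15, 2
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)  # unit-norm atoms (assumption)

true_support = (3, 11)
x0 = D[:, list(true_support)] @ np.array([2.0, -1.5])
y = x0 + 0.01 * rng.standard_normal(n)

support, x_hat = closest_subspace(y, D, k)
print(support, np.linalg.norm(x_hat - x0))
```

The loop count grows as m-choose-k, which illustrates why the number of subspaces in the union makes the task hard.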
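Processing Sparse-Land Signals
Model: $P(x) \sim \exp\{-\lambda\|\alpha\|_0^0\}$ where $x = D\alpha$
For each objective we list the goal, the given data, and the problem to solve.
1. Signal Transform
   Goal: Most effective transform - getting the sparsest possible set of iid coefficients
   Given: $x$
   Solve: $\min_\alpha \|\alpha\|_0$ s.t. $x = D\alpha$
2. Signal Denoising
   Goal: Cleanest possible signal
   Given: $y = x + v$, $\|v\|_2 \le \epsilon$
   Solve: $\min_\alpha \|\alpha\|_0$ s.t. $\|y - D\alpha\|_2 \le \epsilon$
3. Compress
   Goal: We have a budget of B bits and we want to best represent the signal
   Given: $x$ and $B$ [bits]
   Solve: $\min_\alpha \|x - D\alpha\|_2$ s.t. $\|\alpha\|_0 \le B$
4. Inverse Problem
   Goal: Treat blur, subsampling, missing values, projection, compressed-sensing
   Given: $y = Hx + v$, $\|v\|_2 \le \epsilon$
   Solve: $\min_\alpha \|\alpha\|_0$ s.t. $\|y - HD\alpha\|_2 \le \epsilon$
5. Separate
   Goal: The two signals are from different sources and thus have different models
   Given: $y = x_1 + x_2 + v$, $\|v\|_2 \le \epsilon$
   Solve: $\min_{\alpha_1, \alpha_2} \|\alpha_1\|_0 + \|\alpha_2\|_0$ s.t. $\|y - D_1\alpha_1 - D_2\alpha_2\|_2 \le \epsilon$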
Processing Sparse-Land Signals
All these (and other) processing methods boil down to the solution of $(P_0)$, for which we now know that:
1. It is theoretically sensible, and
2. There are numerical ways to handle it
To Summarize
The Sparse-Land model forms a general Union of Subspaces, all encapsulated by the concise matrix D.
This follows many earlier works that aim to model signals using a union of subspaces (or a mixture of Gaussians; think about it, it is the same).
Sparse-Land is rooted in solid modeling ideas, while improving on them due to its generality and its solid mathematical foundations.