
Image Priors and the Sparse-Land Model

Let's Start with a Virtual Experiment …

Suppose that we take a VERY LARGE set of small images: say that we have accumulated 1e12 patches, each of size 20×20 pixels.

Clearly, every such image is a point in $\mathbb{R}^{400}$. Let's place these points in this 400-dimensional Euclidean space, inside the cube $[0,1]^{400}$.

Now, LET'S STEP INTO THIS SPACE and look at the cloud of points we have just generated.

What Do We Expect to See?

1. Deserts! Vast emptiness! Why?

2. Concentration of points in some regions.

3. Different densities from one place to another.

4. Filaments, manifold structure …
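A quick count makes the "deserts" concrete. The sketch below assumes the coarsest possible grid over the cube (2 bins per axis, a choice made here purely for illustration) and compares the number of grid cells with the 1e12 patches we collected.

```python
# Count how sparsely 1e12 patches can populate the cube [0,1]^400.
num_patches = 10**12      # patches collected in the virtual experiment
dim = 20 * 20             # each 20x20 patch is a point in R^400
bins_per_axis = 2         # assumption: coarsest possible grid, 2 bins per axis

num_cells = bins_per_axis ** dim                 # exact integer, 2^400
max_occupied_fraction = num_patches / num_cells  # even if every patch had its own cell

print(f"cells in the grid: about 10^{len(str(num_cells)) - 1}")
print(f"fraction of cells that can be occupied: at most {max_occupied_fraction:.1e}")
```

Even with this very coarse grid, almost every cell must remain empty, so any structure we see in the cloud comes from where the points concentrate.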

In this experiment we have actually created an empirical estimate of the Probability Density Function (PDF) of … images. Call it P(x).

So, Let's Talk about This

We “experimented” with small images, but actually the same phenomena will be found in audio, seismic data, financial data, text-files, … and practically any source of information you are familiar with.

Nevertheless, we will stick to images for the discussion.

Imagine this: a function that, given an image, returns its chance of existing! Amazing, no?

Well, what could you do with such a function?

Answer: EVERYTHING


Signal/Image Prior P(x)

What is it good for?

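Denoising: The measurement is $y = x_0 + v$, with $\|v\|_2 \le \epsilon$, and we are trying to recover $x_0$.

[Figure: the region where P(x) is high, the point $x_0$ inside it, and the ball $\|y - x\|_2 \le \epsilon$ around the measurement $y$.]

Recall that for random noise we have $E\{(y - x_0)^T x_0\} = 0$.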



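Denoising: The measurement is $y = x_0 + v$, with $\|v\|_2 \le \epsilon$, and we are trying to recover $x_0$:

Option 1: MAP

$\hat{x} = \mathrm{ArgMax}_x \; P(x) \quad \text{s.t.} \quad \|y - x\|_2 \le \epsilon$

Option 2: MMSE

$\hat{x} = E\{x \mid \|y - x\|_2 \le \epsilon\} = \dfrac{\int_{\|x - y\|_2 \le \epsilon} x\,P(x)\,dx}{\int_{\|x - y\|_2 \le \epsilon} P(x)\,dx}$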
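To see how the two options can disagree, here is a minimal sketch on a 1-D toy problem. The two-bump prior, the grid, and the values of y and eps are all assumptions made for illustration; none of them come from the slides.

```python
import numpy as np

# Hypothetical 1-D prior: a mixture of two narrow bumps, discretized on a grid.
x = np.linspace(0.0, 1.0, 1001)
prior = (0.7 * np.exp(-0.5 * ((x - 0.30) / 0.02) ** 2)
         + 0.3 * np.exp(-0.5 * ((x - 0.45) / 0.02) ** 2))
prior /= prior.sum()                       # normalize to a discrete P(x)

y, eps = 0.40, 0.15                        # measurement and noise bound (assumed values)
feasible = np.abs(y - x) <= eps            # the constraint ||y - x||_2 <= eps

# Option 1: MAP picks the most probable feasible x.
x_map = x[feasible][np.argmax(prior[feasible])]

# Option 2: MMSE averages the feasible x, weighted by the renormalized prior.
w = prior[feasible] / prior[feasible].sum()
x_mmse = np.sum(w * x[feasible])

print(f"MAP estimate:  {x_map:.3f}")
print(f"MMSE estimate: {x_mmse:.3f}")
```

MAP lands on the peak of the taller bump, while MMSE lands between the two bumps, at a point that itself has low probability under this prior.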



Inverse Problems: The measurement is $y = H x_0 + v$, with $\|v\|_2 \le \epsilon$, and we are trying to recover $x_0$, as before.

H could be blur, projection, downscaling, subsampling, …

$\hat{x} = \mathrm{ArgMax}_x \; P(x) \quad \text{s.t.} \quad \|y - Hx\|_2 \le \epsilon$



Compression: We are given x and a budget of B bits. Our goal is to get the best possible compression (i.e. minimize the error). The approach we take is to divide the whole domain into $2^B$ disjoint sets $S_k$ (Voronoi) and minimize the error w.r.t. the representation vectors $x_k$ (VQ):

$\min_{\{x_k\}} \; \sum_{k=1}^{2^B} \int_{x \in S_k} \|x - x_k\|_2^2 \, P(x)\,dx$
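A VQ objective of this kind is commonly attacked by alternating between its two unknowns, in the style of the Lloyd (k-means) iteration. The slides do not name a solver, so this is a swapped-in illustration. The sketch replaces the integral over P(x) with an average over samples drawn from an assumed 1-D two-cluster source.

```python
import numpy as np

rng = np.random.default_rng(0)
# Samples standing in for P(x): an assumed two-cluster 1-D source.
x = np.concatenate([rng.normal(0.2, 0.03, 5000), rng.normal(0.7, 0.05, 5000)])

B = 2                                      # bit budget -> 2^B representation values
codebook = np.quantile(x, (np.arange(2**B) + 0.5) / 2**B)   # initial guess

def distortion(x, codebook):
    # Mean squared error when each sample is replaced by its nearest codeword.
    return np.mean(np.min((x[:, None] - codebook[None, :]) ** 2, axis=1))

d_start = distortion(x, codebook)
for _ in range(50):
    # Voronoi step: assign each sample to its nearest codeword (the set S_k).
    labels = np.argmin(np.abs(x[:, None] - codebook[None, :]), axis=1)
    # Centroid step: for a fixed S_k, the best x_k is the mean of that set.
    for k in range(len(codebook)):
        if np.any(labels == k):
            codebook[k] = x[labels == k].mean()
d_end = distortion(x, codebook)

print(f"distortion before: {d_start:.2e}, after: {d_end:.2e}")
```

Each of the two steps can only lower the error, which is why the final distortion never exceeds the initial one.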



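Sampling: Our goal is to propose sampling and reconstruction strategies, each of which (or just the first) is parameterized, and to optimize the parameters for the smallest possible error:

$\min \int \|x - \mathrm{Reconst}\{\mathrm{Sample}\{x\}\}\|_2^2 \, P(x)\,dx$

where the minimization is over the parameters of Sample and Reconst.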



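Separation: We are given $y = x_1 + x_2 + v$, with $\|v\|_2 \le \epsilon$, where $x_1$ and $x_2$ are two different signals from two different distributions, and our goal is to separate the signal into its ingredients:

$(\hat{x}_1, \hat{x}_2) = \mathrm{ArgMax}_{x_1, x_2} \; P_1(x_1)\,P_2(x_2) \quad \text{s.t.} \quad \|y - x_1 - x_2\|_2 \le \epsilon$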



Anomaly Detection: We are given x and we are supposed to say if it is an anomaly. This is done by testing

$P(x) \le T$


Question: What is it good for? Answer: Many great things.

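The Evolution of Priors for Images

P(x) = ?

[Figure: a timeline (70's, 80's, 90's, 00's) of priors for images, each of the form $P(x) \sim e^{-(\cdot)}$ with the exponent listed below.]

- Smoothness: $\lambda\|Lx\|_2^2$
- WLS: $\lambda\|Lx\|_W^2$
- Transform: $\lambda\|Tx\|_2^2$
- Robust stat.: $\lambda\|Lx\|_1$
- PDE: $\lambda\,TV(x)$
- Wavelet: $\lambda\|Tx\|_1$
- Learn: $x^T R^{-1} x$
- FoE: a sum over k of weighted penalties on $L_k x$
- Sparse: $\lambda\|\alpha\|_0^0$ where $x = D\alpha$
- GMM, Co-Sparse Analysis, Low-Rank, …

Major themes:
- L2 → L1
- Linear vs. Non-Linear Approx.
- Training on examples
- Random generator for x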


Here is an untold secret:

The vast literature in image processing over the past 4-5 decades is NOTHING BUT an evolution of ideas on the identity of P(x), and ways to use it in actual tasks.

By the way, the same is true for many other data sources and signals …

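Linear Versus Non-Linear Approximation

Suppose that our prior is the following (T is unitary):

$P(x) \sim e^{-\|\Lambda T x\|_2^2}$

The matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k, \ldots, \lambda_n)$ weights the transform elements.

[Figure: the weights $\lambda_i$ plotted against the index $i = 1, \ldots, k, \ldots, n$, taking the values 0 and 100.]

Our goal: Denoising a signal with this prior by solving

$\min_x \; \frac{1}{2}\|x - y\|_2^2 + \frac{1}{2}\|\Lambda T x\|_2^2$

The solution is given by

$\hat{x}_{opt} = (I + T^T \Lambda^2 T)^{-1} y = T^T (I + \Lambda^2)^{-1} T y = T^T \, \mathrm{diag}\!\left(\frac{1}{1+\lambda_1^2}, \ldots, \frac{1}{1+\lambda_n^2}\right) T y$

Implication: We keep the transform coefficients with the small weights and remove the ones with the high weights. The decision of who survives the process is fixed by $\Lambda$, independently of the signal. This is Linear Approximation.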
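The closed form for this weighted L2 prior can be checked numerically. For a unitary T and a diagonal weight matrix, minimizing 0.5*||x - y||^2 + 0.5*||Lam T x||^2 amounts to scaling each coefficient of T y by 1/(1 + lam_i^2). The sketch below is only a sanity check under assumptions made here: a random orthogonal T, and weights of 0 on the first k coefficients and 100 on the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# An assumed unitary (real orthogonal) transform T, taken from a QR factorization.
T, _ = np.linalg.qr(rng.standard_normal((n, n)))
# Assumed weights: 0 on the first k transform elements, 100 on the others.
lam = np.concatenate([np.zeros(k), 100.0 * np.ones(n - k)])
Lam = np.diag(lam)
y = rng.standard_normal(n)

# Direct solution of the normal equations (I + T^T Lam^2 T) x = y.
x_direct = np.linalg.solve(np.eye(n) + T.T @ Lam**2 @ T, y)

# Transform-domain form: scale each coefficient of T y by 1 / (1 + lam_i^2).
gain = 1.0 / (1.0 + lam**2)
x_filter = T.T @ (gain * (T @ y))

print("max difference between the two forms:", np.max(np.abs(x_direct - x_filter)))
print("per-coefficient gains:", gain)
```

The gains depend only on the weights and never on y, so the same coefficients are kept for every signal.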


Suppose now that our prior is the following (T is unitary):

$P(x) \sim e^{-\lambda\|Tx\|_1}$

Our goal: Denoising a signal with this prior by solving

$\min_x \; \frac{1}{2}\|x - y\|_2^2 + \lambda\|Tx\|_1$

We have seen that the solution for this problem is given by soft shrinkage:

$\hat{x}_{opt} = T^T S_\lambda\{T y\}$

Implications: Just like before, we filter the signal in the transform domain. This time we keep the dominant coefficients and discard the small ones, so the decision of who survives depends on the signal itself. This is known as Nonlinear Approximation.
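A minimal sketch of soft shrinkage for the L1 prior with a unitary T: transform y, shrink every coefficient toward zero by lam, and transform back. The random orthogonal T, the value of lam, and the helper names `soft_shrink` and `objective` are assumptions made for this illustration.

```python
import numpy as np

def soft_shrink(z, lam):
    # Elementwise soft threshold: zero out |z| <= lam, shrink the rest by lam.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(1)
n, lam = 6, 0.5
T, _ = np.linalg.qr(rng.standard_normal((n, n)))   # assumed unitary transform
y = rng.standard_normal(n)

# Denoise: shrink in the transform domain, then return to the signal domain.
x_hat = T.T @ soft_shrink(T @ y, lam)

def objective(x):
    # The penalized problem: 0.5 * ||x - y||_2^2 + lam * ||T x||_1.
    return 0.5 * np.sum((x - y) ** 2) + lam * np.sum(np.abs(T @ x))

print("objective at y:    ", objective(y))
print("objective at x_hat:", objective(x_hat))
```

Which coefficients survive now depends on the magnitudes in T y, that is, on the signal itself.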

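Sparse-Land Signal Generation

[Figure: a machine M that generates signals x from P(x).]

1. Draw $k_0$, the cardinality of the representation, from a distribution P(k) that decays exponentially with k.
2. Draw $k_0$ locations, and generate the representation $\alpha$.
3. Draw the $k_0$ non-zero values v from a Gaussian P(v), whose exponent is proportional to $-\|v\|_2^2$.
4. Multiply by the dictionary D.
5. Add random iid (model) noise e, giving $x = D\alpha + e$.

Sparse-Land: $P(x) \sim e^{-\lambda\|\alpha\|_0^0}$ where $x - D\alpha \sim \mathcal{N}(0, \sigma^2 I)$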
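The generation procedure (draw a cardinality, draw a support, draw Gaussian non-zero values, multiply by the dictionary, add noise) can be sketched as follows. The function name `generate_sparseland_signal`, the parameters `k_scale`, `sigma_v`, `sigma_e`, and the random unit-norm dictionary are all hypothetical choices for illustration.

```python
import numpy as np

def generate_sparseland_signal(D, rng, k_scale=3.0, sigma_v=1.0, sigma_e=0.01):
    """Draw one signal x = D @ alpha + e, following the generation steps."""
    n, m = D.shape
    # Draw the cardinality k0 in 1..n with probability decaying like exp(-k / k_scale).
    weights = np.exp(-np.arange(1, n + 1) / k_scale)
    k0 = rng.choice(np.arange(1, n + 1), p=weights / weights.sum())
    # Draw k0 locations (the support) uniformly at random.
    support = rng.choice(m, size=k0, replace=False)
    # Draw k0 non-zero Gaussian values and place them in the representation.
    alpha = np.zeros(m)
    alpha[support] = rng.normal(0.0, sigma_v, size=k0)
    # Multiply by the dictionary and add iid model noise.
    x = D @ alpha + rng.normal(0.0, sigma_e, size=n)
    return x, alpha

rng = np.random.default_rng(0)
n, m = 20, 40
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms (a common convention)
x, alpha = generate_sparseland_signal(D, rng)
print("non-zeros in alpha:", np.count_nonzero(alpha))
```

Calling the function repeatedly gives a random generator for x, which is one of the major themes in the evolution of priors.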

Sparse-Land vs. Earlier Models

Sparse-Land: $P(x) \sim e^{-\lambda\|\alpha\|_0^0}$ where $x = D\alpha$

Assume no noise in the model and that D is square and invertible. Then $\alpha = D^{-1}x$, and with $T = D^{-1}$:

$P(x) \sim e^{-\lambda\|D^{-1}x\|_0^0} = e^{-\lambda\|Tx\|_0^0}$

The Sparse-Land model generalizes the previous method by (1) adopting over-completeness***, and (2) daring to work with true sparsity and L0.

*** What about a redundant T? This will be addressed later!

Geometrical Insight

Take a point $x_0$ in the cloud and gather its N neighbors as the columns of an $n \times N$ matrix:

$E = [\, x_0^1 \mid x_0^2 \mid x_0^3 \mid \cdots \mid x_0^N \,]$

The effective rank d of E (found by SVD) is expected to be very low: $d \ll n$.

This is universally true for signals we operate on.

The orientation and dimension of this subspace change from one point to another (smoothly?)
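One way to read "effective rank found by SVD" is to count the singular values of E that carry almost all of its energy. The sketch below uses a synthetic stand-in for the neighbors (points near a 5-dimensional subspace of R^400) and an assumed 99.9% energy threshold; both are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, d_true = 400, 200, 5

# Synthetic stand-in for the neighbors of x0: points near a d_true-dimensional
# subspace of R^n, plus a small perturbation.
basis = np.linalg.qr(rng.standard_normal((n, d_true)))[0]
E = basis @ rng.standard_normal((d_true, N)) + 1e-3 * rng.standard_normal((n, N))

# Effective rank: smallest number of singular values holding 99.9% of the energy.
s = np.linalg.svd(E, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
d = int(np.searchsorted(energy, 0.999) + 1)

print(f"effective rank d = {d}, ambient dimension n = {n}")
```

On real patches the threshold matters, because the singular values decay gradually rather than dropping off a cliff as they do in this synthetic case.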

Geometrical Insight – Implications

Given a noisy version of $x_0$, $y = x_0 + v$:

How shall we denoise it?

By projecting y onto the subspace around $x_0$ (chicken and egg: finding that subspace requires knowing $x_0$).

How come y is not on the subspace itself?

1. The relative volume of the subspace is negligible

2. Recall that $E\{(y - x_0)^T x_0\} = 0$

Geometrical Insight – Denoising in Practice

Given a noisy version of $x_0$, $y = x_0 + v$:

How shall we denoise it?

1. Non-parametric: Nearest Neighbor (NN), or K-NN

2. Local-Parametric: Group neighbors, estimate the subspace and project

3. Parametric: Cluster the DB into K subgroups, and estimate a subspace for each. When a signal is to be denoised, assign it to the closest subgroup, and then project onto the corresponding subspace (K=1: PCA)

4. Sparse-Land: one dictionary encapsulates many such clusters
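The local-parametric option (group neighbors, estimate the subspace, project) can be sketched with a local PCA. The function name `denoise_local_pca`, the neighbor count, the subspace dimension, and the synthetic database of points on a single 5-dimensional subspace are all assumptions made for this illustration.

```python
import numpy as np

def denoise_local_pca(y, database, num_neighbors=50, d=5):
    """Project y onto an affine subspace fitted to its nearest neighbors."""
    # Group neighbors: the database points (rows) closest to the noisy signal y.
    dists = np.linalg.norm(database - y, axis=1)
    nbrs = database[np.argsort(dists)[:num_neighbors]]
    # Estimate the subspace: top-d principal directions around the local mean.
    mean = nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs - mean, full_matrices=False)
    basis = Vt[:d].T                        # n x d, orthonormal columns
    # Project y onto the estimated affine subspace.
    return mean + basis @ (basis.T @ (y - mean))

rng = np.random.default_rng(0)
n, d = 100, 5
subspace = np.linalg.qr(rng.standard_normal((n, d)))[0]
database = rng.standard_normal((2000, d)) @ subspace.T    # clean examples (rows)
x0 = subspace @ rng.standard_normal(d)                    # clean signal, not in the DB
y = x0 + 0.1 * rng.standard_normal(n)                     # noisy version of x0
x_hat = denoise_local_pca(y, database)

print("error before:", np.linalg.norm(y - x0))
print("error after: ", np.linalg.norm(x_hat - x0))
```

The projection removes the part of the noise that is orthogonal to the estimated subspace, which is most of it when d is much smaller than n.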

Union of Subspaces (UoS)

We said that with Sparse-Land, one dictionary encapsulates many such clusters.

Consider all the signals x that emerge from the same k atoms in D – all of them reside in the same subspace, spanned by these columns.

Thus, every possible support (and there are m-choose-k of them) represents one such subspace to which the signal could belong.

The pursuit task: Given a noisy signal, we search for the “closest subspace” and project onto it. This is hard because of the number of subspaces involved in this union.
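For a tiny dictionary the pursuit can be done literally, by trying every support of size k and keeping the closest subspace. The sketch below does that and then counts the supports for a larger, assumed dictionary size (m = 512, k = 10) to show why the exhaustive route does not scale. The function name `closest_subspace` and all sizes are illustrative assumptions.

```python
import numpy as np
from itertools import combinations
from math import comb

def closest_subspace(y, D, k):
    """Exhaustive pursuit: try every support of size k, keep the best projection."""
    best_err, best_support, best_x = np.inf, None, None
    for support in combinations(range(D.shape[1]), k):
        Ds = D[:, list(support)]                        # atoms spanning this subspace
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)   # least-squares projection
        x = Ds @ coef
        err = np.linalg.norm(y - x)
        if err < best_err:
            best_err, best_support, best_x = err, support, x
    return best_support, best_x

rng = np.random.default_rng(0)
n, m, k = 10, 20, 2
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)
x0 = D[:, [3, 11]] @ np.array([1.0, -2.0])     # clean signal built from atoms 3 and 11
y = x0 + 0.01 * rng.standard_normal(n)

support, x_hat = closest_subspace(y, D, k)
print("recovered support:", support)
print("supports searched (m=20, k=2):", comb(m, k))
print("supports for m=512, k=10:", comb(512, 10))
```

The toy search visits 190 subspaces; the larger case already has more than 10^20, which is why practical pursuit relies on approximation algorithms instead.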

Processing Sparse-Land Signals

Sparse-Land: $P(x) \sim e^{-\lambda\|\alpha\|_0^0}$ where $x = D\alpha$

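Signal Transform
- Goal: Most effective transform, getting the sparsest possible set of iid coefficients
- Given data: $x$
- Objective: $\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ x = D\alpha$

Signal Denoising
- Goal: Cleanest possible signal
- Given data: $y = x + v$, $\|v\|_2 \le \epsilon$
- Objective: $\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ \|y - D\alpha\|_2 \le \epsilon$

Compress
- Goal: We have a budget of B bits and we want to best represent the signal
- Given data: $x$, budget [bits]
- Objective: $\min_\alpha \|x - D\alpha\|_2 \ \text{s.t.}\ \|\alpha\|_0 \le B$

Inverse Problem
- Goal: Treat blur, subsampling, missing values, projection, compressed-sensing
- Given data: $y = Hx + v$, $\|v\|_2 \le \epsilon$
- Objective: $\min_\alpha \|\alpha\|_0 \ \text{s.t.}\ \|y - HD\alpha\|_2 \le \epsilon$

Separate
- Goal: The two signals are from different sources and thus have different models
- Given data: $y = x_1 + x_2 + v$, $\|v\|_2 \le \epsilon$
- Objective: $\min_{\alpha_1, \alpha_2} \|\alpha_1\|_0 + \|\alpha_2\|_0 \ \text{s.t.}\ \|y - D_1\alpha_1 - D_2\alpha_2\|_2 \le \epsilon$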


Processing Sparse-Land Signals

All these (and other) processing methods boil down to the solution of $(P_0)$, the search for the sparsest representation $\alpha$ under the dictionary D, for which we now know that:

1. It is theoretically sensible, and
2. There are numerical ways to handle it
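As one example of a numerical way to handle such a problem, here is a hedged sketch of a greedy pursuit in the style of Orthogonal Matching Pursuit, applied to the error-constrained form min ||alpha||_0 s.t. ||y - D alpha||_2 <= eps. The slides do not name a specific algorithm at this point, so the choice of OMP, the function name `omp`, and the test sizes are assumptions. The sketch also assumes unit-norm atoms.

```python
import numpy as np

def omp(D, y, eps):
    """Greedy sketch for min ||alpha||_0 s.t. ||y - D alpha||_2 <= eps."""
    n, m = D.shape
    support, alpha = [], np.zeros(m)
    residual = y.copy()
    while np.linalg.norm(residual) > eps and len(support) < n:
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break                                    # no further progress possible
        support.append(j)
        # Refit all chosen coefficients by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        alpha = np.zeros(m)
        alpha[support] = coef
        residual = y - D @ alpha
    return alpha

rng = np.random.default_rng(0)
n, m = 40, 80
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
alpha0 = np.zeros(m)
alpha0[[5, 17, 42]] = [2.0, -2.0, 2.0]               # a 3-sparse representation
y = D @ alpha0 + 0.001 * rng.standard_normal(n)
eps = 0.05

alpha_hat = omp(D, y, eps)
print("non-zeros found:", np.count_nonzero(alpha_hat))
print("residual norm:  ", np.linalg.norm(y - D @ alpha_hat))
```

The greedy choice carries no general guarantee of finding the sparsest solution; it is one practical approximation among several.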

To Summarize

The Sparse-Land model forms a general Union of Subspaces, all encapsulated by the concise matrix D.

This follows many earlier works that aim to model signals using a union of subspaces (or a mixture of Gaussians: think about it, it is the same).

Sparse-Land is rooted in solid modeling ideas, while improving on them thanks to its generality and its solid mathematical foundations.
