A hand-waving introduction to sparsity for compressed tomography reconstruction
Gaël Varoquaux and Emmanuelle Gouillart
Jun 21, 2015

An introduction to the basic concepts needed to understand sparse reconstruction in computed tomography.
Transcript
Page 1: A hand-waving introduction to sparsity for compressed tomography reconstruction

Dense slides. For future reference: http://www.slideshare.net/GaelVaroquaux

A hand-waving introduction to sparsity for compressed tomography reconstruction

Gaël Varoquaux and Emmanuelle Gouillart

Page 2

1 Sparsity for inverse problems

2 Mathematical formulation

3 Choice of a sparse representation

4 Optimization algorithms

G Varoquaux 2

Page 3

1 Sparsity for inverse problems

Problem setting

Intuitions

Page 4

1 Tomography reconstruction: a linear problem

y = A x

[Figure: projection data y plotted against detector position]

y ∈ R^n, A ∈ R^(n×p), x ∈ R^p

n ∝ number of projections
p: number of pixels in the reconstructed image

We want to find x knowing A and y
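The linear model is easy to play with. A minimal numpy sketch (a toy setup, not the deck's code): a 4×4 image "projected" along its rows and columns, i.e. two projection angles. Note that n < p, and these n = 8 measurements are even rank-deficient.

```python
import numpy as np

# Toy tomography forward model: a 4x4 image, "projected" along
# rows and columns (two projection angles, 0 and 90 degrees).
p_side = 4
p = p_side * p_side              # number of pixels in x

# Each row of A sums one image row (first 4 rows of A) or one
# image column (last 4 rows of A): n = 8 measurements.
A = np.zeros((2 * p_side, p))
for i in range(p_side):
    row_mask = np.zeros((p_side, p_side))
    row_mask[i, :] = 1           # indicator of image row i
    A[i] = row_mask.ravel()
    col_mask = np.zeros((p_side, p_side))
    col_mask[:, i] = 1           # indicator of image column i
    A[p_side + i] = col_mask.ravel()

x = np.zeros((p_side, p_side))
x[1:3, 1:3] = 1                  # a small square object
y = A @ x.ravel()                # the measured projections

# n = 8 < p = 16, and rank(A) = 7 (row sums and column sums share
# their total): the problem is underdetermined.
```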

Page 5

1 Small n: an ill-posed linear problem

y = A x admits multiple solutions

The sensing operator A has a large null space: images that give null projections.
In particular it is blind to high spatial frequencies.

Large number of projections: ill-conditioned problem,
“short-sighted” rather than blind
⇒ captures noise on those components


Page 7

1 A toy example: spectral analysis
Recovering the frequency spectrum

[Figure: a signal and its frequency spectrum]

signal = A · frequencies

Page 8

1 A toy example: spectral analysis
Sub-sampling

[Figure: the sub-sampled signal and its frequency spectrum]

signal = A · frequencies

Recovery problem becomes ill-posed

Page 9

1 A toy example: spectral analysis
Problem: aliasing

[Figure: regularly sub-sampled signal and its aliased frequency spectrum]

Information in the null-space of A is lost

Solution: incoherent measurements

[Figure: randomly sampled signal and its frequency spectrum]

i.e. careful choice of the null-space of A


Page 11

1 A toy example: spectral analysis
Incoherent measurements, but scarcity of data

[Figure: signal and its frequency spectrum]

The null-space of A is spread out in frequency.
Not much data ⇒ large null-space = captures “noise”

[Figure: sparse frequency spectrum]

Impose sparsity: find a small number of frequencies to explain the signal

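To see why scarce data calls for a sparsity prior, compare the minimum-ℓ2-norm solution with the sparse truth. A hedged numpy sketch (the cosine sensing matrix and all names are illustrative): least squares fits the data exactly, but spreads energy over many coefficients instead of concentrating it on the few true frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 64                           # number of frequency coefficients
n = 16                           # few, randomly placed time samples
t = np.sort(rng.choice(256, size=n, replace=False))

# Sensing matrix: signal value at time t from cosine coefficients
# (random time sampling is incoherent with the frequency basis).
freqs = np.arange(p)
A = np.cos(np.pi * np.outer(t, freqs) / 256)

x_true = np.zeros(p)
x_true[[3, 17, 40]] = [1.0, -0.5, 0.8]   # 3 active frequencies
y = A @ x_true

# Minimum-l2-norm solution: exact data fit, but not sparse --
# the energy leaks into the whole row space of A.
x_l2 = np.linalg.lstsq(A, y, rcond=None)[0]
# x_true has 3 non-zeros; x_l2 has many more.
```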

Page 13

1 And for tomography reconstruction?

[Figure: original image, non-sparse reconstruction, sparse reconstruction]

128 × 128 pixels, 18 projections

http://scikit-learn.org/stable/auto_examples/applications/plot_tomography_l1_reconstruction.html

Page 14

1 Why does it work: a geometric explanation
Two coefficients of x not in the null-space of A:

[Figure: the constraint line y = A x in the (x1, x2) plane, with the true solution x_true]

The sparsest solution is in the blue cross.
It corresponds to the true solution (x_true) if the slope is > 45°

Page 15

1 Why does it work: a geometric explanation

[Figure: same geometric picture]

The cross can be replaced by its convex hull

Page 16

1 Why does it work: a geometric explanation

[Figure: same geometric picture]

In high dimension: large acceptable set

Page 17

Recovery of sparse signal

Null space of the sensing operator incoherent with the sparse representation
⇒ excellent sparse recovery with few projections

Minimum number of observations necessary:
n_min ∼ k log p, with k the number of non-zeros
[Candes 2006]

Rmk: theory for i.i.d. samples

Related to “compressive sensing”
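For scale, plugging the deck's 128 × 128 image into the bound, with an assumed k = 10 non-zeros (numbers purely illustrative):

```python
import math

p = 128 * 128              # pixels in the reconstructed image
k = 10                     # assume ~10 non-zero coefficients

n_min = k * math.log(p)    # n_min ~ k log p  [Candes 2006]
# round(n_min) is about 97 measurements: far fewer than p = 16384
```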

Page 18

2 Mathematical formulation

Variational formulation

Introduction of noise

Page 19

2 Maximizing the sparsity
ℓ0(x): number of non-zeros

min_x ℓ0(x)   s.t.   y = A x

[Figure: geometric picture in the (x1, x2) plane]

“Matching pursuit” problem [Mallat, Zhang 1993]
“Orthogonal matching pursuit” [Pati, et al 1993]

Problem: non-convex optimization

Page 20

2 Maximizing the sparsity
ℓ1(x) = ∑_i |x_i|

min_x ℓ1(x)   s.t.   y = A x

[Figure: geometric picture in the (x1, x2) plane]

“Basis pursuit” [Chen, Donoho, Saunders 1998]


Page 22

2 Modeling observation noise

y = A x + e,   e = observation noise

New formulation:
min_x ℓ1(x)   s.t.   ‖y − A x‖₂² ≤ ε²

Equivalent: “Lasso estimator” [Tibshirani 1996]

min_x ‖y − A x‖₂² + λ ℓ1(x)
      (data fit)    (penalization)

[Figure: the ℓ1 ball in the (x1, x2) plane]

Rmk: the kink in the ℓ1 ball creates sparsity
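The kink can be made concrete in the special case A = I, where the Lasso above has a closed-form solution, soft thresholding: small coefficients are set exactly to zero. A minimal sketch (the λ/2 threshold comes from the un-normalized quadratic term):

```python
import numpy as np

def lasso_identity(y, lam):
    """Closed-form Lasso solution when A is the identity:
        min_x ||y - x||_2^2 + lam * ||x||_1
    Per coordinate, this is soft thresholding at lam / 2
    (the factor 2 comes from the un-normalized quadratic term)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2, 0.0)

y = np.array([3.0, -0.5, 1.0, -2.0])
x_hat = lasso_identity(y, lam=2.0)
# x_hat == [2., 0., 0., -1.]: coefficients below the threshold
# are set exactly to zero, the others are shrunk toward zero.
```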

Page 23

2 Probabilistic modeling: Bayesian interpretation

P(x|y) ∝ P(y|x) P(x)   (★)

“Posterior” P(x|y): quantity of interest
Forward model P(y|x); “Prior” P(x): expectations on x

Forward model: y = A x + e, e Gaussian noise
⇒ P(y|x) ∝ exp(− (1/2σ²) ‖y − A x‖₂²)

Prior: Laplacian, P(x) ∝ exp(− (1/µ) ‖x‖₁)

Negated log of (★):   (1/2σ²) ‖y − A x‖₂² + (1/µ) ℓ1(x)

Maximum of the posterior is the Lasso estimate.
Note that this picture is limited and the Lasso is not a good Bayesian estimator for the Laplace prior [Gribonval 2011].

Page 24

3 Choice of a sparse representation

Sparse in wavelet domain

Total variation

Page 25

3 Sparsity in wavelet representation

Typical images are not sparse.

[Figure: Haar decomposition, levels 1 to 6]

⇒ Impose sparsity in the Haar representation

A → A H, where H is the Haar transform

[Figure: original image, non-sparse reconstruction, sparse image, sparse in Haar]
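A one-level 2D Haar transform is simple enough to hand-roll; in practice a wavelet library such as PyWavelets would be used. This sketch shows a piecewise-constant image becoming much sparser in the Haar domain:

```python
import numpy as np

def haar2_level1(img):
    """One level of the orthonormal 2D Haar transform,
    computed on non-overlapping 2x2 blocks."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2     # approximation
    lh = (a - b + c - d) / 2     # horizontal details
    hl = (a + b - c - d) / 2     # vertical details
    hh = (a - b - c + d) / 2     # diagonal details
    return np.block([[ll, lh], [hl, hh]])

# Piecewise-constant image: left half 0, right half 1
img = np.zeros((8, 8))
img[:, 4:] = 1.0

coeffs = haar2_level1(img)
# The image has 32 non-zero pixels; its Haar coefficients have
# only 8 non-zeros (the approximation of the constant blocks):
# all detail coefficients vanish inside constant regions.
```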


Page 27

3 Total variation

[Figure: original image, Haar wavelet, TV penalization]

Impose a sparse gradient:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂

ℓ12 norm: ℓ1 norm of the gradient magnitude
Sets ∇x and ∇y to zero jointly

Page 28

3 Total variation

[Figure: original image, error for Haar wavelet, error for TV penalization]

Impose a sparse gradient:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂

ℓ12 norm: ℓ1 norm of the gradient magnitude
Sets ∇x and ∇y to zero jointly
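The penalty itself is a one-liner. A numpy sketch of the isotropic TV term ∑_i ‖(∇x)_i‖₂ with forward differences (the boundary handling is one of several reasonable choices):

```python
import numpy as np

def total_variation(img):
    """Isotropic total variation: l1 norm of the per-pixel
    gradient magnitude, using forward differences with
    replicated (zero-gradient) boundaries."""
    gx = np.diff(img, axis=0, append=img[-1:, :])   # vertical gradient
    gy = np.diff(img, axis=1, append=img[:, -1:])   # horizontal gradient
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))

img = np.zeros((4, 4))
img[:, 2:] = 1.0
# A single vertical edge: one unit jump per row, so TV = 4.0
```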


Page 30

3 Total variation + interval

[Figure: original image, TV penalization, TV + interval]

Bound x in [0, 1]:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂ + I([0, 1])

[Figure: histograms of reconstructed values, for TV and TV + interval]

Rmk: the constraint does more than folding values outside of the range back in.

Page 31

3 Total variation + interval

[Figure: original image, error for TV penalization, error for TV + interval]

Bound x in [0, 1]:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂ + I([0, 1])

[Figure: histograms of reconstructed values, for TV and TV + interval]

Rmk: the constraint does more than folding values outside of the range back in.


Page 33

Analysis vs synthesis

Wavelet basis: min ‖y − A H x‖₂² + ‖x‖₁
H: wavelet transform (“synthesis” formulation)

Total variation: min ‖y − A x‖₂² + ‖D x‖₁
D: spatial derivation operator, ∇ (“analysis” formulation)

Theory and algorithms are easier for synthesis.
Equivalence iff D is invertible

Page 34

4 Optimization algorithms
Non-smooth optimization ⇒ “proximal operators”

Page 35

4 Smooth optimization fails!

[Figure: energy vs. iterations for gradient descent, and the descent path in the (x1, x2) plane]

Smooth optimization fails in non-smooth regions.
These are specifically the spots that interest us.

Page 36

4 Iterative Shrinkage-Thresholding Algorithm
Settings: min f + g; f smooth, g non-smooth; f and g convex, ∇f L-Lipschitz

Typically f is the data-fit term and g the penalty.
ex: Lasso, (1/2σ²) ‖y − A x‖₂² + (1/µ) ℓ1(x)


Page 38

4 Iterative Shrinkage-Thresholding Algorithm
Settings: min f + g; f smooth, g non-smooth; f and g convex, ∇f L-Lipschitz

Minimize successively: (quadratic approx of f) + g

f(x) ≤ f(y) + ⟨x − y, ∇f(y)⟩ + (L/2) ‖x − y‖₂²

Proof: by convexity, f(y) ≤ f(x) + ∇f(y)ᵀ(y − x);
in the second term, replace ∇f(y) by ∇f(x) + (∇f(y) − ∇f(x));
upper-bound the last term using the Lipschitz continuity of ∇f

x_{k+1} = argmin_x g(x) + (L/2) ‖x − (x_k − (1/L) ∇f(x_k))‖₂²
[Daubechies 2004]

Step 1: gradient descent on f
Step 2: proximal operator of g:

prox_{λg}(x) := argmin_y ‖y − x‖₂² + λ g(y)

Generalization of the Euclidean projection on the convex set {x, g(x) ≤ 1}.
Rmk: if g is the indicator function of a set S, the proximal operator is the Euclidean projection.

prox_{λℓ1}(x)_i = sign(x_i) (|x_i| − λ)₊   “soft thresholding”
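The two steps above can be written out for the Lasso. A minimal sketch, using the ½‖y − Ax‖₂² normalization so that the prox threshold is exactly λ/L:

```python
import numpy as np

def soft_threshold(x, thr):
    # Proximal operator of thr * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def ista(A, y, lam, n_iter=200):
    """ISTA for min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                      # step 1: gradient on f
        x = soft_threshold(x - grad / L, lam / L)     # step 2: prox of lam*l1
    return x

# Sanity check: for A = I the iteration reaches the closed-form
# soft-thresholded solution after the first step.
A = np.eye(3)
y = np.array([2.0, -0.3, 1.0])
x_hat = ista(A, y, lam=0.5, n_iter=100)
# x_hat == [1.5, 0., 0.5]
```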

Page 39

4 Iterative Shrinkage-Thresholding Algorithm

[Figure, animated over several slides: energy vs. iterations for gradient descent and ISTA, and the descent path in the (x1, x2) plane]

The iterations alternate a gradient descent step with a projection on the ℓ1 ball.

Page 48

4 Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)

[Figure: energy vs. iterations for gradient descent, ISTA and FISTA, and the descent path in the (x1, x2) plane]

As with conjugate gradient: add a memory term

dx_{k+1} = dx^ISTA_{k+1} + ((t_k − 1) / t_{k+1}) (dx_k − dx_{k−1})

t_1 = 1,   t_{k+1} = (1 + √(1 + 4 t_k²)) / 2

⇒ O(k⁻²) convergence [Beck Teboulle 2009]
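Adding the memory term to the ISTA loop gives FISTA. A sketch under the same ½‖y − Ax‖₂² Lasso normalization as before (problem sizes and parameters are illustrative):

```python
import numpy as np

def soft_threshold(x, thr):
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def fista(A, y, lam, n_iter=200):
    """FISTA for min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1
    [Beck Teboulle 2009]."""
    L = np.linalg.norm(A, 2) ** 2
    x = z = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        # ISTA step, taken from the extrapolated point z
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)     # memory term
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[5, 20, 41]] = [1.0, -2.0, 1.5]
y = A @ x_true

x_hat = fista(A, y, lam=0.1, n_iter=500)
# x_hat is sparse (soft thresholding gives exact zeros) and
# typically concentrates on the true 3-sparse support.
```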


Page 50

4 Proximal operator for total variation
Reformulate to smooth + non-smooth with a simple projection step and use FISTA [Chambolle 2004]

prox_{λTV}(y) = argmin_x ‖y − x‖₂² + λ ∑_i ‖(∇x)_i‖₂

              = y − λ div z*,   z* = argmin_{z, ‖z‖∞ ≤ 1} ‖y − λ div z‖₂²   (a Euclidean projection)

Proof:
“dual norm”: ‖v‖₁ = max_{‖z‖∞ ≤ 1} ⟨v, z⟩
div is the adjoint of ∇: ⟨∇v, z⟩ = ⟨v, −div z⟩
Swap min and max and solve for x

Duality: [Boyd 2004]. This proof: [Michel 2011]
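Chambolle's dual projection can be sketched in a few lines of numpy, following the fixed-point iteration of [Chambolle 2004] with the ½‖y − x‖₂² normalization and step τ = 1/8 (step size and iteration count are illustrative):

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero-gradient boundaries."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, adjoint of -grad."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]; d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]; d[:, 1:] -= py[:, :-1]
    return d

def prox_tv(y, lam, n_iter=100, tau=0.125):
    """Chambolle's dual fixed-point iteration for
        argmin_x 0.5 * ||y - x||_2^2 + lam * TV(x)."""
    px = np.zeros_like(y); py = np.zeros_like(y)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - y / lam)
        norm = np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / (1 + tau * norm)   # reprojection keeps
        py = (py + tau * gy) / (1 + tau * norm)   # ||(px, py)|| <= 1
    return y - lam * div(px, py)

rng = np.random.default_rng(0)
clean = np.zeros((16, 16)); clean[:, 8:] = 1.0
noisy = clean + 0.2 * rng.standard_normal((16, 16))

denoised = prox_tv(noisy, lam=0.3)
# The total variation drops while the mean is preserved exactly
# (the discrete divergence sums to zero).
```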

Page 51

Sparsity for compressed tomography reconstruction

@GaelVaroquaux 27

Page 52

Sparsity for compressed tomography reconstruction
Add penalizations with kinks
Choice of prior/sparse representation
Non-smooth optimization (FISTA)

Further discussion: choice of prior/parameters
Minimize reconstruction error from degraded data of gold-standard acquisitions
Cross-validation: leave half of the projections out and minimize the projection error of the reconstruction

Python code available: https://github.com/emmanuelle/tomo-tv

Page 53

Bibliography (1/3)

[Candes 2006] E. Candes, J. Romberg and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, Trans Inf Theory (52) 2006

[Wainwright 2009] M. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso), Trans Inf Theory (55) 2009

[Mallat, Zhang 1993] S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, Trans Sign Proc (41) 1993

[Pati, et al 1993] Y. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, 27th Signals, Systems and Computers Conf 1993

Page 54

Bibliography (2/3)

[Chen, Donoho, Saunders 1998] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J Sci Computing (20) 1998

[Tibshirani 1996] R. Tibshirani, Regression shrinkage and selection via the lasso, J Roy Stat Soc B, 1996

[Gribonval 2011] R. Gribonval, Should penalized least squares regression be interpreted as Maximum A Posteriori estimation?, Trans Sig Proc (59) 2011

[Daubechies 2004] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm Pure Appl Math (57) 2004

Page 55

Bibliography (3/3)

[Beck Teboulle 2009] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J Imaging Sciences (2) 2009

[Chambolle 2004] A. Chambolle, An algorithm for total variation minimization and applications, J Math Imag Vision (20) 2004

[Boyd 2004] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press 2004
— Reference on convex optimization and duality

[Michel 2011] V. Michel et al., Total variation regularization for fMRI-based prediction of behaviour, Trans Med Imag (30) 2011
— Proof of TV reformulation: appendix C