A hand-waving introduction to sparsity for compressed tomography reconstruction
Gaël Varoquaux and Emmanuelle Gouillart
Jun 21, 2015

An introduction to the basic concepts needed to understand sparse reconstruction in computed tomography.
Transcript
Page 1: A hand-waving introduction to sparsity for compressed tomography reconstruction

Dense slides. For future reference: http://www.slideshare.net/GaelVaroquaux

A hand-waving introduction to sparsity for compressed tomography reconstruction

Gaël Varoquaux and Emmanuelle Gouillart

Page 2

1 Sparsity for inverse problems

2 Mathematical formulation

3 Choice of a sparse representation

4 Optimization algorithms

G Varoquaux 2

Page 3

1 Sparsity for inverse problems

Problem setting

Intuitions

Page 4

1 Tomography reconstruction: a linear problem

y = A x

[Figure: projection data y plotted against detector position]

y ∈ R^n, A ∈ R^(n×p), x ∈ R^p

n ∝ number of projections
p: number of pixels in the reconstructed image

We want to find x knowing A and y
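The linear model is easy to play with. A minimal numpy sketch (a toy setup, not the deck's code): a 4×4 image "projected" along its rows and columns, i.e. two projection angles. Note that n < p, and these n = 8 measurements are even rank-deficient.

```python
import numpy as np

# Toy tomography forward model: a 4x4 image, "projected" along
# rows and columns (two projection angles, 0 and 90 degrees).
p_side = 4
p = p_side * p_side              # number of pixels in x

# Each row of A sums one image row (first 4 rows of A) or one
# image column (last 4 rows of A): n = 8 measurements.
A = np.zeros((2 * p_side, p))
for i in range(p_side):
    row_mask = np.zeros((p_side, p_side))
    row_mask[i, :] = 1           # indicator of image row i
    A[i] = row_mask.ravel()
    col_mask = np.zeros((p_side, p_side))
    col_mask[:, i] = 1           # indicator of image column i
    A[p_side + i] = col_mask.ravel()

x = np.zeros((p_side, p_side))
x[1:3, 1:3] = 1                  # a small square object
y = A @ x.ravel()                # the measured projections

# n = 8 < p = 16, and rank(A) = 7 (row sums and column sums share
# their total): the problem is underdetermined.
```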

Page 5

1 Small n: an ill-posed linear problem

y = A x admits multiple solutions

The sensing operator A has a large null space: images that give null projections.
In particular it is blind to high spatial frequencies.

Large number of projections: ill-conditioned problem,
“short-sighted” rather than blind
⇒ captures noise on those components


Page 7

1 A toy example: spectral analysis
Recovering the frequency spectrum

[Figure: a signal and its frequency spectrum]

signal = A · frequencies

Page 8

1 A toy example: spectral analysis
Sub-sampling

[Figure: the sub-sampled signal and its frequency spectrum]

signal = A · frequencies

Recovery problem becomes ill-posed

Page 9

1 A toy example: spectral analysis
Problem: aliasing

[Figure: regularly sub-sampled signal and its aliased frequency spectrum]

Information in the null-space of A is lost

Solution: incoherent measurements

[Figure: randomly sampled signal and its frequency spectrum]

i.e. careful choice of the null-space of A


Page 11

1 A toy example: spectral analysis
Incoherent measurements, but scarcity of data

[Figure: signal and its frequency spectrum]

The null-space of A is spread out in frequency.
Not much data ⇒ large null-space = captures “noise”

[Figure: sparse frequency spectrum]

Impose sparsity: find a small number of frequencies to explain the signal

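To see why scarce data calls for a sparsity prior, compare the minimum-ℓ2-norm solution with the sparse truth. A hedged numpy sketch (the cosine sensing matrix and all names are illustrative): least squares fits the data exactly, but spreads energy over many coefficients instead of concentrating it on the few true frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 64                           # number of frequency coefficients
n = 16                           # few, randomly placed time samples
t = np.sort(rng.choice(256, size=n, replace=False))

# Sensing matrix: signal value at time t from cosine coefficients
# (random time sampling is incoherent with the frequency basis).
freqs = np.arange(p)
A = np.cos(np.pi * np.outer(t, freqs) / 256)

x_true = np.zeros(p)
x_true[[3, 17, 40]] = [1.0, -0.5, 0.8]   # 3 active frequencies
y = A @ x_true

# Minimum-l2-norm solution: exact data fit, but not sparse --
# the energy leaks into the whole row space of A.
x_l2 = np.linalg.lstsq(A, y, rcond=None)[0]
# x_true has 3 non-zeros; x_l2 has many more.
```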

Page 13

1 And for tomography reconstruction?

[Figure: original image, non-sparse reconstruction, sparse reconstruction]

128 × 128 pixels, 18 projections

http://scikit-learn.org/stable/auto_examples/applications/plot_tomography_l1_reconstruction.html

Page 14

1 Why does it work: a geometric explanation
Two coefficients of x not in the null-space of A:

[Figure: the constraint line y = A x in the (x1, x2) plane, with the true solution x_true]

The sparsest solution is in the blue cross.
It corresponds to the true solution (x_true) if the slope is > 45°

Page 15

1 Why does it work: a geometric explanation

[Figure: same geometric picture]

The cross can be replaced by its convex hull

Page 16

1 Why does it work: a geometric explanation

[Figure: same geometric picture]

In high dimension: large acceptable set

Page 17

Recovery of sparse signal

Null space of the sensing operator incoherent with the sparse representation
⇒ excellent sparse recovery with few projections

Minimum number of observations necessary:
n_min ∼ k log p, with k the number of non-zeros
[Candes 2006]

Rmk: theory for i.i.d. samples

Related to “compressive sensing”
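For scale, plugging the deck's 128 × 128 image into the bound, with an assumed k = 10 non-zeros (numbers purely illustrative):

```python
import math

p = 128 * 128              # pixels in the reconstructed image
k = 10                     # assume ~10 non-zero coefficients

n_min = k * math.log(p)    # n_min ~ k log p  [Candes 2006]
# round(n_min) is about 97 measurements: far fewer than p = 16384
```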

Page 18

2 Mathematical formulation

Variational formulation

Introduction of noise

Page 19

2 Maximizing the sparsity
ℓ0(x): number of non-zeros

min_x ℓ0(x)   s.t.   y = A x

[Figure: geometric picture in the (x1, x2) plane]

“Matching pursuit” problem [Mallat, Zhang 1993]
“Orthogonal matching pursuit” [Pati, et al 1993]

Problem: non-convex optimization

Page 20

2 Maximizing the sparsity
ℓ1(x) = ∑_i |x_i|

min_x ℓ1(x)   s.t.   y = A x

[Figure: geometric picture in the (x1, x2) plane]

“Basis pursuit” [Chen, Donoho, Saunders 1998]


Page 22

2 Modeling observation noise

y = A x + e,   e = observation noise

New formulation:
min_x ℓ1(x)   s.t.   ‖y − A x‖₂² ≤ ε²

Equivalent: “Lasso estimator” [Tibshirani 1996]

min_x ‖y − A x‖₂² + λ ℓ1(x)
      (data fit)    (penalization)

[Figure: the ℓ1 ball in the (x1, x2) plane]

Rmk: the kink in the ℓ1 ball creates sparsity
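The kink can be made concrete in the special case A = I, where the Lasso above has a closed-form solution, soft thresholding: small coefficients are set exactly to zero. A minimal sketch (the λ/2 threshold comes from the un-normalized quadratic term):

```python
import numpy as np

def lasso_identity(y, lam):
    """Closed-form Lasso solution when A is the identity:
        min_x ||y - x||_2^2 + lam * ||x||_1
    Per coordinate, this is soft thresholding at lam / 2
    (the factor 2 comes from the un-normalized quadratic term)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2, 0.0)

y = np.array([3.0, -0.5, 1.0, -2.0])
x_hat = lasso_identity(y, lam=2.0)
# x_hat == [2., 0., 0., -1.]: coefficients below the threshold
# are set exactly to zero, the others are shrunk toward zero.
```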

Page 23

2 Probabilistic modeling: Bayesian interpretation

P(x|y) ∝ P(y|x) P(x)   (★)

“Posterior” P(x|y): quantity of interest
Forward model P(y|x); “Prior” P(x): expectations on x

Forward model: y = A x + e, e Gaussian noise
⇒ P(y|x) ∝ exp(− (1/2σ²) ‖y − A x‖₂²)

Prior: Laplacian, P(x) ∝ exp(− (1/µ) ‖x‖₁)

Negated log of (★):   (1/2σ²) ‖y − A x‖₂² + (1/µ) ℓ1(x)

Maximum of the posterior is the Lasso estimate.
Note that this picture is limited and the Lasso is not a good Bayesian estimator for the Laplace prior [Gribonval 2011].

Page 24

3 Choice of a sparse representation

Sparse in wavelet domain

Total variation

Page 25

3 Sparsity in wavelet representation

Typical images are not sparse.

[Figure: Haar decomposition, levels 1 to 6]

⇒ Impose sparsity in the Haar representation

A → A H, where H is the Haar transform

[Figure: original image, non-sparse reconstruction, sparse image, sparse in Haar]
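A one-level 2D Haar transform is simple enough to hand-roll; in practice a wavelet library such as PyWavelets would be used. This sketch shows a piecewise-constant image becoming much sparser in the Haar domain:

```python
import numpy as np

def haar2_level1(img):
    """One level of the orthonormal 2D Haar transform,
    computed on non-overlapping 2x2 blocks."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2     # approximation
    lh = (a - b + c - d) / 2     # horizontal details
    hl = (a + b - c - d) / 2     # vertical details
    hh = (a - b - c + d) / 2     # diagonal details
    return np.block([[ll, lh], [hl, hh]])

# Piecewise-constant image: left half 0, right half 1
img = np.zeros((8, 8))
img[:, 4:] = 1.0

coeffs = haar2_level1(img)
# The image has 32 non-zero pixels; its Haar coefficients have
# only 8 non-zeros (the approximation of the constant blocks):
# all detail coefficients vanish inside constant regions.
```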


Page 27

3 Total variation

[Figure: original image, Haar wavelet, TV penalization]

Impose a sparse gradient:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂

ℓ12 norm: ℓ1 norm of the gradient magnitude
Sets ∇x and ∇y to zero jointly

Page 28

3 Total variation

[Figure: original image, error for Haar wavelet, error for TV penalization]

Impose a sparse gradient:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂

ℓ12 norm: ℓ1 norm of the gradient magnitude
Sets ∇x and ∇y to zero jointly
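The penalty itself is a one-liner. A numpy sketch of the isotropic TV term ∑_i ‖(∇x)_i‖₂ with forward differences (the boundary handling is one of several reasonable choices):

```python
import numpy as np

def total_variation(img):
    """Isotropic total variation: l1 norm of the per-pixel
    gradient magnitude, using forward differences with
    replicated (zero-gradient) boundaries."""
    gx = np.diff(img, axis=0, append=img[-1:, :])   # vertical gradient
    gy = np.diff(img, axis=1, append=img[:, -1:])   # horizontal gradient
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))

img = np.zeros((4, 4))
img[:, 2:] = 1.0
# A single vertical edge: one unit jump per row, so TV = 4.0
```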


Page 30

3 Total variation + interval

[Figure: original image, TV penalization, TV + interval]

Bound x in [0, 1]:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂ + I([0, 1])

[Figure: histograms of reconstructed values, for TV and TV + interval]

Rmk: the constraint does more than folding values outside of the range back in.

Page 31

3 Total variation + interval

[Figure: original image, error for TV penalization, error for TV + interval]

Bound x in [0, 1]:
min_x ‖y − A x‖₂² + λ ∑_i ‖(∇x)_i‖₂ + I([0, 1])

[Figure: histograms of reconstructed values, for TV and TV + interval]

Rmk: the constraint does more than folding values outside of the range back in.


Page 33

Analysis vs synthesis

Wavelet basis: min ‖y − A H x‖₂² + ‖x‖₁
H: wavelet transform (“synthesis” formulation)

Total variation: min ‖y − A x‖₂² + ‖D x‖₁
D: spatial derivation operator, ∇ (“analysis” formulation)

Theory and algorithms are easier for synthesis.
Equivalence iff D is invertible

Page 34

4 Optimization algorithms
Non-smooth optimization ⇒ “proximal operators”

Page 35

4 Smooth optimization fails!

[Figure: energy vs. iterations for gradient descent, and the descent path in the (x1, x2) plane]

Smooth optimization fails in non-smooth regions.
These are specifically the spots that interest us.

Page 36

4 Iterative Shrinkage-Thresholding Algorithm
Settings: min f + g; f smooth, g non-smooth; f and g convex, ∇f L-Lipschitz

Typically f is the data-fit term and g the penalty.
ex: Lasso, (1/2σ²) ‖y − A x‖₂² + (1/µ) ℓ1(x)


Page 38

4 Iterative Shrinkage-Thresholding Algorithm
Settings: min f + g; f smooth, g non-smooth; f and g convex, ∇f L-Lipschitz

Minimize successively: (quadratic approx of f) + g

f(x) ≤ f(y) + ⟨x − y, ∇f(y)⟩ + (L/2) ‖x − y‖₂²

Proof: by convexity, f(y) ≤ f(x) + ∇f(y)ᵀ(y − x);
in the second term, replace ∇f(y) by ∇f(x) + (∇f(y) − ∇f(x));
upper-bound the last term using the Lipschitz continuity of ∇f

x_{k+1} = argmin_x g(x) + (L/2) ‖x − (x_k − (1/L) ∇f(x_k))‖₂²
[Daubechies 2004]

Step 1: gradient descent on f
Step 2: proximal operator of g:

prox_{λg}(x) := argmin_y ‖y − x‖₂² + λ g(y)

Generalization of the Euclidean projection on the convex set {x, g(x) ≤ 1}.
Rmk: if g is the indicator function of a set S, the proximal operator is the Euclidean projection.

prox_{λℓ1}(x)_i = sign(x_i) (|x_i| − λ)₊   “soft thresholding”
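The two steps above can be written out for the Lasso. A minimal sketch, using the ½‖y − Ax‖₂² normalization so that the prox threshold is exactly λ/L:

```python
import numpy as np

def soft_threshold(x, thr):
    # Proximal operator of thr * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def ista(A, y, lam, n_iter=200):
    """ISTA for min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                      # step 1: gradient on f
        x = soft_threshold(x - grad / L, lam / L)     # step 2: prox of lam*l1
    return x

# Sanity check: for A = I the iteration reaches the closed-form
# soft-thresholded solution after the first step.
A = np.eye(3)
y = np.array([2.0, -0.3, 1.0])
x_hat = ista(A, y, lam=0.5, n_iter=100)
# x_hat == [1.5, 0., 0.5]
```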

Page 39

4 Iterative Shrinkage-Thresholding Algorithm

[Figure, animated over several slides: energy vs. iterations for gradient descent and ISTA, and the descent path in the (x1, x2) plane]

The iterations alternate a gradient descent step with a projection on the ℓ1 ball.

Page 48

4 Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)

[Figure: energy vs. iterations for gradient descent, ISTA and FISTA, and the descent path in the (x1, x2) plane]

As with conjugate gradient: add a memory term

dx_{k+1} = dx^ISTA_{k+1} + ((t_k − 1) / t_{k+1}) (dx_k − dx_{k−1})

t_1 = 1,   t_{k+1} = (1 + √(1 + 4 t_k²)) / 2

⇒ O(k⁻²) convergence [Beck Teboulle 2009]
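Adding the memory term to the ISTA loop gives FISTA. A sketch under the same ½‖y − Ax‖₂² Lasso normalization as before (problem sizes and parameters are illustrative):

```python
import numpy as np

def soft_threshold(x, thr):
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def fista(A, y, lam, n_iter=200):
    """FISTA for min_x 0.5 * ||y - A x||_2^2 + lam * ||x||_1
    [Beck Teboulle 2009]."""
    L = np.linalg.norm(A, 2) ** 2
    x = z = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        # ISTA step, taken from the extrapolated point z
        x_new = soft_threshold(z - A.T @ (A @ z - y) / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)     # memory term
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[5, 20, 41]] = [1.0, -2.0, 1.5]
y = A @ x_true

x_hat = fista(A, y, lam=0.1, n_iter=500)
# x_hat is sparse (soft thresholding gives exact zeros) and
# typically concentrates on the true 3-sparse support.
```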


Page 50

4 Proximal operator for total variation
Reformulate to smooth + non-smooth with a simple projection step and use FISTA [Chambolle 2004]

prox_{λTV}(y) = argmin_x ‖y − x‖₂² + λ ∑_i ‖(∇x)_i‖₂

              = y − λ div z*,   z* = argmin_{z, ‖z‖∞ ≤ 1} ‖y − λ div z‖₂²   (a Euclidean projection)

Proof:
“dual norm”: ‖v‖₁ = max_{‖z‖∞ ≤ 1} ⟨v, z⟩
div is the adjoint of ∇: ⟨∇v, z⟩ = ⟨v, −div z⟩
Swap min and max and solve for x

Duality: [Boyd 2004]. This proof: [Michel 2011]
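Chambolle's dual projection can be sketched in a few lines of numpy, following the fixed-point iteration of [Chambolle 2004] with the ½‖y − x‖₂² normalization and step τ = 1/8 (step size and iteration count are illustrative):

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero-gradient boundaries."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, adjoint of -grad."""
    d = np.zeros_like(px)
    d[:-1, :] += px[:-1, :]; d[1:, :] -= px[:-1, :]
    d[:, :-1] += py[:, :-1]; d[:, 1:] -= py[:, :-1]
    return d

def prox_tv(y, lam, n_iter=100, tau=0.125):
    """Chambolle's dual fixed-point iteration for
        argmin_x 0.5 * ||y - x||_2^2 + lam * TV(x)."""
    px = np.zeros_like(y); py = np.zeros_like(y)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - y / lam)
        norm = np.sqrt(gx ** 2 + gy ** 2)
        px = (px + tau * gx) / (1 + tau * norm)   # reprojection keeps
        py = (py + tau * gy) / (1 + tau * norm)   # ||(px, py)|| <= 1
    return y - lam * div(px, py)

rng = np.random.default_rng(0)
clean = np.zeros((16, 16)); clean[:, 8:] = 1.0
noisy = clean + 0.2 * rng.standard_normal((16, 16))

denoised = prox_tv(noisy, lam=0.3)
# The total variation drops while the mean is preserved exactly
# (the discrete divergence sums to zero).
```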

Page 51

Sparsity for compressed tomography reconstruction

@GaelVaroquaux 27

Page 52

Sparsity for compressed tomography reconstruction
Add penalizations with kinks
Choice of prior/sparse representation
Non-smooth optimization (FISTA)

Further discussion: choice of prior/parameters
Minimize reconstruction error from degraded data of gold-standard acquisitions
Cross-validation: leave half of the projections out and minimize the projection error of the reconstruction

Python code available: https://github.com/emmanuelle/tomo-tv

Page 53

Bibliography (1/3)

[Candes 2006] E. Candes, J. Romberg and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, Trans Inf Theory (52) 2006

[Wainwright 2009] M. Wainwright, Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso), Trans Inf Theory (55) 2009

[Mallat, Zhang 1993] S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, Trans Sign Proc (41) 1993

[Pati, et al 1993] Y. Pati, R. Rezaiifar, P. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, 27th Signals, Systems and Computers Conf 1993

Page 54

Bibliography (2/3)

[Chen, Donoho, Saunders 1998] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J Sci Computing (20) 1998

[Tibshirani 1996] R. Tibshirani, Regression shrinkage and selection via the lasso, J Roy Stat Soc B, 1996

[Gribonval 2011] R. Gribonval, Should penalized least squares regression be interpreted as Maximum A Posteriori estimation?, Trans Sig Proc (59) 2011

[Daubechies 2004] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm Pure Appl Math (57) 2004

Page 55

Bibliography (3/3)

[Beck Teboulle 2009] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J Imaging Sciences (2) 2009

[Chambolle 2004] A. Chambolle, An algorithm for total variation minimization and applications, J Math Imag Vision (20) 2004

[Boyd 2004] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press 2004
— Reference on convex optimization and duality

[Michel 2011] V. Michel et al., Total variation regularization for fMRI-based prediction of behaviour, Trans Med Imag (30) 2011
— Proof of TV reformulation: appendix C