Blind Compressed Sensing Using Sparsifying Transforms
Saiprasad Ravishankar and Yoram Bresler
Department of Electrical and Computer Engineering and Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
May 29, 2015
Learning structured overcomplete transforms with block cosparsity (OCTOBOS) [IJCV, 2014]
Applications: sparse representation, image & video denoising, classification, blind compressed sensing (BCS) for imaging.
Square Transform Learning Formulation
$$\mathrm{(P1)}\;\; \min_{W,B}\; \underbrace{\sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2}_{\text{Sparsification Error}} \;+\; \lambda \underbrace{\left( 0.5\,\|W\|_F^2 - \log\left|\det W\right| \right)}_{\text{Regularizer}} \quad \text{s.t. } \|b_j\|_0 \le s \;\; \forall\, j$$
Sparsification error measures the deviation of the data in the transform domain from perfect sparsity.
The regularizer enables complete control over the conditioning & scaling of W.
If ∃ (W, B) such that the condition number κ(W) = 1, WR_j x = b_j, and ‖b_j‖₀ ≤ s ∀ j, then it is globally identifiable by solving (P1).
(P1) favors both a low sparsification error and good conditioning.
The solution to (P1) is unitary as λ → ∞.
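To make the objective concrete, here is a minimal NumPy sketch (our illustration with assumed variable names, not code from the talk) that evaluates the (P1) objective for a given transform W and sparse code matrix B:

```python
import numpy as np

def p1_objective(W, B, patches, lam):
    """Evaluate the (P1) objective: sparsification error plus
    lam * (0.5 * ||W||_F^2 - log|det W|).
    patches: n x N matrix whose columns are the vectorized R_j x.
    Illustrative sketch; names are assumptions, not from the talk."""
    sparsification_error = np.linalg.norm(W @ patches - B, 'fro') ** 2
    _, logabsdet = np.linalg.slogdet(W)   # log|det W| (also valid for complex W)
    regularizer = 0.5 * np.linalg.norm(W, 'fro') ** 2 - logabsdet
    return sparsification_error + lam * regularizer
```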
Transform-based Blind Compressed Sensing (BCS)
$$\mathrm{(P2)}\;\; \min_{x,W,B}\; \underbrace{\sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2}_{\text{Sparsification Error}} + \nu \underbrace{\left\| Ax - y \right\|_2^2}_{\text{Data Fidelity}} + \lambda \underbrace{v(W)}_{\text{Regularizer}} \quad \text{s.t. } \sum_{j=1}^{N} \|b_j\|_0 \le s,\; \|x\|_2 \le C.$$
(P2) learns W ∈ C^{n×n} and reconstructs x from only the undersampled y ⇒ transform adaptive to the underlying image.
v(W) ≜ −log|det W| + 0.5‖W‖_F² controls the scaling and κ of W.
‖x‖₂ ≤ C is an energy/range constraint, with C > 0.
Transform BCS: Identifiability & Uniqueness
Proposition 1
Let x ∈ C^p, and let y = Ax with A ∈ C^{m×p}. Suppose
‖x‖₂ ≤ C,
W ∈ C^{n×n} is a unitary transform,
Σ_{j=1}^N ‖WR_j x‖₀ ≤ s.
Further, let B denote the matrix that has WR_j x as its columns. Then (x, W, B) is a global minimizer of Problem (P2), i.e., it is identifiable by solving (P2).
Given a minimizer (x, W, B) of (P2), (x, ΘW, ΘB) is another equivalent minimizer for all Θ such that ΘᴴΘ = I and Σ_j ‖Θb_j‖₀ ≤ s. The optimal x is invariant to such transformations of (W, B).
When W is constrained to be doubly sparse and unitary, uniqueness can be guaranteed under additional (e.g., spark) conditions.
Alternative Transform BCS Formulations
$$\mathrm{(P3)}\;\; \min_{x,W,B}\; \sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2 + \nu \left\| Ax - y \right\|_2^2 \quad \text{s.t. } W^H W = I,\; \sum_{j=1}^{N} \|b_j\|_0 \le s,\; \|x\|_2 \le C.$$
(P3) is also a unitary synthesis dictionary-based BCS problem, with Wᴴ the synthesis dictionary.
$$\mathrm{(P4)}\;\; \min_{x,W,B}\; \sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2 + \nu \left\| Ax - y \right\|_2^2 + \lambda\, v(W) + \eta^2 \sum_{j=1}^{N} \|b_j\|_0 \quad \text{s.t. } \|x\|_2 \le C.$$
Block Coordinate Descent (BCD) Algorithm for (P2)
(P2) is solved by alternating between updating W, B, and x (see the driver sketch below).
Alternate a few times between the W and B updates before performing an image update.
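A rough sketch of the overall alternation (assumed names; the step functions sparse_code, transform_update, and image_update are sketched on the following slides):

```python
import numpy as np

def bcs_bcd(A, y, R_list, W0, nu, lam, s, n_iter=20, n_inner=1):
    """Block coordinate descent for (P2): a few B/W alternations,
    then an image update, repeated. Minimal illustrative sketch."""
    x = A.conj().T @ y                                    # simple initialization
    W = W0.copy()
    for _ in range(n_iter):
        X = np.stack([Rj @ x for Rj in R_list], axis=1)   # patch matrix, n x N
        for _ in range(n_inner):
            B = sparse_code(W, X, s)                      # sparse coding step (3)
            W = transform_update(X, B, lam)               # transform update step (4)-(5)
        x = image_update(W, B, R_list, A, y, nu)          # image update step (6)-(7)
    return x, W, B
```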
Sparse Coding Step solves (P2) for B with fixed x , W .
$$\min_{B}\; \sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2 \quad \text{s.t. } \sum_{j=1}^{N} \|b_j\|_0 \le s. \qquad (3)$$
Cheap solution: let Z ∈ C^{n×N} be the matrix with WR_j x as its columns. The solution B = H_s(Z) is computed exactly by zeroing out all but the s largest-magnitude coefficients in Z.
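A minimal NumPy sketch of this projection (our illustration; H_s keeps the s largest-magnitude entries across the whole matrix, matching the aggregate sparsity constraint):

```python
import numpy as np

def sparse_code(W, patches, s):
    """Sparse coding step (3): B = H_s(Z) with Z = W @ patches.
    Zeros out all but the s largest-magnitude entries of Z,
    which solves (3) exactly."""
    Z = W @ patches
    B = np.zeros_like(Z)
    flat_Z, flat_B = Z.reshape(-1), B.reshape(-1)
    keep = np.argpartition(np.abs(flat_Z), -s)[-s:]   # indices of s largest magnitudes
    flat_B[keep] = flat_Z[keep]
    return B
```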
BCD Algorithm for (P2)
Transform Update Step solves (P2) for W with fixed x , B.
$$\min_{W}\; \sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2 + 0.5\,\lambda \|W\|_F^2 - \lambda \log\left|\det W\right| \qquad (4)$$
Let X ∈ Cn×N be the matrix with Rjx as its columns.
Closed-form solution:
$$W = 0.5\, R \left( \Sigma + \left( \Sigma^2 + 2\lambda I \right)^{\frac{1}{2}} \right) V^H L^{-1} \qquad (5)$$
where $XX^H + 0.5\lambda I = LL^H$, and $L^{-1}XB^H$ has a full SVD of $V \Sigma R^H$.
The solution is unique if and only if $XB^H$ is non-singular.
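A NumPy sketch transcribing the closed form (5) (assumed names; an illustrative implementation, not the authors' code):

```python
import numpy as np

def transform_update(X, B, lam):
    """Transform update (4) via the closed form (5):
    factor X X^H + 0.5*lam*I = L L^H, take the full SVD
    L^{-1} X B^H = V Sigma R^H, and set
    W = 0.5 * R (Sigma + (Sigma^2 + 2*lam*I)^{1/2}) V^H L^{-1}."""
    n = X.shape[0]
    L = np.linalg.cholesky(X @ X.conj().T + 0.5 * lam * np.eye(n))
    M = np.linalg.solve(L, X @ B.conj().T)       # M = L^{-1} X B^H
    V, sig, RH = np.linalg.svd(M)                # M = V diag(sig) R^H
    R = RH.conj().T
    d = sig + np.sqrt(sig ** 2 + 2.0 * lam)      # Sigma + (Sigma^2 + 2*lam*I)^{1/2}
    return 0.5 * (R * d) @ V.conj().T @ np.linalg.inv(L)
```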
BCD Algorithm for (P2)
Image Update Step solves (P2) for x with fixed W , B.
$$\min_{x}\; \sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2 + \nu \left\| Ax - y \right\|_2^2 \quad \text{s.t. } \|x\|_2 \le C. \qquad (6)$$
Least squares problem with an ℓ₂ norm constraint.
The solution is unique as long as the set of overlapping patches covers all image pixels.
Solve the least squares Lagrangian formulation:
$$\min_{x}\; \sum_{j=1}^{N} \left\| W R_j x - b_j \right\|_2^2 + \nu \left\| Ax - y \right\|_2^2 + \mu \left( \|x\|_2^2 - C \right) \qquad (7)$$
The optimal multiplier μ ∈ R₊ is the smallest real such that ‖x‖₂ ≤ C; μ and x can be found cheaply.
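For small problems, the Lagrangian normal equations can be solved directly, as in this dense sketch (assumed names; in practice the structure of A, e.g., Fourier undersampling in MRI, and the patch overlap structure make this step much cheaper):

```python
import numpy as np

def image_update(W, B, R_list, A, y, nu, mu=0.0):
    """Image update via the normal equations of the Lagrangian (7):
    (sum_j R_j^H W^H W R_j + nu A^H A + mu I) x
        = sum_j R_j^H W^H b_j + nu A^H y.
    mu = 0 when ||x||_2 <= C is inactive; otherwise increase mu
    until the solution satisfies the constraint."""
    p = A.shape[1]
    lhs = nu * (A.conj().T @ A) + mu * np.eye(p)
    rhs = nu * (A.conj().T @ y)
    WHW = W.conj().T @ W
    for j, Rj in enumerate(R_list):              # R_j: n x p patch-extraction matrix
        lhs = lhs + Rj.conj().T @ WHW @ Rj
        rhs = rhs + Rj.conj().T @ (W.conj().T @ B[:, j])
    return np.linalg.solve(lhs, rhs)
```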
BCS Convergence Guarantees - Notation
Define the barrier function ψ_s(B) as
$$\psi_s(B) = \begin{cases} 0, & \sum_{j=1}^{N} \|b_j\|_0 \le s \\ +\infty, & \text{else} \end{cases}$$
χ_C(x) is the barrier function corresponding to ‖x‖₂ ≤ C.
(P2) is equivalent to minimizing the unconstrained objective g(W, B, x) = Σ_{j=1}^N ‖WR_j x − b_j‖₂² + ν‖Ax − y‖₂² + λv(W) + ψ_s(B) + χ_C(x).
For H ∈ C^{p×q}, ρ_j(H) is the magnitude of the j-th largest element (magnitude-wise) of H.
X ∈ C^{n×N} denotes the matrix with R_j x, 1 ≤ j ≤ N, as its columns.
Transform BCS Convergence Guarantees
Theorem 1
For the sequence {W^t, B^t, x^t} generated by the BCD algorithm with initial (W⁰, B⁰, x⁰), we have:
{g(W^t, B^t, x^t)} → g* = g*(W⁰, B⁰, x⁰).
{W^t, B^t, x^t} is bounded, and all its accumulation points are equivalent, i.e., they achieve the same value g* of the objective.
‖x^t − x^{t−1}‖₂ → 0 as t → ∞.
Every accumulation point (W, B, x) is a critical point of g satisfying the following partial global optimality conditions:
$$x \in \arg\min_{\tilde{x}}\; g\left(W, B, \tilde{x}\right) \qquad (8)$$
$$W \in \arg\min_{\tilde{W}}\; g\left(\tilde{W}, B, x\right), \quad B \in \arg\min_{\tilde{B}}\; g\left(W, \tilde{B}, x\right) \qquad (9)$$
Transform BCS Convergence Guarantees
Theorem 2
Each accumulation point (W ,B, x) of {W t ,B t , x t} also satisfies thefollowing partial local optimality conditions
g(W +∆W ,B +∆B, x) ≥g(W ,B, x) = g∗ (10)
g(W ,B +∆B, x +∆x) ≥g(W ,B, x) = g∗ (11)
The conditions each hold for all ∆x ∈ Cp, and all ∆W ∈ Cn×n satisfying‖∆W ‖F ≤ ǫ for some ǫ = ǫ(W ) > 0, and all ∆B ∈ Cn×N in R1 ∪ R2
R1. The half-space Re(tr{(WX − B)∆BH
})≤ 0.
R2. The local region defined by ‖∆B‖∞ < ρs(WX ).
Furthermore, if ‖WX‖0 ≤ s, then ∆B can be arbitrary.
Global Convergence Guarantees
Proposition 2
For each initialization, the iterate sequence of the BCD algorithm converges to an equivalence class (same objective value) of critical points of the objective that are also partial global/local minimizers.
Proposition 3
The BCD algorithm is globally convergent to a subset of the set of critical points of the objective. The subset includes all (W, B, x) that are at least partial global and partial local minimizers.
Computational Advantages of Transform BCS
Cost per iteration of transform BCS: O(p⁴NL)
N overlapping patches of size p × p, W ∈ C^{n×n}, n ≜ p².
L: number of inner alternations between the transform update & sparse coding.
Cost per iteration of the synthesis BCS method DLMRI³: O(p⁶NJ)
D ∈ C^{n×K}, n ≜ p², K ∝ n, sparsity s ∝ n.
J: number of inner iterations of dictionary learning using K-SVD⁴.
In practice, transform BCS converges quickly and is much cheaper for large p.
In 3D or 4D imaging, n = p³ or p⁴, and the gain in computation is about a factor of n in order.
³ [Ravishankar & Bresler ’11]   ⁴ [Aharon et al. ’06]
TLMRI Convergence - 4x Undersampling (s = 3.4%)
[Figures: reference image and sampling mask; objective function vs. iteration number t; ‖x^t − x^{t−1}‖₂ vs. t]
Convergence & Learning - 4x Undersampling (s = 3.4%)
[Figures: zero-filling reconstruction (28.94 dB) and its error map; TLMRI reconstruction (32.66 dB); real (top) and imaginary (bottom) parts of the learnt 36 × 36 W]
TLMRI is up to 5.5 dB better than LDP⁵, which uses wavelets + TV.
TLMRI provides up to 1 dB improvement in PSNR over the PBDWS method⁶, which uses redundant wavelets and trained patch-based geometric directions, and is up to 1.6 dB better than the non-local PANO method⁸.
It is up to 0.35 dB better than DLMRI⁷, which learns a 4x overcomplete dictionary.
TLMRI is 10x faster than DLMRI, and 4x faster than the PBDWS method.
TLMRI provides the best reconstructions, and is the fastest.
⁵ [Lustig et al. ’07]   ⁶ [Ning et al. ’13]   ⁷ [Ravishankar & Bresler ’11]   ⁸ [Qu et al. ’14]
Example - 2D Random 5x Undersampling
[Figures: reference, DLMRI reconstruction (28.54 dB), TLMRI reconstruction (30.47 dB); sampling mask, DLMRI error map, TLMRI error map]
Conclusions
We introduced a transform-based BCS framework.
The proposed BCS algorithms have a low computational cost.
We provided novel convergence guarantees for the algorithms that do not require any restrictive assumptions.
For CS MRI, the proposed approach is better than leading image reconstruction methods, while being much faster.
Future work: convergence of the algorithm to a global minimizer & convergence rate.
Thank you! Questions??
Convergence Guarantees - Definitions
Definition 1
Let φ : R^q → (−∞, +∞] be a proper function and let z ∈ dom φ. The Fréchet sub-differential of the function φ at z is the following set:
$$\hat{\partial}\varphi(z) \triangleq \left\{ h \in \mathbb{R}^q : \liminf_{b \to z,\, b \neq z} \frac{1}{\|b - z\|} \left( \varphi(b) - \varphi(z) - \langle b - z, h \rangle \right) \ge 0 \right\} \qquad (12)$$
If z ∉ dom φ, then $\hat{\partial}\varphi(z) = \emptyset$. The sub-differential of φ at z is defined as
$$\partial\varphi(z) \triangleq \left\{ h \in \mathbb{R}^q : \exists\, z^k \to z,\ \varphi(z^k) \to \varphi(z),\ h^k \in \hat{\partial}\varphi(z^k) \to h \right\}. \qquad (13)$$
Lemma 1
A necessary condition for z ∈ R^q to be a minimizer of the function φ : R^q → (−∞, +∞] is that z is a critical point of φ, i.e., 0 ∈ ∂φ(z). If φ is a convex function, this condition is also sufficient.