Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent

Qing Qu*, Xiao Li†, Zhihui Zhu⋄
* Center for Data Science, New York University; † EE Department, Chinese University of Hong Kong; ⋄ MINDS, Johns Hopkins University

Basic Task

Given multiple observations $y_i \in \mathbb{R}^n$ of the circulant convolution $y_i = a \circledast x_i$ ($1 \le i \le p$), can we recover both the unknown kernel $a \in \mathbb{R}^n$ and the sparse signals $\{x_i\}_{i=1}^p \subset \mathbb{R}^n$ simultaneously?

Our Contribution

With random initialization, vanilla Riemannian gradient descent (RGD) followed by a subgradient method converges exactly to the target solution at a linear rate.

Motivations in Imaging Science

• Computational microscopy imaging.
• Geophysics and seismic imaging.
• Neuroscience: calcium imaging, functional MRI.
• Image deblurring.

Symmetric Solutions in MCS-BD

• Scaled shifts of $(a, x_i)$ are also solutions to MCS-BD:
$$ y_i \;=\; \alpha\, s_\ell[a] \,\circledast\, (1/\alpha)\, s_{-\ell}[x_i]. $$
[Figure: illustration of the signed shift-scaling symmetry.]
  - W.l.o.g., we fix the scaling $\|a\| = 1$.
  - We hope to recover $a$ up to signed shifts $\pm\{s_\ell[a_0]\}_{\ell=-n+1}^{n-1}$.

Assumptions & Problem Formulation

• Assumptions.
  - Sparse signals $x_i$: entries i.i.d. Bernoulli–Gaussian($\theta$), $\theta \in (0,1)$;
  - Invertible kernel $a$: its circulant matrix $C_a = F^* \operatorname{diag}(\hat{a})\, F$ is invertible, i.e., $|\hat{a}| > 0$ entrywise.
• Problem Formulation. Denote $Y = [\, y_1\ y_2\ \cdots\ y_p \,]$ and $X = [\, x_1\ x_2\ \cdots\ x_p \,]$.
  - Let $h$ be the inverse kernel of $a$, i.e., $\hat{h} = \hat{a}^{\odot -1}$ (so that $C_h C_a = I$); then
$$ C_h Y \;=\; \underbrace{C_h C_a}_{=\,I}\, X \;=\; \underbrace{X}_{\text{sparse}}. $$
  - Ideally, we would like to solve
$$ \min_q\ \underbrace{\tfrac{1}{np}\, \|C_q Y\|_0}_{\text{sparsity}} \;=\; \tfrac{1}{np} \sum_{i=1}^{p} \|C_{y_i} q\|_0, \quad \text{s.t. } \underbrace{q \neq 0}_{\text{prevents trivial solution}}, $$
    which recovers $\hat{a} = s_\ell\big[\alpha\, \hat{q}_\star^{\odot -1}\big]$ up to the shift-scaling symmetry.
• Nonconvex Relaxation. We consider
$$ \min_q\ \varphi(q) := \underbrace{\tfrac{1}{np} \sum_{i=1}^{p} H_\mu\!\big(C_{y_i} P q\big)}_{\text{smooth sparsity surrogate}}, \quad \text{s.t. } \underbrace{q \in \mathbb{S}^{n-1}}_{\text{sphere constraint}}. $$
  - $H_\mu(\cdot)$ is the smooth Huber loss promoting sparsity:
$$ H_\mu(Z) := \sum_{i=1}^{n} \sum_{j=1}^{p} h_\mu(Z_{ij}), \qquad h_\mu(z) := \begin{cases} |z|, & |z| \ge \mu, \\ \frac{z^2}{2\mu} + \frac{\mu}{2}, & |z| < \mu. \end{cases} $$
  - $P$ is a preconditioning matrix:
$$ P \;=\; \Big( \tfrac{1}{\theta np} \sum_{i=1}^{p} C_{y_i}^\top C_{y_i} \Big)^{-1/2} \;\approx\; \big( C_a^\top C_a \big)^{-1/2}. $$
  - Preconditioning orthogonalizes the kernel $C_a$:
$$ C_{y_i} P \;=\; C_{x_i} \underbrace{C_a P}_{R} \;\approx\; C_{x_i} \underbrace{C_a \big(C_a^\top C_a\big)^{-1/2}}_{\text{orthogonal } Q}. $$
Given $C_{y_i} P q \approx C_{x_i} Q q$ and supposing $Q = I$, the problem reduces to
$$ \min_q\ f(q) := \tfrac{1}{np} \sum_{i=1}^{p} H_\mu(C_{x_i} q), \quad \text{s.t. } q \in \mathbb{S}^{n-1}, $$
which implies that the standard basis vectors $\{\pm e_i\}_{i=1}^n$ are global solutions.

[Figure: optimization landscapes of (a) $\ell_1$-loss ✗, (b) Huber-loss ✗, (c) $\ell_4$-loss ✗; (d) $\ell_1$-loss ✓, (e) Huber-loss ✓, (f) $\ell_4$-loss ✓.]

Geometric Property

We study the optimization landscape over the union of the sets
$$ \mathcal{S}_\xi^{i\pm} := \Big\{ q \in \mathbb{S}^{n-1} \;\Big|\; \tfrac{|q_i|}{\|q_{-i}\|_\infty} \ge \sqrt{1+\xi},\ q_i \gtrless 0 \Big\} $$
for some $\xi \in (0, +\infty)$, where each set
  - contains exactly one solution $\pm e_i$;
  - excludes all saddle points;
  - for some small $\xi = \tfrac{1}{5 \log n}$, a random initialization falls in one $\mathcal{S}_\xi^{i\pm}$ with probability $\ge 1/2$.

[Figure: regions around $\pm e_1, \pm e_2, \pm e_3$ on the sphere, shown for $\xi = 0$ and $\xi = \tfrac{1}{5\log n}$.]

• Regularity Condition. With $p \ge \Omega(\mathrm{poly}(n))$, w.h.p.,
$$ \big\langle \operatorname{grad} f(q),\ q_i q - e_i \big\rangle \;\ge\; \alpha(q) \cdot \|q - e_i\| $$
holds for each $\mathcal{S}_\xi^{i+}$ ($1 \le i \le n$) with $\alpha > 0$, for all $q \in \mathcal{S}_\xi^{i+} \cap \big\{ q \in \mathbb{S}^{n-1} \;\big|\; \sqrt{1 - q_i^2} \ge \mu \big\}$.

• Implicit Regularization. With $p \ge \Omega(\mathrm{poly}(n))$, w.h.p.,
$$ \Big\langle \operatorname{grad} f(q),\ \tfrac{1}{q_j} e_j - \tfrac{1}{q_i} e_i \Big\rangle \;\ge\; \frac{c\, \theta (1-\theta)}{n} \cdot \frac{\xi}{1+\xi} $$
for all $q \in \mathcal{S}_\xi^{i+}$ and any $j \neq i$ with $q_j^2 \ge \tfrac{1}{3} q_i^2$.

From Geometry to Optimization

• Random Initialization. Draw $q^{(0)} \sim \mathcal{U}(\mathbb{S}^{n-1})$, so that
$$ \mathbb{P}\Big( q^{(0)} \in \bigcup_{i=1}^{n} \mathcal{S}_\xi^{i\pm} \Big) \;\ge\; 1/2. $$
• Phase I: Riemannian Gradient Descent (RGD). Iterate
$$ q^{(k+1)} = \mathcal{P}_{\mathbb{S}^{n-1}}\big( q^{(k)} - \tau \cdot \operatorname{grad} f(q^{(k)}) \big) $$
with a small fixed step size $\tau$; the iterates stay in $\mathcal{S}_\xi^{i\pm}$ thanks to the implicit regularization. RGD produces a solution $q_\star$ with $\|q_\star - q_{\mathrm{tgt}}\| \le O(\mu)$ at a linear rate, thanks to the regularity condition.
• Phase II: Rounding. With $r = q_\star$, solve
$$ \min_q\ \zeta(q) := \tfrac{1}{np} \sum_{i=1}^{p} \big\| C_{y_i} P q \big\|_1, \quad \text{s.t. } \langle r, q \rangle = 1, $$
via projected subgradient descent,
$$ q^{(k+1)} = q^{(k)} - \tau^{(k)} \cdot \mathcal{P}_{r^\perp}\, g^{(k)}, \qquad \tau^{(k+1)} = \beta \tau^{(k)},\ \beta \in (0,1), $$
which converges linearly, $\|q^{(k)} - q_{\mathrm{tgt}}\| \le \eta^k$ with $\eta \in (0,1)$, thanks to the local sharpness of $\zeta(q)$.
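The data model, Huber surrogate, preconditioning, and Phase I RGD described above can be sketched in NumPy. This is a minimal illustration on synthetic Bernoulli–Gaussian data, not the authors' implementation: the function names (`circ_conv`, `huber`, `phi`, `rgd_phase1`), step size, and iteration count are illustrative choices, the preconditioner is applied in the Fourier domain (all matrices involved are circulant), and Phase II rounding is omitted.

```python
import numpy as np

def circ_conv(u, v):
    # circulant convolution u ⊛ v via the FFT
    return np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(v)))

def huber(z, mu):
    # smooth Huber function h_mu, applied entrywise
    return np.where(np.abs(z) >= mu, np.abs(z), z**2 / (2 * mu) + mu / 2)

def phi(q, Y, theta, mu=1e-2):
    # objective phi(q) = (1/(n p)) * sum_i H_mu(C_{y_i} P q)
    n, p = Y.shape
    Yf = np.fft.fft(Y, axis=0)
    Pf = (np.sum(np.abs(Yf) ** 2, axis=1) / (theta * n * p)) ** (-0.5)
    Z = np.real(np.fft.ifft(Yf * (Pf * np.fft.fft(q))[:, None], axis=0))
    return np.mean(huber(Z, mu))

def rgd_phase1(Y, theta, mu=1e-2, tau=0.2, iters=500, seed=0):
    """Phase I: Riemannian gradient descent for phi(q) over the sphere."""
    n, p = Y.shape
    Yf = np.fft.fft(Y, axis=0)  # \hat{y}_i, one column per channel
    # P = ((1/(theta n p)) sum_i C_{y_i}^T C_{y_i})^{-1/2} is circulant,
    # so it acts diagonally in the Fourier domain:
    Pf = (np.sum(np.abs(Yf) ** 2, axis=1) / (theta * n * p)) ** (-0.5)

    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)  # random initialization on S^{n-1}
    for _ in range(iters):
        # Z holds C_{y_i} P q for all channels i (columns)
        Z = np.real(np.fft.ifft(Yf * (Pf * np.fft.fft(q))[:, None], axis=0))
        dh = np.where(np.abs(Z) >= mu, np.sign(Z), Z / mu)  # h_mu'(Z)
        # Euclidean gradient: (1/(n p)) sum_i P^T C_{y_i}^T h_mu'(C_{y_i} P q)
        g = np.real(np.fft.ifft(
            Pf * np.sum(np.conj(Yf) * np.fft.fft(dh, axis=0), axis=1))) / (n * p)
        rgrad = g - np.dot(g, q) * q   # project onto the tangent space at q
        q = q - tau * rgrad            # Riemannian gradient step
        q /= np.linalg.norm(q)         # retract back onto the sphere
    return q
```

On synthetic data (`Y` built column-wise as `circ_conv(a, X[:, i])` with `X` Bernoulli–Gaussian), `rgd_phase1` returns a unit-norm `q` whose preconditioned correlations `C_{y_i} P q` are driven toward sparse vectors; the Fourier-domain application of `P` and `C_{y_i}^T` keeps each iteration at `O(np log n)` cost.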
Comparison with Literature

[Table: comparison with existing methods.]

Experiments

• Algorithmic convergence and recovery with varying $\theta$.
[Figure: (a) comparison of iterate convergence; (b) recovery probability with varying $\theta$.]
• Phase transition on $(p, n)$.
[Figure: phase transitions for (a) $\ell_1$-loss, (b) Huber-loss, (c) $\ell_4$-loss.]
• Experiments on STORM imaging.
[Figure: (a) observation, (b) ground truth, (c) Huber-loss, (d) $\ell_4$-loss; kernels: (e) ground truth, (f) Huber-loss, (g) $\ell_4$-loss.]

References

[1] Q. Qu, X. Li, and Z. Zhu, "A nonconvex approach for exact and efficient multichannel sparse blind deconvolution", NeurIPS, 2019.
[2] Y. Li and Y. Bresler, "Multichannel sparse blind deconvolution on the sphere", NeurIPS, 2018.
[3] L. Wang and Y. Chi, "Blind deconvolution from multiple sparse inputs", IEEE SPL, 2016.