Sparse Solutions of Underdetermined Linear Equations by Linear Programming
David Donoho & Jared Tanner
Stanford University, Department of Statistics; University of Utah, Department of Mathematics
Arizona State University, March 6th 2006
Underdetermined systems, dictionary perspective

Ax = b,  A ∈ R^{d×n},  d < n

- Least squares solution via the "canonical dual", x = A^T (A A^T)^{-1} b (illustrated in the sketch after this slide)
  • Linear reconstruction, not signal adaptive
  • Solution vector is full: n nonzero elements in x
- Eschew redundancy; find a simple model of the data from A
- Seek the sparsest solution, ‖x‖_{ℓ0} := # nonzero elements:
  min ‖x‖_{ℓ0} subject to Ax = b
- Combinatorial cost for the naive approach
- Efficient nonlinear (signal-adaptive) methods
  • Greedy (local) and Basis Pursuit (global)
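To make the contrast concrete, here is a minimal numeric sketch (my own illustration, not from the slides, assuming numpy): the canonical-dual / minimum-ℓ2-norm solution solves the system exactly but is fully dense, whereas the generating vector is k-sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 20, 50, 3                      # d < n, k-sparse ground truth
A = rng.standard_normal((d, n))

x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

# Canonical dual / minimum-l2-norm solution: x = A^T (A A^T)^{-1} b
x_ls = A.T @ np.linalg.solve(A @ A.T, b)

print(np.linalg.norm(A @ x_ls - b))                      # ~0: the system is solved exactly
print(np.count_nonzero(np.abs(x_ls) > 1e-10), "of", n)   # dense: essentially all n entries nonzero
print(np.count_nonzero(x_true), "of", n)                 # the generating vector: only k nonzeros
```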
Greedy [Temlyakov, DeVore, Tropp, ...]

- Orthogonal Matching Pursuit: initialize r = b, S = ∅; while r ≠ 0:
  • select j = argmax_j |a_j^T r|  (the ℓ∞ step: largest entry of |A^T r|)
  • S ← S ∪ {j}, A_S := columns of A indexed by S
  • r ← b − A_S (A_S^T A_S)^{-1} A_S^T b
- Nonlinear selection of basis, x_S = (A_S^T A_S)^{-1} A_S^T b; ‖x‖_{ℓ0} ≤ d (a runnable sketch follows this slide)
- Highly redundant dictionaries often give fast decay of the residual
- Recovery of the sparsest solution?
  • examples of arbitrary sub-optimality for a fixed dictionary A [Temlyakov, DeVore, S. Chen, Tropp, ...]
  • residual nonzero for fewer than d steps, regardless of sparsity [Chen]
- Recovers the sparsest solution if it is sufficiently sparse, O(√d) [Tropp]
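The OMP loop above fits in a short sketch (a minimal illustration assuming numpy; not the speakers' implementation):

```python
import numpy as np

def omp(A, b, tol=1e-10):
    """Orthogonal Matching Pursuit sketch: greedy l_inf selection + least-squares refit."""
    d, n = A.shape
    r = b.astype(float)
    support, coef = [], np.zeros(0)
    for _ in range(d):                                   # at most d selections
        if np.linalg.norm(r) <= tol:
            break
        j = int(np.argmax(np.abs(A.T @ r)))              # most correlated column (max of |A^T r|)
        support.append(j)
        As = A[:, support]
        coef, *_ = np.linalg.lstsq(As, b, rcond=None)    # x_S = (A_S^T A_S)^{-1} A_S^T b
        r = b - As @ coef                                # residual orthogonal to span(A_S)
    x = np.zeros(n)
    x[support] = coef
    return x
```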
Basis Pursuit

- Rather than solve ℓ0 (combinatorial), solve ℓ1 (use LP):
  min ‖x‖_{ℓ1} subject to Ax = b
  • Global basis selection rather than greedy local selection (LP sketch after this slide)
- Example: A = [A1 A2], two orthonormal bases, with coherence µ := max_{i≠j} |⟨a_i, a_j⟩|
  • If ‖x‖_{ℓ0} < .914 (1 + µ^{-1}) then ℓ1 → ℓ0 [Elad, Bruckstein]
  • Coherence µ ≥ 1/√d [Candes, Romberg], so this guarantees only very sparse, O(√d), signals
- Is the story over? Can the O(√d) threshold be overcome? Yes!
- Examples of success: partial Fourier and Laplace, ‖x‖_{ℓ0} ≤ ⌊d/2⌋
- More to come for typical (random) matrices
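Basis Pursuit reduces to a linear program by splitting x into positive and negative parts. A minimal sketch of that reduction (assuming numpy and scipy's linprog; the name basis_pursuit is mine):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1 s.t. Ax = b, recast as the LP: min 1'(u+v) s.t. A(u-v) = b, u, v >= 0."""
    d, n = A.shape
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]
```

The earlier OMP sketch and this LP can be run on the same (A, b) pair to compare the recovered supports.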
Sparsity threshold and the sampling matrix, A

Deterministic:
- Worst-case (coherence) guarantees scale like O(√d), that is, success only for highly sparse signals
- Some special cases of success: partial Fourier and Laplace, ‖x‖_{ℓ0} ≤ ⌊d/2⌋ [Donoho, T]

Random: beyond the O(√d) threshold for most A
- Recent order bounds for random ortho-projectors:
  • ℓ1 → ℓ0 if ‖x‖_{ℓ0} ≲ O(d / log(n/d)) [Candes, Tao, Romberg; Vershynin, Rudelson]
  • OMP → ℓ0 if ‖x‖_{ℓ0} ≲ O(d / log n) [Tropp]
- What is the precise ℓ1 sparsity threshold for random matrices?
- Computing random inner products, "correlation with noise"

Why solve this problem? Are there applications?
Motivation for systems with random A

Compressed Sensing [Donoho; Candes, Tao]:
- Transform Φ in which the signal has sparse coefficients x, i.e. the signal is Φx. Can the signal be recovered from few measurements?
- Yes, nonadaptive measurements recover the sparse coefficients: sample the signal with AΦ, where A is random d × n, d < n, and solve
  min ‖x‖_{ℓ1} subject to the measurements AΦx = b
  (a toy sketch follows this slide)
- Coming to a digital camera near you [Baraniuk]

Phase transition as a function of measurements (aspect ratio):
- Fix the aspect ratio δ = d/n ∈ (0, 1), where A ∈ R^{d×n}; sparsity threshold ‖x‖_{ℓ0} ≤ ρ(δ)d, with ρ(δ) ∈ (0, 1)
- Phase transition as n → ∞: with overwhelming probability, ℓ1 → ℓ0
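A toy compressed-sensing run under illustrative assumptions of mine (orthonormal DCT for Φ, Gaussian A, and the basis_pursuit sketch defined earlier as the ℓ1 solver):

```python
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(1)
n, d, k = 128, 40, 5
Phi = idct(np.eye(n), axis=0, norm="ortho")      # columns form an orthonormal DCT basis
coef = np.zeros(n)
coef[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
signal = Phi @ coef                               # the signal is k-sparse in Phi

A = rng.standard_normal((d, n)) / np.sqrt(d)      # nonadaptive random measurements, d << n
b = A @ signal

coef_hat = basis_pursuit(A @ Phi, b)              # min ||coef||_1 s.t. (A Phi) coef = b
print(np.linalg.norm(Phi @ coef_hat - signal))    # typically ~0 at these sizes
```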
Neighborliness and constrained ℓ1 minimization

Theorem. Let A be a d × n matrix, d < n. The following two properties of A are equivalent:
- The polytope AT^{n−1} has n vertices and is outwardly k-neighborly;
- Whenever y = Ax has a nonnegative solution x0 having at most k nonzeros, x0 is the unique nonnegative solution to y = Ax, and so the unique solution to the constrained ℓ1 minimization problem (sketched as an LP after this slide).

Lemma (Neighborliness and face numbers). Suppose the polytope P = AT^{n−1} has n vertices and is outwardly k-neighborly. Then
  ∀ ℓ = 0, …, k − 1,  ∀ F ∈ F_ℓ(T^{n−1}):  AF ∈ F_ℓ(AT^{n−1}).
Conversely, suppose that the above holds; then P = AT^{n−1} has n vertices and is outwardly k-neighborly.
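For the non-negative case in the theorem, the constrained ℓ1 problem is itself a plain LP, since ‖x‖_{ℓ1} = 1ᵀx when x ≥ 0. A minimal sketch (assuming scipy; nonneg_l1 is an illustrative name of mine):

```python
import numpy as np
from scipy.optimize import linprog

def nonneg_l1(A, y):
    """Constrained l1 for nonnegative solutions: since ||x||_1 = 1'x when x >= 0,
    solve the LP  min 1'x  s.t.  Ax = y, x >= 0."""
    n = A.shape[1]
    res = linprog(c=np.ones(n), A_eq=A, b_eq=y, bounds=(0, None), method="highs")
    return res.x
```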
Strong threshold, random A and all x0

Expected number of faces under a random ortho-projector [Affentranger, Schneider]:
  E f_k(AT^{n−1}) = f_k(T^{n−1}) − 2 ∑_{s≥0} ∑_{G ∈ F_{d+1+2s}(T^{n−1})} ∑_{F ∈ F_k(T^{n−1}), F ⊂ G} β(F, G) γ(G, T^{n−1}),
where β and γ are the internal and external angles, respectively.

Theorem (Strong threshold). Let ρ < ρ_N(δ) and let A = A_{d,n} be a uniformly-distributed random projection from R^n to R^d, with d ≥ δn. Then
  Prob{ f_ℓ(AT^{n−1}) = f_ℓ(T^{n−1}),  ℓ = 0, …, ⌊ρd⌋ } → 1,  as n → ∞.
⇒ P is k-neighborly for k = ⌊(ρ_N(δ) − ε)d⌋
⇒ With overwhelming probability on A (i.e. f_ℓ(T^{n−1}) − E f_ℓ(AT^{n−1}) ≤ π_n e^{−εn}), for every x0 with ‖x0‖_{ℓ0} ≤ ⌊(ρ_N(δ) − ε)d⌋, y = Ax0 generates an instance of the constrained ℓ1 minimization problem with x0 as its unique solution.
Phase Transition, Strong (non-negative)

ℓ1 → ℓ0 if ‖x0‖_{ℓ0} ≤ ⌊(ρ_N(δ) − ε)d⌋ and x ≥ 0

[Figure: strong threshold curve ρ_N(δ) versus δ, both axes running from 0 to 1.]

- As δ → 0, ρ_N(δ) ~ [2e log(1/δ)]^{−1}
Weak threshold, random A and most x0

Theorem (Vershik–Sporyshev). Let d = d(n) ~ δn and let A = A_{d,n} be a uniform random projection from R^n to R^d. Then for a sequence k = k(n) with k/d ~ ρ, ρ < ρ_VS(δ), we have
  f_k(AT^{n−1}) = f_k(T^{n−1})(1 + o_P(1)).

Theorem. Let A be a d × n matrix, d < n, in general position. For 1 ≤ k ≤ d − 1, these two properties of A are equivalent:
- The polytope P = AT^{n−1} has at least (1 − ε) times as many zero-free (k − 1)-faces as T^{n−1};
- Among all problem instances (y, A) generated by some nonnegative vector x0 with at most k nonzeros, constrained ℓ1 minimization recovers the sparsest solution, except in a fraction ≤ ε of instances.
Phase Transition, Weak (non-negative)

ℓ1 → ℓ0 if ‖x‖_{ℓ0} ≤ ⌊(ρ_VS(δ) − ε)d⌋ and x ≥ 0

[Figure: weak threshold curve ρ_VS(δ) versus δ, both axes running from 0 to 1.]

- Asymptotic limit of empirical tests (example shown later)
- As δ → 0, ρ_VS(δ) ~ [2 log(1/δ)]^{−1}
- Typically an e-times less strict sparsity requirement as δ → 0
Phase Transitions, ℓ1 → ℓ0 if ‖x‖_{ℓ0} < ρ(δ)d

Two modalities from the random sampling perspective:
- Weak threshold: random signal and measurement, drawn independently
- Strong threshold: worst signal for a given measurement

[Figure: ρ(δ) versus δ; solid curves for a non-negative signal [Donoho, T], dashed for a signed signal [Donoho].]
Some precise numbers and implications

         δ = .1     δ = .25    δ = .5     δ = .75    δ = .9
ρ+_N     .060131    .087206    .133457    .198965    .266558
ρ+_W     .240841    .364970    .558121    .765796    .902596
ρ±_N     .048802    .065440    .089416    .117096    .140416
ρ±_W     .188327    .266437    .384803    .532781    .677258

- For most A, measuring 1/10 of a non-negative signal (δ = .1): recover every signal if it is 6% sparse, and most signals if 24% sparse.
- Half 'under-sampling', i.e. δ = 1/2: apply ℓ1; if the non-negative solution is less than 55% sparse, it is typically the sparsest solution.
- Encode (1 − δ)n bits of information in a signal of length n. Can recover with fewer than δρ±_W(δ)n accidental errors, or δρ±_N(δ)n malicious errors.
  • At twofold redundancy, tolerate 19% random errors, 4.4% malicious errors (checked below).
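As a check on how the last bullet's percentages follow from the table at δ = 1/2 (my arithmetic, not from the slides):

```latex
\[
\delta\,\rho^{\pm}_{W}(\tfrac12) = 0.5 \times 0.384803 \approx 0.192 \;(\approx 19\%\ \text{random errors}),
\qquad
\delta\,\rho^{\pm}_{N}(\tfrac12) = 0.5 \times 0.089416 \approx 0.045 \;(\approx 4.4\%\ \text{malicious errors}).
\]
```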
Empirical verification of weak transitions

[Figure: empirical recovery frequency over the (δ, ρ) plane; left panel: non-negative signal, right panel: signed signal.]

n = 200, 40 × 40 mesh with 60 random tests per node. (A small-scale sketch of this experiment follows.)
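A small-scale sketch of one grid point of this experiment, for the non-negative panel (my illustration assuming numpy/scipy, not the speakers' code; the slides used n = 200 with a 40 × 40 mesh and 60 tests per node):

```python
import numpy as np
from scipy.optimize import linprog

def success_fraction(n=200, delta=0.5, rho=0.3, trials=20, seed=0):
    """Fraction of trials in which nonnegative l1 (an LP) recovers a random
    k-sparse nonnegative x0, at one (delta, rho) grid point."""
    rng = np.random.default_rng(seed)
    d = int(round(delta * n))
    k = max(1, int(round(rho * d)))
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((d, n))
        x0 = np.zeros(n)
        x0[rng.choice(n, k, replace=False)] = rng.random(k)    # nonnegative k-sparse signal
        res = linprog(c=np.ones(n), A_eq=A, b_eq=A @ x0,
                      bounds=(0, None), method="highs")          # min 1'x s.t. Ax = b, x >= 0
        hits += bool(res.success and np.linalg.norm(res.x - x0, np.inf) < 1e-4)
    return hits / trials        # sweep (delta, rho) over a grid to trace the transition
```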
Ingredients of the proofs, non-negative

Proof (main ideas):
- Given x0 ≥ 0 with ‖x0‖_{ℓ1} = 1 and ‖x0‖_{ℓ0} = k
- x0 lies on a (k − 1)-face (say F) of the unit simplex T^{n−1}
- b = Ax0 is either on the boundary of AT^{n−1} (the image AF is a face of AT^{n−1}) or inside AT^{n−1}
- If on the boundary, then x0 is unique among x with ‖x‖_{ℓ1} ≤ 1 and Ax = b, so ℓ1 → ℓ0
- If b is in the interior of P = AT^{n−1}, then there exists x with ‖x‖_{ℓ1} < 1 and Ax = b
- If the faces of T^{n−1} with j ≤ k vertices remain faces of P, then ℓ1 → ℓ0 for ‖x‖_{ℓ0} ≤ k

ρ_N (strong): Prob( f_ℓ(AT^{n−1}) = f_ℓ(T^{n−1}), ℓ = 0, …, ⌊ρ_N d⌋ ) → 1;
that is, f_ℓ(T^{n−1}) − E f_ℓ(AT^{n−1}) ≤ π_n e^{−εn}

ρ_W (weak): E f_ℓ(AT^{n−1}) ≥ (1 − ε) f_ℓ(T^{n−1}), ℓ = 0, …, ⌊ρ_W d⌋

Robustness: for a nearby sparse solution, ‖Ax0 − b‖_2 ≤ ε, solve
  min ‖x^{1,ε}‖_{ℓ1} such that ‖Ax^{1,ε} − b‖_2 ≤ ε.
Then ‖x0 − x^{1,ε}‖_2 ≤ C(k, A) ε, where k = ‖x0‖_{ℓ0}. (A sketch of this robust program follows.)
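The robustness statement is a second-order cone program rather than an LP; a minimal sketch, assuming the cvxpy package is available (not part of the speakers' material):

```python
import cvxpy as cp

def robust_l1(A, b, eps):
    """min ||x||_1  s.t.  ||Ax - b||_2 <= eps  (a second-order cone program)."""
    n = A.shape[1]
    x = cp.Variable(n)
    problem = cp.Problem(cp.Minimize(cp.norm1(x)),
                         [cp.norm(A @ x - b, 2) <= eps])
    problem.solve()
    return x.value
```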
Summary

- Underdetermined system Ax = b with A ∈ R^{d×n}, where d < n
- To obtain the sparsest solution (min ‖x‖_{ℓ0}), solve the constrained min ‖x‖_{ℓ1}
- Precise sparsity phase transitions ρ(d/n) are available for ℓ1 → ℓ0
- That is, if ‖x‖_{ℓ0} < ρ(d/n) · d then min ‖x‖_{ℓ1} → min ‖x‖_{ℓ0}
- Surprisingly large transition: the effectiveness of Basis Pursuit (ℓ1)

Associated papers for the non-negative case [Donoho, T]:
- Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming, Proc. Natl. Acad. Sci.
- Neighborliness of Randomly-Projected Simplices in High Dimensions, Proc. Natl. Acad. Sci.
- See also work by Donoho; Candes, Romberg, Tao; Tropp
Thank you for your time
Talk outline: Underdetermined systems, the frame perspective; Least squares; Random sampling matrices; Phase transitions for ℓ1 recovering ℓ0; Summary