1+eps-Approximate Sparse Recovery

Eric Price, MIT
David Woodruff, IBM Almaden

Mar 26, 2015

Transcript
Page 1

1+eps-Approximate Sparse Recovery

Eric Price, MIT

David Woodruff, IBM Almaden

Page 2

Compressed Sensing

• Choose an r x n matrix A

• Given x ∈ R^n

• Compute Ax

• Output a vector y so that

  |x - y|_p ≤ (1+ε) |x - x_{top k}|_p

• x_{top k} is the k-sparse vector of the largest-magnitude coefficients of x

• p = 1 or p = 2

• Minimize the number r = r(n, k, ε) of “measurements”

Pr_A[recovery succeeds] > 2/3
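As a reading aid (not part of the slides), here is a minimal Python sketch of what the recovery guarantee asks for; the function names, the random test signal, and the parameter choices are illustrative assumptions.

```python
import numpy as np

def top_k(x, k):
    """Return the k-sparse vector keeping the k largest-magnitude entries of x."""
    y = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    y[idx] = x[idx]
    return y

def satisfies_guarantee(x, y, k, eps, p=1):
    """Check the (1+eps)-approximate sparse recovery guarantee in the l_p norm."""
    err = np.linalg.norm(x - y, ord=p)
    tail = np.linalg.norm(x - top_k(x, k), ord=p)
    return err <= (1 + eps) * tail

# Tiny usage example: the trivial output y = 0 typically fails the guarantee here
x = np.random.randn(100)
print(satisfies_guarantee(x, np.zeros_like(x), k=5, eps=0.1, p=1))
```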

Page 3

Previous Work

• p = 1

[IR, …] r = O(k log(n/k) / ε) (deterministic A)

• p = 2

[GLPS] r = O(k log(n/k) / ε)

In both cases, r = Ω(k log(n/k)) [DIPW]

What is the dependence on ε?

Page 4

Why 1+ε is Important

• Suppose x = ei + u

– ei = (0, 0, …, 0, 1, 0, …, 0)

– u is a random unit vector orthogonal to ei

• Consider y = 0^n

  – |x - y|_2 = |x|_2 ≤ 2^{1/2} · |x - e_i|_2

It’s a trivial solution!

• (1+ε)-approximate recovery fixes this

In some applications, can have 1/ε = 100, log n = 32

Page 5

Our Results Vs. Previous Work

• p = 1
  [IR, …] r = O(k log(n/k) / ε) (previous)
  This work: r = O(k log(n/k) · log^2(1/ε) / ε^{1/2}) (randomized)
  This work: r = Ω(k log(1/ε) / ε^{1/2})

• p = 2
  [GLPS] r = O(k log(n/k) / ε) (previous)
  This work: r = Ω(k log(n/k) / ε)

Previous lower bounds: Ω(k log(n/k))

Lower bounds are for randomized, constant-probability schemes

Page 6

Comparison to Deterministic Schemes

• We get an r = O~(k/ε^{1/2}) randomized upper bound for p = 1

• We show Ω(k log(n/k) / ε) for p = 1 for deterministic schemes

• So randomized easier than deterministic

Page 7

Our Sparse-Output Results

• Output a vector y from Ax so that

  |x - y|_p ≤ (1+ε) |x - x_{top k}|_p

• Sometimes want y itself to be k-sparse

  r = Ω(k/ε^p)

• Both results tight up to logarithmic factors

• Recall that for non-sparse output r = Θ~(k/ε^{p/2})

Page 8

Talk Outline

1. O~(k/ε^{1/2}) upper bound for p = 1

2. Lower bounds

Page 9

Simplifications

• Want O~(k/ε^{1/2}) for p = 1

• Replace k with 1:
  – Sample a 1/k fraction of the coordinates
  – Solve the problem for k = 1 on the sample
  – Repeat O~(k) times independently
  – Combine the solutions found

[Figure: a signal (ε/k, ε/k, …, ε/k, 1/n, 1/n, …, 1/n) with k heavy coordinates becomes, after sampling a 1/k fraction, a signal (ε/k, 1/n, …, 1/n) with a single heavy coordinate]
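A minimal sketch (my own illustration, not the authors' algorithm as stated) of this reduce-to-k=1 strategy; `solve_k1` is a hypothetical single-heavy-coordinate solver and the repetition count is an assumption standing in for O~(k).

```python
import numpy as np

def recover_k_sparse(x, k, solve_k1, reps=None, rng=None):
    """Sketch of the reduction: sample a 1/k fraction of the coordinates, run a
    k = 1 solver on the sample, repeat, and merge the candidates found.
    `solve_k1` is a hypothetical routine returning (index, estimated value)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    reps = 4 * k if reps is None else reps      # stands in for the O~(k) repetitions
    candidates = {}
    for _ in range(reps):
        keep = rng.random(n) < 1.0 / k          # keep each coordinate w.p. 1/k
        idx, val = solve_k1(np.where(keep, x, 0.0))
        candidates[idx] = val                   # combine the solutions found
    y = np.zeros(n)
    for i, v in candidates.items():
        y[i] = v
    return y
```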

Page 10

k = 1

• Assume |x - x_top|_1 = 1, and x_top = ε

• First attempt
  – Use CountMin [CM]
  – Randomly partition coordinates into B buckets, maintain the sum in each bucket, e.g. the 2nd bucket stores Σ_{i : h(i) = 2} x_i

• The expected ℓ1-mass of “noise” in a bucket is 1/B

• If B = Θ(1/ε), most buckets have count < ε/2, but the bucket that contains x_top has count > ε/2

• Repeat O(log n) times
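For concreteness, a single-row CountMin sketch in Python (my own sketch, not the authors' code); the bucket count B and the test signal are illustrative assumptions.

```python
import numpy as np

class CountMin:
    """One row of a CountMin sketch: B buckets, bucket j stores the sum of the
    coordinates hashed to it; repeat O(log n) independent rows to boost success."""
    def __init__(self, n, B, rng):
        self.h = rng.integers(0, B, size=n)   # random bucket assignment h(i)
        self.B = B

    def sketch(self, x):
        counts = np.zeros(self.B)
        np.add.at(counts, self.h, x)          # bucket j holds sum of x_i with h(i) = j
        return counts

    def estimate(self, counts, i):
        return counts[self.h[i]]              # estimate of x_i, noise about 1/B in l1

rng = np.random.default_rng(0)
n, B, eps = 10_000, 200, 0.01                 # B = Theta(1/eps)
x = rng.random(n); x /= x.sum()               # "noise" with l1-mass 1
x[42] += eps                                  # heavy coordinate of weight about eps
cm = CountMin(n, B, rng)
print(cm.estimate(cm.sketch(x), 42))          # roughly eps plus O(1/B) noise
```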

Page 11

Second Attempt

• But we wanted O~(1/ε^{1/2}) measurements

• Error in a bucket is 1/B, so we need B ≈ 1/ε

• What about CountSketch? [CCF-C]
  – Give each coordinate i a random sign σ(i) ∈ {-1, +1}
  – Randomly partition coordinates into B buckets, maintain Σ_{i : h(i) = j} σ(i)·x_i in the j-th bucket (e.g. the 2nd bucket stores Σ_{i : h(i) = 2} σ(i)·x_i)
  – Bucket error is (Σ_{i ≠ top} x_i^2 / B)^{1/2}
  – Is this better?
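And the corresponding single-row CountSketch (again my own illustrative sketch, not the authors' code); it differs from CountMin only in the random signs, which is what makes the bucket error an ℓ2 rather than an ℓ1 quantity.

```python
import numpy as np

class CountSketch:
    """One row of a CountSketch: random signs sigma(i) in {-1,+1} and B buckets;
    bucket j stores the signed sum of the coordinates hashed to it."""
    def __init__(self, n, B, rng):
        self.h = rng.integers(0, B, size=n)        # bucket assignment h(i)
        self.sigma = rng.choice([-1.0, 1.0], size=n)
        self.B = B

    def sketch(self, x):
        counts = np.zeros(self.B)
        np.add.at(counts, self.h, self.sigma * x)  # sum of sigma(i)*x_i per bucket
        return counts

    def estimate(self, counts, i):
        # unbiased estimate of x_i; error ~ (sum of tail x_j^2 / B)^{1/2}
        return self.sigma[i] * counts[self.h[i]]
```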

Page 12

CountSketch

• Bucket error Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2}

• All |x_i| ≤ ε and |x - x_top|_1 = 1

• Σ_{i ≠ top} x_i^2 ≤ (1/ε) · ε^2 = ε

• So Err ≤ (ε/B)^{1/2}, which needs to be at most ε

• Solving, B ≥ 1/ε

• CountSketch isn’t better than CountMin

Page 13

Main Idea

• We insist on using CountSketch with B = 1/ε^{1/2}

• Suppose Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} = ε

• This means Σ_{i ≠ top} x_i^2 = ε^{3/2}

• Forget about x_top!

• Let’s make up the mass another way

Page 14

Main Idea

• We have: Σ_{i ≠ top} x_i^2 = ε^{3/2}

• Intuition: suppose all x_i, i ≠ top, are the same value or 0

• Then: (# non-zero)·value = 1 and (# non-zero)·value^2 = ε^{3/2}

• Hence, value = ε^{3/2} and # non-zero = 1/ε^{3/2}

• Sample an ε-fraction of coordinates uniformly at random!
  – value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so the ℓ1-contribution is ε
  – Find all the non-zeros with O~(1/ε^{1/2}) measurements
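The same arithmetic written out (a restatement of the slide, with N the number of non-zero tail coordinates and v their common value):

```latex
\[
  Nv = 1, \quad Nv^{2} = \varepsilon^{3/2}
  \;\Longrightarrow\; v = \varepsilon^{3/2}, \quad N = \varepsilon^{-3/2};
  \qquad
  \varepsilon N = \varepsilon^{-1/2} \text{ sampled non-zeros, of total } \ell_1\text{-mass } \varepsilon^{-1/2}\cdot\varepsilon^{3/2} = \varepsilon .
\]
```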

Page 15

General Setting

• Σ_{i ≠ top} x_i^2 = ε^{3/2}

• S_j = {i | 1/4^j < x_i^2 ≤ 1/4^{j-1}}

• Σ_{i ≠ top} x_i^2 = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω~(ε^{3/2})

[Figure: the level sets illustrated as rows of values ε^{3/2}, …, ε^{3/2}; 4ε^{3/2}, …, 4ε^{3/2}; 16ε^{3/2}, …, 16ε^{3/2}; …; ε^{3/4}]

Page 16

General Setting

• If |S_j| < 1/ε^{1/2}, then 1/4^j > ε^2, so 1/2^j > ε — can’t happen

• Else, sample at rate 1/(|S_j| ε^{1/2}) to get 1/ε^{1/2} elements of S_j

• ℓ1-mass of S_j in the sample is > ε

• Can we find the sampled elements of S_j? Use Σ_{i ≠ top} x_i^2 = ε^{3/2}

• The ℓ2^2 of the sample is about ε^{3/2} · 1/(|S_j| ε^{1/2}) = ε/|S_j|

• Using CountSketch with 1/ε^{1/2} buckets:

  Bucket error = sqrt( ε^{1/2} · ε^{3/2} · 1/(|S_j| ε^{1/2}) ) = sqrt( ε^{3/2}/|S_j| ) < 1/2^j, since |S_j|/4^j > ε^{3/2}
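Spelling out the last calculation (a restatement, with B = 1/ε^{1/2} buckets):

```latex
\[
  \|x_{\mathrm{sample}}\|_{2}^{2} \approx \varepsilon^{3/2}\cdot\frac{1}{|S_j|\,\varepsilon^{1/2}} = \frac{\varepsilon}{|S_j|},
  \qquad
  \text{bucket error} \approx \sqrt{\frac{\|x_{\mathrm{sample}}\|_{2}^{2}}{B}}
  = \sqrt{\frac{\varepsilon^{3/2}}{|S_j|}}
  < \sqrt{4^{-j}} = 2^{-j},
\]
```

using |S_j|/4^j > ε^{3/2}, so the sampled elements of S_j stand out above the CountSketch noise.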

Page 17

Algorithm Wrapup

• Sub-sample O(log 1/ε) times in powers of 2

• In each level of sub-sampling, maintain a CountSketch with O~(1/ε^{1/2}) buckets

• Find as many heavy coordinates as you can!

• Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling

• Wouldn’t work for CountMin: the bucket error could be ε because of n-1 items each of value ε/(n-1), and items that small cannot be found by sub-sampling
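A compact end-to-end sketch (my own illustration, not the authors' implementation) of the k = 1 recovery loop just described: O(log 1/ε) sub-sampling levels, each with a CountSketch of roughly 1/ε^{1/2} buckets; the threshold and the exact counts are assumptions.

```python
import numpy as np

def recover_heavy(x, eps, rng=None):
    """Illustrative k = 1 recovery: sub-sample in powers of 2 for O(log 1/eps)
    levels; at each level build a CountSketch with about 1/sqrt(eps) buckets and
    keep every surviving coordinate whose estimate clears an eps/2 threshold."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    B = max(4, int(np.ceil(1.0 / np.sqrt(eps))))   # ~1/eps^{1/2} buckets per level
    levels = int(np.ceil(np.log2(1.0 / eps))) + 1  # O(log 1/eps) sub-sampling levels
    found = {}
    for lvl in range(levels):
        keep = rng.random(n) < 2.0 ** (-lvl)       # sub-sample at rate 2^-lvl
        h = rng.integers(0, B, size=n)             # fresh bucket assignment
        sigma = rng.choice([-1.0, 1.0], size=n)    # fresh random signs
        counts = np.zeros(B)
        np.add.at(counts, h[keep], sigma[keep] * x[keep])
        for i in np.nonzero(keep)[0]:
            est = sigma[i] * counts[h[i]]          # CountSketch estimate of x_i
            if abs(est) > eps / 2:                 # illustrative heaviness threshold
                found[i] = est
    return found
```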

Page 18

Talk Outline

1. O~(k/ε^{1/2}) upper bound for p = 1

2. Lower bounds

Page 19

Our Results

• General results:
  – Ω~(k/ε^{1/2}) for p = 1
  – Ω(k log(n/k) / ε) for p = 2

• Sparse output:
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2

• Deterministic:
  – Ω(k log(n/k) / ε) for p = 1

Page 20

Simultaneous Communication Complexity

[Figure: Alice holds x, Bob holds y; each sends one message, M_A(x) and M_B(y), to a referee, who must answer “What is f(x,y)?”]

• Alice and Bob each send a single message to the referee, who outputs f(x,y) with constant probability

• The communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs

• The parties share randomness

Page 21

Reduction to Compressed Sensing

• Shared randomness decides the matrix A

• Alice sends Ax to the referee

• Bob sends Ay to the referee

• The referee computes A(x+y) = Ax + Ay and uses the compressed sensing recovery algorithm

• If the output of the algorithm solves f(x,y), then

  # rows of A * # bits per measurement > CC(f)
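An illustration of this reduction (my own sketch): the shared randomness fixes A, each party sends its sketch, and linearity lets the referee form A(x+y). The Gaussian A and the `recover` routine here are placeholders, not the specific constructions from the paper.

```python
import numpy as np

def simulate_protocol(x, y, r, recover, seed=0):
    """Sketch of the SMP reduction: shared randomness fixes A; Alice sends Ax,
    Bob sends Ay, and the referee runs sparse recovery on Ax + Ay = A(x + y).
    `recover(A, measurements)` is a hypothetical recovery routine."""
    n = len(x)
    rng = np.random.default_rng(seed)   # shared randomness decides the matrix A
    A = rng.standard_normal((r, n))     # placeholder measurement matrix
    msg_alice = A @ x                   # Alice's single message to the referee
    msg_bob = A @ y                     # Bob's single message to the referee
    z = recover(A, msg_alice + msg_bob) # referee effectively works with A(x + y)
    return z                            # the referee answers f(x, y) from z
```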

Page 22

A Unified View

• General results: Direct-Sum Gap-ℓ1
  – Ω~(k/ε^{1/2}) for p = 1
  – Ω~(k/ε) for p = 2

• Sparse output: Indexing
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2

• Deterministic: Equality
  – Ω(k log(n/k) / ε) for p = 1

Tighter log factors achievable by looking at Gaussian channels

Page 23

General Results: k = 1, p = 1

• Alice and Bob have x, y, respectively, in R^m

• There is a unique i* for which (x+y)_{i*} = d

  For all j ≠ i*, (x+y)_j ∈ {0, c, -c}, where |c| < |d|

• Finding i* requires Ω(m/(d/c)^2) communication [SS, BJKS]

• m = 1/ε^{3/2}, c = ε^{3/2}, d = ε

• Need Ω(1/ε^{1/2}) communication
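Plugging these parameters into the Ω(m/(d/c)^2) bound (a restatement of the slide's arithmetic):

```latex
\[
  \Omega\!\left(\frac{m}{(d/c)^{2}}\right)
  = \Omega\!\left(\frac{\varepsilon^{-3/2}}{\big(\varepsilon/\varepsilon^{3/2}\big)^{2}}\right)
  = \Omega\!\left(\frac{\varepsilon^{-3/2}}{\varepsilon^{-1}}\right)
  = \Omega\!\left(\varepsilon^{-1/2}\right).
\]
```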

Page 24

General Results: k = 1, p = 1

• But the compressed sensing algorithm doesn’t need to find i*

• If not, then it needs to transmit a lot of information about the tail
  – Tail ≈ a random low-weight vector in {0, ε^{3/2}, -ε^{3/2}}^{1/ε^3}
  – Uses a distributional lower bound and RS codes

• Send a vector y within 1-ε of the tail in ℓ1-norm

• Needs 1/ε^{1/2} communication

Page 25

General Results: k = 1, p = 2

• Same argument, different parameters

• Ω(1/ε) communication

• What about general k?

Page 26

Handling General k

• Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function f, with input pairs independently drawn from μ, solving a 2/3 fraction of the copies needs communication Ω(k · CC_μ(f))

[Figure: the hard instance for p = 1 — k blocks, each consisting of one coordinate of value ε^{1/2} and a run of coordinates of value ε^{3/2}, …, ε^{3/2}]

Page 27

Handling General k

• CC = Ω(k/ε^{1/2}) for p = 1

• CC = Ω(k/ε) for p = 2

• What is implied about compressed sensing?

Page 28

Rounding Matrices [DIPW]

• A is a matrix of real numbers

• Can assume orthonormal rows

• Round the entries of A to O(log n) bits, obtaining matrix A’

• Careful:
  – A’x = A(x+s) for a “small” s
  – But s depends on A, no guarantee recovery works
  – Can be fixed by looking at A(x+s+u) for random u
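A small numeric sketch (my own, with an assumed precision parameter) of this rounding step: make the rows orthonormal, then round each entry to roughly log n bits.

```python
import numpy as np

def round_matrix(A, bits):
    """Orthonormalize the rows of A (so entries have magnitude at most 1),
    then round every entry to the nearest multiple of 2**-bits.
    With bits = O(log n), A'x can be written as A(x+s) for a 'small' s,
    as discussed on the slide."""
    Q, _ = np.linalg.qr(A.T)        # columns of Q are orthonormal
    A_ortho = Q.T                   # rows are now orthonormal
    scale = 2.0 ** bits
    return np.round(A_ortho * scale) / scale

r, n = 10, 100
A = np.random.randn(r, n)
A_rounded = round_matrix(A, bits=int(np.ceil(np.log2(n))) + 3)
```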

Page 29

Lower Bounds for Compressed Sensing

• # rows of A * # bits per measurement > CC(f)

• By rounding, # bits per measurement = O(log n)

• In our hard instances, universe size = poly(k/ε)

• So # rows of A * O(log (k/ε)) > CC(f)

• # rows of A = Ω~(k/ε^{1/2}) for p = 1

• # rows of A = Ω~(k/ε) for p = 2

Page 30

Sparse-Output Results

Sparse output: Indexing

– Ω~(k/ε) for p = 1

– Ω~(k/ε^2) for p = 2

Page 31

Sparse Output Results - Indexing

Alice: x ∈ {0,1}^n        Bob: i ∈ {1, 2, …, n}

What is x_i?

CC(Indexing) = Ω(n)

Page 32

Ω(1/ε) Bound for k = 1, p = 1

Alice: x ∈ {-ε, ε}^{1/ε}        Bob: y = e_i

• Consider x+y

• If the output is required to be 1-sparse, it must place its mass on the i-th coordinate

• The mass must be 1+ε if x_i = ε, otherwise 1-ε

Generalizes to k > 1 to give Ω~(k/ε)

Generalizes to p = 2 to give Ω~(k/ε^2)


Page 33

Deterministic Results

Deterministic: Equality
– Ω(k log(n/k) / ε) for p = 1

Page 34

Deterministic Results - Equality

Alice: x ∈ {0,1}^n        Bob: y ∈ {0,1}^n

Is x = y?

Deterministic CC(Equality) = Ω(n)

Page 35

Ω(k log(n/k) / ε) for p = 1

Choose log n signals x_1, …, x_{log n}, each with k/ε values equal to ε/k

x = Σ_{i=1}^{log n} 10^i x_i

Choose log n signals y_1, …, y_{log n}, each with k/ε values equal to ε/k

y = Σ_{i=1}^{log n} 10^i y_i

Consider x-y. The compressed sensing output is 0^n iff x = y
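A tiny sketch (my own, with arbitrary choices of which coordinates are non-zero) of this hard-instance construction: log n signals, each with k/ε coordinates of value ε/k, combined with weights 10^i.

```python
import numpy as np

def build_instance(n, k, eps, rng):
    """Build x = sum_{i=1}^{log n} 10^i * x_i, where each x_i has k/eps
    coordinates equal to eps/k; the supports here are arbitrary illustrations."""
    num_signals = int(np.log2(n))
    weight = int(np.ceil(k / eps))           # k/eps non-zero coordinates per signal
    x = np.zeros(n)
    for i in range(1, num_signals + 1):
        xi = np.zeros(n)
        support = rng.choice(n, size=weight, replace=False)
        xi[support] = eps / k
        x += (10.0 ** i) * xi
    return x

rng = np.random.default_rng(0)
x = build_instance(n=1024, k=4, eps=0.25, rng=rng)
y = build_instance(n=1024, k=4, eps=0.25, rng=rng)  # compare recovery of x - y to 0
```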

Page 36

General Results – Gaussian Channels (k = 1, p = 2)

• Alice has a signal x = ε^{1/2} e_i for a random i ∈ [n]

• Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate

• Consider any row vector a of A

• Channel output = <a,x> + <a,y>, where <a,y> is N(0, |a|_2^2/n)

• E_i[<a,x>^2] = ε |a|_2^2/n

• Shannon-Hartley Theorem: I(i; <a,x>+<a,y>) = I(<a,x>; <a,x>+<a,y>) ≤ ½ log(1+ε) = O(ε)
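The signal-to-noise calculation behind the last bullet, written out (a restatement of the slide):

```latex
\[
  \mathrm{SNR}
  = \frac{\mathbb{E}_i\big[\langle a,x\rangle^{2}\big]}{\operatorname{Var}\big(\langle a,y\rangle\big)}
  = \frac{\varepsilon\,\|a\|_2^{2}/n}{\|a\|_2^{2}/n}
  = \varepsilon,
  \qquad
  I\big(\langle a,x\rangle;\,\langle a,x\rangle+\langle a,y\rangle\big)
  \le \tfrac12\log(1+\varepsilon)
  = O(\varepsilon).
\]
```

So each measurement reveals only O(ε) bits of information about i; since identifying i requires Ω(log n) bits, this is the source of the tighter log-factor lower bounds for p = 2 mentioned earlier.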

Page 37

Summary of Results

• General results: Θ~(k/ε^{p/2})

• Sparse output: Θ~(k/ε^p)

• Deterministic: Θ(k log(n/k) / ε) for p = 1