1+eps-Approximate Sparse Recovery

Eric Price, MIT
David Woodruff, IBM Almaden

Mar 26, 2015

Transcript
Page 1

1+eps-Approximate Sparse Recovery

Eric Price, MIT

David Woodruff, IBM Almaden

Page 2

Compressed Sensing

• Choose an r x n matrix A

• Given x ∈ R^n

• Compute Ax

• Output a vector y so that

  |x - y|_p ≤ (1+ε) |x - x_{top k}|_p

• x_{top k} is the k-sparse vector of the largest-magnitude coefficients of x

• p = 1 or p = 2

• Minimize the number r = r(n, k, ε) of “measurements”

Pr_A[recovery succeeds] > 2/3
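As a reading aid (not part of the slides), here is a minimal Python sketch of what the recovery guarantee asks for; the function names, the random test signal, and the parameter choices are illustrative assumptions.

```python
import numpy as np

def top_k(x, k):
    """Return the k-sparse vector keeping the k largest-magnitude entries of x."""
    y = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    y[idx] = x[idx]
    return y

def satisfies_guarantee(x, y, k, eps, p=1):
    """Check the (1+eps)-approximate sparse recovery guarantee in the l_p norm."""
    err = np.linalg.norm(x - y, ord=p)
    tail = np.linalg.norm(x - top_k(x, k), ord=p)
    return err <= (1 + eps) * tail

# Tiny usage example: the trivial output y = 0 typically fails the guarantee here
x = np.random.randn(100)
print(satisfies_guarantee(x, np.zeros_like(x), k=5, eps=0.1, p=1))
```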

Page 3

Previous Work

• p = 1

[IR, …] r = O(k log(n/k) / ε) (deterministic A)

• p = 2

[GLPS] r = O(k log(n/k) / ε)

In both cases, r = Ω(k log(n/k)) [DIPW]

What is the dependence on ε?

Page 4

Why 1+ε is Important

• Suppose x = ei + u

– ei = (0, 0, …, 0, 1, 0, …, 0)

– u is a random unit vector orthogonal to ei

• Consider y = 0^n

  – |x - y|_2 = |x|_2 ≤ 2^{1/2} · |x - e_i|_2

It’s a trivial solution!

• (1+ε)-approximate recovery fixes this

In some applications, can have 1/ε = 100, log n = 32

Page 5

Our Results Vs. Previous Work

• p = 1
  [IR, …] r = O(k log(n/k) / ε) (previous)
  This work: r = O(k log(n/k) · log^2(1/ε) / ε^{1/2}) (randomized)
  This work: r = Ω(k log(1/ε) / ε^{1/2})

• p = 2
  [GLPS] r = O(k log(n/k) / ε) (previous)
  This work: r = Ω(k log(n/k) / ε)

Previous lower bounds: Ω(k log(n/k))

Lower bounds are for randomized, constant-probability schemes

Page 6

Comparison to Deterministic Schemes

• We get an r = O~(k/ε^{1/2}) randomized upper bound for p = 1

• We show Ω(k log(n/k) / ε) for p = 1 for deterministic schemes

• So randomized easier than deterministic

Page 7

Our Sparse-Output Results

• Output a vector y from Ax so that

  |x - y|_p ≤ (1+ε) |x - x_{top k}|_p

• Sometimes want y itself to be k-sparse

  r = Ω(k/ε^p)

• Both results tight up to logarithmic factors

• Recall that for non-sparse output r = Θ~(k/ε^{p/2})

Page 8

Talk Outline

1. O~(k/ε^{1/2}) upper bound for p = 1

2. Lower bounds

Page 9

Simplifications

• Want O~(k/ε^{1/2}) for p = 1

• Replace k with 1:
  – Sample a 1/k fraction of the coordinates
  – Solve the problem for k = 1 on the sample
  – Repeat O~(k) times independently
  – Combine the solutions found

[Figure: a signal (ε/k, ε/k, …, ε/k, 1/n, 1/n, …, 1/n) with k heavy coordinates becomes, after sampling a 1/k fraction, a signal (ε/k, 1/n, …, 1/n) with a single heavy coordinate]
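A minimal sketch (my own illustration, not the authors' algorithm as stated) of this reduce-to-k=1 strategy; `solve_k1` is a hypothetical single-heavy-coordinate solver and the repetition count is an assumption standing in for O~(k).

```python
import numpy as np

def recover_k_sparse(x, k, solve_k1, reps=None, rng=None):
    """Sketch of the reduction: sample a 1/k fraction of the coordinates, run a
    k = 1 solver on the sample, repeat, and merge the candidates found.
    `solve_k1` is a hypothetical routine returning (index, estimated value)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    reps = 4 * k if reps is None else reps      # stands in for the O~(k) repetitions
    candidates = {}
    for _ in range(reps):
        keep = rng.random(n) < 1.0 / k          # keep each coordinate w.p. 1/k
        idx, val = solve_k1(np.where(keep, x, 0.0))
        candidates[idx] = val                   # combine the solutions found
    y = np.zeros(n)
    for i, v in candidates.items():
        y[i] = v
    return y
```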

Page 10

k = 1

• Assume |x - x_top|_1 = 1, and x_top = ε

• First attempt
  – Use CountMin [CM]
  – Randomly partition coordinates into B buckets, maintain the sum in each bucket, e.g. the 2nd bucket stores Σ_{i : h(i) = 2} x_i

• The expected ℓ1-mass of “noise” in a bucket is 1/B

• If B = Θ(1/ε), most buckets have count < ε/2, but the bucket that contains x_top has count > ε/2

• Repeat O(log n) times
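For concreteness, a single-row CountMin sketch in Python (my own sketch, not the authors' code); the bucket count B and the test signal are illustrative assumptions.

```python
import numpy as np

class CountMin:
    """One row of a CountMin sketch: B buckets, bucket j stores the sum of the
    coordinates hashed to it; repeat O(log n) independent rows to boost success."""
    def __init__(self, n, B, rng):
        self.h = rng.integers(0, B, size=n)   # random bucket assignment h(i)
        self.B = B

    def sketch(self, x):
        counts = np.zeros(self.B)
        np.add.at(counts, self.h, x)          # bucket j holds sum of x_i with h(i) = j
        return counts

    def estimate(self, counts, i):
        return counts[self.h[i]]              # estimate of x_i, noise about 1/B in l1

rng = np.random.default_rng(0)
n, B, eps = 10_000, 200, 0.01                 # B = Theta(1/eps)
x = rng.random(n); x /= x.sum()               # "noise" with l1-mass 1
x[42] += eps                                  # heavy coordinate of weight about eps
cm = CountMin(n, B, rng)
print(cm.estimate(cm.sketch(x), 42))          # roughly eps plus O(1/B) noise
```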

Page 11

Second Attempt

• But we wanted O~(1/ε^{1/2}) measurements

• Error in a bucket is 1/B, so we need B ≈ 1/ε

• What about CountSketch? [CCF-C]
  – Give each coordinate i a random sign σ(i) ∈ {-1, +1}
  – Randomly partition coordinates into B buckets, maintain Σ_{i : h(i) = j} σ(i)·x_i in the j-th bucket (e.g. the 2nd bucket stores Σ_{i : h(i) = 2} σ(i)·x_i)
  – Bucket error is (Σ_{i ≠ top} x_i^2 / B)^{1/2}
  – Is this better?
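And the corresponding single-row CountSketch (again my own illustrative sketch, not the authors' code); it differs from CountMin only in the random signs, which is what makes the bucket error an ℓ2 rather than an ℓ1 quantity.

```python
import numpy as np

class CountSketch:
    """One row of a CountSketch: random signs sigma(i) in {-1,+1} and B buckets;
    bucket j stores the signed sum of the coordinates hashed to it."""
    def __init__(self, n, B, rng):
        self.h = rng.integers(0, B, size=n)        # bucket assignment h(i)
        self.sigma = rng.choice([-1.0, 1.0], size=n)
        self.B = B

    def sketch(self, x):
        counts = np.zeros(self.B)
        np.add.at(counts, self.h, self.sigma * x)  # sum of sigma(i)*x_i per bucket
        return counts

    def estimate(self, counts, i):
        # unbiased estimate of x_i; error ~ (sum of tail x_j^2 / B)^{1/2}
        return self.sigma[i] * counts[self.h[i]]
```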

Page 12

CountSketch

• Bucket error Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2}

• All |x_i| ≤ ε and |x - x_top|_1 = 1

• Σ_{i ≠ top} x_i^2 ≤ (1/ε) · ε^2 = ε

• So Err ≤ (ε/B)^{1/2}, which needs to be at most ε

• Solving, B ≥ 1/ε

• CountSketch isn’t better than CountMin

Page 13

Main Idea

• We insist on using CountSketch with B = 1/ε^{1/2}

• Suppose Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} = ε

• This means Σ_{i ≠ top} x_i^2 = ε^{3/2}

• Forget about x_top!

• Let’s make up the mass another way

Page 14

Main Idea

• We have: Σ_{i ≠ top} x_i^2 = ε^{3/2}

• Intuition: suppose all x_i, i ≠ top, are the same value or 0

• Then: (# non-zero)·value = 1 and (# non-zero)·value^2 = ε^{3/2}

• Hence, value = ε^{3/2} and # non-zero = 1/ε^{3/2}

• Sample an ε-fraction of coordinates uniformly at random!
  – value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so the ℓ1-contribution is ε
  – Find all the non-zeros with O~(1/ε^{1/2}) measurements
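The same arithmetic written out (a restatement of the slide, with N the number of non-zero tail coordinates and v their common value):

```latex
\[
  Nv = 1, \quad Nv^{2} = \varepsilon^{3/2}
  \;\Longrightarrow\; v = \varepsilon^{3/2}, \quad N = \varepsilon^{-3/2};
  \qquad
  \varepsilon N = \varepsilon^{-1/2} \text{ sampled non-zeros, of total } \ell_1\text{-mass } \varepsilon^{-1/2}\cdot\varepsilon^{3/2} = \varepsilon .
\]
```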

Page 15

General Setting

• Σ_{i ≠ top} x_i^2 = ε^{3/2}

• S_j = {i | 1/4^j < x_i^2 ≤ 1/4^{j-1}}

• Σ_{i ≠ top} x_i^2 = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω~(ε^{3/2})

[Figure: the level sets illustrated as rows of values ε^{3/2}, …, ε^{3/2}; 4ε^{3/2}, …, 4ε^{3/2}; 16ε^{3/2}, …, 16ε^{3/2}; …; ε^{3/4}]

Page 16

General Setting

• If |S_j| < 1/ε^{1/2}, then 1/4^j > ε^2, so 1/2^j > ε — can’t happen

• Else, sample at rate 1/(|S_j| ε^{1/2}) to get 1/ε^{1/2} elements of S_j

• ℓ1-mass of S_j in the sample is > ε

• Can we find the sampled elements of S_j? Use Σ_{i ≠ top} x_i^2 = ε^{3/2}

• The ℓ2^2 of the sample is about ε^{3/2} · 1/(|S_j| ε^{1/2}) = ε/|S_j|

• Using CountSketch with 1/ε^{1/2} buckets:

  Bucket error = sqrt( ε^{1/2} · ε^{3/2} · 1/(|S_j| ε^{1/2}) ) = sqrt( ε^{3/2}/|S_j| ) < 1/2^j, since |S_j|/4^j > ε^{3/2}
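Spelling out the last calculation (a restatement, with B = 1/ε^{1/2} buckets):

```latex
\[
  \|x_{\mathrm{sample}}\|_{2}^{2} \approx \varepsilon^{3/2}\cdot\frac{1}{|S_j|\,\varepsilon^{1/2}} = \frac{\varepsilon}{|S_j|},
  \qquad
  \text{bucket error} \approx \sqrt{\frac{\|x_{\mathrm{sample}}\|_{2}^{2}}{B}}
  = \sqrt{\frac{\varepsilon^{3/2}}{|S_j|}}
  < \sqrt{4^{-j}} = 2^{-j},
\]
```

using |S_j|/4^j > ε^{3/2}, so the sampled elements of S_j stand out above the CountSketch noise.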

Page 17

Algorithm Wrapup

• Sub-sample O(log 1/ε) times in powers of 2

• In each level of sub-sampling, maintain a CountSketch with O~(1/ε^{1/2}) buckets

• Find as many heavy coordinates as you can!

• Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling

• Wouldn’t work for CountMin: the bucket error could be ε because of n-1 items each of value ε/(n-1), and items that small cannot be found by sub-sampling
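A compact end-to-end sketch (my own illustration, not the authors' implementation) of the k = 1 recovery loop just described: O(log 1/ε) sub-sampling levels, each with a CountSketch of roughly 1/ε^{1/2} buckets; the threshold and the exact counts are assumptions.

```python
import numpy as np

def recover_heavy(x, eps, rng=None):
    """Illustrative k = 1 recovery: sub-sample in powers of 2 for O(log 1/eps)
    levels; at each level build a CountSketch with about 1/sqrt(eps) buckets and
    keep every surviving coordinate whose estimate clears an eps/2 threshold."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    B = max(4, int(np.ceil(1.0 / np.sqrt(eps))))   # ~1/eps^{1/2} buckets per level
    levels = int(np.ceil(np.log2(1.0 / eps))) + 1  # O(log 1/eps) sub-sampling levels
    found = {}
    for lvl in range(levels):
        keep = rng.random(n) < 2.0 ** (-lvl)       # sub-sample at rate 2^-lvl
        h = rng.integers(0, B, size=n)             # fresh bucket assignment
        sigma = rng.choice([-1.0, 1.0], size=n)    # fresh random signs
        counts = np.zeros(B)
        np.add.at(counts, h[keep], sigma[keep] * x[keep])
        for i in np.nonzero(keep)[0]:
            est = sigma[i] * counts[h[i]]          # CountSketch estimate of x_i
            if abs(est) > eps / 2:                 # illustrative heaviness threshold
                found[i] = est
    return found
```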

Page 18

Talk Outline

1. O~(k/ε^{1/2}) upper bound for p = 1

2. Lower bounds

Page 19

Our Results

• General results:
  – Ω~(k/ε^{1/2}) for p = 1
  – Ω(k log(n/k) / ε) for p = 2

• Sparse output:
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2

• Deterministic:
  – Ω(k log(n/k) / ε) for p = 1

Page 20

Simultaneous Communication Complexity

[Figure: Alice holds x, Bob holds y; each sends one message, M_A(x) and M_B(y), to a referee, who must answer “What is f(x,y)?”]

• Alice and Bob each send a single message to the referee, who outputs f(x,y) with constant probability

• The communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs

• The parties share randomness

Page 21

Reduction to Compressed Sensing

• Shared randomness decides the matrix A

• Alice sends Ax to the referee

• Bob sends Ay to the referee

• The referee computes A(x+y) = Ax + Ay and uses the compressed sensing recovery algorithm

• If the output of the algorithm solves f(x,y), then

  # rows of A * # bits per measurement > CC(f)
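An illustration of this reduction (my own sketch): the shared randomness fixes A, each party sends its sketch, and linearity lets the referee form A(x+y). The Gaussian A and the `recover` routine here are placeholders, not the specific constructions from the paper.

```python
import numpy as np

def simulate_protocol(x, y, r, recover, seed=0):
    """Sketch of the SMP reduction: shared randomness fixes A; Alice sends Ax,
    Bob sends Ay, and the referee runs sparse recovery on Ax + Ay = A(x + y).
    `recover(A, measurements)` is a hypothetical recovery routine."""
    n = len(x)
    rng = np.random.default_rng(seed)   # shared randomness decides the matrix A
    A = rng.standard_normal((r, n))     # placeholder measurement matrix
    msg_alice = A @ x                   # Alice's single message to the referee
    msg_bob = A @ y                     # Bob's single message to the referee
    z = recover(A, msg_alice + msg_bob) # referee effectively works with A(x + y)
    return z                            # the referee answers f(x, y) from z
```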

Page 22

A Unified View

• General results: Direct-Sum Gap-ℓ1
  – Ω~(k/ε^{1/2}) for p = 1
  – Ω~(k/ε) for p = 2

• Sparse output: Indexing
  – Ω~(k/ε) for p = 1
  – Ω~(k/ε^2) for p = 2

• Deterministic: Equality
  – Ω(k log(n/k) / ε) for p = 1

Tighter log factors achievable by looking at Gaussian channels

Page 23

General Results: k = 1, p = 1

• Alice and Bob have x, y, respectively, in R^m

• There is a unique i* for which (x+y)_{i*} = d

  For all j ≠ i*, (x+y)_j ∈ {0, c, -c}, where |c| < |d|

• Finding i* requires Ω(m/(d/c)^2) communication [SS, BJKS]

• m = 1/ε^{3/2}, c = ε^{3/2}, d = ε

• Need Ω(1/ε^{1/2}) communication
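Plugging these parameters into the Ω(m/(d/c)^2) bound (a restatement of the slide's arithmetic):

```latex
\[
  \Omega\!\left(\frac{m}{(d/c)^{2}}\right)
  = \Omega\!\left(\frac{\varepsilon^{-3/2}}{\big(\varepsilon/\varepsilon^{3/2}\big)^{2}}\right)
  = \Omega\!\left(\frac{\varepsilon^{-3/2}}{\varepsilon^{-1}}\right)
  = \Omega\!\left(\varepsilon^{-1/2}\right).
\]
```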

Page 24

General Results: k = 1, p = 1

• But the compressed sensing algorithm doesn’t need to find i*

• If not, then it needs to transmit a lot of information about the tail
  – Tail ≈ a random low-weight vector in {0, ε^{3/2}, -ε^{3/2}}^{1/ε^3}
  – Uses a distributional lower bound and RS codes

• Send a vector y within 1-ε of the tail in ℓ1-norm

• Needs 1/ε^{1/2} communication

Page 25

General Results: k = 1, p = 2

• Same argument, different parameters

• Ω(1/ε) communication

• What about general k?

Page 26

Handling General k

• Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function f, with input pairs independently drawn from μ, solving a 2/3 fraction of the copies needs communication Ω(k · CC_μ(f))

[Figure: the hard instance for p = 1 — k blocks, each consisting of one coordinate of value ε^{1/2} and a run of coordinates of value ε^{3/2}, …, ε^{3/2}]

Page 27

Handling General k

• CC = Ω(k/ε^{1/2}) for p = 1

• CC = Ω(k/ε) for p = 2

• What is implied about compressed sensing?

Page 28

Rounding Matrices [DIPW]

• A is a matrix of real numbers

• Can assume orthonormal rows

• Round the entries of A to O(log n) bits, obtaining matrix A’

• Careful:
  – A’x = A(x+s) for a “small” s
  – But s depends on A, no guarantee recovery works
  – Can be fixed by looking at A(x+s+u) for random u
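A small numeric sketch (my own, with an assumed precision parameter) of this rounding step: make the rows orthonormal, then round each entry to roughly log n bits.

```python
import numpy as np

def round_matrix(A, bits):
    """Orthonormalize the rows of A (so entries have magnitude at most 1),
    then round every entry to the nearest multiple of 2**-bits.
    With bits = O(log n), A'x can be written as A(x+s) for a 'small' s,
    as discussed on the slide."""
    Q, _ = np.linalg.qr(A.T)        # columns of Q are orthonormal
    A_ortho = Q.T                   # rows are now orthonormal
    scale = 2.0 ** bits
    return np.round(A_ortho * scale) / scale

r, n = 10, 100
A = np.random.randn(r, n)
A_rounded = round_matrix(A, bits=int(np.ceil(np.log2(n))) + 3)
```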

Page 29

Lower Bounds for Compressed Sensing

• # rows of A * # bits per measurement > CC(f)

• By rounding, # bits per measurement = O(log n)

• In our hard instances, universe size = poly(k/ε)

• So # rows of A * O(log (k/ε)) > CC(f)

• # rows of A = Ω~(k/ε^{1/2}) for p = 1

• # rows of A = Ω~(k/ε) for p = 2

Page 30

Sparse-Output Results

Sparse output: Indexing

– Ω~(k/ε) for p = 1

– Ω~(k/ε^2) for p = 2

Page 31

Sparse Output Results - Indexing

Alice: x ∈ {0,1}^n        Bob: i ∈ {1, 2, …, n}

What is x_i?

CC(Indexing) = Ω(n)

Page 32

Ω(1/ε) Bound for k = 1, p = 1

Alice: x ∈ {-ε, ε}^{1/ε}        Bob: y = e_i

• Consider x+y

• If the output is required to be 1-sparse, it must place its mass on the i-th coordinate

• The mass must be 1+ε if x_i = ε, otherwise 1-ε

Generalizes to k > 1 to give Ω~(k/ε)

Generalizes to p = 2 to give Ω~(k/ε^2)


Page 33

Deterministic Results

Deterministic: Equality
– Ω(k log(n/k) / ε) for p = 1

Page 34

Deterministic Results - Equality

Alice: x ∈ {0,1}^n        Bob: y ∈ {0,1}^n

Is x = y?

Deterministic CC(Equality) = Ω(n)

Page 35

Ω(k log(n/k) / ε) for p = 1

Choose log n signals x_1, …, x_{log n}, each with k/ε values equal to ε/k

x = Σ_{i=1}^{log n} 10^i x_i

Choose log n signals y_1, …, y_{log n}, each with k/ε values equal to ε/k

y = Σ_{i=1}^{log n} 10^i y_i

Consider x-y. The compressed sensing output is 0^n iff x = y
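A tiny sketch (my own, with arbitrary choices of which coordinates are non-zero) of this hard-instance construction: log n signals, each with k/ε coordinates of value ε/k, combined with weights 10^i.

```python
import numpy as np

def build_instance(n, k, eps, rng):
    """Build x = sum_{i=1}^{log n} 10^i * x_i, where each x_i has k/eps
    coordinates equal to eps/k; the supports here are arbitrary illustrations."""
    num_signals = int(np.log2(n))
    weight = int(np.ceil(k / eps))           # k/eps non-zero coordinates per signal
    x = np.zeros(n)
    for i in range(1, num_signals + 1):
        xi = np.zeros(n)
        support = rng.choice(n, size=weight, replace=False)
        xi[support] = eps / k
        x += (10.0 ** i) * xi
    return x

rng = np.random.default_rng(0)
x = build_instance(n=1024, k=4, eps=0.25, rng=rng)
y = build_instance(n=1024, k=4, eps=0.25, rng=rng)  # compare recovery of x - y to 0
```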

Page 36

General Results – Gaussian Channels (k = 1, p = 2)

• Alice has a signal x = ε^{1/2} e_i for a random i ∈ [n]

• Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate

• Consider any row vector a of A

• Channel output = <a,x> + <a,y>, where <a,y> is N(0, |a|_2^2/n)

• E_i[<a,x>^2] = ε |a|_2^2/n

• Shannon-Hartley Theorem: I(i; <a,x>+<a,y>) = I(<a,x>; <a,x>+<a,y>) ≤ ½ log(1+ε) = O(ε)
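The signal-to-noise calculation behind the last bullet, written out (a restatement of the slide):

```latex
\[
  \mathrm{SNR}
  = \frac{\mathbb{E}_i\big[\langle a,x\rangle^{2}\big]}{\operatorname{Var}\big(\langle a,y\rangle\big)}
  = \frac{\varepsilon\,\|a\|_2^{2}/n}{\|a\|_2^{2}/n}
  = \varepsilon,
  \qquad
  I\big(\langle a,x\rangle;\,\langle a,x\rangle+\langle a,y\rangle\big)
  \le \tfrac12\log(1+\varepsilon)
  = O(\varepsilon).
\]
```

So each measurement reveals only O(ε) bits of information about i; since identifying i requires Ω(log n) bits, this is the source of the tighter log-factor lower bounds for p = 2 mentioned earlier.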

Page 37

Summary of Results

• General results: Θ~(k/ε^{p/2})

• Sparse output: Θ~(k/ε^p)

• Deterministic: Θ(k log(n/k) / ε) for p = 1