Sublinear Algorithms via Precision Sampling Alexandr Andoni (Microsoft Research) joint work with: Robert Krauthgamer (Weizmann Inst.) Krzysztof Onak (CMU)

Jan 23, 2016

Transcript
Page 1: Sublinear Algorithms via Precision Sampling

Sublinear Algorithms via Precision Sampling

Alexandr Andoni (Microsoft Research)

joint work with:

Robert Krauthgamer (Weizmann Inst.) Krzysztof Onak (CMU)

Page 2: Sublinear Algorithms via Precision Sampling

Goal

Compute the number of Dacians in the empire

Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$, sublinearly…

Page 3: Sublinear Algorithms via Precision Sampling

Sampling: send accountants to a subset $J$ of provinces, with $m = |J|$.

Estimator: $\tilde S = \frac{n}{m}\sum_{j \in J} a_j$

Chebyshev bound: with 90% success probability, $0.5\,S - O(n/m) < \tilde S < 2\,S + O(n/m)$

For constant additive error, we need $m \sim n$.
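Below is a minimal Python sketch of the sampling estimator. The function name is hypothetical. It samples $J$ without replacement and rescales by $n/m$.

The example shows why the additive error is about $n/m$. Take five provinces with $a_i = 1$ among $n = 1000$ and sample $m = 10$ of them. Each sampled 1 contributes $n/m = 100$, so the estimate is either 0 or at least 100, never close to the true value 5.

```python
import random


def sampling_estimate(a, m, rng):
    """Estimator S~ = (n / m) * sum_{j in J} a_j for a uniformly random subset J, |J| = m."""
    n = len(a)
    J = rng.sample(range(n), m)          # provinces visited, chosen without replacement
    return sum(a[j] for j in J) * n / m  # rescale the partial sum by n / m


# Five provinces of value 1 among n = 1000; only m = 10 accountants are sent.
a = [1.0] * 5 + [0.0] * 995
print(sampling_estimate(a, 10, random.Random(1)))  # 0.0 or a multiple of 100
```

With $m = n$ the estimator reads every $a_i$, so the sampling error disappears. The catch is that this is no longer sublinear.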

Page 4: Sublinear Algorithms via Precision Sampling

Precision Sampling Framework

Send accountants to each province, but require only approximate counts: obtain an estimate $\hat a_i$ of $a_i$ up to some pre-selected precision $u_i$, i.e., $|a_i - \hat a_i| < u_i$.

Challenge: achieve a good trade-off between the quality of the approximation to $S$ and the total cost of estimating each $a_i$ to precision $u_i$.

Page 5: Sublinear Algorithms via Precision Sampling

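Formalization

Sum Estimator vs. Adversary:

1. Sum Estimator fixes precisions $u_i$; Adversary fixes $a_1, a_2, \dots, a_n$.

2. Adversary fixes $\hat a_1, \hat a_2, \dots, \hat a_n$ such that $|a_i - \hat a_i| < u_i$.

3. Sum Estimator, given $\hat a_1, \hat a_2, \dots, \hat a_n$, outputs $\tilde S$ such that $|\sum_i a_i - \tilde S| < 1$.

What is the cost? Here, average cost $= \frac{1}{n}\sum_i 1/u_i$: achieving precision $u_i$ uses $1/u_i$ “resources”. E.g., if $a_i$ is itself a sum $a_i = \sum_j a_{ij}$ computed by subsampling, then one needs $\Theta(1/u_i)$ samples.

For example, one can choose all $u_i = 1/n$, giving an average cost of $\approx n$. This is the best possible if the estimator is $\tilde S = \sum_i \hat a_i$.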
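The following is a minimal sketch of the trivial strategy: all $u_i = 1/n$, and $\tilde S = \sum_i \hat a_i$. The function name and the adversary are hypothetical. The adversary pushes every $\hat a_i$ upward by 99% of its allowance.

The error of $\tilde S$ then comes close to $n \cdot \frac{1}{n} = 1$ but stays below it. The average cost $\frac{1}{n}\sum_i 1/u_i$ equals $n$.

```python
def naive_sum_estimator(a):
    """Uniform precisions u_i = 1/n with the estimator S~ = sum_i a_hat_i."""
    n = len(a)
    u = [1.0 / n] * n                                  # Sum Estimator fixes every u_i = 1/n
    # Hypothetical worst-case adversary: shift each a_hat_i up by 99% of u_i.
    a_hat = [a_i + 0.99 * u_i for a_i, u_i in zip(a, u)]
    estimate = sum(a_hat)                              # |sum(a) - estimate| < n * (1/n) = 1
    average_cost = sum(1.0 / u_i for u_i in u) / n     # (1/n) * sum_i 1/u_i
    return estimate, average_cost
```

For `a = [0.3] * 1000`, the estimate exceeds `sum(a)` by about 0.99, and the average cost is 1000.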

Page 6: Sublinear Algorithms via Precision Sampling

Precision Sampling Lemma

Goal: estimate $\sum_i a_i$ from $\hat a_i$ satisfying $|a_i - \hat a_i| < u_i$.

Precision Sampling Lemma: with 90% success, one can get $O(1)$ additive error and $1.5$ multiplicative error:

$S - O(1) < \tilde S < 1.5\,S + O(1)$, with average cost $O(\log n)$.

Example: distinguish $\sum_i a_i = 5$ vs. $\sum_i a_i = 0$. Consider two extreme cases:

if five $a_i = 1$: sample all, but only a crude approximation is needed ($u_i = 1/10$);

if all $a_i = 5/n$: only a few need a good approximation ($u_i = 1/n$), and the rest can use $u_i = 1$.
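Here is a back-of-the-envelope check of the two strategies under the cost model $\frac{1}{n}\sum_i 1/u_i$. The symbol $k$, the number of precisely estimated indices in the second case, is our own. Both costs stay $O(1)$ when $k = O(1)$, unlike the cost $\approx n$ of using $u_i = 1/n$ everywhere.

```latex
\begin{aligned}
\text{average cost} &= \frac{1}{n}\sum_{i=1}^{n} \frac{1}{u_i},\\
\text{case 1: } u_i = \tfrac{1}{10}\ \forall i \;&\Rightarrow\; \frac{1}{n}\cdot n \cdot 10 = 10,\\
\text{case 2: } k \text{ indices with } u_i = \tfrac{1}{n},\ n-k \text{ with } u_i = 1 \;&\Rightarrow\; \frac{1}{n}\bigl(k\cdot n + (n-k)\cdot 1\bigr) = k + 1 - \frac{k}{n}.
\end{aligned}
```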

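More generally, with additive error $\varepsilon$ and multiplicative error $1+\varepsilon$: $S - \varepsilon < \tilde S < (1+\varepsilon)S + \varepsilon$, with average cost $O(\varepsilon^{-3}\log n)$.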

Page 7: Sublinear Algorithms via Precision Sampling

Precision Sampling Algorithm

Precision Sampling Lemma: with 90% success, one can get $O(1)$ additive error and $1.5$ multiplicative error: $S - O(1) < \tilde S < 1.5\,S + O(1)$, with average cost $O(\log n)$.

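Algorithm: choose each $u_i \in [0,1]$ i.i.d. uniformly at random.

Estimator: $\tilde S$ = number of $i$'s such that $\hat a_i / u_i > 6$ (modulo a normalization constant).

Proof of correctness:

we use only $\hat a_i$ that are $(1+\varepsilon)$-approximations to $a_i$ (an index counts only if $u_i < \hat a_i/6$, so its error is small relative to $\hat a_i$);

$\mathbb{E}[\tilde S] \approx \sum_i \Pr[a_i/u_i > 6] = \sum_i a_i/6$;

the average cost $\frac{1}{n}\sum_i 1/u_i$ is $O(\log n)$ w.h.p.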
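What follows is a minimal Python simulation of the basic $O(1)$-approximation algorithm, not the $(1+\varepsilon)$ refinement. The function name is hypothetical, and so is the adversary: it perturbs each $\hat a_i$ by a random amount strictly inside its allowance $u_i$.

The count is rescaled by 6 because, for $u_i$ uniform and $a_i \in [0,1]$, $\Pr[a_i/u_i > 6] = \Pr[u_i < a_i/6] = a_i/6$.

```python
import random


def precision_sampling_estimate(a, rng, threshold=6.0):
    """Basic estimator: threshold * #{i : a_hat_i / u_i > threshold}.

    `a` holds the true a_i in [0, 1]; they are used only to simulate a
    hypothetical adversary that may return any a_hat_i with |a_i - a_hat_i| < u_i.
    Returns (estimate of sum(a), average cost (1/n) * sum_i 1/u_i).
    """
    hits = 0
    total_cost = 0.0
    for a_i in a:
        u_i = 1.0 - rng.random()                       # u_i uniform on (0, 1], i.i.d.
        total_cost += 1.0 / u_i                        # precision u_i costs 1/u_i
        a_hat = a_i + rng.uniform(-0.99, 0.99) * u_i   # adversary stays strictly within u_i
        if a_hat / u_i > threshold:
            hits += 1
    # Pr[a_i / u_i > 6] = a_i / 6 for a_i in [0, 1], hence the rescaling by 6.
    return threshold * hits, total_cost / len(a)
```

For example, `precision_sampling_estimate([0.5] * 100_000, random.Random(0))` returns an estimate close to the true sum of 50000. The estimator reads each $\hat a_i$ only at the random precision $u_i$.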

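For the $(1+\varepsilon)$ version ($S - \varepsilon < \tilde S < (1+\varepsilon)S + \varepsilon$, with average cost $O(\varepsilon^{-3}\log n)$), two things change. The estimator is a function of $[\hat a_i/u_i - 4/\varepsilon]_+$ and the $u_i$'s. The concrete distribution of each $u_i$ is the minimum of $O(\varepsilon^{-3})$ uniform random variables.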

Page 8: Sublinear Algorithms via Precision Sampling

Why? Save time:

Problem: computing the edit distance between two strings. There is a new algorithm that obtains a $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time. It works via an efficient property-testing algorithm that uses Precision Sampling. More details: see the talk by Robi on Friday!

Save space:

Problem: compute norms/frequency moments in streams. Precision Sampling gives a simple and unified approach to compute all $\ell_p$ norms, $F_k$ moments, and other goodies. More details: now.

Page 9: Sublinear Algorithms via Precision Sampling

Streaming frequencies

Setup: $(1+\varepsilon)$-estimate frequency moments in small space. Let $x_i$ = frequency of ethnicity $i$; the $k$th moment is $F_k = \sum_i |x_i|^k$.

$k \in [0,2]$: space $O(1/\varepsilon^2)$ [AMS'96, I'00, GC07, Li08, NW10, KNW10, KNPW11]

$k > 2$: space $O(n^{1-2/k})$ [AMS'96, SS'02, BYJKS'02, CKS'03, IW'05, BGKS'06, BO10]

Sometimes the frequencies $x_i$ are negative, e.g., when measuring a traffic difference (delay, etc.). We want a linear “dimension reduction” $L: \mathbb{R}^n \to \mathbb{R}^m$ with $m \ll n$.

Ethnicity Frequency

Dacians 358

Galois 12

Barbarians 2988
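As a worked check of the definition $F_k = \sum_i |x_i|^k$, here are the first two moments of the toy frequency table above. The arithmetic is ours, not from the slides.

```latex
\begin{aligned}
F_1 &= 358 + 12 + 2988 = 3358,\\
F_2 &= 358^2 + 12^2 + 2988^2 = 128164 + 144 + 8928144 = 9056452.
\end{aligned}
```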

Page 10: Sublinear Algorithms via Precision Sampling

Norm Estimation via Precision Sampling

Idea: use PSL to compute the sum $\|x\|_k^k = \sum_i |x_i|^k$.

General approach:

1. Pick the $u_i$'s according to PSL and let $y_i = x_i / u_i^{1/k}$.

2. Compute all $y_i^k$ up to additive approximation $O(1)$. This can be done by computing the heavy hitters of the vector $y$.

3. Use PSL to compute the sum $\|x\|_k^k = \sum_i |x_i|^k$.

The space bound is controlled by the norm $\|y\|_2$, since heavy hitters under $\ell_2$ is the best we can do. Note that $\|y\|_2 \le \|x\|_2 \cdot \mathbb{E}[1/u_i]$.
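The three steps above can be sketched in Python under two assumptions. First, $|x_i| \le 1$, so that each $a_i = |x_i|^k$ lies in $[0,1]$ as PSL requires. Second, step 2 is idealized: $y$ is computed exactly rather than recovered by a heavy-hitters sketch.

The key identity is $|y_i|^k = |x_i|^k / u_i$, which is exactly the PSL ratio $a_i/u_i$. The function name is hypothetical.

```python
import random


def norm_k_power_via_psl(x, k, rng, threshold=6.0):
    """Estimate ||x||_k^k = sum_i |x_i|^k by precision sampling on a_i = |x_i|^k.

    Assumes |x_i| <= 1 so that every a_i lies in [0, 1]. Step 2 is idealized:
    y is computed exactly instead of being recovered by a heavy-hitters sketch.
    """
    hits = 0
    for x_i in x:
        u_i = 1.0 - rng.random()          # step 1: u_i uniform on (0, 1]
        y_i = x_i / u_i ** (1.0 / k)      # y_i = x_i / u_i^(1/k)
        # |y_i|^k = |x_i|^k / u_i, i.e. the PSL ratio a_i / u_i
        if abs(y_i) ** k > threshold:     # step 3: PSL count on a_i = |x_i|^k
            hits += 1
    return threshold * hits               # normalization: Pr[a_i / u_i > 6] = a_i / 6
```

Signs of $x_i$ do not matter here, because only $|y_i|^k$ enters the count.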

Page 11: Sublinear Algorithms via Precision Sampling

Streaming $F_k$ moments

Theorem: there is a linear sketch for $F_k$ with $O(1)$ approximation, $O(1)$ update time, and $O(n^{1-2/k}\log n)$ space (in words).

Sketch: pick random $u_i \in [0,1]$ and $s_i \in \{\pm 1\}$, and let $y_i = s_i \cdot x_i / u_i^{1/k}$. Throw the $y_i$'s into one hash table $H$ of size $m = O(n^{1-2/k}\log n)$ cells.

Update: on $(i, a)$, $H[h(i)] \mathrel{+}= s_i \cdot a / u_i^{1/k}$.

Estimator: $\max_{j \in [m]} |H[j]|^k$.

Randomness: $O(1)$-wise independence suffices.

[Figure: $x = (x_1, \dots, x_6)$ hashed into $H$, whose cells hold $y_1 + y_3$, $y_4$, and $y_2 + y_5 + y_6$.]
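Below is a minimal Python sketch of the single-hash-table $F_k$ sketch. It makes several assumptions that are not from the slides:

- The class name is hypothetical.
- $h$, $s$ and $u$ are fully random and stored explicitly, which takes $O(n)$ space. A space-efficient version would instead use $O(1)$-wise independent hash functions, which the slide says suffice.
- The table size $m$ is picked by hand.

Our gloss on why the max works: for uniform $u_i$ and $t \ge \max_i |x_i|^k$, $\Pr[\max_i |x_i|^k/u_i \le t] = \prod_i (1 - |x_i|^k/t) \approx e^{-F_k/t}$. So the largest $|y_i|^k$ is within a constant factor of $F_k$ with constant probability.

```python
import random


class FkSketch:
    """Single-hash-table F_k sketch (illustrative; names are hypothetical).

    h, s and u are fully random and stored explicitly (O(n) space) for
    simplicity; the slide notes that O(1)-wise independent hashing suffices.
    """

    def __init__(self, n, k, m, rng):
        self.k = k
        self.h = [rng.randrange(m) for _ in range(n)]         # cell h(i) of item i
        self.s = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # random sign s_i
        # u_i uniform on (0, 1]; store the multiplier 1 / u_i^(1/k)
        self.scale = [(1.0 - rng.random()) ** (-1.0 / k) for _ in range(n)]
        self.H = [0.0] * m

    def update(self, i, a):
        """Stream update (i, a): H[h(i)] += s_i * a / u_i^(1/k); a may be negative."""
        self.H[self.h[i]] += self.s[i] * a * self.scale[i]

    def estimate(self):
        """Estimator: max_j |H[j]|^k."""
        return max(abs(c) for c in self.H) ** self.k
```

Each update touches a single cell, and $H$ is a linear function of $x$. An update $(i, a)$ followed by $(i, -a)$ therefore restores the table exactly, which is what makes negative frequencies work.

A single sketch is only correct with constant probability. Taking a median over independent copies is a standard boost, and it is our addition rather than part of the slide.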

Page 12: Sublinear Algorithms via Precision Sampling

More Streaming Algorithms

Other streaming algorithms:

Algorithm for all $k$-moments, including $k \le 2$. For $k > 2$, it improves existing space bounds [AMS96, IW05, BGKS06, BO10]. For $k \le 2$, its space bounds are worse [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11].

Improved algorithm for mixed norms ($\ell_p$ of $\ell_k$) [CM05, GBD08, JW09]: space is bounded by the (Rademacher) $p$-type constant.

Algorithm for the $\ell_p$-sampling problem [MW'10]. This work was extended to give tight bounds by [JST'11].

Connections: inspired by the streaming algorithm of [IW05], but simpler. It turns out to be a distant relative of Priority Sampling [DLT'07].

Page 13: Sublinear Algorithms via Precision Sampling

Finale

Other applications for the Precision Sampling framework?

Better algorithms for precision sampling?

Best bound for average cost (for $1+\varepsilon$ approximation): upper bound $O(\varepsilon^{-3}\log n)$ (tight for our algorithm); lower bound $\Omega(\varepsilon^{-2}\log n)$.

Bounds for other cost models? E.g., when precision $u_i$ costs $1/\sqrt{u_i}$, the bound is $O(\varepsilon^{-3/2})$.

Other forms of “access” to the $a_i$'s?