Fast Moment Estimation in Data Streams in Optimal Space Daniel Kane, Jelani Nelson, Ely Porat, David Woodruff Harvard MIT Bar-Ilan IBM

Mar 26, 2015

Page 1:

Fast Moment Estimation in Data Streams in Optimal Space

Daniel Kane, Jelani Nelson, Ely Porat, David Woodruff

Harvard MIT Bar-Ilan IBM

Page 2:

$l_p$-estimation: Problem Statement

• Model

– $x = (x_1, x_2, \ldots, x_n)$ starts off as $0^n$

– Stream of $m$ updates $(j_1, v_1), \ldots, (j_m, v_m)$

– Update $(j, v)$ causes change $x_j \leftarrow x_j + v$

– $v \in \{-M, -M+1, \ldots, M\}$

• Problem

– Output $l_p = \sum_{j=1}^{n} |x_j|^p = |x|_p^p$

– Want small space and fast update time

– For simplicity: $n$, $m$, $M$ are polynomially related
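The update model above can be made concrete with a small, non-streaming baseline. This is only a sketch for reference: it stores all of $x$, which is exactly the linear space the talk wants to avoid. The function name `exact_moment` is an illustrative choice, not something from the slides.

```python
from collections import defaultdict

def exact_moment(updates, p):
    """Apply updates (j, v) to x = 0^n and return sum_j |x_j|^p exactly.

    Uses space linear in the number of touched coordinates, so it is a
    correctness baseline, not a streaming algorithm.
    """
    x = defaultdict(int)              # x starts off as the all-zeros vector
    for j, v in updates:
        x[j] += v                     # update (j, v): x_j <- x_j + v
    return sum(abs(val) ** p for val in x.values() if val != 0)

stream = [(1, 3), (2, -4), (1, -1), (3, 5), (3, -5)]
print(exact_moment(stream, 1.0))      # 6.0, since x = (2, -4, 0)
```

Negative updates are allowed, so a coordinate can return to zero (as coordinate 3 does here) and then contributes nothing.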

Page 3:

Some Bad News

• Alon, Matias, and Szegedy

– No sublinear space algorithms unless we allow both

   • Approximation (allow output to be $(1 \pm \varepsilon)\, l_p$)

   • Randomization (allow 1% failure probability)

• New goal

– Output $(1 \pm \varepsilon)\, l_p$ with probability 99%

Page 4:

Some More Bad News

• Estimating $l_p$ for $p > 2$ in a stream requires $n^{1-2/p}$ space [AMS, IW, SS]

• We focus on the “feasible” regime, when $p \in (0,2)$

• $p = 0$ and $p = 2$ are well understood

– $p = 0$ is the number of distinct elements

– $p = 2$ is the Euclidean norm

Page 5:

Applications for $p \in [1,2)$

• $l_p$-norm for $p \in [1,2)$ is less sensitive to outliers

– Nearest neighbor

– Regression

– Subspace approximation

[Figure: query point $a \in \mathbb{R}^d$ and database points $b_1, b_2, \ldots, b_n$]

• Want $\operatorname{argmin}_j |a - b_j|_p$

• Less likely to be spoiled by noise in each coordinate

• Can quickly replace $d$-dimensional points with small sketches

Page 6:

Applications for $p \in (0,1)$

• Best entropy estimation in a stream [HNO]

– Empirical entropy $= \sum_j q_j \log(1/q_j)$, where $q_j = |x_j| / |x|_1$

– Estimates $|x|_p$ for $O(\log 1/\varepsilon)$ different $p \in (0,1)$

– Interpolates a polynomial through these values to estimate entropy

– Entropy used for detecting DoS attacks, etc.
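For reference, the quantity being estimated can be computed directly when $x$ is available in full. The sketch below evaluates the empirical entropy formula from the slide exactly; it does not implement the [HNO] interpolation approach. The function name and the use of the natural logarithm are assumptions made for illustration.

```python
import math

def empirical_entropy(x):
    """Return sum_j q_j * log(1/q_j) with q_j = |x_j| / |x|_1 (natural log assumed)."""
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        raise ValueError("entropy is undefined for the all-zeros vector")
    # Zero coordinates have q_j = 0 and contribute nothing to the sum.
    return sum((abs(v) / l1) * math.log(l1 / abs(v)) for v in x if v != 0)

print(empirical_entropy([1, 1, 1, 1]))   # uniform over 4 items: log(4)
```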

Page 7:

Previous Work for $p \in (0,2)$

• Lots of players

– FKSV, I, KNW, GC, NW, AOK

• Tradeoffs possible

– Can get optimal $\varepsilon^{-2} \log n$ bits of space, but then the update time is at least $1/\varepsilon^2$

– BIG difference in practice between $\varepsilon^{-2}$ update time and $O(1)$ (e.g., AMS vs. TZ for $p = 2$)

– No previously known way to get close to optimal space with less than poly(1/ε) update time

Page 8:

Our Results

• For every $p \in (0,2)$

– Estimate $l_p$ with optimal $\varepsilon^{-2} \log n$ bits of space

– $\log^2(1/\varepsilon) \log\log(1/\varepsilon)$ update time

– Exponential improvement over previous update time

• For entropy

– Exponential improvement over previous update time (polylog(1/ε) versus poly(1/ε))

Page 9:

Our Algorithm

• Split coordinates into “head” and “tail”

– $j \in$ “head” if $|x_j|^p \ge \varepsilon^2 |x|_p^p$

– $j \in$ “tail” if $|x_j|^p < \varepsilon^2 |x|_p^p$

• Estimate the two terms of $|x|_p^p = |x_{\mathrm{head}}|_p^p + |x_{\mathrm{tail}}|_p^p$ separately

• Two completely different procedures
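The head/tail definition can be illustrated offline, assuming the whole vector $x$ is in hand (the streaming algorithm, of course, is not). The helper name `split_head_tail` is hypothetical.

```python
def split_head_tail(x, p, eps):
    """Partition coordinate indices by whether |x_j|^p >= eps^2 * |x|_p^p."""
    total = sum(abs(v) ** p for v in x)          # |x|_p^p
    threshold = eps ** 2 * total
    head = [j for j, v in enumerate(x) if abs(v) ** p >= threshold]
    tail = [j for j, v in enumerate(x) if abs(v) ** p < threshold]
    return head, tail

head, tail = split_head_tail([100, -1, 2, 0, 1], p=1.0, eps=0.5)
print(head, tail)    # [0] [1, 2, 3, 4]
```

Each head coordinate accounts for at least an $\varepsilon^2$ fraction of $|x|_p^p$, so for a nonzero $x$ there can be at most $1/\varepsilon^2$ of them.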

Page 10:

Outline

• Estimating $|x_{\mathrm{head}}|_p^p$

• Estimating $|x_{\mathrm{tail}}|_p^p$

• Putting it all together

Page 11:

Simplifications

• We can assume we know the set of “head” coordinates, as well as their signs

– Can be found using known algorithms [CountSketch]

Challenge

• Need $\sum_{j \in \mathrm{head}} |x_j|^p$

Page 12:

Estimating $|x_{\mathrm{head}}|_p^p$

[Figure: a table with $\log(1/\varepsilon)$ rows and $1/\varepsilon^2$ columns; each coordinate $x_j$ is hashed into the table]

• Hash each coordinate to a unique column in each row

• We DO NOT

– maintain the sum of values in each cell

– maintain the inner product of values in a cell with a random sign vector

• Key idea: for each cell $c$, if $S$ is the set of items hashed to $c$, let

$V(c) = \sum_{j \in S} x_j \cdot \exp(2\pi i\, h(j)/r)$

where $r$ is a parameter and $i = \sqrt{-1}$
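A minimal sketch of how such a table could be maintained under updates follows. Several details are assumptions for illustration rather than the authors' construction: the class name, drawing an independent $h$ per row, and storing the hash values lazily in dictionaries (a real implementation would use compact hash families, since dictionaries defeat the space bound).

```python
import cmath
import random

TWO_PI_I = 2 * cmath.pi * 1j

class RootOfUnitySketch:
    """Table of cells V(c) = sum over items j hashed to c of x_j * exp(2*pi*i*h(j)/r)."""

    def __init__(self, rows, cols, r, seed=0):
        self.rows, self.cols, self.r = rows, cols, r
        self.rng = random.Random(seed)
        self.cells = [[0j] * cols for _ in range(rows)]
        # Illustrative stand-ins for hash functions: values drawn on first use.
        self.col_of = [dict() for _ in range(rows)]
        self.h = [dict() for _ in range(rows)]

    def _hashes(self, row, j):
        if j not in self.col_of[row]:
            self.col_of[row][j] = self.rng.randrange(self.cols)
            self.h[row][j] = self.rng.randrange(self.r)
        return self.col_of[row][j], self.h[row][j]

    def update(self, j, v):
        """Process stream update (j, v): add v times a random r-th root of unity."""
        for row in range(self.rows):
            col, hj = self._hashes(row, j)
            self.cells[row][col] += v * cmath.exp(TWO_PI_I * hj / self.r)

sketch = RootOfUnitySketch(rows=3, cols=8, r=5, seed=1)
sketch.update(7, 2)       # exactly one cell per row now has magnitude 2
```

Because each cell is linear in $x$, a later update `(7, -2)` cancels the first one and returns every cell to zero.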

Page 13:

Our Algorithm

To estimate $|x_{\mathrm{head}}|_p^p$

– For each $j$ in the head, find an arbitrary cell $c(j)$ containing $j$ and no other head coordinates

– Compute $y_j = \operatorname{sign}(x_j) \cdot \exp(-2\pi i\, h(j)/r) \cdot V(c)$

   • Recall $V(c) = \sum_{j \in S} x_j \cdot \exp(2\pi i\, h(j)/r)$

– Expected value of $y_j$ is $|x_j|$

– What can we say about $y_j^p$?

– What does it mean?
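The claim that $y_j$ has expectation $|x_j|$ can be checked numerically in a toy case: one head item $j$ sharing its cell with a single other item $j'$. The sketch assumes $h(j')$ is uniform on $\{0, \ldots, r-1\}$ and averages over all $r$ values exactly; the function names are illustrative.

```python
import cmath

def omega(h, r):
    """The r-th root of unity exp(2*pi*i*h/r)."""
    return cmath.exp(2j * cmath.pi * h / r)

def recover(sign_xj, hj, r, cell_value):
    """y_j = sign(x_j) * exp(-2*pi*i*h(j)/r) * V(c)."""
    return sign_xj * omega(-hj, r) * cell_value

def mean_recovery(xj, hj, x_other, r):
    """Average y_j over the r equally likely values of h(j') for one colliding item."""
    sign = 1 if xj > 0 else -1
    total = 0j
    for h_other in range(r):
        cell = xj * omega(hj, r) + x_other * omega(h_other, r)   # V(c)
        total += recover(sign, hj, r, cell)
    return total / r

print(mean_recovery(xj=-3, hj=2, x_other=0.5, r=7))   # approximately 3 = |x_j|
```

The contribution of $j'$ is $x_{j'}$ times a uniformly random root of unity, which averages to zero, leaving $|x_j|$.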

Page 14:

Our Algorithm

• Recall $y_j = \operatorname{sign}(x_j) \cdot \exp(-2\pi i\, h(j)/r) \cdot V(c)$

• What is $y_j^{1/2}$ if $y_j = -4$?

– $-4 = 4 \exp(\pi i)$

– $(-4)^{1/2} = 2 \exp(\pi i/2) = 2i$ or $2 \exp(-\pi i/2) = -2i$

• By $y_j^p$ we mean $|y_j|^p \exp(i\, p \arg(y_j))$, where $\arg(y_j) \in (-\pi, \pi]$ is the angle of $y_j$ in the complex plane
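The convention above picks one branch of the complex power. A short sketch of it, using Python's `cmath.phase` for the angle (the function name `principal_power` is an illustrative choice):

```python
import cmath

def principal_power(y, p):
    """Return |y|^p * exp(i * p * arg(y)) with arg(y) taken as the principal angle."""
    y = complex(y)
    if y == 0:
        return 0j
    # cmath.phase(-4 + 0j) is pi; only a negative-zero imaginary part gives -pi.
    return abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))

print(principal_power(-4, 0.5))   # approximately 2i, the branch chosen on the slide
```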

Page 15:

Our Algorithm

Wishful thinking

• Estimator $= \sum_{j \in \mathrm{head}} y_j^p$

• Intuitively, when $p = 1$, since $E[y_j] = |x_j|$ we have an unbiased estimator

• For general $p$, this may be complex, so how about Estimator $= \mathrm{Re}\left[\sum_{j \in \mathrm{head}} y_j^p\right]$?

• Almost correct, but we want optimal space, and we’re ignoring most of the cells

• Better: $y_j = \operatorname{Mean}_{\text{cells } c \text{ isolating } j}\ \operatorname{sign}(x_j) \cdot \exp(-2\pi i\, h(j)/r) \cdot V(c)$
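Putting the pieces of this slide together, the estimator can be sketched as follows. The input format (a dictionary from each head coordinate to the list of values recovered from its isolating cells) and the function name are assumptions for illustration; how those per-cell values are obtained is outside this snippet.

```python
import cmath

def head_estimate(cell_recoveries, p):
    """Return Re[ sum_j y_j^p ], where y_j is the mean of j's per-cell recovered values.

    cell_recoveries: dict mapping head coordinate j to a non-empty list of
    complex values sign(x_j) * exp(-2*pi*i*h(j)/r) * V(c), one per isolating cell c.
    """
    total = 0j
    for values in cell_recoveries.values():
        y = sum(values) / len(values)            # mean over cells isolating j
        if y != 0:
            # Principal branch: |y|^p * exp(i * p * arg(y))
            total += abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))
    return total.real

# Two head coordinates whose cell noise cancels on average: estimate is 4^0.5 + 9^0.5.
print(head_estimate({1: [4 + 0j, 4 + 0j], 2: [9 + 1j, 9 - 1j]}, 0.5))   # approximately 5
```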

Page 16:

Analysis

• Why did we use roots of unity?

• Estimator is the real part of $\sum_{j \in \mathrm{head}} y_j^p$

• $\sum_{j \in \mathrm{head}} y_j^p = \sum_{j \in \mathrm{head}} |x_j|^p \cdot (1+z_j)^p$ for $z_j = (y_j - |x_j|)/|x_j|$

• Can apply the generalized binomial theorem

• $E[|x_j|^p (1+z_j)^p] = |x_j|^p \cdot \sum_{k=0}^{\infty} \binom{p}{k} E[z_j^k] = |x_j|^p + \text{small}$, since $E[z_j^k] = 0$ if $0 < k < r$

• Generalized binomial coefficient $\binom{p}{k} = p \cdot (p-1) \cdots (p-k+1)/k! = O(1/k^{1+p})$

• Intuitively, variance is small because head coordinates don’t collide
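The simplest instance of the vanishing-moment property can be verified directly: if $\omega$ is a uniformly random $r$-th root of unity, then $E[\omega^k] = 0$ for $0 < k < r$. The sketch below checks this and also evaluates the generalized binomial coefficient from its product formula. Both function names are illustrative, and the full $z_j$ in the analysis is a sum of several such terms, which this snippet does not model.

```python
import cmath

def mean_kth_power_of_root(k, r):
    """E[omega^k] for omega uniform over the r-th roots of unity."""
    return sum(cmath.exp(2j * cmath.pi * h * k / r) for h in range(r)) / r

def gen_binom(p, k):
    """Generalized binomial coefficient p(p-1)...(p-k+1)/k! for real p."""
    out = 1.0
    for i in range(k):
        out *= (p - i) / (i + 1)
    return out

r = 8
print([round(abs(mean_kth_power_of_root(k, r)), 12) for k in range(r + 1)])
print(gen_binom(0.5, 2))    # -0.125
```

Only $k = 0$ and multiples of $r$ survive the averaging, so the first nonzero correction term in the binomial series appears at $k = r$, where the coefficient is already small.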

Page 17:

Outline

• Estimating $|x_{\mathrm{head}}|_p^p$

• Estimating $|x_{\mathrm{tail}}|_p^p$

• Putting it all together

Page 18:

Our Algorithm

Estimating $|x_{\mathrm{tail}}|_p^p$

[Figure: coordinates $x_j$ hashed into buckets, with $x(b)$ the part of $x$ landing in bucket $b$]

• In each bucket $b$ maintain an unbiased estimator of the $p$-th power of the $p$-norm $|x(b)|_p^p$ in the bucket [Li]

• If $Z_1, \ldots, Z_s$ are $p$-stable, then for any vector $a = (a_1, \ldots, a_s)$, $\sum_{j=1}^{s} Z_j \cdot a_j \sim |a|_p Z$, for $Z$ also $p$-stable

• Add up estimators in all buckets not containing a head coordinate (variance is small)
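The stability property can be tried out numerically. The sketch below samples symmetric $p$-stable variables with the Chambers-Mallows-Stuck formula and, for $p = 1$ (Cauchy), estimates $|a|_1$ as the median of $|\sum_j Z_j a_j|$ over independent repetitions. Note the swap: this median estimator is a simpler stand-in, not Li's estimator used in the talk, and it uses fully independent variables rather than limited independence. All function names are illustrative.

```python
import math
import random
import statistics

def symmetric_stable(theta, w, p):
    """Chambers-Mallows-Stuck map: theta in (-pi/2, pi/2), w > 0 -> symmetric p-stable."""
    return (math.sin(p * theta) / math.cos(theta) ** (1 / p)) * \
           (math.cos((1 - p) * theta) / w) ** ((1 - p) / p)

def sample_stable(rng, p):
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)      # Exp(1); w == 0 has negligible probability
    return symmetric_stable(theta, w, p)

def l1_estimate_by_median(a, reps, rng):
    """Estimate |a|_1: each sum_j Z_j * a_j is distributed as |a|_1 * Cauchy,
    and the median of |Cauchy| is 1."""
    sketches = [sum(aj * sample_stable(rng, 1.0) for aj in a) for _ in range(reps)]
    return statistics.median(abs(s) for s in sketches)

a = [3, -4, 1, 2]                 # |a|_1 = 10
print(l1_estimate_by_median(a, reps=2001, rng=random.Random(0)))
```

For $p = 1$ the sampling formula reduces to $\tan\theta$, the standard Cauchy distribution.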

Page 19:

Outline

• Estimating $|x_{\mathrm{head}}|_p^p$

• Estimating $|x_{\mathrm{tail}}|_p^p$

• Putting it all together

Page 20:

Complexity

Bag of tricks

Example

• For optimal space, in buckets in the light (tail) estimator, we prove $1/\varepsilon^p$-wise independent $p$-stable variables suffice

– Rewrite Li’s estimator so that [KNW] can be applied

• Need to evaluate a degree-$1/\varepsilon^p$ polynomial per update

• Instead: batch $1/\varepsilon^p$ updates together and do fast multipoint evaluation

– Can be deamortized

– Use that different buckets are pairwise independent
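To see where the per-update polynomial evaluation comes from: a standard way to obtain $k$-wise independent hash values is to evaluate a random polynomial of degree $k-1$ over a prime field. The sketch below shows that naive per-update evaluation with Horner's rule, which costs $k$ multiplications each time. It does not implement the batched fast multipoint evaluation mentioned on the slide, and the prime and function names are illustrative choices.

```python
import random

PRIME = (1 << 61) - 1    # a Mersenne prime, used here as the field size

def make_kwise_hash(k, rng):
    """Return (h, coeffs), where h evaluates a random degree-(k-1) polynomial mod PRIME."""
    coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:                 # Horner's rule: k multiply-adds per call
            acc = (acc * x + c) % PRIME
        return acc

    return h, coeffs

h, coeffs = make_kwise_hash(k=4, rng=random.Random(0))
print(h(12345))
```

With $k = 1/\varepsilon^p$ this direct approach costs $1/\varepsilon^p$ operations per update, which is what batching the updates is meant to avoid.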

Page 21:

Complexity

Example #2

• Finding head coordinates requires $\varepsilon^{-2} \log^2 n$ space

• Reduce the universe size to poly(1/ε) by hashing

• Now requires $\varepsilon^{-2} \log n \log(1/\varepsilon)$ space

• Replace ε with $\varepsilon \log^{1/2}(1/\varepsilon)$

• Head estimator okay, but slightly adjust light estimator
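One way to read the substitution step, offered as a sketch rather than the authors' exact accounting: write $\varepsilon' = \varepsilon \log^{1/2}(1/\varepsilon)$ and assume $\varepsilon \le 1/e$ with natural logarithms, so that $\log(1/\varepsilon) \ge 1$ and hence $\varepsilon' \ge \varepsilon$. Running the head-finding structure with $\varepsilon'$ in place of $\varepsilon$ then fits in the optimal space bound:

```latex
\[
\varepsilon'^{-2} \log n \,\log(1/\varepsilon')
  = \frac{\varepsilon^{-2} \log n}{\log(1/\varepsilon)} \cdot \log(1/\varepsilon')
  \le \varepsilon^{-2} \log n ,
\]
```

where the last step uses $\log(1/\varepsilon') \le \log(1/\varepsilon)$, which follows from $\varepsilon' \ge \varepsilon$. The price is a coarser head threshold, which is consistent with the slide's remark that the light estimator needs a slight adjustment.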

Page 22:

Conclusion

• For every $p \in (0,2)$

– Estimate $l_p$ with optimal $\varepsilon^{-2} \log n$ bits of space

– $\log^2(1/\varepsilon) \log\log(1/\varepsilon)$ update time

– Exponential improvement over previous update time

• For entropy

– Exponential improvement over previous update time (polylog(1/ε) versus poly(1/ε))