Fast Moment Estimation in Data Streams in Optimal Space
Daniel Kane, Jelani Nelson, Ely Porat, David Woodruff
Harvard, MIT, Bar-Ilan, IBM
Mar 26, 2015
lp-estimation: Problem Statement
• Model
  – x = (x1, x2, …, xn) starts off as 0^n
  – Stream of m updates (j1, v1), …, (jm, vm)
  – Update (j, v) causes change xj ← xj + v
  – v ∈ {-M, -M+1, …, M}
• Problem
  – Output lp = Σ_{j=1}^n |xj|^p = |x|_p^p
  – Want small space and fast update time
  – For simplicity: n, m, M are polynomially related
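The model above has a trivial linear-space reference implementation; a minimal Python sketch, useful only as a correctness baseline for the streaming estimators that follow (all names illustrative):

```python
def exact_lp(updates, n, p):
    """Apply a stream of (j, v) updates to x = 0^n, then return lp = sum_j |x_j|^p."""
    x = [0] * n
    for j, v in updates:
        x[j] += v  # update (j, v) causes x_j <- x_j + v
    return sum(abs(xj) ** p for xj in x)

# Example: x ends up as (4, 0, -5, 0)
print(exact_lp([(0, 3), (2, -5), (0, 1)], n=4, p=1.5))  # 4^1.5 + 5^1.5
```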
Some Bad News
• Alon, Matias, and Szegedy
  – No sublinear-space algorithms unless we allow:
    • Approximation (output may be (1±ε) lp)
    • Randomization (allow 1% failure probability)
• New goal
  – Output (1±ε) lp with probability 99%
Some More Bad News
• Estimating lp for p > 2 in a stream requires n^{1-2/p} space [AMS, IW, SS]
• We focus on the “feasible” regime, p ∈ (0,2)
• p = 0 and p = 2 are well understood
  – p = 0 is the number of distinct elements
  – p = 2 is the squared Euclidean norm
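For p = 2, the classic AMS sketch maintains inner products of x with random sign vectors; squaring one gives an unbiased estimate of |x|_2^2. A simplified sketch (full independence here, though 4-wise independent signs suffice; parameters illustrative):

```python
import random

def ams_f2(updates, n, reps=500, seed=0):
    """Keep Z_i = <s_i, x> for 'reps' random sign vectors s_i; each Z_i^2 is an
    unbiased estimator of |x|_2^2 = sum_j x_j^2, so return their average."""
    rng = random.Random(seed)
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(reps)]
    z = [0] * reps
    for j, v in updates:
        for i in range(reps):
            z[i] += signs[i][j] * v  # each counter updated per stream item
    return sum(zi * zi for zi in z) / reps

print(ams_f2([(0, 3), (1, -4)], n=2))  # close to |x|_2^2 = 25
```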
Applications for p ∈ [1,2)
• lp-norm for p ∈ [1,2) is less sensitive to outliers
  – Nearest neighbor
  – Regression
  – Subspace approximation
• Query point a ∈ R^d, database points b1, b2, …, bn
• Want argmin_j |a - bj|_p
• Less likely to be spoiled by noise in each coordinate
Can quickly replace d-dimensional points with small sketches
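A toy illustration of the outlier claim (numbers made up): a database point identical to the query except for noise in one coordinate is ranked nearer under l1, while under l2 the single noisy coordinate dominates and flips the ranking.

```python
def dist(a, b, p):
    """lp distance |a - b|_p."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

a       = [0.0] * 10
close   = [0.5] * 10             # moderately far in every coordinate
spoiled = [0.0] * 9 + [3.0]      # equals a except noise in one coordinate

print(dist(a, spoiled, 1) < dist(a, close, 1))  # l1 prefers the noisy match
print(dist(a, close, 2) < dist(a, spoiled, 2))  # l2 is spoiled by the outlier
```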
Applications for p ∈ (0,1)
• Best entropy estimation in a stream [HNO]
  – Empirical entropy = Σ_j qj log(1/qj), where qj = |xj| / |x|_1
  – Estimates |x|_p for O(log 1/ε) different p ∈ (0,1)
– Interpolates a polynomial through these values to estimate entropy
– Entropy used for detecting DoS attacks, etc.
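To see why moments at p near 1 determine the entropy: with F(p) = Σ_j qj^p, we have dF/dp at p = 1 equal to Σ_j qj ln qj = -H. A toy sketch using exact moments and a one-sided finite difference in place of [HNO]'s polynomial interpolation (delta is an illustrative parameter, not from the paper):

```python
import math

def freq_moment(q, p):
    """F(p) = sum_j q_j^p for a probability vector q, q_j = |x_j| / |x|_1."""
    return sum(qj ** p for qj in q)

def entropy_from_moments(q, delta=1e-4):
    """H (in nats) = -dF/dp at p = 1; approximate the derivative with a moment
    at p = 1 - delta, using F(1) = 1."""
    return (freq_moment(q, 1 - delta) - 1) / delta

q = [0.5, 0.25, 0.25]
print(entropy_from_moments(q), 1.5 * math.log(2))  # estimate vs exact entropy
```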
Previous Work for p ∈ (0,2)
• Lots of players
  – FKSV, I, KNW, GC, NW, AOK
• Tradeoffs possible
  – Can get optimal ε^{-2} log n bits of space, but then the update time is at least 1/ε^2
  – BIG difference in practice between ε^{-2} update time and O(1) (e.g., AMS vs. TZ for p = 2)
  – No known way to get close to optimal space with less than poly(1/ε) update time
Our Results
• For every p ∈ (0,2)
  – estimate lp with optimal ε^{-2} log n bits of space
  – log^2(1/ε) · log log(1/ε) update time
  – exponential improvement over previous update time
• For entropy
  – exponential improvement over previous update time (polylog(1/ε) versus poly(1/ε))
Our Algorithm
• Split coordinates into “head” and “tail”
  – j ∈ “head” if |xj|^p ≥ ε^2 |x|_p^p
  – j ∈ “tail” if |xj|^p < ε^2 |x|_p^p
• Estimate |x|_p^p = |x_head|_p^p + |x_tail|_p^p, handling the two parts separately
• Two completely different procedures
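An offline illustration of the split (the streaming algorithm of course never sees x explicitly; the threshold is the one defined above):

```python
def split_lp(x, p, eps):
    """Partition coordinates by |x_j|^p >= eps^2 * |x|_p^p; return both sums."""
    total = sum(abs(xj) ** p for xj in x)
    head = sum(abs(xj) ** p for xj in x if abs(xj) ** p >= eps * eps * total)
    return head, total - head

# |x|_1 = 23, threshold eps^2 * 23 = 2.07: head = {10, 9}, tail = the four 1s
print(split_lp([10, 9, 1, 1, 1, 1], p=1, eps=0.3))
```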
Outline
• Estimating |x_head|_p^p
• Estimating |x_tail|_p^p
• Putting it all together
Simplifications
• We can assume we know the set of “head” coordinates, as well as their signs
  – Can be found using known algorithms [CountSketch]
Challenge
• Need Σ_{j ∈ head} |xj|^p
Estimating |x_head|_p^p
• Hash table with log(1/ε) rows and 1/ε^2 columns
• Hash each coordinate xj to a unique column in each row
• We DO NOT maintain the sum of the values in each cell
• We DO NOT maintain the inner product of the values in a cell with a random sign vector
• Key idea: for each cell c, if S is the set of items hashed to c, let
  V(c) = Σ_{j ∈ S} xj · exp(2πi h(j)/r)
  where r is a parameter and i = sqrt(-1)
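A direct transcription of the cell contents; with a single item in the cell, multiplying by sign(xj) · exp(-2πi h(j)/r) recovers |xj| exactly:

```python
import cmath

def cell_value(items, r):
    """V(c) = sum over (x_j, h(j)) pairs hashed to this cell of
    x_j * exp(2*pi*i*h(j)/r), with h(j) in {0, ..., r-1}."""
    return sum(x * cmath.exp(2j * cmath.pi * h / r) for x, h in items)

v = cell_value([(-5, 3)], r=8)                  # single item: x_j = -5, h(j) = 3
y = -1 * cmath.exp(-2j * cmath.pi * 3 / 8) * v  # sign(x_j) * exp(-2 pi i h(j)/r) * V(c)
print(y)  # recovers |x_j| = 5 (up to floating-point error)
```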
Our Algorithm
• To estimate |x_head|_p^p
  – For each j in the head, find an arbitrary cell c(j) containing j and no other head coordinates
  – Compute yj = sign(xj) · exp(-2πi h(j)/r) · V(c(j))
    • Recall V(c) = Σ_{j ∈ S} xj · exp(2πi h(j)/r)
  – Expected value of yj is |xj|
  – What can we say about yj^p? And what does yj^p even mean for complex yj?
Our Algorithm
• Recall yj = sign(xj) · exp(-2πi h(j)/r) · V(c(j))
• What is yj^{1/2} if yj = -4?
  – -4 = 4 exp(πi)
  – (-4)^{1/2} = 2 exp(πi/2) = 2i, or 2 exp(-πi/2) = -2i
• By yj^p we mean |yj|^p exp(i p arg(yj)), where arg(yj) ∈ (-π, π] is the angle of yj in the complex plane
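This principal-branch power is a one-liner with cmath, and the (-4)^{1/2} example above comes out as 2i:

```python
import cmath

def complex_pow(y, p):
    """y^p := |y|^p * exp(i * p * arg(y)), with arg(y) in (-pi, pi]."""
    return abs(y) ** p * cmath.exp(1j * p * cmath.phase(y))

print(complex_pow(-4, 0.5))  # arg(-4) = pi, so the result is 2 exp(i pi/2) = 2i
```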
Our Algorithm
• Wishful thinking
  – Estimator = Σ_{j ∈ head} yj^p
• Intuitively, when p = 1 this is unbiased, since E[yj] = |xj|
• For general p this may be complex, so how about Estimator = Re[Σ_{j ∈ head} yj^p]?
• Almost correct, but we want optimal space, and we’re ignoring most of the cells
• Better: yj = Mean over cells c isolating j of sign(xj) · exp(-2πi h(j)/r) · V(c)
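Putting the pieces together, a simulation of the head estimator (fully random hash functions instead of the limited independence the paper uses; the grid dimensions and r are illustrative stand-ins for the log(1/ε) × 1/ε^2 table; the head set and signs are assumed known, per the Simplifications slide):

```python
import cmath
import random

def head_estimate(x, head, p, rows=20, cols=50, r=16, seed=1):
    """Hash coordinates into a rows x cols grid of cells
    V(c) = sum_{j in cell} x_j exp(2 pi i h(j)/r); for each head coordinate j,
    average sign(x_j) exp(-2 pi i h(j)/r) V(c) over rows where j is isolated
    from other head coordinates, then sum real parts of y_j^p."""
    rng = random.Random(seed)
    n = len(x)
    col = [[rng.randrange(cols) for _ in range(n)] for _ in range(rows)]
    h = [[rng.randrange(r) for _ in range(n)] for _ in range(rows)]
    V = [[0j] * cols for _ in range(rows)]
    for row in range(rows):
        for j, xj in enumerate(x):
            V[row][col[row][j]] += xj * cmath.exp(2j * cmath.pi * h[row][j] / r)
    est = 0.0
    for j in head:
        ys = []
        for row in range(rows):
            c = col[row][j]
            if any(col[row][k] == c for k in head if k != j):
                continue  # j collides with another head coordinate in this row
            sgn = 1 if x[j] >= 0 else -1
            ys.append(sgn * cmath.exp(-2j * cmath.pi * h[row][j] / r) * V[row][c])
        if not ys:
            continue
        yj = sum(ys) / len(ys)  # mean over cells isolating j
        est += (abs(yj) ** p * cmath.exp(1j * p * cmath.phase(yj))).real
    return est

# Two large head coordinates plus a small tail; true head mass is 100 + 80 = 180
print(head_estimate([100.0, -80.0] + [1.0] * 20, head=[0, 1], p=1))
```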
Analysis
• Why did we use roots of unity?
• Estimator is the real part of Σ_{j ∈ head} yj^p
• Σ_{j ∈ head} yj^p = Σ_{j ∈ head} |xj|^p · (1 + zj)^p for zj = (yj - |xj|) / |xj|
• Can apply the generalized binomial theorem:
  E[|xj|^p (1 + zj)^p] = |xj|^p · Σ_{k=0}^∞ {p choose k} E[zj^k] = |xj|^p + small,
  since E[zj^k] = 0 for 0 < k < r
• Generalized binomial coefficient {p choose k} = p · (p-1) ··· (p-k+1) / k! = O(1/k^{1+p})
• Intuitively, the variance is small because head coordinates don’t collide
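The decay of the generalized binomial coefficients is easy to check numerically, and the same routine can be sanity-checked against the binomial series for (1+x)^p:

```python
def gen_binom(p, k):
    """{p choose k} = p * (p-1) * ... * (p-k+1) / k!, computed iteratively."""
    c = 1.0
    for i in range(k):
        c *= (p - i) / (i + 1)
    return c

# For p in (0,2) the coefficients decay like 1/k^(1+p), so the error series
# sum_{k >= r} {p choose k} E[z_j^k] is dominated by its first surviving term.
print([gen_binom(0.5, k) for k in range(6)])
```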
Outline
• Estimating |x_head|_p^p
• Estimating |x_tail|_p^p
• Putting it all together
Our Algorithm
Estimating |x_tail|_p^p
• Hash the coordinates xj into buckets; let x(b) denote the part of x landing in bucket b
• In each bucket b maintain an unbiased estimator of |x(b)|_p^p, the p-th power of the p-norm in the bucket [Li]
• If Z1, …, Zs are p-stable, then for any vector a = (a1, …, as), Σ_{j=1}^s Zj · aj ∼ |a|_p Z, for Z also p-stable
• Add up the estimators in all buckets not containing a head coordinate (variance is small)
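For p = 1 the stable distribution is Cauchy, which makes the stability property easy to demonstrate; here is a median-based sketch in the spirit of Indyk's estimator (Li's estimator, which the paper uses, is a refinement; reps and seed are illustrative):

```python
import math
import random

def l1_estimate(a, reps=4001, seed=0):
    """Each counter S_i = sum_j Z_ij * a_j with i.i.d. Cauchy Z_ij is distributed
    as |a|_1 * Z for a fresh Cauchy Z, so median_i |S_i| estimates |a|_1
    (the median of |Cauchy| is tan(pi/4) = 1)."""
    rng = random.Random(seed)
    counters = []
    for _ in range(reps):
        # Cauchy sample via inverse CDF: tan(pi * (U - 1/2)) for U uniform
        s = sum(math.tan(math.pi * (rng.random() - 0.5)) * aj for aj in a)
        counters.append(abs(s))
    counters.sort()
    return counters[reps // 2]

print(l1_estimate([3, -4, 5]))  # close to |a|_1 = 12
```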
Outline
• Estimating |x_head|_p^p
• Estimating |x_tail|_p^p
• Putting it all together
Complexity
Bag of tricks
• Example: for optimal space, in the buckets of the light estimator, we prove that (1/ε^p)-wise independent p-stable variables suffice
  – Rewrite Li’s estimator so that [KNW] can be applied
• Need to evaluate a degree-(1/ε^p) polynomial per update
• Instead: batch 1/ε^p updates together and do fast multipoint evaluation
  – Can be deamortized
  – Use that different buckets are pairwise independent
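For context, the "degree-(1/ε^p) polynomial per update" refers to the standard construction of k-wise independent values: evaluate a random degree-(k-1) polynomial over a prime field at the update's index. A minimal sketch with plain Horner evaluation (the paper instead batches 1/ε^p updates and uses fast multipoint evaluation; the modulus and names are illustrative):

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, a convenient field size

def kwise_value(coeffs, j):
    """Evaluate coeffs[0]*j^(k-1) + ... + coeffs[-1] mod P by Horner's rule;
    k random coefficients give k-wise independent outputs in O(k) time each."""
    acc = 0
    for c in coeffs:
        acc = (acc * j + c) % P
    return acc

k = 8
rng = random.Random(0)
coeffs = [rng.randrange(P) for _ in range(k)]  # one random degree-(k-1) polynomial
print(kwise_value(coeffs, 12345))
```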
Complexity
• Example #2: finding the head coordinates requires ε^{-2} log^2 n space
• Reduce the universe size to poly(1/ε) by hashing
• Now requires ε^{-2} log n log(1/ε) space
• Replace ε with ε log^{1/2}(1/ε)
• Head estimator is okay, but slightly adjust the light estimator
Conclusion
• For every p ∈ (0,2)
  – estimate lp with optimal ε^{-2} log n bits of space
  – log^2(1/ε) · log log(1/ε) update time
  – exponential improvement over previous update time
• For entropy
  – exponential improvement over previous update time (polylog(1/ε) versus poly(1/ε))