Optimal Space Lower Bounds for all Frequency
Moments
David Woodruff
Based on SODA ’04 paper
The Streaming Model [AMS96]
0113734 … Stream of elements a1, …, aq each in {1, …, m} Want to compute statistics on stream Elements arranged in adversarial order Algorithms given one pass over stream Goal: Minimum space algorithm
Frequency Moments
Notation q = stream size, m = universe size fi = # occurrences of item i
Why are frequency moments important?
F0 = # of Distinct elements F1 = q F2 = repeat rate
k-th moment
Applications
Estimating # distinct elts. w/ low space Estimate selectivity of queries to DB w/o expensive sort Routers gather # distinct destinations w/limited memory.
Estimating F2 estimates size of self-joins:
Bob x
Alice y
Bob z
Bob a
Alice b
Bob c
,
Alice b y
Bob a x
Bob a z
Bob c x
Bob c z
The Best Determininistic Algorithm
Trivial algorithm for Fk
Store/update fi for each item i, sum fi
k at end
Space = O(mlog q): m items i, log q bits to count f i
Negative Results [AMS96]: Compute Fk exactly => (m) space Any deterministic alg. outputs x with |Fk – x| < must use (m) space
What about randomized algorithms?
Randomized Approx Algs for Fk
Randomized alg. -approximates Fk if outputs x s.t. Pr[|Fk – x| < Fk ] > 2/3
Can -approximate F0 [BJKST02], F2 [AMS96], Fk [CK04], k > 2 in space:(big-Oh notation suppresses polylog(1/, m, q) factors)
Ideas: Hashing: O(1)-wise independence Sampling
Example: F0 [BJKST02]
Idea: For random function h:[m] -> [0,1] and distinct elts b1, b2, …, bF0
, expect mini h(bi) ¼ 1/F0
Algorithm: Choose 2-wise indep. hash function h: [m] -> [m3] Maintain t = (1/2) distinct smallest values h(bi) Let v be t-th smallest value Output tm3/v as estimate for F0
Success prob up to 1- => take median O(log 1/) copies Space: O((log 1/)/2)
Example: F2 [AMS99]
Algorithm: Choose 4-wise indep. hash function h:[m] -> {-1,1} Maintain Z = i in [m] fi ¢ h(i) Output Y = Z2 as estimate for F2
Correctness:
Chebyshev’s inequality => O(1/2) space
Previous Lower Bounds:
[AMS96] 8 k, –approximating Fk => (log m) space
[Bar-Yossef] -approximating F0 => (1/) space
[IW03] -approximating F0 => space if
Questions: Does the bound hold for k 0? Does it hold for F0 for smaller ?
Our First Result
Optimal Lower Bound: 8 k 1, any = (m-.5), -approximate Fk => (-2) bits of space.
F1 = q trivial in log q space
Fk trivial in O(m log q) space, so need = (m-.5)
Technique: Reduction from 2-party protocol for computing Hamming distance (x,y)
Use tools from communication complexity
Lower Bound Idea
x 2 {0,1}m
y 2 {0,1}m
Stream s(x) Stream s(y)
(1 § ) Fk algorithm A
(1 § ) Fk algorithm A
Internal state of A
• Compute (1 § ) Fk(s(x) ± s(y)) w.p. > 2/3 • Idea: If can decide f(x,y) w.p. > 2/3, space used by A at least randomized 1-way comm. Complexity of f
S
Alice Bob
Randomized 1-way comm. complexity
Boolean function f: X £ Y ! {0,1} Alice has x 2 X, Bob y 2 Y. Bob wants f(x,y) Only 1 message m sent: must be from Alice to Bob Communication cost = maxx,y Ecoins [|m|]
-error randomized 1-way communication complexity R(f), is cost of optimal protocol computing f with probability ¸ 1-
Ok, but how do we lower bound R(f)?
Shatter Coefficients [KNR] F = {f : X ! {0,1}} function family, f 2 F length-|X| bitstring
For S µ X, shatter coefficient SC(fS) of S :
|{f |S}f 2 F| = # distinct bitstrings when F restricted to S
SC(F, p) = maxS µ X, |S| = p SC(fS). If SC(fS) = 2|S|, S shattered
Treat f: X £ Y ! {0,1} as function family fX :
fX = { fx(y) : Y ! {0,1} | x 2 X }, where fx(y) = f(x,y)
Theorem [BJKS]: For every f: X £ Y ! {0,1}, every integer p, R1/3(f) = (log(SC(fX, p)))
Warmup: (1/) Lower Bound [Bar-Yossef]
Alice input x 2R {0,1}m, wt(x) = m/2 Bob input y 2R {0,1}m, wt(y) = m s(x), s(y) any streams w/char. vectors x, y PROMISE:
(1) wt(x Æ y) = 0 OR (2) wt(x Æ y) = m f(x,y) = 0 f(x,y) = 1
F0(s(x) ± s(y)) = m/2 + m F0(s(x) ± s(y)) = m/2
R1/3(f) = (1/) [Bar-Yossef] (uses shatter coeffs) (1+’)m/2 < (1 - ’)(m/2 + m) for ’ = () Hence, can decide f ! F0 alg. uses (1/) space Too easy! Can replace F0 alg. with a Sampler!
Our Reduction: Hamming Distance Decision Problem (HDDP)
Lower bound R1/3(f) via SC(fX, t), but need a lemma
Set t = (1/2)
x 2 {0,1}t y 2 {0,1}t
Alice Bob
Promise Problem :
(x,y) · t/2 – (t1/2) (x,y) > t/2 f(x,y) = 0 OR f(x,y) = 1
Main Lemma
S µ{0,1}n
y= T= S-T
9 S µ {0,1}n with |S| = n s.t. exist 2(n) “good” sets T µ S s.t.
9 y 2 {0,1}n s.t 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 8 t 2 S – T, (y,t) > n/2
Lemma Resolves HDDP Complexity
Theorem: R1/3(f) = (t) = (-2). Proof:
Alice gets yT for random good set T applying main lemma with n = t.
Bob gets random s 2 S Let f: {yT }T £ S ! {0,1}. Main Lemma =>SC(f) = 2(t)
[BJKS] => R1/3(f) = (t) = (-2)
Corollary: (1/2) space for randomized 2-party protocol to approximate (x,y) between inputs
First known lower bound in terms of !
Back to Frequency Moments
Use -approximator for Fk to solve HDDP
y 2 {0,1}t s 2 S µ {0,1}t
Fk Alg Fk AlgState
ay as
i-th universe element included exactlyonce in stream ay iff yi = 1 (as same)
Solving HDDP with Fk
Alice/Bob compute -approx to Fk(ay ± as)
Fk(ay ± as) = 2k wt(y Æ s) + 1k (y,s) For k 1,
Conclusion: -approximating Fk(ay ± as) decides HDDP, so space for Fk is (t) = (-2)
Alice also transmits wt(y) in log m space.
Back to the Main Lemma
Recall: show 9 S µ {0,1}n with |S| = n s.t. 2(n) “good” sets T µ S s.t:
9 y 2 {0,1}n s.t 1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0
2. 8 t 2 S – T, (y,t) > n/2
Probabilistic Method Choose n random elts in {0,1}n for S Show arbitrary T µ S of size n/2 is good with
probability > 2-zn for constant z < 1. Expected # good T is 2(n)
So exists S with 2(n) good T
Proving the Main Lemma
T ={t1, …, tn/2} µ S arbitrary Let y be majority codeword of T What is probability p that both:
1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0
2. 8 t 2 S – T, (y,t) > n/2
Put x = Pr[8 t 2 T, (y,t) · n/2 – cn1/2] Put y = Pr[8 t 2 S-T, (y,t) > n/2] = 2-n/2
Independence => p = xy = x2-n/2
The Matrix Problem
Wlog, assume y = 1n (recall y is majority word) Want lower bound Pr[8 t 2 T, (y,t) · n/2 – cn1/2] Equivalent to matrix problem:
t1 ->t2 -> …tn/2 ->
101001000101111001100101011100011110001110111101010101101010111011100011
For random n/2 x n binary matrix M, each column majority 1, what is probablity each row ¸ n/2 + cn1/2 1s?
A First Attempt
Set family A µ 2^{0,1}n monotone increasing if
S1 2 A, S1 µ S2 => S2 2 A For uniform distribution on S µ {0,1}n, and A, B monotone
increasing families, [Kleitman]Pr[A Å B] ¸ Pr[A] ¢ Pr[B]
First try: Let R be event M ¸ n/2 + cn1/2 1s in each row, C event M
majority 1 in each column Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R | C] = Pr[R Å C]/Pr[C] M characteristic vector of subset of [.5n2] => R,C monotone
increasing => Pr[R Å C]/Pr[C] ¸ Pr[R]Pr[C]/Pr[C] = Pr[R] < 2-n/2
But we need > 2-zn/2 for constant z < 1, so this fails…
A Second Attempt
Second Try: R1: M ¸ n/2 + cn1/2 1s in first m rows R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows C: M majority 1 in each column
Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R1 Å R2 | C] = Pr[R1 Å R2 Å C]/Pr[C] R1, R2, C monotone increasing => Pr[R1 Å R2 Å C]/Pr[C] ¸ Pr[R1 Å C]Pr[R2]/Pr[C] = Pr[R1 | C] Pr[R2] Want this at least 2-zn/2 for z < 1 Pr[ Xi > n/2 + cn1/2] > ½ - c (2/pi)1/2 [Stirling] Independence => Pr[R2] > (½ - c(2/pi)1/2)n/2 - m
Remains to show Pr[R1 | C] large.
Computing Pr[R1 | C]
Pr[R1 | C] = Pr[M ¸ n/2 + cn1/2 1s in 1st m rows | C]
Show Pr[R1 | C] > 2-z’m for certain constant z’ < 1
Ingredients: Expect to get n/2 + (n1/2) 1s in each of 1st m rows | C Use negative correlation of entries in a given row => show n/2 + (n1/2) 1s in a given row w/good probability
for small enough c A simple worst-case conditioning argument on these 1st
m rows shows they all have ¸ n/2 + cn1/2 1s
Completing the Proof
Recall: what is probability p = xy, where
1. x = Pr[ 8 t 2 T, (y, t) · n/2 – cn1/2]
2. y = Pr[ 8 t 2 S – T, (y,t) > n/2] = 2-n/2
3. R1: M ¸ n/2 + cn1/2 1s in first m rows
4. R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows
5. C: M majority 1 in each column x ¸ Pr[R1 | C] Pr[R2] ¸ 2-z’m (½ - c(2/pi)1/2)n/2 – m
Analysis shows z’ small so this ¸ 2-z’’n/2, z’’ < 1 Hence p = xy ¸ 2-(z’’+1)n/2 Hence expected # good sets 2n-O(log n)p = 2(n) So exists S with 2(n) good T
Bipartite Graphs
Matrix Problem Bipartite Graph Counting Problem:
How many bipartite graphs exist on n/2 by n vertices s.t. each left vertex has degree > n/2 + cn1/2 and each right vertex degree > n/2?
… …
Our Result on # of Bipartite Graphs
Bipartite graph count: Argument shows at least 2n^2/2 – zn/2 –n such bipartite
graphs for constant z < 1.
Main lemma shows # bipartite graphs on n + n vertices w/each vertex degree > n/2 is > 2n^2-zn-n
Can replace > with <
Previous knowncount: 2n^2-2n [MW – personal comm.] Follows easily from Kleitman inequality
Summary
Results:
Optimal Fk Lower Bound: 8 k 1 and any = (m-1/2), any -approximator for Fk must use (-2) bits of space.
Communication Lower Bound of (-2) for one-way communication complexity of (, )-approximating (x, y)
Bipartite Graph Count: # bipartite graphs on n + n vertices w/each vertex degree > n/2 at least 2n^2-zn-n for constant z < 1.