Optimal Space Lower Bounds for all Frequency Moments David Woodruff Based on SODA ’04 paper
Mar 27, 2015
The Streaming Model [AMS96]
0113734 …
Stream of elements a1, …, aq, each in {1, …, m}
Want to compute statistics on the stream
Elements arranged in adversarial order
Algorithms given one pass over the stream
Goal: minimum-space algorithm
Frequency Moments
Notation: q = stream size, m = universe size, fi = # occurrences of item i
k-th moment: Fk = Σ_{i=1..m} fi^k
Why are frequency moments important?
F0 = # of distinct elements
F1 = q
F2 = repeat rate
Applications
Estimating # distinct elts. w/ low space:
Estimate selectivity of queries to DB w/o expensive sort
Routers gather # distinct destinations w/ limited memory
Estimating F2 estimates size of self-joins:
Relation 1      Relation 2      Join on first column
Bob   x         Bob   a         Alice b y
Alice y         Alice b         Bob   a x
Bob   z         Bob   c         Bob   a z
                                Bob   c x
                                Bob   c z
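A small sketch of why F2 measures join size, using the toy relations from this slide. The variable names (`left`, `right`, `join`) are illustrative and not from the talk; the point is only that the number of joined rows equals the sum of squared frequencies of the join key.

```python
from collections import Counter

# The two toy relations from the slide; the first column is the join key.
left = [("Bob", "x"), ("Alice", "y"), ("Bob", "z")]
right = [("Bob", "a"), ("Alice", "b"), ("Bob", "c")]

# Join on the key: one output row per matching (left, right) pair.
join = [(k1, b, a) for (k1, a) in left for (k2, b) in right if k1 == k2]

# F2 of the key column: sum of squared key frequencies.
freq = Counter(key for key, _ in left)
f2 = sum(f * f for f in freq.values())

print(len(join), f2)  # 5 5
```

Bob occurs twice and Alice once, so F2 = 2² + 1² = 5, which matches the five joined rows listed above.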
The Best Deterministic Algorithm
Trivial algorithm for Fk:
Store/update fi for each item i, sum fi^k at end
Space = O(m log q): m items i, log q bits to count fi
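The trivial algorithm can be sketched as follows. The function name `exact_fk` is hypothetical; the counter table is exactly the O(m log q) bits of state described above.

```python
from collections import Counter

def exact_fk(stream, k):
    """Exact k-th frequency moment: keep one counter per distinct item."""
    counts = Counter()
    for item in stream:      # single pass over the stream
        counts[item] += 1
    # Every stored count is >= 1, so k = 0 correctly counts distinct items.
    return sum(f ** k for f in counts.values())

stream = [0, 1, 1, 3, 7, 3, 4]
print(exact_fk(stream, 0), exact_fk(stream, 1), exact_fk(stream, 2))  # 5 7 11
```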
Negative results [AMS96]:
Compute Fk exactly => Ω(m) space
Any deterministic alg. that outputs x with |Fk − x| < εFk must use Ω(m) space
What about randomized algorithms?
Randomized Approx Algs for Fk
Randomized alg. ε-approximates Fk if it outputs x s.t. Pr[|Fk − x| < εFk] > 2/3
Can ε-approximate F0 [BJKST02] and F2 [AMS96] in O(1/ε²) space, and Fk for k > 2 [CK04] in space sublinear in m (big-Oh notation suppresses polylog(1/ε, m, q) factors)
Ideas:
Hashing: O(1)-wise independence
Sampling
Example: F0 [BJKST02]
Idea: For random function h: [m] → [0,1] and distinct elts b1, b2, …, bF0, expect min_i h(bi) ≈ 1/F0
Algorithm:
Choose 2-wise indep. hash function h: [m] → [m³]
Maintain the t = Θ(1/ε²) distinct smallest values h(bi)
Let v be the t-th smallest value
Output t·m³/v as estimate for F0
Success prob. up to 1 − δ => take median of O(log 1/δ) copies
Space: O((log 1/δ)/ε²)
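A minimal sketch of the t-smallest-values estimator described on this slide. Several details are assumptions made for illustration: items are non-negative integers below P, a fixed Mersenne prime P stands in for the hash range [m³], the hash is the linear map (a·x + b) mod P, and the names `make_hash`, `estimate_f0`, and `median_estimate` are hypothetical.

```python
import heapq
import random
import statistics

P = (1 << 61) - 1  # assumption: a Mersenne prime standing in for the range [m^3]

def make_hash(rng):
    """Linear hash x -> ((a*x + b) mod P) + 1 with values in 1..P.
    Choosing a != 0 keeps it injective on [0, P)."""
    a = rng.randrange(1, P)
    b = rng.randrange(P)
    return lambda x: (a * x + b) % P + 1

def estimate_f0(stream, t, h):
    heap = []     # negated values: a max-heap over the t smallest hashes
    kept = set()  # hash values currently stored in the heap
    for item in stream:
        v = h(item)
        if v in kept:
            continue
        if len(heap) < t:
            heapq.heappush(heap, -v)
            kept.add(v)
        elif v < -heap[0]:
            evicted = -heapq.heappushpop(heap, -v)  # drop the current largest
            kept.discard(evicted)
            kept.add(v)
    if len(heap) < t:
        return len(heap)          # fewer than t distinct items: count is exact
    v_t = -heap[0]                # t-th smallest hash value
    return t * P / v_t

def median_estimate(stream, t, copies, seed=0):
    """Median of independent copies. A real streaming implementation would
    run the copies in parallel in one pass; here the list is simply re-read."""
    rng = random.Random(seed)
    return statistics.median(
        estimate_f0(stream, t, make_hash(rng)) for _ in range(copies)
    )
```

For example, `median_estimate([i % 5000 for i in range(20000)], 256, 9)` should land near 5000, with relative error on the order of 1/√t.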
Example: F2 [AMS99]
Algorithm:
Choose 4-wise indep. hash function h: [m] → {−1, 1}
Maintain Z = Σ_{i ∈ [m]} fi · h(i)
Output Y = Z² as estimate for F2
Correctness: E[Y] = F2 and Var[Y] ≤ 2·F2²
Chebyshev's inequality (averaging O(1/ε²) independent copies) => O(1/ε²) space
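A minimal sketch of the sign-hash estimator above. The 4-wise independent hash is realized here as a random cubic polynomial over a prime field with the low bit taken as the sign; that construction, the prime P, and the function names are assumptions for illustration, not details from the talk.

```python
import random
import statistics

P = (1 << 61) - 1  # assumption: prime field size for the polynomial hash

def make_sign_hash(rng):
    """Random cubic over Z_P (4-wise independent values); low bit gives +/-1."""
    coeffs = [rng.randrange(P) for _ in range(4)]
    def h(i):
        acc = 0
        for c in coeffs:          # Horner evaluation of the cubic at i
            acc = (acc * i + c) % P
        return 1 if acc & 1 else -1
    return h

def ams_estimate(stream, h):
    z = 0
    for item in stream:           # Z = sum_i f_i * h(i), updated per arrival
        z += h(item)
    return z * z                  # Y = Z^2

def averaged_estimate(stream, copies, seed=0):
    """Average independent copies of Y to reduce the variance."""
    rng = random.Random(seed)
    return statistics.mean(
        ams_estimate(stream, make_sign_hash(rng)) for _ in range(copies)
    )
```

When the stream contains a single distinct item with frequency f, Z = ±f and the estimate is exactly f², which makes a convenient sanity check.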
Previous Lower Bounds:
[AMS96] ∀ k, ε-approximating Fk => Ω(log m) space
[Bar-Yossef] ε-approximating F0 => Ω(1/ε) space
[IW03] ε-approximating F0 => Ω(1/ε²) space if ε is sufficiently large
Questions:
Does the bound hold for k ≠ 0?
Does it hold for F0 for smaller ε?
Our First Result
Optimal Lower Bound: ∀ k ≠ 1, any ε = Ω(m^(−1/2)), ε-approximating Fk => Ω(ε^(−2)) bits of space.
F1 = q trivial in log q space
Fk trivial in O(m log q) space, so need ε = Ω(m^(−1/2))
Technique: Reduction from 2-party protocol for computing Hamming distance Δ(x,y)
Use tools from communication complexity
Lower Bound Idea
Alice: x ∈ {0,1}^m, stream s(x), runs (1 ± ε) Fk algorithm A
Bob: y ∈ {0,1}^m, stream s(y), continues (1 ± ε) Fk algorithm A
Alice sends S = internal state of A to Bob
• Compute (1 ± ε) Fk(s(x) ∘ s(y)) w.p. > 2/3
• Idea: If can decide f(x,y) w.p. > 2/3, space used by A is at least the randomized 1-way comm. complexity of f
Randomized 1-way comm. complexity
Boolean function f: X × Y → {0,1}
Alice has x ∈ X, Bob y ∈ Y. Bob wants f(x,y)
Only 1 message m sent: must be from Alice to Bob
Communication cost = max_{x,y} E_coins[|m|]
δ-error randomized 1-way communication complexity R_δ(f) is the cost of the optimal protocol computing f with probability ≥ 1 − δ
Ok, but how do we lower bound R(f)?
Shatter Coefficients [KNR]
F = {f : X → {0,1}} function family, each f ∈ F a length-|X| bitstring
For S ⊆ X, shatter coefficient SC(f_S) of S:
|{f|_S : f ∈ F}| = # distinct bitstrings when F restricted to S
SC(F, p) = max_{S ⊆ X, |S| = p} SC(f_S). If SC(f_S) = 2^|S|, S is shattered
Treat f: X × Y → {0,1} as function family f_X:
f_X = { f_x : Y → {0,1} | x ∈ X }, where f_x(y) = f(x,y)
Theorem [BJKS]: For every f: X × Y → {0,1} and every integer p, R_{1/3}(f) = Ω(log(SC(f_X, p)))
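The shatter coefficient can be computed by brute force on tiny examples. This sketch follows the definitions above; the function names `sc_of_set` and `sc`, and the two example functions, are illustrative choices rather than anything from the talk.

```python
from itertools import combinations, product

def sc_of_set(f, X, S):
    """# distinct bitstrings when the family {f_x : x in X} is restricted to S."""
    return len({tuple(f(x, y) for y in S) for x in X})

def sc(f, X, Y, p):
    """SC(f_X, p): maximum shatter coefficient over all S within Y, |S| = p."""
    return max(sc_of_set(f, X, S) for S in combinations(Y, p))

# Example 1: the index function f(x, y) = x[y] shatters all 3 positions.
X = list(product((0, 1), repeat=3))
Y = [0, 1, 2]
index = lambda x, y: x[y]
print(sc(index, X, Y, 3))  # 8

# Example 2: equality on {0,1,2,3} only yields 3 patterns on any 2 points.
equality = lambda x, y: int(x == y)
print(sc(equality, range(4), range(4), 2))  # 3
```

By the theorem, the first example gives R_{1/3}(index) = Ω(log 8), i.e. linear in the input length, while equality yields only a much weaker bound through this route.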
Warmup: Ω(1/ε) Lower Bound [Bar-Yossef]
Alice input x ∈_R {0,1}^m, wt(x) = m/2
Bob input y ∈_R {0,1}^m, wt(y) = εm
s(x), s(y) any streams w/ characteristic vectors x, y
PROMISE:
(1) wt(x ∧ y) = 0: f(x,y) = 0, F0(s(x) ∘ s(y)) = m/2 + εm
OR
(2) wt(x ∧ y) = εm: f(x,y) = 1, F0(s(x) ∘ s(y)) = m/2
R_{1/3}(f) = Ω(1/ε) [Bar-Yossef] (uses shatter coeffs)
(1 + ε')m/2 < (1 − ε')(m/2 + εm) for ε' = Θ(ε)
Hence, can decide f => F0 alg. uses Ω(1/ε) space
Too easy! Can replace F0 alg. with a sampler!
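The separation condition on ε' can be worked out explicitly. The derivation below is a sketch that only rearranges the inequality from the slide; the specific choice ε' = ε/3 is an illustrative assumption.

```latex
\begin{align*}
(1+\epsilon')\tfrac{m}{2} < (1-\epsilon')\left(\tfrac{m}{2}+\epsilon m\right)
  &\iff 1+\epsilon' < (1-\epsilon')(1+2\epsilon) \\
  &\iff 2\epsilon' + 2\epsilon\epsilon' < 2\epsilon \\
  &\iff \epsilon' < \frac{\epsilon}{1+\epsilon}.
\end{align*}
% For 0 < \epsilon \le 1, the choice \epsilon' = \epsilon/3 satisfies this,
% so an \epsilon'-approximation of F_0 with \epsilon' = \Theta(\epsilon)
% distinguishes the two promise cases.
```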
Our Reduction: Hamming Distance Decision Problem (HDDP)
Lower bound R_{1/3}(f) via SC(f_X, t), but need a lemma
Set t = Θ(1/ε²)
Alice: x ∈ {0,1}^t   Bob: y ∈ {0,1}^t
Promise problem:
Δ(x,y) ≤ t/2 − Θ(t^(1/2)): f(x,y) = 0
OR
Δ(x,y) > t/2: f(x,y) = 1
Main Lemma
[Figure: S ⊆ {0,1}^n split into T and S − T, together with the word y]
∃ S ⊆ {0,1}^n with |S| = n s.t. there exist 2^Ω(n) "good" sets T ⊆ S, where T is good if
∃ y ∈ {0,1}^n s.t.
1. ∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2) for some c > 0
2. ∀ t ∈ S − T, Δ(y,t) > n/2
Lemma Resolves HDDP Complexity
Theorem: R_{1/3}(f) = Ω(t) = Ω(ε^(−2)).
Proof:
Alice gets y_T for a random good set T, applying the main lemma with n = t.
Bob gets random s ∈ S
Let f: {y_T}_T × S → {0,1}. Main Lemma => SC(f) = 2^Ω(t)
[BJKS] => R_{1/3}(f) = Ω(t) = Ω(ε^(−2))
Corollary: Ω(1/ε²) space for randomized 2-party protocol to approximate Δ(x,y) between inputs
First known lower bound in terms of ε!
Back to Frequency Moments
Use ε-approximator for Fk to solve HDDP
Alice: y ∈ {0,1}^t   Bob: s ∈ S ⊆ {0,1}^t
Alice runs the Fk alg. on stream ay and sends its state to Bob; Bob continues the Fk alg. on stream as
i-th universe element included exactly once in stream ay iff yi = 1 (as same)
Solving HDDP with Fk
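Alice/Bob compute ε-approx to Fk(ay ∘ as)
Fk(ay ∘ as) = 2^k · wt(y ∧ s) + 1^k · Δ(y,s)
Since wt(y ∧ s) = (wt(y) + wt(s) − Δ(y,s))/2, for k ≠ 1:
Δ(y,s) = [2^(k−1)·(wt(y) + wt(s)) − Fk(ay ∘ as)] / (2^(k−1) − 1)
Conclusion: ε-approximating Fk(ay ∘ as) decides HDDP, so space for Fk is Ω(t) = Ω(ε^(−2))
Alice also transmits wt(y) in log m space.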
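The identity relating Fk of the concatenated stream to the Hamming distance can be checked numerically. This is a sketch with hypothetical helper names (`stream_of`, `fk`, `delta_from_fk`); it uses exact Fk values, whereas the reduction itself works with ε-approximations.

```python
from collections import Counter

def stream_of(bits):
    """Stream containing universe element i exactly once iff bits[i] = 1."""
    return [i for i, b in enumerate(bits) if b]

def fk(stream, k):
    return sum(f ** k for f in Counter(stream).values())

def hamming(y, s):
    return sum(a != b for a, b in zip(y, s))

def delta_from_fk(fk_value, wt_y, wt_s, k):
    """Solve Fk = 2^(k-1) (wt(y) + wt(s)) + (1 - 2^(k-1)) * Delta for Delta.
    Requires k != 1, where the coefficient of Delta vanishes."""
    half = 2 ** (k - 1)
    return (half * (wt_y + wt_s) - fk_value) / (half - 1)

y = [1, 1, 0, 0]
s = [1, 0, 1, 0]
combined = stream_of(y) + stream_of(s)   # Alice's stream followed by Bob's
for k in (0, 2, 3):
    print(k, delta_from_fk(fk(combined, k), sum(y), sum(s), k), hamming(y, s))
```

Elements in both y and s appear twice and contribute 2^k each; elements in exactly one appear once and contribute 1, which is where the identity comes from.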
Back to the Main Lemma
Recall: show ∃ S ⊆ {0,1}^n with |S| = n s.t. there are 2^Ω(n) "good" sets T ⊆ S, i.e. sets for which ∃ y ∈ {0,1}^n s.t.
1. ∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2) for some c > 0
2. ∀ t ∈ S − T, Δ(y,t) > n/2
Probabilistic Method:
Choose n random elts in {0,1}^n for S
Show arbitrary T ⊆ S of size n/2 is good with probability > 2^(−zn) for constant z < 1
Expected # good T is 2^Ω(n)
So exists S with 2^Ω(n) good T
Proving the Main Lemma
T = {t1, …, t_{n/2}} ⊆ S arbitrary
Let y be the majority codeword of T
What is the probability p that both:
1. ∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2) for some c > 0
2. ∀ t ∈ S − T, Δ(y,t) > n/2
Put x = Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2)]
Put y = Pr[∀ t ∈ S − T, Δ(y,t) > n/2] = 2^(−n/2)
Independence => p = xy = x·2^(−n/2)
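The two conditions can be checked directly for a concrete S and T. In this sketch the witness y is fixed to the majority word of T, as in the proof (the lemma itself only asks for some y); ties in a column are broken toward 0, and the names `majority_word` and `is_good` are hypothetical.

```python
import math

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def majority_word(T):
    """Coordinate-wise majority of the words in T (ties broken toward 0)."""
    n = len(T[0])
    return tuple(1 if 2 * sum(t[j] for t in T) > len(T) else 0 for j in range(n))

def is_good(S, T_idx, c):
    """Check conditions 1 and 2 with y = majority word of T = {S[i] : i in T_idx}."""
    n = len(S[0])
    T = [S[i] for i in T_idx]
    y = majority_word(T)
    close = all(hamming(y, t) <= n / 2 - c * math.sqrt(n) for t in T)
    far = all(hamming(y, S[i]) > n / 2
              for i in range(len(S)) if i not in T_idx)
    return close and far

# Toy instance with n = 4, |S| = 4, |T| = 2.
S = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0), (0, 0, 0, 1)]
print(is_good(S, {0, 1}, 0.5))  # True: y = 1110 is near T and far from S - T
print(is_good(S, {0, 2}, 0.5))  # False: 1111 and 0000 have no close majority word
```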
The Matrix Problem
Wlog, assume y = 1^n (recall y is the majority word)
Want lower bound on Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2)]
Equivalent to matrix problem:
[Figure: 0/1 matrix M with rows t1, t2, …, t_{n/2}]
For a random n/2 × n binary matrix M with each column majority 1, what is the probability that each row has ≥ n/2 + c·n^(1/2) 1s?
A First Attempt
Set family A ⊆ 2^({0,1}^n) is monotone increasing if S1 ∈ A, S1 ⊆ S2 => S2 ∈ A
For uniform distribution on S ⊆ {0,1}^n, and A, B monotone increasing families, [Kleitman] Pr[A ∩ B] ≥ Pr[A]·Pr[B]
First try:
Let R be the event that M has ≥ n/2 + c·n^(1/2) 1s in each row, C the event that M has majority 1 in each column
Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2)] = Pr[R | C] = Pr[R ∩ C]/Pr[C]
M is the characteristic vector of a subset of [.5n²] => R, C monotone increasing
=> Pr[R ∩ C]/Pr[C] ≥ Pr[R]Pr[C]/Pr[C] = Pr[R] < 2^(−n/2)
But we need > 2^(−zn/2) for constant z < 1, so this fails…
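Kleitman's inequality for the row and column events can be verified exhaustively on a tiny matrix. The 3 × 3 size and the threshold of 2 ones per row or column are assumptions chosen so that all 512 matrices can be enumerated; they are not the parameters used in the talk.

```python
from fractions import Fraction
from itertools import product

ROWS, COLS, THRESH = 3, 3, 2  # toy parameters, chosen for exhaustive search

def rows_heavy(M):
    """Event R: every row has at least THRESH ones (monotone increasing)."""
    return all(sum(row) >= THRESH for row in M)

def cols_heavy(M):
    """Event C: every column has at least THRESH ones (monotone increasing)."""
    return all(sum(M[i][j] for i in range(ROWS)) >= THRESH for j in range(COLS))

matrices = [
    tuple(tuple(bits[r * COLS:(r + 1) * COLS]) for r in range(ROWS))
    for bits in product((0, 1), repeat=ROWS * COLS)
]
total = len(matrices)
pr_R = Fraction(sum(rows_heavy(M) for M in matrices), total)
pr_C = Fraction(sum(cols_heavy(M) for M in matrices), total)
pr_RC = Fraction(sum(rows_heavy(M) and cols_heavy(M) for M in matrices), total)

# Positive correlation of monotone increasing events under the uniform measure.
print(pr_R, pr_C, pr_RC, pr_RC >= pr_R * pr_C)
```

Flipping any 0 to 1 can only help both events, which is why they are monotone increasing and the inequality applies.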
A Second Attempt
Second try:
R1: M has ≥ n/2 + c·n^(1/2) 1s in each of the first m rows
R2: M has ≥ n/2 + c·n^(1/2) 1s in each of the remaining n/2 − m rows
C: M has majority 1 in each column
Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2)] = Pr[R1 ∩ R2 | C] = Pr[R1 ∩ R2 ∩ C]/Pr[C]
R1, R2, C monotone increasing
=> Pr[R1 ∩ R2 ∩ C]/Pr[C] ≥ Pr[R1 ∩ C]Pr[R2]/Pr[C] = Pr[R1 | C] Pr[R2]
Want this at least 2^(−zn/2) for z < 1
Pr[Xi > n/2 + c·n^(1/2)] > ½ − c(2/π)^(1/2) [Stirling]
Independence => Pr[R2] > (½ − c(2/π)^(1/2))^(n/2 − m)
Remains to show Pr[R1 | C] large.
Computing Pr[R1 | C]
Pr[R1 | C] = Pr[M has ≥ n/2 + c·n^(1/2) 1s in each of the 1st m rows | C]
Show Pr[R1 | C] > 2^(−z'm) for certain constant z' < 1
Ingredients:
Expect to get n/2 + Θ(n^(1/2)) 1s in each of the 1st m rows | C
Use negative correlation of entries in a given row => show n/2 + Θ(n^(1/2)) 1s in a given row w/ good probability for small enough c
A simple worst-case conditioning argument on these 1st m rows shows they all have ≥ n/2 + c·n^(1/2) 1s
Completing the Proof
Recall: what is the probability p = xy, where
1. x = Pr[∀ t ∈ T, Δ(y,t) ≤ n/2 − c·n^(1/2)]
2. y = Pr[∀ t ∈ S − T, Δ(y,t) > n/2] = 2^(−n/2)
3. R1: M has ≥ n/2 + c·n^(1/2) 1s in each of the first m rows
4. R2: M has ≥ n/2 + c·n^(1/2) 1s in each of the remaining n/2 − m rows
5. C: M has majority 1 in each column
x ≥ Pr[R1 | C] Pr[R2] ≥ 2^(−z'm) (½ − c(2/π)^(1/2))^(n/2 − m)
Analysis shows z' small so this is ≥ 2^(−z''n/2), z'' < 1
Hence p = xy ≥ 2^(−(z''+1)n/2)
Hence expected # good sets is 2^(n − O(log n)) · p = 2^Ω(n)
So exists S with 2^Ω(n) good T
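The final counting step can be spelled out as follows. This is a sketch of the arithmetic only, using the standard estimate for the central binomial coefficient as the source of the 2^(n − O(log n)) term.

```latex
\begin{align*}
\mathbb{E}[\#\text{good } T]
  &= \binom{n}{n/2}\, p
   \;\ge\; 2^{\,n - O(\log n)} \cdot 2^{-(z''+1)n/2} \\
  &= 2^{\,(1 - z'')\,n/2 \;-\; O(\log n)}
   \;=\; 2^{\Omega(n)} \qquad \text{since } z'' < 1 .
\end{align*}
% Some choice of S must achieve at least the expectation,
% so there exists S with 2^{\Omega(n)} good sets T.
```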
Bipartite Graphs
Matrix Problem ⇔ Bipartite Graph Counting Problem:
How many bipartite graphs exist on n/2 by n vertices s.t. each left vertex has degree ≥ n/2 + c·n^(1/2) and each right vertex is adjacent to a majority of the n/2 left vertices (degree > n/4)?
Our Result on # of Bipartite Graphs
Bipartite graph count:
Argument shows at least 2^(n²/2 − zn/2 − n) such bipartite graphs for constant z < 1.
Main lemma shows # bipartite graphs on n + n vertices w/ each vertex degree > n/2 is > 2^(n² − zn − n)
Can replace > with <
Previous known count: 2^(n² − 2n) [MW, personal comm.]
Follows easily from Kleitman inequality
Summary
Results:
Optimal Fk Lower Bound: ∀ k ≠ 1 and any ε = Ω(m^(−1/2)), any ε-approximator for Fk must use Ω(ε^(−2)) bits of space.
Communication Lower Bound of Ω(ε^(−2)) for one-way communication complexity of (ε, δ)-approximating Δ(x,y)
Bipartite Graph Count: # bipartite graphs on n + n vertices w/ each vertex degree > n/2 is at least 2^(n² − zn − n) for constant z < 1.