Jan 14, 2016
Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR/Waterloo)
Jelani Nelson (MIT)
Krzysztof Onak (MIT)
Streaming Model
• Algorithm maintains a vector x ∈ ℤ^n, initially x = (0, 0, 0, 0, …, 0)
• Stream of m updates, e.g. "increment x_1", "increment x_4", …
  – after these two updates, x = (1, 0, 0, 1, …, 0); at the end of the stream, say, x = (9, 2, 0, 5, …, 12)
• Goal: Compute statistics, e.g. ||x||_1, ||x||_2, …
• Trivial solution: Store x (or store all updates): O(n·log(m)) space
• Goal: Compute using O(polylog(nm)) space
Streaming Algorithms (a very brief introduction)
• Fact: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Indyk-Woodruff '05], [Bhuvanagiri et al. '06], [Indyk '06], [Li '08], [Li '09]
  Can compute (1±ε)||x||_p^p = (1±ε)F_p, where F_p = Σ_i |x_i|^p, using
  – O(ε^{-2} log^c n) bits of space (if 0 ≤ p ≤ 2)
  – O(ε^{-O(1)} n^{1-2/p} · log^{O(1)}(n)) bits (if 2 < p)
• Another Fact: Mostly optimal: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Saks-Sun '02], [Chakrabarti-Khot-Sun '03], [Indyk-Woodruff '03], [Woodruff '04]
  – Proofs using communication complexity and information theory
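To make the flavor of these F_p estimators concrete, here is a minimal Python sketch of the classic [Alon-Matias-Szegedy '99] estimator for F_2 (the p = 2 case, not this talk's algorithm). It assumes fully random ±1 signs per coordinate for readability; the stated space bounds actually require 4-wise independent hash families.

```python
import random
from collections import defaultdict

# Minimal AMS-style sketch for F_2 = sum_i x_i^2 [Alon-Matias-Szegedy '99].
# Simplification (assumption): fully random +/-1 signs per coordinate,
# instead of the 4-wise independent hash families the space bounds need.
class AMSF2:
    def __init__(self, reps=200):
        # One signed counter per repetition; signs[r][i] is a random sign for x_i.
        self.signs = [defaultdict(lambda: random.choice((-1, 1))) for _ in range(reps)]
        self.counters = [0] * reps

    def update(self, i, delta=1):
        # Stream update "x_i += delta" (deletions are negative deltas).
        for r, signs in enumerate(self.signs):
            self.counters[r] += signs[i] * delta

    def estimate(self):
        # Each counter Z = sum_i s(i)*x_i satisfies E[Z^2] = F_2; average to concentrate.
        return sum(c * c for c in self.counters) / len(self.counters)

# Demo: 1000 random increments over 50 coordinates.
sketch, x = AMSF2(), defaultdict(int)
for _ in range(1000):
    i = random.randrange(50)
    sketch.update(i)
    x[i] += 1
print(sum(v * v for v in x.values()), round(sketch.estimate()))
```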
Practical Motivation
• General goal: Dealing with massive data sets
  – Internet traffic, large databases, …
• Network monitoring & anomaly detection
  – Stream consists of internet packets
  – x_i = # packets sent to port i
  – Under typical conditions, x is very concentrated
  – Under a "port scan attack", x is less concentrated
  – Can detect by estimating empirical entropy
[Lakhina et al. ’05], [Xu et al. ‘05], [Zhao et al. ‘07]
Entropy
• Probability distribution a = (a_1, a_2, …, a_n)
• Entropy H(a) = -Σ_i a_i·lg(a_i)
• Examples:
  – a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
  – a = (0, …, 0, 1, 0, …, 0): H(a) = 0
• Entropy is small when a is concentrated, LARGE when it is not
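As a quick check, the two examples above compute as claimed (a few lines of Python, using the convention 0·lg(0) = 0):

```python
import math

def H(a):
    # Shannon entropy in bits, with the convention 0*lg(0) = 0.
    return -sum(p * math.log2(p) for p in a if p > 0)

n = 8
print(H([1.0 / n] * n))   # uniform: lg(8) = 3.0
print(H([0, 0, 1, 0]))    # point mass: 0.0
```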
Streaming Algorithms for Entropy
• How much space to estimate H(x)?
  – [Guha-McGregor-Venkatasubramanian '06], [Chakrabarti-Do Ba-Muthu '06], [Bhuvanagiri-Ganguly '06]
  – [Chakrabarti-Cormode-McGregor '07]:
    multiplicative (1±ε) approx: O(ε^{-2} log^2 m) bits
    additive approx: O(ε^{-2} log^4 m) bits
    Ω(ε^{-2}) lower bound for both
• Our contributions:
  – Additive or multiplicative (1±ε) approximation
  – Õ(ε^{-2} log^3 m) bits, and can handle deletions
  – Can sketch entropy in the same space
First Idea
• If you can estimate F_p for p ≈ 1, then you can estimate H(x)
• Why? Rényi entropy
Review of Rényi
• Definition: H_p(x) = log(||x||_p^p / ||x||_1^p) / (1 - p)
• Convergence to Shannon: lim_{p→1} H_p(x) = H(x)
[Plot: H_p(x) as a function of p, for p = 0, 1, 2, …; pictured: Alfréd Rényi and Claude Shannon]
Overview of Algorithm
• Set p = 1.01 and let x̃ = x/||x||_1
• Compute ỹ = (1±ε)||x̃||_p^p (using Li's "compressed counting")
• Set H̃ = log(ỹ)/(1 - p)
• So H̃ = log((1±ε)||x̃||_p^p)/(1 - p) = H_p(x̃) + log(1±ε)/(1 - p) = H_{1.01}(x̃) ± 100·log(1±ε) ≈ H_{1.01}(x̃) ± 100ε
• As p → 1, H_p(x̃) → H(x̃), so this gets better; but the 1/(1 - p) amplification of the error gets worse!
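A small numeric illustration of this overview, with exact moments standing in for Li's sketched (1±ε) estimates, and an arbitrary toy vector x (both are assumptions for the demo, not the streaming algorithm itself):

```python
import math

def shannon(x):
    s = sum(x)
    return -sum((v / s) * math.log2(v / s) for v in x if v > 0)

def renyi(x, p):
    # H_p(x~) = log(||x~||_p^p) / (1 - p) for the normalized x~ = x/||x||_1.
    s = sum(x)
    return math.log2(sum((v / s) ** p for v in x if v > 0)) / (1 - p)

x = [9, 2, 5, 12, 1, 1, 30, 4]   # toy frequency vector (assumed data)
print(shannon(x))                 # Shannon entropy of x~
print(renyi(x, 1.01))             # Renyi at p = 1.01: already close
```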
Analysis
Making the tradeoff
• How quickly does H_p(x) converge to H(x)?
• Theorem: Let x be a distribution with min_i x_i ≥ 1/m.
  – Multiplicative Approximation: let p = 1 + ε/O(log m); then 1 ≤ H(x)/H_p(x) ≤ 1 + ε
  – Additive Approximation: let p = 1 + ε/O(log^2 m); then 0 ≤ H(x) - H_p(x) ≤ ε
• Plugging in: O(ε^{-3} log^4 m) bits of space suffice for additive approximation
Proof: A trick worth remembering
• Let f : ℝ → ℝ and g : ℝ → ℝ be such that
  lim_{p→1} f(p) = 0, lim_{p→1} g(p) = 0, and lim_{p→1} f'(p)/g'(p) = L
• l'Hôpital's rule says that lim_{p→1} f(p)/g(p) = L
• It actually says more! It says f(p)/g(p) converges to L at least as fast as f'(p)/g'(p) does
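For the entropy application, the trick instantiates as follows (a worked step not spelled out on the slide; x̃ is the normalized vector and logs are natural here):

```latex
% H_p(\tilde{x}) = f(p)/g(p) with f(p) = \ln\sum_i \tilde{x}_i^p and g(p) = 1-p;
% both vanish at p = 1 since \sum_i \tilde{x}_i = 1. Then
\[
  \frac{f'(p)}{g'(p)}
    = \frac{\sum_i \tilde{x}_i^{\,p}\ln \tilde{x}_i}{\sum_i \tilde{x}_i^{\,p}}
      \cdot \frac{1}{-1}
    \;\longrightarrow\;
    -\sum_i \tilde{x}_i \ln \tilde{x}_i = H(\tilde{x})
    \qquad (p \to 1),
\]
% so H_p(\tilde{x}) -> H(\tilde{x}), at least as fast as f'/g' converges.
```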
Improvements
• Status: additive approx using O(ε^{-3} log^4 m) bits
• How to reduce space further?
  – Interpolate with multiple points: H_{p_1}(x), H_{p_2}(x), ...
[Plot: H_p(x) vs. p. LEGEND: Shannon (at p = 1); Single Rényi; Multiple Rényis used for interpolation]
Analyzing Interpolation
• Let f(z) be a C^{k+1} function
• Interpolate f with polynomial q with q(z_i) = f(z_i), 0 ≤ i ≤ k
• Fact: |f(y) - q(y)| ≤ (b - a)^{k+1} · sup_{z∈[a,b]} |f^{(k+1)}(z)|, where y, z_i ∈ [a,b]
• Our case: Set f(z) = H_{1+z}(x)
• Goal: Analyze f^{(k+1)}(z)
Bounding Derivatives
• Rényi derivatives are messy to analyze
• Switch to Tsallis entropy f(z) = S_{1+z}(x̃), where S_p(x̃) = (1 - ||x̃||_p^p)/(p - 1)
• Can prove Tsallis also converges to Shannon
• Define: G_k(z) = Σ_{i=1}^n x̃_i^{1+z}·log^k(x̃_i). Then
  f^{(k)}(z) = (-1)^k·k!·(1 - G_0(z))/z^{k+1} + Σ_{j=1}^k (-1)^{k-j+1}·(k!/j!)·G_j(z)/z^{k-j+1}
• Fact: sup_{z∈[a,b]} |f^{(k+1)}(z)| = O(H(x)·log^{k+1} m) (when a = -O(1/(k·log m)), b = 0)
  ⇒ can set k = log(1/ε) + loglog m
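A hypothetical numeric check of the closed form for f^{(k)}(z) above against a finite difference, on an arbitrary small distribution (the data and the check are assumptions for illustration):

```python
import math
import numpy as np

# Check the closed form for f^(k)(z), where f(z) = S_{1+z}(x~) = (1 - G_0(z))/z.
x = np.array([0.5, 0.2, 0.2, 0.1])   # assumed toy distribution x~

def G(j, z):
    # G_j(z) = sum_i x~_i^{1+z} * log^j(x~_i)  (natural logs)
    return np.sum(x ** (1 + z) * np.log(x) ** j)

def f(z):
    return (1 - G(0, z)) / z

def f_deriv(k, z):
    # The closed form from the slide above.
    total = (-1) ** k * math.factorial(k) * (1 - G(0, z)) / z ** (k + 1)
    for j in range(1, k + 1):
        total += ((-1) ** (k - j + 1) * math.factorial(k) / math.factorial(j)
                  * G(j, z) / z ** (k - j + 1))
    return total

z0, h = -0.05, 1e-5
print(f_deriv(1, z0), (f(z0 + h) - f(z0 - h)) / (2 * h))  # should agree closely
```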
Key Ingredient: Noisy Interpolation
• We don't have f(z_i); we have f(z_i) ± ε
• How to interpolate in the presence of noise?
• Idea: we pick our z_i very carefully
Chebyshev Polynomials
• T_k(x) = cos(k·arccos(x))
• Rogosinski's Theorem: if q(x) has degree k and |q(β_j)| ≤ 1 at the extrema β_j = cos(jπ/k) of T_k (0 ≤ j ≤ k), then |q(x)| ≤ |T_k(x)| for |x| > 1
• Map [-1,1] onto interpolation interval [z_0, z_k]
• Choose z_j to be the image of β_j, j = 0, …, k
• Let q̃(z) interpolate f(z_j) ± ε and q(z) interpolate f(z_j)
• r(z) = (q̃(z) - q(z))/ε satisfies Rogosinski's conditions!
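A small Python illustration of why these nodes tame noise (interval, function, and noise level are assumed toy values): interpolate through values perturbed by ±ε at the images of the β_j, and the error at 0, which lies outside [z_0, z_k], stays near ε·|T_k(preimage(0))|, as Rogosinski's theorem promises.

```python
import numpy as np

rng = np.random.default_rng(0)
k, eps = 8, 1e-3
f = np.exp                                    # smooth stand-in for z -> H_{1+z}(x)
z0, zk = -0.5, -0.05                          # interpolation interval; 0 lies outside
beta = np.cos(np.arange(k + 1) * np.pi / k)   # Chebyshev extrema beta_j in [-1, 1]
z = z0 + (beta + 1) * (zk - z0) / 2           # their images z_j in [z0, zk]
noisy = f(z) + eps * rng.choice((-1.0, 1.0), size=k + 1)
q = np.polynomial.Polynomial.fit(z, noisy, k)           # noisy interpolant q~
t = 2 * (0 - z0) / (zk - z0) - 1                        # preimage of 0; t > 1
bound = eps * np.cosh(k * np.arccosh(t))                # eps * T_k(t), via cosh form
print(abs(q(0) - f(0)), bound)   # observed error vs. the Rogosinski-style bound
```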
Tradeoff in Choosing z_k
• z_k close to 0 ⇒ |T_k(preimage(0))| still small
• …but z_k close to 0 ⇒ high space complexity
• Just how close do we need 0 and z_k to be?
[Plot: T_k grows quickly once leaving [z_0, z_k]; the evaluation point 0 lies just outside the interval]
The Magic of Chebyshev
• [Paturi '92]: T_k(1 + 1/k^c) ≤ e^{4k^{1-c/2}}. Set c = 2.
• Suffices to set z_k = -O(1/(k^3·log m))
• Translates to Õ(ε^{-2} log^3 m) space
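Paturi's bound is easy to check numerically for c = 2, where e^{4k^{1-c/2}} = e^4 is a constant (a quick sanity check, not part of the talk):

```python
import numpy as np

# T_k(x) = cosh(k * arccosh(x)) for x >= 1; compare against the constant e^4.
for k in (4, 8, 16, 32, 64):
    tk = np.cosh(k * np.arccosh(1 + 1.0 / k ** 2))
    print(k, round(float(tk), 3), "<=", round(float(np.exp(4)), 1))
```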
The Final Algorithm (additive approximation)
• Set k = lg(1/ε) + lglg(m), and z_j = (k^2·cos(jπ/k) - (k^2 + 1))/(9k^3·lg(m)) for 0 ≤ j ≤ k
• Estimate S̃_{1+z_j} = (1 - F̃_{1+z_j}/(F̃_1)^{1+z_j})/z_j for 0 ≤ j ≤ k
• Interpolate the degree-k polynomial q̃ with q̃(z_j) = S̃_{1+z_j}
• Output q̃(0)
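Putting the pieces together, a non-streaming Python sketch of this algorithm: exact moments F_p stand in for the sketched (1±ε) estimates F̃_p, and x is assumed toy data, so this shows only the approximation-theory half of the method.

```python
import math
import numpy as np

x = np.array([9.0, 2.0, 5.0, 12.0, 1.0, 1.0, 30.0, 4.0])  # assumed toy data
m = x.sum()                                   # F_1 = ||x||_1
eps = 0.05
k = math.ceil(math.log2(1 / eps) + math.log2(math.log2(m)))

j = np.arange(k + 1)
z = (k**2 * np.cos(j * np.pi / k) - (k**2 + 1)) / (9 * k**3 * math.log2(m))

def F(p):
    # Exact moments F_p = sum_i x_i^p (the streaming sketch would estimate these).
    return np.sum(x[:, None] ** p, axis=0)

S = (1 - F(1 + z) / m ** (1 + z)) / z         # Tsallis estimates S_{1+z_j} (nats)
q = np.polynomial.Polynomial.fit(z, S, k)     # degree-k interpolant through (z_j, S_j)
H = -np.sum((x / m) * np.log(x / m))          # exact Shannon entropy in nats
print(q(0), H)                                 # q(0) approximates H
```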
Multiplicative Approximation
• How to get multiplicative approximation?
  – Additive approximation is multiplicative, unless H(x) is small
  – H(x) small ⇒ max_i x_i large [CCM '07]
• Suppose x_{i*} ≥ x_i for all i, and define the residual moment RF_p = Σ_{i≠i*} x_i^p
• We combine (1±ε)RF_1 and (1±ε)RF_{1+z_j} to get (1±ε)f(z_j)
• Question: How do we get (1±ε)RF_p?
• Two different approaches:
  – A general approach (for any p, and negative frequencies)
  – An approach exploiting p ≈ 1, only for nonnegative freqs (better by log(m))
Questions / Thoughts
• For what other problems can we use this "generalize-then-interpolate" strategy?
  – Some non-streaming problems too?
• The power of moments?
• The power of residual moments? CountMin (CM '05) + CountSketch (CCF '02), HSS (Ganguly et al.)
• WANTED: Faster moment estimation (some progress in [Cormode-Ganguly '07])