Advanced Topics in Communication Networks Programming Network Data Planes ETH Zürich Alexander Dietmüller Oct. 11 2018 nsg.ee.ethz.ch
Page 1: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

Advanced Topics in Communication Networks

Programming Network Data Planes

ETH Zürich

Alexander Dietmüller

Oct. 11 2018

nsg.ee.ethz.ch

Page 2: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

2

Last week on

Advanced Topics in Communication Networks

Page 3: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

3

Probabilistic data structures like Bloom filters

help to trade resources for accuracy.

Recap

[Figure: a Bloom filter bit array (1 0 0 0 0 1 0 0 1 0). INSERT “Hello” sets the bits at hash_a(“Hello”), hash_b(“Hello”) and hash_c(“Hello”); QUERY “Hello” checks the same three bits.]

Page 4: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

4

Bloom filters take a fixed number of operations,

but hash collisions can cause false positives.

Recap

[Figure: a Bloom filter bit array (1 0 0 0 0 1 0 0 1 0) after INSERT “Hello”. QUERY “Hello” checks the bits at hash_a(“Hello”), hash_b(“Hello”) and hash_c(“Hello”). QUERY “Bye” checks the bits at hash_a(“Bye”), hash_b(“Bye”) and hash_c(“Bye”); if all of them land on bits that are already set, the filter reports a false positive.]
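The insert and query operations from the recap can be sketched in a few lines of Python. This is only an illustration: the class name `BloomFilter`, the default sizes, and the use of salted SHA-256 as a stand-in for hash_a, hash_b and hash_c are assumptions, not taken from the lecture.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits=10, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _indices(self, item):
        # Assumption: salting SHA-256 with the hash number stands in for
        # the independent hash functions hash_a, hash_b, hash_c.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, item):
        # A fixed number of operations: one bit per hash function.
        for index in self._indices(item):
            self.bits[index] = 1

    def query(self, item):
        # True means "possibly in the set", False means "definitely not".
        return all(self.bits[index] == 1 for index in self._indices(item))


bloom = BloomFilter()
bloom.insert("Hello")
print(bloom.query("Hello"))  # inserted elements are always reported
print(bloom.query("Bye"))    # can be True if all of its bits collide
```

A query never produces a false negative, because inserting an element sets exactly the bits that a later query for it reads.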

Page 6: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

6

A Bloom filter is a streaming algorithm

answering specific questions approximately.

Is X in the stream?

What is in the stream?

Invertible Bloom Filter

What about other questions?

Page 9: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

9

This week on

Advanced Topics in Communication Networks

Page 10: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

10

Today we’ll talk about: important questions,

how ‘sketches’ answer them,

and limitations of ‘sketches’

Page 11: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

11

Is a certain element in the stream?

Bloom Filter

How frequently does an element appear?

Count Sketch, CountMin Sketch, ...

How many elements belong to a certain subnet?

SketchLearn (SIGCOMM '18)

How many distinct elements are in the stream?

HyperLogLog Sketch, ...

What are the most frequent elements?

Count/CountMin + Heap, …

Page 12: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

12

In networking, we talk about packet flows,

but these questions apply to other domains as well,

e.g. search engines and databases.

Page 14: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

14

We are going to look at frequencies,

i.e. how often an element occurs in a data stream.

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix}$$

vector of frequencies (counts)

of all distinct elements $x_i$ (distinct flows)

Page 16: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

16

In the worst case, an algorithm providing

exact frequencies requires linear space.

Data Stream

n elements in total

→ n distinct elements (in the worst case)

→ n counters required? :(
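The worst case can be made concrete with an exact counter. The following is a small sketch using Python's `collections.Counter`; the function name `exact_frequencies` and the `flow-…` labels are made up for illustration.

```python
from collections import Counter


def exact_frequencies(stream):
    """Exact counting keeps one counter per distinct element."""
    return Counter(stream)


# Worst case: every element is distinct, so n elements need n counters.
stream = [f"flow-{i}" for i in range(1000)]
counts = exact_frequencies(stream)
print(len(counts))  # 1000
```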

Page 20: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

20

Probabilistic data structures can help again!

Bloom filters quickly “filter” only those elements that might be in the set.

More efficient by allowing false positives.

Sketches provide approximate frequencies of elements in a data stream.

More efficient by allowing mis-counting.

Page 22: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

22

Today we’ll talk about: important questions,

how ‘sketches’ answer them,

limitations of ‘sketches’

Page 23: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

23

A CountMin sketch uses the same principles as a

counting Bloom filter, but is designed to have

provable L1 error bounds for frequency queries.

Notation reminder:

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \end{bmatrix}$$

vector of frequencies (counts)

of all distinct elements $x_i$

Page 26: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

26

$$\Pr\Big[\underbrace{\hat{x}_i}_{\text{estimated frequency}} - \underbrace{x_i}_{\text{true frequency}} \ge \varepsilon \underbrace{\|x\|_1}_{\text{sum of frequencies}}\Big] \le \delta$$

The estimation error exceeds $\varepsilon\|x\|_1$ (an error relative to the L1 norm)

with a probability of at most $\delta$.

Let $\|x\|_1 = 10000$, $\varepsilon = 0.01$, $\delta = 0.05$.

The probability for any given estimate to be

off by more than 100 is at most 5% (after counting 10000 elements).

Page 30: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

30

A CountMin Sketch uses multiple arrays and hashes.

w indices per array (range of hashes)

d arrays (one hash function per array)

w⋅d counters (total size)

Page 31: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

31

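COUNT “Hello”

[Figure: each of the three arrays increments one counter: x_a + 1 at index hash_a(“Hello”), x_b + 1 at index hash_b(“Hello”), x_c + 1 at index hash_c(“Hello”).]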

Page 32: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

32

Hash collisions cause over-counting.

[Figure: the counters x_a, x_b, x_c used by “Hello” are also incremented by colliding elements: hash_a(“Test”) and hash_a(“Net”) map to x_a; hash_b(“Bye”), hash_b(“UDP”) and hash_b(“FUBAR”) map to x_b; hash_c(“TCP”) maps to x_c.]

Page 33: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

33

Returning the minimum value minimizes the error.

QUERY “Hello”

[Figure: read x_a at hash_a(“Hello”), x_b at hash_b(“Hello”) and x_c at hash_c(“Hello”).]

return min(x_a, x_b, x_c)
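The COUNT and QUERY operations shown on the last pages can be sketched in Python as follows. The class name `CountMinSketch`, the parameters `width` (w) and `depth` (d), and the salted SHA-256 hashing are illustrative assumptions; a data-plane implementation would use the hash functions the hardware offers.

```python
import hashlib


class CountMinSketch:
    """d arrays of w counters, one hash function per array."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Assumption: SHA-256 salted with the row number plays the role
        # of the per-array hash function.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def count(self, item, amount=1):
        # COUNT: increment one counter in every array.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += amount

    def query(self, item):
        # QUERY: collisions only ever add to a counter, so the smallest
        # counter is the estimate closest to the true frequency.
        return min(
            self.rows[row][self._index(row, item)]
            for row in range(self.depth)
        )


sketch = CountMinSketch(width=16, depth=3)
for word in ["Hello"] * 5 + ["Bye", "TCP", "UDP", "Test", "Net"]:
    sketch.count(word)
print(sketch.query("Hello"))  # never less than the true count of 5
```

Because counters are only incremented, the estimate can over-count but never under-count.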

Page 34: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

34

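A CountMin sketch uses the same principles as a

counting Bloom filter, but is designed to have

provable L1 error bounds for frequency queries.

$$\Pr\Big[\underbrace{\hat{x}_i}_{\text{estimated frequency}} - \underbrace{x_i}_{\text{true frequency}} \ge \varepsilon \underbrace{\|x\|_1}_{\text{sum of frequencies}}\Big] \le \delta$$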

Page 35: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

35

Understanding the error bounds allows

dimensioning the sketch optimally.

Error bounds per hash/array

Error bounds for the minimum

Optimal size

Page 36: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

37

$$\underbrace{\hat{x}_i}_{\text{estimated frequency}} = \min_{h \in \{h_1, \dots, h_d\}} \underbrace{\hat{x}_i^h}_{\text{estimate for specific hash}}$$

Page 37: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

38

The error bounds can be derived

with Markov’s Inequality (wikipedia.org/wiki/Markov's_inequality).

For a non-negative random variable $X$ and $c > 0$:

$$\Pr\big[X \ge c \cdot E[X]\big] \le \frac{1}{c}$$

Applied to the estimation error of a single hash (error bounds per hash/array):

$$\Pr\Big[\hat{x}_i^h - x_i \ge c \cdot E\big[\hat{x}_i^h - x_i\big]\Big] \le \frac{1}{c}$$

Page 39: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

40

$$\hat{x}_i^h = \underbrace{x_i}_{\text{true frequency}} + \underbrace{\sum_{x_j \neq x_i} x_j \, 1_h(x_i, x_j)}_{\text{over-counting from hash collisions}}$$

where the indicator marks a hash collision:

$$1_h(x_i, x_j) = \begin{cases} 1, & \text{if } h(x_i) = h(x_j) \\ 0, & \text{otherwise} \end{cases}$$

The estimation error is therefore exactly the over-counting:

$$\underbrace{\hat{x}_i^h - x_i}_{\text{estimation error}} = \underbrace{\sum_{x_j \neq x_i} x_j \, 1_h(x_i, x_j)}_{\text{over-counting from hash collisions}}$$

Page 42: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

43

Taking the expectation of the estimation error:

$$E\big[\hat{x}_i^h - x_i\big] = E\Big[\sum_{x_j \neq x_i} x_j \, 1_h(x_i, x_j)\Big]$$

We treat the data as a constant and the

hash as a random function with certain properties (wikipedia.org/wiki/Universal_hashing):

$x_j$ is constant, $1_h(x_i, x_j)$ is random.

$$E\big[\hat{x}_i^h - x_i\big] = \sum_{x_j \neq x_i} x_j \, \underbrace{E\big[1_h(x_i, x_j)\big]}_{\le \frac{1}{w}}$$

$$E\big[\hat{x}_i^h - x_i\big] \le \sum_{x_j \neq x_i} x_j \frac{1}{w} \le \sum_{x_j} x_j \frac{1}{w} = \|x\|_1 \frac{1}{w}$$
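The bound on the expected per-hash error can be checked empirically. The simulation below is a rough sketch under a strong assumption: the hash is modeled as fully random, i.e. every element gets a fresh uniform bucket in each trial. The function name `average_row_error` and the example frequencies are invented for illustration.

```python
import random


def average_row_error(freqs, target, width, trials, seed=0):
    """Average over-count of `target` in one array of `width` counters."""
    rng = random.Random(seed)
    others = [count for name, count in freqs.items() if name != target]
    total_error = 0
    for _ in range(trials):
        target_bucket = rng.randrange(width)
        # Each other element collides with probability 1/width and then
        # adds its whole frequency to the target's counter.
        total_error += sum(
            count for count in others
            if rng.randrange(width) == target_bucket
        )
    return total_error / trials


freqs = {"target": 5, **{f"flow-{j}": j for j in range(1, 21)}}
width = 10
l1_norm = sum(freqs.values())  # 5 + (1 + 2 + ... + 20) = 215
average = average_row_error(freqs, "target", width, trials=20000)
print(f"average error {average:.2f}, bound {l1_norm / width}")
```

Under this model the expected error is the sum of the other frequencies divided by w (here 210 / 10 = 21), which stays below the looser bound ‖x‖1 / w = 21.5 used in the derivation.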

Page 49: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

50

Plugging the bound on the expected error into Markov’s inequality:

$$\Pr\Big[\hat{x}_i^h - x_i \ge c \cdot \underbrace{E\big[\hat{x}_i^h - x_i\big]}_{\le \frac{1}{w}\|x\|_1}\Big] \le \frac{1}{c}$$

Since $\frac{c}{w}\|x\|_1$ is at least as large as $c \cdot E[\hat{x}_i^h - x_i]$, exceeding it is at most as likely:

$$\Pr\Big[\hat{x}_i^h - x_i \ge \frac{c}{w}\|x\|_1\Big] \le \frac{1}{c}$$

With $\varepsilon^h = \frac{c}{w}$ and $\delta^h = \frac{1}{c}$:

$$\Pr\Big[\hat{x}_i^h - x_i \ge \underbrace{\varepsilon^h}_{\frac{c}{w}} \|x\|_1\Big] \le \underbrace{\delta^h}_{\frac{1}{c}}$$

The estimate for each hash has

a well-defined L1 error bound.

Page 53: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

54

What about the minimum?

$$\Pr\Big[\hat{x}_i - x_i \ge \frac{c}{w}\|x\|_1\Big] \le\ ?$$

Writing out the estimate as the minimum over all hashes (error bounds for the minimum):

$$\Pr\Big[\underbrace{\min_{h \in \{h_1, \dots, h_d\}} \hat{x}_i^h}_{\hat{x}_i} - x_i \ge \frac{c}{w}\|x\|_1\Big] \le\ ?$$

Multiple hash functions work like independent trials. The minimum exceeds the threshold only if the estimate of every single hash exceeds it, so the per-hash probabilities multiply:

$$\Pr\Big[\min_{h \in \{h_1, \dots, h_d\}} \hat{x}_i^h - x_i \ge \frac{c}{w}\|x\|_1\Big] = \prod_{h \in \{h_1, \dots, h_d\}} \underbrace{\Pr\Big[\hat{x}_i^h - x_i \ge \frac{c}{w}\|x\|_1\Big]}_{\le \frac{1}{c}\ \text{(error bound per hash)}} \le \frac{1}{c^d}$$

Therefore:

$$\Pr\Big[\hat{x}_i - x_i \ge \frac{c}{w}\|x\|_1\Big] \le \frac{1}{c^d}$$

$$\Pr\Big[\hat{x}_i - x_i \ge \underbrace{\varepsilon}_{\frac{c}{w}} \|x\|_1\Big] \le \underbrace{\delta}_{\frac{1}{c^d}}$$

We have proven the error bounds!

But what about the constant c?

Page 62: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

63

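For every c, there is a pair $(d, w)$ achieving

the error bound and confidence $(\varepsilon, \delta)$.

$$\varepsilon = \frac{c}{w} \ \Rightarrow\ w = \left\lceil \frac{c}{\varepsilon} \right\rceil \quad \text{(hash range)}$$

$$\delta = \frac{1}{c^d} \ \Rightarrow\ d = \left\lceil \log_c \frac{1}{\delta} \right\rceil \quad \text{(\#hashes)}$$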

Page 63: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

64

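Choosing c = e minimizes the

total number of counters (optimal size).

$$\varepsilon = \frac{e}{w} \ \Rightarrow\ w = \left\lceil \frac{e}{\varepsilon} \right\rceil \quad \text{(hash range)}$$

$$\delta = \frac{1}{e^d} \ \Rightarrow\ d = \left\lceil \ln \frac{1}{\delta} \right\rceil \quad \text{(\#hashes)}$$

$$d \cdot w = \frac{c}{\varepsilon} \log_c \frac{1}{\delta} \ \overset{\text{minimize}}{=}\ \frac{e}{\varepsilon} \ln \frac{1}{\delta}$$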
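Why c = e is the minimizer can be seen with a short calculation. The following is a sketch that ignores the ceilings and assumes c > 1; the helper function f is introduced here only for the derivation.

```latex
\begin{align*}
d \cdot w &\approx \frac{c}{\varepsilon} \log_c \frac{1}{\delta}
           = \frac{c}{\ln c} \cdot \frac{1}{\varepsilon} \ln \frac{1}{\delta}
           && \text{(change of base: } \log_c y = \tfrac{\ln y}{\ln c}\text{)} \\
f(c) &= \frac{c}{\ln c}, \qquad
f'(c) = \frac{\ln c - 1}{(\ln c)^2} \\
f'(c) &= 0 \iff \ln c = 1 \iff c = e \\
f(e) &= e \quad\Rightarrow\quad d \cdot w \approx \frac{e}{\varepsilon} \ln \frac{1}{\delta}
\end{align*}
```

Since f'(c) is negative for 1 < c < e and positive for c > e, the stationary point c = e is indeed a minimum.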

Page 64: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

65

A CountMin sketch recipe

Given $\varepsilon, \delta$, choosing

$$w = \left\lceil \frac{e}{\varepsilon} \right\rceil \quad \text{(hash range)}$$

$$d = \left\lceil \ln \frac{1}{\delta} \right\rceil \quad \text{(\#hashes)}$$

requires the minimum number of

counters s.t. the CountMin Sketch

can guarantee that

$\hat{x}_i - x_i \ge \varepsilon\|x\|_1$ with a probability less than $\delta$.
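The recipe translates directly into code. The helper name `countmin_dimensions` is hypothetical; the example values ε = 0.01 and δ = 0.05 are the ones used earlier in the lecture.

```python
import math


def countmin_dimensions(epsilon, delta):
    """Return (w, d) for a CountMin sketch, following the recipe."""
    width = math.ceil(math.e / epsilon)     # w = ceil(e / epsilon)
    depth = math.ceil(math.log(1 / delta))  # d = ceil(ln(1 / delta))
    return width, depth


width, depth = countmin_dimensions(epsilon=0.01, delta=0.05)
print(width, depth, width * depth)  # 272 3 816
```

For a stream of 10000 elements, these 816 counters keep any given estimate within 100 of the true frequency with a probability of at least 95%.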

Page 65: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

66

A CountMin sketch uses the same principles as a

counting Bloom filter, but is designed to have

provable L1 error bounds for frequency queries.

CountMin sketch recipe

Choose $d = \left\lceil \ln \frac{1}{\delta} \right\rceil$, $w = \left\lceil \frac{e}{\varepsilon} \right\rceil$

Then $\hat{x}_i - x_i \ge \varepsilon\|x\|_1$ with a probability less than $\delta$

→ only one design out of many!

Page 68: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

69

A Count sketch uses the same principles as a

counting Bloom filter, but is designed to have

provable L2 error bounds for frequency queries.

Page 69: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

70

The Count sketch uses additional hashing to

give L2 error bounds, but requires more resources.

CountMin sketch

h_1, …, h_d: U → {1, …, w}

COUNT x_i:

    for h in h_1, …, h_d:

        Reg_h[h(x_i)] += 1

QUERY x_i:

    return min over h in h_1, …, h_d of ( Reg_h[h(x_i)] )

Count sketch

h_1, …, h_d: U → {1, …, w}

g: U → {+1, -1}

COUNT x_i:

    for h in h_1, …, h_d:

        Reg_h[h(x_i)] += g(x_i)

QUERY x_i:

    return median over h in h_1, …, h_d of ( Reg_h[h(x_i)] * g(x_i) )
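The Count sketch pseudocode can be turned into a small runnable Python sketch. Several things here are assumptions made for illustration: the class name `CountSketch`, the salted SHA-256 hashing, and in particular the use of one sign hash per array (a common formulation), whereas the slide shows a single sign hash g.

```python
import hashlib
from statistics import median


class CountSketch:
    """d arrays of w signed counters; the estimate is a median."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _hash(self, purpose, row, item):
        # Assumption: salted SHA-256 stands in for the hash functions.
        digest = hashlib.sha256(f"{purpose}:{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _index(self, row, item):
        return self._hash("index", row, item) % self.width

    def _sign(self, row, item):
        # Sign hash g: maps every element to +1 or -1 (per array here).
        return 1 if self._hash("sign", row, item) % 2 == 0 else -1

    def count(self, item, amount=1):
        for row in range(self.depth):
            index = self._index(row, item)
            self.rows[row][index] += self._sign(row, item) * amount

    def query(self, item):
        # Colliding elements add or subtract, so errors can cancel out;
        # the median discards arrays with unusually large errors.
        return median(
            self.rows[row][self._index(row, item)] * self._sign(row, item)
            for row in range(self.depth)
        )


sketch = CountSketch(width=16, depth=5)
for word in ["Hello"] * 5 + ["Bye", "TCP", "UDP"]:
    sketch.count(word)
print(sketch.query("Hello"))  # an estimate of 5; may be off in either direction
```

Unlike the CountMin sketch, the estimate is not one-sided: collisions can push it above or below the true frequency, which is why the minimum is replaced by the median.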

Page 71: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

72

CountMin sketch recipe

Choose $d = \left\lceil \ln \frac{1}{\delta} \right\rceil$, $w = \left\lceil \frac{e}{\varepsilon} \right\rceil$

Then $\hat{x}_i - x_i \ge \varepsilon\|x\|_1$ with a probability less than $\delta$

Count sketch recipe

Choose $d = \left\lceil \ln \frac{1}{\delta} \right\rceil$, $w = \left\lceil \frac{e}{\varepsilon^2} \right\rceil$

Then $\hat{x}_i - x_i \ge \varepsilon\|x\|_2$ with a probability less than $\delta$

The Count sketch uses additional hashing to

give L2 error bounds, but requires more resources.

Page 73: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

Sketches are the new black

OpenSketch (NSDI '13)

UnivMon (SIGCOMM '16)

SketchLearn (SIGCOMM '18)

...and many more!

Page 76: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

77

Today we’ll talk about: important questions,

how ‘sketches’ answer them,

limitations of ‘sketches’

Page 77: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

78

Sketches compute statistical summaries,

favoring elements with high frequency.

$$\Pr\Big[\underbrace{\hat{x}_i - x_i}_{\text{estimation error}} \ge \underbrace{\varepsilon\|x\|_1}_{\text{relative to sum of all elements}}\Big] \le \delta$$

Let $\varepsilon = 0.01$, $\|x\|_1 = 10000$ ($\Rightarrow \varepsilon \cdot \|x\|_1 = 100$)

Assume two flows $x_a$, $x_b$,

with $\|x_a\|_1 = 1000$ (high frequency), $\|x_b\|_1 = 50$ (low frequency)

Error relative to stream size: 1%

Error relative to flow size: $x_a$: 10%, $x_b$: 200%
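The percentages follow from dividing the absolute error bound by the size of each flow. A minimal sketch of that arithmetic, with variable names chosen for illustration:

```python
epsilon = 0.01
stream_size = 10_000                   # ||x||_1
error_bound = epsilon * stream_size    # absolute error bound: 100

flows = {"x_a": 1000, "x_b": 50}       # high- and low-frequency flow
relative_error = {name: error_bound / size for name, size in flows.items()}

for name, error in relative_error.items():
    print(f"{name}: {error:.0%}")      # x_a: 10%, x_b: 200%
```

The same absolute error of 100 is negligible for the large flow but twice the size of the small one, which is why sketches favor high-frequency elements.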

Page 81: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

82

Other problems a sketch can’t handle:

causality, patterns, rare things

Page 82: Programming Network Data Planes - Advanced Topics in ... · Advanced Topics in Communication Networks Programming Network Data Planes ... vector of frequencies (counts) of all distinct

83

Regardless of their limitations, sketches provide

trade-offs between resources and error, and

provable guarantees to rely on.
