The Bloomier Filter Bernard Chazelle Princeton Un., NEC Lab. Joe Kilian NEC Lab. Ronitt Rubinfeld NEC Lab. Ayellet Tal Technion, Princeton Un. Presented by Lilach Bien

Dec 17, 2015


The Bloomier Filter

Bernard Chazelle, Princeton Un., NEC Lab.
Joe Kilian, NEC Lab.
Ronitt Rubinfeld, NEC Lab.
Ayellet Tal, Technion, Princeton Un.

Presented by Lilach Bien


Overview

The Problem

Definitions

The algorithm

Analysis

Lower Bounds

  Deterministic algorithm

  Mutable version of the problem


The Problem

Bloom & Bloomier Filters


The Problem – Bloom Filters

A large set of data D, with a small subset S

We want to query whether an item d belongs to S

No false negatives (if d belongs to S, we always recognize it)

A small false positive rate (we may say d belongs to S although it does not)

Allowing a small false positive rate makes it possible to build a compact data structure
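
The membership problem above can be illustrated with a minimal Bloom filter sketch. The class name `BloomFilter`, the salted SHA-256 hashing, and the sizes used are assumptions chosen for illustration; they do not come from the slides.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter with num_hashes positions per item (illustrative)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive the positions from salted SHA-256 digests (an assumed choice).
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # True for every added item; may also be True for a few others.
        return all(self.bits[pos] for pos in self._positions(item))


S = {1, 2, 3}                      # the small subset
bf = BloomFilter(num_bits=64, num_hashes=3)
for d in S:
    bf.add(d)

# Items of D = {1,...,100} outside S that the filter wrongly accepts.
false_positives = [d for d in range(1, 101) if d not in S and bf.might_contain(d)]
```

Every member of S is accepted, so there are no false negatives; `false_positives` lists whatever non-members happen to collide with set bits.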


The Problem – Bloomier Filters

Bloom Filters – membership queries on a small subset of D.

Bloomier Filters – computing arbitrary functions defined only on a small subset S of D.

The function is computed correctly for all members of S (no false negatives).

For items not in S, we almost always return a special value $\bot$.

Dynamic updates to the function are allowed, as long as S does not change.


Example

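D = {1,…,100}, S = {1,2,3}, R = {1,2}

f(1)=1, f(2)=1, f(3)=2

[Figure: lookups of the items 1, 2, 87, 55, 40; three of the returned values are 1.]

Update: f(2)=2

[Figure: lookups of the items 66, 2, 3 after the update; two of the returned values are 2.]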


Bloomier Filters - Uses

Building a meta database for a union of databases.

Keeps track of which database contains information about each entry.

Maintaining directories when the data or code is kept in multiple locations.


Definitions

Formal Definitions

f is a function with domain $D = \{0, \dots, N-1\}$

The range is $R = \{\bot, 1, \dots, 2^r - 1\}$

$S = \{t_1, \dots, t_n\}$ is a subset of D of size n.

$f(t_i) = v_i$, with $v_i \in R$

$f(x) = \bot$ for x outside of S

f can be specified by the assignment

$A = \{(t_1, v_1), \dots, (t_n, v_n)\}$

Formal Definitions (Cont.)

A Bloomier filter always answers a query to f correctly at every point of S.

For a random $x \in D \setminus S$, the query returns $f(x) = \bot$ with probability $1 - \varepsilon$.

The input to the algorithm is A and $\varepsilon$.

Supported Operations

CREATE(A):

Given an assignment $A = \{(t_1, v_1), \dots, (t_n, v_n)\}$, we initialize the data structure Tables.

SET_VALUE(t, v, Tables):

For $t \in D$ and $v \in R$, we associate the value v with the domain element t in Tables.

It is required that t belongs to S.

Supported Operations (Cont.)

LOOKUP(t, Tables):

For $t \in S$ we return the last value v associated with t.

For all but an $\varepsilon$ fraction of $D \setminus S$ we return $\bot$.

For the remaining elements of $D \setminus S$ we return an arbitrary element of R.

The Idea

We encode the values in R as elements of the additive group $Q = \{0,1\}^q$

Addition in Q is bitwise XOR

Any $x \in R$ is mapped into Q by its q-bit binary expansion, ENCODE(x)

For $y \in Q$ we define DECODE(y) as the corresponding number in R if $y < |R|$, and as $\bot$ otherwise
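
A small sketch of ENCODE, DECODE and the group addition may help here. It assumes the elements of R are numbered 0, …, |R|−1, represents a q-bit string as a Python integer, and uses `None` for ⊥; the names `encode`, `decode`, `add` and `BOTTOM` are illustrative.

```python
BOTTOM = None  # stands for the special value ⊥ (assumed representation)


def encode(x, q):
    """q-bit binary expansion of x, kept as a Python int in [0, 2**q)."""
    if not 0 <= x < 2 ** q:
        raise ValueError("x does not fit in q bits")
    return x


def decode(y, range_size):
    """Return y if it names an element of R (y < |R|), otherwise BOTTOM."""
    return y if y < range_size else BOTTOM


def add(y1, y2):
    """Addition in Q = {0,1}^q is bitwise XOR."""
    return y1 ^ y2


q, range_size = 8, 4
y = encode(3, q)
masked = add(y, 0b10110010)           # mask a value with some q-bit M
unmasked = add(masked, 0b10110010)    # XOR with the same M undoes the mask
```

Because XOR is its own inverse, adding the same mask twice recovers the encoded value, which is what the lookup on the following slides relies on.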

The Idea (Cont.)

We’ll save the function values for elements of S in a table.

We’ll use a hash function to compute a random q-bit masking value M for every x in D.

To look up the value of x, we’ll access a set of places in the table and calculate a q-bit number a.

We’ll return M XOR a.

The Idea (Cont.)

If t is in S, we’ll build the table so that

a XOR M = f(t).

Otherwise, since M is random, y = a XOR M is a random q-bit number.

Proof: consider the i’th bit of y.

Suppose $a_i = 0$ (without loss of generality), so that $y_i = M_i$.

We get

$$\Pr(y_i = 0) = \Pr(M_i = 0)\Pr(y_i = 0 \mid M_i = 0) + \Pr(M_i = 1)\Pr(y_i = 0 \mid M_i = 1) = 0.5 \cdot 1 + 0.5 \cdot 0 = 0.5$$
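
The claim that a uniformly random mask makes the result uniformly random can be checked exhaustively for a small q. This is only a sanity check under the assumption that M is uniform over all q-bit values; the values of `q` and `a` are arbitrary.

```python
from collections import Counter

q = 3
a = 0b101  # any fixed q-bit value read from the table

# Enumerate every possible mask M and record y = a XOR M.
counts = Counter(a ^ m for m in range(2 ** q))

# Fraction of masks for which bit i of y is 0, for each bit position.
bit_zero_fraction = [
    sum(1 for m in range(2 ** q) if ((a ^ m) >> i) & 1 == 0) / 2 ** q
    for i in range(q)
]
```

Each q-bit value y occurs for exactly one mask, so y is uniform, and every bit of y is 0 for exactly half of the masks.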

The Idea (Cont.)

Since y is random, for big enough q, DECODE(y) will return $\bot$ with high probability.

If the table stores elements of R (so that y is decoded as an element of R), DECODE(y) will fail to return $\bot$ with probability $|R|/2^q$.

We can do better.

Using 2 Tables

We have a table of size m, and a hash function HASH: $D \to \{1, \dots, m\}^k$

If HASH(t) = $(h_1, \dots, h_k)$ we say that $\{h_1, \dots, h_k\}$ is the neighborhood of t, N(t)

For large enough m and k, we can choose for each $t \in S$ an element $\tau(t)$ of N(t) such that:

For each $t' \in S$, $t' \neq t$, it holds that $\tau(t) \neq \tau(t')$

If $\tau(t) = h_i$ we use $\iota(t)$ to denote i.

Using 2 Tables (Cont.)

We’ll use 2 tables:

The first table will store values in $\{\bot, 1, \dots, k\}$ encoded as values in Q.

It will return $\iota(t)$ for t in S, and $\bot$ for most of the other items.

The second table will store values in R.

For each t in S, the value f(t) will be stored at location $\tau(t)$.

Using 2 Tables (Cont.)

If x is in $D \setminus S$, then with probability $k/2^q$ the first table will not return $\bot$.

So with probability $k/2^q$ we will access the second table and return “garbage”.

Now we can also change function values if we want.

We use the first table to check which place in the second table stores the value we want to change.

We change the value in the second table.


The Algorithm

The First Table

Reminder:

We want to use the table to compute a value a for each item t in D.

For items in S, a XOR M will give us the encoded $\iota(t)$.

When we access the first table with an element t we know N(t) = $\{h_1, \dots, h_k\}$ and M.

We’ll compute

$$a = \bigoplus_{i=1}^{k} \mathrm{Table}[h_i]$$

We want to set the values at the indices of N(t) so that a XOR M will give us the encoded $\iota(t)$.

Order Respecting Matching

Let S be a set with a neighborhood N(t) defined for each $t \in S$.

Let $\Pi$ be a complete ordering on the elements of S.

A matching $\tau$ respects $(S, \Pi, N)$ if

For all $t \in S$, $\tau(t) \in N(t)$

If $t_i > t_j$ (according to $\Pi$) then $\tau(t_i) \notin N(t_j)$

Order Respecting Matching (Cont.)

If, for N defined by HASH, a matching $\tau$ respects $(S, \Pi, N)$, it has all the properties we wanted:

For all $t \in S$, $\tau(t) \in N(t)$

For all $t, t' \in S$ with $t \neq t'$, $\tau(t) \neq \tau(t')$

We may build the first table incrementally so that for

$$a = \bigoplus_{i=1}^{k} \mathrm{Table}[h_i]$$

a XOR M will give us the encoded $\iota(t)$.

Building The First Table

Input:

Order $\Pi$

Neighborhood N(t) defined by HASH

Order respecting matching $\tau$

For t = $\Pi[1], \dots, \Pi[n]$ we set Table[$\tau(t)$] so that

$$M \oplus \bigoplus_{i=1}^{k} \mathrm{Table}[h_i]$$

encodes $\iota(t)$.

Since $\tau$ is order respecting, this cannot affect any value already set for $t' < t$.

Finding A Good Ordering And Matching

Given S and HASH, we compute $\Pi$ and $\tau$ so that $\tau$ respects $(S, \Pi, N)$.

A location $h \in \{1, \dots, m\}$ is a singleton for S if $h \in N(t)$ for exactly one $t \in S$.

TWEAK(t,S,HASH) is the smallest value j such that $h_j$ is a singleton for S, where N(t) = $(h_1, \dots, h_k)$

TWEAK(t,S,HASH) = $\bot$ if no such j exists.

If TWEAK(t,S,HASH) is defined we may set

$\iota(t)$ = TWEAK(t,S,HASH). This is an “easy match”.

Finding A Good Ordering And Matching (Cont.)

If t is an easy match, its matched location $\tau(t)$ does not collide with the neighborhood of any other $t' \in S$.

E – the subset of S with easy matches.

$H = S \setminus E$.

We recursively find $(\Pi', \tau')$ for H.

We extend $(\Pi', \tau')$ to $(\Pi, \tau)$: we first put the ordered elements of H, and then the elements of E.

$\tau$ is the union of the matchings for H and E.

FIND_MATCH

FIND_MATCH(HASH, S)[m, k]
Find $(\Pi, \tau)$ for S, HASH
1. $E = \emptyset$; $\Pi = \emptyset$
   For $t_i \in S$
      If TWEAK($t_i$, S, HASH) is defined
         $\iota_i$ = TWEAK($t_i$, S, HASH)
         $E = E + t_i$
   If $E = \emptyset$: Return (failure)
2. $H = S \setminus E$
   If $H = \emptyset$, take $(\Pi', \iota')$ to be empty; otherwise
   recursively compute $(\Pi', \iota')$ = FIND_MATCH(HASH, H)[m, k].
   If FIND_MATCH(HASH, H)[m, k] = failure: Return (failure)
3. $\Pi = \Pi'$
   For $t_i \in E$
      Add $t_i$ to the end of $\Pi$ (i.e., make $t_i$ the largest element in $\Pi$ thus far)
   Return ($\Pi$; $\iota = \{\iota_1, \dots, \iota_n\}$)

(where $\iota_i$ is determined for $t_i \in E$ in Step 1, and for $t_i \in H$ (via $\iota'$) in Step 2; the matching is $\tau(t_i) = h_{\iota_i}$.)
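
The recursion can be sketched in Python as follows. This is an illustrative rendering, not the authors' code: neighborhoods are passed in as a dictionary `neighborhoods` (a hypothetical name) mapping each item to a tuple of distinct locations, failure is signalled by `None`, and an empty S is treated as a trivially successful base case.

```python
from collections import Counter


def find_match(S, neighborhoods):
    """Return (order, iota) for the items in S, or None on failure.

    order lists the items from smallest to largest in the ordering;
    iota[t] is the 1-based index of the matched location within N(t).
    """
    if not S:
        return [], {}
    # How many items of S have each location in their neighborhood.
    counts = Counter(h for t in S for h in set(neighborhoods[t]))
    iota, easy = {}, []
    for t in S:
        for j, h in enumerate(neighborhoods[t], start=1):
            if counts[h] == 1:      # h is a singleton for S
                iota[t] = j         # TWEAK(t, S, HASH)
                easy.append(t)
                break
    if not easy:
        return None                 # stuck: no easy match exists
    easy_set = set(easy)
    hard = [t for t in S if t not in easy_set]
    sub = find_match(hard, neighborhoods)
    if sub is None:
        return None
    order, sub_iota = sub
    iota.update(sub_iota)
    return order + easy, iota       # elements of H first, then E


nbrs = {"a": (1, 2), "b": (2, 3), "c": (3, 4)}
result = find_match(["a", "b", "c"], nbrs)
stuck = find_match(["x", "y"], {"x": (1, 2), "y": (1, 2)})
```

In the example, "a" and "c" are easy matches (locations 1 and 4 are singletons), "b" becomes easy once they are removed, and the two items that share both locations cannot be matched.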

CREATE

CREATE(A = $\{(t_1, v_1), \dots, (t_n, v_n)\}$)[m, k, q] (create a mutable table)
1. Uniformly choose HASH: $D \to \{1, \dots, m\}^k \times \{0,1\}^q$
   $S = \{t_1, \dots, t_n\}$
   Create Table1 to be an array of m elements of $\{0,1\}^q$
   Create Table2 to be an array of m elements of R
   (the initial values for both tables are arbitrary)
   Put (HASH, m, k, q) into the "header" of Table1
   (we assume that these values may be recovered from Table1)
2. $(\Pi, \tau)$ = FIND_MATCH(HASH, S)[m, k]
   If FIND_MATCH(HASH, S)[m, k] = failure: Goto Step 1
3. For t = $\Pi[1], \dots, \Pi[n]$
      v = A(t) (i.e., the value assigned by A to t)
      $(h_1, \dots, h_k, M)$ = HASH(t)
      L = $\tau(t)$; l = $\iota(t)$ (i.e., L = $h_l$)
      Table1[L] = ENCODE(l) $\oplus$ M $\oplus \bigoplus_{i=1, i \neq l}^{k}$ Table1[$h_i$]
      Table2[L] = v
4. Return (Table = (Table1, Table2))

LOOKUP & SET_VALUE

LOOKUP(t, Table = (Table1, Table2))
1. Get (HASH, m, k, q) from Table1
   $(h_1, \dots, h_k, M)$ = HASH(t)
   l = DECODE(M $\oplus \bigoplus_{i=1}^{k}$ Table1[$h_i$])
2. If l is defined
      L = $h_l$
      Return (Table2[L])
   Else Return ($\bot$)

SET_VALUE(t, v, Table = (Table1, Table2))
1. Get (HASH, m, k, q) from Table1
   $(h_1, \dots, h_k, M)$ = HASH(t)
   l = DECODE(M $\oplus \bigoplus_{i=1}^{k}$ Table1[$h_i$])
2. If l is defined
      L = $h_l$
      Table2[L] = v
      Return (success)
   Else Return (failure)
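
Putting CREATE, LOOKUP and SET_VALUE together, a compact sketch might look like the following. Several details are assumptions made for illustration: HASH is derived from SHA-256 with a seed, the k locations are forced to be distinct, ⊥ is `None`, Table1 starts as all zeros, and the matching is found by iterative peeling (equivalent to the recursion). The function names are hypothetical.

```python
import hashlib
from collections import Counter


def hash_item(t, seed, m, k, q):
    """Return (k distinct locations in range(m), q-bit mask M) for item t."""
    locs, counter = [], 0
    while len(locs) < k:
        d = hashlib.sha256(f"{seed}:{counter}:{t}".encode()).digest()
        h = int.from_bytes(d[:8], "big") % m
        if h not in locs:
            locs.append(h)
        counter += 1
    d = hashlib.sha256(f"{seed}:mask:{t}".encode()).digest()
    return locs, int.from_bytes(d[:8], "big") % (2 ** q)


def find_match(S, nbrs):
    """Peel easy matches layer by layer; return (order, iota) or None."""
    remaining, layers, iota = list(S), [], {}
    while remaining:
        counts = Counter(h for t in remaining for h in nbrs[t])
        easy = []
        for t in remaining:
            j = next((j for j, h in enumerate(nbrs[t]) if counts[h] == 1), None)
            if j is not None:
                iota[t] = j + 1          # 1-based index, as in the slides
                easy.append(t)
        if not easy:
            return None
        layers.append(easy)
        easy_set = set(easy)
        remaining = [t for t in remaining if t not in easy_set]
    # Items peeled last come first in the ordering.
    return [t for layer in reversed(layers) for t in layer], iota


def create(assignment, m, k, q, max_tries=100):
    S = list(assignment)
    for seed in range(max_tries):        # "Goto Step 1" on failure
        hashed = {t: hash_item(t, seed, m, k, q) for t in S}
        match = find_match(S, {t: hashed[t][0] for t in S})
        if match is not None:
            break
    else:
        raise RuntimeError("no matching found; try a larger m")
    order, iota = match
    table1, table2 = [0] * m, [None] * m
    for t in order:
        locs, mask = hashed[t]
        l = iota[t]
        acc = l ^ mask                   # ENCODE(l) xor M
        for i, h in enumerate(locs, start=1):
            if i != l:
                acc ^= table1[h]
        table1[locs[l - 1]] = acc
        table2[locs[l - 1]] = assignment[t]
    return {"seed": seed, "m": m, "k": k, "q": q, "t1": table1, "t2": table2}


def _locate(t, tables):
    locs, y = hash_item(t, tables["seed"], tables["m"], tables["k"], tables["q"])
    for h in locs:
        y ^= tables["t1"][h]
    return locs[y - 1] if 1 <= y <= tables["k"] else None   # DECODE


def lookup(t, tables):
    L = _locate(t, tables)
    return None if L is None else tables["t2"][L]


def set_value(t, v, tables):
    L = _locate(t, tables)
    if L is None:
        return False
    tables["t2"][L] = v
    return True


A = {1: 1, 2: 1, 3: 2}
tables = create(A, m=16, k=3, q=16)
before = {t: lookup(t, tables) for t in A}
updated = set_value(2, 2, tables)
after = {t: lookup(t, tables) for t in A}
```

The usage at the end mirrors the earlier example: every member of S is looked up correctly, and changing f(2) touches only the second table.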


Analysis

Analyzing FIND_MATCH

We show that FIND_MATCH succeeds with constant probability for every S.

We’ll define a bipartite graph G:

On the left side there are n vertices $L = \{L_1, \dots, L_n\}$ corresponding to S.

On the right side there are m vertices $R = \{R_1, \dots, R_m\}$ corresponding to $\{1, \dots, m\}$

There is an edge between $L_i$ and $R_j$ if, for $t_i \in S$ with HASH($t_i$) = $(h_1, \dots, h_k)$, there is an l such that $j = h_l$.

The Singleton Property

We say that G has the singleton property if for all nonempty $A \subseteq L$ there exists a vertex $R_i \in R$ such that $R_i$ is adjacent to exactly one vertex in A.

If G has the singleton property, FIND_MATCH will never get stuck (there will always be easy matches).

N(v) – the set of neighbors of $v \in L$.

N(A) – the set of neighbors of the elements in A.

Lossless Expansion Property

We say that G has the lossless expansion property if for all nonempty $A \subseteq L$, $|N(A)| > k|A|/2$

If G has the lossless expansion property it has the singleton property:

Assume to the contrary that there is an A such that each node in N(A) has at least 2 neighbors in A.

The sub-graph induced by A and N(A) then has at least 2|N(A)| edges.

Since $|N(A)| > k|A|/2$, the sub-graph has more than k|A| edges. This is a contradiction, because each of the |A| vertices has at most k neighbors.
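
For very small graphs, both properties can be checked by brute force over all nonempty subsets A. The sketch below assumes the graph is given as a dictionary `adj` (a hypothetical name) from left vertices to their sets of right-side neighbors; it is exponential in the number of left vertices and meant only as an illustration.

```python
from collections import Counter
from itertools import combinations


def nonempty_subsets(vertices):
    for size in range(1, len(vertices) + 1):
        yield from combinations(vertices, size)


def neighbors(A, adj):
    """N(A): the union of the neighborhoods of the vertices in A."""
    return set().union(*(adj[v] for v in A))


def is_lossless_expander(adj, k):
    return all(
        len(neighbors(A, adj)) > k * len(A) / 2
        for A in nonempty_subsets(list(adj))
    )


def has_singleton_property(adj):
    for A in nonempty_subsets(list(adj)):
        # Number of vertices of A adjacent to each right-side vertex.
        hits = Counter(r for v in A for r in adj[v])
        if 1 not in hits.values():
            return False
    return True


good = {"t1": {1, 2, 3}, "t2": {3, 4, 5}, "t3": {5, 6, 1}}   # k = 3
bad = {"a": {1, 2}, "b": {1, 2}}                             # k = 2
```

The graph `good` expands enough and therefore has the singleton property; in `bad` the two left vertices share both neighbors, so the pair {a, b} has neither expansion nor a singleton.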


Lossless Expansion Property (Cont.)

For a random graph G with

Fixed k, k>2

m=ckn for a fixed c

G is a lossless expander with constant probability.

Hence FIND_MATCH will succeed with constant probability.

Data Structure Complexity

The error probability is $k/2^q$

We have to set $q = \lg(k/\varepsilon)$

Space: $O(n(r + \log 1/\varepsilon))$ bits

Lookup Time: O(1)

Update Time: O(1)
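
As a rough numeric illustration of these parameters, the sketch below picks the smallest integer q with k/2^q ≤ ε and counts the bits of two tables with m = ckn entries each (the table size from the lossless expansion slide). The constant `c=2` and the function names are assumptions, not values from the slides.

```python
import math


def choose_q(k, eps):
    """Smallest integer q with k / 2**q <= eps."""
    return math.ceil(math.log2(k / eps))


def table_bits(n, r, k, eps, c=2):
    """Bits used by Table1 (q bits per entry) and Table2 (r bits per entry)."""
    q = choose_q(k, eps)
    m = c * k * n            # table size m = ckn
    return m * (q + r)


q = choose_q(k=3, eps=0.001)
bits = table_bits(n=1000, r=8, k=3, eps=0.001, c=2)
```

For fixed k and c the total is proportional to n(r + log(1/ε)), in line with the space bound above.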

Data Structure Complexity (Cont.)

FIND_MATCH – we’ll use the graph again.

We may show that with high probability, for all non-empty $A \subseteq L$, $|N(A)| > c|A|$ for some constant $c > k/2$.

For a set $A \subseteq L$, suppose there are a items in N(A) with one neighbor in A and at least c|A| − a items with more than one neighbor.

The sub-graph for A has at least

a + 2(c|A| − a) = 2c|A| − a edges.

On the other hand it has at most k|A| edges.

$$2c|A| - a \le k|A| \;\Rightarrow\; (2c - k)|A| \le a$$

Data Structure Complexity (Cont.)

Each item in A has at most k neighbors.

The number of items in A that have a neighbor belonging only to them is at least

$a/k \ge (2c-k)|A|/k = (2c/k - 1)|A| = p|A|$

These items are easy matches.

If such a c exists, the run-time of FIND_MATCH is

$O(n) + O((1-p)n) + O((1-p)^2 n) + \dots = O(n)$

That is also the expected run-time of CREATE
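
The geometric sum behind the O(n) bound can be made concrete with a few lines of Python. The function `total_work` is a hypothetical helper that assumes exactly a fraction p of the remaining items is matched in every round.

```python
def total_work(n, p):
    """Sum n + (1-p)n + (1-p)^2 n + ... until fewer than one item remains."""
    total, remaining = 0.0, float(n)
    while remaining >= 1:
        total += remaining
        remaining *= (1 - p)
    return total


work = total_work(1000, 0.5)
bound = 1000 / 0.5   # the full geometric series sums to n / p
```

The total never exceeds n/p, which is O(n) for a constant p.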


Lower Bounds

Deterministic Algorithm

If $R = \{1, 2, \bot\}$, S splits into subsets A and B that map to 1 and 2, resp.

Even in that case deterministic Bloomier filtering requires $\Omega(n + \log\log N)$ bits of storage.

Define G, a graph where each node is a vector in $\{-1,0,1\}^N$ with exactly n coordinates equal to 1, and n others equal to -1.

The 1’s represent A and the -1’s represent B.

Two nodes v and v’ are adjacent if the set A of one of them intersects the set B of the other

(if $v = (x_1, \dots, x_N)$ and $v' = (y_1, \dots, y_N)$, they are adjacent if there is an i such that $x_i y_i = -1$)
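
The adjacency rule is easy to state in code. In this sketch nodes are plain tuples over {-1, 0, 1}, and the helper names `is_node` and `adjacent` are illustrative.

```python
def is_node(v, n):
    """A node has exactly n coordinates equal to 1 and n equal to -1."""
    return v.count(1) == n and v.count(-1) == n


def adjacent(v, w):
    """Adjacent if some coordinate i has x_i * y_i == -1."""
    return any(x * y == -1 for x, y in zip(v, w))


# N = 4, n = 1: A is where the 1 sits, B is where the -1 sits.
v = (1, -1, 0, 0)
w = (-1, 0, 1, 0)   # its B contains item 0, which is in v's A
u = (1, 0, -1, 0)   # shares A with v; the B sets do not meet the A sets
```

Here v and w conflict on the first item (one says 'in A', the other 'in B'), so they must not share a memory configuration, while v and u are compatible.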

Deterministic Algorithm (Cont.)

Since the memory is the only source of information about A and B, no 2 adjacent nodes should correspond to the same memory configuration.

The memory size m is at least $\log \chi(G)$ ($\chi(G)$ is the minimum number of colors required to color G).

We’ll show that $\chi(G)$ is between $\Omega(2^n \log N)$ and $O(n 2^n \log N)$.

Lower Bound On χ(G)

For every color c required to color G we have a vector $z_c$ in $\{-1,1\}^N$.

For a node $v = (x_1, \dots, x_N)$ of color c, we allow $x_i$ to be 1 (or -1) only if the i’th coordinate of $z_c$ is 1 (or -1).

A set of binary vectors of length l is (l,k) universal if for every choice of k coordinate positions we get all the possible $2^k$ patterns.

We’ll show that the set of vectors $z_c$ is (N,n) universal if we turn the minus ones to zeroes.
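
Universality of a small set of vectors can be verified directly from the definition. The checker below is a brute-force sketch (the name `is_universal` is hypothetical) that tries every choice of k coordinate positions.

```python
from itertools import combinations, product


def is_universal(vectors, k):
    """True if every choice of k coordinates exhibits all 2**k binary patterns."""
    length = len(vectors[0])
    all_patterns = set(product((0, 1), repeat=k))
    for positions in combinations(range(length), k):
        seen = {tuple(v[i] for i in positions) for v in vectors}
        if seen != all_patterns:
            return False
    return True


# The four even-parity vectors of length 3.
even_parity = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The four even-parity vectors show all four patterns on any two coordinates, so they form a (3,2) universal set, but four vectors cannot show all eight patterns on three coordinates.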

Lower Bound On χ(G) (Cont.)

Let $i_1, \dots, i_n$ be n coordinate positions.

For each w in $\{-1,1\}^n$ we have a node v whose $i_1, \dots, i_n$ coordinates match w.

If v is colored in color c then the $i_1, \dots, i_n$ coordinates of $z_c$ match w.

Therefore, for each choice of n coordinate positions we get all the possible patterns.

The size of an (N,n) universal set is $\Omega(2^n \log N)$, so this is a lower bound on $\chi(G)$.

Upper Bound On χ(G)

There exists an (N,2n) universal set of vectors of size $O(n 2^n \log N)$.

We’ll turn all the zeroes to minus ones.

We’ll use that set as the vectors $z_c$.

Because the set of vectors $z_c$ is universal, we may select for each node a vector $z_c$ that matches the 1’s and -1’s of the node.

c will be the color of the node.

Mutable Filtering

If

$$N = 2^{n^{O(1)}}$$

and the number m of storage bits satisfies

$$n \le m \le c\, n \log\log(N/n^3)$$

for some large enough constant c, then Bloomier filtering cannot support dynamic updates on a set S of size 2n.

The proof is for the case $R = \{1, 2, \bot\}$, where S splits into subsets A and B of size n that map to 1 and 2, resp.

We assume the algorithm is randomized.

Mutable Filtering (Cont.)

Fix a sequence of random choices made by the algorithm when its input was A and B.

We assume B was a specific set $B_{org}$ and change A.

For each possible A we have a corresponding memory configuration.

In other words, for each memory configuration we have a family of possible sets A that lead to this configuration.

Let F be the largest family.

Mutable Filtering (Cont.)

Now we change B: each possible $B_{new}$ leads to a (possibly different) memory configuration.

For each configuration $1 \le i \le 2^m$ there is a family of possible sets $B_{new}$ that lead to it. We denote it by $G_i$.

[Figure: in stage I, the inputs $(B_{org}, A_1), \dots, (B_{org}, A_{\binom{N}{n}})$ map to the memory configurations $conf_1, \dots, conf_{2^m}$; in stage II, the sets $B_{new}^1, \dots, B_{new}^{\binom{N}{n}}$ lead from those configurations to $conf_1, \dots, conf_{2^m}$.]

Mutable Filtering (Cont.)

Given a memory configuration C in II, for any path that leads to it:

B can be the $B_{new}$ on the path. For each item in such a set we must answer ‘in B’.

A can be the set on the path before the configuration in I. For each item in such a set that couldn’t be changed to $B_{new}$ on the path we must answer ‘in A’.

Suppose in I we were in the configuration F leads to, and then we randomly chose $B_{new}$.

$i(B_{new})$ denotes the j such that $B_{new} \in G_j$

In II we have to:

Answer ‘in A’ for each item of a set in F that couldn’t be changed to $B_{new}$

Answer ‘in B’ for each item of a set in $G_{i(B_{new})}$

The Proof

$F_{B_{new}}$ is the subset of F whose sets intersect $B_{new}$, and $F^c_{B_{new}} = F \setminus F_{B_{new}}$.

We show that with high probability (over the selection of $B_{new}$) the sets

$$\bigcup_{S \in F^c_{B_{new}}} S \quad\text{and}\quad \bigcup_{S \in G_{i(B_{new})}} S$$

are intersecting.

Then there is an item for which the algorithm must answer both ‘in A’ and ‘in B’.

So there is a set $B_{new}$ that causes the algorithm to make errors.

$L_k$ And Its Size

$L_k$ is the set of items that belong to at least k sets in F.

We’ll look at subsets of

$$\bigcup_{S \in F^c_{B_{new}}} S \quad\text{and}\quad \bigcup_{S \in G_{i(B_{new})}} S$$

that belong to $L_k$ and show they intersect.

We first bound the size of $L_k$.

$$|F| \ge \binom{N}{n} 2^{-m}$$

$F_k$ is the sub-family of F that contains only subsets of $L_k$.

$$|F \setminus F_k| \le (k-1)N$$

With $k = |F|/(2N)$:

$$|F_k| \ge |F| - (k-1)N \ge \tfrac{1}{2}|F| \ge \binom{N}{n} 2^{-m-1}$$

$L_k$ And Its Size (Cont.)

$$\binom{|L_k|}{n} \ge |F_k| \ge \binom{N}{n} 2^{-m-1}$$

$$|L_k| \ge N \cdot 2^{-(m+1)/n}$$

$L_k^{B_{new}}$ And Its Size

$$L_k^{B_{new}} = \bigcup \{ S \cap L_k \mid S \in F^c_{B_{new}} \}$$

It is a subset of both $\bigcup_{S \in F^c_{B_{new}}} S$ and $L_k$.

The algorithm should answer ‘in A’ for each item of $L_k^{B_{new}}$.

We’ll show that with probability 1/2, $L_k^{B_{new}}$ cannot be very small.

The expected number of sets in F that contain a random item of D is $n|F|/N$.

$L_k^{B_{new}}$ And Its Size (Cont.)

If an item in $L_k$ does not appear in $L_k^{B_{new}}$, it intersects only sets in $F_{B_{new}}$.

Such an item appears in at least k sets.

$$E(|\{S : S \in F_{B_{new}}\}|) = E(|F_{B_{new}}|) \le n \cdot \frac{n|F|}{N}$$

$$|L_k \setminus L_k^{B_{new}}| \le \frac{n\,|\{S : S \in F_{B_{new}}\}|}{k}$$

$$E(|L_k^{B_{new}}|) = |L_k| - E(|L_k \setminus L_k^{B_{new}}|) \ge |L_k| - \frac{n^3|F|}{kN} \ge |L_k| - 3n^3$$

$$E(|L_k| - |L_k^{B_{new}}|) \le 3n^3$$

According to the Markov bound

$$\Pr(|L_k^{B_{new}}| \ge |L_k| - 6n^3) = \Pr(|L_k| - |L_k^{B_{new}}| \le 6n^3) \ge 1/2$$

$M_{i(B_{new})}$ And Its Size

$$M_i = \bigcup \{ S \cap L_k \mid S \in G_i \}$$

$M_i$ is a subset of $L_k$ and of $\bigcup_{S \in G_i} S$.

The algorithm should answer ‘in B’ for each item of $M_{i(B_{new})}$.

$$E(|B_{new} \cap L_k|) = \frac{n|L_k|}{N}$$

According to the Chernoff bound, with $\mu = n|L_k|/N$:

$$\Pr\left(|B_{new} \cap L_k| < \frac{n|L_k|}{2N}\right) < e^{-(1/2)^2 \mu / 2} = e^{-\mu/8} = o(1)$$

$M_{i(B_{new})}$ And Its Size (Cont.)

We bound the probability that $M_{i(B_{new})}$ is small by conditioning on $s = |B_{new} \cap L_k|$ and taking the maximum over s.

This probability is the number of $B_{new}$’s that satisfy both the bound on $|M_{i(B_{new})}|$ and $|B_{new} \cap L_k| = s$, counted over all $G_i$’s, divided by the number of $B_{new}$’s that satisfy $|B_{new} \cap L_k| = s$, which is $\binom{|L_k|}{s}\binom{|D \setminus L_k|}{n-s}$.

[Equation: a maximum over s of a ratio of binomial coefficients multiplied by $2^m$; the explicit expression is not recoverable from the transcript.]

$M_{i(B_{new})}$ And Its Size (Cont.)

The conditional probability bounded on the previous slide is o(1).

Combining this with the Chernoff bound on $|B_{new} \cap L_k|$:

$$\Pr(|M_{i(B_{new})}| > 6n^3) \ge 1 - o(1)$$

[Equation: the intermediate steps of this derivation are not recoverable from the transcript.]

The Error

With probability at least $1/2 - o(1)$

$$|M_{i(B_{new})}| > 6n^3 \quad\text{and}\quad |L_k^{B_{new}}| \ge |L_k| - 6n^3$$

that is, the 2 sets intersect (both are subsets of $L_k$).

The algorithm must answer ‘in A’ for each item of $L_k^{B_{new}}$ and ‘in B’ for each item of $M_{i(B_{new})}$.

There is a set $B_{new}$ for which the algorithm will make an error.


Questions?