The Bloom Paradox
Ori Rottenstreich
Joint work with Isaac Keslassy
Technion, Israel
Problem Definition
• Requirement: A data structure at the user with a fast answer to the membership query "is x ∈ S?"
• Solutions:
o O(n) – Searching in a list
o O(log(n)) – Searching in a sorted list
o O(1) – But with false positives / negatives
[Figure: a user looks up elements x and y. S is the local cache (access cost = 1); M is the central memory with all elements (access cost = 10).]
Two Possible Errors
• False Positive: x ∉ S, but the data structure answers x ∈ S.
o Results in a redundant access to the local cache. Additional cost of 1.
• False Negative: x ∈ S, but the data structure answers x ∉ S.
o Results in an expensive access to the central memory instead of the local cache. Additional cost of 10 − 1 = 9.
Bloom Filters (Bloom, 1970)
• Initialization: Array of m zero bits.
• Insertion: Each of the n elements is hashed k times, and the corresponding bits are set.
• Query: Hashing the element k times and checking that all the corresponding bits are set.
• False positive rate (probability) of approximately (1 − e^(−kn/m))^k.
• No false negatives.
[Figure: a bit array, initially all zeros; inserted elements set their hashed bits to 1, and queried elements (x, y, z, w in the example) are checked against those bits.]
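The three operations above can be sketched in a few lines of Python. This is a minimal illustration, not the construction used in the talk: the class name `BloomFilter`, the sizes `m=1024`, `k=4`, and the trick of simulating k hash functions by salting SHA-256 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of m zero bits

    def _positions(self, item: str) -> list[int]:
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the hash index; any k independent hash functions would do.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        # Insertion: set the k bits the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: report membership only if all k bits are set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for element in ("x", "y"):
    bf.insert(element)

print(bf.query("x"), bf.query("y"))  # inserted elements are always reported present
print(bf.query("z"))  # usually False; True here would be a false positive
```

Because insertion only ever sets bits, an inserted element can never be reported absent, which is the "no false negatives" property.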
Bloom Filters are Widely Used
• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in:
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System
The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it,
thus making the Bloom filter useless.
Outline
• Introduction to Bloom Filters
• The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
• Summary
Bloom Paradox Example
• Parameters: […]
• Extreme case without locality: All elements with equal probability of belonging to the cache.
o Toy example
• Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B (no false negatives in the Bloom filter).
• Intuition: […]
[Figure: the local cache S (cost = 1) and the set B indicated by the Bloom filter, both inside the central memory M with all elements (cost = 10).]
• Surprise: The Bloom filter indicates the membership of |B| elements. Only |S| of them are indeed in S.
• When the Bloom filter states that x ∈ S, it is wrong with probability […]
• Average cost if we listen to the Bloom filter: […]
• Average cost if we don’t: […]
• Conclusion: The Bloom filter is useless! Don’t listen to the Bloom filter.
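To make the comparison concrete, here is a small Python sketch of the cost calculation. The numeric values of `N`, `n` and `f` are hypothetical, picked only for illustration and not taken from the slides; the access costs 1 and 10 come from the problem definition. The sketch also assumes that after a false positive the user pays for the cache access and then for the central memory access.

```python
# Hypothetical toy numbers, chosen only for illustration.
N = 10**9          # elements in the central memory M
n = 10**4          # elements in the local cache S
f = 1e-3           # false positive rate of the Bloom filter
COST_CACHE = 1     # cost of accessing the local cache
COST_MEMORY = 10   # cost of accessing the central memory

p_member = n / N                       # a priori probability that x is in S
p_false_pos = (1 - p_member) * f       # x not in S, but the filter says "in S"
p_negative = (1 - p_member) * (1 - f)  # x not in S, and the filter says so

# Probability that a positive answer of the Bloom filter is wrong.
p_wrong_given_positive = p_false_pos / (p_false_pos + p_member)

# Listening: a positive answer sends us to the cache first; a false positive
# then also costs a central memory access. A negative answer goes to memory.
cost_listen = (
    p_member * COST_CACHE
    + p_false_pos * (COST_CACHE + COST_MEMORY)
    + p_negative * COST_MEMORY
)

# Not listening: always go straight to the central memory.
cost_ignore = COST_MEMORY

print(f"P(wrong | positive) = {p_wrong_given_positive:.4f}")
print(f"average cost, listening = {cost_listen:.5f}")
print(f"average cost, ignoring  = {cost_ignore:.5f}")
```

Under these assumed numbers almost every positive answer is a false positive, so listening to the filter costs slightly more on average than going straight to the central memory.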
Costs of the Two Possible Errors
• The cost of a false positive: 1
• The cost of a false negative: […]
• In the cache example: a false positive costs 1 and a false negative costs 10 − 1 = 9.
Conditions for the Bloom Paradox
• Let Pr(x ∈ S) be the a priori membership probability of x,
o i.e. before getting the answer of the Bloom filter.
• Intuition: The Bloom paradox occurs more often when:
o the a priori membership probability is small
o the false positive rate is large (i.e. […] is small)
o the cost of a false negative is small (because the Bloom filter implicitly assumes […])
• Theorem 1: The Bloom paradox occurs if and only if […]
• Boundaries of the Bloom Paradox (for […]): If […] and […], the Bloom paradox occurs if […]
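A condition of this kind can be derived with Bayes' rule. The derivation below is a sketch under stated assumptions and is not claimed to be the exact statement of Theorem 1: the symbols $p$ (a priori membership probability), $f$ (false positive rate) and $\alpha$ (cost of a false negative, with the cost of a false positive normalized to 1) are introduced here for illustration.

```latex
% Posterior probabilities after a positive answer (x \in B), by Bayes' rule,
% using the fact that the Bloom filter has no false negatives:
\begin{align*}
\Pr(x \in S \mid x \in B) &= \frac{p}{p + (1-p)\,f}, &
\Pr(x \notin S \mid x \in B) &= \frac{(1-p)\,f}{p + (1-p)\,f}.
\end{align*}
% Ignoring the positive answer risks a false negative (cost \alpha);
% trusting it risks a false positive (cost 1).
% Ignoring is cheaper on average when
\begin{align*}
\alpha \cdot \Pr(x \in S \mid x \in B) &< 1 \cdot \Pr(x \notin S \mid x \in B) \\
\iff \quad \alpha\, p &< (1-p)\, f \\
\iff \quad p &< \frac{f}{\alpha + f}.
\end{align*}
```

In this form the three intuitions are visible directly: the inequality is easier to satisfy when $p$ is small, when $f$ is large, and when $\alpha$ is small.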
Bloom Filter Improvements
• Theorem 1: The Bloom paradox occurs if and only if […]
• Use the formula to improve the Bloom filter:
o Only insert into / query the Bloom filter if the formula expects it to be useful.
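One way to picture this gating is a per-element test applied before the Bloom filter is touched. The sketch below is an assumption, not the criterion from the slides: the function name `bloom_filter_is_useful`, the example priors, and the test `alpha * p >= (1 - p) * f` (expected cost of a false negative versus expected cost of a false positive, with the false positive cost normalized to 1) are all illustrative.

```python
def bloom_filter_is_useful(p: float, f: float, alpha: float) -> bool:
    """Return True if trusting a positive answer is expected to pay off.

    p     -- a priori membership probability of the element (assumed known)
    f     -- false positive rate of the Bloom filter
    alpha -- cost of a false negative, relative to a false positive cost of 1

    Assumed criterion: trust the filter when the expected loss from a
    false negative (alpha * p) is at least the expected loss from a
    false positive ((1 - p) * f).
    """
    return alpha * p >= (1.0 - p) * f


# Hypothetical per-element priors, for illustration only.
priors = {"hot": 0.2, "warm": 1e-3, "cold": 1e-6}
f = 1e-3    # assumed false positive rate
alpha = 9   # cache example: a false negative costs 10 - 1 = 9

for name, p in priors.items():
    decision = "insert/query" if bloom_filter_is_useful(p, f, alpha) else "skip the filter"
    print(f"{name}: {decision}")
```

Elements that fail the test are never inserted and never queried, so they neither pollute the bit array nor trigger redundant cache accesses.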
Counting Bloom Filters (CBFs)
• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
o Insertion: Incrementing the corresponding counters by one.
o Deletion: Decrementing the corresponding counters by one.
o Query: Checking that all the corresponding counters are positive.
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for […].
[Figure: a counter array (e.g. 0102001010 01) updated by +1 increments when elements x and y are inserted.]
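The counter-based operations can be sketched as follows. As with any short example, this is an illustration under assumptions: the class name `CountingBloomFilter`, the sizes, and the salted SHA-256 hashing are not from the talk, and `delete` assumes the element was previously inserted.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m counters, k hash functions (sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m  # counters instead of bits

    def _positions(self, item: str) -> list[int]:
        # Assumption: k hash functions simulated by salting SHA-256.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1  # insertion: increment by one

    def delete(self, item: str) -> None:
        # Assumes the item was inserted before; otherwise counters are corrupted.
        for pos in self._positions(item):
            self.counters[pos] -= 1  # deletion: decrement by one

    def query(self, item: str) -> bool:
        # Query: all k counters must be positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def counters_of(self, item: str) -> list[int]:
        # The k counter values an element points to.
        return [self.counters[pos] for pos in self._positions(item)]


cbf = CountingBloomFilter(m=1024, k=4)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")
print(cbf.query("x"))        # True: x is still stored
print(cbf.counters_of("x"))  # the counter values x points to
```

Decrementing on deletion undoes exactly what the insertion did, so removing y cannot turn x into a false negative.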
Counting Bloom Filter Query
• Query:
o Checking that the counters are positive.
o Question: Which is more likely to be correct? y or z?
[Figure: counter array 0381052010 12, with the counters pointed to by the queried elements y and z.]
The Bloom Paradox in the Counting Bloom Filter
• Theorem 2: Let […] denote the values of the counters pointed to by the set of hash functions. Then, […]
• Only the product of the counters matters!
CBF Based Membership Probability
• Parameters: n = 3328, m = 28485, k = 6
- Before checking the CBF: a priori membership probability […] ≈ 0.03
- The CBF indicates a counters product of 8: a posteriori membership probability ≈ 0.69
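The following Python sketch shows one way such an a posteriori probability can be computed. It is a Bayesian estimate under an assumed model, not a reproduction of Theorem 2: each of the n·k hash evaluations is taken to land uniformly and independently in one of the m counters, which gives a likelihood ratio of c·m/(n·k) per observed counter value c. The function name `cbf_posterior` and the prior 1/32 (picked to be consistent with "≈ 0.03") are assumptions.

```python
from math import prod


def cbf_posterior(prior: float, counters: list[int], n: int, m: int) -> float:
    """Estimate Pr(x in S | observed counter values) under the assumed model."""
    k = len(counters)
    if prod(counters) == 0:
        return 0.0  # a zero counter means x was certainly not inserted
    # Likelihood ratio Pr(counters | x in S) / Pr(counters | x not in S):
    # one factor c * m / (n * k) per counter, so only the product matters.
    likelihood_ratio = prod(c * m / (n * k) for c in counters)
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


n, m = 3328, 28485
prior = 1 / 32  # assumed value, consistent with "a priori ≈ 0.03"

# Two different counter vectors with the same product, 8.
print(cbf_posterior(prior, [1, 1, 1, 2, 2, 2], n, m))
print(cbf_posterior(prior, [8, 1, 1, 1, 1, 1], n, m))
```

With these inputs the estimate comes out at roughly 0.69 for both vectors, in line with the figure quoted above, and it depends on the counters only through their product.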
Experimental Results
• Internet trace (equinix-chicago) with real hash functions.
• Counting Bloom filter parameters: n = 2^10, m / n = 30, k = 5, 2^20 queries
Concluding Remarks
• Discovery of the Bloom paradox
• Importance of the a priori membership probability
• Using the counters product to estimate the correctness of a positive indication of the CBF
Thank You