A Formal Analysis of Conservative Update Based Approximate Counting. Gil Einziger and Roy Friedman, Technion, Haifa

Jan 04, 2016

Page 1

A Formal Analysis of Conservative Update Based Approximate Counting

Gil Einziger and Roy Friedman, Technion, Haifa

Page 2

Approximate Counting

We wish to count the number of occurrences of various items from a very large domain.

To gain space efficiency, we are willing to tolerate only an “approximate count”.

Page 3

Bloom Filters

• An array BF of m bits and k hash functions {h1,…,hk} with values in [0,…,m-1]
• Adding an object obj to the Bloom filter is done by computing h1(obj),…, hk(obj) and setting the corresponding bits in BF
• Checking for set membership for an object cand is done by computing h1(cand),…, hk(cand) and verifying that all corresponding bits are set

Example (m=11, k=3): h1(o1)=0, h2(o1)=7, h3(o1)=5, so adding o1 sets bits 0, 5, and 7 of BF. For o2, h1(o2)=0, h2(o2)=7, h3(o2)=4; bit 4 is not set, so the membership check fails (×).
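The add and membership operations described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter` and the way the k hash functions are derived (salting SHA-256 with the function index) are assumptions made only for the example.

```python
import hashlib


class BloomFilter:
    """Bit array of size m probed by k hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _indexes(self, obj):
        # Assumption: the i-th hash function is SHA-256 salted with i, reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{obj}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, obj):
        # Set every bit selected by the k hash functions.
        for idx in self._indexes(obj):
            self.bits[idx] = True

    def contains(self, cand):
        # True only if all k bits are set; false positives are possible,
        # false negatives are not.
        return all(self.bits[idx] for idx in self._indexes(cand))


bf = BloomFilter(m=11, k=3)
bf.add("o1")
print(bf.contains("o1"))  # True
```

With m as small as 11 a second object may well collide on all its bits, which is exactly the false positive case the later slides quantify.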

Page 4

Counting Bloom Filters

• A vector of counters (instead of bits)
• A counting Bloom filter supports the operations:
  – Increment
    • Increment by 1 all entries that correspond to the results of the k hash functions
  – Decrement
    • Decrement by 1 all entries that correspond to the results of the k hash functions
  – Estimate (instead of get)
    • Return the minimal value of all corresponding entries

Example (m=11, k=3; h1(o1)=0, h2(o1)=7, h3(o1)=5): the three CBF counters of o1 hold 3, 6, and 8; Increment(o1) raises them to 4, 7, and 9, after which Estimate(o1)=4.
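The three operations can be sketched as follows. The class name `CountingBloomFilter`, the lookup table `TABLE`, and the choice to pass the hash functions in as callables are assumptions for illustration; the hash values are the ones shown for o1 and o2 on the slides.

```python
class CountingBloomFilter:
    """Vector of m counters addressed by k hash functions (illustrative sketch)."""

    def __init__(self, m, hashes):
        self.counters = [0] * m
        self.hashes = hashes  # k callables, each mapping an object to 0..m-1

    def _indexes(self, obj):
        # A set, so two hash functions that collide on one entry touch it once.
        return {h(obj) for h in self.hashes}

    def increment(self, obj):
        for idx in self._indexes(obj):
            self.counters[idx] += 1

    def decrement(self, obj):
        for idx in self._indexes(obj):
            self.counters[idx] -= 1

    def estimate(self, obj):
        # The smallest counter is the least inflated by other items.
        return min(self.counters[idx] for idx in self._indexes(obj))


# Toy hash functions backed by a table of the slide values (hypothetical helper).
TABLE = {"o1": (0, 7, 5), "o2": (0, 7, 4)}
hashes = [lambda obj, i=i: TABLE[obj][i] for i in range(3)]

cbf = CountingBloomFilter(11, hashes)
for _ in range(4):
    cbf.increment("o1")
for _ in range(2):
    cbf.increment("o2")
print(cbf.estimate("o1"))  # 4: entry 5 is touched only by o1
print(cbf.estimate("o2"))  # 2: entry 4 is touched only by o2
```

Entries 0 and 7 are shared by both objects and hold 6, so taking the minimum is what keeps the estimates from being inflated here.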

Page 5

Conservative Update Technique

• Give up the ability to Decrement in favor of accuracy/space efficiency
  – During an Increment operation, only update the lowest counters

Example (m=11, k=3; h1(o1)=0, h2(o1)=7, h3(o1)=5): in SBF-MI, the counters of o1 hold 3, 6, and 8; Increment(o1) only adds to the first entry (3->4).

Empirically shown to improve accuracy, by up to two orders of magnitude for some workloads.
  – But not formally understood.
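A minimal sketch of the conservative update rule, under the same illustrative setup as a plain counter vector: only the entries currently holding the minimum are incremented. The class name `ConservativeUpdateCounter` is hypothetical, and which of the values 3, 6, and 8 sits at which index is an assumption, since the figure did not survive extraction.

```python
class ConservativeUpdateCounter:
    """Counter vector where Increment raises only the lowest entries (sketch)."""

    def __init__(self, m, hashes):
        self.counters = [0] * m
        self.hashes = hashes  # k callables, each mapping an object to 0..m-1

    def _indexes(self, obj):
        return {h(obj) for h in self.hashes}

    def increment(self, obj):
        idxs = self._indexes(obj)
        lowest = min(self.counters[i] for i in idxs)
        for i in idxs:
            # Larger counters already over-estimate this item; leave them alone.
            if self.counters[i] == lowest:
                self.counters[i] += 1

    def estimate(self, obj):
        return min(self.counters[i] for i in self._indexes(obj))


# Hash values for o1 as on the slide; the counter placement below is assumed.
hashes = [lambda obj: 0, lambda obj: 7, lambda obj: 5]
cu = ConservativeUpdateCounter(11, hashes)
cu.counters[0], cu.counters[7], cu.counters[5] = 3, 6, 8

cu.increment("o1")
print(cu.counters[0], cu.counters[7], cu.counters[5])  # 4 6 8
print(cu.estimate("o1"))  # 4
```

There is no `decrement` method: after a conservative update the structure no longer records which entries an item contributed to, which is the trade-off named on the slide.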

Page 6

Motivation

• Applications:
  – Network measurements and heavy hitters.
  – Network security: anomaly detection.
  – Cache admission policy.

Additional applications in other fields: e.g. databases and natural language processing.

Page 7

TinyLFU - Cache Admission Policy (PDP 2014)

[Figure: frequency versus rank]

• The access distribution of most content is skewed
  ▫ Often modeled using Zipf-like functions, power-law, etc.

A long heavy tail, for example ~50% of the weight.

A small number of very popular items, for example ~50% of the weight.

Page 8

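Eviction and Admission Policies

[Diagram: a new item arrives at the cache; the eviction policy selects a cache victim, and the admission policy decides the winner between the new item and the victim.]

Eviction Policy: One of you guys should leave …

Admission Policy: Is the new item any better than the victim?

What is the common answer?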

Page 9

Conservative Update - Intuition

• Conservative Update allows counting just the head items, with high accuracy, so our cache can make educated admission decisions.

[Figure labels: Undesired; Desired Items]
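A frequency-based admission decision of the kind discussed here can be sketched as a single comparison. The function name `admit` and the tie-breaking rule (keep the victim on a tie) are assumptions for this example, and an exact `Counter` stands in for the approximate counter only to keep the sketch short.

```python
from collections import Counter


def admit(estimate, new_item, victim):
    """Return True if new_item should replace victim in the cache.

    `estimate` is any callable returning an (approximate) access count.
    Assumption: ties keep the victim, so the cache is not churned needlessly.
    """
    return estimate(new_item) > estimate(victim)


# Stand-in for an approximate counter: exact counts over a short access trace.
trace = ["a", "b", "a", "c", "a", "b"]
counts = Counter(trace)


def estimate(item):
    return counts[item]


print(admit(estimate, "a", "c"))  # True: "a" was seen 3 times, "c" once
print(admit(estimate, "c", "b"))  # False: "c" was seen once, "b" twice
```

In a real cache the `estimate` callable would be backed by a compact sketch, so the quality of admission decisions depends on how accurately the head items are counted.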

Page 10

Admission Policy Example

[Figure: hit rate versus cache size, without an admission policy and with a frequency-based admission policy. Annotations: more memory; better cache management.]

Page 11

The Basic Observation

[Figure: a CBF and the corresponding LCS, with the LCS drawn level by level.]

If we can quantify how many items are inserted into each level of the LCS, we can bound the error.

A CBF is exactly like

Page 12

Simple Observations

• It is useful to discuss the number of items that are inserted into each level of the LCS.

• Since all levels are treated identically, the false positive probability of each level is determined only by the number of items inserted into that level.

• A false positive at a higher level implies a false positive at all lower levels.
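To make the second observation concrete, the sketch below uses the textbook approximation for the false positive probability of a Bloom filter with m bits, k hash functions, and n inserted items. Whether the paper uses exactly this expression is not stated on the slides, so treat both the formula and the function name `false_positive_probability` as assumptions.

```python
def false_positive_probability(n, m, k):
    """Textbook approximation: (1 - (1 - 1/m)^(k*n))^k.

    n: items inserted into the level, m: entries per level, k: hash functions.
    """
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


# With m and k fixed, the probability depends only on n and grows with it.
for n in (0, 100, 500):
    print(n, false_positive_probability(n, m=1000, k=3))
```

Because the value grows with n, a level that receives fewer items is less likely to report a false positive than one that receives more.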

Page 13

The Model

• Known (constant) distribution
• Large enough sample
  – We assume that we can make a ‘characteristic’ histogram.

Formally, we know how many items are going to appear each number of times.
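A characteristic histogram of this kind can be computed from a sample stream as sketched below. The function names are hypothetical. The second helper, which counts items appearing at least i times, is only one plausible per-level quantity; the slides do not state that this is the definition used in the analysis.

```python
from collections import Counter


def characteristic_histogram(stream):
    """Map each frequency f to the number of distinct items seen exactly f times."""
    per_item = Counter(stream)          # item -> number of appearances
    return Counter(per_item.values())   # frequency -> number of such items


def items_with_at_least(hist, i):
    """Number of distinct items whose frequency is at least i (assumed helper)."""
    return sum(count for freq, count in hist.items() if freq >= i)


stream = ["a", "a", "a", "b", "b", "c"]
hist = characteristic_histogram(stream)
print(dict(hist))                    # {3: 1, 2: 1, 1: 1}
print(items_with_at_least(hist, 2))  # 2: items "a" and "b"
```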

Page 14

Lower Bound

Denote by A[i] the number of items that are actually inserted into level i.

• By definition: $A[i] \ge D[i]$

• A min/max argument about the lowest level that could have experienced a false positive yields the following: $P(F_{Estimated} - F_{Real} \ge k) \le FP(D[k])$

Page 15

Upper Bound

• Derived similarly, by upper bounding A[i].
• Requires a few further assumptions.

Technical details in the paper.

Page 16

Accurate Configuration – Uniform

Page 17

Accurate Configuration – Zipf 1

Page 18

Inaccurate Configuration – Uniform

Page 19

Inaccurate Configuration – Zipf 1

Page 20

Real Trace – Counting TCP packets

Page 21

Summary

• A simple analysis of an extensively used approximate counting optimization.
• First to analyze it for general distributions.
• Lower and upper bounds on the model.
• Good indicator on real workloads.
• An extended version is published as a tech report.

Thank You