Improving Sketch Reconstruction Accuracy Using Linear Least Squares Method
Gene Moo Lee, Huiya Liu, Young Yoon, Yin Zhang, University of Texas at Austin, [email protected], IMC 2005, Berkeley, CA, USA

Jul 05, 2015


Gene Moo Lee
Page 1: Improving Sketch Reconstruction Accuracy

Improving Sketch Reconstruction Accuracy Using Linear Least Squares Method

Gene Moo Lee, Huiya Liu, Young Yoon, Yin Zhang
University of Texas at Austin

[email protected]

IMC 2005, Berkeley, CA, USA

Page 2: Improving Sketch Reconstruction Accuracy

IMC’05

Roadmap

● Introduction to Sketch
● Problem Definition
● Our Approach
● Evaluation – Accuracy, Tolerance
● Conclusion and Future Work

Page 3: Improving Sketch Reconstruction Accuracy

Sketch: a data structure

● Sketch is a “lossy” data structure used to summarize massive data streams
  ○ Avoids per-flow state maintenance
  ○ Uses constant memory
  ○ Needs only a small number of memory accesses

● We can use sketches for
  ○ Heavy-hitter detection, usage-based pricing, bandwidth provisioning, DoS attack detection

Page 4: Improving Sketch Reconstruction Accuracy

Sketch: a data structure

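[Figure: an H × K table of counters with rows 1, …, H and buckets 0, 1, …, K-1. Key k is hashed to bucket $h_j(k)$ in each row j; another key k' collides with k in row j when $h_j(k') = h_j(k)$.]

Say we've got an update of (key k, value u):

Update (k, u): $T_j[h_j(k)] \mathrel{+}= u$ for all $j = 1, \dots, H$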
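The update rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the class name is ours, and the hash family (Python's built-in `hash` applied to `(row, key)` tuples) is a hypothetical stand-in, since the slides do not say which hash functions are used; a real deployment would normally pick a pairwise-independent family.

```python
from typing import Callable, List


class Sketch:
    """An H x K array of counters T[j][b]; row j has its own hash function h_j."""

    def __init__(self, H: int, K: int):
        self.H, self.K = H, K
        self.T: List[List[int]] = [[0] * K for _ in range(H)]
        # Hypothetical hash family: built-in tuple hash salted with the row index j.
        self.hashes: List[Callable[[int], int]] = [
            (lambda k, j=j: hash((j, k)) % K) for j in range(H)
        ]

    def update(self, key: int, u: int) -> None:
        """Apply update (key k, value u): T_j[h_j(k)] += u for every row j."""
        for j in range(self.H):
            self.T[j][self.hashes[j](key)] += u


sk = Sketch(H=4, K=8)
for key, u in [(10, 5), (20, 3), (30, 7), (10, 2)]:
    sk.update(key, u)
# Every update touches exactly one counter per row, so each row sums to the total.
print([sum(row) for row in sk.T])  # [17, 17, 17, 17]
```

Memory stays at H × K counters no matter how many distinct keys arrive, and each update costs exactly H counter writes, which is the "constant memory, few memory accesses" property listed on Page 3.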

Page 5: Improving Sketch Reconstruction Accuracy

Point Estimation

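Point Estimation: given a key k, what is its value?

Nontrivial because of collisions!

[Figure: the same sketch; keys k and k' are hashed to the same bucket in row j ($h_j(k) = h_j(k')$), so that counter holds the sum of both keys' values.]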

Page 6: Improving Sketch Reconstruction Accuracy

Point Estimation

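Countmin [5]: key k → $\min_{1 \le j \le H} T_j[h_j(k)]$

[Figure: the H counters $T_1[h_1(k)], \dots, T_H[h_H(k)]$ that key k maps to; each holds k's value plus the values of any keys (such as k') that collide with it, and countmin takes their minimum.]

Can we do better than this?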
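The countmin point estimate can be sketched in Python as below. It assumes non-negative updates and a hypothetical tuple-hash family (neither comes from the slides); `countmin_estimate` is an illustrative name.

```python
from typing import Callable, List


def countmin_estimate(T: List[List[int]], hashes: List[Callable[[int], int]], key: int) -> int:
    """Countmin point estimate: the minimum of T_j[h_j(key)] over all rows j."""
    return min(T[j][h(key)] for j, h in enumerate(hashes))


# Small demo sketch: H = 3 rows, K = 4 buckets, 6 keys, so every row has collisions
H, K = 3, 4
hashes = [(lambda k, j=j: hash((j, k)) % K) for j in range(H)]  # hypothetical hash family
T = [[0] * K for _ in range(H)]
true_values = {1: 50, 2: 8, 3: 8, 4: 3, 5: 1, 6: 1}
for key, u in true_values.items():
    for j, h in enumerate(hashes):
        T[j][h(key)] += u

for key, u in true_values.items():
    # With non-negative updates each counter is u plus whatever collided with key,
    # so the estimate can only overestimate.
    print(key, u, countmin_estimate(T, hashes, key))
```

Taking the minimum picks the least-polluted counter, but it still ignores what the other counters say about the colliding keys; that unused information is what the question "Can we do better than this?" points at.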

Page 7: Improving Sketch Reconstruction Accuracy

Our Approach: lsquare

● Say we have a sketch and a set of keys
  ○ We want to accurately estimate the accumulated values of those keys

● Construct a linear system Ax = b based on the information the sketch provides

● Find the optimal solution using the least squares method [10, 13]
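A minimal sketch of the lsquare idea in Python with NumPy. It assumes, as in the example on Page 9, a single noise variable Y that appears in every bucket equation; the function name `lsquare` and its signature are illustrative, not the authors' code. `numpy.linalg.lstsq` returns the x that minimizes ||Ax - b||.

```python
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np


def lsquare(
    T: List[List[float]],
    hashes: List[Callable[[int], int]],
    keys: Sequence[int],
) -> Tuple[Dict[int, float], float]:
    """Estimate the values of `keys` jointly from sketch T by least squares.

    Unknowns: one x_k per key, plus one noise variable Y added to every bucket.
    Equation for row j, bucket b: (sum of x_k over keys with h_j(k) == b) + Y = T[j][b].
    """
    H, K, n = len(T), len(T[0]), len(keys)
    A = np.zeros((H * K, n + 1))
    b = np.zeros(H * K)
    for j, h in enumerate(hashes):
        for bucket in range(K):
            b[j * K + bucket] = T[j][bucket]
            A[j * K + bucket, n] = 1.0       # noise column Y
        for i, key in enumerate(keys):
            A[j * K + h(key), i] += 1.0      # key i contributes to bucket h_j(key)
    # If A is rank-deficient, lstsq returns the minimum-norm least-squares solution.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return {key: float(x[i]) for i, key in enumerate(keys)}, float(x[n])


# Running example of Pages 8-10: the bucket counters used on Page 9
T = [[14, 20, 3], [14, 19, 4]]
hashes = [lambda k: k % 3, lambda k: (k ^ 3) % 3]
estimates, noise = lsquare(T, hashes, keys=[3, 4])
print(estimates, noise)  # approximately {3: 10.5, 4: 16.0} 3.5
```

Unlike countmin, every bucket equation constrains the solution, including buckets that hold none of the queried keys (they pin down Y).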

Page 8: Improving Sketch Reconstruction Accuracy

An example: constructing a sketch

● A sketch with H = 2, K = 3
  ○ h1(j) = j mod 3, h2(j) = (j XOR 3) mod 3, where j is the key

● Total update values for keys 0–4
  ○ U0 = 5, U1 = 4, U2 = 3, U3 = 9, U4 = 16
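The sketch that these hash functions and updates produce can be reproduced with a few lines of Python (`^` is Python's bitwise XOR). The counters match the right-hand sides of the equations on Page 9.

```python
# Page 8 example: H = 2 rows, K = 3 buckets
K = 3
hashes = [lambda j: j % 3, lambda j: (j ^ 3) % 3]   # h1, h2 (j is the key here)
U = {0: 5, 1: 4, 2: 3, 3: 9, 4: 16}                 # total update value per key

T = [[0] * K for _ in hashes]
buckets = [[[] for _ in range(K)] for _ in hashes]  # which keys land in each bucket
for key, u in U.items():
    for row, h in enumerate(hashes):
        T[row][h(key)] += u
        buckets[row][h(key)].append(key)

for row in range(len(hashes)):
    print(f"row {row + 1}: counters {T[row]}, keys {buckets[row]}")
# row 1: counters [14, 20, 3], keys [[0, 3], [1, 4], [2]]
# row 2: counters [14, 19, 4], keys [[0, 3], [2, 4], [1]]
```

Keys 3 and 4 always share their buckets with other keys (0 and 3 collide in both rows), which is why countmin cannot recover them exactly here.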

Page 9: Improving Sketch Reconstruction Accuracy

An example: building a linear system

● Now, we want to reconstruct the values of keys 3 and 4

Row 1: X3 + Y = 14, X4 + Y = 20, Y = 3

Row 2: X3 + Y = 14, X4 + Y = 19, Y = 4

Here, Y is a variable that captures the noise effect (the values of the other keys sharing each bucket)
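Written in matrix form, with the unknowns ordered (X3, X4, Y), these six equations are Ax = b. Since A has full column rank, the least-squares solution is the unique solution of the normal equations A^T A x = A^T b, which can be worked out by hand and agrees with the lsquare values on Page 10:

```latex
A = \begin{pmatrix}
1 & 0 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1
\end{pmatrix},\quad
x = \begin{pmatrix} X_3 \\ X_4 \\ Y \end{pmatrix},\quad
b = \begin{pmatrix} 14 \\ 20 \\ 3 \\ 14 \\ 19 \\ 4 \end{pmatrix}

A^\top A = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 2 & 2 \\ 2 & 2 & 6 \end{pmatrix},\quad
A^\top b = \begin{pmatrix} 28 \\ 39 \\ 74 \end{pmatrix}

X_3 + Y = 14,\quad X_4 + Y = 19.5,\quad X_3 + X_4 + 3Y = 37
\;\Rightarrow\; Y = 3.5,\quad X_3 = 10.5,\quad X_4 = 16
```

The two inconsistent readings for X4 (20 and 19) are averaged, and the two noise readings (3 and 4) are averaged to Y = 3.5.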

Page 10: Improving Sketch Reconstruction Accuracy

An example: solving the linear system

lsquare: X3 = 10.5, X4 = 16

countmin: X3 = min{14, 14} = 14, X4 = min{20, 19} = 19

actual values: U3 = 9, U4 = 16
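To quantify the gap on this toy example, the Python snippet below computes each estimate's relative error |Uest - U| / U (assumed here to be the metric described on Page 20) and the per-method average. The estimates are copied from this slide, not recomputed.

```python
actual = {3: 9, 4: 16}
estimates = {
    "lsquare": {3: 10.5, 4: 16},
    "countmin": {3: 14, 4: 19},
}


def relative_error(est: float, true: float) -> float:
    """|U_est - U| / U for a single key (assumes U > 0)."""
    return abs(est - true) / true


avg_error = {}
for method, est in estimates.items():
    errors = [relative_error(est[k], actual[k]) for k in actual]
    avg_error[method] = sum(errors) / len(errors)
    print(method, errors, avg_error[method])
```

For lsquare the errors are 1.5/9 ≈ 0.17 and 0; for countmin they are 5/9 ≈ 0.56 and 3/16 ≈ 0.19, so lsquare's average (about 0.08) is well below countmin's (about 0.37).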

Page 11: Improving Sketch Reconstruction Accuracy

Evaluation - data sets

May 2002 [Bell02]

Feb 2004 [Tera04]

IP addresses with traffic amounts

Page 12: Improving Sketch Reconstruction Accuracy

Evaluation – lsquare vs countmin

[Figure: relative error (y axis) of lsquare vs. countmin for the top 50 hitters (x axis).]

Lsquare is more accurate than countmin.

Page 13: Improving Sketch Reconstruction Accuracy

Evaluation – Accuracy with Light hitters

[Figure: traffic amounts (y axis) for the top 200 hitters (x axis): actual values vs. countmin and lsquare estimates.]

Lsquare has good accuracy even for “light” hitters.

Page 14: Improving Sketch Reconstruction Accuracy

Evaluation – Multiple noise variables

[Figure: relative error (y axis) for the top 20 hitters (x axis) with 1, 31, and 181 noise variables.]

We can get better accuracy by using more noise variables.

Page 15: Improving Sketch Reconstruction Accuracy

Evaluation – Tolerance to limited memory

[Figure: average relative error (y axis) for various sketch configurations (x axis), lsquare vs. countmin.]

Lsquare is tolerant of limited-memory sketches.

Page 16: Improving Sketch Reconstruction Accuracy

Conclusion

● We propose a new method for point estimation in the sketch data structure
  ○ More accurate!
  ○ Tolerant of small sketches

● Future Direction
  ○ Applying statistical inference in data streaming

Page 17: Improving Sketch Reconstruction Accuracy

Q&A

Thank you for your attention!

Questions?

Contact Info: [email protected]

Page 18: Improving Sketch Reconstruction Accuracy

Evaluation - Time Complexity

● In our experiments, lsquare took just 1–5 seconds
  ○ Running time is a function of the number of heavy hitters, which is relatively small

● Lots of room for further speedup
  ○ Exploiting sparsity
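In the lsquare system every coefficient is 0 or 1, and each key's column of A has only H nonzero entries (one per sketch row), so A is sparse. One possible way to exploit that, which is our assumption and not necessarily what the authors had in mind, is an iterative least-squares solver such as CGLS (conjugate gradients applied to the normal equations) that only ever multiplies by A and A^T. The pure-Python sketch below stores each row of A as the list of columns holding a 1; all function names are hypothetical.

```python
from typing import List, Sequence


def matvec(rows: List[List[int]], x: Sequence[float]) -> List[float]:
    """(A x)_i: sum of x over the columns that hold a 1 in row i."""
    return [sum(x[c] for c in cols) for cols in rows]


def rmatvec(rows: List[List[int]], r: Sequence[float], n: int) -> List[float]:
    """(A^T r)_c: sum of r_i over the rows i that hold a 1 in column c."""
    out = [0.0] * n
    for cols, ri in zip(rows, r):
        for c in cols:
            out[c] += ri
    return out


def cgls(rows: List[List[int]], b: Sequence[float], n: int,
         max_iter: int = 100, rel_tol: float = 1e-12) -> List[float]:
    """Least-squares solution of A x = b by conjugate gradients on A^T A x = A^T b.

    Assumes A has full column rank (otherwise q below can vanish).
    """
    x = [0.0] * n
    r = [float(v) for v in b]            # residual b - A x, with x = 0
    s = rmatvec(rows, r, n)              # A^T r
    p = list(s)
    gamma = gamma0 = sum(v * v for v in s)
    for _ in range(max_iter):
        if gamma <= (rel_tol ** 2) * gamma0:
            break
        q = matvec(rows, p)
        alpha = gamma / sum(v * v for v in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(rows, r, n)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x


# Page 9 system; columns 0, 1, 2 are X3, X4, Y
rows = [[0, 2], [1, 2], [2], [0, 2], [1, 2], [2]]
b = [14, 20, 3, 14, 19, 4]
print(cgls(rows, b, n=3))  # approximately [10.5, 16.0, 3.5]
```

Each iteration costs time proportional to the number of nonzeros in A rather than to its full size, and A^T A is never formed explicitly.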

Page 19: Improving Sketch Reconstruction Accuracy

How to get the set of keys

● Countmin estimates the value of each key individually, whereas we estimate the values of a whole “set” of keys jointly

● The set of keys can be obtained by
  ○ maintaining a priority queue
  ○ using a reversible sketch
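The priority-queue option can be sketched as follows: while updates stream into the sketch, keep a min-heap of the m keys with the largest countmin estimates seen so far, and hand that set to lsquare at query time. This is a hypothetical illustration; the class name, parameters, hash family, and lazy-deletion heap are all our choices, not the authors' implementation. A tracked key's stored estimate is refreshed only when that key itself is updated.

```python
import heapq
from typing import Dict, List, Set, Tuple


class TopKeyTracker:
    """Track up to m candidate heavy keys with a min-heap (requires m >= 1)."""

    def __init__(self, m: int, H: int, K: int):
        self.m, self.K = m, K
        self.T: List[List[int]] = [[0] * K for _ in range(H)]
        self.best: Dict[int, int] = {}          # tracked key -> latest estimate
        self.heap: List[Tuple[int, int]] = []   # (estimate, key); stale entries skipped lazily

    def _h(self, j: int, key: int) -> int:
        # Hypothetical hash: an odd multiplier is a bijection mod a power-of-two K,
        # so keys below K never collide within a row (illustration only).
        return ((2 * j + 1) * key + j) % self.K

    def _min_entry(self) -> Tuple[int, int]:
        # Pop stale entries until the top matches a tracked key's current estimate.
        while self.best.get(self.heap[0][1]) != self.heap[0][0]:
            heapq.heappop(self.heap)
        return self.heap[0]

    def update(self, key: int, u: int) -> None:
        for j, row in enumerate(self.T):
            row[self._h(j, key)] += u
        est = min(row[self._h(j, key)] for j, row in enumerate(self.T))  # countmin
        if key in self.best or len(self.best) < self.m:
            self.best[key] = est
            heapq.heappush(self.heap, (est, key))
        else:
            low_est, low_key = self._min_entry()
            if est > low_est:               # evict the smallest tracked key
                heapq.heappop(self.heap)
                del self.best[low_key]
                self.best[key] = est
                heapq.heappush(self.heap, (est, key))

    def keys(self) -> Set[int]:
        return set(self.best)


tracker = TopKeyTracker(m=2, H=4, K=1024)
for key, u in [(1, 100), (2, 1), (3, 1), (4, 50), (5, 1), (6, 1)]:
    tracker.update(key, u)
print(tracker.keys())  # {1, 4}
```

Lazy deletion avoids searching the heap when a tracked key's estimate changes: the old entry is simply left behind and discarded when it reaches the top.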

Page 20: Improving Sketch Reconstruction Accuracy

Evaluation – Error Metric

We use the relative error and its average over all IPs

n: number of IPs

Uest: estimated value

U: real value
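The formula on this slide did not survive in the transcript. Assuming it is the standard relative error averaged over the n IPs (our reading of the labels on this slide, not a verified copy of it), it would be:

```latex
\mathrm{RelErr}_i = \frac{\lvert U_{\mathrm{est},i} - U_i \rvert}{U_i},
\qquad
\mathrm{AvgRelErr} = \frac{1}{n} \sum_{i=1}^{n} \frac{\lvert U_{\mathrm{est},i} - U_i \rvert}{U_i}
```

Dividing by U_i makes errors on light hitters count as much as errors on heavy hitters, which matters for the light-hitter comparison on Page 13.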