The Impact of Duality on Data Synopsis Problems
Panagiotis Karras
KDD, San Jose, August 13th, 2007
Work with Dimitris Sacharidis and Nikos Mamoulis
Introduction
• Data synopsis problems require the optimization of error under a bound on space.
• Classical approaches treat them in a direct manner, producing complicated solutions, and sometimes resorting to heuristics.
• The parameters involved have a monotonic relationship.
• Hence, an alternative approach is possible, based on the dual, error-bounded problems.
Outline
• Histograms.
• Restricted Haar Wavelet Synopses.
• Unrestricted Haar and Haar+ Synopses.
• Experiments.
• Conclusions.
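Histograms
• Approximate a data set $[d_1, d_2, \ldots, d_n]$ with $B$ buckets, $s_i = [b_i, e_i, v_i]$, so that a maximum-error metric is minimized.
• Classical solution: Jagadish et al., VLDB 1998; Guha et al., VLDB 2004; Guha, VLDB 2005.
  $E(i, b) = \min_{1 \le j < i} \max\{E(j, b-1),\ \mathrm{err}(j+1, i)\}$
  Here $E(i, b)$ is the minimum error for the first $i$ values using $b$ buckets, and $\mathrm{err}(j+1, i)$ is the error of a single bucket spanning $d_{j+1}, \ldots, d_i$.
  Complexity: $O(n B \log^2 n)$
• Recent solutions:
  Buragohain et al., ICDE 2007 (the complexity expression is illegible in this transcript; its surviving terms are $n$, $B$ and $\log U$).
  Guha and Shim, TKDE 19(7), 2007: $O(n + B^2 \log^3 n)$
  Linear for: $B \le \sqrt{n / \log^3 n}$, e.g. $n = 2^{30} = 1{,}073{,}741{,}824 \Rightarrow B \le 199$
  For weighted error: $O(n \log n + B^2 \log^6 n)$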
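To make the classical space-bounded formulation concrete, the recurrence can be tabulated directly: the best error for the first i values with b buckets is the minimum, over all split points j, of the larger of the error for the first j values with b-1 buckets and the error of one bucket covering the rest. The following is a minimal sketch for the unweighted maximum absolute error only. It is the plain quadratic tabulation, not the faster algorithms of the cited papers, and the names `bucket_error` and `optimal_max_error` are hypothetical.

```python
from functools import lru_cache


def bucket_error(data, lo, hi):
    """Max absolute error of the best single value for data[lo..hi] (inclusive).

    For the unweighted maximum absolute error the best value is the midpoint
    of the bucket's range, so the error is half the range.
    """
    segment = data[lo:hi + 1]
    return (max(segment) - min(segment)) / 2


def optimal_max_error(data, B):
    """Smallest achievable maximum absolute error with at most B buckets."""
    n = len(data)

    @lru_cache(maxsize=None)
    def E(i, b):
        # E(i, b): best error for the first i values using at most b buckets.
        best = bucket_error(data, 0, i - 1)  # one bucket over everything
        if b == 1:
            return best
        for j in range(1, i):
            # First j values use b-1 buckets; values j+1..i form the last bucket.
            best = min(best, max(E(j, b - 1), bucket_error(data, j, i - 1)))
        return best

    return E(n, B)


data = [4, 5, 6, 2, 15, 17, 3, 6, 9, 12]
print(optimal_max_error(data, 4))
```

Every combination of prefix length and bucket count is tabulated here, which is why the running time of this direct approach depends on B as well as on n.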
Histograms
• Solve the error-bounded problem.
  Example: maximum absolute error bound ε = 2
  Data: 4 5 6 2 15 17 3 6 9 12 …
  Buckets: [ 4 ] [ 16 ] [ 4.5 ] [ …
• Generalized to any weighted maximum-error metric.
  Each value $d_i$ defines a tolerance interval $[d_i - \varepsilon / w_i,\ d_i + \varepsilon / w_i]$.
  A bucket is closed when the running intersection of the intervals becomes empty.
  Complexity: $O(n)$
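The single-pass procedure described on this slide can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `error_bounded_histogram`, the `(start, end, value)` bucket layout, the choice of the interval midpoint as the bucket value, and the assumption that all weights are positive are mine.

```python
def error_bounded_histogram(data, eps, weights=None):
    """Greedy histogram whose weighted maximum error is at most eps.

    Each value d with weight w tolerates any bucket value within
    [d - eps / w, d + eps / w]. A bucket stays open while the running
    intersection of these intervals is non-empty.
    Returns a list of (start_index, end_index, value) tuples.
    """
    if weights is None:
        weights = [1.0] * len(data)  # assumption: unweighted = all weights 1
    buckets = []
    start = 0
    lo, hi = float("-inf"), float("inf")  # running intersection
    for i, (d, w) in enumerate(zip(data, weights)):
        t_lo, t_hi = d - eps / w, d + eps / w
        new_lo, new_hi = max(lo, t_lo), min(hi, t_hi)
        if new_lo > new_hi:
            # Intersection became empty: close the bucket before value i.
            buckets.append((start, i - 1, (lo + hi) / 2))
            start, lo, hi = i, t_lo, t_hi
        else:
            lo, hi = new_lo, new_hi
    if data:
        buckets.append((start, len(data) - 1, (lo + hi) / 2))
    return buckets


data = [4, 5, 6, 2, 15, 17, 3, 6, 9, 12]
print([value for _, _, value in error_bounded_histogram(data, eps=2)])
```

On the ten values shown on the slide with ε = 2, the first three bucket values come out as 4, 16 and 4.5, matching the slide. Each value is visited once, which is where the linear running time comes from.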
Histograms
• Apply to the space-bounded problem.
  Perform binary search in the domain of the error bound ε.
  Complexity: $O(n \log \varepsilon^*)$, independent of the number of buckets $B$.
  For error values ε requiring space $B' \le B$, with actual error $\varepsilon' \le \varepsilon$, run an optimality test:
  Run the error-bounded algorithm under the constraint error $< \varepsilon'$ instead of error $\le \varepsilon'$.
  If error $< \varepsilon'$ requires $\tilde{B} > B$ space, then the optimal solution has been reached.
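The binary search with its optimality test can be sketched as below, under simplifying assumptions that are mine rather than the slides': the metric is the unweighted maximum absolute error (so a bucket's error is half its range), the search starts from the error of a single bucket, and the names `greedy_buckets` and `space_bounded_histogram` are hypothetical.

```python
def greedy_buckets(data, eps, strict=False):
    """Fewest buckets with max absolute error <= eps (or < eps if strict).

    Returns a list of (start, end, min_value, max_value) tuples.
    """
    buckets = []
    start, lo, hi = 0, data[0], data[0]
    for i in range(1, len(data)):
        new_lo, new_hi = min(lo, data[i]), max(hi, data[i])
        err = (new_hi - new_lo) / 2
        if (err >= eps) if strict else (err > eps):
            buckets.append((start, i - 1, lo, hi))
            start, lo, hi = i, data[i], data[i]
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, len(data) - 1, lo, hi))
    return buckets


def space_bounded_histogram(data, B):
    """Binary search on the error bound; returns (optimal_error, buckets)."""
    low = 0.0
    high = (max(data) - min(data)) / 2  # always feasible with one bucket
    eps = high
    while True:
        buckets = greedy_buckets(data, eps)
        if len(buckets) > B:
            low = eps  # bound too tight for B buckets
        else:
            actual = max((hi - lo) / 2 for _, _, lo, hi in buckets)
            # Optimality test: demanding an error strictly below the actual
            # error must need more than B buckets.
            if actual == 0 or len(greedy_buckets(data, actual, strict=True)) > B:
                return actual, buckets
            high = actual
        eps = (low + high) / 2


data = [4, 5, 6, 2, 15, 17, 3, 6, 9, 12]
print(space_bounded_histogram(data, 4)[0])
```

Each probe is one linear pass of the error-bounded routine, and the number of probes depends on the error domain rather than on B.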
Restricted Haar Wavelet Synopses
• Select a subset of the Haar wavelet decomposition coefficients, so that a maximum-error metric is minimized.
[Figure: Haar error tree for the data 34 16 2 20 20 0 36 16. Overall average 18; detail coefficients 0; 7, -8; 9, -9, 10, 10; pairwise averages 25, 11, 10, 26.]
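As a small illustration of what selecting coefficients means, the sketch below computes the unnormalized Haar decomposition of the eight data values that appear on this slide (34, 16, 2, 20, 20, 0, 36, 16) and measures the maximum absolute error when only some coefficients are retained. The function names, the coefficient ordering (overall average first, then details from coarse to fine) and the example choice of retained coefficients are assumptions made for illustration, and no attempt is made here to find the optimal subset.

```python
def haar_decompose(data):
    """Unnormalized Haar transform; len(data) must be a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    averages = list(data)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level = [(a - b) / 2 for a, b in pairs]      # half-differences
        averages = [(a + b) / 2 for a, b in pairs]   # pairwise averages
        details = level + details                     # coarser levels go first
    return averages + details


def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    values = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        pos += len(level)
        # Each average a with detail d expands to the pair (a + d, a - d).
        values = [v for a, d in zip(values, level) for v in (a + d, a - d)]
    return values


def max_abs_error(data, coeffs, kept):
    """Max absolute error when only the coefficient indices in `kept` are stored."""
    synopsis = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    approx = haar_reconstruct(synopsis)
    return max(abs(d - a) for d, a in zip(data, approx))


data = [34, 16, 2, 20, 20, 0, 36, 16]
coeffs = haar_decompose(data)
print(coeffs)                                # [18.0, 0.0, 7.0, -8.0, 9.0, -9.0, 10.0, 10.0]
print(max_abs_error(data, coeffs, {0, 2, 3}))  # 10.0
```

Keeping the average and the two second-level details (indices 0, 2 and 3) reconstructs the pairwise averages 25, 11, 10, 26, so every value is off by the dropped finest-level detail, at most 10.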