Wavelet decomposition of data streams by Dragana Veljkovic
Wavelet Decomposition

Apr 07, 2018

Yadava Jaffa
Transcript
Page 1: Wavelet Decomposition

8/4/2019 Wavelet Decomposition

http://slidepdf.com/reader/full/wavelet-decomposition 1/34

Wavelet decomposition of data streams

by Dragana Veljkovic

Page 2: Wavelet Decomposition

Motivation

• Continuous data streams arise naturally in:
    • telecommunication and internet traffic
    • retail and banking transactions
    • web server log records, etc.
• Many applications need this data to be processed on a 24×7 basis in only one pass

Page 3: Wavelet Decomposition

Motivation cont.

• Usually this data is accumulated and archived for later use, but not always (e.g. network security)
• The ability to make decisions and interpret interesting patterns online can be crucial and has real dollar value for large corporations (e.g. fraud detection)

Page 4: Wavelet Decomposition

Our motivation

• Currently working on data collected from 100 electrodes recording the electrical potential of a monkey brain over long periods of time
• We want to look at this data in real time and seek patterns, trends and surprises

Page 5: Wavelet Decomposition

Outline

• Background
    • streams
    • wavelets
    • sketches
    • error analysis
• Results
• Implementation details
• Strengths and weaknesses of this approach

Page 6: Wavelet Decomposition

Data streams

• An unbounded sequence of real-time data arriving at a high rate, which an application can read only once
• Problems:
    • Unbounded memory requirements
    • High data rate

Page 7: Wavelet Decomposition

Underlying signal

• The signal is a one-dimensional function

$a: [0, \dots, N-1] \to \mathbb{Z}^+$

• Data item that arrives in time is an ordered pair:

<domain, value>

Example : voting results

<Texas, 60>

Example : phone call records

<210-748, 12>

Page 8: Wavelet Decomposition

Data model

Two different data models are used for rendering the underlying signal:

• Cash register

• Aggregate

Example: cash register model
<210-748, 10>, <210-689, 13>, <210-748, 20>, <210-740, 5>, <210-748, 2>, <210-740, 30>, …

where the underlying signal is
<210-748, 32>, <210-689, 13>, <210-740, 35>
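
The relation between a cash-register stream and its underlying signal can be sketched in a few lines of Python. This is only an illustration: the function name `aggregate_cash_register` is hypothetical, and it stores the whole signal in memory, which a real streaming algorithm cannot afford.

```python
from collections import defaultdict

def aggregate_cash_register(stream):
    """Sum the values of a cash-register stream per domain item."""
    signal = defaultdict(int)
    for domain, value in stream:
        # Each arrival is an increment to a(domain), not its final value.
        signal[domain] += value
    return dict(signal)

# The phone call records from the example above.
stream = [("210-748", 10), ("210-689", 13), ("210-748", 20),
          ("210-740", 5), ("210-748", 2), ("210-740", 30)]
signal = aggregate_cash_register(stream)
print(signal)  # {'210-748': 32, '210-689': 13, '210-740': 35}
```

In the aggregate model each domain item would instead appear exactly once, already carrying its total.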

Page 9: Wavelet Decomposition

Stream format

Two distinct formats for the stream:
– Ordered
– Unordered

Example: Aggregate ordered stream – any time series

Example: Unordered cash-register stream – phone call records

An ordered cash-register stream is trivial to convert to an ordered aggregate stream

Page 10: Wavelet Decomposition

Wavelets

• Basis functions of limited duration and average value of zero
• Basis functions are shifted and scaled versions of the original wavelet

Page 11: Wavelet Decomposition

Discrete wavelet transform

• Uses only fixed values for wavelet scales based on powers of two
• Wavelet positions are also fixed and non-overlapping
• Wavelets form a set of wavelet basis vectors of length N

Example: Haar wavelets on signal of length N = 8
• j = 1,…, logN levels
• k = 0,…, 2 j-1 spaces for each level

Haar wavelets for signal of size 8

Page 12: Wavelet Decomposition

Wavelet decomposition

• Wavelet decomposition can be regarded as projection of the signal on the set of wavelet basis vectors
• Each wavelet coefficient can be computed as the dot product of the signal with the corresponding basis vector

Example: 

Table 1. from Gilbert et al. 2003.
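
The "coefficient = dot product with a basis vector" view can be made concrete with a small Python sketch for a signal of length 8. The conventions here are assumptions, not taken from the slides: a level-j wavelet has support 2^j, is positive on its first half and negative on its second half, and is normalized to unit length; the function names are hypothetical.

```python
import math

def haar_basis(n):
    """Return n orthonormal Haar basis vectors for length n (a power of two)."""
    basis = [[1.0 / math.sqrt(n)] * n]  # scaling vector: overall average
    levels = int(math.log2(n))
    for j in range(1, levels + 1):
        width = 2 ** j  # support of a level-j wavelet (assumed convention)
        for k in range(n // width):
            vec = [0.0] * n
            start = k * width
            for i in range(start, start + width // 2):
                vec[i] = 1.0 / math.sqrt(width)
            for i in range(start + width // 2, start + width):
                vec[i] = -1.0 / math.sqrt(width)
            basis.append(vec)
    return basis

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def decompose(signal):
    """Each coefficient is the dot product of the signal with one basis vector."""
    return [dot(signal, psi) for psi in haar_basis(len(signal))]

def reconstruct(coeffs):
    """Rebuild the signal as the coefficient-weighted sum of basis vectors."""
    basis = haar_basis(len(coeffs))
    return [sum(c * psi[i] for c, psi in zip(coeffs, basis))
            for i in range(len(coeffs))]

signal = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = decompose(signal)
restored = reconstruct(coeffs)
```

Because the basis is orthonormal, `restored` matches `signal` up to rounding, and the sum of squared coefficients equals the energy of the signal.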

Page 13: Wavelet Decomposition

Best B-term decomposition

• The signal can be fully recovered from the wavelet decomposition
• Best B-term decomposition uses only a small number of coefficients, B, that carry the highest energy
• The signal reconstructed using the B-term coefficients and the corresponding vectors is called the best B-term approximation
• Most signals that occur in nature can be well approximated using only a small number of coefficients (5-10).
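
A minimal offline sketch of the best B-term approximation, assuming an orthonormal Haar transform computed by repeated pairwise averaging and differencing and a signal whose length is a power of two. All function names are hypothetical, and this is a batch computation, not the streaming algorithm.

```python
import math

def haar_forward(signal):
    """Orthonormal Haar transform: [scaling, coarsest detail, ..., finest details]."""
    data = [float(x) for x in signal]
    details = []
    while len(data) > 1:
        avgs = [(data[i] + data[i + 1]) / math.sqrt(2) for i in range(0, len(data), 2)]
        diffs = [(data[i] - data[i + 1]) / math.sqrt(2) for i in range(0, len(data), 2)]
        details = diffs + details  # coarser levels end up in front
        data = avgs
    return data + details

def haar_inverse(coeffs):
    """Invert haar_forward level by level, from coarsest to finest."""
    data = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(data)]
        pos += len(data)
        nxt = []
        for a, d in zip(data, diffs):
            nxt += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        data = nxt
    return data

def best_b_term(signal, b):
    """Keep the b coefficients of largest magnitude, zero the rest, reconstruct."""
    coeffs = haar_forward(signal)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:b])
    kept = [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
    return haar_inverse(kept)

def sse(signal, approx):
    """Sum squared error between a signal and its approximation."""
    return sum((x - y) ** 2 for x, y in zip(signal, approx))

signal = [4, 4, 4, 4, 8, 8, 8, 8]
for b in (1, 2):
    print(b, sse(signal, best_b_term(signal, b)))  # roughly 32 for B = 1, 0 for B = 2
```

This step signal has only two nonzero Haar coefficients, so B = 2 already recovers it exactly, while B = 1 keeps only the overall average.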

Page 14: Wavelet Decomposition

Computing best B-term decomposition in runtime

For the ordered aggregate model
• Maintain two sets of items
    • Highest B wavelet basis coefficients for the signal seen so far
    • logN straddling coefficients, one for each level
• When a data item is read, the affected straddling coefficients get updated.
• If a coefficient is no longer straddling, it is compared to the existing highest B coefficients and the set is updated if necessary. A new straddling coefficient is initialized.
• Takes O(B + logN) storage and time for the ordered aggregate model
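
One way to sketch this one-pass procedure in Python is shown below. It is an equivalent reformulation rather than the paper's exact bookkeeping: instead of updating partially computed straddling coefficients, it keeps one pending scaled average per level, from which a detail coefficient is finalized as soon as its right half arrives. The class name `StreamingTopB` and the orthonormal Haar normalization are assumptions.

```python
import heapq
import math

class StreamingTopB:
    """One-pass top-B Haar detail coefficients for an ordered aggregate stream."""

    def __init__(self, b):
        self.b = b
        self.pending = []  # pending[j]: left-half average waiting at level j, or None
        self.top = []      # min-heap of (|coeff|, level, position, coeff)
        self.count = 0

    def _offer(self, level, position, coeff):
        entry = (abs(coeff), level, position, coeff)
        if len(self.top) < self.b:
            heapq.heappush(self.top, entry)
        elif entry[0] > self.top[0][0]:
            heapq.heapreplace(self.top, entry)  # evict the smallest kept coefficient

    def add(self, value):
        carry = float(value)
        index = self.count
        self.count += 1
        level = 0
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                self.pending[level] = carry  # wait for the right sibling
                return
            left = self.pending[level]
            self.pending[level] = None
            index //= 2
            # The coefficient at (level + 1, index) is now complete.
            self._offer(level + 1, index, (left - carry) / math.sqrt(2))
            carry = (left + carry) / math.sqrt(2)
            level += 1

s = StreamingTopB(b=2)
for v in [4, 4, 4, 4, 8, 8, 8, 8]:
    s.add(v)
best = max(s.top)  # (magnitude, level, position, coefficient)
```

Storage is the B heap entries plus at most logN + 1 pending values, matching the O(B + logN) bound. After a full signal of length N, the last pending entry holds the scaling coefficient.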

Page 15: Wavelet Decomposition

Sketches

• Sketch is made by projecting a signal onto several different low dimensional spaces which are chosen at random
• Many properties of the signal, such as histograms, can be accurately estimated by looking at the sketch

Page 16: Wavelet Decomposition

Definition of a sketch

• Atomic sketch of signal a is the dot product <a, r> where r is a random vector of ±1 valued random variables
• A sketch of a signal is k independent atomic sketches, each with a different random vector $r^j$
• Sketch size is small compared to the signal size

Page 17: Wavelet Decomposition

Sketches

• Maintaining the sketch is easy as we are receiving the data
• If element <i, a(i)> arrives, add $a(i) \cdot r_i^j$ to the sketch corresponding to random vector $r^j$

Example: In cash-register receive <5, 10>, need to add $10 \cdot r_5^j$ to each atomic sketch corresponding to the random vector $r^j$
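
The update rule can be sketched in Python as follows. For illustration only, the ±1 random vectors are stored explicitly, which is feasible only for a tiny N; the class name `Sketch` and its parameters are hypothetical.

```python
import random

class Sketch:
    """k atomic sketches <a, r^j> maintained under cash-register updates."""

    def __init__(self, n, k, seed=0):
        rng = random.Random(seed)
        # Explicit storage of the random vectors: an illustration-only shortcut.
        self.r = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
        self.atomic = [0] * k

    def update(self, i, value):
        """Process stream element <i, value>: add value * r^j[i] to every atomic sketch."""
        for j in range(len(self.atomic)):
            self.atomic[j] += value * self.r[j][i]

sk = Sketch(n=8, k=4, seed=1)
for i, value in [(5, 10), (2, 3), (5, 7)]:
    sk.update(i, value)
```

Because the updates are linear, the final atomic sketches equal the dot products of each random vector with the aggregated signal (here a(5) = 17 and a(2) = 3), regardless of arrival order.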

Page 18: Wavelet Decomposition

Error metrics

• SSE (sum squared error) – if R is a representation of the signal a then SSE is defined as

$\mathrm{SSE} = \|a - R\|_2^2 = \sum_i (a(i) - R(i))^2$

• Pseudoenergy of the representation R is computed as

$\|a\|_2^2 - \|a - R\|_2^2$

Page 19: Wavelet Decomposition

Query processing

• Batched – queries are posed at certain periodic intervals
• Ad hoc – a query may be posed at any time

Page 20: Wavelet Decomposition

Batch query using best B-term approximation for day 0 of call records

Figure 2. from Gilbert et al. 2003.

Page 21: Wavelet Decomposition

Batch query using best B-term approximation for all 7 days of call records

Figure 3. from Gilbert et al. 2003.

Page 22: Wavelet Decomposition

Estimating a point query

Answer to point query i is a(i)

• Direct point estimate – directly estimating a(i) using the sketch
• Direct wavelet estimate – use the sketch to estimate the wavelet coefficients whose support intersects i and reconstruct a(i) using these coefficients
• Another way is to compute a(i) using only the high wavelet coefficients (like the known B-term approximation) whose support intersects i

Page 23: Wavelet Decomposition

Using sketches to estimate dot product

• The following parameters characterize how well the sketch does
    • $\epsilon$ – distortion parameter
    • $\delta$ – failure probability
    • $\eta$ – failure threshold
• Sketch of a signal is k independent atomic sketches, each with a different random vector
• If the cosine between vectors a and b is greater than $\eta$, we estimate the dot product within a factor of $(1 \pm \epsilon)$ with probability at least $1 - \delta$

Page 24: Wavelet Decomposition

Sketches and random vectors

• If element <i, a(i)> arrives, add $a(i) \cdot r_i^j$ to the sketch corresponding to random vector $r^j$
• In order to use the sketches we need to get the elements of $r^j$ quickly.
• $r^j$ is of size N, so it cannot be stored explicitly

Page 25: Wavelet Decomposition

Generating random vectors

• The paper shows that $r_i^j$ can be generated by a pseudorandom number generator using a seed $s^j$ of size $\log^{O(1)} N$
• Generator G is based on second-order Reed-Muller codes
• The generator G takes $s^j$ and i and outputs $r_i^j = G(s^j, i)$ quickly
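
The interface of such a generator can be illustrated in Python. Note the substitution: this sketch uses a SHA-256 hash as a stand-in for G, not the Reed-Muller construction from the paper, so it shows only the "seed plus index gives a reproducible ±1" idea and carries none of the paper's independence guarantees. The names `g` and `update` are hypothetical.

```python
import hashlib

def g(seed, i):
    """Stand-in for G(s, i): a reproducible +1/-1 value derived from seed and index."""
    digest = hashlib.sha256(f"{seed}:{i}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1

def update(atomic, seeds, i, value):
    """Apply stream element <i, value> to every atomic sketch without storing any r^j."""
    for j, seed in enumerate(seeds):
        atomic[j] += value * g(seed, i)

seeds = [1, 2, 3]        # one small seed per atomic sketch
atomic = [0, 0, 0]
update(atomic, seeds, 5, 10)
```

Only the seeds are kept in memory; each entry of a random vector is recomputed on demand when a stream element touches it.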

Page 26: Wavelet Decomposition

Estimation of dot products using sketches

Lemma: Let X be an $O(\log N / \delta)$-wise median of $O(1/\epsilon^2)$-wise means of independent copies of

$\langle a, r \rangle \langle b, r \rangle$

then we have, with probability $1 - \delta$,

$|X - \langle a, b \rangle| \le \epsilon \|a\|_2 \|b\|_2$

Note: use b = a to estimate the energy of a using this lemma
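
The median-of-means estimator in the lemma can be sketched in Python. The group counts below (9 groups of 200 copies) are arbitrary illustrative choices, not the constants hidden in the O(·) bounds, and fresh ±1 vectors are drawn with Python's `random` module rather than a seeded generator G; all function names are hypothetical.

```python
import random

def atomic_product(a, b, rng):
    """One copy of <a, r><b, r> for a fresh random +1/-1 vector r."""
    r = [rng.choice((-1, 1)) for _ in a]
    return sum(x * s for x, s in zip(a, r)) * sum(y * s for y, s in zip(b, r))

def median_of_means(samples, group_size):
    """Average within groups, then take the median of the group means."""
    means = sorted(sum(samples[g:g + group_size]) / group_size
                   for g in range(0, len(samples), group_size))
    return means[len(means) // 2]  # exact median for an odd number of groups

def estimate_dot(a, b, groups=9, group_size=200, seed=0):
    rng = random.Random(seed)
    samples = [atomic_product(a, b, rng) for _ in range(groups * group_size)]
    return median_of_means(samples, group_size)

a = [3, 1, 4, 1, 5, 9, 2, 6]
b = [2, 7, 1, 8, 2, 8, 1, 8]
dot_estimate = estimate_dot(a, b)     # exact dot product is 157
energy_estimate = estimate_dot(a, a)  # exact energy of a is 173
```

Averaging shrinks the variance of each group mean, and the median then discards the occasional group that still lands far from the true value.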

Page 27: Wavelet Decomposition

Example :

Want to estimate the dot product of vectors a and b with no more than 30% error with probability of 80%, assuming the cosine between these two vectors is greater than 0.25

That is, $\epsilon = 0.3$, $\eta = 0.25$ and $\delta = 0.2$, and for a signal of size N = 1024 we would need about 30 atomic sketches

Page 28: Wavelet Decomposition

Theorem

There is a streaming algorithm, A, such that, given a signal a[1, …, N] with energy $\|a\|_2^2$, if there is a B-term representation with energy at least $\eta \|a\|_2^2$, then, with probability at least $1 - \delta$, A finds a representation of at most B terms with pseudoenergy at least $(1 - \epsilon)\eta \|a\|_2^2$. If there is no such B-term representation with energy $\eta \|a\|_2^2$, A reports "no good representation". In any case, A uses […] space and per-item time while processing the stream. This holds for both the aggregate and cash-register models.

Example: take $\eta = 0.3$, $\delta = 0.2$, $\epsilon = 0.3$ and B = 10. Then if there exists a 10-term representation of the signal that captures at least 30% of the signal's energy, the algorithm will output a 10-term representation with pseudoenergy at least $(1 - 0.3) \times 0.3 = 21\%$ of the signal's energy with 80% probability

Page 29: Wavelet Decomposition

Strengths and weaknesses

• A good example of how to work with cash-register models
• Shows several ways to estimate the signal using a sketch
• Time requirements seem higher than the paper claims
• On-line algorithms do not seem as promising as batch algorithms

Page 30: Wavelet Decomposition

References

1. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan and M. J. Strauss, "One-pass wavelet decomposition of data streams," IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 3, May/June 2003.

2. A. C. Gilbert, Y. Kotidis, S. Muthukrishnan and M. J. Strauss, "Surfing wavelets on streams: one-pass summaries for approximate aggregate queries," Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.

3. A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan and M. J. Strauss, "Fast, small-space algorithms for approximate histogram maintenance," STOC '02, May 19-21, 2002, Montreal, Quebec, Canada.

Page 31: Wavelet Decomposition

Answering queries on-line

Comparison of SSE/energy of top-B wavelets against direct estimates

Table 1. from Gilbert et al. 2003.

Table 2. from Gilbert et al. 2003.

Page 32: Wavelet Decomposition

Direct estimates for the top 10 heavy hitters

Figure 6. from Gilbert et al. 2003.

Page 33: Wavelet Decomposition

Direct estimates for the top 10 heavy hitters using the greedy algorithm

Figure 7. from Gilbert et al. 2003.

Page 34: Wavelet Decomposition

Adaptive greedy pursuit for heavy hitters

• Obtain a very accurate estimate for the first heavy hitter
• Get a new sketch by subtracting this value from the original sketch. This can be done because sketches are linear
• The new sketch is a good estimate of the residual distribution, in which the second heavy hitter is the peak value
• Use the new sketch to estimate the second heavy hitter
• Repeat the procedure for more heavy hitters
• Each estimate introduces an error, and after many iterations the errors tend to overwhelm the benefits
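
The greedy procedure can be sketched in Python under simplifying assumptions: the random vectors are stored explicitly (feasible only for a small N), each point estimate is a plain mean over the atomic sketches rather than a median of means, and the sketch size k = 900 is an arbitrary illustrative choice. All function names are hypothetical.

```python
import random

def make_vectors(n, k, seed=0):
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]

def sketch_of(signal, vectors):
    return [sum(x * s for x, s in zip(signal, r)) for r in vectors]

def point_estimate(sketch, vectors, i):
    """Each atomic sketch times r^j[i] is an unbiased estimate of a(i); average them."""
    return sum(s * r[i] for s, r in zip(sketch, vectors)) / len(sketch)

def greedy_heavy_hitters(sketch, vectors, n, count):
    sketch = list(sketch)  # work on a copy
    found = []
    for _ in range(count):
        estimates = [point_estimate(sketch, vectors, i) for i in range(n)]
        best = max(range(n), key=lambda i: abs(estimates[i]))
        found.append((best, estimates[best]))
        # Linearity: sketch(a - v * e_best) = sketch(a) - v * r^j[best].
        for j, r in enumerate(vectors):
            sketch[j] -= estimates[best] * r[best]
    return found

n = 32
signal = [1] * n
signal[3], signal[17], signal[25] = 1000, 600, 300
vectors = make_vectors(n, k=900, seed=0)
found = greedy_heavy_hitters(sketch_of(signal, vectors), vectors, n, count=3)
```

With these well-separated values the three heavy hitters are expected to come out as indices 3, 17 and 25, each with an estimate near its true value. With many more iterations the subtracted estimation errors accumulate in the residual sketch, as the last bullet warns.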