1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura Iowa State University

Dec 15, 2015

Page 1:

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

Costas Busch, Rensselaer Polytechnic Institute

Srikanta Tirthapura, Iowa State University

Page 2:

Outline of Talk

Introduction

Algorithm

Analysis

Page 3:

Data stream: $(v_1, t_1), (v_2, t_2), (v_3, t_3), (v_4, t_4), (v_5, t_5)$

[Figure: time axis running from 1 to $C$, with each element $v_i$ placed at its time stamp $t_i$.]

For simplicity, assume unit-valued elements.

Page 4:

Data stream: $(v_1, t_1), (v_2, t_2), (v_3, t_3), (v_4, t_4), (v_5, t_5)$

[Figure: time axis from 1 to the current time $C$; the most recent time window of duration $W$ is highlighted.]

Goal: Compute the sum of the elements with time stamps in the time window $[C - W, C]$:

$$\sum_{C - W \le t_i \le C} v_i$$
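
As a reference point for this goal, the exact window sum can be computed by brute force when the whole stream is stored. This is only a sketch for checking approximate answers, not the algorithm of the talk; the function name `window_sum` and the `(value, timestamp)` tuple layout are assumptions made for illustration.

```python
def window_sum(stream, current_time, window):
    """Exact sum of values whose time stamp lies in [current_time - window, current_time].

    `stream` is an iterable of (value, timestamp) pairs in arrival order,
    which need not be time stamp order.
    """
    lo = current_time - window
    return sum(v for v, t in stream if lo <= t <= current_time)


# Unit-valued elements arriving out of time stamp order (hypothetical data).
stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 8)]
print(window_sum(stream, current_time=12, window=4))  # counts t = 9, 12, 8
```

Storing every element takes space linear in the stream, which is exactly what a small-space summary has to avoid.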

Page 5:

Example I: For all packets on a network link, maintain the number of distinct IP sources seen in the last hour.

Example II: For a large database, continuously maintain averages and frequency moments.

Page 6:

Synchronous stream: the time stamps $t_i$ arrive in ascending order.

Asynchronous stream: no order is guaranteed for the time stamps $t_i$.

Data stream: $(v_1, t_1), (v_2, t_2), (v_3, t_3), (v_4, t_4), (v_5, t_5)$
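
The distinction can be stated as a one-line check on the observed time stamps. A minimal sketch, assuming that "ascending" allows equal time stamps; the name `is_synchronous` is hypothetical.

```python
def is_synchronous(timestamps):
    """True if the time stamps were observed in ascending (non-decreasing) order."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


print(is_synchronous([1, 2, 2, 5, 9]))  # ordered arrival
print(is_synchronous([1, 5, 2, 9, 3]))  # element with time stamp 2 arrives after 5
```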

Page 7:

Why Asynchronous Data Streams?

Network delay & multi-path routing: a synchronous stream that crosses the network arrives as an asynchronous stream.

Merge w/o control: two synchronous streams merged without control yield an asynchronous stream.

Page 8:

Processing Requirements:

• One pass processing
• Small workspace: poly-logarithmic in the size of data
• Fast processing time per element
• Approximate answers are ok

Page 9:

Our results:

A deterministic data aggregation algorithm

Time: $O(\log B \cdot \log W)$

Space: $O\!\left(\frac{\log B \cdot \log W \cdot (\log W + \log B)}{\epsilon}\right)$

Relative Error: $\frac{|X - S|}{S} \le \epsilon$

Page 10:

Previous Work:

[Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002]: Deterministic, Synchronous. Technique: merging buckets.

[Tirthapura, Xu, Busch. PODC, 2006]: Randomized, Asynchronous. Technique: random sampling.

Page 11:

Outline of Talk

Introduction

Algorithm

Analysis

Page 12:

Data stream: $t_1, t_2, t_3, t_4, t_5, t_6$

[Figure: time axis running from 1 to the current time $C$.]

For simplicity, assume unit-valued elements.

Page 13:

Data stream: $t_1, t_2, t_3, t_4, t_5, t_6$

[Figure: time axis from 1 to the current time $C$; the most recent time window of duration $W$ is highlighted.]

Goal: Compute the sum of the elements with time stamps in the time window $[C - W, C]$.

Page 14:

Divide time into periods of duration $W$.

[Figure: time axis marked at $1, W, 2W, 3W, 4W$, divided into consecutive periods of duration $W$.]

Page 15:

The sliding window may span at most two time periods.

[Figure: time axis marked at $1, W, 2W, 3W, 4W$; the sliding window of duration $W$ starts at $T$ and ends at the current time $C$.]

Page 16:

The sum can be written as two sub-sums in two time periods:

$$S = S_1 + S_2$$

[Figure: the sliding window of duration $W$, from $T$ to $C$, crosses a period boundary; the part in the left period contributes $S_{\text{left}}$ and the part in the right period contributes $S_{\text{right}}$.]
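
The decomposition into two sub-sums can be checked exactly on a stored stream. A minimal sketch, assuming integer time stamps starting at 1 and periods $[1, W], [W+1, 2W], \ldots$ as drawn on the time axis; the helper names `period_index` and `split_window_sum` are hypothetical.

```python
def period_index(t, W):
    """0-based index of the period [k*W + 1, (k+1)*W] that contains time stamp t >= 1."""
    return (t - 1) // W


def split_window_sum(stream, C, W):
    """Return (S_left, S_right) for the window [C - W, C].

    `stream` holds (value, timestamp) pairs. The window touches the period
    that contains C and at most one period before it.
    """
    T = C - W
    right_period = period_index(C, W)
    s_left = s_right = 0
    for v, t in stream:
        if T <= t <= C:
            if period_index(t, W) == right_period:
                s_right += v
            else:
                s_left += v
    return s_left, s_right


stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 8)]  # hypothetical data
print(split_window_sum(stream, C=12, W=4))  # window [8, 12]; periods [5, 8] and [9, 12]
```

Because $T = C - W$ always falls in the period immediately before the one containing $C$ (or in that same period boundary case), two per-period summaries suffice for any query.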

Page 17:

A data structure $D_{\text{left}}$ maintains an estimate of $S_{\text{left}}$ in the left time period.

[Figure: time axis marked at $1, W, 2W, 3W, 4W$; $D_{\text{left}}$ covers the left period and $D_{\text{right}}$ the right period; the sliding window from $T$ to $C$ splits into $S_{\text{left}}$ and $S_{\text{right}}$.]

Page 18:

Without loss of generality, consider the data structure $D_{\text{left}}$ in the time period $[1, W]$.

[Figure: the period $[1, W]$ with $T$ marked; $S_{\text{left}}$ is the sum over $[T, W]$.]

Page 19:

The data structure $D_{\text{left}}$ consists of various levels:

$D_1, D_2, \ldots, D_L$

$2^L$ is an upper bound of the sum in a period.

Page 20:

Consider level $D_i$.

[Figure: a single bucket of this level spanning the time period $[1, W]$, with its counter initially 0.]

The bucket counts up to $2^{i+1}$ elements.

Page 21:

Stream: $t_1$, with $1 \le t_1 \le W$

Increase counter value: the bucket $[1, W]$ now holds 1.

Page 22:

Stream: $t_1, t_2$, with $1 \le t_2 \le W$

Increase counter value: the bucket $[1, W]$ now holds 2.

Page 23:

Stream: $t_1, t_2, t_3$, with $1 \le t_3 \le W$

Increase counter value: the bucket $[1, W]$ now holds 3.

Page 24:

Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}$, with $1 \le t_{2^{i+1}-1} \le W$

Increase counter value: the bucket $[1, W]$ now holds $2^{i+1} - 1$.

Page 25:

Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}, t_{2^{i+1}}$, all with time stamps in $[1, W]$

Counter threshold of $2^{i+1}$ reached.

Split bucket: $[1, W]$ with counter $2^{i+1}$ becomes $[1, \frac{W}{2}]$ and $[\frac{W}{2}+1, W]$, each with counter $2^i$.

Page 26:

Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}, t_{2^{i+1}}$

The new buckets $[1, \frac{W}{2}]$ and $[\frac{W}{2}+1, W]$, with counter $2^i$ each, also have threshold $2^{i+1}$.

Page 27:

Stream: $t_1, t_2, t_3, \ldots, t_{2^{i+1}-1}, t_{2^{i+1}}, t_{2^{i+1}+1}$, with $1 \le t_{2^{i+1}+1} \le \frac{W}{2}$

Increase appropriate bucket: $[1, \frac{W}{2}]$ holds $2^i + 1$; $[\frac{W}{2}+1, W]$ holds $2^i$.

Page 28:

Stream: $t_1, \ldots, t_{2^{i+1}}, t_{2^{i+1}+1}, t_{2^{i+1}+2}$, with $\frac{W}{2} < t_{2^{i+1}+2} \le W$

Increase appropriate bucket: $[1, \frac{W}{2}]$ holds $2^i + 1$; $[\frac{W}{2}+1, W]$ holds $2^i + 1$.

Page 29:

Stream: $t_1, \ldots, t_{2^{i+1}}, t_{2^{i+1}+1}, t_{2^{i+1}+2}, t_{2^{i+1}+3}$, with $1 \le t_{2^{i+1}+3} \le \frac{W}{2}$

Increase appropriate bucket: $[1, \frac{W}{2}]$ holds $2^i + 2$; $[\frac{W}{2}+1, W]$ holds $2^i + 1$.

Page 30:

Stream: $t_1, \ldots, t_m$, with $\frac{W}{2}+1 \le t_m \le W$

Buckets: $[1, \frac{W}{2}]$ holds $x_1$; $[\frac{W}{2}+1, W]$ reaches the threshold $2^{i+1}$.

Split bucket: $[\frac{W}{2}+1, W]$ becomes $[\frac{W}{2}+1, \frac{3W}{4}]$ and $[\frac{3W}{4}+1, W]$, each with counter $2^i$.

Page 31:

Stream: $t_1, \ldots, t_m$

Buckets after the split: $[1, \frac{W}{2}]$ holds $x_1$; $[\frac{W}{2}+1, \frac{3W}{4}]$ holds $2^i$; $[\frac{3W}{4}+1, W]$ holds $2^i$.

Page 32:

Stream: $t_1, \ldots, t_m, t_{m+1}$, with $\frac{W}{2}+1 \le t_{m+1} \le \frac{3W}{4}$

Increase appropriate bucket: $[1, \frac{W}{2}]$ holds $x_1$; $[\frac{W}{2}+1, \frac{3W}{4}]$ holds $2^i + 1$; $[\frac{3W}{4}+1, W]$ holds $2^i$.

Page 33:

Stream: $t_1, \ldots, t_m, t_{m+1}, \ldots$

Buckets: $[1, \frac{W}{2}]$ holds $x_1$; $[\frac{W}{2}+1, \frac{3W}{4}]$ reaches the threshold $2^{i+1}$; $[\frac{3W}{4}+1, W]$ holds $x_4$.

Split bucket: $[\frac{W}{2}+1, \frac{3W}{4}]$ becomes $[\frac{W}{2}+1, \frac{5W}{8}]$ and $[\frac{5W}{8}+1, \frac{3W}{4}]$, each with counter $2^i$.

Page 34:

Stream: $t_1, \ldots, t_m, t_{m+1}, \ldots$

Buckets after the split: $[1, \frac{W}{2}]$ holds $x_1$; $[\frac{W}{2}+1, \frac{5W}{8}]$ holds $2^i$; $[\frac{5W}{8}+1, \frac{3W}{4}]$ holds $2^i$; $[\frac{3W}{4}+1, W]$ holds $x_4$.

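Page 35:

Splitting Tree

• Root: $[1, W]$, split at counter $2^{i+1}$
• Children of the root: $[1, \frac{W}{2}]$ (leaf, counter $x_1$) and $[\frac{W}{2}+1, W]$, split at counter $2^{i+1}$
• Children of $[\frac{W}{2}+1, W]$: $[\frac{W}{2}+1, \frac{3W}{4}]$, split at counter $2^{i+1}$, and $[\frac{3W}{4}+1, W]$ (leaf, counter $x_4$)
• Children of $[\frac{W}{2}+1, \frac{3W}{4}]$: $[\frac{W}{2}+1, \frac{5W}{8}]$ (leaf, counter $x_2$) and $[\frac{5W}{8}+1, \frac{3W}{4}]$ (leaf, counter $x_3$)

Leaf counters: $2^i \le x_k \le 2^{i+1}$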

Page 36:

Leaf buckets of duration 1 are not split any further.

[Figure: splitting tree rooted at the bucket $[1, W]$ with threshold $2^{i+1}$; its deepest leaves are unit-duration buckets $[t_1, t_1 + 1]$ and $[t_2, t_2 + 1]$.]

Max depth = $\log W$
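
The per-level behavior walked through on the preceding slides (count, split when the counter reaches $2^{i+1}$, stop splitting at unit buckets) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the class name `Level`, the list-of-lists layout, and the linear scan are assumptions; it further assumes integer time stamps, $W$ a power of two, and that a unit bucket covers a single time stamp.

```python
class Level:
    """One level for the period [1, W], with splitting threshold 2**(i + 1)."""

    def __init__(self, W, i):
        self.threshold = 2 ** (i + 1)
        # Buckets are [lo, hi, count], kept sorted by lo; start with one bucket.
        self.buckets = [[1, W, 0]]

    def insert(self, t):
        """Count one unit-valued element with time stamp t; return False if t is not covered."""
        for idx, bucket in enumerate(self.buckets):
            lo, hi, _ = bucket
            if lo <= t <= hi:
                bucket[2] += 1
                # Split only if the threshold is reached and the bucket is not a unit bucket.
                if bucket[2] >= self.threshold and lo < hi:
                    mid = (lo + hi) // 2
                    half = bucket[2] // 2
                    # Each child receives half of the counter, regardless of
                    # where the counted time stamps actually fall.
                    self.buckets[idx:idx + 1] = [[lo, mid, half],
                                                 [mid + 1, hi, bucket[2] - half]]
                return True
        return False


level = Level(W=8, i=1)          # threshold 4
for t in [1, 2, 3, 6, 7, 8]:     # hypothetical arrivals
    level.insert(t)
print(level.buckets)
```

With `W=8` the first split produces `[1, 4]` and `[5, 8]`, matching $[1, \frac{W}{2}]$ and $[\frac{W}{2}+1, W]$; a second split of the right bucket produces `[5, 6]` and `[7, 8]`.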

Page 37:

The initial bucket $[1, W]$, with threshold $2^{i+1}$, may be split into many buckets.

[Figure: splitting tree whose leaves are the leaf buckets.]

Page 38:

Due to space limitations we only keep the last $a$ leaf buckets, where

$$a = \frac{2 \log W}{\epsilon}$$
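
Keeping only the last buckets of a level amounts to truncating its sorted bucket list from the left. A minimal sketch, assuming buckets are `(lo, hi, count)` tuples sorted by `lo`; the function name `keep_last` is hypothetical, and the cap `a` is simply passed in rather than derived.

```python
def keep_last(buckets, a):
    """Return the a rightmost (most recent) buckets of a level."""
    if a < 1:
        raise ValueError("a must be at least 1")
    return buckets[-a:]


buckets = [(1, 4, 2), (5, 6, 2), (7, 8, 3)]   # hypothetical level contents
print(keep_last(buckets, 2))
```

After truncation, the level no longer covers time stamps earlier than the `lo` of its first remaining bucket, so a query whose left boundary falls in the discarded range has to be answered from a level with a larger threshold.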

Page 39:

Suppose we want to find the sum $S$ of the elements in the time period $[T, W]$.

[Figure: the period $[1, W]$ with $T$ marked; $S$ covers the part from $T$ to $W$.]

Page 40:

Consider the various levels of splitting threshold $2^1, 2^2, \ldots, 2^k, 2^{k+1}$, each keeping its last $a$ buckets.

[Figure: the levels stacked over the period $[1, W]$, with a vertical timeline at $T$; $S$ is the sum to the right of $T$.]

Page 41:

First level with a leaf bucket that intersects the timeline: the level with threshold $2^k$.

[Figure: the levels $2^1, 2^2, \ldots, 2^{k+1}$ over $[1, W]$, each with $a$ buckets; the level $2^k$ is the first whose leaf buckets reach the timeline at $T$.]

Page 42:

Consider the buckets of level $2^k$ on the right of the timeline, with counters $x_1, x_2, \ldots, x_z$, where $z \le a$.

Estimate of $S$:

$$X = x_1 + x_2 + \cdots + x_z$$
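
The level selection and the estimate can be sketched as a short function. Several details here are assumptions rather than statements from the talk: levels are given from the smallest threshold upward, each as a list of retained `(lo, hi, count)` buckets sorted by `lo`; a level is usable when its first retained bucket starts at or before $T$; and a bucket that straddles $T$ is counted in full. The name `estimate` is hypothetical.

```python
def estimate(levels, T):
    """Estimate the number of elements with time stamp >= T.

    Uses the first level (smallest threshold) whose retained buckets still
    reach back to T, and sums the counters of its buckets that end at or
    after T. Returns None if no level reaches back to T.
    """
    for buckets in levels:
        if buckets and buckets[0][0] <= T:
            # Assumption: a bucket straddling T contributes its whole counter.
            return sum(count for lo, hi, count in buckets if hi >= T)
    return None


levels = [
    [(5, 6, 3), (7, 8, 2)],   # small threshold: older buckets already discarded
    [(1, 4, 4), (5, 8, 5)],   # larger threshold: still covers the whole period
]
print(estimate(levels, 6))   # answered by the first level
print(estimate(levels, 4))   # first level starts at 5, so the second level is used
```

Lower levels have finer buckets and therefore smaller error, which is why the scan starts from the smallest threshold.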

Page 43:

OR

First level with a leaf bucket on the right of the timeline.

[Figure: the levels $2^1, 2^2, \ldots, 2^k, 2^{k+1}$ over $[1, W]$, each with $a$ buckets, with the timeline at $T$ and the sum $S$ to its right.]

Page 44:

Outline of Talk

Introduction

Algorithm

Analysis

Page 45:

Suppose that we use the level with threshold $2^{i+1}$ in order to compute the estimate.

[Figure: the period $[1, W]$ with $T$ marked and the sum $S$ to its right.]

Page 46:

Consider the level with splitting threshold $2^{i+1}$.

Stream: $t_k$

A data element is counted in the appropriate bucket: for the bucket $[t_l, t_r]$ with counter $x_b$,

$$x_b \leftarrow x_b + 1$$

Page 47:

Stream: $t_k$

We can assume that the element is placed in the respective bucket $[t_l, t_r]$:

$$t_l \le t_k \le t_r$$

Page 48:

Stream: $t_k$

We can assume that when a bucket splits, the element is placed in an arbitrary child bucket.

[Figure: the bucket $[t_l, t_r]$ with counter $2^{i+1}$ splits into $[t_l, \frac{t_l + t_r}{2}]$ and $[\frac{t_l + t_r}{2} + 1, t_r]$, each with counter $2^i$.]

Page 49:

The element $t_k$ is placed in the left child $[t_l, \frac{t_l + t_r}{2}]$.

If $t_l \le t_k \le \frac{t_l + t_r}{2}$: GOOD!

Element counted in correct bucket.

Page 50:

The element $t_k$ is placed in the left child $[t_l, \frac{t_l + t_r}{2}]$.

If $\frac{t_l + t_r}{2} + 1 \le t_k \le t_r$: BAD!

Element counted in wrong bucket.
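
A small numeric case shows how a split can leave elements counted in the wrong bucket. The sketch below assumes that a split hands each child half of the counter, as in the splitting slides; the helper name `split` and the data are hypothetical.

```python
def split(lo, hi, count):
    """Split bucket [lo, hi] into two halves, giving each child half of the counter."""
    mid = (lo + hi) // 2
    half = count // 2
    return (lo, mid, half), (mid + 1, hi, count - half)


# Four elements, all with time stamps in the left half of [1, 8].
timestamps = [1, 2, 3, 4]
left, right = split(1, 8, len(timestamps))

# How many elements truly belong to the right child, versus what it reports.
true_right = sum(1 for t in timestamps if right[0] <= t <= right[1])
miscounted = right[2] - true_right
print(left, right, miscounted)
```

Here the right child reports two elements although none of the stored time stamps fall in `[5, 8]`, so a single split can misplace up to half of the threshold.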

Page 51:

Consider leaf buckets, with an element $t_k$ counted in a leaf bucket on the right of $T$.

If $T \le t_k \le W$: GOOD!

[Figure: the period $[1, W]$ with $T$ marked and the sum $S$ to its right.]

Page 52:

Consider leaf buckets, with an element $t_k$ counted in a leaf bucket on the right of $T$.

If $t_k < T$: BAD!

Element counted in wrong bucket.

Page 53:

Consider leaf buckets.

$$X = S + |Z_1| - |Z_2|$$

$Z_1$: elements of the left part counted on the right

$Z_2$: elements of the right part counted on the left

Page 54:

$Z_1$: elements of the left part counted on the right

Each such element $t_k$ must have been initially inserted in one of these buckets.

[Figure: the buckets of the splitting tree over $[1, W]$ that span the timeline at $T$.]

Page 55:

Since the tree depth is at most $\log W$:

$$|Z_1| = O(2^i \log W)$$

Page 56:

Similarly, we can prove

$$|Z_2| = O(2^i \log W)$$

Therefore:

$$|X - S| = \big|\,|Z_1| - |Z_2|\,\big| = O(2^i \log W)$$

Page 57:

Since $a = \frac{2 \log W}{\epsilon}$, it can be proven that

$$S = \Omega\!\left(\frac{2^i \log W}{\epsilon}\right)$$

Page 58:

Combined with $|X - S| = O(2^i \log W)$, we obtain the relative error:

$$\frac{|X - S|}{S} \le \epsilon$$
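
The final step is a division of the two bounds. The following worked form is a sketch under stated assumptions: $\epsilon$ denotes the target relative error, the number of retained buckets $a$ is taken to scale as $\log W / \epsilon$, and $c_1, c_2$ are unspecified positive constants hidden by the asymptotic notation (they are not given in the talk).

```latex
\[
|X - S| \le c_1 \, 2^i \log W ,
\qquad
S \ge c_2 \, \frac{2^i \log W}{\epsilon}
\]
\[
\Longrightarrow \quad
\frac{|X - S|}{S}
\;\le\;
\frac{c_1 \, 2^i \log W}{c_2 \, 2^i \log W / \epsilon}
\;=\;
\frac{c_1}{c_2} \, \epsilon .
\]
```

The factor $2^i \log W$ cancels, so the relative error does not depend on which level answered the query; choosing the constant in $a$ so that $c_1 \le c_2$ would give a bound of exactly $\epsilon$.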