Evaluating Window Joins over Punctuated Streams

Many slides taken from talk by Luping Ding and Elke A. Rundensteiner, CIKM04

Database Systems Research Group, Worcester Polytechnic Institute

Jan 01, 2016


scott-holcomb

Transcript
Page 1: Evaluating Window Joins over Punctuated Streams

23/4/19 CIKM'04 1

Evaluating Window Joins over Punctuated Streams

Many slides taken from talk by Luping Ding and Elke A. Rundensteiner, CIKM04

Database Systems Research Group, Worcester Polytechnic Institute

Page 2: Evaluating Window Joins over Punctuated Streams

Stream Data Processing

[Diagram: Register Continuous Queries; Streaming Data → Stream Query Engine → Streaming Result]

• Network Usage Analysis
• Online Transaction Management
• Sensor Network Monitoring
• Online Auction

Page 3: Evaluating Window Joins over Punctuated Streams

New Challenges in Stream Context

Potentially infinite data streams vs. stateful operators, e.g., join, distinct, …

Problem: potentially unbounded state

Reason: no hint on which data is no longer useful

Page 4: Evaluating Window Joins over Punctuated Streams

Example: Symmetric Hash Join [WA93]

Memory overflow resolution: state relocation

Example: XJoin [UF00], Hash-Merge Join [MLA04]

Problems:
• Join state still grows with no bound
• Delivery of some join results may be highly deferred

[Diagram: each tuple from stream A or B is inserted into its own state (SA or SB) and probes the opposite state; labels: Memory, Memory Overflow]
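The insert/probe pattern of a symmetric hash join can be sketched as follows. This is a minimal illustration, not the implementation from [WA93]: the `(side, key, payload)` event encoding and the function name are assumptions made for the example. Nothing is ever removed from `state`, which is exactly the unbounded-growth problem listed above.

```python
from collections import defaultdict

def symmetric_hash_join(events):
    """Join two interleaved streams on a key, emitting results as tuples arrive.

    `events` yields (side, key, payload) with side "A" or "B"
    (a hypothetical encoding chosen for this sketch).
    """
    state = {"A": defaultdict(list), "B": defaultdict(list)}  # SA and SB
    for side, key, payload in events:
        other = "B" if side == "A" else "A"
        # Probe: match the new tuple against what the other side has stored.
        for match in state[other].get(key, ()):
            yield (key, payload, match) if side == "A" else (key, match, payload)
        # Insert: remember the tuple for future arrivals on the other side.
        state[side][key].append(payload)

results = list(symmetric_hash_join([
    ("A", 1, "a1"), ("B", 1, "b1"), ("B", 2, "b2"), ("A", 1, "a2"),
]))
```

Each result is `(key, A payload, B payload)`, regardless of which side arrived last.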

Page 5: Evaluating Window Joins over Punctuated Streams

Avoiding Unbounded State

Solution: exploit constraints to detect no-longer-useful data

• Sliding window [MWA+03]: identify a bounded set of input data based on time
• K-constraint [BW04]: models clustered or ordered data arrival pattern
• Punctuation [TMS+03]: dynamically announce termination of certain values

Page 6: Evaluating Window Joins over Punctuated Streams

Sliding Window [KNV03]

[Figure: Stream A and Stream B on a shared timeline, with sliding windows Wa and Wb]

Page 7: Evaluating Window Joins over Punctuated Streams

Punctuation

Meta-knowledge embedded inside data streams

An ordered set of patterns corresponding to attributes of tuples

Patterns: wildcard (*), constant (9), list ({1,2,3}), range ([1, 20]), empty ()

Semantics: tuples after a punctuation p will NOT match p

Bid stream example (the punctuation "180 * * *" says that no more tuples will contain Item_id 180):

180 Marlie 820.00 Nov-13-03 11:02:00
182 Ultrasale 1000.00 Nov-13-03 11:05:00
180 Jocelyn 850.00 Nov-13-03 11:14:00
180 * * *
181 pcfan 50.00 Nov-13-03 11:36:00
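The five pattern kinds can be checked against a tuple with a few lines of code. The sketch below is only one possible encoding, and all of it is assumed for illustration: `"*"` stands for the wildcard, `None` for the empty pattern, a Python set for a list pattern, and a 2-tuple `(low, high)` for an inclusive range.

```python
WILDCARD = "*"

def pattern_matches(pattern, value):
    """Return True if one attribute value satisfies one punctuation pattern."""
    if pattern is None:                        # empty pattern: matches nothing
        return False
    if isinstance(pattern, (set, frozenset)):  # list pattern, e.g. {1, 2, 3}
        return value in pattern
    if isinstance(pattern, tuple):             # inclusive range, e.g. (1, 20)
        low, high = pattern
        return low <= value <= high
    if pattern == WILDCARD:                    # wildcard: any value
        return True
    return pattern == value                    # constant

def matches(tup, punctuation):
    """A tuple matches a punctuation if every attribute matches its pattern."""
    return len(tup) == len(punctuation) and all(
        pattern_matches(p, v) for p, v in zip(punctuation, tup)
    )

punct = (180, "*", "*", "*")
late_bid = (180, "Jocelyn", 850.00, "Nov-13-03 11:14:00")
other_bid = (181, "pcfan", 50.00, "Nov-13-03 11:36:00")
```

Here `matches(late_bid, punct)` is true and `matches(other_bid, punct)` is false, so after the punctuation only tuples like `other_bid` may still arrive.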

Page 8: Evaluating Window Joins over Punctuated Streams

Punctuation-Aware Join [DMR+04]

[Figure: join on item_id over Stream A and Stream B, with join states SA and SB. The punctuation (175, *) announces that no more tuples will have A = 175. Tuples shown with A = 175: (175, 80.00), (175, 100.00), (175, 20.00).]

Page 9: Evaluating Window Joins over Punctuated Streams

Features of Punctuation

Purge rule. For any tuple ta from stream A, if there exists a punctuation Pb that has already been received from stream B such that match(ta, Pb), ta will not join with any future arriving tuples from stream B. ta doesn’t need to be maintained in the A state after being processed.

Propagation rule. The join operator can also propagate punctuations to the output stream in order to help downstream operators.
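The purge rule amounts to filtering one stream's state by the values the opposite stream has punctuated. A minimal sketch, assuming punctuations on the join attribute only and a state held as a list of `(join_value, payload)` pairs (both are simplifications for illustration; the sample values are made up):

```python
def purge_state(state, punctuated_values):
    """Split a join state into (kept, purged) under the purge rule.

    state: list of (join_value, payload) tuples stored for one stream.
    punctuated_values: join values already punctuated by the opposite stream.
    """
    kept, purged = [], []
    for tup in state:
        # A stored tuple whose value was punctuated can never join again.
        (purged if tup[0] in punctuated_values else kept).append(tup)
    return kept, purged

# Illustrative state; the opposite stream has punctuated join value 175.
a_state = [(175, 80.00), (181, 50.00), (175, 100.00), (180, 135.00)]
kept, purged = purge_state(a_state, punctuated_values={175})
```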

Page 10: Evaluating Window Joins over Punctuated Streams

Based on punctuation semantics, we derive the following theorem as the foundation of our punctuation propagation algorithm.

Theorem 3.1. Let pa and pb be punctuations retrieved from streams A and B at times TSa and TSb respectively, specifying the same punctuated value val of join attribute att. Then no output tuples with val being the value of attribute att will be generated after time max(TSa, TSb).
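Theorem 3.1 suggests a simple bookkeeping scheme: remember when each side punctuated a value, and once both have, the later of the two timestamps is the earliest safe propagation time. The class below is a hypothetical sketch of that bookkeeping, not the authors' propagation algorithm.

```python
class PropagationTracker:
    """Tracks per-value punctuation times on both inputs of a join."""

    def __init__(self):
        self.seen = {}  # join value -> {"A": ts, "B": ts}

    def on_punctuation(self, side, value, ts):
        """Record a punctuation from `side` ("A" or "B").

        Returns max(TSa, TSb) once both sides have punctuated `value`,
        otherwise None.
        """
        entry = self.seen.setdefault(value, {})
        entry.setdefault(side, ts)  # keep the first punctuation time per side
        if "A" in entry and "B" in entry:
            return max(entry["A"], entry["B"])
        return None

tracker = PropagationTracker()
first = tracker.on_punctuation("A", 180, ts=5)   # only A has punctuated 180
second = tracker.on_punctuation("B", 180, ts=9)  # both have, so propagate at 9
```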

Page 11: Evaluating Window Joins over Punctuated Streams

Sliding Window Join

Suppose Ta and Tb are time windows for streams A and B respectively. We define the invalidation rule for the join state based on the sliding window:

Let tuple ta be the latest tuple, with timestamp TSa, from stream A that has been processed. A tuple in the B state with timestamp TSb such that TSb + Tb < TSa is called a time-expired tuple and can be invalidated. The same invalidation rule applies to tuples in the A state.
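Because tuples arrive in timestamp order, the invalidation rule only ever needs to look at the oldest stored tuples. A minimal sketch, assuming the B state is a `deque` of `(timestamp, payload)` pairs kept in arrival order (the layout and names are illustrative):

```python
from collections import deque

def invalidate(b_state, ts_a, window_b):
    """Remove time-expired tuples from the head of the B state.

    A stored tuple with timestamp ts_b is expired when ts_b + window_b < ts_a.
    Returns the list of removed tuples.
    """
    expired = []
    while b_state and b_state[0][0] + window_b < ts_a:
        expired.append(b_state.popleft())
    return expired

b_state = deque([(1, "b1"), (4, "b2"), (5, "b3"), (8, "b4")])
removed = invalidate(b_state, ts_a=10, window_b=5)
```

The tuple with timestamp 5 survives because the comparison is strict: 5 + 5 is not less than 10.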

Page 12: Evaluating Window Joins over Punctuated Streams

Basic Window Join

[Figure: Stream A and Stream B on a shared timeline with windows Ta and Tb; marked points TSa, TSb, TSa-Tb and TSb-Ta]

Page 13: Evaluating Window Joins over Punctuated Streams

Optimization Opportunities

Maintain smaller state than either pure window join or pure punctuation-exploiting join
• Bid tuples that have been joined don’t need to be maintained in state (Punctuation)

Drop tuples without affecting precision of result
• Bid tuples out of 24-hour window of corresponding Auction tuple don’t need to be processed
• Aggregate result for some Auction tuples can be produced in less than 24 hours

Page 14: Evaluating Window Joins over Punctuated Streams

Features of PWJoin algorithm

Punctuation-exploiting Window Join is composed of three operations:

• Probing state to find matching tuples for producing join results.
• Purging no-longer-joining tuples by punctuations.
• Invalidating expired tuples by windows.

Among these operations.

Page 15: Evaluating Window Joins over Punctuated Streams

Window and Punctuation Occur Simultaneously

SELECT A.item_id, Count(*)
FROM Auction [Range 24 Hours] A, Bid B
WHERE A.item_id = B.item_id
GROUP BY A.item_id

[Query plan: Auction Stream and Bid Stream → Join on item_id → Out1 (item_id) → Group-by item_id (count(*)) → Out2 (item_id, count)]

Annotations in the plan:
• Contains punctuations on item_id
• Applies a 24-hour window on Auction stream

Page 16: Evaluating Window Joins over Punctuated Streams

PWJoin Basics and Issue

Issue: how to design PWJoin state to facilitate all search-based operations?
• Invalidate conducts time-based search
• Probe and Purge need value-based search

Receive a new tuple ta from stream A → Invalidate tuples from B state → Probe B state → Insert ta into A state

Receive a new punctuation pa from stream A → Purge tuples from B state → Insert pa into A state

Page 17: Evaluating Window Joins over Punctuated Streams

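PWJoin State with Two-dimensional Index

[Figure: PWJoin state. The I-Node Index (hash table) holds entries (Key, Head, Tail, PunctFlag), e.g. key 8 with PunctFlag "none" and key 10 with PunctFlag "punctuated". The Time List chains T-Nodes between WindowBegin and WindowEnd; each T-Node holds a tuple plus NextTimeListTNode and NextValueListTNode pointers. The Punctuation Time List holds entries (Punctuation, Timestamp): (p1, T1), (p2, T2), …]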
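The point of the two-dimensional index is that every stored tuple is reachable both in arrival order (for time-based invalidation) and by join value (for probing and purging). The sketch below imitates that with insertion-ordered Python dicts instead of hand-linked T-Nodes. The class and method names are hypothetical; the slides describe Head/Tail pointers and linked lists, which this stand-in does not reproduce.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class INode:
    punct_flag: str = "none"                        # "none" or "punctuated"
    value_list: dict = field(default_factory=dict)  # seq -> (ts, tuple)

class JoinState:
    def __init__(self):
        self.time_list = {}   # seq -> (ts, key, tuple); dicts keep arrival order
        self.index = {}       # join value -> INode (stands in for the hash table)
        self._seq = count()

    def insert(self, ts, key, tup):
        seq = next(self._seq)
        self.time_list[seq] = (ts, key, tup)
        self.index.setdefault(key, INode()).value_list[seq] = (ts, tup)

    def expire_before(self, cutoff):
        """Time-based search: drop tuples with ts < cutoff from the head."""
        dropped = []
        while self.time_list:
            seq = next(iter(self.time_list))        # oldest entry
            ts, key, tup = self.time_list[seq]
            if ts >= cutoff:
                break
            del self.time_list[seq]
            del self.index[key].value_list[seq]
            dropped.append(tup)
        return dropped

    def lookup(self, key):
        """Value-based search: tuples currently stored under one join value."""
        node = self.index.get(key)
        return [tup for _, tup in node.value_list.values()] if node else []

    def purge(self, key):
        """Value-based delete: remove every tuple stored under one join value."""
        node = self.index.pop(key, None)
        if node is None:
            return []
        for seq in node.value_list:
            del self.time_list[seq]
        return [tup for _, tup in node.value_list.values()]
```

Sharing one sequence number between both dicts plays the role of the T-Node that sits on two lists at once: deleting by time or by value removes the entry from both views without a scan.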

Page 18: Evaluating Window Joins over Punctuated Streams

PWJoin Algorithm

Invalidate: Once a new tuple t is retrieved from stream A, its timestamp is used to invalidate expired tuples from the head of the time list of stream B.

Probe: probe I-Node index and join with tuples in value list of matching I-Node. After invalidation is done, the join value of t is used to probe the I-Node index of the B state. If the matching I-Node iNode is found, the corresponding value list is located by following the Head pointer of iNode. Tuple t then joins with all tuples in this value list by following the NextValueListTNode pointer of each T-Node. Finally, the PunctFlag of iNode is checked. If it is “punctuated”, t is discarded. If it is “none”, t is inserted into the A state.

Page 19: Evaluating Window Joins over Punctuated Streams

PWJoin Algorithm

Purge: probe I-Node index and delete tuples in value list of matching I-Node. When a new punctuation p is retrieved from stream A, p is used to probe the I-Node index of the B state. If the matching I-Node iNode is found, all tuples in the corresponding value list are deleted. iNode is removed from the I-Node index as well. If the PunctFlag of iNode is “punctuated”, p is discarded. If iNode is not found or iNode’s PunctFlag is “none”, p is used to probe the I-Node index of the A state and set the PunctFlag of the matching I-Node iNodea to “punctuated”. If iNodea does not exist, a new I-Node is created with its PunctFlag marked as “punctuated” and inserted into the I-Node index of the A state.
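Putting invalidate, probe and purge together, the control flow for tuples and punctuations can be sketched as below. This is a deliberately simplified stand-in, not the PWJoin implementation: the hash index is replaced by a linear scan, punctuations are assumed to name a single join value, a punctuation is always recorded (the "p is discarded" case is omitted), and all class and method names are invented for the example.

```python
from collections import deque

class Side:
    def __init__(self, window):
        self.window = window      # time window for this stream
        self.time_list = deque()  # (ts, key, payload), oldest first
        self.punctuated = set()   # join values this stream has punctuated

class PWJoinSketch:
    def __init__(self, window_a, window_b):
        self.sides = {"A": Side(window_a), "B": Side(window_b)}

    def _other(self, side):
        return self.sides["B" if side == "A" else "A"]

    def on_tuple(self, side, ts, key, payload):
        own, other = self.sides[side], self._other(side)
        # Invalidate: expire opposite tuples with ts_other + window_other < ts.
        while other.time_list and other.time_list[0][0] + other.window < ts:
            other.time_list.popleft()
        # Probe: join with surviving opposite tuples sharing the join value.
        results = [(key, payload, p) if side == "A" else (key, p, payload)
                   for _, k, p in other.time_list if k == key]
        # Insert, unless the opposite stream already punctuated this value.
        if key not in other.punctuated:
            own.time_list.append((ts, key, payload))
        return results

    def on_punctuation(self, side, key):
        own, other = self.sides[side], self._other(side)
        # Purge: opposite tuples with this value can never join again.
        other.time_list = deque(t for t in other.time_list if t[1] != key)
        own.punctuated.add(key)

join = PWJoinSketch(window_a=10, window_b=10)
r1 = join.on_tuple("A", 1, 175, "a1")
r2 = join.on_tuple("B", 2, 175, "b1")
join.on_punctuation("B", 175)            # purges a1 from the A state
r3 = join.on_tuple("A", 3, 175, "a2")    # still joins b1, but is not stored
r4 = join.on_tuple("A", 20, 180, "a3")   # b1 has expired by now
```

Results are `(key, A payload, B payload)`. The state after the last call holds only `a3`: the punctuation removed `a1`, kept `a2` from being stored, and the window removed `b1`.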

Page 20: Evaluating Window Joins over Punctuated Streams

Punctuation Propagation [CIKM04]

An operator may propagate punctuations to benefit downstream operators

[Query plan: Auction Stream and Bid Stream → Join on item_id → Group-by item_id (count(*))]

The join can propagate punctuations on item_id, e.g. (Item_id, Bidder_id, Bid_price) = (180, *, *)

The group-by can be unblocked by punctuations propagated by the join operator

Page 21: Evaluating Window Joins over Punctuated Streams

Optimizations Enabled by Combined Constraints

Early Punctuation Propagation

[Figure: tuples of Stream S1 and Stream S2 with join values a1 … a10, an a3 label, and markers for propagation point 1 and propagation point 2]

Tuple Dropping

[Figure: the same tuples of Stream S1 and Stream S2, with an a3 label]

Page 22: Evaluating Window Joins over Punctuated Streams

Achieving Optimizations by Combined Constraints

Early propagation
• Invalidate punctuations in punctuation time list as invalidating tuples
• Expired punctuations can be propagated

Tuple dropping
• When early propagation happens, set PunctFlag of matching I-Node as “propagated”
• Drop new tuples that match an I-Node whose PunctFlag is “propagated”
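One way to read these two optimizations: a punctuation kept in the punctuation time list expires under the same window rule as tuples. Once it has expired, every tuple it covered has left the window as well, so the value can be propagated without waiting for the opposite stream, and later tuples carrying that value can be dropped. The sketch below follows that reading. The class, its single shared `propagated` set, and the single-value punctuations are assumptions for illustration.

```python
from collections import deque

class EarlyPropagator:
    def __init__(self, window):
        self.window = window
        self.punct_time_list = deque()  # (ts, value), oldest first
        self.propagated = set()         # values whose flag is "propagated"

    def on_punctuation(self, ts, value):
        self.punct_time_list.append((ts, value))

    def advance(self, now):
        """Expire punctuations like tuples; return the values to propagate."""
        out = []
        while (self.punct_time_list
               and self.punct_time_list[0][0] + self.window < now):
            _, value = self.punct_time_list.popleft()
            self.propagated.add(value)
            out.append(value)
        return out

    def should_drop(self, value):
        """Tuple dropping: a tuple on a propagated value yields no output."""
        return value in self.propagated

ep = EarlyPropagator(window=5)
ep.on_punctuation(ts=1, value="a3")
before = ep.advance(now=4)   # 1 + 5 is not < 4, so nothing expires yet
after = ep.advance(now=7)    # 1 + 5 < 7, so a3 is propagated
```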

Page 23: Evaluating Window Joins over Punctuated Streams

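Memory Cost Analysis

$$
|S_b|_T = |S_b|_T^{\mathrm{insert}} - |S_b|_T^{\mathrm{purge}} = |S_b|_T^{\mathrm{arrive}} - |S_b|_T^{\mathrm{purge}} = \lambda_b T_b - \lambda_b T_b \left( \frac{\lambda_{p_a} T}{NK_{b,T}} \right)
$$

First term ($\lambda_b T_b$): Window Join. Second term: Saving by Punctuation.

$\lambda_b$: tuple input rate of stream B

$\lambda_{p_a}$: punctuation input rate of stream A

$NK_{b,T}$: # of distinct join values occurred in stream B up to T’th time unit

$T_b$: time window on stream B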
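The estimate is the window-join state size minus the share removed by punctuations, which is easy to evaluate numerically. The function below is a small sketch of that arithmetic. The parameter names and sample numbers are made up for illustration, and it assumes the punctuated share (punctuations received so far divided by distinct join values) stays at or below 1.

```python
def expected_state_size(rate_b, window_b, punct_rate_a, t, distinct_values_b):
    """Estimated number of B tuples held in the join state at time t.

    rate_b: tuple input rate of stream B
    window_b: time window on stream B
    punct_rate_a: punctuation input rate of stream A
    distinct_values_b: distinct join values seen in stream B up to time t
    """
    window_join_size = rate_b * window_b                  # pure window join
    purged_fraction = punct_rate_a * t / distinct_values_b
    return window_join_size - window_join_size * purged_fraction

# 100 tuples/unit over a 10-unit window; by t = 50 half of the 100 distinct
# values have been punctuated, so half of the window state is saved.
size = expected_state_size(rate_b=100, window_b=10,
                           punct_rate_a=1, t=50, distinct_values_b=100)
```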

Page 24: Evaluating Window Joins over Punctuated Streams

PWJoin vs. WJoin: Memory and Tuple Output Rate

[Chart: # of Tuples in Join State vs. Sampling Step (per 2 seconds); series WJoin-1, PWJoin-1, WJoin-5, PWJoin-5, WJoin-15, PWJoin-15]

[Chart: # of Tuples Output vs. Sampling Step (per 2 seconds); series WJoin-5, PWJoin-5, WJoin-15, PWJoin-15]

Stream A, B: punct-asc-100-40

Page 25: Evaluating Window Joins over Punctuated Streams

PWJoin vs. PJoin: Punctuation Output Rate

[Chart: # of Punctuations Output vs. Sampling Step (per 1 second); series PJoin, PWJoin]

Stream A: punct-asc-100-40, Stream B: punct-random-30-40

Window: 1 second

Page 26: Evaluating Window Joins over Punctuated Streams

Conclusion

• PWJoin algorithm
• Designed storage structure for PWJoin state
• Memory cost analysis of PWJoin

Page 27: Evaluating Window Joins over Punctuated Streams


Thanks

WPI Database Research Group

many slides are from davis.wpi.edu/~dsrg/CAPE/slides

Page 28: Evaluating Window Joins over Punctuated Streams

References

[CIKM04] L. Ding and E. A. Rundensteiner. Evaluating Window Joins over Punctuated Streams. CIKM’04.
[KNV03] J. Kang, J. F. Naughton and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. ICDE’03.
[UF00] T. Urhan and M. Franklin. XJoin: A Reactively Scheduled Pipelined Join Operator. IEEE Data Engineering Bulletin, 23(2), 2000.
[HH99] P. Haas and J. Hellerstein. Ripple Joins for Online Aggregation. SIGMOD’99.
[GO03] L. Golab and M. T. Ozsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. VLDB’03.
[GGO04] L. Golab, S. Garg and M. T. Ozsu. On Indexing Sliding Windows over On-line Data Streams. EDBT’04.
[RDS+04] E. A. Rundensteiner, L. Ding, T. Sutherland, Y. Zhu, B. Pielech and N. Mehta. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB Demo, 2004.
[BW04] S. Babu and J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams.
[TMS+03] P. A. Tucker, D. Maier, T. Sheard and L. Fegaras. Exploiting Punctuation Semantics in Continuous Data Streams. TKDE, 15(3), 2003.
[DMR+04] L. Ding, N. Mehta, E. A. Rundensteiner and G. T. Heineman. Joining Punctuated Streams. EDBT’04.
[MWA+03] R. Motwani, J. Widom, A. Arasu et al. Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR’03.

Page 29: Evaluating Window Joins over Punctuated Streams


Thanks!