XJoin: Faster Query Results Over Slow And Bursty Networks
IEEE Data Engineering Bulletin, 2000, by T. Urhan and M. Franklin
Based on a talk prepared by Asima Silva & Leena Razzaq
Jan 07, 2016
1
2
Motivation
Data delivery issues:
– Unpredictable delays from some remote data sources
– Wide-area networks with possibly slow communication links, congestion, failures, and overload
Goal:
– Not just overall query processing time matters
– Also when the initial data is delivered
– Overall throughput and delivery rate throughout query processing
3
Overview
– Hash Join History
– 3 Classes of Delays
– Motivation of XJoin
– Challenges of Developing XJoin
– Three Stages of XJoin
– Handling Duplicates
– Experimental Results
4
Hash Join
Only one table is hashed.
[Diagram: R tuples are hashed into buckets by join key (1. BUILD); S tuples then probe the hash table one at a time (2. PROBE).]
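The build/probe flow above can be sketched in a few lines of Python (an illustrative sketch, not code from the paper):

```python
from collections import defaultdict

def hash_join(r, s, r_key, s_key):
    """Classic hash join: build a hash table on R, then probe it with S."""
    # 1. BUILD: hash every R tuple on its join key.
    table = defaultdict(list)
    for r_tuple in r:
        table[r_key(r_tuple)].append(r_tuple)
    # 2. PROBE: look up each S tuple's key and emit matching pairs.
    out = []
    for s_tuple in s:
        for r_tuple in table.get(s_key(s_tuple), []):
            out.append((r_tuple, s_tuple))
    return out
```

Note that S is only scanned, never hashed, but the whole build table for R must fit in memory, which motivates the hybrid variant.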
5
Hybrid Hash Join
One table is hashed both to disk and to memory (partitions). G. Graefe, "Query Evaluation Techniques for Large Databases", ACM Computing Surveys, 1993.
[Diagram: R tuples are hashed into buckets; some buckets are memory-resident while the rest are written to disk. S tuples probe the memory-resident buckets as they arrive.]
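A rough Python sketch of the hybrid idea, with plain lists standing in for disk files (all names and the partitioning policy here are illustrative assumptions):

```python
from collections import defaultdict

def hybrid_hash_join(r, s, key, n_partitions=4):
    """Hybrid hash join sketch: partition 0 of R stays in memory and is
    joined on the fly; the remaining partitions are 'spilled' (here: plain
    lists standing in for disk files) and joined afterwards."""
    part = lambda t: hash(key(t)) % n_partitions
    mem = defaultdict(list)                     # in-memory table, partition 0 of R
    r_spill = [[] for _ in range(n_partitions)]
    for t in r:
        p = part(t)
        if p == 0:
            mem[key(t)].append(t)
        else:
            r_spill[p].append(t)                # "write to disk"
    out, s_spill = [], [[] for _ in range(n_partitions)]
    for t in s:
        p = part(t)
        if p == 0:                              # probe memory immediately
            out.extend((rt, t) for rt in mem.get(key(t), []))
        else:
            s_spill[p].append(t)
    for p in range(1, n_partitions):            # join spilled partitions pairwise
        table = defaultdict(list)
        for rt in r_spill[p]:
            table[key(rt)].append(rt)
        for st in s_spill[p]:
            out.extend((rt, st) for rt in table.get(key(st), []))
    return out
```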
6
Symmetric Hash Join (Pipeline)
Both tables are hashed (both kept in main memory only). Z. Ives, A. Levy, "An Adaptive Query Execution", VLDB 99.
[Diagram: sources R and S each build their own in-memory hash table. Each arriving tuple is inserted (BUILD) into its own table and used to probe (PROBE) the other table; matches go directly to OUTPUT.]
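The insert-then-probe pipeline can be sketched as a Python generator (illustrative, with a merged arrival stream standing in for the two sources):

```python
from collections import defaultdict

def symmetric_hash_join(stream, key):
    """Symmetric (pipelined) hash join sketch. `stream` yields
    ('R', tuple) / ('S', tuple) events in arrival order; each tuple is
    inserted into its side's hash table, then probes the other side's
    table, so results are emitted as soon as both matching tuples exist."""
    tables = {'R': defaultdict(list), 'S': defaultdict(list)}
    for side, t in stream:
        other = 'S' if side == 'R' else 'R'
        tables[side][key(t)].append(t)              # BUILD into own table
        for m in tables[other].get(key(t), []):     # PROBE the other table
            yield (t, m) if side == 'R' else (m, t)
```

Because every tuple of both inputs stays in memory, the operator is fully pipelined but memory-hungry, which is exactly the problem raised next.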
7
Problem of SHJ:
Memory intensive:
– Won't work for large input streams
– Won't allow many joins to be processed in a pipeline (or even in parallel)
8
New Problems: Three Delays
– Initial Delay: the first tuple arrives from the remote source more slowly than usual (we still want an initial answer out quickly)
– Slow Delivery: data arrives at a constant but slower-than-expected rate (at the end, still good overall throughput behavior)
– Bursty Arrival: data arrives in a fluctuating manner (how to avoid sitting idle during periods of low input rates)
9
Question:
Why are delays undesirable?
– They prolong the time to first output
– They slow processing if we wait for all data to arrive before acting
– If data arrives too fast, we want to avoid losing any of it
– Sitting idle while no data is incoming wastes time
– Delays are unpredictable, so one single strategy won't work
10
Challenges for XJoin
– Manage the flow of tuples between memory and secondary storage (when and how to do it)
– Control background processing when inputs are delayed (reactive scheduling idea)
– Ensure the full answer is produced
– Ensure duplicate tuples are not produced
– Achieve both quick initial output and good overall throughput
11
Motivation of XJoin
Produces results incrementally as they become available:
– Tuples are returned as soon as they are produced
– Good for online processing
Allows progress when one or more sources experience delays:
– Background processing is performed on previously received tuples, so results are produced even when both inputs are stalled
12
Stages (in different threads)
– Stage 1: memory-to-memory (M:M)
– Stage 2: memory-to-disk (M:D)
– Stage 3: disk-to-disk (D:D)
13
XJoin
[Diagram: tuples from SOURCE-A and SOURCE-B are hashed (e.g., hash(Tuple A) = 1, hash(Tuple B) = n) into memory-resident partitions 1..n of their source; when memory fills, a partition is flushed to the corresponding disk-resident partition of that source.]
14
1st Stage of XJoin
Memory-to-Memory Join
Tuples are stored in partitions, each with:
– a memory-resident (m-r) portion
– a disk-resident (d-r) portion
Join processing continues as usual:
– If space permits, join memory to memory
– If memory is full, pick one partition as victim, flush it to disk, and append it to the end of its disk portion
The 1st stage runs as long as one of the inputs is producing tuples. If there is no new input, block stage 1 and start stage 2.
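A hedged sketch of this first-stage loop in Python; the class, its memory accounting, and the choice of the largest partition as flush victim are illustrative assumptions, not the paper's exact policy:

```python
from collections import defaultdict

class XJoinStage1:
    """Illustrative first-stage sketch: insert each arriving tuple into its
    source's memory-resident partition, probe the other source's partition,
    and flush a victim partition to 'disk' when the memory budget is hit."""

    def __init__(self, n_partitions=4, mem_budget=100):
        self.n = n_partitions
        self.budget = mem_budget
        self.mem = {'A': [defaultdict(list) for _ in range(n_partitions)],
                    'B': [defaultdict(list) for _ in range(n_partitions)]}
        self.disk = {'A': [[] for _ in range(n_partitions)],   # lists stand in
                     'B': [[] for _ in range(n_partitions)]}   # for disk files
        self.in_memory = 0

    def insert(self, side, t, key):
        """Insert tuple t with join key `key`; return matches found by probing."""
        other = 'B' if side == 'A' else 'A'
        p = hash(key) % self.n
        self.mem[side][p][key].append(t)
        self.in_memory += 1
        if self.in_memory > self.budget:
            self.flush_victim()
        # probe the other source's memory-resident partition
        return list(self.mem[other][p].get(key, []))

    def flush_victim(self):
        # pick the largest memory-resident partition as victim (an assumption)
        side, p = max(((s, i) for s in 'AB' for i in range(self.n)),
                      key=lambda sp: sum(len(v)
                                         for v in self.mem[sp[0]][sp[1]].values()))
        victim = self.mem[side][p]
        self.in_memory -= sum(len(v) for v in victim.values())
        for k, ts in victim.items():
            # append to the end of the disk-resident portion
            self.disk[side][p].extend((k, t) for t in ts)
        victim.clear()
```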
15
1st Stage: Memory-to-Memory Join
[Diagram: Tuple A from SOURCE-A hashes to partition i, Tuple B from SOURCE-B to partition j. Each tuple is inserted into its own source's memory-resident partitions and probes the other source's partitions, sending matches to Output.]
16
Why Stage 1?
• Use memory whenever possible, since it is fastest
• Use newly arriving data while it is already in memory
• Don't stop to fetch data from disk to join with new tuples
17
Question:
– What does the Second Stage do?
– When does the Second Stage start?
– Hints:
XJoin proposes a memory management technique. What occurs when the data input (tuples) is too large for memory?
– Answer: the Second Stage joins memory to disk. It occurs when both inputs are blocked.
18
2nd Stage of XJoin
Activated when the 1st Stage is blocked. Performs 3 steps:
1. Chooses a partition from one source according to its throughput and size
2. Uses tuples from the d-r portion to probe the m-r portion of the other source, outputting matches until the d-r portion is completely processed
3. Checks whether either input has resumed producing tuples. If yes, resumes the 1st Stage. If no, chooses another d-r portion and continues the 2nd Stage.
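Step 2 of this stage amounts to a simple probe loop; a minimal sketch (illustrative names), assuming flushed tuples are stored as (key, tuple) pairs:

```python
def stage2_probe(disk_partition, mem_partition_other):
    """Disk-to-memory probe sketch.
    disk_partition: list of (key, tuple) pairs previously flushed to disk;
    mem_partition_other: dict key -> list of tuples currently in memory
    for the matching partition of the other source."""
    out = []
    for key, t in disk_partition:
        for m in mem_partition_other.get(key, []):
            out.append((t, m))          # emit match immediately
    return out
```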
19
Stage 2: Disk-to-Memory Joins
[Diagram: the disk-resident portion of partition i of source A (DPiA) is read from disk and used to probe the memory-resident portion of partition i of source B (MPiB); matches go to Output.]
20
Controlling 2nd Stage
The cost of the 2nd Stage is hidden when both inputs experience delays.
Tradeoff? What are the benefits of using the second stage?
– Produces results while the input sources are stalled
– Tolerates variable input rates
What is the disadvantage?
– The second stage must completely process a d-r portion before checking for new input (overhead)
To address this tradeoff, use an activation threshold:
– Pick a partition likely to produce many result tuples right now
21
3rd Stage of XJoin
Disk-to-Disk Join: the clean-up stage
– Assumes that all data for both inputs has arrived
– Assumes that the first and second stages have completed
– Makes sure that all tuples belonging in the result are produced
Why is this step necessary?
– Completeness of the answer
22
Handling Duplicates
When could duplicates be produced? In all 3 stages, since multiple stages may perform overlapping work.
How to address it:
– XJoin prevents duplicates with timestamps.
When to address it:
– During processing, since output is produced continuously.
23
Time Stamping : part 1
2 fields are added to each tuple:
– Arrival TimeStamp (ATS): indicates when the tuple first arrived in memory
– Departure TimeStamp (DTS): indicates when the tuple was flushed to disk
[ATS, DTS] indicates when the tuple was in memory.
When did two tuples get joined in the first stage?
– If Tuple A's DTS is within Tuple B's [ATS, DTS]
Tuples that meet this overlap condition are not considered for joining by the 2nd or 3rd stages.
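The overlap test can be written directly from the definition (a sketch; the dict-based tuple representation is an assumption):

```python
def joined_in_stage1(a, b):
    """a, b: tuples carrying 'ATS' and 'DTS' fields. Two tuples were joined
    by the first stage iff their in-memory windows overlapped, i.e. one
    tuple's DTS falls within the other tuple's [ATS, DTS] window."""
    return (b['ATS'] <= a['DTS'] <= b['DTS']) or \
           (a['ATS'] <= b['DTS'] <= a['DTS'])
```

With the slide's example, A [102, 234] overlaps B1 [178, 198] but not B2 [348, 601].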
24
Detecting tuples joined in 1st stage
Overlapping: Tuple A has [ATS 102, DTS 234]; Tuple B1 has [ATS 178, DTS 198].
• These tuples were joined in the first stage
• B1 arrived after A, and before A was flushed to disk
Non-overlapping: Tuple A has [ATS 102, DTS 234]; Tuple B2 has [ATS 348, DTS 601].
• These tuples were not joined in the first stage
• B2 arrived after A, and after A was flushed to disk
25
Time Stamping : part 2
• For each partition, keep track of:
– ProbeTS: the time when a 2nd-stage probe was done
– DTSlast: the latest DTS among all tuples that were on disk at that time
• Several such probes may occur:
– Thus keep an ordered history of such probe descriptors
• Usage:
– All disk tuples up to and including time DTSlast were joined in stage 2 with all tuples that were in main memory (ATS, DTS) at time ProbeTS
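A sketch of the resulting duplicate check (representing each probe descriptor as a (DTSlast, ProbeTS) pair is an assumption about bookkeeping, not the paper's exact layout):

```python
def joined_in_stage2(disk_tuple, mem_tuple, history):
    """history: ordered list of (DTSlast, ProbeTS) descriptors for the disk
    tuple's partition. The disk tuple took part in a 2nd-stage probe iff some
    descriptor has DTSlast >= its DTS; it met mem_tuple in that probe iff
    mem_tuple was memory-resident at the probe time (ATS <= ProbeTS <= DTS)."""
    for dts_last, probe_ts in history:
        if disk_tuple['DTS'] <= dts_last and \
           mem_tuple['ATS'] <= probe_ts <= mem_tuple['DTS']:
            return True
    return False
```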
26
Detecting tuples joined in 2nd stage
Example: Tuple A [ATS 100, DTS 200] is disk-resident; Tuple B [ATS 500, DTS 600] was memory-resident. The history list for A's partition (Partition 2) records a probe with DTSlast 250 and ProbeTS 550.
All tuples with DTS up to and including DTSlast were joined in Stage 2 with the tuples that were memory-resident at time ProbeTS.
All A tuples in Partition 2 up to DTSlast 250 were joined with the m-r tuples that arrived before Partition 2's ProbeTS; since A's DTS (200) is at most 250 and B was in memory at time 550, the pair overlaps and was already joined by the 2nd stage.
27
Experiments
• HHJ (Hybrid Hash Join)
• XJoin (with 2nd stage and with caching)
• XJoin (without 2nd stage)
• XJoin (with aggressive use of the 2nd stage)
28
Case 1: Slow Network
Both sources are slow (bursty)
XJoin improves the delivery time of initial answers, giving interactive performance.
Reactive background processing is an effective way to exploit intermittent delays and keep up continued output rates.
Shows that the 2nd stage is very useful when there is time for it.
29
Slow Network: both sources are slow
30
Case 2: Fast Network
Both sources are fast
All XJoin variants deliver initial results earlier.
XJoin can also deliver the overall result in roughly the same time as HHJ.
HHJ delivers the 2nd half of the result faster than XJoin.
The 2nd stage cannot be used too aggressively if new data is arriving continuously.
31
Case 2: Fast Network
Both sources are fast
32
Conclusion
– Can be conservative on space (small footprint)
– Can be used in conjunction with online query processing to manage streams
– Resumes Stage 1 as soon as data arrives
– Dynamically chooses techniques for producing results
33
References
Urhan, Tolga and Franklin, Michael J. "XJoin: Getting Fast Answers From Slow and Bursty Networks."
Urhan, Tolga and Franklin, Michael J. "XJoin: A Reactively-Scheduled Pipelined Join Operator." IEEE Data Engineering Bulletin, 2000.
Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. "Adaptive Query Processing: Technology in Evolution." IEEE Data Engineering Bulletin, 2000.
Avnur, Ron and Hellerstein, Joseph M. "Eddies: Continuously Adaptive Query Processing."
Babu, Shivnath and Widom, Jennifer. "Continuous Queries over Data Streams."