Generalized Hash Teams for Join and Group-By
Alfons Kemper, Donald Kossmann, Christian Wiesner
Universität Passau, Germany
Feb 02, 2016
A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams
VLDB '99
Outline
• Motivating Example
• Standard Hash Teams
• Generalized Hash Teams for Joins
• Generalized Hash Teams for Joins/Grouping
• False Drops Analysis
• Application Examples (TPC-D)
• Performance Evaluation
Traditional Join Plan
R ⋈_A S ⋈_A T
[Figure: plan tree over R, S, and T with two joins on A producing Result.]
Traditional Hash Team Join Plan [Graefe, Bunker, Cooper: VLDB 98]
R ⋈_A S ⋈_A T
[Figure: R, S, and T are partitioned on R.A, S.A, and T.A; one team of joins on A produces Result.]
Generalized Hash Teams
R ⋈_A S ⋈_B T

S:
A  B
4  3
6  2
3  5
7  0

T:
B  ...
3  ...
0  ...
5  ...
2  ...

S ⋈ T:
A  B  ...
4  3  ...
3  5  ...
6  2  ...
7  0  ...

R:
...  A
...  4
...  3
...  6
...  7
Generalized Hash Teams
R ⋈_A S ⋈_B T

Partition on B (odd: yellow, even: green). Bit position in the bitmaps: A mod 5, e.g., 6 mod 5 = 1.

Bitmaps:
bit  odd  even
0    0    0
1    0    1
2    0    1
3    1    0
4    1    0

S:
A  B
4  3
6  2
3  5
7  0

T:
B  ...
3  ...
0  ...
5  ...
2  ...

R:
...  A
...  4
...  3
...  6
...  7
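The partitioning step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `partition_with_bitmaps` and `partition_by_bitmaps` are hypothetical, partition 0 stands for "even" and partition 1 for "odd", and the bit position is A mod 5 as in the slide's example.

```python
from collections import defaultdict

N_PARTITIONS = 2   # partition on B: even -> 0, odd -> 1 (as on the slide)
BITMAP_SIZE = 5    # bit position = A mod 5 (as on the slide)

def partition_with_bitmaps(S):
    """Partition S(A, B) on B and record, per partition, which A hash values occur."""
    partitions = defaultdict(list)
    bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
    for a, b in S:
        p = b % N_PARTITIONS
        partitions[p].append((a, b))
        bitmaps[p][a % BITMAP_SIZE] = 1
    return partitions, bitmaps

def partition_by_bitmaps(R, bitmaps):
    """Route each R tuple (only A shown) to every partition whose bitmap has its bit set."""
    partitions = defaultdict(list)
    for a in R:
        for p, bitmap in enumerate(bitmaps):
            if bitmap[a % BITMAP_SIZE]:
                partitions[p].append(a)
    return partitions

# Example data from the slide.
S = [(4, 3), (6, 2), (3, 5), (7, 0)]
R = [4, 3, 6, 7]

s_parts, bitmaps = partition_with_bitmaps(S)
r_parts = partition_by_bitmaps(R, bitmaps)
```

With the slide's data, the odd bitmap has bits 3 and 4 set and the even bitmap has bits 1 and 2 set, so R tuples 4 and 3 follow the odd partition and 6 and 7 follow the even partition, although R itself has no B attribute.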
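Generalized Hash Team for Grouping/Aggregation
select c.City, sum(o.Value)
from Customer c, Order o
where c.C# = o.C#
group by c.City

[Figure: two plans. Traditional plan: Customer and Order are each partitioned (Ptn) on C# and joined; the result is partitioned on City and aggregated (Agg). Generalized hash team: Customer is partitioned on City and generates bitmaps (BM), Order is partitioned on the BM, and a join and grouping team computes Agg.]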
Group_City(Customer ⋈_C# Order)
• Customer: partition on City and generate bitmaps for C#
• Order: partition with bitmaps for C#
Group_City(Customer ⋈_C# Order ⋈_O# Lineitem)
• Customer: partition on City and generate bitmaps for C#
• Order: partition with bitmaps for C# and generate bitmaps for O#
• Lineitem: partition with bitmaps for O#
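The cascade of bitmaps from Customer through Order to Lineitem might look as follows. This is a rough sketch, not the authors' code: the helper names, the two-partition setup, the 8-bit bitmaps, the character-sum hash on City, and the sample tuples are all assumptions made for illustration.

```python
N_PARTITIONS = 2   # assumed number of partitions
BITMAP_SIZE = 8    # assumed number of bits per bitmap

def city_partition(city):
    # Hypothetical deterministic hash on City.
    return sum(ord(ch) for ch in city) % N_PARTITIONS

def new_bitmaps():
    return [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]

def partition_customers(customers):
    """Partition Customer(C#, City) on City and generate bitmaps for C#."""
    parts = [[] for _ in range(N_PARTITIONS)]
    c_bitmaps = new_bitmaps()
    for c_no, city in customers:
        p = city_partition(city)
        parts[p].append((c_no, city))
        c_bitmaps[p][c_no % BITMAP_SIZE] = 1
    return parts, c_bitmaps

def partition_orders(orders, c_bitmaps):
    """Partition Order(O#, C#) with the C# bitmaps and generate bitmaps for O#."""
    parts = [[] for _ in range(N_PARTITIONS)]
    o_bitmaps = new_bitmaps()
    for o_no, c_no in orders:
        for p in range(N_PARTITIONS):
            if c_bitmaps[p][c_no % BITMAP_SIZE]:
                parts[p].append((o_no, c_no))
                o_bitmaps[p][o_no % BITMAP_SIZE] = 1
    return parts, o_bitmaps

def partition_lineitems(lineitems, o_bitmaps):
    """Partition Lineitem(L#, O#) with the O# bitmaps."""
    parts = [[] for _ in range(N_PARTITIONS)]
    for l_no, o_no in lineitems:
        for p in range(N_PARTITIONS):
            if o_bitmaps[p][o_no % BITMAP_SIZE]:
                parts[p].append((l_no, o_no))
    return parts

# Made-up sample data (not from the slides).
customers = [(1, "Passau"), (2, "Munich"), (3, "Passau")]
orders = [(10, 1), (11, 2), (12, 3)]
lineitems = [(100, 10), (101, 11), (102, 12), (103, 10)]

c_parts, c_bitmaps = partition_customers(customers)
o_parts, o_bitmaps = partition_orders(orders, c_bitmaps)
l_parts = partition_lineitems(lineitems, o_bitmaps)
```

Each Lineitem tuple ends up in the partition of its customer's city even though Lineitem carries neither City nor C#; the information travels through the two bitmap generations.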
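False Drops
R ⋈_A S ⋈_B T

Bitmaps:
bit  odd  even
0    0    0
1    0    1
2    0    1
3    1    1
4    1    0

The S tuple (8, 4) sets bit 8 mod 5 = 3 in the even bitmap, so the R tuple with A = 3 is also routed to the even partition, where it has no join partner.

R:
...  A
...  4
...  3
...  6
...  7

S:
A  B
4  3
6  2
3  5
7  0
8  4

T:
B  ...
3  ...
0  ...
5  ...
2  ...
4  ...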
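The false drop in this example can be reproduced with a short Python sketch. It assumes, as in the earlier slide, that S is partitioned on B into even (partition 0) and odd (partition 1) and that the bit position is A mod 5; the variable names are illustrative only.

```python
N_PARTITIONS = 2   # partition on B: even -> 0, odd -> 1
BITMAP_SIZE = 5    # bit position = A mod 5

# Example data from the slide; S now contains the extra tuple (8, 4).
S = [(4, 3), (6, 2), (3, 5), (7, 0), (8, 4)]
R = [4, 3, 6, 7]

bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
partners = [set() for _ in range(N_PARTITIONS)]   # A values really present per partition
for a, b in S:
    p = b % N_PARTITIONS
    bitmaps[p][a % BITMAP_SIZE] = 1
    partners[p].add(a)

# Every (A, partition) pair an R tuple is routed to via the bitmaps.
routed = [(a, p) for a in R for p in range(N_PARTITIONS)
          if bitmaps[p][a % BITMAP_SIZE]]

# A false drop is a routed R tuple without a join partner in that partition.
false_drops = [(a, p) for a, p in routed if a not in partners[p]]
```

Here the only false drop is the R tuple with A = 3 in the even partition: bit 3 of the even bitmap was set by A = 8, which collides with 3 under mod 5. The join result stays correct because the tuple simply finds no partner there.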
Overlapping Partitions
Customer ⋈_C# Order ⋈_O# Lineitem
[Figure: the relations T, S, R shown next to Customer, Order, Lineitem.]
• T, S, R: partition on B and generate bitmaps for A; partition based on the bitmaps for A
• Customer, Order, Lineitem: partition on C# and generate bitmaps for O#; partition with bitmaps
Applicability of Generalized Hash Teams
• for partitioning hierarchical structures A → B
Partition on B
Partition on bitmaps for A
• but it is also correct for non-strict hierarchies, where A → B does not hold strictly (but performance deteriorates)
Non-strict hierarchy A, B
R ⋈_A S ⋈_B T

Bitmaps:
bit  odd  even
0    0    0
1    0    1
2    0    1
3    1    1
4    1    0

R:
...  A
...  4
...  3
...  6
...  7

S:
A  B
4  3
6  2
3  5
7  0
3  2

T:
B  ...
3  ...
0  ...
5  ...
2  ...

[Figure: partitions of T, S, and R.]
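That correctness survives a non-strict hierarchy can be checked on the slide's data with a small Python sketch. The code assumes the odd/even partitioning on B and the A mod 5 bit position from the earlier slides; `direct_join` and `team_join` are hypothetical helper names, and only the join attributes of each tuple are modeled.

```python
N_PARTITIONS = 2   # partition on B: even -> 0, odd -> 1
BITMAP_SIZE = 5    # bit position = A mod 5

R = [4, 3, 6, 7]                               # R(..., A), only A shown
S = [(4, 3), (6, 2), (3, 5), (7, 0), (3, 2)]   # S(A, B); A = 3 occurs with two B values
T = [3, 0, 5, 2]                               # T(B, ...), only B shown

def direct_join(R, S, T):
    """Nested-loop reference for R join S on A, join T on B."""
    return [(r, a, b, t) for r in R for (a, b) in S for t in T
            if r == a and b == t]

def team_join(R, S, T):
    """Partition S and T on B, route R via bitmaps on A, and join per partition."""
    s_parts = [[] for _ in range(N_PARTITIONS)]
    t_parts = [[] for _ in range(N_PARTITIONS)]
    r_parts = [[] for _ in range(N_PARTITIONS)]
    bitmaps = [[0] * BITMAP_SIZE for _ in range(N_PARTITIONS)]
    for a, b in S:
        s_parts[b % N_PARTITIONS].append((a, b))
        bitmaps[b % N_PARTITIONS][a % BITMAP_SIZE] = 1
    for b in T:
        t_parts[b % N_PARTITIONS].append(b)
    for a in R:
        for p in range(N_PARTITIONS):
            if bitmaps[p][a % BITMAP_SIZE]:
                r_parts[p].append(a)   # may land in several partitions
    result = []
    for p in range(N_PARTITIONS):
        result += direct_join(r_parts[p], s_parts[p], t_parts[p])
    return result, r_parts

team_result, r_parts = team_join(R, S, T)
reference = direct_join(R, S, T)
```

The R tuple with A = 3 is replicated into both partitions, which costs extra work, yet the partition-wise result matches the reference join because every S tuple still lives in exactly one partition.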
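False Drops Estimation
b: cardinality of the bitmaps
n: number of partitions

Probability that some s sets a bit leading to a false drop of an r into a particular partition:
$$1 - \left(1 - \frac{1}{b \cdot n}\right)^{|S|}$$

Total number of false drops:
$$|R| \cdot (n - 1) \cdot \left(1 - \left(1 - \frac{1}{b \cdot n}\right)^{|S|}\right)$$

Conservative approximation:
$$|R| \cdot (n - 1) \cdot \frac{|S|}{b \cdot n}$$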
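To get a feel for the magnitudes, the estimate can be evaluated numerically. The sketch below assumes the estimate has the form |R| · (n − 1) · (1 − (1 − 1/(b·n))^|S|), with the conservative approximation |R| · (n − 1) · |S| / (b·n); the function names and the sample cardinalities are made up for illustration.

```python
def false_drop_probability(s_card, b, n):
    """Probability that some s sets the bit that drops an r into a given wrong partition."""
    return 1.0 - (1.0 - 1.0 / (b * n)) ** s_card

def expected_false_drops(r_card, s_card, b, n):
    """Each r can be falsely dropped into any of the other n - 1 partitions."""
    return r_card * (n - 1) * false_drop_probability(s_card, b, n)

def conservative_false_drops(r_card, s_card, b, n):
    """Upper bound obtained from 1 - (1 - x)^k <= k * x."""
    return r_card * (n - 1) * s_card / (b * n)

# Hypothetical cardinalities: |R| = |S| = 1000, b = 10000 bits, n = 10 partitions.
exact = expected_false_drops(1000, 1000, 10_000, 10)
approx = conservative_false_drops(1000, 1000, 10_000, 10)
```

For these assumed values the approximation gives 90 false drops and the exact expression stays slightly below that, which is why the approximation is conservative.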
Implementation Details: Fine Tuning the Partitioning

bit  bitmaps  used  coll
0    0 1 0    1     0
1    0 0 0    0     0
2    0 1 0    1     0
3    0 1 1    1     1
4    0 0 0    0     0
5    1 0 0    1     0

R:
...  A
...  4
...  5
...  6
...  3

Bitmaps: 10010000001001..
Bloom-Filter [Bratbergsengen] [Valduriez]
Implementation Details: Teaming up Join and Grouping
Group_City(Customer ⋈_C# Order)
• Customer: partition on City and generate bitmaps for C#
• Order: partition with bitmaps for C#
Teaming Up Join and Grouping: Build Phase

Customer (partition 1):
C#  City
5   PA
13  M
25  M
23  PA

HT Join (C# → Ptr): entries for 5, 13, 25, 23
HT Aggr (City → Ptr): entries for PA, M

Hash-Area (columns: City, Value, Hit):
PA  0
M   0
Teaming Up Join and Grouping: Probe Phase

Customer (partition 1):
C#  City
5   PA
13  M
25  M
23  PA

HT Join (C# → Ptr): entries for 5, 13, 25, 23
HT Aggr (City → Ptr): entries for PA, M

Hash-Area (columns: City, Value, Hit):
PA  0
M   0

Order (partition 1):
C#  Value
25  10
3   66
5   33
5   34

13 0
10 1
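One way to read the build and probe slides is that both hash tables point into a shared hash area holding one entry per City, so probing with an Order tuple can update the aggregate directly. The sketch below follows that reading; it is an interpretation, not the authors' implementation, and the list-index "pointers" and the boolean hit flag are assumptions.

```python
def build(customers):
    """Build phase: one hash-area entry per City, reachable via C# and via City."""
    hash_area = []   # group entries: [city, value_sum, hit]
    ht_aggr = {}     # City -> index into hash_area (stands in for Ptr)
    ht_join = {}     # C#   -> index into hash_area (stands in for Ptr)
    for c_no, city in customers:
        if city not in ht_aggr:
            ht_aggr[city] = len(hash_area)
            hash_area.append([city, 0, False])
        ht_join[c_no] = ht_aggr[city]
    return ht_join, hash_area

def probe(orders, ht_join, hash_area):
    """Probe phase: join on C# and aggregate Value into the City entry in one step."""
    for c_no, value in orders:
        ptr = ht_join.get(c_no)
        if ptr is None:
            continue             # no customer in this partition, e.g. a false drop
        entry = hash_area[ptr]
        entry[1] += value        # sum(o.Value)
        entry[2] = True          # the group has at least one join partner
    return [(city, total) for city, total, hit in hash_area if hit]

# Partition 1 from the slides.
customers = [(5, "PA"), (13, "M"), (25, "M"), (23, "PA")]
orders = [(25, 10), (3, 66), (5, 33), (5, 34)]

ht_join, hash_area = build(customers)
result = probe(orders, ht_join, hash_area)
```

With the slide's data, the Order tuple for C# 3 finds no entry in this partition and is skipped, while the remaining tuples add 10 to M and 33 + 34 to PA.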
Performance Comparison: Group_City(Customer ⋈_C# Order)
[Figure: performance plot; x-axis: Memory [MB].]
False Drops Estimation and Measurement
Performance Comparison: Group_City(Customer ⋈_C# Order ⋈_O# Lineitem)
[Figure: performance plot; x-axis: Memory [MB].]
False Drops Estimation and Measurement
Conclusion and Future Work
• Look-Ahead Partitioning for Joins and Grouping
• Applicable for hierarchical data structures
  • correctness does not depend on strict hierarchies
• Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18
• Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ)
TPC-D Q5
SELECT N_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
FROM CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION
WHERE C_CUSTKEY = O_CUSTKEY
  AND O_ORDERKEY = L_ORDERKEY
  AND L_SUPPKEY = S_SUPPKEY
  AND C_NATIONKEY = S_NATIONKEY
  AND S_NATIONKEY = N_NATIONKEY
  AND N_REGIONKEY = R_REGIONKEY
  AND R_NAME = '[region]'
  AND O_ORDERDATE >= DATE '[date]'
  AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR
GROUP BY N_NAME
ORDER BY REVENUE DESC;
TPC-D Q10
SELECT C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE,
       C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT
FROM CUSTOMER, ORDER, LINEITEM, NATION
WHERE C_CUSTKEY = O_CUSTKEY
  AND L_ORDERKEY = O_ORDERKEY
  AND O_ORDERDATE >= DATE '[date]'
  AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH
  AND L_RETURNFLAG = 'R'
  AND C_NATIONKEY = N_NATIONKEY
GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT
ORDER BY REVENUE DESC;
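Indirectly Partitioning a Hierarchical Structure
[Figure: Lineitem, Order, and Customer, linked via O# (Lineitem, Order) and C# (Order, Customer), with City on Customer; the tuples fall into Partition 1, Partition 2, and Partition 3.]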