Hive Correlation Optimizer
Yin Huai
Hadoop Summit 2013 Hive User Group Meetup
© Hortonworks Inc. 2011
Jan 27, 2015
About me
• Hive contributor
• Summer intern at Hortonworks
• 4th year Ph.D. student at The Ohio State University
• Research interests: query optimizations, file formats, distributed systems, and storage systems
Outline
• Query planning in Hive
• Correlations in a query (Intra-query correlations)
• Case studies
• Automatically exploiting correlations (HIVE-2206: Correlation Optimizer)
Query planning
SELECT t1.c2, count(*)
FROM t1 JOIN t2 ON (t1.c1=t2.c1)
GROUP BY t1.c2
[Diagram: t1 and t2 feed JOIN (t1.c1=t2.c1); JOIN feeds AGG, which calculates count(*) for every group of t1.c2]
Query planning
SELECT t1.c2, count(*)
FROM t1 JOIN t2 ON (t1.c1=t2.c1)
GROUP BY t1.c2
Evaluate this query in distributed systems
[Diagram: the same plan with a Shuffle below each operator. The inputs of JOIN are shuffled on c1; the input of AGG is shuffled on c2]
How to shuffle? Use the key column(s).
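A minimal sketch of what shuffling on the key column(s) means, assuming simple hash partitioning. The function name `shuffle`, the sample rows, and the reducer count are illustrative assumptions, not Hive code:

```python
from collections import defaultdict

def shuffle(rows, key_column, num_reducers):
    """Assign every row to a reducer based on the hash of its key column.

    Rows with equal key values always land on the same reducer, which is
    what JOIN (key c1) and AGG (key c2) rely on.
    """
    partitions = defaultdict(list)
    for row in rows:
        reducer = hash(row[key_column]) % num_reducers
        partitions[reducer].append(row)
    return dict(partitions)

# Made-up sample rows for t1.
t1 = [{"c1": 1, "c2": "a"}, {"c1": 2, "c2": "b"}, {"c1": 1, "c2": "b"}]

by_c1 = shuffle(t1, "c1", num_reducers=2)   # layout needed by the JOIN
by_c2 = shuffle(t1, "c2", num_reducers=2)   # layout needed by the AGG
```

The two layouts generally differ, which is why a plan that joins on c1 and then groups by c2 has to move the data twice.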
Generating MapReduce jobs
[Diagram: the plan with two shuffles (on c1 for JOIN, on c2 for AGG) is split into two jobs. Job 1: t1, t2 → Shuffle on c1 → JOIN → tmp. Job 2: tmp → Shuffle on c2 → AGG]
One MapReduce job can shuffle data only once, so the two shuffles need two jobs.
Generating MapReduce jobs
[Diagram: Job 1: t1, t2 → Shuffle on c1 → JOIN → tmp. Job 2: tmp → Shuffle on c2 → AGG]
MapReduce shuffles the data for us; we just need to emit outputs from the Map phase.
We use ReduceSinkOperator (RS) to emit Map outputs. An RS marks the end of a Map phase.
[Diagram: Job 1 Map: t1 and t2 each end in an RS (RS1, RS2). Job 1 Reduce: JOIN writes tmp. Job 2 Map: tmp ends in an RS. Job 2 Reduce: AGG]
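The two-job plan can be imitated in a few lines. This is only a sketch under stated assumptions: `run_job`, the tagging scheme, and the sample rows are hypothetical stand-ins for what the map-side RS and the MapReduce framework do, not Hive's implementation.

```python
from collections import defaultdict

def run_job(map_outputs, reduce_fn):
    """Shuffle the (key, value) pairs emitted by the map phase, then reduce per key."""
    groups = defaultdict(list)
    for key, value in map_outputs:       # what an RS-like step emits
        groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reduce_fn(key, groups[key]))
    return out

# Made-up sample data.
t1 = [{"c1": 1, "c2": "a"}, {"c1": 2, "c2": "b"}, {"c1": 1, "c2": "b"}]
t2 = [{"c1": 1}, {"c1": 1}, {"c1": 3}]

# Job 1: the map phase emits rows keyed by c1 and tagged with their source table.
def job1_map():
    for row in t1:
        yield row["c1"], ("t1", row)
    for row in t2:
        yield row["c1"], ("t2", row)

# Job 1: the reduce phase joins the two sides of each key group.
def job1_reduce(key, values):
    left = [row for tag, row in values if tag == "t1"]
    right = [row for tag, row in values if tag == "t2"]
    return [l for l in left for _ in right]   # one t1 row per matching pair

tmp = run_job(job1_map(), job1_reduce)        # materialized between the jobs

# Job 2: the map phase re-emits tmp keyed by c2; the reduce phase counts per group.
def job2_reduce(key, values):
    return [(key, len(values))]

result = run_job(((row["c2"], row) for row in tmp), job2_reduce)
```

Note that `tmp` is written out by the first job and read back by the second, which is the materialization cost that later slides try to avoid.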
Intra-query correlations
SELECT x.c1, count(*)
FROM t1 x JOIN t1 y ON (x.c1=y.c1)
GROUP BY x.c1
[Diagram: t1 as x and t1 as y feed JOIN (x.c1=y.c1); JOIN feeds AGG, which calculates count(*) for every group of x.c1]
Correlations:
1. Same input tables
2. JOIN and AGG use the same key
Intra-query correlations
SELECT p.c1, q.cnt
FROM (SELECT x.c1 AS c1 FROM t1 x JOIN t2 y ON (x.c1=y.c1)) p
JOIN (SELECT z.c1 AS c1, count(*) AS cnt FROM t1 z GROUP BY z.c1) q
ON (p.c1=q.c1)
[Diagram: t1 as x and t2 as y feed JOIN1 (x.c1=y.c1); t1 as z feeds AGG1, which calculates count(*) for every group of z.c1; JOIN1 and AGG1 feed JOIN2 (p.c1=q.c1)]
Correlations:
1. Same input tables (t1)
2. JOIN1 and AGG1 use the same key
3. JOIN2 and all of its parents use the same key
Intra-query correlations
• Defined in “YSmart: Yet Another SQL-to-MapReduce Translator”
  – http://ysmart.cse.ohio-state.edu/
  – http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
• Targets operators that need to shuffle data, and their inputs
• Three kinds of correlations
  – Input correlation (IC): independent operators share the same input tables
  – Transit correlation (TC): independent operators have input correlation and also shuffle the data in the same way (e.g. using the same keys)
  – Job flow correlation (JFC): two dependent operators shuffle the data in the same way
[Diagram, IC: JOIN1 (inputs t1 as x, t2 as y) and AGG1 (input t1 as z) both read t1]
[Diagram, TC: JOIN1 (x.c1=y.c1) and AGG1 (group by z.c1) both read t1 and shuffle on the same key]
[Diagram, JFC: JOIN (x.c1=y.c1) feeds AGG (group by z.c1), and both shuffle on the same key]
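The three definitions can be written down as a small classifier. This is a sketch under simplifying assumptions: the `Op` class and its fields are hypothetical, and shuffle keys are compared by plain column name, whereas a real planner would have to trace each key back to the column it originates from.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    tables: frozenset          # input tables read below this operator
    keys: tuple                # columns used to shuffle its input
    children: list = field(default_factory=list)   # operators feeding this one

def depends_on(op, other):
    """True if `other` appears anywhere below `op` in the plan."""
    return any(c is other or depends_on(c, other) for c in op.children)

def correlations(a, b):
    """Return the kinds of correlation (IC, TC, JFC) between two operators."""
    found = set()
    dependent = depends_on(a, b) or depends_on(b, a)
    if not dependent and a.tables & b.tables:
        found.add("IC")                    # independent, shared input table
        if a.keys == b.keys:
            found.add("TC")                # ... and the same shuffle keys
    if dependent and a.keys == b.keys:
        found.add("JFC")                   # dependent, same shuffle keys
    return found

# The plan from the previous slide.
join1 = Op("JOIN1", frozenset({"t1", "t2"}), ("c1",))
agg1 = Op("AGG1", frozenset({"t1"}), ("c1",))
join2 = Op("JOIN2", frozenset({"t1", "t2"}), ("c1",), [join1, agg1])

siblings = correlations(join1, agg1)
parent_child = correlations(join2, agg1)
```

Here `siblings` contains both IC and TC, since transit correlation implies input correlation, and `parent_child` contains JFC.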
Correlation-unaware query planning
[Diagram: t1 and t1 → Shuffle on c1 → JOIN → Shuffle on c1 → AGG]
Hive does not care:
1. Whether a table has been used multiple times
2. Whether data really needs to be shuffled
[Diagram: Job 1: t1, t1 → Shuffle on c1 → JOIN → tmp. Job 2: tmp → Shuffle on c1 → AGG]
Drawbacks:
1. Unnecessary data loading
2. Unnecessary data shuffling
3. Unnecessary data materialization
Case studies: TPC-H Q17 (Flattened)
SELECT
  sum(l_extendedprice) / 7.0 as avg_yearly
FROM
  (SELECT l_partkey, l_quantity, l_extendedprice
   FROM lineitem JOIN part ON (p_partkey=l_partkey)
   WHERE p_brand='Brand#35' AND
         p_container = 'MED PKG') touter
JOIN
  (SELECT l_partkey as lp, 0.2 * avg(l_quantity) as lq
   FROM lineitem
   GROUP BY l_partkey) tinner
ON (touter.l_partkey = tinner.lp)
WHERE touter.l_quantity < tinner.lq
Case studies: TPC-H Q17 (Flattened)
[Diagram: lineitem and part feed JOIN1; lineitem feeds AGG1; JOIN1 and AGG1 feed JOIN2; JOIN2 feeds AGG2]
lineitem is used by JOIN1 and AGG1.
JOIN1, AGG1, and JOIN2 share the same key.
Case studies: TPC-H Q17 (Flattened)
Without Correlation Optimizer
[Diagram: the same plan split into four jobs (Job 1 to Job 4), one for each of JOIN1, AGG1, JOIN2, and AGG2]
Case studies: TPC-H Q17 (Flattened)
[Diagram, without Correlation Optimizer: lineitem is read twice and the plan runs in four jobs (Job 1 to Job 4)]
[Diagram, with Correlation Optimizer: lineitem is read once and the plan runs in two jobs (Job 1 and Job 2)]
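A rough sketch of how the flattened Q17 could run in two jobs once the shared key is exploited. It assumes that Job 1 covers JOIN1, AGG1, and JOIN2 (the three operators that share the key) and that Job 2 covers AGG2; the slide shows only that two jobs remain. The rows are made up for illustration.

```python
from collections import defaultdict

lineitem = [  # (l_partkey, l_quantity, l_extendedprice), made-up rows
    (1, 1.0, 70.0), (1, 9.0, 30.0), (1, 20.0, 10.0),
    (2, 1.0, 50.0), (2, 30.0, 5.0),
]
part = [  # (p_partkey, p_brand, p_container), made-up rows
    (1, "Brand#35", "MED PKG"),
    (2, "Brand#11", "MED PKG"),
]

# Job 1, map phase: lineitem is read once; everything is keyed by partkey.
groups = defaultdict(lambda: {"lineitem": [], "part": []})
for partkey, quantity, price in lineitem:
    groups[partkey]["lineitem"].append((quantity, price))
for partkey, brand, container in part:
    if brand == "Brand#35" and container == "MED PKG":
        groups[partkey]["part"].append(partkey)

# Job 1, reduce phase: AGG1, JOIN1, and JOIN2 all work on the same key group.
job1_output = []
for partkey, rows in groups.items():
    items = rows["lineitem"]
    if not items or not rows["part"]:
        continue                                        # JOIN1 finds no match
    lq = 0.2 * sum(q for q, _ in items) / len(items)    # AGG1
    for _ in rows["part"]:                              # JOIN1
        for quantity, price in items:
            if quantity < lq:                           # JOIN2 and the WHERE filter
                job1_output.append(price)

# Job 2: AGG2 is a global aggregation over the output of Job 1.
avg_yearly = sum(job1_output) / 7.0
```

Because the grouping key of AGG1 and the join keys of JOIN1 and JOIN2 are all the part key, one shuffle is enough for the three of them; only the final sum needs a different data layout.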
Case studies: TPC-DS Q95 (Flattened)
SELECT count(distinct ws1.ws_order_number) as order_count,
       sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
       sum(ws1.ws_net_profit) as total_net_profit
FROM web_sales ws1
JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
                FROM web_sales ws2 JOIN web_sales ws3
                ON (ws2.ws_order_number = ws3.ws_order_number)
                WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk) ws_wh1
ON (ws1.ws_order_number = ws_wh1.ws_order_number)
LEFT SEMI JOIN (SELECT wr_order_number
                FROM web_returns wr
                JOIN (SELECT ws4.ws_order_number as ws_order_number
                      FROM web_sales ws4 JOIN web_sales ws5
                      ON (ws4.ws_order_number = ws5.ws_order_number)
                      WHERE ws4.ws_warehouse_sk <> ws5.ws_warehouse_sk) ws_wh2
                ON (wr.wr_order_number = ws_wh2.ws_order_number)) tmp1
ON (ws1.ws_order_number = tmp1.wr_order_number)
WHERE d.d_date >= '2001-05-01' AND
      d.d_date <= '2001-06-30' AND
      ca.ca_state = 'NC' AND
      s.web_company_name = 'pri'
Case studies: TPC-DS Q95 (Flattened)
[Diagram: web_sales is joined with customer_address, web_site, and date_dim in a MapJoin. Two JOIN1 operators each join web_sales with web_sales; JOIN2 joins web_returns with the output of one JOIN1. SemiJoin combines the MapJoin output with the other JOIN1 and with JOIN2, and feeds AGG]
Case studies: TPC-DS Q95 (Flattened)
Without Correlation Optimizer
• 6 MapReduce jobs
• Unnecessary data loading (black web_sales nodes)
• Unnecessary data shuffling
[Diagram: the same plan split into six jobs (Job 1 to Job 6)]
Case studies: TPC-DS Q95 (Flattened)
With Correlation Optimizer
• Black web_sales nodes share the same data loading
[Diagram: a single web_sales scan now feeds both JOIN1 operators]
Case studies: TPC-DS Q95 (Flattened)
With Correlation Optimizer
• Black web_sales nodes share the same data loading
• 3 MapReduce jobs
[Diagram: the same plan split into three jobs (Job 1 to Job 3)]
Case studies: TPC-DS Q95 (Flattened)
Follow-up work
• Evaluate JOIN1 only once without materializing a temporary table
[Diagram: a single JOIN1 feeds both JOIN2 and SemiJoin]
Case studies: TPC-DS Q95 (Flattened)
Follow-up work
• Evaluate JOIN1 only once without materializing a temporary table
• Only use 2 MapReduce jobs
[Diagram: the same plan split into two jobs (Job 1 and Job 2)]
Objectives
• Eliminate unnecessary data loading
  – The query planner will be aware of what data will be loaded
  – Do as many things as possible with loaded data
• Eliminate unnecessary data shuffling
  – The query planner will be aware of when data really needs to be shuffled
  – Do as many things as possible before shuffling the data again
ReduceSink Deduplication
• HIVE-2340
• Handles chained Job Flow Correlations
  – e.g. generating a single job for both Group By and Order By
• Cannot handle complex patterns
  – e.g. patterns involving multiple Joins
• Need a fundamental solution
• Need to exploit shared input tables
[Diagram: t1 → RS1 → AGG1 → RS2 → … becomes t1 → RS1 → AGG1 → … once the redundant RS2 is removed]
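The idea of dropping a redundant RS from a chain can be sketched as follows. This is a simplified illustration, not the HIVE-2340 implementation: the function name and the operator labels are hypothetical, and the sketch compares only shuffle keys, ignoring conditions such as sort order and the number of reducers.

```python
def deduplicate_reduce_sinks(chain):
    """Drop an RS whose key equals the key of the closest RS before it.

    `chain` lists (operator_type, key) pairs from the table scan upward;
    key is None for operators that do not shuffle.
    """
    result = []
    last_rs_key = None
    for op_type, key in chain:
        if op_type == "RS":
            if key == last_rs_key:
                continue          # data is already partitioned on this key
            last_rs_key = key
        result.append((op_type, key))
    return result

# t1 -> RS1 -> AGG1 -> RS2 -> ..., with RS1 and RS2 using the same key.
plan = [("SCAN", None), ("RS", "c1"), ("AGG", None), ("RS", "c1"), ("OTHER", None)]
optimized = deduplicate_reduce_sinks(plan)
```

A linear chain is all this approach can see, which is why patterns with several joins and shared input tables need the more general optimizer.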
Correlation Optimizer
• 2-phase optimizer
  – Phase 1: Correlation detection
  – Phase 2: Query plan tree transformation
• This work is not just about the optimizer
  – New operators to support the execution of an optimized plan
  – A mechanism to coordinate the operator tree inside the Reduce phase
Correlation detection
SELECT p.c1, q.cnt
FROM (SELECT x.c1 AS c1 FROM t1 x JOIN t2 y ON (x.c1=y.c1)) p
JOIN (SELECT z.c1 AS c1, count(*) AS cnt FROM t1 z GROUP BY z.c1) q
ON (p.c1=q.c1)
1. Traverse the tree all the way down to find matching keys in ReduceSinkOperators
2. Then, check input tables to find shared data loading opportunities
[Diagram: t1 as x → RS1 (key: x.c1) and t2 as y → RS2 (key: y.c1) feed JOIN1; t1 as z → RS3 (key: z.c1) feeds AGG1; JOIN1 → RS4 (key: p.c1) and AGG1 → RS5 (key: q.c1) feed JOIN2]
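The two detection steps can be sketched on the plan above. The `Node` class and the helper functions are hypothetical, and every key is written as the plain column name "c1", assuming the keys have already been traced back to a common form; the real optimizer has to do that tracing itself.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    key: Optional[str] = None      # shuffle key, set only for RS nodes
    table: Optional[str] = None    # table name, set only for scan nodes
    inputs: list = field(default_factory=list)

def reduce_sinks_below(node):
    """Return the nearest RS on every path below `node`."""
    found = []
    for child in node.inputs:
        if child.key is not None:
            found.append(child)
        else:
            found.extend(reduce_sinks_below(child))
    return found

def detect(top_rs):
    """Step 1: walk down from an RS, collecting lower RSs with a matching key."""
    correlated = []
    for rs in reduce_sinks_below(top_rs):
        if rs.key == top_rs.key:
            correlated.append(rs)
            correlated.extend(detect(rs))
    return correlated

def tables_below(node):
    """List the tables scanned below `node`, with repeats."""
    if node.table is not None:
        return [node.table]
    return [t for child in node.inputs for t in tables_below(child)]

rs1 = Node("RS1", key="c1", inputs=[Node("t1 as x", table="t1")])
rs2 = Node("RS2", key="c1", inputs=[Node("t2 as y", table="t2")])
rs3 = Node("RS3", key="c1", inputs=[Node("t1 as z", table="t1")])
rs4 = Node("RS4", key="c1", inputs=[Node("JOIN1", inputs=[rs1, rs2])])
rs5 = Node("RS5", key="c1", inputs=[Node("AGG1", inputs=[rs3])])
join2 = Node("JOIN2", inputs=[rs4, rs5])

correlated = [rs for top in (rs4, rs5) for rs in detect(top)]

# Step 2: a table scanned more than once is a shared data loading opportunity.
tables = tables_below(join2)
shared = {t for t in tables if tables.count(t) > 1}
```

In this plan every lower RS matches the key of the RS above it, and t1 is the one table that is scanned twice.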
Query plan tree transformation
SELECT p.c1, q.cnt
FROM (SELECT x.c1 AS c1 FROM t1 x JOIN t2 y ON (x.c1=y.c1)) p
JOIN (SELECT z.c1 AS c1, count(*) AS cnt FROM t1 z GROUP BY z.c1) q
ON (p.c1=q.c1)
[Diagram, before: t1 as x → RS1 (key: x.c1) and t2 as y → RS2 (key: y.c1) feed JOIN1; t1 as z → RS3 (key: z.c1) feeds AGG1; JOIN1 → RS4 (key: p.c1) and AGG1 → RS5 (key: q.c1) feed JOIN2]
[Diagram, after: a single scan of t1 serves both x and z and feeds RS1 and RS3; t2 as y feeds RS2; RS4 and RS5 are gone, so JOIN1, AGG1, and JOIN2 run in the same Reduce phase]
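A sketch of how the transformed plan could execute as a single job, emitting p.c1 and q.cnt. The grouping dictionary stands in for the shuffle, the alias tags stand in for RS1 to RS3, and the sample rows are made up; none of this is Hive's actual operator code.

```python
from collections import defaultdict

# Made-up sample data.
t1 = [{"c1": 1}, {"c1": 1}, {"c1": 2}]
t2 = [{"c1": 1}, {"c1": 3}]

# Map phase: t1 is scanned once; each row is emitted for alias x (RS1) and
# alias z (RS3). t2 rows are emitted for alias y (RS2). All keys are c1.
groups = defaultdict(lambda: {"x": [], "y": [], "z": []})
for row in t1:
    groups[row["c1"]]["x"].append(row)
    groups[row["c1"]]["z"].append(row)
for row in t2:
    groups[row["c1"]]["y"].append(row)

# Reduce phase: with RS4 and RS5 removed, JOIN1, AGG1, and JOIN2 all run
# on the same key group without another shuffle.
result = []
for c1 in sorted(groups):
    g = groups[c1]
    p = [c1 for _ in g["x"] for _ in g["y"]]                  # JOIN1
    q = [(c1, len(g["z"]))] if g["z"] else []                 # AGG1
    result.extend((pc1, cnt) for pc1 in p for _, cnt in q)    # JOIN2
```

The intermediate results p and q never leave the reducer, so nothing is materialized between the three operators.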
Thanks