Hive Correlation Optimizer

© Hortonworks Inc. 2011


Yin Huai


Hadoop Summit 2013 Hive User Group Meetup


About me

• Hive contributor
• Summer intern at Hortonworks
• 4th year Ph.D. student at The Ohio State University
• Research interests: query optimizations, file formats, distributed systems, and storage systems

Architecting the Future of Big Data


Outline

• Query planning in Hive
• Correlations in a query (Intra-query correlations)
• Case studies
• Automatically exploiting correlations (HIVE-2206: Correlation Optimizer)



Query planning


SELECT t1.c2, count(*)
FROM t1 JOIN t2 ON (t1.c1=t2.c1)
GROUP BY t1.c2

[Figure: operator tree. t1 and t2 feed JOIN (t1.c1=t2.c1), whose output feeds AGG (calculate count(*) for every group of t1.c2).]


Evaluate this query in distributed systems

[Figure: the same operator tree with a Shuffle on c1 below JOIN and a Shuffle on c2 below AGG.]

How to shuffle? Use the key column(s).
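As a rough illustration of shuffling on a key column, the sketch below routes rows to partitions by hashing the key, so rows with equal keys end up together. The function name `shuffle`, the dictionary row layout, and the partition count are assumptions made for this example; they are not Hive code.

```python
from collections import defaultdict

def shuffle(rows, key_col, num_partitions):
    """Route each row to a partition chosen from the value of its key column."""
    partitions = defaultdict(list)
    for row in rows:
        # Rows with equal keys always hash to the same partition.
        partitions[hash(row[key_col]) % num_partitions].append(row)
    return dict(partitions)

# Hypothetical contents of t1 with columns c1 and c2.
t1 = [{"c1": 1, "c2": "a"}, {"c1": 2, "c2": "b"}, {"c1": 1, "c2": "b"}]

# Shuffling on c1 (the JOIN key) brings both c1 = 1 rows together.
by_c1 = shuffle(t1, "c1", num_partitions=2)
```

Shuffling the same rows on c2 instead would group them differently, which is why the JOIN on c1 and the AGG on c2 each need their own shuffle in this query.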


Generating MapReduce jobs


[Figure: the plan is split at each shuffle. Job 1 reads t1 and t2, shuffles on c1, runs JOIN, and writes tmp. Job 2 reads tmp, shuffles on c2, and runs AGG.]

One MapReduce job can shuffle data only once.
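The two-job plan can be mimicked with a toy, in-memory stand-in for MapReduce. This is only a sketch under stated assumptions: `run_job`, the tuple row layouts, and the sample data are hypothetical, and the real jobs are generated by Hive rather than written by hand.

```python
from collections import defaultdict

def run_job(inputs, map_fn, reduce_fn):
    """One simulated MapReduce job: map, shuffle once by key, then reduce."""
    groups = defaultdict(list)
    for tag, rows in inputs.items():
        for row in rows:
            for key, value in map_fn(tag, row):
                groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reduce_fn(key, groups[key]))
    return out

t1 = [(1, "a"), (2, "b"), (1, "b")]   # hypothetical rows of t1: (c1, c2)
t2 = [(1,), (1,), (2,)]               # hypothetical rows of t2: (c1,)

# Job 1: shuffle both tables on c1 and join them.
def join_map(tag, row):
    yield row[0], (tag, row)

def join_reduce(c1, values):
    left = [r for tag, r in values if tag == "t1"]
    right = [r for tag, r in values if tag == "t2"]
    for left_row in left:
        for _ in right:
            yield left_row[1]          # keep only t1.c2, all that AGG needs

tmp = run_job({"t1": t1, "t2": t2}, join_map, join_reduce)

# Job 2: shuffle tmp on c2 and count the rows in each group.
def agg_map(tag, c2):
    yield c2, 1

def agg_reduce(c2, ones):
    yield c2, sum(ones)

result = run_job({"tmp": tmp}, agg_map, agg_reduce)
```

The intermediate list `tmp` plays the role of the temporary table written by Job 1 and read back by Job 2.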


MapReduce shuffles the data for us; we only need to emit outputs from the Map phase.

We use ReduceSinkOperator (RS) to emit Map outputs. RSs mark the end of a Map phase.

[Figure: Job 1 Map reads t1 and t2 and ends at RS1 and RS2; Job 1 Reduce runs JOIN and writes tmp. Job 2 Map reads tmp and ends at an RS; Job 2 Reduce runs AGG.]







Intra-query correlations


SELECT x.c1, count(*)
FROM t1 x JOIN t1 y ON (x.c1=y.c1)
GROUP BY x.c1

[Figure: operator tree. t1 as x and t1 as y feed JOIN (x.c1=y.c1), whose output feeds AGG (calculate count(*) for every group of x.c1).]

Correlations:
1. Same input tables
2. JOIN and AGG using the same key




SELECT p.c1, q.cnt
FROM (SELECT x.c1 AS c1 FROM t1 x JOIN t2 y ON (x.c1=y.c1)) p
JOIN (SELECT z.c1 AS c1, count(*) AS cnt FROM t1 z GROUP BY z.c1) q
ON (p.c1=q.c1)

[Figure: operator tree. t1 as x and t2 as y feed JOIN1 (x.c1=y.c1); t1 as z feeds AGG1 (calculate count(*) for every group of z.c1); JOIN1 and AGG1 feed JOIN2 (p.c1=q.c1).]

Correlations:
1. Same input tables (t1)
2. JOIN1 and AGG1 using the same key
3. JOIN2 and all of its parents using the same key



• Defined in “YSmart: Yet Another SQL-to-MapReduce Translator”
  – http://ysmart.cse.ohio-state.edu/
  – http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
• Targets operators that need to shuffle data, and the inputs of those operators
• Three kinds of correlations
  – Input correlation (IC): independent operators share the same input tables
  – Transit correlation (TC): independent operators have input correlation and also shuffle the data in the same way (e.g. using the same keys)
  – Job flow correlation (JFC): two dependent operators shuffle the data in the same way
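The three definitions above can be restated as a small classifier. This is a minimal sketch, not the YSmart or Hive implementation: the `Op` record, its fields, and the assumption that shuffle keys have already been normalized to a comparable name (so x.c1 and z.c1 both appear as "c1") are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    tables: frozenset      # base tables read directly by this operator
    shuffle_key: str       # normalized shuffle key, e.g. "c1"
    parents: tuple = ()    # names of the operators feeding this one

def correlations(a, b):
    """Return the kinds of correlation between two operators."""
    found = set()
    dependent = a.name in b.parents or b.name in a.parents
    if dependent:
        # Job flow correlation: dependent operators, same shuffle.
        if a.shuffle_key == b.shuffle_key:
            found.add("JFC")
    elif a.tables & b.tables:
        # Input correlation: independent operators sharing an input table.
        found.add("IC")
        # Transit correlation: input correlation plus the same shuffle.
        if a.shuffle_key == b.shuffle_key:
            found.add("TC")
    return found

join1 = Op("JOIN1", frozenset({"t1", "t2"}), "c1")
agg1 = Op("AGG1", frozenset({"t1"}), "c1")
join2 = Op("JOIN2", frozenset(), "c1", parents=("JOIN1", "AGG1"))

ic_tc = correlations(join1, agg1)   # independent, both read t1, same key
jfc = correlations(join1, join2)    # JOIN2 consumes JOIN1, same key
```

Here JOIN1, AGG1, and JOIN2 are modeled on the second example query; only direct parent links are treated as dependence in this sketch.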


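[Figure: three examples. IC: JOIN1 (t1 as x, t2 as y) and AGG1 (t1 as z) both read t1. TC: JOIN1 (x.c1=y.c1) and AGG1 (group by z.c1) both read t1 and shuffle on the same key. JFC: JOIN (x.c1=y.c1) feeds AGG (group by z.c1), and both shuffle on the same key.]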


Correlation-unaware query planning


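[Figure: t1 is read twice and shuffled on c1 for JOIN, then the JOIN output is shuffled on c1 again for AGG. As MapReduce jobs: Job 1 reads t1 twice, shuffles on c1, runs JOIN, and writes tmp; Job 2 reads tmp, shuffles on c1, and runs AGG.]

Hive does not care:
1. If a table has been used multiple times
2. If data really needs to be shuffled

Drawbacks:
1. Unnecessary data loading
2. Unnecessary data shuffling
3. Unnecessary data materialization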
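To make the drawbacks concrete, the sketch below evaluates the self-join query (x JOIN y on c1, then count(*) per x.c1) with one scan of t1 and one grouping on c1, and no temporary table. It is an illustration under assumptions, not Hive's plan: the function name and the representation of t1 as a list of c1 values are hypothetical.

```python
from collections import defaultdict

def self_join_count(t1_c1_values):
    """Scan t1 once, group once on c1, then join and count inside each group."""
    groups = defaultdict(list)
    for c1 in t1_c1_values:        # a single scan serves both aliases x and y
        groups[c1].append(c1)      # a single "shuffle": group rows by c1
    result = {}
    for c1, rows in groups.items():
        # JOIN: within one c1 group, every x row matches every y row.
        joined = [(x, y) for x in rows for y in rows]
        # AGG: count(*) for this group, computed without reshuffling.
        result[c1] = len(joined)
    return result

counts = self_join_count([1, 1, 2])
```

Because the JOIN key and the GROUP BY key are the same column, every row needed for one output group is already in the same place after the first grouping.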







Case studies: TPC-H Q17 (Flattened)

SELECT sum(l_extendedprice) / 7.0 as avg_yearly
FROM (SELECT l_partkey, l_quantity, l_extendedprice
      FROM lineitem JOIN part ON (p_partkey=l_partkey)
      WHERE p_brand='Brand#35' AND
            p_container = 'MED PKG') touter
JOIN (SELECT l_partkey as lp, 0.2 * avg(l_quantity) as lq
      FROM lineitem
      GROUP BY l_partkey) tinner
ON (touter.l_partkey = tinner.lp)
WHERE touter.l_quantity < tinner.lq





[Figure: operator tree. lineitem and part feed JOIN1; lineitem feeds AGG1; JOIN1 and AGG1 feed JOIN2, whose output feeds AGG2.]

lineitem is used by JOIN1 and AGG1.
JOIN1, AGG1, and JOIN2 share the same key.
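Since JOIN1, AGG1, and JOIN2 all use the part key, the per-key work of Q17 can in principle be done after a single grouping on that key, leaving only the global sum (AGG2). The sketch below illustrates this idea on in-memory tuples. The function name, the tuple layouts, and the sample rows are assumptions, and the code assumes p_partkey is unique in part; it is not the plan Hive generates.

```python
from collections import defaultdict

def q17_single_shuffle(lineitem, part):
    """lineitem rows: (l_partkey, l_quantity, l_extendedprice).
    part rows: (p_partkey, p_brand, p_container), p_partkey assumed unique."""
    groups = defaultdict(lambda: {"items": [], "part_ok": False})
    for row in lineitem:                       # lineitem is scanned once
        groups[row[0]]["items"].append(row)
    for p_partkey, p_brand, p_container in part:
        if p_brand == "Brand#35" and p_container == "MED PKG":
            groups[p_partkey]["part_ok"] = True
    total = 0.0
    for group in groups.values():
        items = group["items"]
        if not group["part_ok"] or not items:  # JOIN1 finds no match
            continue
        # AGG1: 0.2 * avg(l_quantity) for this part key.
        lq = 0.2 * sum(q for _, q, _ in items) / len(items)
        # JOIN2 plus the filter l_quantity < lq, on the same group.
        total += sum(price for _, q, price in items if q < lq)
    return total / 7.0                         # AGG2: global sum / 7.0

lineitem = [(1, 1, 70.0), (1, 19, 10.0), (2, 1, 5.0), (2, 19, 5.0)]
part = [(1, "Brand#35", "MED PKG"), (2, "Brand#11", "MED PKG")]
avg_yearly = q17_single_shuffle(lineitem, part)
```

For part key 1 the average quantity is 10, so the threshold is 2 and only the row with quantity 1 contributes its price of 70.0.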




[Figure: the Q17 operator tree divided into MapReduce jobs. Without Correlation Optimizer: 4 jobs (Job 1 to Job 4), with lineitem read twice. With Correlation Optimizer: 2 jobs (Job 1 and Job 2), with lineitem shown once.]


Case studies: TPC-DS Q95 (Flattened)

SELECT count(distinct ws1.ws_order_number) as order_count,
       sum(ws1.ws_ext_ship_cost) as total_shipping_cost,
       sum(ws1.ws_net_profit) as total_net_profit
FROM web_sales ws1
JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
                FROM web_sales ws2 JOIN web_sales ws3
                ON (ws2.ws_order_number = ws3.ws_order_number)
                WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk) ws_wh1
ON (ws1.ws_order_number = ws_wh1.ws_order_number)
LEFT SEMI JOIN (SELECT wr_order_number
                FROM web_returns wr
                JOIN (SELECT ws4.ws_order_number as ws_order_number
                      FROM web_sales ws4 JOIN web_sales ws5
                      ON (ws4.ws_order_number = ws5.ws_order_number)
                      WHERE ws4.ws_warehouse_sk <> ws5.ws_warehouse_sk) ws_wh2
                ON (wr.wr_order_number = ws_wh2.ws_order_number)) tmp1
ON (ws1.ws_order_number = tmp1.wr_order_number)
WHERE d.d_date >= '2001-05-01' AND
      d.d_date <= '2001-06-30' AND
      ca.ca_state = 'NC' AND
      s.web_company_name = 'pri'



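[Figure: operator tree for Q95. web_sales, customer_address, web_site, and date_dim feed MapJoin. Two JOIN1 operators each join web_sales with web_sales; one of them, together with web_returns, feeds JOIN2. MapJoin, the other JOIN1, and JOIN2 feed SemiJoin, whose output feeds AGG.]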



Without Correlation Optimizer
• 6 MapReduce jobs
• Unnecessary data loading (black web_sales nodes)
• Unnecessary data shuffling

[Figure: the Q95 operator tree divided into Job 1 to Job 6.]




With Correlation Optimizer
• Black web_sales nodes share the same data loading
• 3 MapReduce jobs

[Figure: the Q95 operator tree divided into Job 1 to Job 3; the web_sales input of the two JOIN1 operators is shown once.]




Follow-up work
• Evaluate JOIN1 only once without materializing a temporary table
• Only use 2 MapReduce jobs

[Figure: the Q95 operator tree with a single JOIN1, divided into Job 1 and Job 2.]







Objectives

• Eliminate unnecessary data loading
  – Query planner will be aware of what data will be loaded
  – Do as many things as possible for loaded data
• Eliminate unnecessary data shuffling
  – Query planner will be aware of when data really needs to be shuffled
  – Do as many things as possible before shuffling the data again



ReduceSink Deduplication

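• HIVE-2340
• Handles chained Job Flow Correlations
  – e.g. Generating a single job for both Group By and Order By
• Cannot handle complex patterns
  – e.g. Patterns involving multiple Joins
• Need a fundamental solution
• Need to exploit shared input tables

[Figure: before deduplication, t1 feeds RS1, then AGG1, then RS2, then the rest of the plan; after deduplication, t1 feeds RS1, then AGG1, then the rest of the plan, with RS2 removed.]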


Correlation Optimizer

• 2-phase optimizer
  – Phase 1: Correlation Detection
  – Phase 2: Query plan tree transformation
• This work is not just about the optimizer
  – New operators to support the execution of an optimized plan
  – A mechanism to coordinate the operator tree inside the Reduce phase



Correlation detection



1. Traverse the tree all the way down to find matching keys in ReduceSinkOperators

2. Then, check input tables to find shared data loading opportunities

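[Figure: plan tree with ReduceSinkOperators. t1 as x feeds RS1 (Key: x.c1) and t2 as y feeds RS2 (Key: y.c1), and both feed JOIN1. t1 as z feeds RS3 (Key: z.c1), which feeds AGG1. JOIN1 feeds RS4 (Key: p.c1) and AGG1 feeds RS5 (Key: q.c1), and both feed JOIN2.]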
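The two detection steps can be sketched on a toy version of this plan tree. This is a simplified illustration, not the HIVE-2206 code: the `Node` class and helper names are hypothetical, and the sketch assumes key lineage has already been resolved so that x.c1, y.c1, z.c1, p.c1, and q.c1 all carry the same normalized key "c1".

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str                    # "TABLE", "RS", "JOIN" or "AGG"
    key: str = ""                # for RS: normalized shuffle key
    table: str = ""              # for TABLE: base table name
    children: list = field(default_factory=list)   # inputs of this node

def correlated_below(top):
    """Step 1: walk down from `top` and collect RSs with a matching key."""
    matches, stack = [], list(top.children)
    while stack:
        node = stack.pop()
        if node.kind == "RS":
            if node.key != top.key:
                continue         # different shuffle: stop on this branch
            matches.append(node)
        stack.extend(node.children)
    return matches

def tables_under(node):
    """Base tables read below a node."""
    if node.kind == "TABLE":
        return [node.table]
    return [t for c in node.children for t in tables_under(c)]

rs1 = Node("RS1", "RS", key="c1", children=[Node("x", "TABLE", table="t1")])
rs2 = Node("RS2", "RS", key="c1", children=[Node("y", "TABLE", table="t2")])
rs3 = Node("RS3", "RS", key="c1", children=[Node("z", "TABLE", table="t1")])
join1 = Node("JOIN1", "JOIN", children=[rs1, rs2])
agg1 = Node("AGG1", "AGG", children=[rs3])
rs4 = Node("RS4", "RS", key="c1", children=[join1])
rs5 = Node("RS5", "RS", key="c1", children=[agg1])
join2 = Node("JOIN2", "JOIN", children=[rs4, rs5])

# Step 1: matching keys below the top-level ReduceSinkOperators.
below_rs4 = sorted(n.name for n in correlated_below(rs4))
below_rs5 = sorted(n.name for n in correlated_below(rs5))

# Step 2: input tables read more than once by the correlated RSs.
bottom = correlated_below(rs4) + correlated_below(rs5)
reads = Counter(t for r in bottom for t in tables_under(r))
shared_tables = {t for t, n in reads.items() if n > 1}
```

In this sketch RS1 and RS2 are found below RS4, RS3 is found below RS5, and t1 is reported as the shared input.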


Query plan tree transformation



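[Figure: before the transformation, the plan tree has RS1 (Key: x.c1), RS2 (Key: y.c1), and RS3 (Key: z.c1) below JOIN1 and AGG1, and RS4 (Key: p.c1) and RS5 (Key: q.c1) below JOIN2. After the transformation, t1 is read once as both x and z, t2 is read as y, only RS1, RS2, and RS3 remain, and JOIN1, AGG1, and JOIN2 follow them.]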
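One part of the transformation, removing ReduceSinkOperators that are no longer needed, can be sketched as a tree rewrite. The code below is a hypothetical illustration: the `Node` class, `remove_rs`, and `shuffle_depth` are names invented here, and the sketch does not merge the two scans of t1 or add the new operators the optimized plan needs.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    is_rs: bool = False
    children: list = field(default_factory=list)   # inputs of this node

def remove_rs(node, unnecessary):
    """Return a copy of the tree with the named RS nodes spliced out."""
    new_children = []
    for child in node.children:
        rebuilt = remove_rs(child, unnecessary)
        if rebuilt.is_rs and rebuilt.name in unnecessary:
            new_children.extend(rebuilt.children)   # connect its inputs directly
        else:
            new_children.append(rebuilt)
    return Node(node.name, node.is_rs, new_children)

def shuffle_depth(node):
    """Largest number of RS nodes on any path from this node to a leaf."""
    below = max((shuffle_depth(c) for c in node.children), default=0)
    return below + (1 if node.is_rs else 0)

rs1 = Node("RS1", True, [Node("t1 as x")])
rs2 = Node("RS2", True, [Node("t2 as y")])
rs3 = Node("RS3", True, [Node("t1 as z")])
join1 = Node("JOIN1", children=[rs1, rs2])
agg1 = Node("AGG1", children=[rs3])
plan = Node("JOIN2", children=[Node("RS4", True, [join1]),
                               Node("RS5", True, [agg1])])

optimized = remove_rs(plan, {"RS4", "RS5"})
before, after = shuffle_depth(plan), shuffle_depth(optimized)
```

With RS4 and RS5 removed, JOIN2 consumes JOIN1 and AGG1 directly, and every path through the tree crosses one shuffle instead of two.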


Thanks
