
Strata Stinger Talk October 2013

Jan 27, 2015


alanfgates

Slides from the talk Apache Hive and Stinger, Petabyte scale SQL in Hadoop presented by Arun Murthy, Alan Gates, and Owen O'Malley
Transcript
Page 1: Strata Stinger Talk October 2013

© Hortonworks Inc. 2013.

Apache Hive and Stinger: SQL in Hadoop

Arun Murthy (@acmurthy) Alan Gates (@alanfgates) Owen O’Malley (@owen_omalley) @hortonworks

Page 2: Strata Stinger Talk October 2013

YARN: Taking Hadoop Beyond Batch

Applications Run Natively IN Hadoop

• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• ONLINE (HBase)
• OTHER (Search) (Weave…)

YARN (Cluster Resource Management)

HDFS2 (Redundant, Reliable Storage)

Store ALL DATA in one place…

Interact with that data in MULTIPLE WAYS

with Predictable Performance and Quality of Service

Page 3: Strata Stinger Talk October 2013

Hadoop Beyond Batch with YARN

HADOOP 1: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)

HADOOP 2: Multi Use Data Platform (Batch, Interactive, Online, Streaming, …)
• MapReduce (batch), Tez (interactive), Others (varied)
• YARN (operating system: cluster resource management)
• HDFS2 (redundant, reliable storage)

A shift from the old to the new…

Page 4: Strata Stinger Talk October 2013

Apache Tez (“Speed”)

• Replaces MapReduce as primitive for Pig, Hive, Cascading etc.
 – Lower latency for interactive queries
 – Higher throughput for batch queries
 – 22 contributors: Hortonworks (13), Facebook, Twitter, Yahoo, Microsoft

YARN ApplicationMaster to run DAG of Tez Tasks

Task with pluggable Input, Processor and Output

Tez Task - <Input, Processor, Output>

[Diagram: Task = Input → Processor → Output]
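
The <Input, Processor, Output> idea can be sketched in a few lines of Python. This is only an illustration of the pluggable shape; the class and field names (`Task`, `read_input`, `process`, `write_output`) are hypothetical and are not the Tez Java API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Task:
    """Hypothetical stand-in for a Tez task: three pluggable parts."""
    read_input: Callable[[], Iterable]        # pluggable Input
    process: Callable[[Iterable], Iterable]   # pluggable Processor
    write_output: Callable[[Iterable], None]  # pluggable Output

    def run(self) -> None:
        # Pull rows from the input, transform them, hand them to the output.
        self.write_output(self.process(self.read_input()))


sink: List[str] = []
task = Task(
    read_input=lambda: ["b", "a", "c"],                    # toy in-memory input
    process=lambda rows: (r.upper() for r in rows),        # toy processor
    write_output=lambda rows: sink.extend(sorted(rows)),   # toy "sorted output"
)
task.run()
print(sink)
```

Swapping any one of the three callables changes the task's role without touching the other two, which is the point of the pluggable design.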

Page 5: Strata Stinger Talk October 2013

Tez: Building blocks for scalable data processing

• Classical ‘Map’: HDFS Input → Map Processor → Sorted Output
• Classical ‘Reduce’: Shuffle Input → Reduce Processor → HDFS Output
• Intermediate ‘Reduce’ for Map-Reduce-Reduce: Shuffle Input → Reduce Processor → Sorted Output
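
A minimal in-memory sketch of the Map-Reduce-Reduce pattern, assuming toy stand-ins for the building blocks: `sorted_output` and `shuffle_input` are hypothetical helper names, and Python lists replace HDFS and the network shuffle. The intermediate reduce emits sorted output that a second reduce consumes directly, with no map stage in between.

```python
from itertools import groupby


def sorted_output(pairs):
    """Sorted Output: emit (key, value) pairs ordered by key."""
    return sorted(pairs, key=lambda kv: kv[0])


def shuffle_input(pairs):
    """Shuffle Input: group key-sorted pairs for a reduce processor."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, [value for _, value in group]


lines = ["a b a", "b c", "a"]  # stands in for HDFS Input

# Classical map: emit (word, 1), sorted by word.
mapped = sorted_output((word, 1) for line in lines for word in line.split())

# Intermediate reduce: count each word, re-key by count, keep output sorted.
counted = sorted_output((sum(ones), word) for word, ones in shuffle_input(mapped))

# Final reduce: collect the words that share a count (stands in for HDFS Output).
result = {count: sorted(words) for count, words in shuffle_input(counted)}
print(result)
```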

Page 6: Strata Stinger Talk October 2013

Hive-on-MR vs. Hive-on-Tez

SELECT a.x, AVG(b.y) AS avg
FROM a JOIN b ON (a.id = b.id)
GROUP BY a.x
UNION
SELECT x, AVG(y) AS avg
FROM c
GROUP BY x
ORDER BY avg;

[Diagram, Hive – MR: the plan runs as a chain of separate map (M) and reduce (R) jobs, and each job writes its output to HDFS before the next job reads it. Stage labels: SELECT a.state; SELECT c.price; SELECT b.id; JOIN (a, c); JOIN (a, b) GROUP BY a.state; COUNT(*), AVERAGE(c.price).]

[Diagram, Hive – Tez: the same plan runs as one DAG of M and R tasks, with reduce stages feeding reduce stages directly. Stage labels: SELECT a.state, c.itemId; SELECT b.id; JOIN (a, c); JOIN (a, b) GROUP BY a.state; COUNT(*), AVERAGE(c.price).]

Tez avoids unneeded writes to HDFS
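
A rough way to see the difference is to count how many times a plan persists data to HDFS under each engine. The sketch below is a deliberately simplified model, not how Hive plans queries: the stage names in `plan` are hypothetical, every stage is assumed to be its own MapReduce job, and a Tez DAG is assumed to write only at its sink stages.

```python
# Hypothetical plan: stage name -> list of downstream stages.
plan = {
    "join_a_b": ["group_by"],
    "group_by": ["union"],
    "scan_c_group": ["union"],
    "union": ["order_by"],
    "order_by": [],
}


def hdfs_writes_mapreduce(plan):
    # Assumption: each stage is a separate MapReduce job,
    # and every job persists its output to HDFS.
    return len(plan)


def hdfs_writes_tez(plan):
    # Assumption: the whole plan is one Tez DAG, so only stages
    # with no downstream consumer write to HDFS.
    return sum(1 for downstream in plan.values() if not downstream)


print(hdfs_writes_mapreduce(plan), hdfs_writes_tez(plan))
```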

Page 7: Strata Stinger Talk October 2013

Tez Sessions

… because Map/Reduce query startup is expensive

• Tez Sessions
 – Hot containers ready for immediate use
 – Removes task and job launch overhead (~5s – 30s)
• Hive
 – Session launch/shutdown in background (seamless, user not aware)
 – Submits query plan directly to Tez Session

Native Hadoop service, not ad-hoc
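
The effect of a warm session can be sketched as simple arithmetic. The query times below are made up, the 5-second overhead is taken from the low end of the slide's ~5s to 30s range, and the model assumes a warm session removes the launch cost entirely.

```python
def total_latency(query_secs, launch_overhead, use_session):
    """Sum wall-clock time for queries run one after another."""
    if use_session:
        # Assumption: containers are already hot, so no query pays launch cost.
        return sum(query_secs)
    # Without a session, every query pays the job/task launch overhead.
    return sum(q + launch_overhead for q in query_secs)


queries = [2.0, 3.0, 1.5]  # hypothetical per-query execution times in seconds
cold = total_latency(queries, launch_overhead=5.0, use_session=False)
warm = total_latency(queries, launch_overhead=5.0, use_session=True)
print(cold, warm)
```

For short interactive queries the launch overhead dominates, which is why removing it matters more than for long batch jobs.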

Page 8: Strata Stinger Talk October 2013

Tez Delivers Interactive Query - Out of the Box!

Feature: Description (Benefit)
• Tez Session: Overcomes Map-Reduce job-launch latency by pre-launching Tez AppMaster. (Latency)
• Tez Container Pre-Launch: Overcomes Map-Reduce latency by pre-launching hot containers ready to serve queries. (Latency)
• Tez Container Re-Use: Finished maps and reduces pick up more work rather than exiting. Reduces latency and eliminates difficult split-size tuning. Out of box performance! (Latency)
• Runtime re-configuration of DAG: Runtime query tuning by picking aggregation parallelism using online query statistics. (Throughput)
• Tez In-Memory Cache: Hot data kept in RAM for fast access. (Latency)
• Complex DAGs: Tez Broadcast Edge and Map-Reduce-Reduce pattern improve query scale and throughput. (Throughput)
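
Picking aggregation parallelism from online statistics could look roughly like the sketch below. The function name `pick_reducers`, the 256 MB per-reducer target, and the cap of 100 are all assumptions made for illustration; the slides do not say how Tez actually makes this choice.

```python
import math


def pick_reducers(observed_map_output_bytes,
                  bytes_per_reducer=256 * 1024**2,  # assumed target per reducer
                  max_reducers=100):                # assumed upper bound
    """Choose reducer count after the map stage reports its real output size."""
    needed = math.ceil(observed_map_output_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, needed))


# 10 GiB of observed map output at 256 MiB per reducer.
print(pick_reducers(10 * 1024**3))
```

The design point is that the decision uses sizes measured while the query runs, rather than an estimate fixed at compile time.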

Page 9: Strata Stinger Talk October 2013

Stinger Project (announced February 2013)

Batch AND Interactive SQL-IN-Hadoop

Stinger Initiative: A broad, community-based effort to drive the next generation of HIVE

Goals:
• Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
• Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
• SQL: Support broadest range of SQL semantics for analytic applications running against Hadoop
…all IN Hadoop

Hive 0.11, May 2013:
• Base Optimizations
• SQL Analytic Functions
• ORCFile, Modern File Format

Hive 0.12, October 2013:
• VARCHAR, DATE Types
• ORCFile predicate pushdown
• Advanced Optimizations
• Performance Boosts via YARN

Coming Soon:
• Hive on Apache Tez
• Query Service
• Buffer Cache
• Cost Based Optimizer (Optiq)
• Vectorized Processing

Page 10: Strata Stinger Talk October 2013

Hive 0.12

Release Theme: Speed, Scale and SQL

Specific Features:
• 10x faster query launch when using large number (500+) of partitions
• ORCFile predicate pushdown speeds queries
• Evaluate LIMIT on the map side
• Parallel ORDER BY
• New query optimizer
• Introduces VARCHAR and DATE datatypes
• GROUP BY on structs or unions

Included Components: Apache Hive 0.12

Page 11: Strata Stinger Talk October 2013

SPEED: Increasing Hive Performance

Performance Improvements included in Hive 0.12
 – Base & advanced query optimization
 – Startup time improvement
 – Join optimizations

Interactive Query Times across ALL use cases
• Simple and advanced queries in seconds
• Integrates seamlessly with existing tools
• Currently a >100x improvement in just nine months

Page 12: Strata Stinger Talk October 2013

Stinger Phase 3: Interactive Query In Hadoop

• TPC-DS Query 27 (Pricing Analytics using Star Schema Join): Hive 0.10: 1400s; Hive 0.11 (Phase 1): 39s; Trunk (Phase 3): 7.2s. 190x Improvement
• TPC-DS Query 82 (Inventory Analytics Joining 2 Large Fact Tables): Hive 0.10: 3200s; Hive 0.11 (Phase 1): 65s; Trunk (Phase 3): 14.9s. 200x Improvement

All Results at Scale Factor 200 (Approximately 200GB Data)
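
The improvement labels can be checked with a short calculation. It assumes the three timings per query belong, in order, to Hive 0.10, Hive 0.11 and trunk; the dictionary keys are illustrative names only.

```python
# Timings in seconds as read from the chart (version assignment is assumed).
timings = {
    "TPC-DS Query 27": {"hive_0_10": 1400.0, "hive_0_11": 39.0, "trunk": 7.2},
    "TPC-DS Query 82": {"hive_0_10": 3200.0, "hive_0_11": 65.0, "trunk": 14.9},
}

# Speedup of trunk over Hive 0.10, rounded to one decimal place.
speedups = {
    query: round(t["hive_0_10"] / t["trunk"], 1)
    for query, t in timings.items()
}
print(speedups)
```

Under that assumption the ratios come out at roughly 194x and 215x, so the slide's 190x and 200x labels appear to be rounded down.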

Page 13: Strata Stinger Talk October 2013

Speed: Delivering Interactive Query

Query Time in Seconds
• TPC-DS Query 52 (Star Schema Join): Hive 0.12: 41.1s; Trunk (Phase 3): 4.2s
• TPC-DS Query 55 (Star Schema Join): Hive 0.12: 39.8s; Trunk (Phase 3): 4.1s

Test Cluster:
• 200 GB Data (Impala: Parquet; Hive: ORCFile)
• 20 Nodes, 24GB RAM each, 6x disk each

Page 14: Strata Stinger Talk October 2013

Speed: Delivering Interactive Query

Query Time in Seconds
• TPC-DS Query 28 (Vectorization): Hive 0.12: 22s; Trunk (Phase 3): 9.8s
• TPC-DS Query 12 (Complex join, M-R-R pattern): Hive 0.12: 31s; Trunk (Phase 3): 6.7s

Test Cluster:
• 200 GB Data (Impala: Parquet; Hive: ORCFile)
• 20 Nodes, 24GB RAM each, 6x disk each

Page 15: Strata Stinger Talk October 2013

AMPLab Big Data Benchmark

AMPLab Query 1: Simple Filter Query

Query Time in Seconds (lower is better)
• AMPLab Query 1a: Hive 0.10 (5 node EC2): 45s; Trunk (Phase 3): 1.6s
• AMPLab Query 1b: Hive 0.10 (5 node EC2): 63s; Trunk (Phase 3): 2.3s
• AMPLab Query 1c: Hive 0.10 (5 node EC2): 63s; Trunk (Phase 3): 9.4s

Stinger Phase 3 Cluster Configuration:
• AMPLab Data Set (~135 GB Data)
• 20 Nodes, 24GB RAM each, 6x Disk each

Page 16: Strata Stinger Talk October 2013

AMPLab Big Data Benchmark

AMPLab Query 2: Group By IP Block and Aggregate

Query Time in Seconds (lower is better)
• AMPLab Query 2a: Hive 0.10 (5 node EC2): 466s; Trunk (Phase 3): 104.3s
• AMPLab Query 2b: Hive 0.10 (5 node EC2): 490s; Trunk (Phase 3): 118.3s
• AMPLab Query 2c: Hive 0.10 (5 node EC2): 552s; Trunk (Phase 3): 172.7s

Stinger Phase 3 Cluster Configuration:
• AMPLab Data Set (~135 GB Data)
• 20 Nodes, 24GB RAM each, 6x Disk each

Page 17: Strata Stinger Talk October 2013

AMPLab Big Data Benchmark

AMPLab Query 3: Correlate Page Rankings and Revenues Across Time

Query Time in Seconds (lower is better)
• AMPLab Query 3a: Hive 0.10 (5 node EC2): 466s; Trunk (Phase 3): 40s
• AMPLab Query 3b: Hive 0.10 (5 node EC2): 490s; Trunk (Phase 3): 145s

Stinger Phase 3 Cluster Configuration:
• AMPLab Data Set (~135 GB Data)
• 20 Nodes, 24GB RAM each, 6x Disk each

Page 18: Strata Stinger Talk October 2013

How Stinger Phase 3 Delivers Interactive Query

Feature: Description (Benefit)
• Tez Integration: Tez is a significantly better engine than MapReduce. (Latency)
• Vectorized Query: Takes advantage of modern hardware by processing thousand-row blocks rather than a row at a time. (Throughput)
• Query Planner: Uses the extensive statistics now available in the Metastore to better plan and optimize queries, including predicate pushdown during compilation to eliminate portions of input (beyond partition pruning). (Latency)
• Cost Based Optimizer (Optiq): Join re-ordering and other optimizations based on column statistics including histograms etc. (Latency)

Page 19: Strata Stinger Talk October 2013

SQL: Enhancing SQL Semantics

SQL Compliance: Hive 0.12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop

Hive SQL Datatypes
• INT
• TINYINT/SMALLINT/BIGINT
• BOOLEAN
• FLOAT
• DOUBLE
• STRING
• TIMESTAMP
• BINARY
• DECIMAL
• ARRAY, MAP, STRUCT, UNION
• DATE
• VARCHAR
• CHAR

Hive SQL Semantics
• SELECT, INSERT
• GROUP BY, ORDER BY, SORT BY
• JOIN on explicit join key
• Inner, outer, cross and semi joins
• Sub-queries in FROM clause
• ROLLUP and CUBE
• UNION
• Windowing Functions (OVER, RANK, etc)
• Custom Java UDFs
• Standard Aggregation (SUM, AVG, etc.)
• Advanced UDFs (ngram, Xpath, URL)
• Sub-queries in WHERE, HAVING
• Expanded JOIN Syntax
• SQL Compliant Security (GRANT, etc.)
• INSERT/UPDATE/DELETE (ACID)

[Legend: Available / Hive 0.12 / Roadmap. The color coding that marked each entry is not preserved in this transcript.]

Page 20: Strata Stinger Talk October 2013

ORC File Format

• Columnar format for complex data types
• Built into Hive from 0.11
• Support for Pig and MapReduce via HCat
• Two levels of compression
 – Lightweight type-specific and generic
• Built in indexes
 – Every 10,000 rows with position information
 – Min, Max, Sum, Count of each column
 – Supports seek to row number
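
The built-in index is what lets a reader skip data: if a row group's min/max range cannot satisfy a filter, the group is never read. Below is a toy sketch of that idea, keeping only min and max per 10,000 rows. `IndexEntry`, `build_index` and `row_groups_to_read` are hypothetical names, and a Python list stands in for an ORC column.

```python
from dataclasses import dataclass
from typing import List

ROWS_PER_INDEX_ENTRY = 10_000  # from the slide: one index entry every 10,000 rows


@dataclass
class IndexEntry:
    first_row: int   # row number where this group starts (enables seek)
    min_value: int
    max_value: int


def build_index(column: List[int]) -> List[IndexEntry]:
    """Record min and max for each consecutive group of rows."""
    entries = []
    for start in range(0, len(column), ROWS_PER_INDEX_ENTRY):
        chunk = column[start:start + ROWS_PER_INDEX_ENTRY]
        entries.append(IndexEntry(start, min(chunk), max(chunk)))
    return entries


def row_groups_to_read(index: List[IndexEntry], lo: int, hi: int) -> List[int]:
    """Return first_row of each group whose [min, max] overlaps lo <= x <= hi."""
    return [e.first_row for e in index if e.max_value >= lo and e.min_value <= hi]


column = list(range(35_000))  # toy sorted column: value equals row number
index = build_index(column)
print(row_groups_to_read(index, 12_000, 21_000))
```

In this toy case only two of the four row groups overlap the filter range, so the other two would be skipped without being decompressed.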


Page 21: Strata Stinger Talk October 2013

SCALE: Interactive Query at Petabyte Scale

Sustained Query Times: Apache Hive 0.12 provides sustained acceptable query times even at petabyte scale

Smaller Footprint: Better encoding with ORC in Apache Hive 0.12 reduces resource requirements for your cluster

File Size Comparison Across Encoding Methods (Dataset: TPC-DS Scale 500 Dataset)
• Encoded with Text: 585 GB (Original Size)
• Encoded with RCFile: 505 GB (14% Smaller)
• Encoded with Parquet (Impala): 221 GB (62% Smaller)
• Encoded with ORCFile (Hive 12): 131 GB (78% Smaller)

• Larger Block Sizes
• Columnar format arranges columns adjacent within the file for compression & fast access
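
The "% smaller" figures follow directly from the file sizes. The short check below assumes the pairing of sizes to encodings is Text 585 GB, RCFile 505 GB, Parquet 221 GB and ORCFile 131 GB, which is a reading of the chart rather than something the transcript states line by line.

```python
original_gb = 585  # text-encoded size, used as the baseline

# Assumed pairing of chart values to encodings.
sizes_gb = {"RCFile": 505, "Parquet": 221, "ORCFile": 131}

# Percentage reduction relative to the text baseline, rounded to whole percent.
reduction = {
    name: round(100 * (1 - gb / original_gb))
    for name, gb in sizes_gb.items()
}
print(reduction)
```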

Page 22: Strata Stinger Talk October 2013

ORC File Format

• Hive 0.12
 – Predicate Push Down
 – Improved run length encoding
 – Adaptive string dictionaries
 – Padding stripes to HDFS block boundaries
• Trunk
 – Stripe-based Input Splits
 – Input Split elimination
 – Vectorized Reader
 – Customized Pig Load and Store functions

Page 23: Strata Stinger Talk October 2013

Vectorized Query Execution

• Designed for Modern Processor Architectures
 – Avoid branching in the inner loop.
 – Make the most use of L1 and L2 cache.
• How It Works
 – Process records in batches of 1,000 rows
 – Generate code from templates to minimize branching.
• What It Gives
 – 30x improvement in rows processed per second.
 – Initial prototype: 100M rows/sec on laptop
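
The contrast between row-at-a-time and batch-at-a-time processing can be sketched as follows. This only shows the shape of the two loops: Hive's vectorized operators are generated Java code, Python will not reproduce the 30x figure, and the function names here are hypothetical.

```python
from array import array

BATCH_SIZE = 1_000  # from the slide: process records in batches of 1,000 rows


def sum_row_at_a_time(rows):
    """Row-oriented loop: one dict lookup and one branch per row."""
    total = 0
    for row in rows:
        if row["price"] is not None:
            total += row["price"]
    return total


def sum_vectorized(price_column):
    """Column-oriented loop: one tight, branch-free operation per batch."""
    total = 0
    for start in range(0, len(price_column), BATCH_SIZE):
        batch = price_column[start:start + BATCH_SIZE]  # contiguous primitive values
        total += sum(batch)
    return total


price_column = array("q", range(2_500))        # toy column of 64-bit integers
rows = [{"price": p} for p in price_column]    # same data in row form

print(sum_row_at_a_time(rows), sum_vectorized(price_column))
```

Both functions return the same total; the vectorized version touches a dense array of one type per batch, which is what makes it friendly to CPU caches and branch prediction in a compiled engine.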


Page 24: Strata Stinger Talk October 2013

HDFS Buffer Cache

• Use memory mapped buffers for zero copy
 – Avoid overhead of going through DataNode
 – Can mlock the block files into RAM
• ORC Reader enhanced for zero-copy reads
 – New compression interfaces in Hadoop
• Vectorization specific reader
 – Read 1000 rows at a time
 – Read into Hive’s internal representation
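
As an analogy for memory-mapped, zero-copy reads, the sketch below maps a local temporary file with Python's standard `mmap` module and slices it through a `memoryview`. It does not use any HDFS API, does not model `mlock`, and the file contents are made up.

```python
import mmap
import os
import tempfile

# Write a toy "block file" to local disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"block-0000" * 1000)
    path = f.name

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    view = memoryview(mapped)     # wraps the mapping; no bytes are copied
    window = view[10:20]          # slicing a memoryview is also copy-free
    first_record = bytes(window)  # the only copy made here: 10 bytes
    window.release()              # views must be released before the map closes
    view.release()

os.remove(path)
print(first_record)
```

The reader works on pages the operating system has already mapped into the process, instead of copying the data through a server process and a socket first.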

Page 25: Strata Stinger Talk October 2013

Next Steps

• Blog: http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/
• Stinger Initiative: http://hortonworks.com/labs/stinger/
• Stinger Beta: HDP-2.1 Beta, December, 2013

Page 26: Strata Stinger Talk October 2013

© Hortonworks Inc. 2013. Confidential and Proprietary.


Thank You!

@acmurthy @alanfgates @owen_omalley @hortonworks