
Row and Column Security for Hive data with Big SQL

Apr 12, 2017


Data & Analytics

Hadoop Dev
Transcript
Page 1: Row and Column Security for Hive data with Big SQL

© 2015 IBM Corporation

October 29, 2015

Access Hive Data

FASTER and more SECURELY

with Big SQL

Page 2

Watch this on YouTube @

www.youtube.com/watch?v=SYQgzRGhqVU

Page 3

SQL on Hadoop Matters for Big Data Analytics

For BI Tools like Cognos

Visualizations from Cognos 10.2.2

Page 4

Hive is Really 3 Things…

Storage Format, Metastore, and Execution Engine

SQL Execution Engine: Hive (open source)

Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…

Hive Metastore (open source)

MapReduce

Applications

Page 5

Map -> Reduce -> Output

Hive “Execution Engine”

SQL

Hive

References the Hive Metastore to understand the data

Translates SQL to MapReduce
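
To make the translation step concrete, here is a minimal Python sketch of how a query such as `SELECT region, SUM(amount) FROM sales GROUP BY region` can be expressed as map, shuffle, and reduce phases. It only illustrates the idea; it is not Hive's planner or generated code. The `sales` table, its columns, and every function name below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a table sales(region, amount); not data from the slides.
rows = [
    ("east", 10.0),
    ("west", 5.0),
    ("east", 2.5),
    ("west", 1.0),
    ("north", 4.0),
]


def map_phase(row):
    """Emit (key, value) pairs: the GROUP BY column becomes the key."""
    region, amount = row
    yield region, amount


def shuffle(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Apply the aggregate (SUM in this sketch) to one key's values."""
    return key, sum(values)


def run_query(table):
    """Run the three phases in order and return {region: total}."""
    mapped = (pair for row in table for pair in map_phase(row))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = run_query(rows)
print(result)
```

On a real cluster the map and reduce steps run in parallel across nodes and intermediate results are written to disk between phases; the sketch runs everything in one process.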

Page 6

Big SQL preserves open source foundation

Leverages Hive metastore and storage formats.

No lock-in. Data is part of Hadoop, not Big SQL. Fall back to the open source Hive engine at any time.

SQL Execution Engines: IBM Big SQL (IBM), Hive (open source)

Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…

Hive Metastore (open source)

Applications

Page 7

Ok… But…

WHY WOULD YOU WANT TO DO THAT?

Page 8

Performance Test – TPC-DS Workload

20-Node (Physical) Cluster

TPC-DS stands for Transaction Processing Performance Council – Decision Support, an industry-standard benchmark workload for SQL.

Hive 1.2.1: IBM Open Platform V4.1, 20 nodes

Big SQL V4.1: IBM Open Platform V4.1, 20 nodes

*Not an official TPC-DS Benchmark.

Page 9

Big SQL V4.1 vs Hive @ 1TB TPC-DS

Page 10

Big SQL V4.1 vs Hive @ 1TB TPC-DS

Page 11

Big SQL V4.1 vs Hive @ 1TB TPC-DS

Page 12

Big SQL V4.1 vs Hive @ 1TB TPC-DS

Page 13

Performance Test Summary

Big SQL V4.1 vs. Hive 1.2.1 @ 1TB

Big SQL was faster in 99 of 99 queries

On average, Big SQL was 21X faster

Excluding the top 5 and bottom 5 results, Big SQL was 19X faster
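
The two averages above can be reproduced from per-query timings roughly as follows. This is a minimal sketch under stated assumptions: the slides do not say how the average was computed, so the code assumes a plain mean of per-query speedup ratios, and a trimmed mean that drops the `trim` largest and `trim` smallest ratios. The function names and the timings are hypothetical, not benchmark results.

```python
from statistics import mean


def speedups(hive_secs, bigsql_secs):
    """Per-query speedup ratio: Hive elapsed time divided by Big SQL elapsed time."""
    return [h / b for h, b in zip(hive_secs, bigsql_secs)]


def trimmed_mean(values, trim=5):
    """Mean after dropping the `trim` smallest and `trim` largest values."""
    if len(values) <= 2 * trim:
        raise ValueError("need more than 2 * trim values")
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


# Made-up elapsed times in seconds for four queries (illustration only).
hive = [100.0, 200.0, 300.0, 400.0]
bigsql = [10.0, 10.0, 10.0, 100.0]

ratios = speedups(hive, bigsql)
overall = mean(ratios)                   # analogous to the "on average" figure
robust = trimmed_mean(ratios, trim=1)    # analogous to "excluding top/bottom"
print(overall, robust)
```

With 99 queries, `trimmed_mean(ratios, trim=5)` would average the middle 89 ratios, which keeps a few extreme queries from dominating the headline number.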

Page 14

Actually, we originally set out to run 10TB, but…

ONLY BIG SQL COULD RUN THE COMPLETE WORKLOAD

Page 15

Performance Test Summary

Big SQL @ 10TB vs. Hive @ 1 TB

How does Big SQL fare when running with 10X the data?

Big SQL was still faster in 89 of 99 queries

On average, Big SQL was still 3.8X faster

Excluding the top 5 and bottom 5 results, Big SQL was still 3.2X faster

Page 16

AND, we’re really good with lots of users…

Clear benefit on workload throughput with workload management (WLM) enabled

Page 17

And, Big SQL makes SQL Access on Hadoop

MORE SECURE

Page 18

Enhanced Security - Good to Know

Role Based Access Control

Row Level Security

Column Level Security

Separation of Duties: Security Administrator, Database Administrator, Workload Manager, Others…
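
As a conceptual illustration of row level and column level security, the Python sketch below filters rows and masks a column based on the roles of the user running the query. It is not Big SQL syntax and not how Big SQL implements these features; the roles, the table, and the rules are all hypothetical and chosen only to show the idea that the same query returns different results for different users.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset
    branch: str


# Hypothetical customer table; the values are made up for illustration.
ROWS = [
    {"branch": "A", "customer": "Ann", "ssn": "123-45-6789"},
    {"branch": "B", "customer": "Bob", "ssn": "987-65-4321"},
]


def row_visible(user, row):
    """Row-level rule: managers see every row, tellers only their own branch."""
    if "MANAGER" in user.roles:
        return True
    return "TELLER" in user.roles and row["branch"] == user.branch


def mask_ssn(user, value):
    """Column-level rule: only managers see the full value."""
    if "MANAGER" in user.roles:
        return value
    return "XXX-XX-" + value[-4:]


def select_all(user, rows):
    """Apply the row rule first, then the column mask, as a SELECT * would see it."""
    return [
        {**row, "ssn": mask_ssn(user, row["ssn"])}
        for row in rows
        if row_visible(user, row)
    ]


teller = User("tina", frozenset({"TELLER"}), "A")
manager = User("mark", frozenset({"MANAGER"}), "B")

print(select_all(teller, ROWS))
print(select_all(manager, ROWS))
```

The point of enforcing such rules in the SQL engine rather than in each application is that every tool that connects sees the same filtered and masked view.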

Page 19

Recap - Big SQL preserves open source foundation

SQL Execution Engines: IBM Big SQL (IBM), Hive (open source)

Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…

Hive Metastore (open source)

Applications

Big SQL Makes Hive

FASTER and more SECURE

Page 20

Comparing

Big SQL @ 10TB vs Hive @ 1TB

Page 21

Big SQL 4.1 @ 10TB vs Hive @ 1TB TPC-DS

Page 22

Big SQL 4.1 @ 10TB vs Hive @ 1TB TPC-DS (cont.)

Page 23

Big SQL 4.1 @ 10TB vs Hive @ 1TB TPC-DS (cont.)

Page 24

Big SQL 4.1 @ 10TB vs Hive @ 1TB TPC-DS (cont.)