HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Trafodion Enterprise-Class Transactional SQL-on-HBase

Dec 18, 2015


Joshua Cross
Page 1:

Trafodion Enterprise-Class Transactional SQL-on-HBase

Page 2:

Trafodion – Introduction (Welsh for transactions)

Complete: Full-function SQL. Reuse existing SQL skills and improve developer productivity.

Protected: Distributed ACID transactions. Data consistency across multiple rows, tables, and SQL statements.

Efficient: Low-latency R/W transactions. Optimized for real-time transaction processing applications.

Interoperable: Standard ODBC/JDBC access. Works with existing tools and applications.

Data federation: Trafodion/HBase/Hive tables. Enables multiple data model deployment.

Scalable: Elastic scale for high concurrency. Provides elastic scalability as the number of users and the volume of data grow.

Highly Available: For enterprise applications. Leverages HBase / Hadoop replication.

Open: Hadoop and Linux distribution neutral. Easy to add to existing infrastructure with no vendor lock-in.

Eco-system: Leverages the large Hadoop eco-system. Can use any tool or database accessing Hadoop.

Joint HP Labs & HP-IT project for transactional SQL database capabilities on Hadoop

Hadoop + Transactional SQL

20+ years of database investment open sourced by HP on June 10th 2014!

Page 3:

Hadoop workload profiles

Workload profiles, ordered by response time from sub-second to hours:

Operational
• Real-time analytics
• Transactional SQL = OLTP + interactions

Non-interactive
• Data preparation
• Incremental batch processing
• Dashboards, scorecards

Interactive
• Parameterized reports
• Drilldown visualization
• Exploration

Batch
• Operational batch processing
• Enterprise reports
• Data mining

Current Market Focus: Data Warehousing and Analytics

Exposes Hadoop limitations:
• Operational Optimizations
• Data Integrity
• Workload Management
• Transaction Support
• Real-time Performance

Page 4:

The Case for Operational SQL-on-Hadoop

• Sector Road Map: SQL-on-Hadoop platforms in 2013 – Joseph Turian, March 20, 2013
– An operational database offers write access, not just read access, to data. However, there are other key features for an operational database: concurrency, interactive write speed, and distributed transactional support (guarantees about data consistency). Currently no existing SQL-on-Hadoop solution satisfies these requirements. If a strong player or two emerges in the category, it will completely shake up the big data and database landscape.

• 5 Reasons Hadoop is Kicking Can and Taking Names – Mike Gualtieri, October 22, 2013
– #5 The future of Hadoop is real-time and transactional. The key commercial vendors are focusing on fast SQL access, real-time streaming, and manageability features that enterprises demand. The groundwork is being laid for an eruption in data management technologies as Hadoop sneaks its way into the transactional database market.

• The Future of Hadoop: What Happened & What's Possible? – Doug Cutting, Oct 30 2013
– So I think the prediction we can make here is that it is inevitable that we will see just about every kind of workload be moved to this platform – even Online Transaction Processing.

Page 5:

Operational SQL on Hadoop – Use cases

• Integration of structured, semi-structured, and unstructured support

• Integration of operational, historical, & external (Big) data along common master data for better insights

Structured: Item id, Description, Cost, Price, …

Semi-structured:
TV: Type, Display Size, Resolution, Brand, Model, 3D, …
Book: ISBN, Author, Publish Date, Format, Dept, …

Unstructured: Image, …, Review, …

SELECT all TVs WHERE Price > 2000 and Type = ‘Plasma’ and Display Size > ‘50’ and customer sentiment is very positive

Open distributed HDFS structures: HBase & Hive

Free at last!

Capture data directly into open file structures

Accessible for reporting & analytics with no latency

Page 6:

Asset Management
• Create album
• Upload / Import pictures into album
• Create a project / photo book
• Share album / project with family / friends

Shopping
• Print Calendars, Cards, …
• Order prints, mugs, linen, jewelry, cases, covers, cards, teddy bears, …

Trafodion: OLTP on Hadoop

Snapfish – Web-based photo sharing and photo printing. Members can upload files for free with unlimited photo storage. They can share photo albums, individual photos, and various Snapfish products via email, link URL, and other web services such as Facebook and Blogger. They can buy personalized photo products such as prints, photo books, cards and mugs. Supports retail pickup at Meijer, Walgreens and Walmart.

Versus RDBMS & NoSQL
• High concurrency, low latency workloads
• Limitless elastic scale
• Very low TCO

Page 7:

Trafodion

Create album
INSERT into Trafodion table ALBUM (cust_id, album_id, album_name, …)

Upload pictures
Pictures loaded into HDFS by app
BEGIN WORK
  INSERT list of pictures uploaded into Trafodion table PIC (cust_id, album_id, pic_id, pic_date, …)
  INSERT picture attributes from camera into HBase table PIC_ATTR as col-value pairs for each of the pictures using pic_id
END WORK

Each BEGIN WORK … END WORK block is a transaction.

Tag pictures
BEGIN WORK
  INSERT custom tags for each tagged picture into HBase table PIC_ATTR as col-value pairs
END WORK

Share pictures
INSERT into Trafodion table REL (cust_id, rel_with_cust_id, rel-type, …)
BEGIN WORK
  INSERT list of pictures shared into Trafodion table SHARED_PIC (pic_id, rel_with_cust_id)
END WORK

Order photo mug & jewelry
BEGIN WORK
  INSERT into ORDER (cust_id, order_no, order_date, order_total, …)
  INSERT into ORDER_DETAIL all items that are part of the order (cust_id, order_no, item_id, pic_id, qty, amt, …)
END WORK
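The order step groups two INSERTs so that either both take effect or neither does. The sketch below illustrates that all-or-nothing behavior only; it uses Python's built-in sqlite3 module as a stand-in database, not Trafodion, and sqlite3 spells the boundaries BEGIN / COMMIT / ROLLBACK. The table is named ORDERS because ORDER is a reserved word in SQLite, and the helper names, reduced column lists, and sample values are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in database with the two order tables."""
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE ORDERS (cust_id INTEGER, order_no INTEGER, "
        "order_total REAL, PRIMARY KEY (cust_id, order_no))"
    )
    conn.execute(
        "CREATE TABLE ORDER_DETAIL (cust_id INTEGER, order_no INTEGER, "
        "item_id INTEGER, pic_id INTEGER, qty INTEGER, amt REAL)"
    )
    return conn


def place_order(conn, cust_id, order_no, items):
    """Insert the order header and all detail rows as one atomic unit.

    items is a list of (item_id, pic_id, qty, amt) tuples.
    Returns True if committed, False if rolled back.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        total = sum(qty * amt for _, _, qty, amt in items)
        cur.execute(
            "INSERT INTO ORDERS (cust_id, order_no, order_total) VALUES (?, ?, ?)",
            (cust_id, order_no, total),
        )
        for item_id, pic_id, qty, amt in items:
            if qty <= 0:
                raise ValueError("quantity must be positive")
            cur.execute(
                "INSERT INTO ORDER_DETAIL "
                "(cust_id, order_no, item_id, pic_id, qty, amt) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                (cust_id, order_no, item_id, pic_id, qty, amt),
            )
        cur.execute("COMMIT")
        return True
    except Exception:
        # Undo the header and any detail rows written so far.
        cur.execute("ROLLBACK")
        return False


def count(conn, table):
    """Return the number of rows in one of the two fixed tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

A failed detail row leaves no orphaned order header behind, which is the property the slide attributes to multi-statement, multi-table transactions.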

Search for pictures
SELECT pictures taken with my “Sony DSC-RX100M2” camera in the last 6 months from my “Travel” album with a tag “Emma” on it.

Backend operational workloads
Order tracking, supply chain, inventory control, …

Versus RDBMS & NoSQL
• Rich ANSI SQL RDBMS features
• Full ACID transactional support
• Integration of structured, semi-structured, & unstructured data

Autonomy could be used to analyze the pictures in HDFS to automatically create tags to be stored in HBase PIC_ATTR

Page 8:

Figure labels: Trafodion, Vertica, Web app

Reporting & Analytics via Vertica

Analytics in Vertica to generate recommendation model

Using model & customer score / attributes, and recent purchase history, make recommendations

Example recommendation: Rohit, consider a blanket for your granddaughter at 50% off with her image imprinted on it

BI reporting
• Sales growth by product, region, demo
• Growth in customers, pictures, storage, …
• Growth in sharing
• …

Analytics
• Items bought together – market basket analysis
• Promotion success customer classification
• …

Versus RDBMS & NoSQL
• Data captured in an open file system with open APIs
• Is available with no latency for reporting & analysis
• Via a huge open source & proprietary Hadoop eco-system

Page 9:

Use case examples

Finance
• Online financial management

Telecom
• Billing systems
• Provisioning systems

Manufacturing
• RFID tracking

Energy
• Smart Metering

Healthcare
• Authorization and claims processing

Government
• 911 Emergency System

Transportation
• Reservation systems

Consumer & Retail
• Online shopping

Common characteristics
• Multi-Structured Data
• ACID Protection, Data Integrity
• Low Latency, High Concurrency
• Generates Revenue
• Touches the Customer
• Helps Run the Business

Page 10:

Comprehensive DDL, DML, TCL, and utility support

• ANSI Core SQL 99 compliant + other SQL 99 and SQL 2003 support with Trafodion extensions

• Full featured DDL - CREATE/DROP/ALTER statements for tables, views, indexes, constraints
– Comprehensive data type support - numeric, character, varchar, date, time, interval
– Unicode encoding including UTF8, UCS2, and ISO8859-1 for user data; UTF8 for metadata

• Full featured DML – SELECT, INSERT, UPDATE, DELETE, UPSERT and MERGE statements
– JOIN (INNER, LEFT/RIGHT/FULL OUTER), UNION, WHERE, GROUP BY, HAVING, ORDER BY, SAMPLING, etc.
– Correlated and nested sub-queries
– Cursor support (non-holdable)
– Extensive SQL function support - aggregate, date/time, character, mathematical, OLAP, sequence, etc.

• Utilities – Update Statistics, Explain, Control Query Shape, Command Line Interface

• Transaction Control – BEGIN WORK, COMMIT WORK, ROLLBACK WORK, SET TRANSACTION

• Work-in-progress: Stability, Performance, Triggers, Referential Integrity, C++ UDFs, Java Stored Procedures, Bulk Loader, node/system failure Transaction Recovery, Grant/Revoke, …

Page 11:

Leveraging HBase for scalability and availability

[Figure: a Trafodion client on top of the HBase region server layer. The table's logical view lists cells with columns RK, CF, CN, TS, CV for row keys 1 to 9, column family F, and column names A, B, C, sorted by row key. The physical layout shows the same cells divided among regions.]

• Regions store contiguous ranges of table rows

• Regions dynamically split by HBase when they reach a configured limit i.e. “autosharding”

• Region servers are elastically scalable

• HDFS and HBase replication provide enhanced data availability and protection

Allows

• Fine-grained load balancing with dynamic movement based on load

• Fast data recovery when a server or disk fails or is decommissioned
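The region behavior described above (contiguous row-key ranges, with a split once a region grows past a configured limit) can be sketched with a toy model. This is an illustration only, not HBase code: the class and method names are hypothetical, and the limit here is a row count, whereas HBase's configured limit is a storage size.

```python
import bisect


class RegionMap:
    """Toy model: each region holds a sorted, contiguous range of row keys."""

    def __init__(self, max_rows_per_region):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]               # start key per region; "" sorts lowest
        self.regions = [[]]                  # sorted row keys held by each region

    def region_for(self, row_key):
        """Return the index of the last region whose start key is <= row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        """Store a row key in its owning region, splitting the region if needed."""
        idx = self.region_for(row_key)
        rows = self.regions[idx]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)
        if len(rows) > self.max_rows:
            self._split(idx)

    def _split(self, idx):
        """Split one region at its middle key into two contiguous regions."""
        rows = self.regions[idx]
        mid = len(rows) // 2
        self.regions[idx:idx + 1] = [rows[:mid], rows[mid:]]
        self.start_keys.insert(idx + 1, rows[mid])
```

Because every region covers a contiguous key range, finding the owner of a key is a binary search over the start keys, and a split only touches the one region that grew too large.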

[Figure: five region servers, each running on HDFS.]

Legend
RK: Row Key (the clustering key)
CF: Column Family (“F”)
CN: Column Name (A, B, C)
TS: Timestamp (one version)
CV: Cell Value

Data in different Column Families are stored separately.

Logical table view columns: RK, A, B, C, …
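As a rough sketch of how the logical table view relates to the cell layout in the legend, the functions below flatten one logical row into (RK, CF, CN, TS, CV) tuples and rebuild rows from cells. The function names are hypothetical, and the choice to emit no cell for a missing value is an assumption, made to match the slide's sample rows in which some row keys lack a column.

```python
def row_to_cells(row_key, columns, timestamp, column_family="F"):
    """Flatten one logical row into sorted (RK, CF, CN, TS, CV) cells.

    columns maps column name to value; None values produce no cell (assumption).
    """
    return [
        (row_key, column_family, name, timestamp, value)
        for name, value in sorted(columns.items())
        if value is not None
    ]


def cells_to_rows(cells):
    """Rebuild the logical view: one dict of column values per row key."""
    rows = {}
    for rk, _cf, cn, _ts, cv in cells:
        rows.setdefault(rk, {})[cn] = cv
    return rows
```

For example, `row_to_cells(2, {"A": "x", "B": None, "C": "z"}, 1)` yields two cells, one for column A and one for column C.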

Page 12:

HBase vs. Trafodion comparison

Feature | HBase | Trafodion + HBase
Data abstraction | Key and value pair | Relational schema
Physical layout | Column family store where row data is stored together by cells | Same, except there is a single column family with space-saving column encoding
Column values | Uninterpreted array of bytes | Explicitly defined and enforced data types
ACID guarantee | Single row atomicity | Multiple SQL statements, tables, and rows defined as part of a transaction
Language API | Get/put/delete | SQL (Trafodion invokes native HBase API)
Row key index | Single (string) row key | Composite (multi-column) row key
Secondary indexes | Not supported | Arbitrary secondary key columns
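One way to picture a composite (multi-column) row key stored in a single byte-array row key is an order-preserving encoding. The sketch below is an assumption for illustration and not Trafodion's actual key format, which the slides do not describe: it packs three non-negative 32-bit integers (the column names are borrowed from the PIC table on page 7) as big-endian bytes, so that comparing encoded keys byte by byte gives the same order as comparing the column tuples.

```python
import struct

# Big-endian, three unsigned 32-bit integers: 12 bytes in total.
KEY_FORMAT = ">III"


def encode_composite_key(cust_id, album_id, pic_id):
    """Pack three non-negative 32-bit integers into one 12-byte row key."""
    return struct.pack(KEY_FORMAT, cust_id, album_id, pic_id)


def decode_composite_key(key):
    """Unpack a 12-byte row key back into (cust_id, album_id, pic_id)."""
    return struct.unpack(KEY_FORMAT, key)
```

Because the leading column occupies the leading bytes, all rows for one customer sort next to each other, which is what makes a range scan on a key prefix possible.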

Page 13:

Salting of row keys

How it works
• HBase table gets created, pre-split with one region per salt value
• A hash value column, “_SALT_”, is added as a prefix to the row key
• Salting is transparent to SQL statements
– Automatically computed during insert/update statements
– Predicates automatically generated where feasible
– Minimal overhead for direct lookup by key value

Benefits
• Even data distribution across HBase regions
• Avoids region hotspots caused by insertion of data in row key order

CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;
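To see why a hash prefix spreads sequential keys, here is a minimal sketch for the 4-partition table above. The hash function is an assumption (a Knuth-style multiplicative hash chosen for illustration); the slides do not say which hash Trafodion uses for “_SALT_”, and the function names are hypothetical.

```python
from collections import Counter

NUM_PARTITIONS = 4          # matches SALT USING 4 PARTITIONS
MULTIPLIER = 2654435761     # illustrative multiplicative-hash constant (assumption)


def salt_for(a):
    """Map a primary-key value to a partition number in [0, NUM_PARTITIONS)."""
    h = (a * MULTIPLIER) % 2**32
    return (h * NUM_PARTITIONS) >> 32


def salted_row_key(a):
    """Prefix the key with its salt; rows then sort by (salt, key)."""
    return (salt_for(a), a)


def partition_counts(keys):
    """Count how many of the given keys land in each partition."""
    return Counter(salt_for(k) for k in keys)
```

With this hash, the consecutive keys 1, 2, 3, 4 land in partitions 2, 0, 3, 1, so inserts in key order do not all hit the same region. A direct lookup by key recomputes the salt from the key value, which is why the overhead for keyed access stays small.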

[Figure: INSERT(s) and SELECT(s) are spread across four partitions (PART 1 to PART 4), each an HBase region stored on HDFS.]

Page 14:

Trafodion – Software architecture (3 layers)

Client layer
• User and ISV Operational Applications
• Database Connectivity: JDBC / ODBC driver

SQL layer
• Master
• CMP: Compiler and Optimizer
• ESP*: SQL Parallelism
• DTM: Distributed Transaction Management
• WMS: Workload Management

Storage Engine layer (Data Store Integration)
• HBase: Relational Schema, Trafodion Tables
• HBase: Native HBase Tables (KVS, Columnar) via HBase API + coprocessors
• Hive: Direct HDFS access to Hive tables using HCatalog
• HDFS

[Figure note: the diagram also carries a “Future” label.]

*Executor Server Process

Page 15:

Optimized for varying operational workloads

• Optimized performance and efficiency
– OLT optimization for directed keyed access
– ESPs for parallel SQL operations
– Multi-layered ESP for complex plans
– Data Flow pipeline parallel architecture
– Reusable Masters and ESPs for efficiency

• Cached SQL plans eliminate recompilation

• Sophisticated Optimizer
– Leverages Equal Height Histograms
– Pushes down
  • Filters e.g. row selection (start-stop key)
  • Coprocessors e.g. aggregates
  • Multi-Dimensional Access (MDAM)
– Secondary index access

• Service persistence (via Zookeeper) and automatic query resubmission

[Figure: a client application connects through a Type 2 or Type 4 ODBC/JDBC driver to Master processes, which work with ESPs spread across Node 1, Node 2, … Node n over Ethernet. Each node runs HBase (with filters and coprocessors) on HDFS.]

Page 16:

Trafodion: an Enterprise class operational SQL-on-Hadoop DBMS

In Summary

K/V & document stores, Unstructured analytics
• Lower cost – inexpensive storage & servers
• Elastic scalability
• Open and distributed file system (HDFS)
• Semi-structured & unstructured support
• Schema flexibility
• Automatic data repartitioning
• High availability via replication (k-safety)
• Disaster Recovery [1]
• Column level access control
• Column level encryption
• Space quotas [1]
• Vast open source & proprietary eco-system
• Versioning snapshot support & incremental data replication
• Cloud deployable
• Industry push for Hadoop Data Lake

HP DBMS: Structured Relational DBMS
• Innovative database engines for OLTP, ODS, and EDW (20+ years investment)
• Comprehensive ANSI SQL support
• Structured data support (schema)
• ACID transactional protection for multiple rows, tables, statements, region updates
• Support for nested loop, merge, hash joins
• Optimized execution plans via incremental equal height histograms
• Efficient data flow architecture
• Grant/Revoke Security support
• UDFs for Complex Event processing
• Workload Management
• Enterprise class monitoring & manageability

Trafodion: Operational SQL on Hadoop
• Able to join Trafodion, HBase, Hive tables in a single statement
• Compound primary/secondary keys
• Encoding column names for compaction
• Salting to eliminate I/O hotspots

[1] Select distributions

Page 17:

See for yourself… Come discover and develop on Trafodion: www.trafodion.org

Page 18:

Thank You