Trafodion: Enterprise-Class Transactional SQL-on-HBase
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Dec 18, 2015
Trafodion – Introduction (Welsh for transactions)
• Complete: Full-function SQL. Reuse existing SQL skills and improve developer productivity.
• Protected: Distributed ACID transactions. Data consistency across multiple rows, tables, SQL statements.
• Efficient: Low-latency R/W transactions. Optimized for real-time transaction processing applications.
• Interoperable: Standard ODBC/JDBC access. Works with existing tools and applications.
• Data federation: Trafodion/HBase/Hive tables. Enables multiple data model deployment.
• Scalable: Elastic scale for high concurrency. Provides elastic scalability as the number of users and the volume of data grow.
• Highly Available: For enterprise applications. Leverages HBase / Hadoop replication.
• Open: Hadoop and Linux distribution neutral. Easy to add to existing infrastructure with no vendor lock-in.
• Eco-system: Leverages large Hadoop eco-system. Can use any tool or database accessing Hadoop.
Joint HP Labs & HP-IT project for transactional SQL database capabilities on Hadoop
Hadoop + Transactional SQL
20+ years of database investment open sourced by HP on June 10th 2014!
Hadoop workload profiles
Workload categories, ordered by response time from sub-second to hours:
Operational
• Real-time analytics
• Transactional SQL = OLTP + interactions
Interactive
• Parameterized reports
• Drilldown visualization
• Exploration
Non-interactive
• Data preparation
• Incremental batch processing
• Dashboards, scorecards
Batch
• Operational batch processing
• Enterprise reports
• Data mining
Current Market Focus: Data Warehousing and Analytics
• Operational Optimizations
• Data Integrity
• Workload Management
• Transaction Support
• Real-time Performance
Exposes Hadoop limitations
The Case for Operational SQL-on-Hadoop
• Sector Road Map: SQL-on-Hadoop platforms in 2013 – Joseph Turian, March 20, 2013
– An operational database offers write access, not just read access, to data. However, there are other key features for an operational database: concurrency, interactive write speed, and distributed transactional support (guarantees about data consistency). Currently no existing SQL-on-Hadoop solution satisfies these requirements. If a strong player or two emerges in the category, it will completely shake up the big data and database landscape.
• 5 Reasons Hadoop is Kicking Can and Taking Names – Mike Gualtieri, October 22, 2013
– #5 The future of Hadoop is real-time and transactional. The key commercial vendors are focusing on fast SQL access, real-time streaming, and manageability features that enterprises demand. The groundwork is being laid for an eruption in data management technologies as Hadoop sneaks its way into the transactional database market.
• The Future of Hadoop: What Happened & What's Possible? – Doug Cutting, Oct 30 2013
– So I think the prediction we can make here is that it is inevitable that we will see just about every kind of workload be moved to this platform – even Online Transaction Processing.
Operational SQL on Hadoop – Use cases
• Integration of structured, semi-structured, and unstructured support
• Integration of operational, historical, & external (Big) data along common master data for better insights
Structured: Item id, Description, Cost, Price, …
Semi-structured:
• TV: Type, Display Size, Resolution, Brand, Model, 3D, …
• Book: ISBN, Author, Publish Date, Format, Dept, …
Unstructured: Image, …, Review, …
SELECT all TVs WHERE Price > 2000 and Type = ‘Plasma’ and Display Size > ‘50’ and customer sentiment is very positive
Open distributed HDFS structures: HBase & Hive
Free at last!
• Capture data directly into open file structures
• Accessible for reporting & analytics with no latency
Trafodion: OLTP on Hadoop
Snapfish – Web-based photo sharing and photo printing. Members can upload files for free with unlimited photo storage. They can share photo albums, individual photos, and various Snapfish products via email, link URL, and other web services such as Facebook and Blogger. They can buy personalized photo products such as prints, photo books, cards and mugs. Supports retail pickup at Meijer, Walgreens and Walmart.
Asset Management
• Create album
• Upload / Import pictures into album
• Create a project / photo book
• Share album / project with family / friends
Shopping
• Print Calendars, Cards, …
• Order prints, mugs, linen, jewelry, cases, covers, cards, teddy bears, …
Versus RDBMS & NoSQL
• High concurrency, low latency workloads
• Limitless elastic scale
• Very low TCO
Trafodion
Each BEGIN WORK … END WORK block below is one transaction.
Create album
  INSERT into Trafodion table ALBUM (cust_id, album_id, album_name, …)
Upload pictures
  Pictures loaded into HDFS by app
  BEGIN WORK
    INSERT list of pictures uploaded into Trafodion table PIC (cust_id, album_id, pic_id, pic_date, …)
    INSERT picture attributes from camera into HBase table PIC_ATTR as col-value pairs for each of the pictures using pic_id
  END WORK
Tag pictures
  BEGIN WORK
    INSERT custom tags for each tagged picture into HBase table PIC_ATTR as col-value pairs
  END WORK
Share pictures
  INSERT into Trafodion table REL (cust_id, rel_with_cust_id, rel-type, …)
  BEGIN WORK
    INSERT list of pictures shared into Trafodion table SHARED_PIC (pic_id, rel_with_cust_id)
  END WORK
Order photo mug & jewelry
  BEGIN WORK
    INSERT into ORDER (cust_id, order_no, order_date, order_total, …)
    INSERT into ORDER_DETAIL all items that are part of the order (cust_id, order_no, item_id, pic_id, qty, amt, …)
  END WORK
Search for pictures
  SELECT pictures taken with my “Sony DSC-RX100M2” camera in the last 6 months from my “Travel” album with a tag “Emma” on it.
Backend operational workloads
  Order tracking, supply chain, inventory control, …
Versus RDBMS & NoSQL
• Rich ANSI SQL RDBMS features
• Full ACID transactional support
• Integration of structured, semi-structured, & unstructured data
Autonomy could be used to analyze the pictures in HDFS to automatically create tags to be stored in HBase PIC_ATTR
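The order step on this slide groups two inserts into one unit of work, so that an order header never exists without its detail rows. A minimal sketch of that all-or-nothing behavior follows. It uses Python's built-in SQLite module purely as a stand-in, not Trafodion itself; the table names `orders` and `order_detail`, the reduced column lists, and the `qty > 0` check are assumptions made for illustration.

```python
import sqlite3

def place_order(conn, cust_id, order_no, items):
    """Insert an order header and its detail rows as one unit of work.

    Returns True if everything was committed, False if the whole
    transaction was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (cust_id, order_no) VALUES (?, ?)",
                (cust_id, order_no),
            )
            conn.executemany(
                "INSERT INTO order_detail (order_no, item_id, qty) VALUES (?, ?, ?)",
                [(order_no, item_id, qty) for item_id, qty in items],
            )
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical schema: SQLite stand-in for the ORDER / ORDER_DETAIL tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (cust_id INTEGER, order_no INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_detail ("
    "order_no INTEGER, item_id INTEGER, qty INTEGER NOT NULL CHECK (qty > 0))"
)

ok = place_order(conn, 1, 100, [(7, 2), (8, 1)])
# The second item violates the CHECK constraint, so the header row for
# order 101 and the first detail row are rolled back together.
bad = place_order(conn, 1, 101, [(7, 2), (9, 0)])

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
detail_count = conn.execute("SELECT COUNT(*) FROM order_detail").fetchone()[0]
```

After the failed second call only order 100 and its two detail rows remain, which is the guarantee the BEGIN WORK … END WORK blocks are meant to convey.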
Trafodion and Vertica
Reporting & Analytics via Vertica
• Analytics in Vertica to generate recommendation model
• Web app: using model & customer score / attributes, and recent purchase history, make recommendations
• Example: "Rohit, consider a blanket for your granddaughter at 50% off with her image imprinted on it"
BI reporting
• Sales growth by product, region, demo
• Growth in customers, pictures, storage, …
• Growth in sharing
• …
Analytics
• Items bought together – market basket analysis
• Promotion success customer classification
• …
Versus RDBMS & NoSQL
• Data captured in an open file system with open APIs
• Is available with no latency for reporting & analysis
• Via a huge open source & proprietary Hadoop eco-system
Use case examples
• Finance: Online financial management
• Telecom: Billing systems, Provisioning systems
• Manufacturing: RFID tracking
• Energy: Smart Metering
• Healthcare: Authorization and claims processing
• Government: 911 Emergency System
• Transportation: Reservation systems
• Consumer & Retail: Online shopping

• Multi-Structured Data
• ACID Protection, Data Integrity
• Low Latency, High Concurrency

• Generates Revenue
• Touches the Customer
• Helps Run the Business
Comprehensive DDL, DML, TCL, and utility support
• ANSI Core SQL 99 compliant + other SQL 99 and SQL 2003 support with Trafodion extensions
• Full featured DDL - CREATE/DROP/ALTER statements for tables, views, indexes, constraints
– Comprehensive data type support - numeric, character, varchar, date, time, interval
– Unicode encoding including UTF8, UCS2, and ISO8859-1 for user data; UTF8 for metadata
• Full featured DML
– SELECT, INSERT, UPDATE, DELETE, UPSERT and MERGE statements
– JOIN (INNER, LEFT/RIGHT/FULL OUTER), UNION, WHERE, GROUP BY, HAVING, ORDER BY, SAMPLING, etc.
– Correlated and nested sub-queries
– Cursor support (non-holdable)
– Extensive SQL function support - aggregate, date/time, character, mathematical, OLAP, sequence, etc.
• Utilities – Update Statistics, Explain, Control Query Shape, Command Line Interface
• Transaction Control – BEGIN WORK, COMMIT WORK, ROLLBACK WORK, SET TRANSACTION
• Work-in-progress: Stability, Performance, Triggers, Referential Integrity, C++ UDFs, Java Stored Procedures, Bulk Loader, node/system failure Transaction Recovery, Grant/Revoke, …
Leveraging HBase for scalability and availability
[Figure: a Trafodion client sits above the HBase region server layer; each region server stores its data on HDFS. The table's logical view (one row per RK with columns A, B, C) maps to the physical layout below, whose cells are divided among regions.]
Physical layout (one line per cell):
RK CF CN TS CV
1 F A … …
1 F B … …
1 F C … …
2 F A … …
2 F C … …
3 F C … …
4 F A … …
4 F B … …
5 F A … …
5 F B … …
5 F C … …
6 F A … …
7 F A … …
7 F B … …
7 F C … …
8 F B … …
9 F A … …
9 F B … …
9 F C … …
Legend
• RK: Row Key (clustering key)
• CF: Column Family ("F")
• CN: Column Name (A, B, C)
• TS: Timestamp (one version)
• CV: Cell Value
Data in different Column Families are stored separately
• Regions store contiguous ranges of table rows
• Regions dynamically split by HBase when they reach a configured limit, i.e. “autosharding”
• Region servers are elastically scalable
• HDFS and HBase replication provide enhanced data availability and protection
Allows
• Fine-grained load balancing with dynamic movement based on load
• Fast data recovery when a server or disk fails or is decommissioned
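The mapping from the logical view to the physical layout, and the rule that regions store contiguous ranges of table rows, can be sketched roughly as follows. This is an illustration only, not HBase code: the function names, the placeholder timestamp, and the split points `[4, 7]` are assumptions, since the slide does not state where the region boundaries fall.

```python
from bisect import bisect_right

def row_to_cells(row_key, columns, family="F", timestamp=0):
    """Flatten one logical row into (RK, CF, CN, TS, CV) cells.

    Columns whose value is None are simply not stored, which is why
    some row keys on the slide have fewer than three cells.
    """
    return [
        (row_key, family, name, timestamp, value)
        for name, value in sorted(columns.items())
        if value is not None
    ]

def region_for(row_key, split_points):
    """Return the index of the region whose contiguous key range holds row_key.

    split_points is a sorted list of the first key of every region
    except the first one.
    """
    return bisect_right(split_points, row_key)

# Row 2 on the slide has values for columns A and C only.
cells = row_to_cells(2, {"A": "x", "B": None, "C": "z"})

# Hypothetical split points: region 0 holds keys < 4, region 1 holds 4..6,
# region 2 holds keys >= 7.
split_points = [4, 7]
regions = [region_for(k, split_points) for k in range(1, 10)]
```

Because each region covers one contiguous key range, locating a row is a binary search over the split points, and a region split only adds one more split point.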
HBase vs. Trafodion comparison
Feature | HBase | Trafodion + HBase
Data abstraction | Key and value pair | Relational schema
Physical layout | Column family store where row data is stored together by cells | Same, except there is a single column family with space-saving column encoding
Column values | Uninterpreted array of bytes | Explicitly defined and enforced data types
ACID guarantee | Single row atomicity | Multiple SQL statements, tables, and rows defined as part of a transaction
Language API | Get/put/delete | SQL (Trafodion invokes native HBase API)
Row key index | Single (string) row key | Composite (multi-column) row key
Secondary indexes | Not supported | Arbitrary secondary key columns
Salting of row keys
How it works
• HBase table gets created, pre-split with one region per salt value
• A hash value column, “_SALT_”, is added as a prefix to the row key
• Salting is transparent to SQL statements
– Automatically computed during insert/update statements
– Predicates automatically generated where feasible
– Minimal overhead for direct lookup by key value
Benefits
• Even data distribution across HBase regions
• Avoids region hotspots caused by insertion of data in row key order
Example
CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;
[Figure: INSERTs and SELECTs are spread across four partitions (PART 1 to PART 4), each an HBase region stored on HDFS.]
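The idea behind the “_SALT_” prefix can be sketched in a few lines. This is a conceptual illustration, not Trafodion's implementation: the slide does not say which hash function is used, so CRC32 here is an assumed stand-in, and the helper names are hypothetical.

```python
import zlib

def salt_for(key, partitions):
    """Map a primary-key value to a salt value in [0, partitions).

    CRC32 is an assumed stand-in for whatever hash the engine uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % partitions

def salted_row_key(key, partitions):
    """Prefix the key with its salt, as the _SALT_ column does."""
    return (salt_for(key, partitions), key)

PARTITIONS = 4  # matches SALT USING 4 PARTITIONS in the example

# Keys inserted in ascending order no longer land in one region:
# each key goes to the partition selected by its salt.
counts = [0] * PARTITIONS
for a in range(1000):
    counts[salt_for(a, PARTITIONS)] += 1

# A direct lookup such as "a = 42" can recompute the salt from the key,
# so it only needs to visit one partition.
lookup_key = salted_row_key(42, PARTITIONS)
```

Recomputing the salt from the key value is what keeps direct keyed lookups cheap, while sequential inserts are scattered instead of piling onto the last region.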
Trafodion – Software architecture (3 layers)
[Diagram: three layers, top to bottom]
Client
• User and ISV Operational Applications
• ODBC / JDBC Driver
SQL
• Database Connectivity
• CMP: Compiler and Optimizer
• Master and ESP*: SQL Parallelism
• DTM: Distributed Transaction Management
• WMS: Workload Management
Storage Engine (Data Store Integration)
• HBase: Trafodion Tables (Relational Schema)
• HBase: Native HBase Tables (KVS, Columnar) via HBase API + coprocessors
• Hive: Direct HDFS access to Hive tables using HCatalog
• HDFS
*ESP: Executor Server Process
A "Future" label also appears in the diagram.
Optimized for varying operational workloads
• Optimized performance and efficiency
– OLT optimization for directed keyed access
– ESPs for parallel SQL operations
– Multi-layered ESP for complex plans
– Data Flow pipeline parallel architecture
– Reusable Masters and ESPs for efficiency
• Cached SQL plans eliminate recompilation
• Sophisticated Optimizer
– Leverages Equal Height Histograms
– Pushes down
• Filters e.g. row selection (start-stop key)
• Coprocessors e.g. aggregates
• Multi-Dimensional Access (MDAM)
– Secondary index access
• Service persistence (via Zookeeper) and automatic query resubmission
[Figure: a client application connects through a Type 2 or Type 4 ODBC/JDBC driver to Master processes. Masters and ESPs run across Node 1, Node 2, … Node n, connected by Ethernet. Each node runs HBase (with filters and coprocessors) on top of HDFS.]
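The optimizer bullet mentions equal-height histograms. As a rough illustration of the general technique (not Trafodion's actual statistics code), the sketch below picks bucket boundaries so that every bucket holds about the same number of rows, then uses them for a coarse row-count estimate. The function names and sample data are assumptions.

```python
def equal_height_boundaries(values, buckets):
    """Return the upper boundary of each bucket so that every bucket
    holds roughly len(values) / buckets rows."""
    ordered = sorted(values)
    n = len(ordered)
    if n < buckets:
        raise ValueError("need at least one value per bucket")
    return [ordered[(i * n) // buckets - 1] for i in range(1, buckets + 1)]

def estimate_rows_leq(boundaries, rows_per_bucket, x):
    """Coarse estimate of rows with value <= x: count whole buckets
    whose upper boundary does not exceed x."""
    return sum(rows_per_bucket for b in boundaries if b <= x)

# Uniform data: boundaries fall at even intervals.
uniform = equal_height_boundaries(list(range(1, 101)), 4)

# Skewed data (90 rows share the value 1): the boundaries crowd around
# the frequent value, which is how the histogram exposes the skew.
skewed = equal_height_boundaries([1] * 90 + list(range(2, 12)), 4)

estimate = estimate_rows_leq(uniform, 25, 50)
```

With equal-height buckets the resolution follows the data: frequent values get many narrow buckets, so selectivity estimates for predicates on those values stay reasonable.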
Trafodion: an Enterprise class operational SQL-on-Hadoop DBMS
In Summary
Hadoop (Structured Relational DBMS, K/V & document stores, Unstructured analytics)
• Lower cost – inexpensive storage & servers
• Elastic scalability
• Open and distributed file system (HDFS)
• Semi-structured & unstructured support
• Schema flexibility
• Automatic data repartitioning
• High availability via replication (k-safety)
• Disaster Recovery¹
• Column level access control
• Column level encryption
• Space quotas¹
• Vast open source & proprietary eco-system
• Versioning snapshot support & incremental data replication
• Cloud deployable
• Industry push for Hadoop Data Lake
HP DBMS
• Innovative database engines for OLTP, ODS, and EDW (20+ years investment)
• Comprehensive ANSI SQL support
• Structured data support (schema)
• ACID transactional protection for multiple rows, tables, statements, region updates
• Support for nested loop, merge, hash joins
• Optimized execution plans via incremental equal height histograms
• Efficient data flow architecture
• Grant/Revoke Security support
• UDFs for Complex Event processing
• Workload Management
• Enterprise class monitoring & manageability
Trafodion: Operational SQL on Hadoop
• Able to join Trafodion, HBase, Hive tables in a single statement
• Compound primary/secondary keys
• Encoding column names for compaction
• Salting to eliminate I/O hotspots
¹ Select distributions
See for yourself…
Come discover and develop on Trafodion
www.trafodion.org
Thank You