High-Performance Storage Services With Java and HailDB Sunny Gleason April 14, 2011

May 10, 2015

Page 1: High-Performance Storage Services with HailDB and Java

High-Performance Storage Services

With Java and HailDB

Sunny Gleason, April 14, 2011

Page 2: High-Performance Storage Services with HailDB and Java

whoami

• Sunny Gleason, human

• passion: distributed systems engineering

• previous...
  • Ning : custom social networks
  • Amazon.com : infra & web services

• now... building cloud infrastructure

Page 3: High-Performance Storage Services with HailDB and Java

whereami

• twitter : twitter.com/sunnygleason

• github : github.com/sunnygleason

• linkedin : linkedin.com/in/sunnygleason

• slideshare : slideshare.net/sunnygleason

Page 4: High-Performance Storage Services with HailDB and Java

what’s in this presentation?

• MySQL & NoSQL as Inspiration

• HailDB & InnoDB

• JNA: Integration with Java

• St8 : A REST-Enabled Data Store

• A Handful of Nifty Applications

• Results & Next Steps

Page 5: High-Performance Storage Services with HailDB and Java

prior art

• Mad props to:

• MySQL & InnoDB teams for creating InnoDB and Embedded InnoDB

• Stewart Smith & Drizzle folks for leading the HailDB charge and encouraging plugin apis

• Nokia & Percona for publishing results of their Voldemort / MySQL integration

• Basho for publishing Riak / InnoStore integration

Page 6: High-Performance Storage Services with HailDB and Java

MySQL & InnoDB

• Super-Efficient Database Server

• Tried & True Replication

• Bulletproof Durability (when configured correctly)

• Fantastic Stability, Predictability & Insight into Operation

Page 7: High-Performance Storage Services with HailDB and Java

motivation

• database on 1 box : ok

• database with master/slave replication : ok

• database on cluster : tricky

• database on SAN : scary

Page 8: High-Performance Storage Services with HailDB and Java

NoSQL

• “Not Only” SQL

• What’s the point?

• Proponent: “reaching next level of scale”

• Cynic: “cloud is hype, ops nightmare”

Page 9: High-Performance Storage Services with HailDB and Java

what does it gain?

• Higher performance, scalability, availability

• More robust fault-tolerance

• Simplified systems design

• Easier operations

Page 10: High-Performance Storage Services with HailDB and Java

what does it lose?

• Reduced / simplified programming model

• No ad-hoc queries, no joins, no txns

• Not ACID: Weakened Atomicity / Consistency / Isolation / Durability

• Operations / management is still evolving

• Challenging to quantify health of system

• Fewer domain experts

Page 11: High-Performance Storage Services with HailDB and Java

NoSQL Map

NoSQL

• Key-Value Store

  • KV Stores (durable): Dynamo, Voldemort, Riak

  • KV Stores (volatile): Memcached, Redis

• Column Store: Cassandra, BigTable, HBase

• Graph Store: Neo4j

• Document Store: CouchDB, MongoDB

Page 12: High-Performance Storage Services with HailDB and Java

durable vs. volatile

• RAM is ridiculously fast (ns), but not durable

• Disk is persistent but slow (3-7ms)

• RAID eases the pain a bit (4-8x throughput)

• SSD shows good promise (100-300us)

• FusionIO is redefining the space (30-100us)
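
A rough back-of-the-envelope reading of these latencies: a single thread that issues one synchronous operation at a time can complete at most 1 / latency operations per second. The sketch below only does that arithmetic. The class and method names are made up for illustration, and real devices serve many requests in parallel, so measured throughput will differ.

```java
final class LatencyMath {
    /** Upper bound on serial operations per second for a per-operation latency given in microseconds. */
    static long maxSerialOpsPerSecond(long latencyMicros) {
        if (latencyMicros <= 0) {
            throw new IllegalArgumentException("latency must be positive");
        }
        return 1_000_000L / latencyMicros;
    }

    public static void main(String[] args) {
        // Latency figures are the ranges quoted on the slide.
        System.out.println("disk, 7 ms:       " + maxSerialOpsPerSecond(7_000));
        System.out.println("disk, 3 ms:       " + maxSerialOpsPerSecond(3_000));
        System.out.println("SSD, 300 us:      " + maxSerialOpsPerSecond(300));
        System.out.println("SSD, 100 us:      " + maxSerialOpsPerSecond(100));
        System.out.println("FusionIO, 30 us:  " + maxSerialOpsPerSecond(30));
    }
}
```

So a disk at 3-7ms tops out at a few hundred serial operations per second, while 100us storage allows about ten thousand.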

Page 13: High-Performance Storage Services with HailDB and Java

performance & operational complexity*

[Figure: complexity (y-axis) vs. aggregate operations / sec (x-axis: 1K, 10K, 100K, 1M). Labels: MySQL, + SSD, + FusionIO, + Sharding, Memcached, + Cluster, Voldemort]

* This is not a real graph

Page 14: High-Performance Storage Services with HailDB and Java

just a thought...

What if we could use the highly optimized & durable ‘guts’ of MySQL without having to go through JDBC & SQL?

Page 15: High-Performance Storage Services with HailDB and Java

enter HailDB

• use case: Voldemort Storage Engine

• let’s evaluate relative to other NoSQL options

• focus on stability & predictability of performance

• Graphs are throughput (ops/sec) vs. time

Page 16: High-Performance Storage Services with HailDB and Java

Voldemort schema

_key VARBINARY(200)

_version VARBINARY(200)

_value BLOB

PRIMARY KEY(_key, _version)

Page 17: High-Performance Storage Services with HailDB and Java

experimental setup

• OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD

• Faban Benchmark : PUT 64-byte key, 1024-byte value

• Scenarios: 1, 2, 4, 8 threads

• 512M Java Heap

Page 18: High-Performance Storage Services with HailDB and Java

BDB-JE

• Log-Structured B-Tree

• Fast Storage When Mostly Cached

• Configured without fsync() by default - writes are batched and flushed periodically

Page 19: High-Performance Storage Services with HailDB and Java

Perf: BDB Put 100%

Page 20: High-Performance Storage Services with HailDB and Java

Krati

• Fast Hash-Oriented Storage

• Uses memory-mapped files for speed

• Configured without fsync() by default - writes are batched and flushed periodically

Page 21: High-Performance Storage Services with HailDB and Java

Perf: Krati Put 100%

Page 22: High-Performance Storage Services with HailDB and Java

Perf: HailDB Put 100%

Page 23: High-Performance Storage Services with HailDB and Java

HailDB & Java

• g414-haildb : where the magic happens

• Open Source on GitHub

• uses JNA: Java Native Access

• dynamic binding to libhaildb shared library

• auto-generate initial Java class from .h file (w/ JNAerator)

• Pointer classes & other shenanigans

Page 24: High-Performance Storage Services with HailDB and Java

implementation gotchas

• InnoDB API-level usage is unclear

• Synchronization & locking is unclear

• Therefore... I learned to love reading C

• Error handling is *nasty*

• Native library installation a bit of a pain (need to configure LD_LIBRARY_PATH)

Page 25: High-Performance Storage Services with HailDB and Java

kinder, friendlier APIs

• Level 0: JNA bindings
  int err = ib_dostuff();

• Level 1: Object-Oriented
  Transaction t = db.openTransaction(); t.commit();

• Level 2: Templated
  dbt.inTransaction() { dbt.insert(value); }

• Level 3: Functional
  Maps, Iteration, Filters, Apply
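
To make the jump from Level 1 to Level 2 concrete, here is a minimal sketch of the templated style: the caller passes in the work, and the template takes care of commit and rollback. Everything here (Txn, FakeTxn, TxnTemplate) is a hypothetical stand-in that keeps data in memory so the example runs without the native library. It is not the g414-haildb API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical Level 1 style transaction; not the g414-haildb interface. */
interface Txn {
    void insert(String value);
    void commit();
    void rollback();
}

/** In-memory stand-in: inserts are buffered and only become visible on commit. */
final class FakeTxn implements Txn {
    private final List<String> committed;
    private final List<String> pending = new ArrayList<>();

    FakeTxn(List<String> committed) {
        this.committed = committed;
    }

    @Override public void insert(String value) { pending.add(value); }
    @Override public void commit() { committed.addAll(pending); pending.clear(); }
    @Override public void rollback() { pending.clear(); }
}

final class TxnTemplate {
    private final List<String> store = new ArrayList<>();

    /** Level 2 style: run the work, commit on success, roll back and rethrow on failure. */
    void inTransaction(Consumer<Txn> work) {
        Txn t = new FakeTxn(store);
        try {
            work.accept(t);
            t.commit();
        } catch (RuntimeException e) {
            t.rollback();
            throw e;
        }
    }

    /** Snapshot of the committed values. */
    List<String> contents() {
        return new ArrayList<>(store);
    }
}
```

Usage then looks like db.inTransaction(t -> t.insert(value)); the caller can no longer forget to commit or leave a transaction open after an exception.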

Page 26: High-Performance Storage Services with HailDB and Java

St8 Server

• HTTP-enabled Access to HailDB

• PUT /1.0/t/mytable

  {
    "columns": [
      {"name": "a", "type": "INT", "length": 4},
      {"name": "b", "type": "INT", "length": 8},
      {"name": "c", "type": "BLOB", "length": 0}
    ],
    "indexes": [
      {
        "name": "P",
        "clustered": true,
        "unique": true,
        "indexColumns": [{"name": "a"}]
      }
    ]
  }

Page 27: High-Performance Storage Services with HailDB and Java

rest-enabled access

• GET /1.0/d/mytable;a=0

• POST /1.0/d/mytable;a=1;b=42;c=xyz

• PUT /1.0/d/mytable;a=1;b=43;c=abc

• DELETE /1.0/d/mytable;a=0

*This is matrix-param style, can also use form data style for specifying data
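
For readers who have not met matrix parameters before, the sketch below shows one way a path segment such as mytable;a=1;b=42;c=xyz could be split into a table name and column values. The MatrixParams class is invented for this example and says nothing about how St8 itself parses requests. It also skips URL-decoding, which a real server would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MatrixParams {
    /** Returns the part before the first ';', e.g. "mytable" for "mytable;a=1". */
    static String tableName(String segment) {
        int semi = segment.indexOf(';');
        return semi < 0 ? segment : segment.substring(0, semi);
    }

    /** Parses "mytable;a=1;b=42" into {a=1, b=42}, preserving the order of appearance. */
    static Map<String, String> parse(String segment) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = segment.split(";");
        // parts[0] is the table name, so parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed pieces
            }
            params.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        String segment = "mytable;a=1;b=42;c=xyz";
        System.out.println(tableName(segment) + " " + parse(segment));
    }
}
```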

Page 28: High-Performance Storage Services with HailDB and Java

cursors & iterators

• GET /1.0/i/mytable.P?q=a+ge+4

• GET /1.0/i/mytable.SecIndex?q=b+le+4

• GET /1.0/i/mytable.SecIndex?q=b+le+4&s=abce1212121ceeee2120911

• “s” value is opaque index key of next page of results - way better than LIMIT/OFFSET! (since HailDB can seek directly to the row)
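
The difference from LIMIT/OFFSET can be sketched with a sorted map standing in for the index: each page seeks straight to its start key instead of counting past all earlier rows. KeysetPager and its Page type are illustrative names only. The resume key is a plain integer here, whereas a real service would presumably encode it into an opaque token like the “s” value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class KeysetPager {
    /** One page of values plus the key to resume from (null when no rows remain). */
    static final class Page {
        final List<String> values;
        final Integer nextKey;

        Page(List<String> values, Integer nextKey) {
            this.values = values;
            this.nextKey = nextKey;
        }
    }

    /** Returns up to pageSize rows with key >= startKey by seeking to startKey in the sorted index. */
    static Page page(NavigableMap<Integer, String> index, int startKey, int pageSize) {
        List<String> values = new ArrayList<>();
        Integer nextKey = null;
        for (Map.Entry<Integer, String> e : index.tailMap(startKey, true).entrySet()) {
            if (values.size() == pageSize) {
                nextKey = e.getKey(); // first row of the following page
                break;
            }
            values.add(e.getValue());
        }
        return new Page(values, nextKey);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> index = new TreeMap<>();
        for (int i = 1; i <= 10; i++) {
            index.put(i, "row" + i);
        }
        Page first = page(index, 4, 3);
        System.out.println(first.values + " next=" + first.nextKey);
    }
}
```

The cost of fetching a page depends on the page size and one index seek, not on how deep into the result set the page is.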

Page 29: High-Performance Storage Services with HailDB and Java

result

• REST API provides fun, straightforward access from Ruby, Python, Java, Command-line...

• very easy benchmarking with HTTP-based performance tools

• range query support, and more efficient iteration model for large result sets than MySQL provides

Page 30: High-Performance Storage Services with HailDB and Java

high-performance counts

• GET /1.0/counts/mykey => 0

• POST /1.0/counts/mykey[?inc=1] => 1

• POST /1.0/counts/mykey?inc=42 => 43

• DELETE /1.0/counts/mykey
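
The semantics of those four calls can be pinned down with a small in-memory sketch: a missing key reads as 0, an increment returns the new total, and delete resets the key. CountStore is a hypothetical class for illustration. It is not durable and is not the HailDB-backed service from the talk.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CountStore {
    private final ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();

    /** GET: current count, or 0 if the key has never been incremented. */
    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    /** POST: atomically adds inc and returns the new value. */
    long increment(String key, long inc) {
        return counts.merge(key, inc, Long::sum);
    }

    /** DELETE: removes the key, so the next GET reads 0 again. */
    void delete(String key) {
        counts.remove(key);
    }

    public static void main(String[] args) {
        CountStore store = new CountStore();
        System.out.println(store.get("mykey"));
        System.out.println(store.increment("mykey", 1));
        System.out.println(store.increment("mykey", 42));
        store.delete("mykey");
        System.out.println(store.get("mykey"));
    }
}
```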

Page 31: High-Performance Storage Services with HailDB and Java

counts schema

• HailDB count service schema

_id int 8-byte unsigned

_key_hash int 8-byte unsigned

_key varchar(80)

_count int 8-byte unsigned

primary key (“_id”)

unique key (“_key_hash”, “_key”)

Page 32: High-Performance Storage Services with HailDB and Java

raid0 put counts

Page 33: High-Performance Storage Services with HailDB and Java

ssd put counts

Page 34: High-Performance Storage Services with HailDB and Java

raid0 put/get

Page 35: High-Performance Storage Services with HailDB and Java

ssd put/get

Page 36: High-Performance Storage Services with HailDB and Java

operation: graph store

• Social networks, recommendations, any relation you can think of

• Which would you prefer?

• SQL adjacency list, stored procedure, custom storage engine, external (Memcached), ...

• Graph-aware HailDB application in Java

Page 37: High-Performance Storage Services with HailDB and Java

nifty graph store 1

GET /1.0/graph/bfs?a=1&maxDepth=3
=> [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]

[Figure: example graph with nodes 1, 2, 3, 4, 5, 6, 8]
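
A depth-limited breadth-first search that returns [node, depth] pairs, as in the response above, can be sketched as follows. The edges of the slide's diagram are not recoverable from the transcript, so sampleEdges() is an assumed edge set, picked so that the output has the same shape as the slide's example. GraphBfs is an illustrative class, not code from the talk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class GraphBfs {
    /** Returns [node, depth] pairs in visit order; nodes at maxDepth are reported but not expanded. */
    static List<int[]> bfs(Map<Integer, List<Integer>> edges, int start, int maxDepth) {
        List<int[]> result = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Queue<int[]> queue = new ArrayDeque<>();
        queue.add(new int[] {start, 0});
        seen.add(start);
        while (!queue.isEmpty()) {
            int[] current = queue.remove();
            result.add(current);
            if (current[1] == maxDepth) {
                continue; // depth limit reached, do not follow outgoing edges
            }
            for (int next : edges.getOrDefault(current[0], Collections.<Integer>emptyList())) {
                if (seen.add(next)) {
                    queue.add(new int[] {next, current[1] + 1});
                }
            }
        }
        return result;
    }

    /** Assumed edges (not taken from the slide): 1->2, 2->3, 3->4, 3->5, 5->6. */
    static Map<Integer, List<Integer>> sampleEdges() {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        edges.put(1, Arrays.asList(2));
        edges.put(2, Arrays.asList(3));
        edges.put(3, Arrays.asList(4, 5));
        edges.put(5, Arrays.asList(6));
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(bfs(sampleEdges(), 1, 3).toArray()));
    }
}
```

With the assumed edges, node 6 sits at depth 4 and is therefore left out when maxDepth is 3.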

Page 38: High-Performance Storage Services with HailDB and Java

nifty graph store 2

[Figure: example graph with nodes 1, 2, 3, 4, 5, 6, 8]

GET /1.0/graph/topo?a=1&a=5&a=8
=> [8, 6, 4, 3, 2, 5, 1]
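
One common way to produce such an ordering is a depth-first post-order walk from the requested start nodes, which lists every node after the nodes it points to. The sketch below takes that approach; whether the talk's service works this way is not stated. The edge set in sampleEdges() is an assumption chosen for illustration, since the transcript does not preserve the diagram's edges, and GraphTopo is a made-up class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class GraphTopo {
    /** Depth-first post-order from the given roots; assumes the graph has no cycles. */
    static List<Integer> topo(Map<Integer, List<Integer>> edges, List<Integer> roots) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        for (int root : roots) {
            visit(root, edges, visited, order);
        }
        return order;
    }

    private static void visit(int node, Map<Integer, List<Integer>> edges,
                              Set<Integer> visited, List<Integer> order) {
        if (!visited.add(node)) {
            return; // already emitted via another path
        }
        for (int next : edges.getOrDefault(node, Collections.<Integer>emptyList())) {
            visit(next, edges, visited, order);
        }
        order.add(node); // emitted only after everything it points to
    }

    /** Assumed edges (not taken from the slide): 1->2, 1->5, 2->3, 3->4, 4->6, 5->6, 6->8. */
    static Map<Integer, List<Integer>> sampleEdges() {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        edges.put(1, Arrays.asList(2, 5));
        edges.put(2, Arrays.asList(3));
        edges.put(3, Arrays.asList(4));
        edges.put(4, Arrays.asList(6));
        edges.put(5, Arrays.asList(6));
        edges.put(6, Arrays.asList(8));
        return edges;
    }

    public static void main(String[] args) {
        System.out.println(topo(sampleEdges(), Arrays.asList(1, 5, 8)));
    }
}
```

With these assumed edges and start nodes 1, 5 and 8, the walk yields [8, 6, 4, 3, 2, 5, 1].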

Page 39: High-Performance Storage Services with HailDB and Java

nifty recovery tool (just an idea)

• for recovery: shut down mysql server

• run HailDB-enabled recovery tool

• export as JSON or whatever

Page 40: High-Performance Storage Services with HailDB and Java

wrap-up

• HailDB & InnoDB are phenomenal

• With g414-haildb, can be integrated directly into applications running on the JVM

• All the InnoDB tuning tricks apply

• Opens up new applications that are tricky with a traditional SQL database

Page 42: High-Performance Storage Services with HailDB and Java

Questions? Thank You!

Page 43: High-Performance Storage Services with HailDB and Java

bonus material!

• we probably didn’t get this far in the live presentation; the following material is here for eager, brave & interested folks...

Page 44: High-Performance Storage Services with HailDB and Java

future work

• Improve Packaging / Installation

• Codify schema refinements & perf enhancements

• Online backup/export with XtraBackup

• JNI Bindings

• PBXT explorations

Page 45: High-Performance Storage Services with HailDB and Java

InnoDB tuning

• Skinny columns, skinny rows! (esp. Primary Key)

• Varchar used as an enum ‘bad’; enum, int or smallint ‘good’

• fixed-width rows allow in-place updates

• Use covering indexes strategically

• More data per page means faster index scans, more efficient buffer pool utilization

• You only get so many trx’s (read & write) on given CPU/RAM configuration - benchmark this!

• Strategically offload reads to Memcached/Redis

Page 46: High-Performance Storage Services with HailDB and Java

HailDB schema

_key VARBINARY(200)

_version VARBINARY(200)

_value BLOB

PRIMARY KEY(_key, _version)

Page 47: High-Performance Storage Services with HailDB and Java

refined schema

_id BIGINT (auto increment)

_key_hash BIGINT

_key VARBINARY(200)

_version VARBINARY(200)

_value BLOB

PRIMARY KEY(_id)

KEY(_key_hash)
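
The refined schema swaps the wide (_key, _version) primary key for a skinny _id and finds rows through a secondary index on _key_hash. Because two different keys can share a hash, a lookup presumably has to compare the full _key as well. The sketch below models that lookup in memory. The HashedKeyIndex class is hypothetical, and 64-bit FNV-1a is used only as an example hash, since the slides do not say which hash function is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HashedKeyIndex {
    /** One stored row: surrogate id, full key bytes, value. */
    private static final class Row {
        final long id;
        final byte[] key;
        final String value;

        Row(long id, byte[] key, String value) {
            this.id = id;
            this.key = key;
            this.value = value;
        }
    }

    // Stands in for the non-unique KEY(_key_hash) index.
    private final Map<Long, List<Row>> rowsByKeyHash = new HashMap<>();
    private long nextId = 1; // stands in for the auto-increment _id

    /** 64-bit FNV-1a; an example choice, not necessarily the hash used in the talk. */
    static long hash(byte[] key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** Appends a new row for the key, much as a new version would be stored. */
    void put(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        rowsByKeyHash.computeIfAbsent(hash(k), h -> new ArrayList<>())
                .add(new Row(nextId++, k, value));
    }

    /** Looks up by hash, then compares full keys to rule out collisions; returns the newest match or null. */
    String get(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        String latest = null;
        for (Row row : rowsByKeyHash.getOrDefault(hash(k), Collections.<Row>emptyList())) {
            if (Arrays.equals(row.key, k)) {
                latest = row.value;
            }
        }
        return latest;
    }
}
```

The index entries stay at a fixed 8 bytes no matter how long the keys are, which fits the "skinny columns, skinny rows" advice from the tuning slide.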

Page 48: High-Performance Storage Services with HailDB and Java

online backup

• hot backup of data to other machine / destination

• test Percona Xtrabackup with HailDB

• next step: backup/export to Hadoop/HDFS (similar to Cloudera Sqoop tool)

Page 49: High-Performance Storage Services with HailDB and Java

JNI bindings

• JNI can get 2-5x perf boost vs. JNA

• ... at the expense of nasty code

• Will go for schema optimizations and InnoDB tuning tips *first*

Page 50: High-Performance Storage Services with HailDB and Java

Thank You!