High-Performance Storage Services with HailDB and Java


Sunny Gleason, April 14, 2011

whoami

• Sunny Gleason, human

• passion: distributed systems engineering

• previous... Ning: custom social networks; Amazon.com: infra & web services

• now... building cloud infrastructure

whereami

• twitter : twitter.com/sunnygleason

• github : github.com/sunnygleason

• linkedin : linkedin.com/in/sunnygleason

• slideshare : slideshare.net/sunnygleason

what’s in this presentation?

• MySQL & NoSQL as Inspiration

• HailDB & InnoDB

• JNA: Integration with Java

• St8 : A REST-Enabled Data Store

• A Handful of Nifty Applications

• Results & Next Steps

prior art

• Mad props to:

• MySQL & InnoDB teams for creating InnoDB and Embedded InnoDB

• Stewart Smith & Drizzle folks for leading the HailDB charge and encouraging plugin APIs

• Nokia & Percona for publishing results of their Voldemort / MySQL integration

• Basho for publishing Riak / InnoStore integration

MySQL & InnoDB

• Super-Efficient Database Server

• Tried & True Replication

• Bulletproof Durability (when configured correctly)

• Fantastic Stability, Predictability & Insight into Operation

motivation

• database on 1 box : ok

• database with master/slave replication : ok

• database on cluster : tricky

• database on SAN : scary

NoSQL

• “Not Only” SQL

• What’s the point?

• Proponent: “reaching next level of scale”

• Cynic: “cloud is hype, ops nightmare”

what does it gain?

• Higher performance, scalability, availability

• More robust fault-tolerance

• Simplified systems design

• Easier operations

what does it lose?

• Reduced / simplified programming model

• No ad-hoc queries, no joins, no txns

• Not ACID: Weakened Atomicity / Consistency / Isolation / Durability

• Operations / management is still evolving

• Challenging to quantify health of system

• Fewer domain experts

NoSQL Map

• Key-Value Stores (durable): Dynamo, Voldemort, Riak

• Key-Value Stores (volatile): Memcached, Redis

• Column Stores: Cassandra, BigTable, HBase

• Document Stores: CouchDB, MongoDB

• Graph Stores: Neo4J

durable vs. volatile

• RAM is ridiculous speed (ns), not durable

• Disk is persistent and slow (3-7ms)

• RAID eases the pain a bit (4-8x throughput)

• SSD shows good promise (100-300us)

• FusionIO is redefining the space (30-100us)

performance & operational complexity*

[Figure: operational complexity vs. aggregate operations/sec (1K, 10K, 100K, 1M) for MySQL, +SSD, +FusionIO, +Sharding, Memcached, +Cluster, Voldemort]

* This is not a real graph

just a thought...

What if we could use the highly optimized & durable ‘guts’ of MySQL without having to go through JDBC & SQL?

enter HailDB

• use case: Voldemort Storage Engine

• let’s evaluate relative to other NoSQL options

• focus on stability & predictability of performance

• Graphs are throughput (ops/sec) vs. time

Voldemort schema

_key VARBINARY(200)

_version VARBINARY(200)

_value BLOB

PRIMARY KEY(_key, _version)
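Because the primary key is (_key, _version), all versions of one key sit next to each other in InnoDB's clustered index, so reading a key becomes a short range scan. The sketch below models that ordering with a `TreeMap`, assuming each VARBINARY column compares as an unsigned byte string; the class and method names are hypothetical and no HailDB calls are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompositeKeyOrder {
    /** Orders rows by (_key, _version), each compared as an unsigned byte string (assumed). */
    public static final Comparator<byte[][]> PK_ORDER = (a, b) -> {
        int c = Arrays.compareUnsigned(a[0], b[0]);
        return c != 0 ? c : Arrays.compareUnsigned(a[1], b[1]);
    };

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Range scan: every version stored for one key, in primary-key order. */
    public static List<String> versionsOf(TreeMap<byte[][], byte[]> table, byte[] key) {
        List<String> versions = new ArrayList<>();
        // The smallest possible primary key for this _key has an empty _version.
        byte[][] start = {key, new byte[0]};
        for (Map.Entry<byte[][], byte[]> e : table.tailMap(start, true).entrySet()) {
            if (!Arrays.equals(e.getKey()[0], key)) {
                break; // moved past the last row for this key
            }
            versions.add(new String(e.getKey()[1], StandardCharsets.UTF_8));
        }
        return versions;
    }

    public static void main(String[] args) {
        // TreeMap stands in for the clustered index; values play the role of _value.
        TreeMap<byte[][], byte[]> table = new TreeMap<>(PK_ORDER);
        table.put(new byte[][] {bytes("user:1"), bytes("v2")}, bytes("b"));
        table.put(new byte[][] {bytes("user:1"), bytes("v1")}, bytes("a"));
        table.put(new byte[][] {bytes("user:2"), bytes("v1")}, bytes("c"));
        System.out.println(versionsOf(table, bytes("user:1")));
    }
}
```

Running `main` prints the two versions of `user:1` in key order; rows for `user:2` are never touched because the scan stops at the first different `_key`.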

experimental setup

• OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD

• Faban Benchmark : PUT 64-byte key, 1024-byte value

• Scenarios: 1, 2, 4, 8 threads

• 512 MB Java heap

BDB-JE

• Log-Structured B-Tree

• Fast Storage When Mostly Cached

• Configured without fsync() by default - writes are batched and flushed periodically

Perf: BDB Put 100%

Krati

• Fast Hash-Oriented Storage

• Uses memory-mapped files for speed

• Configured without fsync() by default - writes are batched and flushed periodically

Perf: Krati Put 100%

Perf: HailDB Put 100%

HailDB & Java

• g414-haildb : where the magic happens

• Open Source on GitHub

• uses JNA: Java Native Access

• dynamic binding to libhaildb shared library

• auto-generate initial Java class from .h file (w/ JNAerator)

• Pointer classes & other shenanigans

implementation gotchas

• InnoDB API-level usage is unclear

• Synchronization & locking is unclear

• Therefore... I learned to love reading C

• Error handling is *nasty*

• Native library installation a bit of a pain (need to configure LD_LIBRARY_PATH)
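Since every Level 0 call hands back an int status, one way to contain the error-handling mess is a single `check` helper that turns any non-success code into an exception. This is a minimal sketch: the numeric codes and names below are hypothetical placeholders, not the real `ib_err_t` values, which would have to be mirrored exactly from the HailDB header.

```java
import java.util.Map;

public class ErrorCodes {
    // Hypothetical codes for illustration only; the real values come from the
    // ib_err_t enum in the HailDB header and must be copied from there.
    public static final int SUCCESS = 0;
    public static final int DUPLICATE_KEY = 1;
    public static final int DEADLOCK = 2;

    static final Map<Integer, String> NAMES =
            Map.of(DUPLICATE_KEY, "DUPLICATE_KEY", DEADLOCK, "DEADLOCK");

    /** Unchecked exception that keeps the native error code for callers that need it. */
    public static class HailDbException extends RuntimeException {
        private final int code;

        public HailDbException(String call, int code) {
            super(call + " failed with error " + code
                    + " (" + NAMES.getOrDefault(code, "UNKNOWN") + ")");
            this.code = code;
        }

        public int code() {
            return code;
        }
    }

    /** Wraps a Level 0 status: returns normally on success, throws otherwise. */
    public static void check(String call, int err) {
        if (err != SUCCESS) {
            throw new HailDbException(call, err);
        }
    }
}
```

A caller would write `ErrorCodes.check("ib_trx_commit", err)` right after each native call, so a forgotten status check can no longer silently continue with a broken transaction.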

kinder, friendlier APIs

• Level 0: JNA bindings

int err = ib_dostuff();

• Level 1: Object-Oriented

Transaction t = db.openTransaction(); t.commit();

• Level 2: Templated

dbt.inTransaction() { dbt.insert(value); }

• Level 3: Functional: maps, iteration, filters, apply
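The Level 2 idea is a template method: open the transaction, run the caller's block, then commit on success or roll back if the block throws. The sketch below uses a lambda for the block and an in-memory stand-in for the table; `DbTemplate`, `Transaction`, and their methods are hypothetical, not the g414-haildb API, and a real `Transaction` would wrap a native transaction handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class DbTemplate {
    /** Hypothetical Level 1 handle; a real one would wrap a native transaction. */
    public static class Transaction {
        private final DbTemplate db;
        private final List<String> pending = new ArrayList<>();

        Transaction(DbTemplate db) {
            this.db = db;
        }

        public void insert(String value) {
            pending.add(value); // buffered until commit
        }

        void commit() {
            db.rows.addAll(pending); // make buffered writes visible
            pending.clear();
        }

        void rollback() {
            pending.clear(); // discard buffered writes
        }
    }

    // In-memory stand-in for committed table contents.
    final List<String> rows = new ArrayList<>();

    public Transaction openTransaction() {
        return new Transaction(this);
    }

    /** Level 2: begin, run the body, then commit on success or roll back on failure. */
    public <T> T inTransaction(Function<Transaction, T> body) {
        Transaction t = openTransaction();
        try {
            T result = body.apply(t);
            t.commit();
            return result;
        } catch (RuntimeException e) {
            t.rollback();
            throw e;
        }
    }

    public List<String> rows() {
        return List.copyOf(rows);
    }
}
```

Usage mirrors the slide: `dbt.inTransaction(t -> { t.insert(value); return null; });`. Because commit and rollback live in one place, application code cannot forget either one.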

St8 Server

• HTTP-enabled access to HailDB

• PUT /1.0/t/mytable

{
  "columns": [
    {"name": "a", "type": "INT", "length": 4},
    {"name": "b", "type": "INT", "length": 8},
    {"name": "c", "type": "BLOB", "length": 0}
  ],
  "indexes": [
    {"name": "P", "clustered": true, "unique": true, "indexColumns": [{"name": "a"}]}
  ]
}

rest-enabled access

• GET /1.0/d/mytable;a=0

• POST /1.0/d/mytable;a=1;b=42;c=xyz

• PUT /1.0/d/mytable;a=1;b=43;c=abc

• DELETE /1.0/d/mytable;a=0

* This is matrix-parameter style; form-data style can also be used to specify data
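In matrix-parameter style the table name and column values share one path segment, separated by `;`. The sketch below shows one way a server might split such a segment; the class is hypothetical, not St8's actual parser, and percent-decoding of names and values is deliberately omitted to keep it short.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class MatrixParams {
    /** Table name plus column values taken from one path segment. */
    public static final class Parsed {
        public final String table;
        public final Map<String, String> columns;

        Parsed(String table, Map<String, String> columns) {
            this.table = table;
            this.columns = Collections.unmodifiableMap(columns);
        }
    }

    /** Parses "mytable;a=1;b=42;c=xyz" into the table name and an ordered column map. */
    public static Parsed parse(String segment) {
        String[] parts = segment.split(";");
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                continue; // tolerate "mytable;;a=1"
            }
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected name=value, got: " + parts[i]);
            }
            columns.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        return new Parsed(parts[0], columns);
    }
}
```

For `POST /1.0/d/mytable;a=1;b=42;c=xyz`, the server would pass `mytable;a=1;b=42;c=xyz` to `parse` and get table `mytable` with columns a, b, and c in request order.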

cursors & iterators

• GET /1.0/i/mytable.P?q=a+ge+4

• GET /1.0/i/mytable.SecIndex?q=b+le+4

• GET /1.0/i/mytable.SecIndex?q=b+le+4&s=abce1212121ceeee2120911

• The “s” value is the opaque index key of the next page of results: way better than LIMIT/OFFSET! (HailDB can seek directly to that row instead of stepping over all the skipped rows)
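The paging scheme above is keyset pagination: each response carries the key of the first row of the next page, and the next request seeks straight to it. The sketch below uses a `TreeMap` as a stand-in for an index and hex-encodes the key into the token; the hex encoding and all names are assumptions, since the slides only say the token is opaque.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class KeysetPaging {
    public static final class Page {
        public final List<String> keys;
        public final String next; // null when there are no more rows

        Page(List<String> keys, String next) {
            this.keys = keys;
            this.next = next;
        }
    }

    /** Opaque token: hex of the UTF-8 key bytes (an assumed encoding). */
    static String encode(String key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static String decode(String token) {
        byte[] out = new byte[token.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(token.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    /** Seeks to the token's key (or the start) and reads at most limit keys. */
    public static Page page(NavigableMap<String, String> index, String token, int limit) {
        NavigableMap<String, String> view =
                token == null ? index : index.tailMap(decode(token), true);
        List<String> keys = new ArrayList<>();
        String next = null;
        for (String k : view.keySet()) {
            if (keys.size() == limit) {
                next = encode(k); // first key of the following page
                break;
            }
            keys.add(k);
        }
        return new Page(keys, next);
    }
}
```

Unlike OFFSET, the cost of fetching page N does not grow with N: `tailMap` seeks directly to the token's key, the same way a cursor can position itself on an index entry.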

result

• REST API provides fun, straightforward access from Ruby, Python, Java, Command-line...

• very easy benchmarking with HTTP-based performance tools

• range-query support, and a more efficient iteration model for large result sets than MySQL provides

high-performance counts

• GET /1.0/counts/mykey => 0

• POST /1.0/counts/mykey[?inc=1] => 1

• POST /1.0/counts/mykey?inc=42 => 43

• DELETE /1.0/counts/mykey
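The counter semantics on this slide are: a missing key reads as 0, `inc` defaults to 1, each POST returns the new total, and DELETE removes the key. The sketch below captures those semantics with a `ConcurrentHashMap` standing in for the HailDB table; the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

public class CountService {
    // In-memory stand-in for the counts table: _key -> _count.
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    /** GET: a key that was never incremented reads as 0. */
    public long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    /** POST with ?inc=N: atomically adds N and returns the new value. */
    public long increment(String key, long inc) {
        return counts.merge(key, inc, Long::sum);
    }

    /** POST without ?inc: the increment defaults to 1. */
    public long increment(String key) {
        return increment(key, 1);
    }

    /** DELETE: removes the counter entirely. */
    public void delete(String key) {
        counts.remove(key);
    }
}
```

Replaying the slide's requests in order gives 0, 1, and 43, then 0 again after the DELETE. `merge` makes each read-modify-write atomic per key, which is the property a transactional update provides in the HailDB version.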

counts schema

• HailDB count service schema:

_id int 8-byte unsigned

_key_hash int 8-byte unsigned

_key varchar(80)

_count int 8-byte unsigned

primary key (“_id”), unique key (“_key_hash”, “_key”)
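The unique key on (_key_hash, _key) lets a lookup seek on a narrow fixed-width hash first and then compare the full key, which also guards against hash collisions. The slides do not say which hash is used; the sketch below assumes 64-bit FNV-1a, and `KeyHashIndex` with its in-memory map is a hypothetical stand-in for the secondary index.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyHashIndex {
    /** 64-bit FNV-1a; the choice of hash function is an assumption. */
    public static long fnv1a64(String key) {
        long h = 0xcbf29ce484222325L; // FNV offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L; // FNV prime
        }
        return h;
    }

    /** One row shaped like the counts table: _id, _key_hash, _key, _count. */
    public static final class Row {
        public final long id;
        public final long keyHash;
        public final String key;
        public long count;

        Row(long id, long keyHash, String key) {
            this.id = id;
            this.keyHash = keyHash;
            this.key = key;
        }
    }

    // Stand-in for the index on (_key_hash, _key): hash -> rows sharing that hash.
    private final Map<Long, List<Row>> byHash = new HashMap<>();
    private long nextId = 1; // stand-in for an auto-increment _id

    /** Seeks on the hash, then compares the full key to rule out collisions. */
    public Row find(String key) {
        for (Row r : byHash.getOrDefault(fnv1a64(key), List.of())) {
            if (r.key.equals(key)) {
                return r;
            }
        }
        return null;
    }

    /** Enforces uniqueness of (_key_hash, _key) before adding a row. */
    public Row insert(String key) {
        if (find(key) != null) {
            throw new IllegalStateException("duplicate key: " + key);
        }
        Row row = new Row(nextId++, fnv1a64(key), key);
        byHash.computeIfAbsent(row.keyHash, h -> new ArrayList<>()).add(row);
        return row;
    }
}
```

An 8-byte hash keeps index entries small and fixed-width, while the full `_key` comparison preserves correctness when two different keys happen to hash to the same value.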

raid0 put counts

ssd put counts

raid0 put/get

ssd put/get

operation: graph store

• Social networks, recommendations, any relation you can think of

• Which would you prefer?

• SQL adjacency list, stored procedure, custom storage engine, external (Memcached), ...

• Graph-aware HailDB application in Java

nifty graph store 1

GET /1.0/graph/bfs?a=1&maxDepth=3 => [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]

[Figure: example graph with nodes 1, 2, 3, 4, 5, 6, 8]
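The BFS endpoint returns [node, depth] pairs, stopping expansion at `maxDepth`. The graph figure did not survive extraction, so the sketch below uses hypothetical edges 1→2, 2→3, 3→4, 3→5, chosen because they reproduce the sample response; an in-memory adjacency map stands in for the HailDB edge table.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GraphBfs {
    /** Returns [node, depth] pairs in visit order, not expanding past maxDepth. */
    public static List<int[]> bfs(Map<Integer, List<Integer>> edges, int start, int maxDepth) {
        List<int[]> visited = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<int[]> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(new int[] {start, 0});
        while (!queue.isEmpty()) {
            int[] current = queue.poll();
            visited.add(current);
            if (current[1] == maxDepth) {
                continue; // depth limit reached: report the node but do not expand it
            }
            for (int next : edges.getOrDefault(current[0], List.of())) {
                if (seen.add(next)) { // first arrival is along a shortest path
                    queue.add(new int[] {next, current[1] + 1});
                }
            }
        }
        return visited;
    }

    /** Renders pairs in the same shape as the sample response. */
    public static String render(List<int[]> pairs) {
        List<String> parts = new ArrayList<>();
        for (int[] p : pairs) {
            parts.add(Arrays.toString(p));
        }
        return parts.toString();
    }
}
```

With the hypothetical edges above, `render(bfs(edges, 1, 3))` yields `[[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]`.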

nifty graph store 2

[Figure: example graph with nodes 1, 2, 3, 4, 5, 6, 8]

GET /1.0/graph/topo?a=1&a=5&a=8 => [8, 6, 4, 3, 2, 5, 1]
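The sample response appears to list each node after every node reachable from it (8 first, 1 last), which is what a depth-first post-order produces; that reading is an inference from the output, not something the slides state. The sketch below implements that ordering from several start nodes, with cycle detection. The edges used in the test (1→2, 1→5, 2→3, 3→4, 4→6, 6→8, 5→8) are hypothetical, picked so the sketch reproduces the sample response.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GraphTopo {
    /** DFS post-order from each start node: every node follows all nodes reachable from it. */
    public static List<Integer> topo(Map<Integer, List<Integer>> edges, List<Integer> starts) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> done = new HashSet<>();
        Set<Integer> onStack = new HashSet<>();
        for (int s : starts) {
            visit(edges, s, done, onStack, order);
        }
        return order;
    }

    private static void visit(Map<Integer, List<Integer>> edges, int node,
                              Set<Integer> done, Set<Integer> onStack, List<Integer> order) {
        if (done.contains(node)) {
            return; // already emitted via another path or start node
        }
        if (!onStack.add(node)) {
            throw new IllegalStateException("cycle through node " + node);
        }
        for (int next : edges.getOrDefault(node, List.of())) {
            visit(edges, next, done, onStack, order);
        }
        onStack.remove(node);
        done.add(node);
        order.add(node); // emitted only after all of its successors
    }
}
```

Reversing the returned list gives a conventional topological order (every node before its successors). The recursion depth equals the longest path, so very deep graphs would need an explicit stack instead.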

nifty recovery tool (just an idea)

• for recovery: shut down the MySQL server

• run HailDB-enabled recovery tool

• export as JSON or whatever

wrap-up

• HailDB & InnoDB are phenomenal

• With g414-haildb, they can be integrated directly into applications running on the JVM

• All the InnoDB tuning tricks apply

• Opens up new applications that are tricky with a traditional SQL database

resources

• github.com/sunnygleason/g414-st8

• github.com/sunnygleason/g414-haildb

• haildb.com

• jna.dev.java.net

Questions? Thank You!

bonus material!

• we probably didn’t get this far in the live presentation; the following material is here for eager, brave & interested folks...

future work

• Improve Packaging / Installation

• Codify schema refinements & perf enhancements

• Online backup/export with XtraBackup

• JNI Bindings

• PBXT explorations

InnoDB tuning

• Skinny columns, skinny rows! (esp. the primary key)

• VARCHAR enums ‘bad’; ENUM, INT or SMALLINT ‘good’

• fixed-width rows allow in-place updates

• Use covering indexes strategically

• More data per page means faster index scans, more efficient buffer pool utilization

• You only get so many transactions (read & write) on a given CPU/RAM configuration: benchmark this!

• Strategically offload reads to Memcached/Redis

HailDB schema

_key VARBINARY(200)

_version VARBINARY(200)

_value BLOB

PRIMARY KEY(_key, _version)

refined schema

_id BIGINT (auto increment)

_key_hash BIGINT

_key VARBINARY(200)


_value BLOB

PRIMARY KEY(_id)

KEY(_key_hash)

online backup

• hot backup of data to other machine / destination

• test Percona Xtrabackup with HailDB

• next step: backup/export to Hadoop/HDFS (similar to the Cloudera Sqoop tool)

JNI bindings

• JNI can get a 2-5x perf boost vs. JNA

• ... at the expense of nasty code

• Will go for schema optimizations and InnoDB tuning tips *first*

Thank You!
