High-Performance Storage Services With Java and HailDB Sunny Gleason April 14, 2011
May 10, 2015
whoami
• Sunny Gleason, human
• passion: distributed systems engineering
• previously... Ning: custom social networks; Amazon.com: infra & web services
• now... building cloud infrastructure
whereami
• twitter : twitter.com/sunnygleason
• github : github.com/sunnygleason
• linkedin : linkedin.com/in/sunnygleason
• slideshare : slideshare.net/sunnygleason
what’s in this presentation?
• MySQL & NoSQL as Inspiration
• HailDB & InnoDB
• JNA: Integration with Java
• St8 : A REST-Enabled Data Store
• A Handful of Nifty Applications
• Results & Next Steps
prior art
• Mad props to:
• MySQL & InnoDB teams for creating InnoDB and Embedded InnoDB
• Stewart Smith & the Drizzle folks for leading the HailDB charge and encouraging plugin APIs
• Nokia & Percona for publishing results of their Voldemort / MySQL integration
• Basho for publishing Riak / InnoStore integration
MySQL & InnoDB
• Super-Efficient Database Server
• Tried & True Replication
• Bulletproof Durability (when configured correctly)
• Fantastic Stability, Predictability & Insight into Operation
motivation
• database on 1 box : ok
• database with master/slave replication : ok
• database on cluster : tricky
• database on SAN : scary
NoSQL
• “Not Only” SQL
• What’s the point?
• Proponent: “reaching next level of scale”
• Cynic: “cloud is hype, ops nightmare”
what does it gain?
• Higher performance, scalability, availability
• More robust fault-tolerance
• Simplified systems design
• Easier operations
what does it lose?
• Reduced / simplified programming model
• No ad-hoc queries, no joins, no txns
• Not ACID: Weakened Atomicity / Consistency / Isolation / Durability
• Operations / management is still evolving
• Challenging to quantify health of system
• Fewer domain experts
NoSQL Map
• Key-Value Stores (durable): Dynamo, Voldemort, Riak
• Key-Value Stores (volatile): Memcached, Redis
• Column Stores: Cassandra, BigTable, HBase
• Document Stores: CouchDB, MongoDB
• Graph Stores: Neo4J
durable vs. volatile
• RAM is ridiculously fast (ns) but not durable
• Disk is persistent but slow (3-7ms)
• RAID eases the pain a bit (4-8x throughput)
• SSD shows good promise (100-300us)
• FusionIO is redefining the space (30-100us)
performance & operational complexity*
[Figure: operational complexity (y-axis) vs. aggregate operations/sec (x-axis: 1K, 10K, 100K, 1M) for MySQL, +SSD, +FusionIO, +Sharding, Memcached, +Cluster, Voldemort]
* This is not a real graph
just a thought...
What if we could use the highly optimized & durable ‘guts’ of MySQL without having to go through JDBC & SQL?
enter HailDB
• use case: Voldemort Storage Engine
• let’s evaluate relative to other NoSQL options
• focus on stability & predictability of performance
• Graphs show throughput (ops/sec) vs. time
Voldemort schema
_key VARBINARY(200)
_version VARBINARY(200)
_value BLOB
PRIMARY KEY(_key, _version)
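Because InnoDB stores a table in primary-key order (the clustered index), PRIMARY KEY(_key, _version) keeps every version of a key in adjacent rows. A minimal Java sketch of that ordering, comparing the VARBINARY columns as unsigned bytes; the class and method names are hypothetical, and the comparison only approximates the server's binary collation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: order (key, version) rows the way a clustered index on
// PRIMARY KEY(_key, _version) would, comparing raw bytes as unsigned values.
final class VoldemortKeyOrder {
    static final class Row {
        final byte[] key;
        final byte[] version;

        Row(String key, String version) {
            this.key = key.getBytes(StandardCharsets.UTF_8);
            this.version = version.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(key, StandardCharsets.UTF_8) + "/"
                    + new String(version, StandardCharsets.UTF_8);
        }
    }

    // Compare _key first; only rows with equal keys are ordered by _version.
    static int compare(Row a, Row b) {
        int c = Arrays.compareUnsigned(a.key, b.key);
        return c != 0 ? c : Arrays.compareUnsigned(a.version, b.version);
    }

    static List<Row> sorted(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(VoldemortKeyOrder::compare);
        return copy;
    }
}
```

A range scan that starts at the first row for a given _key therefore reads all of that key's versions from neighboring rows.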
experimental setup
• OS X: 8-Core Xeon, 32GB RAM, 200GB OWC SSD
• Faban Benchmark : PUT 64-byte key, 1024-byte value
• Scenarios: 1, 2, 4, 8 threads
• 512M Java Heap
BDB-JE
• Log-Structured B-Tree
• Fast Storage When Mostly Cached
• Configured without fsync() by default - writes are batched and flushed periodically
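The "without fsync()" trade-off can be made concrete with a small Java sketch: records are appended to a file, but FileChannel.force() (Java's rough equivalent of fsync()) runs only once per batch, so anything written since the last force may be lost in a crash. This is a hypothetical illustration of write batching in general, not how BDB-JE or Krati implement it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: append records to a log, but only call force()
// once every `batchSize` writes. Records written since the last force()
// live only in the OS page cache and can be lost if the machine crashes.
final class BatchedLog implements AutoCloseable {
    private final FileChannel channel;
    private final int batchSize;
    private int pending = 0;
    private int forces = 0;

    BatchedLog(Path path, int batchSize) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        this.batchSize = batchSize;
    }

    void append(byte[] record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        if (++pending >= batchSize) {
            flush();
        }
    }

    void flush() throws IOException {
        if (pending > 0) {
            channel.force(false); // force file content (not necessarily metadata) to the device
            pending = 0;
            forces++;
        }
    }

    int forceCount() {
        return forces;
    }

    @Override
    public void close() throws IOException {
        flush(); // make the tail of the batch durable before closing
        channel.close();
    }
}
```

With batchSize = 1 every write pays the full device latency; larger batches trade a window of possible data loss for throughput.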
Perf: BDB Put 100%
Krati
• Fast Hash-Oriented Storage
• Uses memory-mapped files for speed
• Configured without fsync() by default - writes are batched and flushed periodically
Perf: Krati Put 100%
Perf: HailDB Put 100%
HailDB & Java
• g414-haildb : where the magic happens
• Open Source on GitHub
• uses JNA: Java Native Access
• dynamic binding to libhaildb shared library
• auto-generate initial Java class from .h file (w/ JNAerator)
• Pointer classes & other shenanigans
implementation gotchas
• InnoDB API-level usage is unclear
• Synchronization & locking is unclear
• Therefore... I learned to love reading C
• Error handling is *nasty*
• Native library installation is a bit of a pain (need to configure LD_LIBRARY_PATH)
kinder, friendlier APIs
• Level 0: JNA bindings
  int err = ib_dostuff();
• Level 1: Object-Oriented
  Transaction t = db.openTransaction(); t.commit();
• Level 2: Templated
  dbt.inTransaction() { dbt.insert(value); }
• Level 3: Functional
  Maps, Iteration, Filters, Apply
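A minimal sketch of this layering, assuming a hypothetical NativeApi interface in place of the real JNA bindings: Level 0 returns int error codes, Level 1 turns them into exceptions on a Transaction object, and Level 2 is a template that commits on success and rolls back on failure. None of these names are the actual g414-haildb API.

```java
import java.util.function.Consumer;

// Hypothetical sketch of the Level 0 -> Level 2 layering; NativeApi stands in
// for JNA bindings and is not the real HailDB/g414-haildb interface.
final class ApiLayers {
    static final int DB_SUCCESS = 0;

    // Level 0: C-style calls that return error codes.
    interface NativeApi {
        int beginTrx();
        int insert(String value);
        int commitTrx();
        int rollbackTrx();
    }

    // Turn an error code into an exception so it cannot be silently ignored.
    static void check(int err, String op) {
        if (err != DB_SUCCESS) {
            throw new IllegalStateException(op + " failed with error code " + err);
        }
    }

    // Level 1: object-oriented transaction wrapper.
    static final class Transaction {
        private final NativeApi api;

        Transaction(NativeApi api) {
            this.api = api;
            check(api.beginTrx(), "begin");
        }

        void insert(String value) { check(api.insert(value), "insert"); }
        void commit() { check(api.commitTrx(), "commit"); }
        void rollback() { check(api.rollbackTrx(), "rollback"); }
    }

    // Level 2: template that commits on success and rolls back on any exception.
    static void inTransaction(NativeApi api, Consumer<Transaction> body) {
        Transaction t = new Transaction(api);
        try {
            body.accept(t);
            t.commit();
        } catch (RuntimeException e) {
            try {
                t.rollback();
            } catch (RuntimeException suppressed) {
                e.addSuppressed(suppressed); // keep the original failure as the primary error
            }
            throw e;
        }
    }
}
```

Usage looks like `ApiLayers.inTransaction(api, t -> t.insert("value"));`: the caller never sees an error code or forgets a rollback.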
St8 Server
• HTTP-enabled Access to HailDB
• PUT /1.0/t/mytable
  {
    "columns": [
      {"name":"a", "type":"INT", "length":4},
      {"name":"b", "type":"INT", "length":8},
      {"name":"c", "type":"BLOB", "length":0}
    ],
    "indexes": [
      {"name":"P", "clustered":true, "unique":true, "indexColumns":[{"name":"a"}]}
    ]
  }
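The table-creation call can be issued from Java with the JDK 11+ HttpClient. This sketch assumes a St8 server at a caller-supplied base URL (for example a hypothetical http://localhost:8080) and an application/json content type; neither detail is specified in the talk.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: build the PUT /1.0/t/{table} request for the table definition above.
// Base URL and Content-Type header are assumptions, not documented St8 behavior.
final class CreateTable {
    static final String TABLE_JSON =
            "{\"columns\":["
            + "{\"name\":\"a\",\"type\":\"INT\",\"length\":4},"
            + "{\"name\":\"b\",\"type\":\"INT\",\"length\":8},"
            + "{\"name\":\"c\",\"type\":\"BLOB\",\"length\":0}],"
            + "\"indexes\":[{\"name\":\"P\",\"clustered\":true,\"unique\":true,"
            + "\"indexColumns\":[{\"name\":\"a\"}]}]}";

    static HttpRequest createTableRequest(String baseUrl, String table) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/1.0/t/" + table))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(TABLE_JSON))
                .build();
    }

    // Sends the request; requires a running server at baseUrl.
    static int send(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> resp = client.send(
                createTableRequest(baseUrl, "mytable"),
                HttpResponse.BodyHandlers.ofString());
        return resp.statusCode();
    }
}
```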
rest-enabled access
• GET /1.0/d/mytable;a=0
• POST /1.0/d/mytable;a=1;b=42;c=xyz
• PUT /1.0/d/mytable;a=1;b=43;c=abc
• DELETE /1.0/d/mytable;a=0
* This is matrix-param style; form-data style can also be used to specify data
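Matrix parameters pack a row's column values into the last path segment. A hypothetical client-side helper that builds such a path from column/value pairs; it URL-encodes values so a ';' or '=' inside the data cannot be mistaken for a separator, and it assumes (without confirmation from the talk) that the server decodes them.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper: build /1.0/d/{table};col=value;... from an ordered map.
final class MatrixPath {
    static String rowPath(String table, Map<String, String> columns) {
        StringBuilder sb = new StringBuilder("/1.0/d/").append(table);
        for (Map.Entry<String, String> e : columns.entrySet()) {
            sb.append(';')
              .append(e.getKey())
              .append('=')
              // encode the value so ';' and '=' in data become %3B and %3D
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

Passing a LinkedHashMap keeps the column order stable, so `rowPath("mytable", {a=1, b=42, c=xyz})` yields the POST path shown on the slide.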
cursors & iterators
• GET /1.0/i/mytable.P?q=a+ge+4
• GET /1.0/i/mytable.SecIndex?q=b+le+4
• GET /1.0/i/mytable.SecIndex?q=b+le+4&s=abce1212121ceeee2120911
• “s” is an opaque index key for the next page of results; far better than LIMIT/OFFSET, since HailDB can seek directly to the row
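The “s” cursor is seek-based (keyset) paging. A sketch with a TreeMap standing in for the B-tree index: each page returns its keys plus the key to resume from, and the next request seeks straight to that key instead of skipping OFFSET rows. In St8 the “s” token is opaque; here it is simply the raw key, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical keyset pager; the NavigableMap plays the role of an index.
final class KeysetPager {
    static final class Page {
        final List<Integer> keys;
        final Integer nextKey; // null when there are no more rows

        Page(List<Integer> keys, Integer nextKey) {
            this.keys = keys;
            this.nextKey = nextKey;
        }
    }

    // Return up to pageSize keys >= fromKey (like q=a+ge+4), plus the resume key.
    static Page page(NavigableMap<Integer, String> index, int fromKey, int pageSize) {
        List<Integer> keys = new ArrayList<>();
        Integer next = null;
        // tailMap locates the starting entry in O(log n); no rows are skipped one by one
        for (Integer k : index.tailMap(fromKey, true).keySet()) {
            if (keys.size() == pageSize) {
                next = k; // first key of the following page
                break;
            }
            keys.add(k);
        }
        return new Page(keys, next);
    }
}
```

With LIMIT/OFFSET, page N costs O(N × pageSize) row visits; with a resume key every page costs one seek plus pageSize reads.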
result
• REST API provides fun, straightforward access from Ruby, Python, Java, Command-line...
• very easy benchmarking with HTTP-based performance tools
• range query support, and a more efficient iteration model for large result sets than MySQL provides
high-performance counts
• GET /1.0/counts/mykey => 0
• POST /1.0/counts/mykey[?inc=1] => 1
• POST /1.0/counts/mykey?inc=42 => 43
• DELETE /1.0/counts/mykey
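The counter endpoints behave like an atomic map from key to 64-bit count. An in-memory Java model, assuming an unknown key reads as 0 and that inc defaults to 1 (as the bracketed [?inc=1] suggests); it is illustrative only and does not persist anything to HailDB.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of the counts endpoints; not St8's implementation.
final class CountService {
    private final ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();

    long get(String key) {                        // GET /1.0/counts/{key}
        return counts.getOrDefault(key, 0L);
    }

    long increment(String key) {                  // POST /1.0/counts/{key}
        return increment(key, 1L);
    }

    long increment(String key, long inc) {        // POST /1.0/counts/{key}?inc=N
        return counts.merge(key, inc, Long::sum); // atomic read-modify-write per key
    }

    void delete(String key) {                     // DELETE /1.0/counts/{key}
        counts.remove(key);
    }
}
```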
counts schema
• HailDB count service schema:
  _id int 8-byte unsigned
  _key_hash int 8-byte unsigned
  _key varchar(80)
  _count int 8-byte unsigned
  primary key ("_id"), unique key ("_key_hash", "_key")
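Storing _key_hash alongside _key gives the unique index a fixed-width 8-byte leading column, while _key itself keeps two keys that collide on the hash distinct. A sketch of computing such a hash; 64-bit FNV-1a is an arbitrary choice for illustration, since the talk does not say which hash function the count service uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical key-hash helper (64-bit FNV-1a over the UTF-8 bytes of the key).
final class KeyHash {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    static long fnv1a64(String key) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff); // mix in one unsigned byte
            hash *= FNV_PRIME;  // multiplication wraps modulo 2^64
        }
        return hash;
    }
}
```

A lookup would seek on (_key_hash = fnv1a64(key), _key = key); the value is a signed Java long, so it would need to be stored as the same 64-bit pattern in the unsigned column.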
raid0 put counts
ssd put counts
raid0 put/get
ssd put/get
operation: graph store
• Social networks, recommendations, any relation you can think of
• Which would you prefer?
• SQL adjacency list, stored procedure, custom storage engine, external (Memcached), ...
• Graph-aware HailDB application in Java
nifty graph store 1
GET /1.0/graph/bfs?a=1&maxDepth=3 => [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]]
[Figure: example graph with nodes 1, 2, 3, 4, 5, 6, 8]
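The bfs result is a list of [node, depth] pairs reachable within maxDepth hops. A Java sketch of breadth-first search producing that shape; the adjacency map is an in-memory stand-in for an edge table, and the edge list in the usage below is hypothetical (chosen to give the same output as the slide, not taken from its figure).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Breadth-first search returning [node, depth] pairs up to maxDepth.
final class GraphBfs {
    static List<int[]> bfs(Map<Integer, List<Integer>> adj, int start, int maxDepth) {
        List<int[]> out = new ArrayList<>();
        Map<Integer, Integer> depth = new HashMap<>(); // also serves as the visited set
        Deque<Integer> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            int d = depth.get(node);
            out.add(new int[] {node, d});
            if (d == maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (int next : adj.getOrDefault(node, List.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return out;
    }
}
```

With hypothetical edges 1→2, 2→3, 3→4, 3→5, 4→6 and maxDepth = 3, node 6 (depth 4) is excluded and the result is [[1, 0], [2, 1], [3, 2], [4, 3], [5, 3]].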
nifty graph store 2
[Figure: example graph with nodes 1, 2, 3, 4, 5, 6, 8]
GET /1.0/graph/topo?a=1&a=5&a=8 => [8, 6, 4, 3, 2, 5, 1]
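A topological ordering over the nodes reachable from the requested start nodes can be computed with a depth-first post-order. This sketch assumes an edge u → v means "u depends on v", so each node is emitted after all of its dependencies; that edge semantics, the ordering rule, and the example graph in the test are assumptions, not St8's definition.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical DFS-based topological sort restricted to nodes reachable from `starts`.
final class GraphTopo {
    static List<Integer> topo(Map<Integer, List<Integer>> adj, List<Integer> starts) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> done = new HashSet<>();
        Set<Integer> onPath = new HashSet<>(); // nodes on the current DFS path
        for (int s : starts) {
            visit(adj, s, done, onPath, order);
        }
        return order;
    }

    private static void visit(Map<Integer, List<Integer>> adj, int node,
                              Set<Integer> done, Set<Integer> onPath, List<Integer> order) {
        if (done.contains(node)) {
            return;
        }
        if (!onPath.add(node)) {
            throw new IllegalArgumentException("cycle detected at node " + node);
        }
        for (int next : adj.getOrDefault(node, List.of())) {
            visit(adj, next, done, onPath, order);
        }
        onPath.remove(node);
        done.add(node);
        order.add(node); // emitted only after all of its dependencies
    }
}
```

The recursion depth equals the longest dependency chain, which is fine for a sketch; a production version would use an explicit stack.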
nifty recovery tool (just an idea)
• for recovery: shut down the MySQL server
• run HailDB-enabled recovery tool
• export as JSON or whatever
wrap-up
• HailDB & InnoDB are phenomenal
• With g414-haildb, can be integrated directly into applications running on the JVM
• All the InnoDB tuning tricks apply
• Opens up new applications that are tricky with a traditional SQL database
resources
• github.com/sunnygleason/g414-st8
• github.com/sunnygleason/g414-haildb
• haildb.com
• jna.dev.java.net
Questions? Thank You!
bonus material!
• we probably didn’t get this far in the live presentation; the following material is here for eager, brave & interested folks...
future work
• Improve Packaging / Installation
• Codify schema refinements & perf enhancements
• Online backup/export with XtraBackup
• JNI Bindings
• PBXT explorations
InnoDB tuning
• Skinny columns, skinny rows! (esp. Primary Key)
• VARCHAR enums ‘bad’; ENUM, INT or SMALLINT ‘good’
• fixed-width rows allow in-place updates
• Use covering indexes strategically
• More data per page means faster index scans, more efficient buffer pool utilization
• You only get so many trx’s (read & write) on a given CPU/RAM configuration - benchmark this!
• Strategically offload reads to Memcached/Redis
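The "skinny rows" advice is easy to quantify with back-of-the-envelope arithmetic. This sketch assumes the default 16 KB InnoDB page and ignores record headers and page overhead, so real rows-per-page figures are lower; the row sizes are hypothetical (a BIGINT id and BIGINT value plus either a 20-byte status string or a 1-byte status code).

```java
// Rough rows-per-page and pages-per-scan arithmetic under the assumptions above.
final class PageMath {
    static final int PAGE_BYTES = 16 * 1024; // assumed default InnoDB page size

    static int rowsPerPage(int rowBytes) {
        return PAGE_BYTES / rowBytes;
    }

    static long pagesFor(long rows, int rowBytes) {
        int perPage = rowsPerPage(rowBytes);
        return (rows + perPage - 1) / perPage; // ceiling division
    }
}
```

With these hypothetical sizes, a 36-byte row fits 455 per page versus 963 for a 17-byte row, so a full scan of 10M rows touches about 21,979 pages instead of 10,385, and the buffer pool holds twice as many rows.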
HailDB schema
_key VARBINARY(200)
_version VARBINARY(200)
_value BLOB
PRIMARY KEY(_key, _version)
refined schema
_id BIGINT (auto increment)
_key_hash BIGINT
_key VARBINARY(200)
_version VARBINARY(200)
_value BLOB
PRIMARY KEY(_id)
KEY(_key_hash)
online backup
• hot backup of data to other machine / destination
• test Percona Xtrabackup with HailDB
• next step: backup/export to Hadoop/HDFS (similar to the Cloudera Sqoop tool)
JNI bindings
• JNI can get 2-5x perf boost vs. JNA
• ... at the expense of nasty code
• Will go for schema optimizations and InnoDB tuning tips *first*
Thank You!