What do we do?
Old hardware
1990
BTree file systems
RAID
Small databases
BTree indexes
What do we do?
BTree file systems
2010
New hardware
RAID
Write-optimised indexes
Distributed, shared-nothing databases
BTree file systems
New hardware
RAID
Write-optimised indexes
...
What do we do?
Castle
2011
Distributed, shared-nothing databases
New hardware
Castle
New hardware
...
What does this have to do with
Functional Programming?
Big Data Applications
Cross-Cluster Management UI
Amazon S3 compatible
...
Acunu Storage Core
Open API
Management
Deployment
Monitoring
...
Java, Erlang,
C
OCaml
C
Python, Bash, Perl
Management Stack
Miscd
Alerts
DFSd
Version
Collection
Disk
NamedObjects
Base
Castle
Routerd: enumeration, routing, clustering
HTML5/JavaScript User Interface
Autogenerated OCaml CLI
External Monitoring Tools (Munin etc)
Cassandrad
Keyspace
ColumnFamily
Clusterd
Cassandra
Host
Group
Service
Cassandra_Node
S3d
BigS3
S3_Node
Bucket
Another Routerd on a different machine
Filesystem
Statsd
Report
Stat
Source
Default_Report
Alert_Rule
Alert
Bridges to other systems
Clustering
Failure Detection
Monitoring
Alerting
Routing & Aggregation
Successes / Failures
Prototype “Filesystem”
• CoW BTrees
• Mod List BTrees
• LSM Trees
• Doubling Arrays
• Fractional Cascading
• Stratified DAs
• Multidimensional keys
• Z curve packing
Aim: Investigate algorithms for KV storage
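The "Multidimensional keys" and "Z curve packing" bullets are not expanded on the slide. As a rough illustration only, the standard Z-order (Morton) encoding interleaves the bits of two coordinates so that a 2D key can be stored in a one-dimensional sorted KV structure. The function names and the 16-bit default are assumptions made for this sketch, not details of the prototype.

```python
def z_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bit i of x goes to bit 2*i of the result, bit i of y to bit 2*i + 1.
    (Hypothetical helper for illustration; not taken from the prototype.)
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_decode(z, bits=16):
    """Inverse of z_encode: split the interleaved bits back into (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


if __name__ == "__main__":
    key = z_encode(3, 5)
    print(key, z_decode(key))
```

Sorting by the encoded key tends to keep points that are close in both dimensions close together on disk, which is why such a packing pairs naturally with sorted structures like the ones listed above.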
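Doubling Array
Inserts:
insert 2:  [2]
insert 9:  [2] and [9] merge into [2 9]
insert 11: [11], [2 9]
insert 8:  [8] and [11] merge into [8 11], which merges with [2 9] into [2 8 9 11]
etc...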
Similar to log-structured merge trees (LSM), cache-oblivious lookahead arrays (COLA), ...
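The insert sequence on the slide (2, 9, 11, 8) can be sketched in a few lines. This is a minimal in-memory model, assuming level i is either empty or a sorted run of exactly 2^i keys, and that an insert merges full levels downward like a carry in binary addition. The class and method names are invented for the sketch; it ignores values, deletes, duplicates and on-disk layout, and it is not Castle's implementation.

```python
from bisect import bisect_left
from heapq import merge


class DoublingArray:
    """Toy doubling array: levels[i] is [] or a sorted list of 2**i keys."""

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]
        i = 0
        # Merge with each full level in turn until an empty slot is found,
        # exactly like propagating a carry when incrementing a binary counter.
        while i < len(self.levels) and self.levels[i]:
            carry = list(merge(self.levels[i], carry))
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append([])
        self.levels[i] = carry

    def contains(self, key):
        # Each level is sorted, so a lookup is one binary search per level.
        for level in self.levels:
            j = bisect_left(level, key)
            if j < len(level) and level[j] == key:
                return True
        return False


if __name__ == "__main__":
    da = DoublingArray()
    for k in (2, 9, 11, 8):
        da.insert(k)
        print(k, da.levels)
```

Every merge reads and writes whole sorted runs, which is what makes the IO pattern sequential rather than random.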
https://acunu-videos.s3.amazonaws.com/dajs.html
Demo
B = “block size”, say 8KB at 100 bytes/entry ~= 100 entries
Log Structured

                 Update                        Range Query (Size Z)
B-Tree           O(log_B N) random IOs         O(Z/B) random IOs
Doubling Array   O((log N)/B) sequential IOs   O(Z/B) sequential IOs

B-Tree:
  ~ log(2^30) / log 100 = 5 IOs/update
  8KB @ 100MB/s, w/ 8ms seek = 100 IOs/s
  100 / 5 = 20 updates/s

Doubling Array:
  ~ log(2^30) / 100 = 0.2 IOs/update
  8KB @ 100MB/s = 13k IOs/s
  13k / 0.2 = 65k updates/s
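The back-of-envelope figures on this slide can be re-derived with a short script. It assumes natural logarithms (which reproduces the 0.2 IOs/update figure), N = 2^30 entries, B = 100 entries per block, binary units for KB and MB, and that a random IO costs one 8ms seek plus the transfer time. All variable names are illustrative.

```python
import math

N = 2 ** 30                      # number of entries
B = 100                          # entries per 8KB block (slide's rounding)
BLOCK_BYTES = 8 * 1024           # assumed: 1KB = 1024 bytes
BANDWIDTH = 100 * 1024 * 1024    # assumed: 1MB = 1024 * 1024 bytes, per second
SEEK_S = 0.008                   # 8ms seek

# IOs per update: O(log_B N) for the B-tree, O((log N)/B) for the doubling array.
btree_ios_per_update = math.log(N) / math.log(B)
da_ios_per_update = math.log(N) / B

# Device throughput in IOs per second.
sequential_iops = BANDWIDTH / BLOCK_BYTES
random_iops = 1 / (SEEK_S + BLOCK_BYTES / BANDWIDTH)

btree_updates = random_iops / btree_ios_per_update
da_updates = sequential_iops / da_ios_per_update

if __name__ == "__main__":
    print(f"B-tree:         {btree_ios_per_update:.1f} IOs/update, "
          f"{random_iops:.0f} IOs/s, {btree_updates:.0f} updates/s")
    print(f"Doubling array: {da_ios_per_update:.2f} IOs/update, "
          f"{sequential_iops:.0f} IOs/s, {da_updates:.0f} updates/s")
```

The unrounded results differ slightly from the slide's rounded 20 and 65k updates/s, but the gap of more than three orders of magnitude between the two structures is the same.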
BTree Disk Trace
[Figure: block index vs. time (s)]
Doubling Array Disk Trace
[Figure: block index vs. time (secs)]
OCaml Prototype Performance
[Figure: insertion rate (kvps/s) vs. # inserted kvps]
The Dark Side...
Java Prototype Performance
[Figure: insert rate (keys/s) vs. time (s)]
What about Castle?
Castle Performance
One more thing...
SNAPSHOTS*
* And clones!
I’ll explain how....
http://bit.ly/rduBia
“Castle: Re-inventing Storage For Big Data”
London, 27th September
@tom_wilkie
http://www.acunu.com
http://bitbucket.org/acunu
http://github.com/acunu
References
[LSM] The Log-Structured Merge-Tree (LSM-Tree), Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil
http://staff.ustc.edu.cn/~jpq/paper/flash/1996-The%20Log-Structured%20Merge-Tree%20%28LSM-Tree%29.pdf
[COLA] Cache-Oblivious Streaming B-trees, Michael A. Bender et al
http://www.cs.sunysb.edu/~bender/newpub/BenderFaFi07.pdf
[DSST] Making Data Structures Persistent, J. R. Driscoll, N. Sarnak, D. D. Sleator, R. E. Tarjan, Journal of Computer and System Sciences, Vol. 38, No. 1, 1989
http://www.cs.cmu.edu/~sleator/papers/making-data-structures-persistent.pdf
Stratified B-trees and versioned dictionaries, Andy Twigg, Andrew Byde, Grzegorz Miłoś, Tim Moreton, John Wilkes, Tom Wilkie, HotStorage’11
http://www.usenix.org/event/hotstorage11/tech/final_files/Twigg.pdf
[RDA] Random duplicate storage strategies for load balancing in multimedia servers, Joep Aerts, Jan Korst, Sebastian Egner, 2000
http://www.win.tue.nl/~joep/IPL.ps
Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the
Apache Software Foundation.