Introducing TokuMX: The Performance Engine for MongoDB
Leif Walsh, Senior Engineer, Tokutek
[email protected] @leifwalsh
Jul 03, 2015
What is TokuMX?
• TokuMX = MongoDB with improved storage
• Drop-in replacement for MongoDB v2.4 applications
  • Including replication and sharding
  • Same data model
  • Same query language
  • Drivers just work
  • No full-text or geospatial indexes
• Open Source – http://github.com/Tokutek/mongo
B-tree Limitations
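[Figure: B-tree diagram. The root node (22) and internal nodes (10, 99) are held in RAM; the leaf nodes (2, 3, 4), (10, 20), (22, 25), (99) are on disk.]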
Performance is I/O-limited when the data set is bigger than RAM: try to fit all internal nodes and some leaf nodes in memory.
Plus, mmap.
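A rough sketch of why the internal nodes are the part worth keeping in RAM: with a wide fan-out, they are a tiny fraction of the tree. The key count, keys per leaf, and fan-out below are illustrative assumptions, not figures from this talk.

```python
import math

def btree_level_sizes(num_keys, keys_per_leaf, fanout):
    """Return node counts per level, leaf level first and root last."""
    levels = [math.ceil(num_keys / keys_per_leaf)]
    while levels[-1] > 1:
        # Each parent node points at up to `fanout` children.
        levels.append(math.ceil(levels[-1] / fanout))
    return levels

# Assumed sizes, chosen only to make the arithmetic easy to follow.
levels = btree_level_sizes(num_keys=100_000_000, keys_per_leaf=100, fanout=100)
leaf_nodes = levels[0]
internal_nodes = sum(levels[1:])

print("nodes per level (leaves first):", levels)
print("internal nodes as a share of leaves:", internal_nodes / leaf_nodes)
```

Under these assumptions the internal levels hold about 1% as many nodes as the leaf level, so caching them is cheap, while most leaf accesses still go to disk once the data outgrows RAM.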
TokuMX : Indexed Insertion
TokuMX : Concurrency (>RAM)
TokuMX : Concurrency (<RAM)
TokuMX : Raw Compression
bittorrent data, size on disk, ~31 million inserts (lower is better)
TokuMX achieved 11.6:1 compression
TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)
TokuMX is substantially smaller, even without
compression
TokuMX : Compression : Field Names
synthetic data, size on disk, 100 million inserts (lower is better)
In TokuMX, field name length has almost no impact on size due to
compression
MongoDB was ~10% smaller with short field names
TokuMX : ACID + MVCC
• ACID
  – In MongoDB, multi-insertion operations allow for partial success
    o Asked to store 5 documents, 3 succeeded
  – TokuMX offers “all or nothing” behavior (atomic)
• MVCC
  – In MongoDB, queries can be interrupted by writers
    o The effects of these writers are visible to the reader
  – TokuMX offers MVCC
    o Reads are consistent as of the operation start
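The difference between partial success and "all or nothing" can be illustrated with a toy in-memory store. This is only a sketch of the two behaviors, not TokuMX or MongoDB code; the function names and the duplicate-key failure are assumptions chosen to reproduce the "asked to store 5, 3 succeeded" case.

```python
class DuplicateKeyError(Exception):
    """Raised when a document's _id is already present (hypothetical)."""

def insert_many_partial(store, docs):
    """Insert one document at a time; earlier inserts survive a failure."""
    for doc in docs:
        if doc["_id"] in store:
            raise DuplicateKeyError(doc["_id"])
        store[doc["_id"]] = doc

def insert_many_atomic(store, docs):
    """Stage the inserts in a copy and publish only if all of them succeed."""
    staged = dict(store)
    for doc in docs:
        if doc["_id"] in staged:
            raise DuplicateKeyError(doc["_id"])
        staged[doc["_id"]] = doc
    store.clear()
    store.update(staged)

# Five documents; the fourth reuses _id 3 and therefore fails.
batch = [{"_id": i} for i in (1, 2, 3, 3, 4)]

partial_store, atomic_store = {}, {}
for insert, store in ((insert_many_partial, partial_store),
                      (insert_many_atomic, atomic_store)):
    try:
        insert(store, batch)
    except DuplicateKeyError:
        pass

print(len(partial_store), len(atomic_store))
```

The partial version is left holding three documents, while the atomic version is left holding none.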
TokuMX : Indexed Insertion
• Indexed insertion workload (iibench)
  • http://github.com/tmcallaghan/iibench-mongodb
{ dateandtime: <date-time>,
  cashregisterid: 1..1000,
  customerid: 1..100000,
  productid: 1..10000,
  price: <double> }
• Insert only, 1000 documents per insert, 100 million inserts
• Indexes
  • price + customerid
  • cashregisterid + price + customerid
  • price + dateandtime + customerid
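The shape of this workload can be sketched with a small generator. This is not the iibench-mongodb code. The price range is an assumption (the slide only says `<double>`), the index list is plain data rather than a driver call, and `cashregisterid` is assumed to be the field meant by "cashregister" in the index list.

```python
import random
from datetime import datetime, timezone

def make_doc(rng):
    """Build one document following the schema on the slide."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0.0, 500.0),  # assumed range
    }

def make_batch(rng, size=1000):
    """One insert call in the benchmark carries 1000 documents."""
    return [make_doc(rng) for _ in range(size)]

# The three secondary indexes, as ordered tuples of field names.
INDEXES = [
    ("price", "customerid"),
    ("cashregisterid", "price", "customerid"),
    ("price", "dateandtime", "customerid"),
]

rng = random.Random(42)
batch = make_batch(rng)
print(len(batch), sorted(batch[0]))
```

Every insert has to update all three indexes, and the random `price` and `customerid` values land all over the key space, which is what makes the workload I/O-bound for a B-tree once the indexes outgrow RAM.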
TokuMX : Concurrency
• Sysbench read-write workload
  • point and range queries, update, delete, insert
  • http://github.com/tmcallaghan/sysbench-mongodb
{ _id: 1..10000000,
  k: 1..10000000,
  c: <120 char random string ###-###-###>,
  pad: <60 char random string ###-###-###> }
TokuMX : Raw Compression
• BitTorrent Peer Snapshot Data (~31 million documents)
• 3 Indexes : peer_id + created, torrent_snapshot_id + created, created
{ id: 1,
  peer_id: 9222,
  torrent_snapshot_id: 4,
  upload_speed: 0.0000,
  download_speed: 0.0000,
  payload_upload_speed: 0.0000,
  payload_download_speed: 0.0000,
  total_upload: 0,
  total_download: 0,
  fail_count: 0,
  hashfail_count: 0,
  progress: 0.0000,
  created: "2008-10-28 01:57:35" }
http://cs.brown.edu/~pavlo/torrent/
TokuMX : Compression : Field Names
schema 1 - long field names (10/20/20)
{ first_name : “Tim”,
  last_name : “Callaghan”,
  email_address : “[email protected]” }
schema 2 - short field names (26 fewer bytes per doc)
{ fn : “Tim”,
  ln : “Callaghan”,
  ea : “[email protected]” }
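Why field name length matters so little once data is compressed can be seen in a small experiment with the two schemas above. This is a sketch under stated assumptions: it compresses JSON text with zlib as a stand-in for the storage engine's block compression, and the values are random strings, not the synthetic data set used in the benchmark.

```python
import json
import random
import string
import zlib

LONG_NAMES = ("first_name", "last_name", "email_address")
SHORT_NAMES = ("fn", "ln", "ea")

def random_word(rng, length):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def make_rows(rng, count):
    """Random values only; the field names are attached later."""
    return [(random_word(rng, 6), random_word(rng, 9),
             random_word(rng, 8) + "@example.com") for _ in range(count)]

def encode(rows, names):
    """Serialize the same rows under a given set of field names."""
    docs = (dict(zip(names, row)) for row in rows)
    return "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")

rows = make_rows(random.Random(0), 10_000)
raw_long, raw_short = encode(rows, LONG_NAMES), encode(rows, SHORT_NAMES)
zip_long, zip_short = zlib.compress(raw_long), zlib.compress(raw_short)

# Relative size penalty of long field names, before and after compression.
raw_overhead = len(raw_long) / len(raw_short) - 1
zip_overhead = len(zip_long) / len(zip_short) - 1
print(f"uncompressed: {raw_overhead:.1%}  compressed: {zip_overhead:.1%}")
```

Uncompressed, the long names cost exactly 26 extra bytes per document. After compression, the repeated names collapse into short back-references, so the penalty shrinks.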