MongoDB Revs You Up: What Storage Engine is Right for You?
Jon Tobin, Dir. of Solution Eng.
[email protected] @jontobs
Linkedin.com/in/jonathanetobin
www.percona.com
Agenda
• How did we get here?
• What storage engines are available?
• Why does the data structure matter?
• What makes them unique?
• Where should I start my evaluation?
• How can I evaluate the engines?
First: Background
Let’s Level Set
{
  "_id" : ObjectId("507f1f77bcf86cd799439011"),
  "studentID" : 100,
  "firstName" : "Jonathan",
  "middleName" : "Eli",
  "lastName" : "Tobin",
  "classes" : [
    {
      "courseID" : "PHY101",
      "grade" : "B",
      "courseName" : "Physics 101",
      "credits" : 3
      …
MongoDB History
• MMAP rules the universe; concurrency suffers
  • Per mongod lock
    • v1.2.0 – December 10th 2009
    • v2.0.7 – August 9th 2012
  • Per database lock
    • v2.2.0 – August 29th 2012
    • v2.6.8 – August 25th 2015
• Concurrency!!
  • Per document lock
    • MongoDB, Inc acquires wiredTiger – December 16th, 2014
    • v3.0 – March 3rd 2015
    • First implementation of a storage engine API
Storage Engines & Data Structures
MongoDB Storage Engines
MongoDB, Inc. & Percona Server for MongoDB
• MMAP
• wiredTiger
MongoDB Enterprise Advanced Only
• In Memory
• Encrypted (wiredTiger)
Percona Server for MongoDB
• PerconaFT
• RocksDB
B-tree Overview
B-tree Insert
Pivot Rule >=
B-tree Search
B-tree - Importance of I/O
15 hours vs. 91 hours (6x)
AWS – Insert 200M Rows – Predictable I/O Response vs. Not
What’s the Problem?
Performance is I/O limited when data is > RAM.
Each insert/update requires at least 1 I/O, plus an I/O for every extra index.
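To make the arithmetic concrete, here is a rough sketch of the lower bound stated above. The function names, the index count and the IOPS figure are hypothetical values picked for illustration; they are not measurements from this talk.

```python
def min_ios_per_write(extra_indexes: int) -> int:
    """Lower bound from the slide: 1 I/O for the write, plus 1 per extra index."""
    if extra_indexes < 0:
        raise ValueError("extra_indexes must be >= 0")
    return 1 + extra_indexes


def min_hours_for_load(rows: int, extra_indexes: int, iops: float) -> float:
    """Best-case load time when every write has to go to disk (data > RAM)."""
    total_ios = rows * min_ios_per_write(extra_indexes)
    return total_ios / iops / 3600


# Hypothetical example: 200M rows, 3 extra indexes, a disk sustaining 2,000 IOPS.
hours = min_hours_for_load(200_000_000, extra_indexes=3, iops=2000)
print(f"{hours:.1f} hours at best")
```

Real engines batch and cache, so treat this only as a way to see why every extra index hurts once the working set no longer fits in memory.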
What’s Up With: MMAP
Overview
• Very basic “storage engine”
• Collection level lock
• Highly reliant on the OS for caching
• Uses b-tree indexes to point to disk offset
  • At the offset is the “record”
  • In the record is the document
Best Use
• In place updates
  • Record migration should be minimized
  • $inc, $set, etc.
• Read only*
What’s Up With: MMAP
Problems
• Record allocation is fixed size
  • Space inefficient (powerof2)
  • What if document grows bigger than record?
Probably not for you. Going the “way of the dodo”.
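The space cost of power-of-2 allocation can be sketched as below. This is a simplified model, not MMAP's actual allocator: it ignores record headers and any special handling of very large documents, and the 32-byte minimum and all function names are assumptions made for the example.

```python
def power_of_2_record_size(doc_bytes: int, min_size: int = 32) -> int:
    """Round a document size up to the next power of two (simplified model)."""
    size = min_size
    while size < doc_bytes:
        size *= 2
    return size


def wasted_fraction(doc_bytes: int) -> float:
    """Share of the allocated record that holds no document data."""
    record = power_of_2_record_size(doc_bytes)
    return (record - doc_bytes) / record


def must_move(record_bytes: int, new_doc_bytes: int) -> bool:
    """A document that outgrows its record has to be migrated to a new one."""
    return new_doc_bytes > record_bytes


record = power_of_2_record_size(1025)   # just over 1 KB needs a 2 KB record
print(record, f"{wasted_fraction(1025):.0%} unused")
print(must_move(record, 3000))          # growing past the record forces a move
```

In this model a document one byte over a boundary leaves almost half of its record empty, and any update that grows the document past the record size triggers the record migration mentioned above.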
What’s Up With: wiredTiger
Overview
• Concurrency: Document level
• Supports multiple data structures
  • B-tree (v3.0 +)
  • LSM tree (v3.2 +)
• Controls cache
Best Use
• Depends on data structure
  • B-tree: reads (point or small range) / dataset close to cache
  • LSM: random updates
Promising but still a bit of a “black box”
What’s Up With: RocksDB
Overview
• Written & maintained by Facebook
• Cut its teeth @ Parse
• Data Structure = LSM Trees
• Uses Google’s LevelDB API
• Space efficient + compression
• Excellent core scaling
Best Use
• Point queries
• Updates
• Easy incremental backups
Has very advanced functionality. Lots of potential.
What’s Up With: LSMs
[Diagram: memtable, then Level 0 through Level 4]
• Writes go to memtable + journal
• Memtable fills up and overflows (flush) to file(s)
• Files are read only
• Acts like layers of logs
• Files are eventually merged and old files are marked for deletion
• Files are like small structured trees
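The write path above can be sketched in a few lines. This is a toy, in-memory illustration only: the class name, the memtable limit and the single-step "merge everything" compaction are assumptions for the example, and real engines such as RocksDB organize files into levels and persist them to disk.

```python
from bisect import bisect_left


class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.journal = []    # stand-in for the on-disk journal
        self.memtable = {}   # mutable, in-memory writes
        self.files = []      # immutable sorted runs, newest first

    def put(self, key, value):
        self.journal.append((key, value))  # journal first, then memtable
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Memtable overflows into a new read-only sorted file."""
        if not self.memtable:
            return
        self.files.insert(0, tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.journal = []    # flushed entries no longer need the journal

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.files:             # newest file wins
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all files into one; the old files are dropped."""
        merged = {}
        for run in reversed(self.files):   # oldest first, newer overwrite
            merged.update(run)
        self.files = [tuple(sorted(merged.items()))] if merged else []


db = TinyLSM(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(k, v)
print(len(db.files), db.get("a"))  # two files until compaction runs
db.compact()
print(len(db.files), db.get("a"))
```

Note that an update never touches an existing file: the newer value simply shadows the older one until a merge throws the old version away.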
What’s Up With: LSMs – Range Ops
[Diagram: a range scan crossing the memtable and Level 0 through Level 4]
• Range scans are tough
• Each file is its own tree
• No good way to tell if data lies in any file
• Read amplification is H-I-G-H
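A small sketch of why range scans amplify reads: with no way to rule a file out, the scan has to open every sorted run and merge the results, keeping only the newest version of each key. The function name and the list-of-runs layout are hypothetical, and each run is filtered linearly here for brevity where a real engine would seek within the file.

```python
import heapq


def range_scan(runs, lo, hi):
    """Scan keys in [lo, hi] across sorted runs ordered newest first.

    Returns the merged results and how many runs had to be read.
    """
    def tagged(age, run):
        for key, value in run:
            if lo <= key <= hi:
                yield (key, age, value)   # lower age = newer run

    streams = [tagged(age, run) for age, run in enumerate(runs)]
    results = []
    last_key = object()                   # sentinel that equals no real key
    for key, age, value in heapq.merge(*streams):
        if key != last_key:               # newest version arrives first
            results.append((key, value))
            last_key = key
    return results, len(streams)


runs = [
    [("b", "new")],                        # memtable / newest
    [("a", 1), ("b", "old"), ("c", 3)],
    [("d", 4), ("e", 5)],                  # holds nothing in range, still read
]
print(range_scan(runs, "a", "c"))
```

The third run contributes nothing to the answer, yet it is still consulted, which is the read amplification the slide refers to.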
What’s Up With: LSMs – Point Ops
[Diagram: a point lookup against the memtable and Level 0 through Level 4]
• Point operations are tough too
• However, Bloom filters work well
• Filter determines if the required info exists in a set
• Can have false positives
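A minimal Bloom filter sketch shows how a per-file filter lets a point lookup skip files that cannot contain the key. The bit count, hash count and SHA-256 based hashing are choices made for this example, not what any particular engine uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means present or a false positive."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = BloomFilter()
bf.add("user:1")
bf.add("user:2")
print(bf.might_contain("user:1"))   # added keys are always reported
print(bf.might_contain("user:99"))  # usually False, but may be a false positive
```

A "no" from the filter is always correct, so the file can be skipped. A "yes" still requires reading the file, and occasionally that read finds nothing.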
Fractal Tree Indexes
Fractal - Insert
Fractal – Message Injection
What’s Up With: PerconaFT
Overview
• Developed by MIT, SUNY Stony Brook & Rutgers
• Concurrency: Document level
• Unique data structure
  • Fractal Tree
• Controls cache
• Compresses well (quicklz, zlib, lzma)
Best Uses
• Best compression
  • CPU efficient (relatively)
• Sequential workloads
Still developing as a pluggable engine. Needs to learn API
Benchmarks
Disclaimer: They’re just benchmarks. It’s all made up. (like economics & meteorology)
Insert Workload
collections = 8
database name = sbtest
writer threads = 16
documents per collection = 10,000,000
feedback seconds = 20
auto commit = N
run seconds = 1200
oltp range size = 100
oltp point selects = 0
oltp simple ranges = 0
oltp sum ranges = 0
oltp order ranges = 0
oltp distinct ranges = 0
oltp index updates = 0
oltp non index updates = 0
oltp inserts = 20
Applies to all benchmarks in this presentation
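For readers reproducing charts like the ones that follow, this sketch turns raw transaction completion times into one TPS figure per feedback interval. It assumes the benchmark reports throughput once per "feedback seconds" window; the function name and the input format are hypothetical and not taken from the benchmark tool.

```python
from collections import Counter


def tps_per_interval(completion_times, feedback_seconds=20, run_seconds=1200):
    """Bucket completion times (seconds since start) into reporting windows.

    Returns a list of (elapsed_seconds, tps), one entry per window.
    """
    counts = Counter()
    for t in completion_times:
        if 0 <= t < run_seconds:
            counts[int(t // feedback_seconds)] += 1
    windows = run_seconds // feedback_seconds
    return [
        ((i + 1) * feedback_seconds, counts[i] / feedback_seconds)
        for i in range(windows)
    ]


# Hypothetical timestamps: three transactions in the first window, two in the second.
print(tps_per_interval([0.5, 1.0, 19.9, 20.0, 39.0],
                       feedback_seconds=20, run_seconds=40))
```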
What’s Up With: Writes
[Chart: Mongo Engines - Write TPS. Series: PerconaFT, wiredTiger, RocksDB. X axis: Elapsed Seconds (ticks 20 to 1160); Y axis: TPS (ticks 0 to 800).]
Read Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 10
oltp simple ranges = 1
oltp sum ranges = 1
oltp order ranges = 1
oltp distinct ranges = 1
oltp index updates = 0
oltp non index updates = 0
oltp inserts = 0
What’s Up With: Reads
[Chart: Mongo Engines - Read TPS. Series: PerconaFT, wiredTiger, RocksDB. X-axis ticks 20 to 1160; Y-axis ticks 0 to 1600. Both axes carry only the default label “Axis Title” in the source.]
Update Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 0
oltp simple ranges = 0
oltp sum ranges = 0
oltp order ranges = 0
oltp distinct ranges = 0
oltp index updates = 50
oltp non index updates = 5
oltp inserts = 0
What’s Up With: Updates
[Chart: Mongo Engines - Updates. Series: PerconaFT, wiredTiger, RocksDB. X axis: Elapsed Seconds (ticks 20 to 1160); Y axis: TPS (ticks 0 to 500).]
Mixed Workload
run seconds = 1200
oltp range size = 100
oltp point selects = 10
oltp simple ranges = 1
oltp sum ranges = 1
oltp order ranges = 1
oltp distinct ranges = 1
oltp index updates = 50
oltp non index updates = 5
oltp inserts = 10
What’s Up With: Mixed Workloads
[Chart: Mongo Engines - Mixed. Series: PerconaFT, wiredTiger, RocksDB. X axis: Elapsed Seconds (ticks 20 to 1160); Y axis: TPS (ticks 0 to 250).]
Evaluation Resources
• Flashback – replay Mongo operations in real time or as fast as possible with your workload
• benchRun – JavaScript benchmark harness built into MongoDB. Cuts out driver problems
• Sysbench & iiBench for Mongo
• Yahoo! Cloud Serving Benchmark (YCSB)
• Mongo-perf
*Whenever possible, run with YOUR workload, or a workload that accurately simulates yours.
Percona Live Data Performance Conference
• April 18-21 in Santa Clara, CA at the Santa Clara Convention Center
• Register with code “WebinarPL” to receive 15% off at registration
• MySQL, NoSQL, Data in the Cloud
www.perconalive.com