Keeping your MongoDB Data Safe Tony Tam @fehguy
May 11, 2015
Backups
You care because…
•Your data matters
•You run experiments on prod data
•Your devs have sudo on production
•You've seen this before
Background
Who am I?
•MongoDB user
• Migrated Wordnik to MongoDB in 2009
•MongoDB admin
• Had to keep it running
Who is Wordnik?
•Data-driven technology company
•MongoDB is our primary data store
Strategies
•It's a function of your data size and the state of your business
What's in the Standard Toolbox
•Dump files via mongodump
•Exports via mongoexport
•Binary data files
•Redundancy
•Oplog
•3rd party or community-developed OSS
•Hosted MongoDB
The Lazy Developer
•One server (You've been there)
•Small data, small usage, small problems
•Mongodump is great!
• Small(ish) files (gzip will help you)
• FAST to create
• (typically) FAST to restore via mongorestore
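A minimal sketch of scripting the dump from Java, assuming `mongodump` is installed and on the PATH. The class name `DumpCommand`, the host and the output directory are hypothetical; `--host` and `--out` are standard mongodump options. The demo only prints the command, so it is safe to run on a machine without MongoDB.

```java
import java.io.IOException;
import java.util.List;

public class DumpCommand {
    // Build the argument list for a full dump of one server.
    static List<String> build(String host, String outDir) {
        return List.of("mongodump", "--host", host, "--out", outDir);
    }

    // Run the dump and return mongodump's exit code (0 means success).
    static int run(String host, String outDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(build(host, outDir))
                .inheritIO() // show mongodump's progress in this console
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical host and target directory; call run(...) to execute for real.
        System.out.println(String.join(" ", build("localhost:27017", "/backups/nightly")));
        // prints: mongodump --host localhost:27017 --out /backups/nightly
    }
}
```

Compressing the output directory afterwards (for example with gzip) keeps the files small, as the slide suggests.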
Tradeoffs with dump/restore
•Can be done with no downtime. But…
•Potentially inconsistent snapshot
• Why? One collection at a time
• Non-blocking (will yield to writes)
•All or nothing
•Remove then restore
•Restore *might* take time
• Indexes!
This can take DAYS
Cost of being lazy
Replication
•Right! Replica sets
•HA by redundancy
•Auto fail-over
•Maintenance without downtime
•You can STILL use mongodump
There is NO excuse
Replica Sets
•Lost a server? Add a new one
• Sync from nearby master
• Announce to clients when ready
•Time depends on data size
• And… oh yeah, index size
•Gah! WTF!?
You need MORE RAM to rebuild indexes!
Replica Sets
•They are Awesome! Really! But…
Test the process before you need it!
Tradeoffs with Replica Sets
•Need multiple servers
•Fat finger?
•Malicious access?
•Software bug?
•You still need backups
Options with Replica Sets
•Slave Delay
• Keep one slave behind by X seconds
• *Read* is delayed, not *write*
Fat finger problem solved?
No! Shut them all down! Hurry!
Alternative to Mongodump?
•Snapshot the data files
• Stop server, back them up
• It's consistent! Snapshot time is well known
•Restoring is easy
• Copy the files, start a server, add to replica set
• NO index rebuilding delays
In action
•Stop server
•Snapshot data
•Archive
•Restart
•Repeat
Daily? Hourly?
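The "snapshot data, archive" steps can be sketched as a plain directory copy. This is an illustration only, assuming mongod has already been stopped so the files are not changing; the class name `DataFileSnapshot` and the paths and label passed to it are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class DataFileSnapshot {
    // Copy every file under dataDir into archiveRoot/<label>, keeping the layout.
    // Assumes the server is stopped, so the copy is a consistent snapshot.
    static Path snapshot(Path dataDir, Path archiveRoot, String label) throws IOException {
        Path target = archiveRoot.resolve(label);
        try (Stream<Path> paths = Files.walk(dataDir)) {
            paths.forEach(src -> {
                Path dest = target.resolve(dataDir.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.createDirectories(dest.getParent());
                        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Demo on throwaway directories; real paths would be the mongod data
        // directory and a backup volume.
        Path data = Files.createTempDirectory("data");
        Files.writeString(data.resolve("example.0"), "pretend data file");
        Path archive = Files.createTempDirectory("archive");
        System.out.println("Snapshot written to " + snapshot(data, archive, "snapshot-1"));
    }
}
```

Labeling each snapshot with its time makes the "snapshot time is well known" property explicit when choosing which copy to restore.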
Repeat often or lose data!
•Data copy time (EC2 => 20 MB/s if lucky)
• 1GB => 1 min
• 100GB => 1.5 hours
• 1TB => 14 hours
•Can't write to data files while copying!
Multiple backup servers?
Fancy storage device?
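The copy times on the slide can be reproduced with simple arithmetic. This sketch assumes the quoted rate means roughly 20 megabytes per second, which is what the listed times imply, and uses 1 GB = 1024 MB; the class name `CopyTimeEstimate` is hypothetical.

```java
public class CopyTimeEstimate {
    // Seconds needed to copy `gigabytes` of data at `megabytesPerSecond`,
    // using 1 GB = 1024 MB.
    static double seconds(double gigabytes, double megabytesPerSecond) {
        return gigabytes * 1024.0 / megabytesPerSecond;
    }

    public static void main(String[] args) {
        double rate = 20.0; // MB/s, the "if lucky" figure (an assumption about units)
        double[] sizesGb = {1, 100, 1024}; // 1 GB, 100 GB, 1 TB
        for (double gb : sizesGb) {
            double s = seconds(gb, rate);
            System.out.printf("%.0f GB => %.0f seconds (%.1f hours)%n", gb, s, s / 3600.0);
        }
    }
}
```

At that rate 1 GB takes about 51 seconds, 100 GB about 1.4 hours and 1 TB about 14.6 hours, in line with the rounded figures on the slide. The server cannot take writes for that whole window, which is why copy time sets a floor on how often snapshots can run.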
Plain-old copying might not cut it
•Many alternatives
• EBS Snapshots
• Logical Volume Manager (LVM)
• RYOR (Roll your own RAID)
• Other IT Black Magic
But what about Snapshot Gaps?
•The gaps can be real (and painful)
•Your DRP might need more
• OH, and we still have the fat finger issue
• Retention?
• "Rollback everything but one operation"?
•You can do incremental backups
• (with a little help)
•Easy to add to your automated snapshots
More about the OpLog
•All participating members have one
•Capped collection of all write ops
[Figure: the primary and two replicas, each with its oplog on a time axis; points t1, t2 and t3 are marked]
OpLog for incremental BU
•SAME mechanism used by slaves (it's rock solid)
• Just write operations to disk! It's just BSON
•How? (write some code)
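    // Tail the oplog: the cursor stays open and waits for new entries
    cursor = oplog.find();
    cursor.addOption(Bytes.QUERYOPTION_TAILABLE);
    cursor.addOption(Bytes.QUERYOPTION_AWAITDATA);
    while (cursor.hasNext()) {
      DBObject x = cursor.next();
      // Append each operation to the backup file as raw BSON
      outputStream.write(BSON.encode(x));
      ...
    }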
OpLog for incremental BU
•Already done for you: https://github.com/wordnik/wordnik-oss
• For the lazy:
• Get com.wordnik.mongo-admin-utils-distribution from sonatype/maven central
./bin/run.sh com.wordnik.system.mongodb.IncrementalBackupUtil -?
Using Wordnik Admin Tools
•Start the IncrementalBackupUtil
•Write to rotating files, last timestamp
•Kill at will
•Restart, picks up from last query
•Restore using RestoreUtil, mongorestore
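The "kill at will, restart picks up from last" behavior comes down to persisting the last timestamp written. The following is a sketch of that idea only, not the actual IncrementalBackupUtil code: the class `OplogCheckpoint` is hypothetical, and plain `long` values stand in for oplog timestamps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class OplogCheckpoint {
    private final Path file;

    OplogCheckpoint(Path file) {
        this.file = file;
    }

    // Last timestamp saved, or 0 if no backup has run yet.
    long load() throws IOException {
        if (!Files.exists(file)) return 0L;
        return Long.parseLong(Files.readString(file).trim());
    }

    void save(long ts) throws IOException {
        Files.writeString(file, Long.toString(ts));
    }

    // Return the entries newer than the saved checkpoint, then advance the
    // checkpoint to the newest entry seen. Real code would write each entry
    // to the rotating backup file before saving the checkpoint.
    List<Long> resume(List<Long> oplogTimestamps) throws IOException {
        long checkpoint = load();
        long newest = checkpoint;
        List<Long> pending = new ArrayList<>();
        for (long ts : oplogTimestamps) {
            if (ts > checkpoint) {
                pending.add(ts);
                newest = Math.max(newest, ts);
            }
        }
        save(newest);
        return pending;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("backup").resolve("last_timestamp");
        OplogCheckpoint cp = new OplogCheckpoint(file);
        System.out.println(cp.resume(List.of(1L, 2L, 3L)));         // prints [1, 2, 3]
        // Simulated restart: a new instance reads the same checkpoint file.
        OplogCheckpoint restarted = new OplogCheckpoint(file);
        System.out.println(restarted.resume(List.of(1L, 2L, 3L, 4L, 5L))); // prints [4, 5]
    }
}
```

Saving the checkpoint only after the entries are safely on disk means a kill can cause an operation to be backed up twice, but never skipped.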
How does it work?
•Easy, of course
In Summary
•Technique depends on your deployment
•Lots of tools available
•Fine grained control is available
Test before you need it!
Questions?