Migrating data to MongoDB
Nov 19, 2014
Agenda
• Why move your data
• Considerations for migration
• Techniques for implementing migration
• Case study: How Shutterfly migrated 20TB of production meta data with no downtime
Why move your data?
• Improve development agility with documents
• Reduce cost of data management
• Scale to handle large data sets & transaction volumes
Considerations
Does your schema need to change?
Relational → MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
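A minimal sketch of the schema change shown above, assuming a hypothetical relational layout with a `people` table and a `cars` child table linked by `owner_id`. The table names, column names (`loc_x`, `loc_y`, `owner_id`) and the helper `to_documents` are illustrative assumptions, not part of the original deck. It folds the child rows into an embedded array so that each person becomes a single document.

```python
from collections import defaultdict

# Hypothetical relational rows: one parent row and its child rows.
people = [
    {"id": 1, "first_name": "Paul", "surname": "Miller", "city": "London",
     "loc_x": 45.123, "loc_y": 47.232},
]
cars = [
    {"owner_id": 1, "model": "Bentley", "year": 1973, "value": 100000},
    {"owner_id": 1, "model": "Rolls Royce", "year": 1965, "value": 330000},
]


def to_documents(people, cars):
    """Fold child rows into their parent so each person is one document."""
    cars_by_owner = defaultdict(list)
    for car in cars:
        # Drop the foreign key: embedding makes it redundant.
        embedded = {k: v for k, v in car.items() if k != "owner_id"}
        cars_by_owner[car["owner_id"]].append(embedded)
    return [
        {
            "first_name": p["first_name"],
            "surname": p["surname"],
            "city": p["city"],
            "location": [p["loc_x"], p["loc_y"]],
            "cars": cars_by_owner[p["id"]],
        }
        for p in people
    ]


docs = to_documents(people, cars)
```

Embedding suits data that is read together with its parent, as the cars are here; rows shared between many parents may be better kept as references.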
How does the Application move over?
Source Database
Application
How does data get moved?
Source Database
Snapshot
Continuous Sync
Batch Migration
Application
Application Managed
Can you have downtime?
Source Database: Master → Exporting
Target: Importing → Master
Application View: Available → Degraded → Down → Available
Time →
Data Migration Technique
Mongoimport
jsr@bruford:/tmp$ mongoimport --collection import_example < import.json
connected to: 127.0.0.1
Tue Jun 18 00:02:12.553 imported 1 objects
jsr@bruford:/tmp$ mongo
MongoDB shell version: 2.4.3
connecting to: test
> db.import_example.findOne()
{
    "_id" : ObjectId("51bfdbc438b61619a4f2a12b"),
    "first" : "Jared",
    "last" : "Rosoff",
    "twitter" : "@forjared"
}
>
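The session above reads its input from import.json. As a rough sketch, a file in that shape could be produced from exported records as follows. mongoimport's default JSON input is one document per line, which is what this writes. The helper name `write_import_file` and the temporary path are assumptions for illustration.

```python
import json
import os
import tempfile


def write_import_file(docs, path):
    """Write one JSON document per line and return how many were written."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")
    return len(docs)


# The single record from the session above, without the server-assigned _id.
docs = [{"first": "Jared", "last": "Rosoff", "twitter": "@forjared"}]
path = os.path.join(tempfile.mkdtemp(), "import.json")
written = write_import_file(docs, path)
```

Leaving out `_id` lets the server assign an ObjectId on import, as in the session output.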
Extract, Transform & Load
Source Database
ETL
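The three ETL stages can be sketched as below. This is an illustration under stated assumptions, not a real ETL tool: an in-memory SQLite database stands in for the source, a plain list stands in for the MongoDB collection, and the table, the sample rows and the function names are made up.

```python
import sqlite3


def extract(conn):
    """Extract: stream rows out of the source database."""
    conn.row_factory = sqlite3.Row
    yield from conn.execute(
        "SELECT id, first_name, surname, city FROM people ORDER BY id")


def transform(row):
    """Transform: reshape one relational row into a document."""
    return {"_id": row["id"],
            "name": {"first": row["first_name"], "last": row["surname"]},
            "city": row["city"]}


def load(docs, target, batch_size=2):
    """Load: write documents in batches and return the number of batches."""
    batch, batches = [], 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            target.extend(batch)
            batch, batches = [], batches + 1
    if batch:  # flush the final partial batch
        target.extend(batch)
        batches += 1
    return batches


# Hypothetical source table with made-up sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE people "
               "(id INTEGER PRIMARY KEY, first_name TEXT, surname TEXT, city TEXT)")
source.executemany("INSERT INTO people VALUES (?, ?, ?, ?)",
                   [(1, "Paul", "Miller", "London"),
                    (2, "Ann", "Lee", "Leeds"),
                    (3, "Raj", "Shah", "Pune")])

target = []  # stand-in for a MongoDB collection
batches = load((transform(r) for r in extract(source)), target)
```

Batching the load step keeps memory use flat on large tables; with a real driver, each batch would map to one bulk insert call.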
Hadoop
Source Database
job, job, job, job
App Driven Migration
Source Database
Application
Case Study
Shutterfly uses MongoDB to safeguard over 6 billion images served to millions of customers
Problem
• 6B images, 20TB of data
• Brittle code base on top of Oracle database – hard to scale, add features
• High SW and HW costs
Why MongoDB
• JSON-based data model
• Agile, high performance, scalable
• Alignment with Shutterfly’s services-based architecture
Results
• 80% cost reduction
• 900% performance improvement
• Faster time-to-market
• Dev. cycles in weeks vs. tens of months
• Metadata stored in XML blobs
• App responsible for content of blob
Shutterfly
Oracle
Photo ID   XML Blob
1          <xml><meta-data>…</xml>
2          <xml><meta-data>…</xml>
3          <xml><meta-data>…</xml>
Migrating records on demand
1. Request for photo
2. Try to read from MongoDB
3. If cache miss, read from Oracle
4. Translate document & write to MongoDB
5. Return to client
Source Database
Application
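The five steps for migrating records on demand can be sketched as a read-through wrapper. This is a simplified illustration, not Shutterfly's code: plain dicts stand in for Oracle and MongoDB, and the class `OnDemandMigrator`, the helper `wrap` and the placeholder blob are assumptions.

```python
class OnDemandMigrator:
    """Read-through migration: serve from MongoDB, fall back to Oracle once."""

    def __init__(self, oracle, mongo, translate):
        self.oracle = oracle        # legacy store: photo ID -> XML blob
        self.mongo = mongo          # new store: photo ID -> document
        self.translate = translate  # converts a blob into a document

    def get(self, photo_id):                      # 1. request for photo
        doc = self.mongo.get(photo_id)            # 2. try MongoDB first
        if doc is None:
            blob = self.oracle[photo_id]          # 3. miss: read from Oracle
            doc = self.translate(photo_id, blob)  # 4. translate and write
            self.mongo[photo_id] = doc
        return doc                                # 5. return to client


def wrap(photo_id, blob):
    """Hypothetical translation: keep the blob as-is inside a document."""
    return {"_id": photo_id, "data": blob}


oracle = {1: "<xml>placeholder</xml>"}  # placeholder blob, not real data
mongo = {}
migrator = OnDemandMigrator(oracle, mongo, wrap)

first = migrator.get(1)   # miss: copied from Oracle into MongoDB
del oracle[1]             # the legacy copy is no longer needed for reads
second = migrator.get(1)  # hit: served from MongoDB alone
```

With this approach only records that are actually requested get migrated, so records that are never read would still need a separate batch pass before the source can be retired.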
Schema Migration – Initial
<?xml version="1.0" encoding="utf16"?>
<votes>
  <voteItem user="00000000" vote="1" />
  <voteItem user="11111111" vote="1" />
  <voteItem user="22222222" vote="1" />
</votes>
Schema Migration – Phase 2
<?xml version="1.0" encoding="utf16"?>
<votes>
  <voteItem user="00000000" vote="1" />
  <voteItem user="11111111" vote="1" />
  <voteItem user="22222222" vote="1" />
</votes>
{
  _id : "site/the3colbys/3326/_votes",
  "V" : 0,
  "cD" : "Thu Sep 23 2010 20:38:54 GMT-0700 (PDT)",
  "wD" : "Thu Sep 23 2010 20:38:54 GMT-0700 (PDT)",
  "md5" : "71199d82ee730f271feface722a74d30",
  "data" : "<?xml version=\"1.0\" encoding=\"utf16\"?> <votes> <voteItem user=\"00000000\" vote=\"1\" /> <voteItem user=\"11111111\" vote=\"1\" /> <voteItem user=\"22222222\" vote=\"1\" /> </votes>"
}
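A rough sketch of how a blob might be wrapped into a document of this shape. The slide does not define the fields, so the readings used here are assumptions: `V` as a schema version, `cD` and `wD` as created and written timestamps, and `md5` as a hash of the blob's UTF-8 bytes. The function name `wrap_blob` and the ISO timestamp format are also illustrative choices, and the hash is not expected to reproduce the value on the slide.

```python
import hashlib
from datetime import datetime, timezone


def wrap_blob(key, xml_text, now=None):
    """Store the XML blob unchanged inside a document, plus bookkeeping fields."""
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    return {
        "_id": key,
        "V": 0,          # assumed: schema version, 0 = blob not yet parsed
        "cD": stamp,     # assumed: creation date
        "wD": stamp,     # assumed: last write date
        "md5": hashlib.md5(xml_text.encode("utf-8")).hexdigest(),  # assumed input
        "data": xml_text,
    }


xml_text = ('<?xml version="1.0" encoding="utf16"?> <votes> '
            '<voteItem user="00000000" vote="1" /> </votes>')
doc = wrap_blob("site/the3colbys/3326/_votes", xml_text)
```

Keeping the blob opaque at this stage means the application code that reads and writes the XML needs no changes beyond the storage call.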
{
  _id : "site/the3colbys/3326/_votes",
  "V" : 0,
  "cD" : "Thu Sep 23 2010 20:38:54 GMT-0700 (PDT)",
  "wD" : "Thu Sep 23 2010 20:38:54 GMT-0700 (PDT)",
  "md5" : "71199d82ee730f271feface722a74d30",
  "votes" : { "00000000" : 1, "11111111" : 1, "22222222" : 1 }
}
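The step from the embedded XML string to a native `votes` sub-document can be sketched as follows. This assumes the blob has already been decoded to text, so the XML declaration is stripped before parsing as a defensive choice. The function names `parse_votes` and `upgrade` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XML = ('<?xml version="1.0" encoding="utf16"?> <votes> '
       '<voteItem user="00000000" vote="1" /> '
       '<voteItem user="11111111" vote="1" /> '
       '<voteItem user="22222222" vote="1" /> </votes>')


def parse_votes(xml_text):
    """Turn <voteItem user=... vote=...> elements into a {user: vote} mapping."""
    # The text is already decoded, so the declared encoding no longer applies.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    root = ET.fromstring(body)
    return {item.get("user"): int(item.get("vote"))
            for item in root.iter("voteItem")}


def upgrade(doc):
    """Replace the opaque "data" blob with a parsed "votes" sub-document."""
    upgraded = {k: v for k, v in doc.items() if k != "data"}
    upgraded["votes"] = parse_votes(doc["data"])
    return upgraded


old_doc = {"_id": "site/the3colbys/3326/_votes", "V": 0, "data": XML}
new_doc = upgrade(old_doc)
```

User IDs stay strings so that leading zeros survive and the keys remain valid document field names.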