MongoSF 4/30/2010 From MySQL to MongoDB Migrating a Live Application Tony Tam
Transcript
Page 1: From MySQL to MongoDB at Wordnik (Tony Tam)

MongoSF 4/30/2010

From MySQL to MongoDB

Migrating a Live Application

Tony Tam

Page 2: From MySQL to MongoDB at Wordnik (Tony Tam)

WHAT IS WORDNIK

Project to track language, like GPS for English
Dictionary is a roadblock to the language
  Roughly 200 new words created daily
  Language is not static
Capture information about all words
  Meaning is often undefined in the traditional sense
  Machines can determine meaning through analysis
  Needs LOTS of data

Page 3: From MySQL to MongoDB at Wordnik (Tony Tam)

WHY SHOULD YOU CARE

Every Developer can use a Robust Language API!

Wordnik migrated to MongoDB
  > 5 billion documents
  > 1.2 TB
  Zero application downtime

Learn from our Experience

Page 4: From MySQL to MongoDB at Wordnik (Tony Tam)

WORDNIK

Not just a website!
  But we have one
Launched Wordnik entirely on MySQL
  Hit road bumps with insert speed at ~4B rows on MyISAM tables
  Tables locked for tens of seconds during inserts
But we need more data!
  Created elaborate update schemes to work around it
  Lost lots of sleep babysitting servers while researching a long-term solution

Page 5: From MySQL to MongoDB at Wordnik (Tony Tam)

WORDNIK + MONGODB

What are our storage needs?
  Database vs. application logic
  No PK/FK constraints
  No stored procedures
  Consistency?
Lots of R&D
  Tried almost all NoSQL solutions

Page 6: From MySQL to MongoDB at Wordnik (Tony Tam)

MIGRATING STORAGE ENGINES

Many parts to this effort
  Setup & Administration
  Software Design
  Optimization
Many types of data at Wordnik
  1. Corpus
  2. Structured Hierarchical Data
  3. User Data
Migrated #1 & #2

Page 7: From MySQL to MongoDB at Wordnik (Tony Tam)

SERVER INFRASTRUCTURE

Wordnik is heavily read-only
  Master / Slave deployment
  Looking at replica pairs
MongoDB loves system resources
  Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out)
  Memory + Disk = Happy Mongo
Many X the disk space of MySQL
  Easy pill to swallow until…

Page 8: From MySQL to MongoDB at Wordnik (Tony Tam)

SERVER INFRASTRUCTURE

Physical hardware
  2 x 4-core CPU, 32 GB RAM, FC SAN
Had bad luck on VMs (you might not)
Disk speed => performance

Page 9: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

Two distinct use cases for MongoDB
Identical structure, different storage engine
  Same underlying objects, same storage fidelity (largely key/value)
Hierarchical data structure
  Same underlying objects, document-oriented storage

Page 10: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

Create BasicDBObjects from POJOs and use collection methods
  BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence())
      .append("rating", s.getRating()).append(...);
ID generation to manage unique _id values
  Analogous to MySQL AutoIncrement behavior
  Compatible with MySQL IDs (more later)
  dbo.append("_id", getId());
  collection.save(dbo);
Implemented all CRUD methods in DAO
  Swappable between MongoDB and MySQL at runtime
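The DAO and ID-generation ideas on this slide can be sketched without any database at all. The following is a minimal illustration, not Wordnik's code: the Sentence class, the IdGenerator class, and the in-memory handler are hypothetical stand-ins, and only the names SentenceHandler, getSentence(), getRating(), and getId() come from the slides. A real handler would call the MongoDB driver or JDBC where this one touches a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical value object; the slides only show getSentence() and getRating().
class Sentence {
    private final String sentence;
    private final int rating;

    Sentence(String sentence, int rating) {
        this.sentence = sentence;
        this.rating = rating;
    }

    String getSentence() { return sentence; }
    int getRating() { return rating; }
}

// Application-side ID generator, analogous to MySQL AUTO_INCREMENT.
// Seeding it with the highest existing ID keeps new IDs compatible with old ones.
class IdGenerator {
    private final AtomicLong last;

    IdGenerator(long highestExistingId) { this.last = new AtomicLong(highestExistingId); }

    long getId() { return last.incrementAndGet(); }
}

// The DAO contract that callers code against, whatever the storage engine.
interface SentenceHandler {
    long save(Sentence s);
    Sentence find(long id);
}

// In-memory stand-in for a real backend (assumption: a MongoDB or MySQL
// implementation would expose exactly the same two methods).
class InMemorySentenceHandler implements SentenceHandler {
    private final Map<Long, Sentence> store = new HashMap<>();
    private final IdGenerator ids;

    InMemorySentenceHandler(IdGenerator ids) { this.ids = ids; }

    @Override
    public long save(Sentence s) {
        long id = ids.getId();   // the ID is assigned by the application, not the database
        store.put(id, s);
        return id;
    }

    @Override
    public Sentence find(long id) { return store.get(id); }
}
```

Because callers only see SentenceHandler, swapping the storage engine comes down to choosing which implementation to construct.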

Page 11: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

Key-value storage use case
Easy as implementing new DAOs
  SentenceHandler h = new MongoDbSentenceHandler();
Save methods construct BasicDBObject and call save() on collection
Implement same interface
  Same methods against DAO between MySQL and MongoDB versions
Data Abstraction 101

Page 12: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

What about bulk inserts?
FAF queued approach
  Add objects to queue, return to caller
  Every X seconds, process queue
  All objects from same collection are appended to a single List<DBObject>
  Call collection.insert(…) before 2M characters
Reduces network overhead
Very fast inserts
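The queued approach above can be sketched roughly as follows, reading FAF as fire-and-forget (an assumption). BulkSink and QueuedWriter are hypothetical names, documents are plain JSON strings rather than DBObjects, and the sink stands in for collection.insert(List<DBObject>). The character budget plays the role of the 2M-character limit on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sink standing in for a bulk insert into one collection.
interface BulkSink {
    void insert(String collection, List<String> docs);
}

class QueuedWriter {
    private static final class Pending {
        final String collection;
        final String json;
        Pending(String collection, String json) {
            this.collection = collection;
            this.json = json;
        }
    }

    private final ConcurrentLinkedQueue<Pending> queue = new ConcurrentLinkedQueue<>();
    private final BulkSink sink;
    private final int maxChars;

    QueuedWriter(BulkSink sink, int maxChars) {
        this.sink = sink;
        this.maxChars = maxChars;
    }

    // Fire and forget: enqueue and return to the caller immediately.
    void asyncWrite(String collection, String json) {
        queue.add(new Pending(collection, json));
    }

    // Intended to run every X seconds (for example from a ScheduledExecutorService).
    // Groups queued documents by collection and sends each group as one bulk insert,
    // starting a new batch before the character budget would be exceeded.
    // Returns the number of bulk inserts issued.
    int flush() {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        Map<String, Integer> sizes = new HashMap<>();
        int inserts = 0;
        Pending p;
        while ((p = queue.poll()) != null) {
            List<String> batch = batches.computeIfAbsent(p.collection, k -> new ArrayList<>());
            int size = sizes.getOrDefault(p.collection, 0);
            if (!batch.isEmpty() && size + p.json.length() > maxChars) {
                sink.insert(p.collection, batch);
                inserts++;
                batch = new ArrayList<>();
                batches.put(p.collection, batch);
                size = 0;
            }
            batch.add(p.json);
            sizes.put(p.collection, size + p.json.length());
        }
        for (Map.Entry<String, List<String>> e : batches.entrySet()) {
            if (!e.getValue().isEmpty()) {
                sink.insert(e.getKey(), e.getValue());
                inserts++;
            }
        }
        return inserts;
    }
}
```

The trade-off of this design is that a caller never learns whether its write succeeded, so it suits data that can be re-sent or tolerated as lost.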

Page 13: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

Hierarchical data done more elegantly
Wordnik Dictionary Model
  Java POJOs already had JAXB annotations
  Part of public REST API
Used MySQL
  12+ tables
  13 DAOs
  2500 lines of code
  50 requests/second uncached
  Memcache needed to maintain reasonable speed

Page 14: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

TMGO

Page 15: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

MongoDB's document storage let us…
  Turn the objects into JSON via Jackson Mapper (fasterxml.com)
  Call save
  Support all fetch types, enhanced filters
1000 requests / second
No explicit caching
No less scary code

Page 16: From MySQL to MongoDB at Wordnik (Tony Tam)

SOFTWARE DESIGN

Saving a complex object
  String rawJSON = getMapper().writeValueAsString(veryComplexObject);
  collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON)));
Fetching a complex object
  DBObject dbo = cursor.next();
  ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);
No joins, 20x faster

Page 17: From MySQL to MongoDB at Wordnik (Tony Tam)

MIGRATING DATA

Migrating => existing data logic
Use logic to select DAOs appropriately
Read from old, write with new
Great system test for MongoDB
  SentenceHandler mysqlSh = new MySQLSentenceHandler();
  SentenceHandler mongoSh = new MongoDbSentenceHandler();
  while (hasMoreData) {
    mongoSh.asyncWrite(mysqlSh.next());
    ...
  }
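The read-from-old, write-with-new loop can be reduced to a small generic helper. This is only a sketch under assumptions: Migrator is a hypothetical class, rows are modelled as (id, text) pairs, and the writer is any callback. It keeps each row's original ID, in line with the earlier point about IDs staying compatible with MySQL.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiConsumer;

class Migrator {
    // Reads every (id, row) pair from the old store and hands it to the new
    // store's writer, preserving the original ID. Returns the number of rows copied.
    static long migrate(Iterator<Map.Entry<Long, String>> oldRows,
                        BiConsumer<Long, String> newWriter) {
        long copied = 0;
        while (oldRows.hasNext()) {
            Map.Entry<Long, String> row = oldRows.next();
            newWriter.accept(row.getKey(), row.getValue());
            copied++;
        }
        return copied;
    }
}
```

Comparing the returned count with the source row count is a cheap first check that nothing was dropped along the way.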

Page 18: From MySQL to MongoDB at Wordnik (Tony Tam)

MIGRATING DATA

Wordnik moved 5 billion rows from MySQL
  Sustained 100,000 inserts/second
  Migration tool was CPU bound
    ID generation logic, among other things
Wordnik reads MongoDB fast
  Read + create Java objects @ 250k/second (!)

Page 19: From MySQL to MongoDB at Wordnik (Tony Tam)

GOING LIVE TO PRODUCTION

Choose your use case carefully if migrating incrementally

Scary no matter what
  Test your perf monitoring system first!
Use your DAOs from migration
  Turn on MongoDB on one server, monitor, tune (rollback, repeat)
  Full switch over when comfortable

Page 20: From MySQL to MongoDB at Wordnik (Tony Tam)

GOING LIVE TO PRODUCTION

Really?
  SentenceHandler h = null;
  if (useMongoDb) {
    h = new MongoDbSentenceHandler();
  } else {
    h = new MySQLSentenceHandler();
  }
  return h.find(...);

Page 21: From MySQL to MongoDB at Wordnik (Tony Tam)

OPTIMIZING PERFORMANCE

Home-grown connection pooling
Master only
  ConnectionManager.getReadWriteConnection()
Slave only
  ConnectionManager.getReadOnlyConnection()
Round-robin all servers, bias on slaves
  ConnectionManager.getConnection()
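One way the three lookups could behave is sketched below. The slides do not show the implementation, so the details here are assumptions: connections are represented by host-name strings, the methods are instance methods rather than static ones, and the slave bias is modelled by listing each slave slaveWeight times in the rotation against a single entry for the master.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionManager {
    private final String master;
    private final List<String> slaves;
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger readCursor = new AtomicInteger();
    private final AtomicInteger anyCursor = new AtomicInteger();

    ConnectionManager(String master, List<String> slaves, int slaveWeight) {
        if (slaves.isEmpty() || slaveWeight < 1) {
            throw new IllegalArgumentException("need at least one slave and a weight >= 1");
        }
        this.master = master;
        this.slaves = new ArrayList<>(slaves);
        // The master appears once; every slave appears slaveWeight times.
        rotation.add(master);
        for (int i = 0; i < slaveWeight; i++) {
            rotation.addAll(slaves);
        }
    }

    // Master only: the one place writes may go.
    String getReadWriteConnection() { return master; }

    // Slave only: plain round-robin over the slaves.
    String getReadOnlyConnection() { return pick(slaves, readCursor); }

    // All servers, biased toward slaves by the weighted rotation.
    String getConnection() { return pick(rotation, anyCursor); }

    private static String pick(List<String> hosts, AtomicInteger cursor) {
        // floorMod keeps the index valid even after the counter overflows.
        return hosts.get(Math.floorMod(cursor.getAndIncrement(), hosts.size()));
    }
}
```

With one master, two slaves, and a weight of 2, one call in five to getConnection() lands on the master.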

Page 22: From MySQL to MongoDB at Wordnik (Tony Tam)

OPTIMIZING PERFORMANCE

Caching
  Had complex logic to handle cache invalidation
  Out-of-process caches are not free
  MongoDB loves your RAM
  Let it do your LRU cache (it will anyway)
Hardware
  Do not skimp on your disk or RAM
Indexes
  Schema-less design
  Even if no values in any document, needs to read document schema to check

Page 23: From MySQL to MongoDB at Wordnik (Tony Tam)

OPTIMIZING PERFORMANCE

Disk space
  Schemaless => schema per document (row)
  Choose your mappings wisely
  ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
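The cost of a long attribute name can be estimated from the BSON layout, in which every document stores its own key names. The sketch below computes the encoded size of a document holding a single boolean field. KeyOverhead is a hypothetical helper, and the figures ignore any padding, index, or storage-engine overhead, so treat them as a lower bound rather than a measurement.

```java
import java.nio.charset.StandardCharsets;

class KeyOverhead {
    // Encoded size in bytes of a BSON document with one boolean field:
    // int32 document length (4) + element type byte (1)
    // + key as a C string (UTF-8 bytes + NUL) + boolean value (1)
    // + document terminator (1).
    static int singleBooleanDocSize(String key) {
        int keyBytes = key.getBytes(StandardCharsets.UTF_8).length + 1;
        return 4 + 1 + keyBytes + 1 + 1;
    }

    // Extra bytes spent across a collection by using longKey instead of shortKey.
    static long extraBytes(String longKey, String shortKey, long documents) {
        long perDocument = singleBooleanDocSize(longKey) - singleBooleanDocSize(shortKey);
        return perDocument * documents;
    }
}
```

For the two names on the slide the sizes come out at 29 and 12 bytes, a difference of 17 bytes per document. If, purely for illustration, every one of 5 billion documents carried that field, the longer name would cost about 85 GB.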

Page 24: From MySQL to MongoDB at Wordnik (Tony Tam)

OPTIMIZING PERFORMANCE

A Typical Day at the Office for MongoDB
API call rate: 47.7 calls/sec

Page 25: From MySQL to MongoDB at Wordnik (Tony Tam)

OTHER TIPS

Data types
  Use caution when changing
  DBObject obj = cur.next();
  long id = (Long) obj.get("IWasAnIntOnce");
Attribute names
  Don't change w/o migrating existing data!

WTFDMDG????
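The data-type warning comes down to a cast: a value stored as an Integer comes back as an Integer, and casting it to Long throws ClassCastException. A defensive read through Number avoids that. In this sketch SafeRead is a hypothetical helper and a plain Map stands in for DBObject, whose get method also returns Object.

```java
import java.util.Map;

class SafeRead {
    // Reads a numeric field as a long whether it was stored as an Integer or a Long.
    static long readLong(Map<String, Object> doc, String key) {
        Object value = doc.get(key);
        if (!(value instanceof Number)) {
            throw new IllegalStateException("field '" + key + "' is missing or not numeric: " + value);
        }
        return ((Number) value).longValue();
    }
}
```

This only papers over mixed numeric types at read time; documents written before the change still hold the old type until they are migrated.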

Page 26: From MySQL to MongoDB at Wordnik (Tony Tam)

WHAT’S NEXT?

GridFS
  Store audio files on disk
  Requires clustered file system for shared access
Capped Collections (rolling out this week)
UGC from MySQL => MongoDB
Beg/Bribe 10gen for some features

Page 27: From MySQL to MongoDB at Wordnik (Tony Tam)

Questions?