Eliot Horowitz @eliothorowitz MongoUK March 21, 2011 caling with MongoD
May 10, 2015
Eliot Horowitz@eliothorowitz
MongoUKMarch 21, 2011
Scaling with MongoDB
Scaling
•Storage needs only go up
•Operations/sec only go up
•Complexity only goes up
Horizontal Scaling
•Vertical scaling is limited
•Hard to scale vertically in the cloud
•Can scale wider than higher
Read Scaling
•One master at any time
•Programmer determines if read hits master or a slave
•Pro: easy to setup, can scale reads very well
•Con: reads are inconsistent on a slave
•Writes don’t scale
One Master, Many Slaves
•Custom Master/Slave setup
•Have as many slaves as you want
•Can put them local to application servers
•Good for 90+% read heavy applications (Wikipedia)
Replica Sets
•High Availability Cluster
•One master at any time, up to 6 slaves
•A slave automatically promoted to master if failure
•Drivers support auto routing of reads to slaves if programmer allows
•Good for applications that need high write availability but mostly reads (Commenting System)
•Many masters, even more slaves
•Can scale in two dimensions
•Add Shards for write and data size scaling
•Add slaves for inconsistent read scaling and redundancy
Sharding
Sharding Basics
•Data is split up into chunks
•Shard: Replica sets that hold a portion of the data
•Config Servers: Store meta data about system
•Mongos: Routers, direct direct and merge requests
Architecture
client
mongos ...mongos
mongod
mongodddd ...
Shards
mongod
mongod
mongod
ConfigServers
mongod
mongod
mongodddd
mongod
mongod
mongodddd
mongod
client client client
Common Setup
•A common setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves
•Can add sharding later to an existing replica set with no down time
•Can have sharded and non-sharded collections
Range Based
•collection is broken into chunks by range
•chunks default to 64mb or 100,000 objects
MIN MAX LOCATION
A F shard1
F M shard1
M R shard2
R Z shard3
Config Servers
•3 of them
•changes are made with 2 phase commit
•if any are down, meta data goes read only
•system is online as long as 1/3 is up
mongos
•Sharding Router
•Acts just like a mongod to clients
•Can have 1 or as many as you want
•Can run on appserver so no extra network traffic
•Cache meta data from config servers
Writes
•Inserts : require shard key, routed
•Removes: routed and/or scattered
•Updates: routed or scattered
Queries
•By shard key: routed
•sorted by shard key: routed in order
•by non shard key: scatter gather
•sorted by non shard key: distributed merge sort
Splitting
•Take a chunk and split it in 2
•Splits on the median value
•Splits only change meta data, no data change
SplittingMIN MAX LOCATION
A Z shard1
T1
MIN MAX LOCATION
A G shard1
G Z shard1
T2
MIN MAX LOCATION
A D shard1
D G shard1
G S shard1
S Z shard1
T3
Balancing
•Moves chunks from one shard to another
•Done online while system is running
•Balancing runs in the background
MigratingMIN MAX LOCATION
A D shard1
D G shard1
G S shard1
S Z shard1
T3
MIN MAX LOCATION
A D shard1
D G shard1
G S shard1
S Z shard2
T4
MIN MAX LOCATION
A D shard1
D G shard1
G S shard2
S Z shard2
T5
Choosing a Shard Key
•Shard key determines how data is partitioned
•Hard to change
•Most important performance decision
Use Case: User Profiles
{ email : “[email protected]” ,
addresses : [ { state : “NY” } ]
}
•Shard by email
•Lookup by email hits 1 node
•Index on { “addresses.state” : 1 }
Use Case: Activity Stream
{ user_id : XXX, event_id : YYY , data : ZZZ }
•Shard by user_id
•Looking up an activity stream hits 1 node
•Writing even is distributed
•Index on { “event_id” : 1 } for deletes
Use Case: Photos
{ photo_id : ???? , data : <binary> }
What’s the right key?
•auto increment
•MD5( data )
•now() + MD5(data)
•month() + MD5(data)
Use Case: Logging
{ machine : “app.foo.com” , app : “apache” ,
when : “2010-12-02:11:33:14” , data : XXX }
Possible Shard keys
•{ machine : 1 }
•{ when : 1 }
•{ machine : 1 , app : 1 }
•{ app : 1 }
Download MongoDBhttp://www.mongodb.org
and let us know what you think@eliothorowitz @mongodb
10gen is hiring!http://www.10gen.com/jobs