MongoDB Auto Sharding
Aaron ([email protected])
Mongo Seattle
July 27, 2010
MongoDB v1.6 out next week!
Why Scale Horizontally?
Vertical scaling is expensive
Horizontal scaling is more incremental – works well in the cloud
You will always be able to scale wider than you can scale up
Distribution Models
Ad-hoc partitioning
Consistent hashing (Dynamo)
Range-based partitioning (BigTable/PNUTS)
Auto Sharding
Each piece of data is exclusively controlled by a single node (shard)
Each node (shard) has exclusive control over a well defined subset of the data
Database operations run against one shard when possible, multiple when necessary
As system load changes, assignment of data to shards is rebalanced automatically
Mongo Sharding
Basic goal is to make this:
    client --?--> mongod   mongod   mongod
look like this:
    client -----> mongod
Mongo Sharding
Mapping of documents to shards controlled by shard key
Can convert from single master to sharded cluster with 0 downtime (see the sketch below)
Most functionality of a single Mongo master is preserved
Fully consistent
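A hedged sketch of that conversion, using the admin commands shown later in this deck (the hostname, database, and collection names here are placeholders); the commands are issued against a mongos admin database:
> db.runCommand({ addshard : "existingmaster:27017" })                       // add the existing master as the first shard
> db.runCommand({ enablesharding : "mydb" })                                 // allow collections in mydb to be sharded
> db.runCommand({ shardcollection : "mydb.users", key : { user_id : 1 } })   // declare the shard key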
Shard Key
Collection   Min                   Max                   Location
users        {name: 'Miller'}      {name: 'Nessman'}     shard 2
users        {name: 'Nessman'}     {name: 'Ogden'}       shard 4
…
Shard Key Examples
{ user_id: 1 }
{ state: 1 }
{ lastname: 1, firstname: 1 }
{ tag: 1, timestamp: -1 }
{ _id: 1 }   This is the default; be careful when using ObjectId, since its leading timestamp is monotonically increasing, so new inserts concentrate on one shard
Architecture Overview
client
  |
mongos
  |
mongod                    mongod                    mongod
-inf <= user_id < 2000    2000 <= user_id < 5500    5500 <= user_id < +inf
Query I
Shard key { user_id: 1 }
db.users.find( { user_id: 5000 } )
Query appropriate shard
Query II
Shard key { user_id: 1 }
db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )
Query appropriate shard(s)
{ a : …, b : …, c : … } a is declared shard key
find( { a : { $gt : 333, $lt : 400 } } )
Range Shard
a in [-∞,2000) 2
a in [2000,2100) 8
a in [2100,5500) 3
… …
a in [88700, ∞) 0
{ a : …, b : …, c : … } a is declared shard key
find( { a : { $gt : 333, $lt : 2012 } } )
Range Shard
a in [-∞,2000) 2
a in [2000,2100) 8
a in [2100,5500) 3
… …
a in [88700, ∞) 0
Query III
Shard key { user_id: 1 }
db.users.find( { hometown: 'Seattle' } )
Query all shards
{ a : …, b : …, c : … } a is declared shard key
find( { a : { $gt : 333, $lt : 2012 } } )
Range Shard
a in [-∞,2000) 2
a in [2000,2100) 8
a in [2100,5500) 3
… …
a in [88700, ∞) 0
{ a : …, b : …, c : … }   secondary query, secondary index
ensureIndex( { b : 1 } )
find( { b : 99 } )
The query goes to all shards; this case is good when the number of shards (m) is small, as here, and also when the queries are large tasks
Range Shard
a in [-∞,2000) 2
a in [2000,2100) 8
a in [2100,5500) 3
… …
a in [88700, ∞) 0
Query IV
Shard key { user_id: 1 }
db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } )
Query all shards, in sequence
Query V
Shard key { user_id: 1 }
db.users.find( { hometown: 'Seattle' } ).sort( { lastname: 1 } )
Query all shards in parallel, perform merge sort
A secondary index on { lastname: 1 } can be used
Map/Reduce
Map/Reduce was designed for distributed systems
Map/Reduce jobs will run on all relevant shards in parallel, subject to query spec
Map/Reduce
> m = function() { emit(this.user_id, 1); }    // map: emit one key per sale event's user_id
> r = function(k, vals) { return 1; }          // reduce: collapse to 1, leaving one result doc per distinct user_id
> res = db.events.mapReduce(m, r, { query : { type : 'sale' } });
> db[res.result].find().limit(2)
{ "_id" : 8321073716060 , "value" : 1 }
{ "_id" : 7921232311289 , "value" : 1 }
Writes
Inserts routed to appropriate shard (inserted doc must contain shard key)
Removes are sent to the matching shard(s)
Updates are sent to the matching shard(s)
Writes are parallel if asynchronous, sequential if synchronous
Updates cannot modify the shard key
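A minimal sketch of these rules, assuming a users collection sharded on { user_id: 1 } as in the query examples:
> db.users.insert({ user_id : 5000, name : "Miller", hometown : "Seattle" })   // must contain the shard key; routed to one shard
> db.users.update({ user_id : 5000 }, { $set : { hometown : "Portland" } })    // shard key in the spec, so only the matching shard is called upon
> db.users.remove({ hometown : "Seattle" })                                    // no shard key, so all matching shards are called upon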
Choice of Shard Key
Choose a key that exists in every document and is generally a component of your queries
Often you want a unique key; if not, consider the granularity of the key and potentially add fields (see the sketch below)
The shard key is generally comprised of fields you would put in an index if operating on a single machine
But in a sharded configuration, the shard key will be indexed automatically
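For illustration only (the collection and field names are hypothetical), a compound key adds granularity when a single field is too coarse; per the point above, the key is indexed automatically on each shard:
> db.runCommand({ shardcollection : "app.events", key : { tag : 1, timestamp : -1 } })
// the { tag : 1, timestamp : -1 } index is created automatically on each shard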
Shard Key Examples (again)
{ user_id: 1 }
{ state: 1 }
{ lastname: 1, firstname: 1 }
{ tag: 1, timestamp: -1 }
{ _id: 1 }   This is the default; be careful when using ObjectId
Bit.ly Example
~50M users
~10K concurrently using server at peak
~12.5B shortens per month (1K/sec peak)
History of all shortens per user stored in mongo
Bit.ly Example - Schema
{ "_id" : "lthp", "domain" : null, "keyword" : null, "title" : "bit.ly, a simple url shortener", "url" : "http://jay.bit.ly/", "labels" : [{"text" : "lthp",
"name" : "user_hash"},{"text" : "BUFx",
"name" : "global_hash"},{"text" : "http://jay.bit.ly/",
"name" : "url"},{"text" : "bit.ly, a simple url shortener",
"name" : "title"}], "ts" : 1229375771, "global_hash" : "BUFx", "user" :
"jay", "media_type" : null }
Bit.ly Example
Shard key { user: 1 }
Indices:
  { _id: 1 }
  { user: 1 }
  { global_hash: 1 }
  { user: 1, ts: -1 }
Query:
  db.history.find( {
    user : user,
    "labels.text" : …
  } ).sort( { ts: -1 } )
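A sketch of how these indices could be declared in the shell ({ _id: 1 } is created automatically, and { user: 1 } is the shard key, which is indexed automatically):
> db.history.ensureIndex({ global_hash : 1 })
> db.history.ensureIndex({ user : 1, ts : -1 })   // supports the find-by-user, sort-by-ts query above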
Balancing
The whole point of auto sharding is that mongo balances the shards for you
The balancing algorithm is complicated; the basic idea is that this
-inf <= user_id < 1000        1000 <= user_id < +inf
Balancing
Becomes this
-inf <= user_id < 3000        3000 <= user_id < +inf
Balancing
Current balancing metric is data size; future possibilities include cpu and disk utilization
There is some flexibility built into the partitioning algorithm, so we aren't thrashing data back and forth between shards
Only one 'chunk' of data is moved at a time, a conservative choice that limits the total overhead of balancing
System Architecture
Shard
Regular mongod process(es), storing all documents for a given key range
Handles all reads/writes for this key range as well
Each shard indexes the data contained within it
Can be single mongod, master/slave, or replica set
Shard - Chunk
In a sharded cluster, data is partitioned across shards by shard key
Within a shard, data is further partitioned into chunks by shard key
A chunk is the smallest unit of data for balancing; data moves between shards at chunk granularity
Upper limit on chunk size is 200MB
Special case if the shard key range is open ended
Shard - Replica Sets
Replica sets provide data redundancy and auto failover
In the case of sharding, this means redundancy and failover per shard
All typical replica set operations are possible, for example writing with w=N (see the sketch below)
Replica sets were specifically designed to work as shards
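For example, a hedged sketch of confirming that a write reached N members of a shard's replica set (the w and wtimeout values are illustrative):
> db.users.insert({ user_id : 5000, name : "Miller" })
> db.runCommand({ getlasterror : 1, w : 2, wtimeout : 5000 })   // wait until 2 members have the write, up to 5 seconds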
System Architecture
Mongos
Sharding router – distributes reads/writes to sharded cluster
Client interface is the same as a mongod
Can have as many mongos instances as you want
Can run on app server machine to avoid extra network traffic
Mongos also initiates balancing operations
Keeps metadata per chunk in RAM – 1MB RAM per 1TB of user data in cluster
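A sketch of the client's view: connect the shell to a mongos (started on port 30000 in the setup steps later in this deck) and issue ordinary queries, exactly as against a single mongod:
$ ./mongo localhost:30000/foo
> db.bar.find({ _id : 123 })   // mongos routes this to the shard owning that _id range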
System Architecture
Config Server
3 Config servers
Changes are made with a two-phase commit
If any of the 3 servers goes down, config data becomes read only
Sharded cluster will remain online as long as 1 of the config servers is running
Config metadata size estimate 1MB metadata per 1TB data in cluster
Shared Machines
Bit.ly Architecture
Limitations
Unique index constraints not expressed by shard key are not enforced across shards
Updates to a document's shard key aren't allowed (you can remove and reinsert, but it's not atomic)
Balancing metric is limited to # of chunks right now, but this will be enhanced
Right now only one chunk moves in the cluster at a time, so balancing can be slow; it's a conservative choice we've made to keep the overhead of balancing low for now
20 petabyte size limit
Start Up Config Servers
$ mkdir -p ~/dbs/config
$ ./mongod --dbpath ~/dbs/config --port 20000
Repeat as necessary
Start Up Mongos
$ ./mongos --port 30000 --configdb localhost:20000
No dbpath
Repeat as necessary
Start Up Shards
$ mkdir -p ~/dbs/shard1
$ ./mongod --dbpath ~/dbs/shard1 --port 10000
Repeat as necessary
Configure Shards
$ ./mongo localhost:30000/admin
> db.runCommand({addshard : "localhost:10000", allowLocal : true})
{
"added" : "localhost:10000",
"ok" : true
}
Repeat as necessary
Shard Data
> db.runCommand({"enablesharding" : "foo"})
> db.runCommand({"shardcollection" : "foo.bar", "key" : {"_id" : 1}})
Repeat as necessary
Production Configuration
Multiple config servers
Multiple mongos servers
Replica sets for each shard
Use getLastError/w correctly
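Putting these together, a rough sketch of a production-style setup (hostnames and the replica set name are placeholders):
$ ./mongos --port 30000 --configdb cfg1:20000,cfg2:20000,cfg3:20000
$ ./mongo localhost:30000/admin
> db.runCommand({ addshard : "myset/shardhost1:10000,shardhost2:10000" })   // each shard is a replica set
After important writes, check getLastError with w as shown earlier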
Looking at config data
Mongo shell connected to config server, config database
> db.shards.find()
{ "_id" : "shard0", "host" : "localhost:10000" }
{ "_id" : "shard1", "host" : "localhost:10001" }
Looking at config data
> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
{ "_id" : "x", "partitioned" : false, "primary" : "shard0” }
{"_id" : "test", "partitioned" : true, "primary" : "shard0", "sharded" :
{
"test.foo" : { "key" : {"x" : 1}, "unique" : false }
}
}
Looking at config data
> db.chunks.find()
{"_id" : "test.foo-x_MinKey",
"lastmod" : { "t" : 1276636243000, "i" : 1 },
"ns" : "test.foo",
"min" : {"x" : { $minKey : 1 } },
"max" : {"x" : { $maxKey : 1 } },
"shard" : "shard0”
}
Looking at config data
> db.printShardingStatus()
--- Sharding Status ---
sharding version: { "_id" : 1, "version" : 3 }
shards:
  { "_id" : "shard0", "host" : "localhost:10000" }
  { "_id" : "shard1", "host" : "localhost:10001" }
databases:
  { "_id" : "admin", "partitioned" : false, "primary" : "config" }
  { "_id" : "foo", "partitioned" : false, "primary" : "shard1" }
  { "_id" : "x", "partitioned" : false, "primary" : "shard0" }
  { "_id" : "test", "partitioned" : true, "primary" : "shard0",
    "sharded" : { "test.foo" : { "key" : { "x" : 1 }, "unique" : false } } }
  test.foo chunks:
    { "x" : { $minKey : 1 } } -->> { "x" : { $maxKey : 1 } } on : shard0 { "t" : 1276636243 …
Give it a Try!
Download from mongodb.org
Sharding production ready in 1.6, which is scheduled for release next week
For now use 1.5 (unstable) to try sharding