Page 1
Sharding MongoDB
Kevin Hanson
Solutions Architect, 10gen
[email protected]
Twitter: @hungarianhc
Q&A to follow the webinar… Feel free to ask questions in the Q&A box.
A recording will be available 24 hours from now. Any other issues? Contact [email protected]
Page 2
Agenda
• What Does it Mean?
• When to Shard
• Replica Set Crash Course
• Architecture
• Configuration
• Mechanics
• Solutions
Page 3
Traditional Scaling Up (costly)
Page 4
Visual representation of horizontal scaling
Google, ~2000: Horizontal Scalability (scale out)
Page 5
Data Store Scalability in 2005
• Custom Hardware
  – Oracle
• Custom Software
  – Facebook + MySQL
Page 6
Data Store Scalability Today
• MongoDB auto-sharding available in 2009
• A data store that is
  – Publicly available
  – Open source (https://github.com/mongodb/mongo)
  – Horizontally scalable
  – Application independent
Page 8
Throughput Exceeds I/O
Page 9
Data Set Grows Too Large
Page 10
Replica Sets: The Building Blocks of Sharding
Page 11
Replica Sets
[Diagram: a replica set with one primary and two secondaries; the driver sends writes to the primary and reads to the primary and the secondaries]
Page 12
Replica Sets
• High Availability
• Read Scalability
Page 13
Sharding Architecture
Page 14
Data stored in shard
• Shard is a node of the cluster
• Shard can be a single mongod or a replica set
Page 15
Config server stores metadata
• Config Server
  – Stores cluster chunk ranges and locations
  – Can have only 1 or 3 (production must have 3)
  – Two phase commit (not a replica set)
Page 16
Mongos manages the data
• Mongos
  – Acts as a router / balancer
  – No local data (persists to config database)
  – Can have 1 or many
Page 17
Sharding Infrastructure
Page 18
Partition data based on ranges
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
• Chunks are subsets of the entire range
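The "points on a line" picture can be sketched in a few lines of Python. This is only an illustration of range partitioning, not MongoDB's internal implementation: the split points and the helper names `chunk_index` and `chunk_range` are hypothetical, and each chunk is assumed to include its lower bound and exclude its upper bound.

```python
from bisect import bisect_right

# Hypothetical split points on the shard key line. Chunk i covers the
# range [lower, upper), with -inf and +inf at the two ends of the line.
SPLIT_POINTS = ["g", "n", "t"]


def chunk_index(shard_key_value, split_points=SPLIT_POINTS):
    """Return the index of the chunk whose range contains the key value."""
    return bisect_right(split_points, shard_key_value)


def chunk_range(index, split_points=SPLIT_POINTS):
    """Return the (lower, upper) bounds of a chunk as display strings."""
    lower = split_points[index - 1] if index > 0 else "-inf"
    upper = split_points[index] if index < len(split_points) else "+inf"
    return (lower, upper)


for key in ["alice", "g", "zed"]:
    index = chunk_index(key)
    print(key, "-> chunk", index, chunk_range(index))
```

Three split points cut the line into four segments, so every possible key value falls into exactly one chunk.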
Page 22
Chunks
[Diagram: the shard key space drawn as a line from -∞ to +∞, with example email addresses as shard key values; a split divides the line into two chunks]
Page 26
Distribute data in chunks across nodes
• Initially 1 chunk
• Default max chunk size: 64 MB
• MongoDB automatically splits & migrates chunks when the max is reached
Page 27
MongoDB manages data
• Queries routed to specific shards
• MongoDB balances cluster
• MongoDB migrates data to new nodes
Page 28
MongoDB Auto-Sharding
• Transparent to Developers
  – Same interface as single mongod
• Two Steps
  – Enable Sharding for a database
  – Shard collection within database
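As a rough sketch of the two steps, the snippet below builds the two admin command documents (`enableSharding` and `shardCollection`) as plain Python dictionaries. The database name `records`, the collection name `users`, the shard key field `email`, and the helper `sharding_commands` are all assumptions for illustration. In a real cluster these documents would be sent to the admin database through a mongos, for example with a driver's generic command helper.

```python
def sharding_commands(database, collection, shard_key_field):
    """Build the two admin commands as plain documents (nothing is sent)."""
    namespace = f"{database}.{collection}"
    return [
        # Step 1: enable sharding for the database.
        {"enableSharding": database},
        # Step 2: shard a collection within it on an ascending key.
        {"shardCollection": namespace, "key": {shard_key_field: 1}},
    ]


# Hypothetical names, used only for illustration.
for command in sharding_commands("records", "users", "email"):
    print(command)
```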
Page 30
Shard Key
• Choose a field commonly used in queries
• Shard key is immutable
• Shard key values are immutable
• Shard key requires index on fields contained in key
• Uniqueness of the `_id` field is only guaranteed within an individual shard
• Shard key limited to 512 bytes in size
Page 31
Shard Key Considerations
• Cardinality
• Write distribution
• Query isolation
• Data Distribution
Page 33
Chunk splitting
• A chunk is split once it exceeds the maximum size
• There is no split point if all documents have the same shard key
• Chunk split is a logical operation (no data is moved)
• If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts
Page 34
Chunk Moving
• The balancer runs on mongos
• Once the difference in chunk count between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
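A balancing round can be pictured with the toy model below. It is a sketch under stated assumptions, not the real balancer: the threshold value of 2, the shard names, and the `balance` helper are hypothetical, and chunks are moved one at a time from the most dense shard to the least dense shard until the difference is no longer above the threshold.

```python
MIGRATION_THRESHOLD = 2  # hypothetical value, for illustration only


def balance(chunk_counts, threshold=MIGRATION_THRESHOLD):
    """Return (new chunk counts, list of (donor, receiver) moves)."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    counts = dict(chunk_counts)
    moves = []
    while True:
        donor = max(counts, key=counts.get)      # most dense shard
        receiver = min(counts, key=counts.get)   # least dense shard
        if counts[donor] - counts[receiver] <= threshold:
            return counts, moves
        counts[donor] -= 1
        counts[receiver] += 1
        moves.append((donor, receiver))


counts, moves = balance({"shard0": 6, "shard1": 1, "shard2": 2})
print(counts)
print(moves)
```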
Page 35
Cluster Request Routing
• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
Page 36
Cluster Request Routing: Targeted Query
Page 37
Routable request received
Page 38
Request routed to appropriate shard
Page 39
Shard returns results
Page 40
Mongos returns results to client
Page 41
Cluster Request Routing: Non-Targeted Query
Page 42
Non-Targeted Request Received
Page 43
Request sent to all shards
Page 44
Shards return results to mongos
Page 45
Mongos returns results to client
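The difference between a targeted and a non-targeted request can be sketched as a routing decision. This is an illustrative model only: the shard key `email`, the split points, the chunk-to-shard mapping, and the helper `shards_for_query` are hypothetical, and only equality queries are considered.

```python
from bisect import bisect_right

# Hypothetical cluster metadata: split points on the shard key "email"
# and the shard that owns each resulting chunk.
SHARD_KEY = "email"
SPLIT_POINTS = ["h", "p"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2"]


def shards_for_query(query):
    """Return the shards a router would contact for an equality query."""
    if SHARD_KEY in query:
        # Targeted: the shard key value identifies exactly one chunk.
        chunk = bisect_right(SPLIT_POINTS, query[SHARD_KEY])
        return [CHUNK_OWNERS[chunk]]
    # Scatter gather: without the shard key, every shard must be asked.
    return sorted(set(CHUNK_OWNERS))


print(shards_for_query({"email": "kim@example.com"}))
print(shards_for_query({"name": "Kim"}))
```

A query that includes the shard key touches one shard, while a query without it touches every shard, which is the motivation for choosing a shard key that appears in common queries.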
Page 46
Cluster Request Routing: Non-Targeted Query with Sort
Page 47
Non-Targeted request with sort received
Page 48
Request sent to all shards
Page 49
Query and sort performed locally
Page 50
Shards return results to mongos
Page 51
Mongos merges sorted results
Page 52
Mongos returns results to client
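The sorted scatter gather steps above amount to a merge of already-sorted streams. The sketch below assumes in-memory lists standing in for shards, and the function `scatter_gather_sorted` and the sample documents are hypothetical. Each shard filters and sorts locally, then the router merges the sorted results with `heapq.merge`.

```python
import heapq


def scatter_gather_sorted(shards, predicate, sort_key):
    """Query every shard, sort locally, then merge the sorted results."""
    per_shard = [
        # Query and sort performed locally on each shard.
        sorted((doc for doc in docs if predicate(doc)), key=sort_key)
        for docs in shards.values()
    ]
    # The router merges the sorted streams without re-sorting everything.
    return list(heapq.merge(*per_shard, key=sort_key))


# Hypothetical sample data, for illustration only.
shards = {
    "shard0": [{"name": "c", "age": 30}, {"name": "a", "age": 41}],
    "shard1": [{"name": "b", "age": 25}, {"name": "d", "age": 35}],
}
result = scatter_gather_sorted(
    shards, lambda doc: doc["age"] >= 30, lambda doc: doc["name"]
)
print([doc["name"] for doc in result])
```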
Page 53
[Diagram: a sharded cluster with four shards, each a replica set of one primary and two secondaries holding ranges of chunks; three mongos routers receive queries; three config servers]
Page 54
[Diagram: a sharded cluster with four shards, each a replica set of one primary and two secondaries holding ranges of chunks; three mongos routers receive queries; three config servers; labelled with the locations Chicago, Beijing, and London]
Page 55
Questions?
Kevin Hanson
Solutions Architect, 10gen
[email protected]
Twitter: @hungarianhc