Ger Hartnett Director of Technical Services (EMEA), MongoDB @ghartnett #MongoDB Tales from the Field Part three: Choosing the Right Shard Key for High-Performance and Scale

Webinar: Choosing the Right Shard Key for High Performance and Scale

Feb 09, 2017

Page 1: Webinar: Choosing the Right Shard Key for High Performance and Scale

Ger Hartnett
Director of Technical Services (EMEA), MongoDB
@ghartnett #MongoDB

Tales from the Field
Part three: Choosing the Right Shard Key for High-Performance and Scale

Page 2: Webinar: Choosing the Right Shard Key for High Performance and Scale

Or:
●Cautionary Tales
●Don’t solve the wrong problems
●Bad schemas & shard keys hurt ops too

Page 3: Webinar: Choosing the Right Shard Key for High Performance and Scale

●The main talk should take 30-35 minutes

●You can submit questions via the chat box

●We’ll answer as many as possible at the end

●We are recording and will send slides Friday

●This is the final webinar in a series of 3

Before we start

Page 4: Webinar: Choosing the Right Shard Key for High Performance and Scale

●You work in operations
●You work in development
●You have a MongoDB system in production

●You have contacted MongoDB Technical Services (support)

●You attended an earlier webinar in the series (part 1, part 2)

A quick poll - add a word to the chat to let me know your perspective

Page 5: Webinar: Choosing the Right Shard Key for High Performance and Scale

●We collect observations about common mistakes to share the experience of many

●Names have been changed to protect the (mostly) innocent

●No animals were harmed during the making of this presentation (but maybe some DBAs and engineers had light emotional scarring)

●While you might be new to MongoDB, we have deep experience that you can leverage

Stories

Page 6: Webinar: Choosing the Right Shard Key for High Performance and Scale

1.Discovering a DR flaw during a data centre outage

2.Complex documents, memory and an upgrade “surprise”

3.Wild success “uncovers” the wrong shard key

The Stories (part three today)

Page 7: Webinar: Choosing the Right Shard Key for High Performance and Scale

Story #1: Quick Review

Page 8: Webinar: Choosing the Right Shard Key for High Performance and Scale

Story #1: Recovering from a disaster
●Prospect in the process of signing up for a subscription

●Called us late on Friday: data centre power outage, 30+ servers (11 shards) down

●When they started bringing up the first shard, the nodes crashed with data corruption

●17TB of data, very little free disk space, JOURNALLING DISABLED!

Page 9: Webinar: Choosing the Right Shard Key for High Performance and Scale

Recovering each shard
1.Start secondary read only
2.Mount NFS storage for repair
3.Repair former primary node
4.Iterative rsync to seed a secondary

(Diagram: replica set with one primary and two secondaries)

Page 10: Webinar: Choosing the Right Shard Key for High Performance and Scale

Key takeaways for you
●If you are departing significantly from the standard config, check with us (e.g. if you think journalling is a bad idea)

●Two data centres in different buildings on different flood plains, not in the path of the same storm (e.g. secondaries in AWS)

●DR/backups are useless if you haven’t tested them

Page 11: Webinar: Choosing the Right Shard Key for High Performance and Scale

Story #2: Complex documents, memory and an upgrade “surprise”
●Well-established ecommerce site selling diverse goods in 20+ countries

●After switching to WiredTiger in production, performance dropped - the opposite of what they were expecting

Page 12: Webinar: Choosing the Right Shard Key for High Performance and Scale

{
  _id: 375,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
  inventory: 423
}

Product Catalog: Original Schema

Page 13: Webinar: Choosing the Right Shard Key for High Performance and Scale

Key Takeaways
●When doing a major version/storage-engine upgrade, test in staging with some proportion of production data/workload

●Sometimes putting everything into one document is counterproductive
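
One way to act on the second takeaway, sketched here under assumptions: if each request only needs a single locale, the all-locales catalog document from Story #2 could be split into one small document per product and locale, with inventory kept separately. The field names `product_id` and `locale`, the helper `split_by_locale`, and the sample values are hypothetical and not taken from the talk.

```python
def split_by_locale(product):
    """Split one all-locales product document into per-locale documents.

    Assumes every key other than `_id` and `inventory` is a locale code
    whose value is a dict of localized fields (hypothetical layout).
    """
    shared_keys = {"_id", "inventory"}
    locale_docs = [
        {
            "_id": f"{product['_id']}:{locale}",
            "product_id": product["_id"],
            "locale": locale,
            **fields,
        }
        for locale, fields in product.items()
        if locale not in shared_keys
    ]
    # Frequently updated inventory lives in its own small document.
    inventory_doc = {"_id": product["_id"], "inventory": product["inventory"]}
    return locale_docs, inventory_doc


# Illustrative input with made-up names and descriptions.
original = {
    "_id": 375,
    "en_US": {"name": "Widget", "description": "A widget"},
    "fr_FR": {"name": "Bidule", "description": "Un bidule"},
    "inventory": 423,
}
locale_docs, inventory_doc = split_by_locale(original)
```

With this layout a page rendered for one locale reads one small document instead of every translation of the product.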

Page 14: Webinar: Choosing the Right Shard Key for High Performance and Scale

Story #3: Wild success uncovers the wrong shard key
●Started out as error “[Balancer] caught exception … tag ranges not valid for: db.coll”

●11 shards (they had added 2 new shards to keep up with traffic), 400+ databases

●Lots of code changes ahead of the Super Bowl

●Spotted slow queries (300+ s), decided to build some indexes without telling us

●Production went down

Page 15: Webinar: Choosing the Right Shard Key for High Performance and Scale

Adding Shards

2 More Shards….

Page 16: Webinar: Choosing the Right Shard Key for High Performance and Scale

The “Golden Hammer” Tendency

Page 17: Webinar: Choosing the Right Shard Key for High Performance and Scale

Diagnosing the issues #1
●The red-herring hunt begins
●Transparent Huge Pages enabled in production

●Chaotic call - 20 people talking at once, then in the middle of the call everything started working again

●Barrage of tickets and calls
●Connection storms

Page 18: Webinar: Choosing the Right Shard Key for High Performance and Scale

Using mtools to analyse logs - connection churn

Page 19: Webinar: Choosing the Right Shard Key for High Performance and Scale

Diagnosing the issues #2
●Got inconsistent and missing log files
●Discovered repeated scatter-gather queries returning the same results

●Secondary reads
●Heavy load on some shards and low disk space

Page 20: Webinar: Choosing the Right Shard Key for High Performance and Scale

Insert load on two shards (from Cloud Manager)

Page 21: Webinar: Choosing the Right Shard Key for High Performance and Scale

Diagnosing the issues #3
●Shard key - string with year/month & customer id

{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  count: 1025,
  changes: { … },
  modified : { date : "2015_02", customerId: 314159 }
}
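
A small simulation can show why a key of this shape concentrates inserts: every new document carries the current month string, so all writes fall into that month's chunks, and one large customer maps to a single key value that cannot be split further. This is a minimal sketch, not the customer's actual chunk layout; the chunk bounds and the workload numbers are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Inclusive lower bounds of each chunk over the compound key
# (date, customerId). Hypothetical values, sorted ascending.
chunk_bounds = [
    ("", 0),              # chunk 0: everything before 2015_01
    ("2015_01", 0),       # chunk 1: January
    ("2015_02", 0),       # chunk 2: February, customers below 300000
    ("2015_02", 300000),  # chunk 3: February, customers 300000 and above
]


def chunk_for(key):
    """Return the index of the chunk whose range contains key."""
    return bisect_right(chunk_bounds, key) - 1


# Made-up February workload: one large customer plus 100 small ones.
inserts = [("2015_02", 314159)] * 900
inserts += [("2015_02", cid) for cid in range(1, 101)]

per_chunk = Counter(chunk_for(key) for key in inserts)
# Chunks 0 and 1 receive nothing; chunk 3 takes 900 of the 1000 inserts.
```

Whichever shards own the current month's chunks take all of the insert load, which matches the picture of heavy load on only some shards.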

Page 22: Webinar: Choosing the Right Shard Key for High Performance and Scale
Page 23: Webinar: Choosing the Right Shard Key for High Performance and Scale

Diagnosing the issues #4
●First heard about the DDoS attack
●Missing tag ranges on some collections
●Stopped the balancer, which reduced system load from chunk moves

●Two clusters had a mongos each on the same server

Page 24: Webinar: Choosing the Right Shard Key for High Performance and Scale

Fixing the issues
●Script to fix the tag ranges
●Proposed finer granularity shard key - but this was not possible because of 30TB of data

●Moved mongos to dedicated servers
●Re-enabled the balancer for short windows with waitForDelete and secondaryThrottle
●Put together scripts to pre-split and move empty chunks to quiet shards based on traffic from month before
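
The last step above, placing empty pre-split chunks on quiet shards, can be sketched as a greedy assignment. This is not the script used in the story; the function `place_chunks`, the chunk names, and all load figures are hypothetical. It assumes the previous month's traffic gives an expected load per new chunk and a current load per shard.

```python
import heapq


def place_chunks(expected_chunk_load, shard_load):
    """Greedy placement: heaviest chunk first, onto the least-loaded shard."""
    heap = [(load, shard) for shard, load in shard_load.items()]
    heapq.heapify(heap)
    placement = {}
    heaviest_first = sorted(
        expected_chunk_load.items(), key=lambda item: item[1], reverse=True
    )
    for chunk, load in heaviest_first:
        shard_total, shard = heapq.heappop(heap)
        placement[chunk] = shard
        # The shard's projected load now includes the chunk just placed.
        heapq.heappush(heap, (shard_total + load, shard))
    return placement


# Made-up numbers: expected inserts per new chunk and current inserts per shard.
expected_chunk_load = {"hot-customer": 900, "small-a": 60, "small-b": 40}
shard_load = {"shard01": 800, "shard02": 100, "shard03": 300}
placement = place_chunks(expected_chunk_load, shard_load)
```

Because the chunks are still empty when they are moved, applying such a plan is cheap compared with migrating chunks that already hold data.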

Page 25: Webinar: Choosing the Right Shard Key for High Performance and Scale

Monthly pre-split and move chunks

(Diagram: chunk ranges for date : "2015_03")

customerId: min-500
customerId: 501-10000
customerId: 10001-300k
customerId: 300k-314158
customerId: 314159
customerId: 314160-max
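
Ranges like these could be derived from the previous month's per-customer traffic. The sketch below is one possible approach under stated assumptions, not the scripts from the story: `split_points`, the target load per chunk, and the traffic figures are all hypothetical. It walks customers in id order, starts a new chunk when the running total would exceed the target, and gives a customer whose own traffic exceeds the target a chunk to itself.

```python
def split_points(traffic_by_customer, target_per_chunk):
    """Return customerId values at which a new chunk should start."""
    points = []
    running = 0
    for cid in sorted(traffic_by_customer):
        load = traffic_by_customer[cid]
        if load >= target_per_chunk:
            # Hot customer: close the current chunk before it (if it has
            # any traffic) and start another chunk right after it.
            if running > 0:
                points.append(cid)
            points.append(cid + 1)
            running = 0
        elif running + load > target_per_chunk:
            points.append(cid)
            running = load
        else:
            running += load
    return points


# Made-up traffic from the month before, keyed by customerId.
last_month = {100: 50, 400: 60, 5000: 80, 200000: 70, 314159: 900, 400000: 40}
points = split_points(last_month, target_per_chunk=120)

# Pair each point with next month's date string to form compound split keys.
split_keys = [("2015_03", cid) for cid in points]
```

Here the hot customer 314159 ends up alone in its own chunk, mirroring the single-customer range in the diagram.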

Page 26: Webinar: Choosing the Right Shard Key for High Performance and Scale

The diagnosis in retrospect
●The outage did not appear to have been related to either the invalid tag ranges or the earlier failed moves

●The step downs did not help resolve the outage but did highlight some queries that need to be fixed

●The DDoS was the ultimate cause of the outage - it led to diagnosis of deeper issues

●The deepest issue was the shard key

Page 27: Webinar: Choosing the Right Shard Key for High Performance and Scale

Aftermath and lessons learned
●Signed up for a Named TSE
●Now doing pre-split and move before the end of every month

●Check before making other changes (e.g. building new indexes)

Page 28: Webinar: Choosing the Right Shard Key for High Performance and Scale

Key takeaways for you
●Choosing a shard key is a pivotal decision - make it carefully

●Understand current bottleneck
●Monitor insert distribution and chunk ranges

●Look for slow queries (logs & mtools)
●Run mongos, mongod, config server on dedicated server or use containers/cgroups

Page 29: Webinar: Choosing the Right Shard Key for High Performance and Scale

Further Reading
Production notes: docs.mongodb.org/manual/administration/production-notes

Mtools: github.com/rueckstiess/mtools

Previous Webinars: mongodb.com/presentations

Page 30: Webinar: Choosing the Right Shard Key for High Performance and Scale

Ger Hartnett
Director of Technical Services (EMEA), MongoDB

@ghartnett #MongoDB

Questions?

Page 31: Webinar: Choosing the Right Shard Key for High Performance and Scale

●You can submit questions via the chat box

●We are recording and will send slides Friday

Questions

Page 32: Webinar: Choosing the Right Shard Key for High Performance and Scale

Code GerHartnett gets 25% discount