How to visually spot and analyze slow MongoDB operations
Kay Agahd
idealo internet GmbH
About me
● Current location: Berlin/Germany
● Education: Engineer's degree, Software Engineering
● Experience:
○ Software developer from 1998 to 2009 in Paris/France
○ Database engineer since 2012 at idealo in Berlin/Germany
● Certifications:
○ MongoDB certified Java developer (final grade 100%)
○ MongoDB certified DBA (final grade 96%)
linkedin.com/in/kayag/
About idealo
● founded in 2000
● Europe’s leading price comparison website and app
● Germany, Austria, United Kingdom, France, Italy and Spain
● > 1 billion offers online (November 2018)
● fast growing
idealo and MongoDB
● different types of databases: Oracle, MySQL, PostgreSQL, MongoDB
● MongoDB in production since v1.6 (ca. 2011)
● sharding in production since MongoDB v1.8
● MongoDB stores mainly offers for back-end usage
● > 2 billion docs in offerStore, up to 1 billion reads and 1 billion writes per day
● > 10 billion docs in offerHistory
Motivation
Why we need a better MongoDB profiler
Review profiling
● MongoDB supports profiling of “slow” operations
● “slow” is a threshold set when turning profiling on (default: 100 ms)
● the profiler writes the collected data to a capped collection of the profiled database
● profiling can be enabled per database or per instance on a running mongod
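As a minimal sketch of the points above: MongoDB's `profile` database command (which the shell helper `db.setProfilingLevel()` wraps) takes the profiling level and the `slowms` threshold. The helper name `buildProfileCommand` is hypothetical and only illustrates the shape of the command document; the shell lines in the comments assume a mongo shell connected to the mongod to be profiled.

```javascript
// Hypothetical helper: builds the document for MongoDB's `profile` command.
// level 0 = profiling off, 1 = only operations slower than slowms, 2 = all operations.
function buildProfileCommand(level, slowMs = 100) {
  if (![0, 1, 2].includes(level)) {
    throw new RangeError("profiling level must be 0, 1 or 2");
  }
  return { profile: level, slowms: slowMs };
}

// In a mongo shell, this has to be repeated for each database to be profiled:
//   db.getSiblingDB("offerStore").runCommand(buildProfileCommand(1, 100));
// The collected entries then land in that database's capped collection:
//   db.getSiblingDB("offerStore").system.profile.find().sort({ ts: -1 }).limit(5);
console.log(buildProfileCommand(1, 100));
```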
Inconveniences
● each mongod and each database needs to be handled separately
● sharding: number of profilers = shards × replication factor × databases
● only a limited time span is visible because the profile collection is capped
● profiling and analyzing may add stress to the profiled server
● different formats of the “query” field make querying more difficult
● bug: ops routed through mongos omit the user (JIRA: SERVER-7538)
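To make the sharding formula above concrete, here is a small worked example. The cluster dimensions (10 shards, 3 replica set members per shard, 4 databases) are made-up numbers for illustration, not idealo's actual topology.

```javascript
// Number of profilers to manage = shards * replication factor * databases.
function countProfilers(shards, replicationFactor, databases) {
  return shards * replicationFactor * databases;
}

// Hypothetical cluster: 10 shards, 3 members per shard, 4 profiled databases.
console.log(countProfilers(10, 3, 4)); // 120
```

Even a mid-sized cluster therefore means more than a hundred profile collections to switch on, query and switch off by hand.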
idealo requirements
● easily switch profiling on/off, even when many mongods are involved
● quick overview of the types of slow ops and their quantity within a time period (“types” means op type, user, server, query shape, etc.)
● historical view to see how slow ops evolve, so that trends can be extrapolated
● discovering spikes in time or in slow-op types
● filtering by slow-op types and/or time range to drill down
● usable also by non-database-admins, e.g. software developers
What we’ve built
MongoDB slow operations profiler
How it works
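[Diagram: each profiled mongod (mongod 1 .. mongod n) hosts several databases (DB 1 .. DB n; DB n1 .. DB nm). The slow ops app runs the profilers (profiler 1..n, profiler n..m) for these databases and stores the collected slow ops in the collector DB.]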
Example of slow op document
{
  "op": "query",
  "ns": "offerStore.offer",
  "query": {
    "find": "offer",
    "filter": {
      "shopId": 292731,
      "opIds": { "$in": [29337, 5478] },
      "offerStatus": { "$gt": 0 }
    },
    "projection": {
      "traceId": 1,
      "bokey": 1,
      "version": 1,
      "offerTitle": 1
    },
    "batchSize": 1000,
    ...
    "shardVersion": [
      Timestamp(14944, 25276),
      ObjectId("591c6...8fcde")
    ]
  },
  "keysExamined": 2210852,
  "docsExamined": 232,
  "cursorExhausted": true,
  "keyUpdates": 0,
  "writeConflicts": 0,
  "numYield": 17272,
  "locks": {
    "Global": { "acquireCount": { "r": NumberLong("34546") } },
    "Database": { "acquireCount": { "r": NumberLong("17273") } },
    "Collection": { "acquireCount": { "r": NumberLong("17273") } }
  },
  "nreturned": 185,
  "responseLength": 119954,
  "protocol": "op_command",
  "millis": 4189,
  "execStats": {
    "stage": "PROJECTION",
    "nReturned": 185,
    "executionTimeMillisEstimate": 3941,
    "works": 2210853,
    "advanced": 185,
    ... 124 lines omitted ...
  },
  "ts": ISODate("2018-10-26T07:17:12.747Z"),
  "client": "10.135.128.219",
  "allUsers": [
    { "user": "__system", "db": "local" }
  ],
  "user": "__system@local"
}
Condense slow op documents
{
  "_id": ObjectId("5bd3090b68b5c4203f53ce7e"),
  "ts": ISODate("2018-10-26T12:31:07.752Z"),
  "adr": "host523.idealo.de",
  "lbl": "offerStoreDE",
  "rs": "offerStoreDE09",
  "db": "offerStore",
  "col": "offers",
  "op": "getmore",
  "fields": ["shopId", "opIds.$in", "offerStatus.$gt"],
  "sort": ["_id"],
  "nret": 500,
  "reslen": 94656,
  "millis": 5322,
  "user": "__system@local"
}
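The condensing step can be sketched roughly as follows. This is not the project's actual code: `queryShape` and `condense` are hypothetical names, the `meta` argument (address, label, replica set) is assumed to come from the profiler's configuration, and the `sort` field as well as the older “query” formats are left out for brevity. The idea is to keep only the queried field names plus operators (the query shape) and drop the concrete values and the execution statistics.

```javascript
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Flattens a filter into field paths with operators, e.g. "opIds.$in".
function queryShape(filter, prefix = "") {
  const fields = [];
  for (const [key, value] of Object.entries(filter)) {
    const path = prefix + key;
    if (isPlainObject(value) && Object.keys(value).length > 0) {
      // Operators ($in, $gt, ...) and sub-fields both extend the path.
      fields.push(...queryShape(value, path + "."));
    } else {
      // Concrete values (numbers, strings, arrays) are dropped.
      fields.push(path);
    }
  }
  return fields;
}

// Hypothetical condenser: maps a system.profile entry to a compact document.
function condense(slowOp, meta) {
  const [db, ...rest] = slowOp.ns.split(".");
  return {
    ts: slowOp.ts,
    adr: meta.adr,
    lbl: meta.lbl,
    rs: meta.rs,
    db,
    col: rest.join("."),
    op: slowOp.op,
    fields: queryShape((slowOp.query && slowOp.query.filter) || {}),
    nret: slowOp.nreturned,
    reslen: slowOp.responseLength,
    millis: slowOp.millis,
    user: slowOp.user,
  };
}

const example = {
  op: "query",
  ns: "offerStore.offer",
  query: { filter: { shopId: 292731, opIds: { $in: [29337, 5478] }, offerStatus: { $gt: 0 } } },
  nreturned: 185,
  responseLength: 119954,
  millis: 4189,
  user: "__system@local",
};
console.log(condense(example, { adr: "host523.idealo.de", lbl: "offerStoreDE", rs: "offerStoreDE09" }).fields);
```

Because only the shape is stored, two queries that differ merely in their values end up with identical `fields` and can be grouped together.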
Some numbers of the collector DB
● > 250 million slow ops stored over more than 5 years
● average doc size = 238 bytes
● uncompressed data size ca. 55 GB
● index size < 9 GB
● total storage size (snappy compression) < 12 GB
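As a quick plausibility check of these figures (a back-of-the-envelope sketch, assuming that "GB" on the slide means GiB, i.e. 2^30 bytes), the document count multiplied by the average document size should land near the uncompressed data size:

```javascript
// Estimated uncompressed data size in GiB from document count and average size.
function estimateDataSizeGiB(docCount, avgDocBytes) {
  return (docCount * avgDocBytes) / 2 ** 30;
}

// 250 million docs is the slide's lower bound ("> 250 million").
console.log(estimateDataSizeGiB(250e6, 238).toFixed(1)); // "55.4"

// Ratio of the slide's uncompressed size (ca. 55) to its storage size (< 12).
console.log((55 / 12).toFixed(1)); // "4.6"
```

Taken at face value, snappy shrinks the condensed documents by a factor of at least roughly 4.6.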
Configuration
● Collector
{
  "collector": {
    "hosts": ["myCollectorHost_member1:27017",
              "myCollectorHost_member2:27017",
              "myCollectorHost_member3:27017"],
    "db": "profiling",
    "collection": "slowops",
    "adminUser": "",
    "adminPw": ""
  },
  ...
Configuration
● Databases to be profiled
  "profiled": [
    { "label": "dbs foo",
      "hosts": ["someHost1:27017", "someHost2:27017", "someHost3:27017"],
      "ns": ["someDB.someCollection", "anotherDB.*"],
      "enabled": false
    },
    { "label": "dbs bar",
      "hosts": ["someMongoRouter1:27017", "someMongoRouter2:27017"],
      "ns": ["*.*"],
      "adminUser": "kay",
      "adminPw": "never.tell.it!:-)",
      "enabled": false,
      "slowMS": 500,
      "responseTimeoutInMs": 2000
    }
  ],
  ...
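The `ns` entries use `*` as a wildcard for the database and/or the collection part. A minimal sketch of how such patterns could be matched against a namespace is shown below; `matchesNamespace` is a hypothetical helper, and the exact matching rules of the application are an assumption here (a pattern is split at its first dot, and `*` matches any name).

```javascript
// Splits "db.collection" at the first dot (collection names may contain dots).
function splitNamespace(ns) {
  const i = ns.indexOf(".");
  return i < 0 ? [ns, ""] : [ns.slice(0, i), ns.slice(i + 1)];
}

// Hypothetical matcher: "*" stands for any database or any collection.
function matchesNamespace(pattern, ns) {
  const [patternDb, patternCol] = splitNamespace(pattern);
  const [db, col] = splitNamespace(ns);
  return (patternDb === "*" || patternDb === db) &&
         (patternCol === "*" || patternCol === col);
}

const patterns = ["someDB.someCollection", "anotherDB.*"];
console.log(patterns.some((p) => matchesNamespace(p, "anotherDB.orders")));   // true
console.log(patterns.some((p) => matchesNamespace(p, "someDB.otherColl")));   // false
console.log(matchesNamespace("*.*", "offerStore.offers"));                    // true
```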
How it looks
How it looks - part 2
Slow ops diagram
2018/10/30 10:04 = Count:95
db=offerStore
coll=offers
op=query
fields=[shopId,mCC.$gt]
Duration: avg:322 max:990 sum:31.682 ms
Slow ops data table
Slow ops search form
Further benefits
● a global collector shows how slow ops evolve over time
● aggregate the slow ops of the last minute, grouped by label
● send metrics such as count, durations and nReturned to Graphite
● build a Grafana dashboard
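// collection name as in the collector configuration; from/to are Date objects
db.slowops.aggregate([
  // keep only the slow ops of the requested time window
  { $match: { ts: { $gt: from, $lt: to } } },
  // one result document per label
  { $group: { _id: { label: "$lbl" },
              count: { $sum: 1 },
              sumMs: { $sum: "$millis" },
              maxMs: { $max: "$millis" },
              sumNret: { $sum: "$nret" },
              sumResplen: { $sum: "$reslen" }
            }
  }
])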
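One way the grouped result could be handed to Graphite is its plaintext protocol, where each line has the form `<metric path> <value> <timestamp in epoch seconds>`. The following is only a sketch: the function name `toGraphiteLines`, the metric prefix `mongodb.slowops` and the sample values are assumptions for illustration, not what the project necessarily does.

```javascript
// Turns the output of the $group stage into Graphite plaintext lines.
// prefix is a hypothetical metric namespace.
function toGraphiteLines(groups, epochSeconds, prefix = "mongodb.slowops") {
  const metrics = ["count", "sumMs", "maxMs", "sumNret", "sumResplen"];
  const lines = [];
  for (const group of groups) {
    // Metric paths are dot-separated, so replace anything unusual in the label.
    const label = String(group._id.label).replace(/[^A-Za-z0-9_-]/g, "_");
    for (const metric of metrics) {
      lines.push(`${prefix}.${label}.${metric} ${group[metric]} ${epochSeconds}`);
    }
  }
  return lines;
}

// Made-up aggregation result for one label.
const sample = [
  { _id: { label: "dbs foo" }, count: 3, sumMs: 1200, maxMs: 700, sumNret: 50, sumResplen: 4096 },
];
console.log(toGraphiteLines(sample, 1540893840).join("\n"));
```

Running the aggregation once per minute and sending these lines gives one data point per label and metric, which is what a Grafana panel needs for a historical view.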
Historical view per DBS
Further documentation
● Open source project, you are welcome to contribute:
https://github.com/idealo/mongodb-slow-operations-profiler
● Blog post 1 of 2:
https://medium.com/idealo-tech-blog/how-to-visually-spot-and-analyze-slow-mongodb-operations-d91ac819e0de
● Blog post 2 of 2:
https://medium.com/idealo-tech-blog/mongodb-slow-operations-analyzer-2-0-24da414fad13
We are hiring: jobs.idealo.de