Efficient CRUD Queries in MongoDB Tim Vaillancourt Sr. Technical Operations Architect Percona

Efficient CRUD Queries in MongoDB - Percona · Solr and Elasticsearch Mesos (Non-tech) Distributed Systems and Infrastructure Architecture. ... summaries, etc No .explain() support

Apr 05, 2020

Efficient CRUD Queries in MongoDB

Tim Vaillancourt, Sr. Technical Operations Architect

Percona

Agenda

● What is CRUD?
● CRUD-related MongoDB Features
● Troubleshooting
  ○ Explain
  ○ Database Profiler
  ○ Log File
● Read Preference
● Read and Write Concerns
● Deletion/Retention Strategies
● Monitoring

About Me

● Started at Percona in January 2016
● Experience
  ○ Web Publishing
    ■ Large-scale LAMP-based web infrastructures
  ○ Ecommerce
    ■ Large Inventory SaaS at Amazon / AbeBooks
  ○ Gaming
    ■ DevOps / NoSQL DBA at EA SPORTS
    ■ DBA at EA DICE

About Me

  ○ Technologies
    ■ MongoDB, Cassandra and Couchbase
    ■ MySQL
    ■ Redis and Memcached
    ■ RabbitMQ, Kafka and ActiveMQ
    ■ Solr and Elasticsearch
    ■ Mesos
    ■ (Non-tech) Distributed Systems and Infrastructure Architecture

Terminology

● What is CRUD?
  ○ Create
    ■ Operations that create entire documents, eg: .insert()
    ■ Relatively light operation with low cache/disk IO impact
  ○ Read
    ■ Operations that read documents, eg: .find(), .aggregate(), …
    ■ Generally the main source of slowness

Terminology

● What is CRUD?
  ○ Update
    ■ Operations that find and modify data, eg: .update()
    ■ Expensive: works like a .find(), then changes the matched data
  ○ Delete
    ■ Operations that remove documents, eg: .remove()

Terminology

● Document - a single MongoDB document (JSON/BSON)
● Collection - a collection of documents, similar to a “table”
● Database - a grouping of MongoDB collections
● Index - a BTree index applied to a MongoDB collection
● CRUD?
  ○ Create: operations that create documents, eg: .insert()
  ○ Read: operations that read documents, eg: .find(), .aggregate(), …
  ○ Update: operations that find and modify data, eg: .update()
  ○ Delete: operations that delete documents, eg: .remove()

Terminology

● What is efficient CRUD?
  ○ Uses as few server resources as possible
  ○ Scalable
● Metrics
  ○ keysExamined - # of index keys examined
  ○ docsExamined - # of documents (from cache or disk) examined
  ○ nreturned - # of documents returned to the client
● Lag
  ○ The delay before data written on the Primary can be read from a Secondary replica
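Lag can be estimated from replica set status. The sketch below is a minimal illustration, assuming member documents shaped like the `members` array returned by `rs.status()` (`name`, `stateStr`, `optimeDate`); the host names and timestamps are invented, and no server connection is involved.

```javascript
// Minimal sketch: per-Secondary replication lag in seconds, computed from
// member documents shaped like rs.status().members (sample data is invented).
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY in member list");
  const lag = {};
  for (const m of members) {
    if (m.stateStr !== "SECONDARY") continue;
    // How far the Secondary's last applied operation trails the Primary's.
    lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
  }
  return lag;
}

const rsMembers = [
  { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2017-08-17T10:00:10Z") },
  { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2017-08-17T10:00:08Z") },
  { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2017-08-17T10:00:10Z") },
];
console.log(JSON.stringify(replicationLagSeconds(rsMembers))); // {"db2:27017":2,"db3:27017":0}
```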

Storage Engines

● MMAPv1: Default in 1.x, 2.x and 3.0
  ○ Good read performance, poor write performance
  ○ No compression
  ○ Collection-level Locking(!)
● WiredTiger: Default in 3.2+, available since 3.0
  ○ Good read performance, good write performance
  ○ Compression supported
● RocksDB: Available in Percona Server for MongoDB (or patch)
  ○ Good (~95%) read performance, very good write performance
  ○ Compression supported

Isolation and Atomicity

● Isolation
  ○ Read Uncommitted
    ■ Any session can see a change, even before it is acknowledged
    ■ Essentially no isolation compared to RDBMSs
● Atomicity
  ○ Single-document Update
    ■ A reader will never see a partially-changed document
  ○ Multi-document Update
    ■ Not atomic as a whole, and there is no rollback if it fails part-way

Replication

● Asynchronous replication of a changelog (the oplog)
● Primary
  ○ Single Primary, chosen via election
  ○ Serves Read and Write operations
● Secondary
  ○ One or more; a replica set needs at least three members for reliable elections
  ○ Serves Read queries only
  ○ May take over as Primary in an election
● Consistency
  ○ Driver-level tunables for Read and Write consistency(!)

Insert Operations

● A Single-document Insert triggers:
  ○ 1 x append to the journal
  ○ 1 x document added to the data file(s)
  ○ MMAPv1 Considerations
    ■ MMAP will scan for a free slot for the insert
    ■ Shown as the “nmoved” metric
● Multi-document Insert Operations
  ○ Can be used to batch inserts
  ○ Improve insert performance in many cases
  ○ Can be ordered or unordered: { ordered: <true/false> } in the options document
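The effect of the `ordered` option can be shown with a simplified model. This is not MongoDB's implementation: it is a sketch, assuming a duplicate `_id` (error code 11000) is the only possible failure, of how an ordered batch stops at the first error while an unordered batch skips the failed document and continues.

```javascript
// Simplified model (not MongoDB's code) of how `ordered` affects a
// multi-document insert when a document hits a duplicate _id.
function simulateInsertMany(existingIds, docs, ordered) {
  const ids = new Set(existingIds);
  const result = { nInserted: 0, writeErrors: [] };
  for (let i = 0; i < docs.length; i++) {
    if (ids.has(docs[i]._id)) {
      result.writeErrors.push({ index: i, code: 11000 }); // duplicate key error
      if (ordered) break; // ordered: stop at the first error
      continue;           // unordered: skip this document, keep going
    }
    ids.add(docs[i]._id);
    result.nInserted++;
  }
  return result;
}

const insertBatch = [{ _id: 1 }, { _id: 2 }, { _id: 2 }, { _id: 3 }];
console.log(simulateInsertMany([], insertBatch, true).nInserted);  // 2
console.log(simulateInsertMany([], insertBatch, false).nInserted); // 3
```

With `ordered: true` the document with `_id: 3` is never attempted; with `ordered: false` only the duplicate is lost.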

Insert Operations

● Accepts a Write Concern
● RocksDB or WiredTiger recommended for high write volumes

Insert Operations

Example:
db.users.insert([
  {
    "username": "tim",
    "password": 123456,
    "createDate": new Date()
  },
  // {...} further documents
], {
  "writeConcern": { w: "majority" },
  ordered: true
})

Read Operations

● .find()
  ○ Returns cursor of document(s) matching a set of conditions
  ○ Option to specify which fields to return, default: full doc!
● .aggregate()
  ○ Powerful framework for data aggregations, summaries, etc
  ○ No .explain() support
  ○ Scans entire collection unless using a $match as first stage
    ■ db.<coll>.find(<$match conditions>).explain() to explain!
● .mapReduce()
  ○ Runs in JavaScript and offers little insight

Read Operations

● Read Concern
  ○ Important for strict data integrity (often combined with Write Concerns)
    ■ Replica Set failovers can cause some data “rollback”
      ● “Rolled-back” data is written to BSON files in a rollback/ directory under your dbPath
  ○ Tunable read consistency
    ■ “local” = default, returns the local node's data only
    ■ “majority” = returns only data acknowledged by a majority of members
    ■ “linearizable” = like “majority”, but also reflects every majority-acknowledged write completed before the read (Primary only)
  ○ RC can be changed per query

Read Operations

● Example:

test1:PRIMARY> db.getMongo().setReadPref('primary')
test1:PRIMARY> db.pages.find({title : "Geography of Guinea-Bissau"}, {_id: 1, title: 1}).pretty()
{
  "_id" : "0c25a313481757720f5e9e46b4cffddd08a13fef",
  "title" : "Geography of Guinea-Bissau"
}

Update Operations

● Operates much like a .find() with special re-insertion
● Cannot run .explain() on .update()
  ○ Convert the .update() match to a .find(<conds>).explain()
● Can change one or more documents
  ○ { multi: true } option required for multi-document updates
  ○ Avoid multi-document updates if possible
● MMAPv1 Considerations
  ○ In many cases MMAP can do in-place updates (no replace)
● Provides many update operators to change the data
  ○ Examples: $set, $inc, $mul, $min, $max
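To make the five operators concrete, here is a simplified model of their effect on one document. It is a sketch for illustration only, assuming top-level numeric fields; the real server also handles dotted paths, arrays and type rules, and rejects an update where two operators touch the same field.

```javascript
// Simplified model of five update operators on top-level fields only.
// Returns a new object; the input document is not modified.
function applyUpdate(doc, update) {
  const out = { ...doc };
  for (const [op, fields] of Object.entries(update)) {
    for (const [k, v] of Object.entries(fields)) {
      switch (op) {
        case "$set": out[k] = v; break;
        case "$inc": out[k] = (out[k] ?? 0) + v; break; // missing field starts at 0
        case "$mul": out[k] = (out[k] ?? 0) * v; break; // missing field becomes 0
        case "$min": if (out[k] === undefined || v < out[k]) out[k] = v; break;
        case "$max": if (out[k] === undefined || v > out[k]) out[k] = v; break;
        default: throw new Error("operator not modelled: " + op);
      }
    }
  }
  return out;
}

const pageDoc = { title: "Geography of Guinea-Bissau", views: 10, score: 5, best: 9 };
const updatedPage = applyUpdate(pageDoc, {
  $set: { title: "Geography of Guinea-Bisso" },
  $inc: { views: 1 },
  $mul: { score: 2 },
  $max: { best: 7 }, // 7 < 9, so best stays 9
});
console.log(JSON.stringify(updatedPage));
// {"title":"Geography of Guinea-Bisso","views":11,"score":10,"best":9}
```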

Update Operations

● Efficient Updates
  ○ Use indexes on the match condition (crucial)
  ○ Avoid multi-document updates in your design

Update Operations

● Example:

test1:PRIMARY> db.pages.update({title : "Geography of Guinea-Bissau"}, {
  $set: {title: "Geography of Guinea-Bisso"}
})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Delete Operations

● Similar in execution impact to an .update()
● Accepts an array of delete conditions (batching)
● Using a TTL Index to Delete
  ○ Deletes documents based on MongoDB ISODate() (Date) fields
  ○ Batches deletes by minute (by default)
  ○ Doubles as an index for the date field
● Batch delete by field
  ○ Mark documents with a field: {deleted:true}
  ○ Run a controlled batch job to .remove({deleted:true})
  ○ Requires some scripting / cronjob
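The TTL rule can be sketched as a small predicate. This is a minimal illustration, assuming a TTL index created in the shell with something like `db.sessions.createIndex({ createDate: 1 }, { expireAfterSeconds: 3600 })` (the `sessions` collection and the one-hour value are hypothetical): a document becomes eligible once its date plus `expireAfterSeconds` has passed, while the background monitor that actually deletes it runs only once a minute by default.

```javascript
// Sketch of TTL eligibility: expired once now >= date + expireAfterSeconds.
// The real TTL monitor deletes eligible documents on its own schedule
// (every 60 seconds by default), so removal can lag behind eligibility.
// Arrays of dates are not handled in this sketch.
function isTtlExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  // A non-Date value (e.g. a date stored as a string) never expires.
  if (!(value instanceof Date)) return false;
  return now.getTime() >= value.getTime() + expireAfterSeconds * 1000;
}

const checkTime = new Date("2017-08-17T11:00:00Z");
const sessionDocs = [
  { _id: 1, createDate: new Date("2017-08-17T09:30:00Z") }, // 90 minutes old
  { _id: 2, createDate: new Date("2017-08-17T10:30:00Z") }, // 30 minutes old
  { _id: 3, createDate: "2017-08-17 09:00:00" },            // string: ignored by TTL
];
const expiredIds = sessionDocs
  .filter((d) => isTtlExpired(d, "createDate", 3600, checkTime))
  .map((d) => d._id);
console.log(JSON.stringify(expiredIds)); // [1]
```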

Delete Operations

● RocksDB Considerations
  ○ Compaction may struggle with very high delete volume
  ○ “Tombstones” are written in place of the document
    ■ Read operations must still skip over them until compaction removes them

Delete Operations

● Example:

test1:PRIMARY> db.pages.remove({title : "Geography of Guinea-Bisso"})
WriteResult({ "nRemoved" : 1 })

Indexing

● MongoDB supports BTree, text and geo indexes
● Default behaviour
  ○ Blocks other operations on the database until indexing completes
● {background:true}
  ○ Runs indexing in the background, avoiding pauses
  ○ Hard to monitor and troubleshoot progress
  ○ Unpredictable performance impact
● Avoid drivers that auto-create indexes. Use real performance data
● Too many indexes hurt performance
● Indexes have a forward or backward direction

Indexing

● Compound Indexes
  ○ Several fields supported
  ○ Fields can be in forward or backward direction
    ■ Consider any .sort() query options and match the sort direction!
  ○ Read Left -> Right
    ■ Index can be partially read (a left-most prefix)
    ■ Left-most fields do not need to be duplicated, example:
      You have an index with fields {status:1, date:1} and a 2nd with {status:1}. The {status:1} index is redundant, because the compound index's left-most prefix already serves queries on status!
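A check for this kind of duplication can be scripted against the output of `db.<coll>.getIndexes()`. The sketch below is a hypothetical helper, not a MongoDB tool: it flags any index whose key pattern is a strict left-most prefix of another index with the same field order and directions. It ignores index options (unique, sparse, partial, TTL), which can make an apparently redundant index necessary, so its output is a list of candidates to review.

```javascript
// Hypothetical helper: list indexes whose key pattern is a strict left-most
// prefix (same fields, order and directions) of another index's key pattern.
function redundantPrefixIndexes(indexes) {
  const keysOf = (ix) => Object.entries(ix.key);
  const isStrictPrefix = (a, b) =>
    a.length < b.length && a.every(([field, dir], i) => b[i][0] === field && b[i][1] === dir);
  return indexes
    .filter((a) => indexes.some((b) => b !== a && isStrictPrefix(keysOf(a), keysOf(b))))
    .map((ix) => ix.name);
}

// Sample input shaped like db.pages.getIndexes() output (invented).
const pageIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_" },
  { v: 1, key: { status: 1, date: 1 }, name: "status_1_date_1" },
  { v: 1, key: { status: 1 }, name: "status_1" },
];
console.log(JSON.stringify(redundantPrefixIndexes(pageIndexes))); // ["status_1"]
```

Requiring identical directions is deliberately conservative: it may miss some redundant single-field indexes, but it will not flag an index that serves a different sort order.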

Indexing

● Get Indexes Example:

test1:PRIMARY> db.pages.getIndexes()
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "wikipedia.pages"
  }
]

Operation Profiling

● Writes slow database operations to a dedicated MongoDB collection for analysis
  ○ Capped Collection: “system.profile” in each database, default 1MB
  ○ The collection is capped, ie: profile data doesn’t last forever
  ○ Enable operationProfiling in “slowOp” mode
    ■ Start with a very high threshold and decrease it in steps
    ■ Usually 50-100ms is a good threshold
    ■ Enable in mongod.conf:

operationProfiling:
  slowOpThresholdMs: 100
  mode: slowOp

Operation Profiling

● op/ns/query: type, namespace and query of a profile entry
● keysExamined: # of index keys examined
● docsExamined: # of docs examined to achieve the result
● writeConflicts: # of write conflict exceptions (WCE) encountered during an update
● numYields: # of times the operation yielded for others
● locks: detailed lock statistics
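Once profiling is on, the shell query `db.system.profile.find({ millis: { $gte: 100 } }).sort({ millis: -1 }).limit(5)` lists the slowest recorded operations. The same ranking is sketched below in plain JavaScript over invented entries shaped like `system.profile` documents (`op`, `ns`, `millis`, `keysExamined`, `docsExamined`), so it runs without a server.

```javascript
// Sketch: rank profiler entries by execution time and summarise the
// fields described above. The sample entries are invented.
function slowestOps(entries, thresholdMs, limit) {
  return entries
    .filter((e) => e.millis >= thresholdMs)
    .sort((a, b) => b.millis - a.millis) // slowest first
    .slice(0, limit)
    .map((e) => `${e.op} ${e.ns} ${e.millis}ms keys=${e.keysExamined} docs=${e.docsExamined}`);
}

const profileEntries = [
  { op: "query", ns: "wikipedia.pages", millis: 250, keysExamined: 0, docsExamined: 50000 },
  { op: "update", ns: "wikipedia.pages", millis: 120, keysExamined: 1, docsExamined: 1 },
  { op: "query", ns: "wikipedia.pages", millis: 40, keysExamined: 1, docsExamined: 1 },
];
for (const line of slowestOps(profileEntries, 100, 5)) console.log(line);
// query wikipedia.pages 250ms keys=0 docs=50000
// update wikipedia.pages 120ms keys=1 docs=1
```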

.explain() and Profiler

● Figures: .explain() example and Profiler example

Log File: Slow Queries, etc

● Interesting details are logged to the mongod/mongos log files
  ○ Slow queries
  ○ Storage engine details (sometimes)
  ○ Index operations
  ○ Chunk moves
  ○ Connections

Query Efficiency

● Index Efficiency: keysExamined / nreturned
● Document Efficiency: docsExamined / nreturned
● End goal: examine only as many documents as you return (a ratio of 1)
● Example: a query that examines 10 documents to return 1 has a Document Efficiency ratio of 10 / 1 = 10
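The two ratios can be computed directly from a profiler entry. This is a minimal sketch, assuming the profiler's field names (`keysExamined`, `docsExamined`, `nreturned`); `explain("executionStats")` reports the same values as `totalKeysExamined`, `totalDocsExamined` and `nReturned`. A query that returns nothing is treated as returning one document to avoid dividing by zero, which is an arbitrary choice.

```javascript
// Sketch: Index and Document Efficiency ratios (1 is ideal, higher is worse).
function queryEfficiency({ keysExamined, docsExamined, nreturned }) {
  // Arbitrary guard: treat an empty result as 1 so the ratio stays finite.
  const n = Math.max(nreturned, 1);
  return { index: keysExamined / n, document: docsExamined / n };
}

// 10 documents examined without an index to return 1 document:
console.log(JSON.stringify(queryEfficiency({ keysExamined: 0, docsExamined: 10, nreturned: 1 })));
// {"index":0,"document":10}
```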

Schema Design: Data Types

● Strings
  ○ Only use strings if required
  ○ Do not store numbers as strings!
    ■ Look for values like {field:“123456”} that should be {field:123456}
    ■ “12345678” converted to an integer uses 25% less space
    ■ Range queries on real numbers are more efficient
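The space difference follows from the BSON encoding. The sketch below is an illustrative calculator for a single top-level element, covering only a few types from the BSON specification; it assumes a field named `field` and that the number is stored as a 64-bit double, the mongo shell's default for numeric literals (a 32-bit `NumberInt` would be smaller still). Under those assumptions it reproduces a 25% saving.

```javascript
// Sketch: bytes used by one top-level BSON element, following the BSON spec:
// 1 type byte + field name + NUL terminator, then the encoded value.
function bsonElementSize(name, value) {
  const header = 1 + Buffer.byteLength(name, "utf8") + 1;
  if (typeof value === "string") {
    // int32 length prefix + UTF-8 bytes + trailing NUL
    return header + 4 + Buffer.byteLength(value, "utf8") + 1;
  }
  if (typeof value === "number") return header + 8;  // 64-bit double
  if (typeof value === "boolean") return header + 1; // single byte
  if (value instanceof Date) return header + 8;      // int64 ms since epoch
  throw new Error("type not covered by this sketch");
}

const asString = bsonElementSize("field", "12345678"); // 7 + 13 = 20 bytes
const asNumber = bsonElementSize("field", 12345678);   // 7 + 8 = 15 bytes
console.log(asString, asNumber, `${(1 - asNumber / asString) * 100}% smaller`);
// 20 15 25% smaller
```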

Schema Design: Data Types

● Strings
  ■ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
  var newItemId = parseInt(x.itemId, 10);
  db.items.update(
    { _id: x._id },
    { $set: { itemId: newItemId } }
  );
});

  ○ Do not store dates as strings!
    ■ Stored as a Date instead of the string "2017-08-17 10:00:04 CEST", the field uses 52.5% less space!
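The conversion script above issues one update per document. A batched variant is sketched below, assuming the shell's `bulkWrite()` operation format (`{ updateOne: { filter, update } }`); the helper name is hypothetical, and it only builds the operation list, so it runs without a server. It skips values that are not plain integer strings and matches on the old value, so a document changed in the meantime is left alone.

```javascript
// Hypothetical helper: build bulkWrite() operations converting a numeric
// string field to a number. Input docs would come from e.g. db.items.find().
// Note: digit strings beyond 2^53 would lose precision in Number().
function buildNumericConversionOps(docs, field) {
  const ops = [];
  for (const doc of docs) {
    const value = doc[field];
    // Only convert plain integer strings; leave numbers and other text alone.
    if (typeof value !== "string" || !/^-?\d+$/.test(value)) continue;
    ops.push({
      updateOne: {
        // Match the old value too, so a concurrently changed doc is skipped.
        filter: { _id: doc._id, [field]: value },
        update: { $set: { [field]: Number(value) } },
      },
    });
  }
  return ops;
}

const itemDocs = [
  { _id: 1, itemId: "123456" },
  { _id: 2, itemId: 42 },
  { _id: 3, itemId: "abc" },
];
const conversionOps = buildNumericConversionOps(itemDocs, "itemId");
console.log(JSON.stringify(conversionOps));
// [{"updateOne":{"filter":{"_id":1,"itemId":"123456"},"update":{"$set":{"itemId":123456}}}}]
```

In the mongo shell, such a list could then be sent in one call with `db.items.bulkWrite(conversionOps)`.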

Schema Design: Data Types

● Strings
  ○ Do not store booleans as strings!
    ■ “true” -> true = 47% less space
● DBRef
  ○ DBRefs provide pointers to another document
  ○ DBRefs can be cross-collection

Schema Design: Data Locality

● MongoDB is optimised for single-document operations
● Single Document / Centralised
  ○ Great cache/disk-footprint efficiency
  ○ Centralised schemas may create a hotspot for write locking
● Multi Document / Decentralised
  ○ MongoDB rarely stores data sequentially on disk
  ○ Multi-document operations are less efficient
  ○ Less potential for hotspots/write locking
  ○ Increased overhead due to fan-out of updates

Schema Design: Data Locality

● Multi Document / Decentralised
  ○ Suited for a background worker model
    ■ Example: Social Media status update, graph relationships, etc

Batching Operations

● Batching Inserts/Updates
  ○ Requires fewer network round trips
  ○ Allows the server to do some internal batching
  ○ Each batched operation takes longer to complete
    ■ Suited for queue worker scenarios batching many changes
    ■ Traditional user-facing database traffic should aim to operate on a single (or few) document(s)
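A queue worker typically groups pending changes into fixed-size batches before sending them. This is a minimal sketch, assuming each batch would then go to a single `insertMany()` or `bulkWrite()` call; the helper name and the batch size of 1000 are arbitrary examples, not recommendations from the talk.

```javascript
// Sketch: split pending changes into fixed-size batches, one server
// command per batch instead of one per document.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const pending = Array.from({ length: 2500 }, (_, i) => ({ _id: i }));
const batches = toBatches(pending, 1000);
console.log(JSON.stringify(batches.map((b) => b.length))); // [1000,1000,500]
```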

Batching Operations

● Thread-per-connection model
  ○ 1 x DB operation = 1 x CPU core only
  ○ Executing Parallel Reads
    ■ Large batch queries benefit from several parallel sessions
    ■ Break the query range or conditions into several client->server threads
    ■ Not recommended for Primary nodes or Secondaries with heavy reads
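Breaking one large query into parallel pieces usually means splitting a key range. The sketch below assumes a numeric, roughly evenly distributed field (the `itemId` name is hypothetical) and only builds the filters; each one would then be run as a separate `find()` from its own client thread or session.

```javascript
// Sketch: split the half-open range [min, max) on a numeric field into
// `parts` contiguous sub-ranges, one filter per parallel session.
function splitRange(field, min, max, parts) {
  if (!Number.isInteger(parts) || parts < 1) throw new Error("parts must be >= 1");
  const size = Math.ceil((max - min) / parts);
  const filters = [];
  for (let lo = min; lo < max; lo += size) {
    filters.push({ [field]: { $gte: lo, $lt: Math.min(lo + size, max) } });
  }
  return filters;
}

for (const f of splitRange("itemId", 0, 1000, 4)) console.log(JSON.stringify(f));
// {"itemId":{"$gte":0,"$lt":250}}
// {"itemId":{"$gte":250,"$lt":500}}
// {"itemId":{"$gte":500,"$lt":750}}
// {"itemId":{"$gte":750,"$lt":1000}}
```

Skewed data would need split points taken from the actual distribution instead of equal widths.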

Read Preference / Scaling

● Controls the node type (not consistency) for a session:
  ○ “primary” - Only read from the Primary
  ○ “primaryPreferred” - Read from the Primary unless there is none
  ○ “secondary” - Only read from Secondaries
  ○ “secondaryPreferred” - Read from a Secondary unless there are none
  ○ “nearest” - Read from the lowest-latency member, Primary or Secondary
● Using a secondary-focussed Read Preference provides read scalability
  ○ Adding more secondaries provides more read capacity!
  ○ 2/3rds of a replset is often sitting idle!
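The member selection behind each mode can be sketched as a small function. This is a simplified model, not driver code: real drivers also apply tag sets, `maxStalenessSeconds` and a latency window around the nearest member, all ignored here, and the member list is invented.

```javascript
// Simplified model of which members each read preference mode may use.
function eligibleMembers(mode, members) {
  const primary = members.filter((m) => m.state === "PRIMARY");
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary.length ? primary : secondaries;
    case "secondary": return secondaries;
    case "secondaryPreferred": return secondaries.length ? secondaries : primary;
    case "nearest": {
      const all = primary.concat(secondaries);
      const best = Math.min(...all.map((m) => m.pingMs));
      return all.filter((m) => m.pingMs === best); // no latency window here
    }
    default: throw new Error("unknown read preference " + mode);
  }
}

const replicaMembers = [
  { host: "db1:27017", state: "PRIMARY", pingMs: 5 },
  { host: "db2:27017", state: "SECONDARY", pingMs: 2 },
  { host: "db3:27017", state: "SECONDARY", pingMs: 9 },
];
const hosts = (mode, ms) => eligibleMembers(mode, ms).map((m) => m.host).join(",");
console.log(hosts("secondaryPreferred", replicaMembers)); // db2:27017,db3:27017
console.log(hosts("nearest", replicaMembers));            // db2:27017
// With the Primary down, "primaryPreferred" falls back to the Secondaries.
const noPrimary = replicaMembers.filter((m) => m.state !== "PRIMARY");
console.log(hosts("primaryPreferred", noPrimary));        // db2:27017,db3:27017
```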

Query Antipatterns

● No list of fields specified in .find()
  ○ MongoDB returns entire documents unless fields are specified
  ○ Only return the fields required for an application operation!
  ○ Covered-index operations require only the index fields to be specified
● Using $where operators
  ○ This executes JavaScript with a global lock
● Many $and or $or conditions
  ○ MongoDB doesn’t handle large lists of $and or $or efficiently
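For the common case of many equality conditions on one field, the MongoDB documentation recommends `$in` over `$or`. The helper below is a hypothetical sketch of that rewrite: it only touches an `$or` whose branches are all plain equality matches on the same (non-operator) field, and returns any other query unchanged.

```javascript
// Hypothetical helper: rewrite {$or: [{f: a}, {f: b}, ...]} as {f: {$in: [a, b, ...]}}
// when every branch is a single plain-equality match on the same field.
function orToIn(query) {
  const branches = query.$or;
  if (!Array.isArray(branches) || branches.length === 0) return query;
  const field = Object.keys(branches[0])[0];
  const rewritable = branches.every((b) => {
    const keys = Object.keys(b);
    const v = b[field];
    // One key, same field, not an operator, and a non-object value.
    return keys.length === 1 && keys[0] === field && !field.startsWith("$") &&
      (v === null || typeof v !== "object");
  });
  const { $or, ...rest } = query;
  if (!rewritable || field in rest) return query; // avoid clobbering an existing condition
  return { ...rest, [field]: { $in: branches.map((b) => b[field]) } };
}

const before = { status: "A", $or: [{ type: "x" }, { type: "y" }, { type: "z" }] };
console.log(JSON.stringify(orToIn(before)));
// {"status":"A","type":{"$in":["x","y","z"]}}
```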

Monitoring MongoDB

● Methodology
  ○ Monitor Frequently
    ■ Monitor every N seconds (not minutes!)
    ■ Problems can begin/end in seconds
  ○ Iterative / Responsive
    ■ “What graph would have told us the problem faster?”
  ○ Correlate Database and Operating System

Monitoring MongoDB

● Database
  ○ Operation counters
  ○ Cache Traffic and Capacity
  ○ Checkpoint / Compaction Performance
  ○ Concurrency Tickets (WiredTiger and RocksDB)
  ○ Document and Index scanning
  ○ Various engine-specific details
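Operation counters such as `db.serverStatus().opcounters` are cumulative, so monitoring graphs plot the difference between two samples divided by the interval. A minimal sketch of that calculation, with invented sample values taken 10 seconds apart:

```javascript
// Sketch: per-second rates from two samples of cumulative counters shaped
// like db.serverStatus().opcounters (sample values are invented).
function counterRates(prev, curr, elapsedSeconds) {
  if (!(elapsedSeconds > 0)) throw new Error("elapsedSeconds must be positive");
  const rates = {};
  for (const key of Object.keys(curr)) {
    const delta = curr[key] - (prev[key] ?? 0);
    // A negative delta means the counter was reset (e.g. mongod restarted).
    rates[key] = delta >= 0 ? delta / elapsedSeconds : null;
  }
  return rates;
}

const t0 = { insert: 1000, query: 5000, update: 300, delete: 10, getmore: 50, command: 9000 };
const t1 = { insert: 1050, query: 5600, update: 330, delete: 10, getmore: 55, command: 9100 };
console.log(JSON.stringify(counterRates(t0, t1, 10)));
// {"insert":5,"query":60,"update":3,"delete":0,"getmore":0.5,"command":10}
```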

Monitoring MongoDB

● Operating System
  ○ CPU
  ○ Disk
    ■ Bandwidth / Util
    ■ Average Wait Time
  ○ Memory
  ○ Network

Monitoring MongoDB: Percona PMM

● Open-source database monitoring from Percona!
● Based on open-source technologies
● Simple deployment
● Graphs in this demo are from PMM
● 800+ metrics per ping
● Includes Linux OS metrics

Monitoring MongoDB: Percona PMM QAN

● QAN (“Query Analytics”) for MongoDB!
  ○ Query data using the MongoDB profiler!
  ○ Still in Beta

Testing Performance and Capacity

General
● Database
  ○ Restore Production backups
  ○ Use mongoreplay or Parse/flashback tools to capture + replay real database traffic
  ○ Use PMM or other monitoring to analyse changes
● Full stack
  ○ Try to emulate the real user traffic
  ○ Add micro-pauses to simulate reality
  ○ Cloud-based providers are great for running load generation

Themes

● Perform as few client->server operations as possible (batch)
● For reads or updates:
  ○ Consider if a query will benefit from an index
  ○ Only request/set the data fields required
● Use the log file and profiler to find slow performers
● Only add indexes if they’re required
● Store data in the smallest data type possible
● Use Secondaries to scale read traffic
● Use Read/Write Concerns for strong integrity
● Review performance data (profiler, monitoring, etc) on a schedule

Questions?

Thanks for joining! Be sure to check out the Percona Blog for more technical blogs and topics!