Lecture 3: Document Databases – MongoDB – Aggregation, Map-Reduce, and Distributed Operation
Databases (3): NoSQL & Deductive Databases
Martin Homola, Ján Kľuka, Alexander Šimko, Jozef Šiška
Department of Applied Informatics
Faculty of Mathematics, Physics and Informatics
Comenius University in Bratislava
7 Oct 2021
Aggregation
Aggregation in MongoDB
- So far, we have only done CRUD operations
- MongoDB can also perform aggregation operations:
  - Single-purpose: counting and collecting distinct values
  - Map-Reduce processing
  - Aggregation pipelines
- Data and work in aggregation operations can be distributed over many nodes
- This allows processing of “big data”: data sets too large to fit into or be processed by one machine
Simple Aggregation
- Counting:
    db.telefony.countDocuments( { "casti.cislo": { $gte: 190000 } } )
- Distinct values selection:
    db.telefony.distinct( "casti.predvolba",
        { "casti.cislo": { $gte: 190000 } } )
- countDocuments is actually computed using the much more powerful aggregation pipelines, and the result of distinct can be expressed with them as well
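To make the relation to pipelines concrete, here is a sketch of two pipelines that should give the same information as the calls above. The counting pipeline follows the $match + $group form that the MongoDB manual documents for countDocuments. The distinct pipeline is only an equivalent formulation, and it assumes that "casti" is an array of subdocuments. The variable names are made up for this sketch.

```javascript
// Same filter as in the countDocuments() and distinct() calls above.
const filter = { "casti.cislo": { $gte: 190000 } };

// Counting: keep the matching documents, put them all into one group, count them.
const countPipeline = [
  { $match: filter },
  { $group: { _id: null, n: { $sum: 1 } } },
];

// Distinct values: one document per array element, then one group per value.
// Assumption: "casti" is an array of subdocuments with a "predvolba" field.
const distinctPipeline = [
  { $match: filter },
  { $unwind: "$casti" },
  { $group: { _id: "$casti.predvolba" } },
];

// In mongosh (not executed here):
//   db.telefony.aggregate(countPipeline)     // at most one document: { _id: null, n: <count> }
//   db.telefony.aggregate(distinctPipeline)  // one document per distinct value, stored in _id
console.log(JSON.stringify(countPipeline));
console.log(JSON.stringify(distinctPipeline));
```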
Map-Reduce Aggregation
- Map-Reduce is a quite general framework for mass data processing
- A collection is processed in two main stages (map and reduce) with a grouping step in between:
  1. A map JavaScript function takes each document and emit()s zero or more key-value pairs.
  2. Mongo groups the emitted pairs by key.
  3. All values for one key are sent to a reduce JavaScript function to produce a single value.
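The three steps above can be imitated in plain JavaScript on an in-memory array. This is only a sketch: runMapReduce is a hypothetical helper, not a MongoDB API, and the sample documents are made up. In MongoDB the map function reads the document from `this` and calls a global emit(); here both are passed explicitly to keep the sketch self-contained.

```javascript
// In-memory stand-in for a collection; the documents are made up.
const towns = [
  { name: "A", region: "North" },
  { name: "B", region: "South" },
  { name: "C", region: "North" },
];

// Hypothetical driver imitating the three steps; not a MongoDB API.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  // Step 2 happens inside emit: every emitted value is filed under its key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Step 1: call map on each document.
  for (const doc of docs) map(doc, emit);
  // Step 3: reduce the values of each key to a single value.
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Count the towns in each region.
const townsPerRegion = runMapReduce(
  towns,
  (doc, emit) => emit(doc.region, 1),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(townsPerRegion); // { North: 2, South: 1 }
```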
Map-Reduce Example
Reduce Requirements and Additional Tools
- The reduce function must:
  - produce a value of the same type as all input values;
  - be associative: reduce(k, [u, reduce(k, [v, w])]) = reduce(k, [u, v, w]);
  - be commutative: reduce(k, [u, v]) = reduce(k, [v, u]);
  - be idempotent: reduce(k, [reduce(k, vs)]) = reduce(k, vs).
- MongoDB also allows:
  - selecting documents by a query and a limit;
  - pre-sorting;
  - post-processing by a finalize JavaScript function;
  - outputting the results as a new collection.
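These laws matter because reduce may be applied to partial results of earlier reduce calls for the same key. The following sketch checks the three laws on sample values for a sum, which satisfies them, and for a naive average, which does not. checkReduce is a hypothetical helper written for this illustration, and testing one triple of values is a spot check, not a proof.

```javascript
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);

// Naive average: tempting, but not a valid reduce function.
const avgReduce = (key, values) =>
  values.reduce((a, b) => a + b, 0) / values.length;

// Spot-checks the three laws from the list above on sample values u, v, w.
function checkReduce(reduce, [u, v, w]) {
  const k = "any-key";
  return {
    associative: reduce(k, [u, reduce(k, [v, w])]) === reduce(k, [u, v, w]),
    commutative: reduce(k, [u, v]) === reduce(k, [v, u]),
    idempotent: reduce(k, [reduce(k, [u, v, w])]) === reduce(k, [u, v, w]),
  };
}

const sumCheck = checkReduce(sumReduce, [1, 2, 6]);
const avgCheck = checkReduce(avgReduce, [1, 2, 6]);
console.log(sumCheck); // all three laws hold
console.log(avgCheck); // associativity fails: avg(1, avg(2, 6)) = 2.5, but avg(1, 2, 6) = 3
```

An average therefore has to be carried through reduce as a (sum, count) pair and divided only at the end, for example in the finalize function.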
mapReduce() Example – Average
Let’s compute for each region the average number of inhabitants of its towns:
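The code of this slide is not preserved in this text, so what follows is only a sketch of one way to do it, not the lecture's original code. The field names region and inhabitants and the sample documents are assumptions. The three functions are written in the style MongoDB expects (map reads `this` and calls a global emit), and a small local driver stands in for the server so that the sketch runs on its own.

```javascript
// Hypothetical town documents; the field names are assumptions.
const towns = [
  { name: "A", region: "North", inhabitants: 1000 },
  { name: "B", region: "North", inhabitants: 3000 },
  { name: "C", region: "South", inhabitants: 500 },
];

// map: emit a partial aggregate (sum, count) instead of a bare number.
function mapTown() {
  emit(this.region, { sum: this.inhabitants, count: 1 });
}

// reduce: returns a value of the same shape as its inputs,
// so it can safely be applied again to its own outputs.
function reduceTowns(key, values) {
  const total = { sum: 0, count: 0 };
  for (const v of values) {
    total.sum += v.sum;
    total.count += v.count;
  }
  return total;
}

// finalize: turn the reduced (sum, count) pair into the average.
function finalizeAvg(key, reduced) {
  return reduced.sum / reduced.count;
}

// Local stand-in for the server: provides emit(), groups, reduces, finalizes.
const emitted = new Map();
globalThis.emit = (key, value) => {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
};
for (const town of towns) mapTown.call(town);

const averages = {};
for (const [key, values] of emitted) {
  averages[key] = finalizeAvg(key, reduceTowns(key, values));
}
console.log(averages); // { North: 2000, South: 500 }

// In mongosh, assuming a collection with these fields (not executed here):
//   db.sidla.mapReduce(mapTown, reduceTowns,
//                      { out: { inline: 1 }, finalize: finalizeAvg })
```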
Aggregation Pipeline
- aggregate() processes a collection using a multi-stage pipeline
- Similar to Unix shell pipelines
- Less flexible than Map-Reduce, but without slow JavaScript
- Declarative pipeline description ⇒ can be optimized by the server
- Can also process change streams
Aggregation Pipeline example
aggregate() Example – Match, Group, and Sort
Let’s sort regions by the average number of inhabitants of their towns:
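The pipeline from this slide is not preserved in this text. The sketch below shows what a match, group, and sort pipeline for this task could look like, followed by the same computation in plain JavaScript on made-up data so that the result can be checked without a server. The field names region and inhabitants and the $match condition are assumptions.

```javascript
// A possible pipeline; stage contents are assumptions, not the original slide.
const pipeline = [
  { $match: { inhabitants: { $gt: 0 } } },  // hypothetical filter
  { $group: { _id: "$region", avgInhabitants: { $avg: "$inhabitants" } } },
  { $sort: { avgInhabitants: -1 } },        // largest average first
];
// In mongosh (not executed here): db.sidla.aggregate(pipeline)

// The same three stages written out by hand on in-memory sample data.
const towns = [
  { name: "A", region: "North", inhabitants: 1000 },
  { name: "B", region: "North", inhabitants: 3000 },
  { name: "C", region: "South", inhabitants: 500 },
  { name: "D", region: "East", inhabitants: 4000 },
  { name: "E", region: "South", inhabitants: 0 },
];

// $match
const matched = towns.filter((t) => t.inhabitants > 0);

// $group with $avg: collect sum and count per region
const groups = new Map();
for (const t of matched) {
  const g = groups.get(t.region) ?? { sum: 0, count: 0 };
  g.sum += t.inhabitants;
  g.count += 1;
  groups.set(t.region, g);
}

// $sort, descending by the computed average
const sorted = [...groups]
  .map(([region, g]) => ({ _id: region, avgInhabitants: g.sum / g.count }))
  .sort((a, b) => b.avgInhabitants - a.avgInhabitants);

console.log(sorted);
// East (4000), then North (2000), then South (500)
```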
Joins with $lookup
$lookup adds to every document an array of all documents from another collection with a matching value of some field (≈ left outer join)
db.sidla.aggregate( [
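The join semantics of $lookup described above can be sketched in plain JavaScript over in-memory arrays. This is not the pipeline from the slide: the collections, field names, and the lookup helper are all made up, and the helper only handles scalar fields, whereas the real stage also covers arrays and missing values.

```javascript
// Hypothetical in-memory collections; names and fields are made up.
const towns = [
  { name: "A", regionCode: "N" },
  { name: "B", regionCode: "S" },
  { name: "C", regionCode: "X" }, // no matching region
];
const regions = [
  { code: "N", title: "North" },
  { code: "S", title: "South" },
];

// Plain-JavaScript imitation, for scalar fields only, of the stage
//   { $lookup: { from: "regions", localField: "regionCode",
//                foreignField: "code", as: "region" } }
function lookup(docs, fromDocs, localField, foreignField, as) {
  return docs.map((doc) => ({
    ...doc,
    // Every input document is kept; the new field is an array of all matches.
    [as]: fromDocs.filter((other) => other[foreignField] === doc[localField]),
  }));
}

const joined = lookup(towns, regions, "regionCode", "code", "region");
console.log(JSON.stringify(joined, null, 2));
```

Town "C" stays in the output with an empty region array, which is what makes the result behave like a left outer join rather than an inner join.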
Pipeline stages can:
- select documents ($match), limit, skip;
- group by a field or expression ($group, $bucket) and in each group accumulate the values of other fields (sum, min, max, avg, collect to an array, ...);
- join documents with documents from a collection ($lookup), also recursively ($graphLookup);
- explode an array field into one document for each element ($unwind);
- project, set, unset, and compute fields;
- sort documents;
- merge with an existing collection or output to a new collection;
- ...
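Of the stages listed above, $unwind is perhaps the least obvious, so here is a plain-JavaScript sketch of its default behavior. The unwind helper and the sample documents are made up. The helper assumes the field is either an array or missing, and like $unwind with default options it drops documents whose array is empty or missing.

```javascript
// Hypothetical documents with an array field.
const phones = [
  {
    owner: "Anna",
    parts: [
      { prefix: "02", number: 111 },
      { prefix: "0905", number: 222 },
    ],
  },
  { owner: "Boris", parts: [] }, // empty array: produces no output document
];

// Imitation of { $unwind: "$parts" } for array-valued or missing fields.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );
}

const unwound = unwind(phones, "parts");
console.log(unwound);
// Two documents, both with owner "Anna", each holding one element in "parts".
```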
Distribution
Distributed Operation
- MongoDB can distribute big collections over multiple server nodes
- A collection can be partitioned into shards: distribution of storage and processing power
- Unsharded collections and shards can be replicated in replica sets: safety, distribution of processing power for reads
- Multi-document transactions are available only in replica sets (but a single “replica”, i.e., just the original, is enough)
- Aggregation pipelines and map-reduce run distributed over shards
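To illustrate how an aggregation can run distributed over shards, the following sketch partitions made-up documents by a toy hash of a shard key, lets every shard compute partial (sum, count) pairs on its own, and merges the partial results at the end. Everything here is an assumption made for illustration: the hash function, the shard count, and the helper names are invented, and MongoDB's actual routing and merging are more involved.

```javascript
const SHARD_COUNT = 2;

// Toy hash for illustration only; not the function MongoDB uses.
function shardOf(key) {
  let h = 0;
  for (const ch of String(key)) h = (h * 31 + ch.codePointAt(0)) % 1000003;
  return h % SHARD_COUNT;
}

// Hypothetical documents, sharded by _id.
const towns = [
  { _id: 1, region: "North", inhabitants: 1000 },
  { _id: 2, region: "North", inhabitants: 3000 },
  { _id: 3, region: "South", inhabitants: 500 },
  { _id: 4, region: "South", inhabitants: 1500 },
];
const shards = Array.from({ length: SHARD_COUNT }, () => []);
for (const town of towns) shards[shardOf(town._id)].push(town);

// Runs on each shard independently: partial (sum, count) per region.
function partialSums(docs) {
  const out = {};
  for (const d of docs) {
    if (!out[d.region]) out[d.region] = { sum: 0, count: 0 };
    out[d.region].sum += d.inhabitants;
    out[d.region].count += 1;
  }
  return out;
}

// Runs once on the merging node: combine the per-shard partial results.
function mergePartials(partials) {
  const total = {};
  for (const partial of partials) {
    for (const [region, p] of Object.entries(partial)) {
      if (!total[region]) total[region] = { sum: 0, count: 0 };
      total[region].sum += p.sum;
      total[region].count += p.count;
    }
  }
  return total;
}

const merged = mergePartials(shards.map(partialSums));
const averages = Object.fromEntries(
  Object.entries(merged).map(([region, p]) => [region, p.sum / p.count])
);
console.log(averages); // North: 2000, South: 1000, however the towns were split
```

The final averages do not depend on which shard holds which document, because the partial results are combined with an associative and commutative operation.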
Sharding
The config server, shard servers, and routers must be started:
sh.addShard("mongo1.example.net:27017")
sh.addShard("mongo2.example.net:27017")
sh.enableSharding("prednaska")
// Shard a collection by a key
sh.shardCollection("prednaska.telefony",