MongoDB Berlin Aggregation

Aggregation

New framework in MongoDB

Alvin Richards

Technical Director, EMEA

alvin@10gen.com

@jonnyeight

What problem are we solving?

• Map/Reduce can be used for aggregation…
  • Currently being used for totaling, averaging, etc.
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
• Shouldn’t need to write JavaScript
  • Avoid the overhead of the JavaScript engine
• We’re seeing requests for help in handling complex documents
  • Select only matching subdocuments or arrays

How will we solve the problem?

• New aggregation framework
  • Declarative framework (no JavaScript)
  • Describe a chain of operations to apply
  • Expression evaluation
    • Return computed values
  • Framework: new operations added easily
  • C++ implementation

Aggregation - Pipelines

• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through a pipeline to produce a result
  • e.g. ps -ef | grep -i mongod
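The Unix pipe analogy can be sketched in plain JavaScript. This is only an illustration: `runPipeline` and the sample process list are hypothetical names invented here, and the framework itself is implemented in C++, not like this.

```javascript
// Illustrative only: a pipeline modeled as an array of plain functions.
// runPipeline is a hypothetical helper, not part of MongoDB.
function runPipeline(documents, stages) {
  // Each stage receives the previous stage's output, like a Unix pipe.
  return stages.reduce((docs, stage) => stage(docs), documents);
}

// Hypothetical process list, standing in for the output of `ps -ef`.
const processes = [
  { pid: 101, cmd: "/usr/bin/mongod --dbpath /data" },
  { pid: 202, cmd: "vim notes.txt" },
  { pid: 303, cmd: "MONGOD --fork" },
];

const result = runPipeline(processes, [
  (docs) => docs.filter((d) => /mongod/i.test(d.cmd)), // like `grep -i mongod`
  (docs) => docs.map((d) => d.pid),                    // keep only the pid
]);

console.log(result); // [ 101, 303 ]
```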

Example - twitter

{
  "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
  "user" : {
    "friends_count" : 73,
    "location" : "Brazil",
    "screen_name" : "Bia_cunha1",
    "name" : "Beatriz Helena Cunha",
    "followers_count" : 102,
    ...
  },
  ...
}

• Find the total number of followers and friends by location

Example - twitter

db.tweets.aggregate(
  { $match: { "user.friends_count": { $gt: 0 },
              "user.followers_count": { $gt: 0 } } },
  { $project: { location: "$user.location",
                friends: "$user.friends_count",
                followers: "$user.followers_count" } },
  { $group: { _id: "$location",
              friends: { $sum: "$friends" },
              followers: { $sum: "$followers" } } }
);

• $match: predicate
• $project: parts of the document you want to project
• $group: function to apply to the result set

Example - twitter

{
  "result" : [
    { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 },
    ...
  ],
  "ok" : 1
}

Pipeline Operations

• $match
  • Uses a query predicate (like .find({…})) as a filter
• $project
  • Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
  • This can include computed values
• $unwind
  • Hands out array elements one at a time
• $group
  • Aggregates items into buckets defined by a key

Pipeline Operations (continued)

• $sort
  • Sort documents
• $limit
  • Only allow the specified number of documents to pass
• $skip
  • Skip over the specified number of documents
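As a rough sketch, the operations listed so far could be chained like this. The `articles` collection and its `author`, `published`, and `tags` fields are hypothetical, chosen only to give every stage something to do.

```javascript
// Hypothetical collection "articles" with fields: author, published, tags (array).
const pipeline = [
  { $match: { published: true } },                   // filter, like .find({...})
  { $project: { author: 1, tags: 1 } },              // keep only the needed fields
  { $unwind: "$tags" },                              // one document per tag
  { $group: { _id: "$tags", count: { $sum: 1 } } },  // bucket by tag and count
  { $sort: { count: -1 } },                          // most-used tags first
  { $skip: 5 },                                      // skip the first 5 results
  { $limit: 10 },                                    // then pass at most 10
];

// The stages would be handed to db.articles.aggregate(...) in the same way
// as in the twitter example; here we only print their order.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" | "));
```

Order matters: swapping $skip and $limit, for example, would leave at most 5 documents instead of up to 10.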

Computed Expressions

• Available in $project operations
• Prefix expression language
  • $add:["$field1", "$field2"]
  • $ifNull:["$field1", "$field2"]
  • Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
• Other functions…
  • $divide, $mod, $multiply
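To make the prefix notation and nesting concrete, here is a toy evaluator for just $add and $ifNull. It is an assumption-laden illustration: `evaluate` is a hypothetical function written for this sketch, it handles only top-level field names, and it does not reproduce the server's handling of edge cases.

```javascript
// Illustrative evaluator for a tiny subset of the prefix expression language.
// Not MongoDB code; evaluate() is a hypothetical helper for this sketch.
function evaluate(expr, doc) {
  // A string starting with "$" is a field reference, e.g. "$field1".
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value;
  }
  // A plain object with one "$operator" key is an operator application.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add":
        return args.reduce((sum, n) => sum + n, 0);
      case "$ifNull":
        return args[0] !== null ? args[0] : args[1];
      default:
        throw new Error("unsupported operator: " + op);
    }
  }
  // Anything else (numbers, literal strings) evaluates to itself.
  return expr;
}

const doc = { field1: 10, field3: 5 }; // field2 is missing
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };
console.log(evaluate(nested, doc)); // 15
```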

Computed Expressions

• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • $year, $month, $day, $hour...
• Date arithmetic
• $ifNull
• Ternary conditional
  • Return one of two values based on a predicate

Projections

• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions
  • Pull fields from nested documents to the top
  • Push fields from the top down into new virtual documents

Unwinding

• $unwind can “stream” arrays
  • Array values are doled out one at a time in the context of their surrounding documents
  • Makes it possible to filter out elements before returning
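A minimal in-memory sketch of the unwinding idea follows. The `unwind` helper and the `posts` sample are hypothetical, and skipping documents that have no array in the field is an assumption of this sketch that may not match the server.

```javascript
// Illustrative in-memory version of $unwind; not MongoDB's implementation.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    // Assumption: documents without an array in `field` produce no output here.
    if (!Array.isArray(values)) continue;
    for (const value of values) {
      // Each element is emitted inside a copy of its surrounding document.
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Hypothetical documents with an array of tags.
const posts = [
  { title: "Intro", tags: ["mongodb", "nosql"] },
  { title: "Tips", tags: ["mongodb"] },
];

// Unwind, then filter on individual elements (the role a later $match would play).
const nosqlOnly = unwind(posts, "tags").filter((d) => d.tags === "nosql");
console.log(nosqlOnly); // [ { title: 'Intro', tags: 'nosql' } ]
```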

Grouping

• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set: $push, $addToSet
  • Other functions
    • $min, $max, $first, $last
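What $sum and $avg compute per bucket can be sketched in memory as follows. `groupByLocation` and the sample rows are hypothetical, and the field names assume documents that have already been projected to `location`, `friends`, and `followers`.

```javascript
// Illustrative in-memory grouping; not MongoDB's implementation.
// Mirrors a stage such as:
//   { $group: { _id: "$location",
//               friends: { $sum: "$friends" },
//               avgFollowers: { $avg: "$followers" } } }
function groupByLocation(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc.location)) {
      buckets.set(doc.location, { _id: doc.location, friends: 0, followerTotal: 0, count: 0 });
    }
    const bucket = buckets.get(doc.location);
    bucket.friends += doc.friends;         // running total for $sum
    bucket.followerTotal += doc.followers; // total and count together give $avg
    bucket.count += 1;
  }
  return [...buckets.values()].map((b) => ({
    _id: b._id,
    friends: b.friends,
    avgFollowers: b.followerTotal / b.count,
  }));
}

// Hypothetical, already-projected sample rows.
const sample = [
  { location: "Brazil", friends: 73, followers: 102 },
  { location: "Brazil", friends: 27, followers: 98 },
  { location: "Berlin", friends: 10, followers: 40 },
];
const grouped = groupByLocation(sample);
console.log(grouped);
```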

Sorting

• $sort can sort documents
  • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
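The meaning of a sort specification such as { key1: 1, key2: -1 } can be illustrated with a comparator built from it. `compareBySpec` and the sample rows are hypothetical, and the sketch only compares plain numbers and strings.

```javascript
// Illustrative comparator built from a sort specification.
// 1 means ascending, -1 means descending; keys are applied in order.
function compareBySpec(spec) {
  const keys = Object.keys(spec);
  return (a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -spec[key];
      if (a[key] > b[key]) return spec[key];
    }
    return 0; // equal on every key
  };
}

// Hypothetical rows.
const rows = [
  { key1: 1, key2: "a" },
  { key1: 1, key2: "b" },
  { key1: 0, key2: "c" },
];

// key1 ascending, then key2 descending within equal key1 values.
const sorted = rows.slice().sort(compareBySpec({ key1: 1, key2: -1 }));
console.log(sorted.map((r) => r.key2).join("")); // "cba"
```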

Demo

Demo files are at https://gist.github.com/2036709

Usage Tips

• Use $match in a pipeline as early as possible
  • The query optimizer can then be used to choose an index and avoid scanning the entire collection
• Use $sort in a pipeline as early as possible
  • The query optimizer can sometimes be used to choose an index to scan instead of sorting the result

Driver Support

• Initial version is a command
  • For any language, build a JSON database object, and execute the command
  • { aggregate : <collection>, pipeline : [ {…} ] }
• Beware of result size limit of 16MB
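A small sketch of building that command document is shown below. `buildAggregateCommand` is a hypothetical helper, and the pipeline contents are only an example based on the twitter data; each driver has its own way of executing a command document.

```javascript
// Builds a command document of the form { aggregate: <collection>, pipeline: [...] }.
// buildAggregateCommand is a hypothetical helper for this sketch.
function buildAggregateCommand(collectionName, pipeline) {
  if (!Array.isArray(pipeline)) {
    throw new TypeError("pipeline must be an array of stages");
  }
  return { aggregate: collectionName, pipeline: pipeline };
}

const command = buildAggregateCommand("tweets", [
  { $match: { "user.followers_count": { $gt: 0 } } },
  { $group: { _id: "$user.location", followers: { $sum: "$user.followers_count" } } },
]);

// In the mongo shell, a command document like this is executed with
// db.runCommand(command); here we only print it.
console.log(JSON.stringify(command, null, 2));
```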

When is this being released?

• Now!
  • 2.1.0 - unstable
  • 2.2.0 - stable (soon)

Sharding support

• Initial release supports sharding
• Mongos analyzes pipeline
  • forwards operations up to $group or $sort to shards
  • combines shard server results and returns them
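One way to picture the split is the rough sketch below. It assumes the boundary falls just before the first $group or $sort, which is a simplification; `splitForShards` is a hypothetical helper and the actual logic in mongos may differ.

```javascript
// Illustrative split of a pipeline into a part sent to the shards and a part
// applied when combining results. Assumption: the boundary is placed just
// before the first $group or $sort; the real mongos logic may differ.
function splitForShards(pipeline) {
  const boundary = pipeline.findIndex(
    (stage) => "$group" in stage || "$sort" in stage
  );
  if (boundary === -1) {
    // No $group or $sort: everything can run on the shards.
    return { shardPart: pipeline.slice(), mergePart: [] };
  }
  return {
    shardPart: pipeline.slice(0, boundary),
    mergePart: pipeline.slice(boundary),
  };
}

const parts = splitForShards([
  { $match: { "user.followers_count": { $gt: 0 } } },
  { $project: { location: "$user.location", followers: "$user.followers_count" } },
  { $group: { _id: "$location", followers: { $sum: "$followers" } } },
  { $sort: { followers: -1 } },
]);

console.log(parts.shardPart.length, parts.mergePart.length); // 2 2
```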

Pipeline Operations – Future

• $out
  • Saves the document stream to a collection
  • Similar to M/R $out, but with sharded output
  • Functions like a tee, so that intermediate results can be saved

Documentation, Bug Reports

• http://www.mongodb.org/display/DOCS/Aggregation+Framework
• https://jira.mongodb.org/browse/SERVER/component/10840

@mongodb

conferences, appearances, and meetups

http://www.10gen.com/events

http://bit.ly/mongoE Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

alvin@10gen.com
