Aggregation
New framework in MongoDB
Alvin Richards
Technical Director, EMEA
[email protected]
@jonnyeight

What problem are we solving?
• Map/Reduce can be used for aggregation…
  • Currently being used for totaling, averaging, etc.
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
• Shouldn’t need to write JavaScript
  • Avoid the overhead of the JavaScript engine
• We’re seeing requests for help in handling complex documents
  • Select only matching subdocuments or arrays

How will we solve the problem?
• New aggregation framework
  • Declarative framework (no JavaScript)
  • Describe a chain of operations to apply
  • Expression evaluation
    • Return computed values
  • Framework: new operations added easily
  • C++ implementation

Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through a pipeline to produce a result
  • e.g. ps -ef | grep -i mongod

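The shell-pipe analogy can be sketched in plain JavaScript. This is only an illustration of the chaining idea, not how the server implements it; `runPipeline`, `grepMongod`, `pidsOnly`, and the sample process list are hypothetical names and data made up for this sketch.

```javascript
// Hypothetical helper: feed each stage's output into the next stage,
// the same way a shell pipe connects commands.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up stand-in for the output of `ps -ef`.
const processes = [
  { pid: 101, cmd: "/usr/bin/mongod --dbpath /data/db" },
  { pid: 102, cmd: "/usr/sbin/sshd" },
  { pid: 103, cmd: "MongoD --port 27018" },
];

// Stage 1: keep matching documents (like `grep -i mongod`).
const grepMongod = (docs) => docs.filter((d) => /mongod/i.test(d.cmd));
// Stage 2: reshape what is left.
const pidsOnly = (docs) => docs.map((d) => d.pid);

const result = runPipeline(processes, [grepMongod, pidsOnly]);
console.log(result); // [ 101, 103 ]
```

Each stage only sees what the previous stage let through, which is the property the aggregation pipeline relies on.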
Example - twitter
{
    "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
    "user" : {
        "friends_count" : 73,
        "location" : "Brazil",
        "screen_name" : "Bia_cunha1",
        "name" : "Beatriz Helena Cunha",
        "followers_count" : 102
    }
}
• Find the # of followers and # of friends by location

Example - twitter
db.tweets.aggregate(
    // Predicate
    { $match: { "user.friends_count": { $gt: 0 },
                "user.followers_count": { $gt: 0 } } },
    // Parts of the document you want to project
    { $project: { location: "$user.location",
                  friends: "$user.friends_count",
                  followers: "$user.followers_count" } },
    // Function to apply to the result set
    { $group: { _id: "$location",
                friends: { $sum: "$friends" },
                followers: { $sum: "$followers" } } }
);

Example - twitter
{
    "result" : [
        { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 },
        ...
    ],
    "ok" : 1
}

Pipeline Operations
• $match
  • Uses a query predicate (like .find({…})) as a filter
• $project
  • Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
  • This can include computed values
• $unwind
  • Hands out array elements one at a time
• $group
  • Aggregates items into buckets defined by a key

Pipeline Operations (continued)
• $sort
  • Sort documents
• $limit
  • Only allow the specified number of documents to pass
• $skip
  • Skip over the specified number of documents

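As a rough illustration of how $sort, $skip, and $limit combine, the sketch below applies the same three steps to an in-memory array. The sample documents and the `sortSkipLimit` stage list are made up for this example; the array methods only mimic what the stages are described as doing.

```javascript
// Made-up sample documents.
const docs = [
  { name: "a", followers: 5 },
  { name: "b", followers: 50 },
  { name: "c", followers: 20 },
  { name: "d", followers: 1 },
];

// The pipeline stages this sketch imitates (illustrative only).
const sortSkipLimit = [
  { $sort: { followers: -1 } },
  { $skip: 1 },
  { $limit: 2 },
];

// $sort: { followers: -1 } -> descending by followers (copy first, sort() mutates).
const sorted = [...docs].sort((x, y) => y.followers - x.followers);
// $skip: 1 -> drop the first document.
const skipped = sorted.slice(1);
// $limit: 2 -> let only two documents pass.
const page = skipped.slice(0, 2);

console.log(page.map((d) => d.name)); // [ 'c', 'a' ]
```

The order of the stages matters: skipping before sorting would drop whichever document happened to come first in the input.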
Computed Expressions
• Available in $project operations
• Prefix expression language
  • $add:["$field1", "$field2"]
  • $ifNull:["$field1", "$field2"]
  • Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
• Other functions…
  • $divide, $mod, $multiply

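To make the prefix notation concrete, here is a toy evaluator for a few of the operators named above. It is a sketch written for this example, not the server's C++ implementation: `evaluate` is a hypothetical function, it handles only top-level field names (no dotted paths), and treating a missing field as null is an assumption of the sketch.

```javascript
// Toy evaluator for a handful of prefix expressions.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    // "$field" is a field reference; a missing field is treated as null here.
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value;
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    // { $op: [arg1, arg2, ...] } : evaluate the arguments first, then apply $op.
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add": return args.reduce((a, b) => a + b, 0);
      case "$multiply": return args.reduce((a, b) => a * b, 1);
      case "$ifNull": return args[0] !== null ? args[0] : args[1];
      default: throw new Error("unsupported operator: " + op);
    }
  }
  return expr; // anything else is a literal
}

// Nested expression: the inner operator is wrapped in its own document.
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };
const total = evaluate(nested, { field1: 10, field3: 5 });
console.log(total); // 15
```

Because field2 is absent, $ifNull falls back to field3, and $add sums 10 and 5.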
Computed Expressions
• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • $year, $month, $dayOfMonth, $hour...
• Date arithmetic
• $ifNull
• Ternary conditional ($cond)
  • Return one of two values based on a predicate

Projections
• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions
  • Pull fields from nested documents to the top
  • Push fields from the top down into new virtual documents

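A possible reshaping of the twitter document could look like the sketch below. The `projectSpec` stage and the field names `stats` and `ratio` are illustrative choices, and `reshape` is a hypothetical in-memory stand-in that shows the intended output shape rather than running against a server.

```javascript
// Sample input, based on the twitter document shown earlier (with a simplified _id).
const tweet = {
  _id: 1,
  user: { friends_count: 73, location: "Brazil", followers_count: 102 },
};

// Illustrative $project stage: pull a nested field up, push two fields
// down into a new subdocument, and add a computed field.
const projectSpec = {
  $project: {
    location: "$user.location",
    stats: { friends: "$user.friends_count", followers: "$user.followers_count" },
    ratio: { $divide: ["$user.followers_count", "$user.friends_count"] },
  },
};

// Hand-written equivalent of the stage above, for illustration only.
function reshape(doc) {
  return {
    _id: doc._id, // _id is kept unless explicitly excluded
    location: doc.user.location,
    stats: { friends: doc.user.friends_count, followers: doc.user.followers_count },
    ratio: doc.user.followers_count / doc.user.friends_count,
  };
}

const reshaped = reshape(tweet);
console.log(JSON.stringify(reshaped, null, 2));
```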
Unwinding
• $unwind can “stream” arrays
  • Array values are doled out one at a time in the context of their surrounding documents
  • Makes it possible to filter out elements before returning

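The effect of unwinding can be sketched in memory as follows. The `unwind` function, the `posts` data, and the `tags` field are hypothetical; the sketch assumes that a document whose array is empty produces no output.

```javascript
// Emit one copy of the surrounding document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const value of values) {
      // Same document, but the array field now holds a single element.
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Made-up sample data.
const posts = [
  { title: "intro", tags: ["mongodb", "aggregation"] },
  { title: "empty", tags: [] },
];

// Roughly what { $unwind: "$tags" } followed by
// { $match: { tags: "aggregation" } } is described as doing.
const unwound = unwind(posts, "tags");
const onlyAggregation = unwound.filter((d) => d.tags === "aggregation");

console.log(unwound.length);           // 2
console.log(onlyAggregation[0].title); // intro
```

Once each element travels in its own document, ordinary filtering can drop individual array elements.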
Grouping
• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set: $push, $addToSet
  • Other functions
    • $min, $max, $first, $last

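The bucket idea behind $group can be sketched with a Map keyed by the grouping value. `groupByLocation` and the sample numbers are made up for illustration; the comments name the accumulator each line imitates.

```javascript
// Made-up documents, shaped like the output of the earlier $project stage.
const projected = [
  { location: "Brazil", friends: 73, followers: 102 },
  { location: "Brazil", friends: 10, followers: 20 },
  { location: "Far Far Away", friends: 5, followers: 8 },
];

// Imitates:
// { $group: { _id: "$location",
//             friends: { $sum: "$friends" },
//             followerCounts: { $push: "$followers" },
//             avgFollowers: { $avg: "$followers" } } }
function groupByLocation(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    let bucket = buckets.get(doc.location);
    if (!bucket) {
      bucket = { _id: doc.location, friends: 0, followerCounts: [] };
      buckets.set(doc.location, bucket);
    }
    bucket.friends += doc.friends;              // $sum
    bucket.followerCounts.push(doc.followers);  // $push
  }
  return [...buckets.values()].map((bucket) => ({
    ...bucket,
    // $avg: total of the collected values divided by their count
    avgFollowers:
      bucket.followerCounts.reduce((x, y) => x + y, 0) / bucket.followerCounts.length,
  }));
}

const grouped = groupByLocation(projected);
console.log(grouped[0].friends, grouped[0].avgFollowers); // 83 61
```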
Sorting
• $sort can sort documents
  • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}

Demo
Demo files are at https://gist.github.com/2036709

Usage Tips
• Use $match in a pipeline as early as possible
  • The query optimizer can then be used to choose an index and avoid scanning the entire collection
• Use $sort in a pipeline as early as possible
  • The query optimizer can sometimes be used to choose an index to scan instead of sorting the result

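A pipeline that follows both tips might be ordered as in the sketch below. The field names come from the twitter example; the particular stages, the limit of 10, and the assumption that the shell's aggregate() helper accepts the stages as an array are illustrative, and whether an index is actually used depends on the server version and the indexes defined.

```javascript
// Illustrative pipeline over the tweets collection from the earlier example.
const pipeline = [
  // Filter first, so later stages see fewer documents.
  { $match: { "user.followers_count": { $gt: 0 } } },
  // Sort early, while the original (possibly indexed) field is still present.
  { $sort: { "user.followers_count": -1 } },
  { $limit: 10 },
  // Reshape only the documents that survived the stages above.
  { $project: { screen_name: "$user.screen_name", followers: "$user.followers_count" } },
];

// In the mongo shell this might be run as db.tweets.aggregate(pipeline),
// assuming a helper that accepts an array of stages.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> ")); // $match -> $sort -> $limit -> $project
```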
Driver Support
• Initial version is a command
  • For any language, build a JSON database object, and execute the command
  • { aggregate : <collection>, pipeline : […] }
• Beware of result size limit of 16MB

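As a sketch of what such a command document could look like, the example below builds one for the twitter aggregation. The collection name `tweets` comes from the earlier slides; how the document is sent depends on the driver, so the shell call in the comment is only one possibility.

```javascript
// Command document: the command name comes first, followed by the pipeline array.
const command = {
  aggregate: "tweets",
  pipeline: [
    { $match: { "user.friends_count": { $gt: 0 } } },
    { $group: { _id: "$user.location", friends: { $sum: "$user.friends_count" } } },
  ],
};

// In the mongo shell the document could be passed to db.runCommand(command).
// Here it is only serialized, to show it is plain JSON-style data.
const payload = JSON.stringify(command);
console.log(payload);
```

Since the result comes back as a single document, the 16MB limit applies to the whole result array.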
When is this being released?
• Now!
  • 2.1.0 - unstable
  • 2.2.0 - stable (soon)

Sharding support
• Initial release supports sharding
• Mongos analyzes pipeline
  • Forwards operations up to $group or $sort to shards
  • Combines shard server results and returns them

Pipeline Operations – Future
• $out
  • Saves the document stream to a collection
  • Similar to M/R $out, but with sharded output
  • Functions like a tee, so that intermediate results can be saved

Documentation, Bug Reports
• http://www.mongodb.org/display/DOCS/Aggregation+Framework
• https://jira.mongodb.org/browse/SERVER/component/10840

@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongoE
Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org