Database Document
MONGODB
Database R&D Documentation
Document Control
Document Location
Location
\\10.0.1.96\Team Shares\DB_Activities\ MongoDB\
Template History
Author     Date         Version   Changes
Tharun B   07/09/2015   1.0       Document Created
Revision History
Author     Date         Version   Revision Description
Tharun B   07/09/2015   1.0       Introduction to MongoDB
Tharun B   07/09/2015   1.0       Installation
Tharun B   10/09/2015   1.1       CRUD Operations, Aggregations, Data Modelling & Referencing Techniques and Indexing
Tharun B   16/09/2015   1.2       Regular Expressions, Sequences, Capped Collections
Tharun B   16/09/2015   1.3       Creating JavaScript Functions
Tharun B   18/09/2015   1.4       Map Reduce Paradigm
Tharun B   21/09/2015   1.5       Operators, Commands and Mongo Shell Methods
Approvals
Name              Role         Reviewed   Version Approved   Approval Date
Kishore Kumar I   Lead – DBA   No         NA                 NA
WHAT IS MONGODB?
MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability.
MongoDB works on the concepts of collections and documents.
If you have one of the following challenges, you should consider MongoDB:
You Expect a High Write Load
By default, MongoDB favors a high insert rate over transaction safety. If you need to load large volumes of records that each carry low business value, MongoDB should be a good fit.
You need High Availability in an Unreliable Environment (Cloud and Real Life)
Setting up a replica set (a set of servers that act as master and slaves) is easy and fast. Moreover, recovery from a node (or data center) failure is automatic, fast, and safe.
You need to Grow Big (and Shard Your Data)
Scaling databases is hard (the performance of a single MySQL table tends to degrade once it crosses 5-10 GB). If you need to partition and shard your database, MongoDB has an easy built-in solution for that.
Your Data is Location Based
MongoDB has built-in spatial functions, so finding relevant data for specific locations is fast and accurate. Geographically redundant replicas can be deployed that are logically represented as a single database.
Your Data Set is Going to be Big (starting from 1GB) and Schema is Not Stable
Adding new columns to an RDBMS table can lock the entire database in some database systems, or create a major load and performance degradation in others. This usually happens once the table grows large. As MongoDB is schema-less, adding a new field does not affect old rows (or documents) and is instant.
You Don't have a DBA
If you don't have a DBA, and you don't want to normalize your data and do joins, you should consider MongoDB. Beyond the DBA question, MongoDB is also worth considering if your schema changes regularly, which is difficult to handle in an RDBMS, because MongoDB does not have a fixed schema.
Based on the above considerations, MongoDB is suitable for applications that deal with the following:
Account and user profiles: can store arrays of addresses with ease
Form data: MongoDB makes it easy to evolve the structure of form data over time
Blogs / user-generated content: can keep data with complex relationships together in one object
Messaging: vary message meta-data easily per message or message type without needing to maintain separate collections or schemas
System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
Log data of any kind: structured log data is the future
Graphs: just objects and pointers – a perfect fit
Location based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing
The main advantages of MongoDB over any RDBMS are the following:
Schema-less: MongoDB is a document database in which one collection can hold different documents. The number of fields, the content and the size of a document can differ from one document to another.
The structure of a single object is clear.
No complex joins.
Deep query ability: MongoDB supports dynamic queries on documents using a document-based query language that is nearly as powerful as SQL.
Easy to scale out (horizontally scalable).
How is Data Modeled in MongoDB?
Data is stored in the form of JSON-like documents. A simple document made of key-value pairs looks like this:
{
_id: ObjectId("7df78ad8902c"),
title: 'MongoDB Overview',
description: 'MongoDB is no sql database',
by: 'tutorials point',
url: 'http://www.tutorialspoint.com',
tags: ['mongodb', 'database', 'NoSQL'],
likes: 100,
comments: [
{
user:'user1',
message: 'My first comment',
dateCreated: new Date(2011,1,20,2,15),
like: 0
},
{
user:'user2',
message: 'My second comments',
dateCreated: new Date(2011,1,25,7,45),
like: 5
}
]
}
DATA MODELLING IN MONGODB
Modelling relationships
Relationships in MongoDB represent how various documents are logically related to each other. Relationships can be modeled via the Embedded and Referenced approaches. Such relationships can be 1:1, 1:N, N:1 or N:N.
Let us consider the case of storing addresses for users. One user can have multiple addresses, making this a 1:N relationship.
Following is the sample document structure of user document:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"name": "Tom Hanks",
"contact": "987654321",
"dob": "01-01-1991"
}
Following is the sample document structure of address document:
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
MODELING EMBEDDED RELATIONSHIPS
In the embedded approach, we will embed the address document inside the user document.
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address": [
{
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
},
{
"building": "170 A, Acropolis Apt",
"pincode": 456789,
"city": "Chicago",
"state": "Illinois"
}]
}
This approach maintains all the related data in a single document which makes it easy to retrieve
and maintain. The whole document can be retrieved in a single query like this:
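As a rough illustration in plain JavaScript (not the mongo shell), the sketch below keeps the embedded user document in an in-memory array and fetches the user together with all addresses in one lookup. The users array and the findOne helper are hypothetical stand-ins. In the mongo shell, the corresponding call would be along the lines of db.users.findOne({"name":"Tom Benzamin"},{"address":1}).

```javascript
// Hypothetical in-memory stand-in for the users collection.
const users = [
  {
    _id: "52ffc33cd85242f436000001",
    name: "Tom Benzamin",
    address: [
      { building: "22 A, Indiana Apt", pincode: 123456, city: "Los Angeles", state: "California" },
      { building: "170 A, Acropolis Apt", pincode: 456789, city: "Chicago", state: "Illinois" }
    ]
  }
];

// Hypothetical helper: return the first document whose fields equal the filter's fields.
function findOne(collection, filter) {
  return collection.find(doc =>
    Object.keys(filter).every(key => doc[key] === filter[key])
  ) || null;
}

// One lookup returns the user and every embedded address.
const user = findOne(users, { name: "Tom Benzamin" });
const cities = user.address.map(a => a.city);
console.log(cities);
```

Because the addresses live inside the user document, no second lookup is needed to read them.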
On Windows MongoDB requires Windows Server 2008 R2, Windows Vista, or later.
Sources
MongoDB can be downloaded from the official site in two forms:
1. .zip file
2. .msi file
The .msi installer includes all other software dependencies and will automatically upgrade any older version of MongoDB installed using an .msi file.
Before downloading MongoDB from the official site, determine which build you need. There are three builds of MongoDB for Windows, listed below.
MongoDB for Windows 64-bit: Runs only on Windows Server 2008 R2, Windows 7 64-bit, and newer versions of Windows. This build takes advantage of recent enhancements to the Windows Platform and cannot operate on older versions of Windows.
MongoDB for Windows 32-bit: Runs on any 32-bit version of Windows newer than Windows Vista. 32-bit versions of MongoDB are only intended for older systems and for use in testing and development systems. 32-bit versions of MongoDB only support databases smaller than 2GB.
MongoDB for Windows 64-bit Legacy: Runs on Windows Vista, Windows Server 2003, and Windows Server 2008 and does not include recent performance enhancements.
To find the architecture of the Windows version you are running, open the Command Prompt and enter: wmic os get osarchitecture
Download the latest production release of MongoDB from the MongoDB downloads page. Ensure you download the correct version of MongoDB for your Windows system. The 64-bit versions of MongoDB do not work with 32-bit Windows.
Installation Steps:
1. After downloading the .msi file, open the Downloads folder on your system and run the file (mongodb-win32-x86_64-2008plus-ssl-3.0.4-signed). The installer opens with the welcome window.
2. Click 'Next' to move forward. In this step, accept the license agreement.
3. Click 'Next' to move to step 3. Here you select one of two setup types, 'Complete' or 'Custom'. The installer screen describes both options.
4. MongoDB is now ready to install. Click the 'Install' button to move forward.
5. This step shows the installation status. Once MongoDB is installed successfully, the installer moves to the next step.
6. Click the 'Finish' button to complete the installation.
After installing MongoDB, open the Program Files folder on your system, where you will find the MongoDB folder. Inside it are the subfolders Server, 3.0 and bin. The bin folder contains a number of .exe and .dll files, which are described below.
Component Set   Binaries
Server          mongod.exe
Router          mongos.exe
MongoDB stores its data in the db folder inside the data folder. This data folder is not created automatically, so you have to create it manually. The data directory should be created at the root of a drive (i.e. C:\ or D:\).
In this document the MongoDB folder is on the C:\ drive, so the data and db folders were created on the C:\ drive as well.
2. Start MongoDB: To start MongoDB, type mongod at the command prompt.
Open the MongoDB folder, right-click the bin folder and select 'Open command window here'. A command prompt window opens.
Type mongod and press Enter, and the server will start. To start the server with authentication enabled, type mongod --auth. For help, type mongod --help, which lists more options.
After the server starts successfully, it shows the port number it is listening on.
Next, start the client, mongo. Open one more command prompt window in the same way and type mongo. A connection can also be made to a remote server if its IP address and port number are known and the mongod server is running there.
By default, the client connects to the test database. You can create your own databases with the 'use' command. Example: use mongodb_sysbiz. This switches to the mongodb_sysbiz database, which is created once data is stored in it.
Note: MongoDB is case sensitive.
To list the collections present in a database, use the 'show collections' command. A database is displayed in the database list only when it contains at least one collection. To create a collection, use db.createCollection("sysbiz"), which creates the sysbiz collection.
CRUD OPERATIONS:
We have already seen how to use the “use” command to create a database.
“use trainee_sysbiz” switches to a database named trainee_sysbiz. The database is actually created when the first document is stored in it.
Similarly, the “use” command is used to switch between databases.
>use mydb
switched to db mydb
If you want to check your databases list, then use the command “show dbs”.
>show dbs
local 0.78125GB
test 0.23012GB
The database you created (mydb) is not present in the list. For a database to be displayed, you need to insert at least one document into it.
>db.movie.insert({"name":"SPIDERMAN"})
>show dbs
local 0.78125GB
mydb 0.23012GB
test 0.23012GB
In MongoDB the default database is test. If you do not select a database, collections are stored in the test database.
THE DROPDATABASE() METHOD
SYNTAX:
The basic syntax of the dropDatabase() command is as follows:
db.dropDatabase()
This deletes the selected database. If you have not selected any database, it deletes the default 'test' database.
EXAMPLE:
First, check the list of available databases by using the command show dbs
>show dbs
local 0.78125GB
mydb 0.23012GB
test 0.23012GB
>
If you want to delete new database <mydb>, then dropDatabase() command would be as follows:
>use mydb
switched to db mydb
>db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }
>
Now check list of databases
>show dbs
local 0.78125GB
test 0.23012GB
>
THE CREATECOLLECTION() METHOD
db.createCollection(name, options) is used to create a collection with options such as size and indexing. A collection is also created implicitly when documents are first inserted into it, but to set options on the indexes and size of the collection, use this method.
SYNTAX:
The basic syntax of the createCollection() command is as follows:
db.createCollection(name, options)
In the command, name is name of collection to be created. Options is a document and used to
specify configuration of collection
Parameter   Type       Description
name        String     Name of the collection to be created
options     Document   (Optional) Specify options about memory size and indexing
Options parameter is optional, so you need to specify only name of the collection. Following is the
list of options you can use:
Field         Type      Description
capped        Boolean   (Optional) If true, enables a capped collection. A capped collection is a fixed-size collection that automatically overwrites its oldest entries when it reaches its maximum size. If you specify true, you need to specify the size parameter also.
autoIndexId   Boolean   (Optional) If true, automatically creates an index on the _id field. The default value is true as of MongoDB 2.2 (it was false in earlier versions).
size          number    (Optional) Specifies a maximum size in bytes for a capped collection. If capped is true, then you need to specify this field also.
max           number    (Optional) Specifies the maximum number of documents allowed in the capped collection.
While inserting the document, MongoDB first checks size field of capped collection, then it checks
max field.
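The eviction behaviour of a capped collection can be pictured with a small in-memory model. The sketch below is plain JavaScript, not MongoDB code: the CappedCollection class is hypothetical, and it approximates a document's size by the length of its JSON text, which is an assumption made only for illustration.

```javascript
// Hypothetical in-memory model of a capped collection (not MongoDB code).
class CappedCollection {
  constructor({ size, max = Infinity }) {
    this.size = size; // maximum total size in bytes (approximated below)
    this.max = max;   // maximum number of documents
    this.docs = [];
  }

  // Assumption: approximate a document's size by the length of its JSON text.
  static sizeOf(doc) {
    return JSON.stringify(doc).length;
  }

  totalSize() {
    return this.docs.reduce((sum, d) => sum + CappedCollection.sizeOf(d), 0);
  }

  insert(doc) {
    this.docs.push(doc);
    // Evict the oldest documents until both limits are respected again.
    while (this.docs.length > 1 &&
           (this.totalSize() > this.size || this.docs.length > this.max)) {
      this.docs.shift();
    }
  }
}

const log = new CappedCollection({ size: 1000, max: 3 });
for (let i = 1; i <= 5; i++) {
  log.insert({ seq: i });
}
console.log(log.docs.map(d => d.seq)); // only the three newest documents remain
```

With max set to 3, inserting five documents leaves only the last three, which mirrors how the oldest entries are overwritten first.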
EXAMPLES:
The basic syntax of the createCollection() method without options is as follows:
>use test
switched to db test
>db.createCollection("mycollection")
{ "ok" : 1 }
>
You can check the created collection by using the command show collections
>show collections
mycollection
system.indexes
Following example shows the syntax of createCollection() method with few important options:
The significance of the options will be seen at a later stage.
For the example given above, the equivalent where clause is ' where by='tutorials point' AND title='MongoDB Overview' '. You can pass any number of key-value pairs in the find clause.
OR IN MONGODB
SYNTAX:
To query documents based on the OR condition, use the $or keyword.
Consider an example that returns the documents that have likes greater than 10 and whose title is 'MongoDB Overview' or whose by is 'tutorials point'. The equivalent SQL where clause is 'where likes>10 AND (by = 'tutorials point' OR title = 'MongoDB Overview')'
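To make the combined AND/OR condition concrete, here is a minimal sketch in plain JavaScript that applies the same logic to an in-memory array. The sample documents are hypothetical, and the mongo shell form in the comment is the usual way such a condition is written with $gt and $or.

```javascript
// Hypothetical sample documents standing in for the mycol collection.
const mycol = [
  { title: "MongoDB Overview", by: "tutorials point", likes: 100 },
  { title: "NoSQL Overview", by: "tutorials point", likes: 5 },
  { title: "Other Topic", by: "someone else", likes: 50 }
];

// Plain-JavaScript equivalent of:
//   where likes > 10 AND (by = 'tutorials point' OR title = 'MongoDB Overview')
// In the mongo shell the same condition would be written roughly as:
//   db.mycol.find({ likes: { $gt: 10 },
//                   $or: [ { by: "tutorials point" }, { title: "MongoDB Overview" } ] })
const matches = mycol.filter(doc =>
  doc.likes > 10 &&
  (doc.by === "tutorials point" || doc.title === "MongoDB Overview")
);

console.log(matches.map(d => d.title));
```

Only the first document passes: the second fails the likes condition, and the third satisfies neither branch of the OR.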
We often need to limit the number of documents returned by a query. Sometimes we also need to select a few documents after skipping a certain number of them. For such requirements, we have:
THE LIMIT() METHOD
To limit the records in MongoDB, use the limit() method. The limit() method accepts one number-type argument, which is the number of documents that you want displayed.
SYNTAX:
The basic syntax of the limit() method is as follows:
>db.COLLECTION_NAME.find().limit(NUMBER)
EXAMPLE:
Consider the collection mycol with the following data
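The effect of limiting and skipping can be mimicked in plain JavaScript with Array.prototype.slice. The sketch below is only an in-memory illustration: the sample titles and the skipAndLimit helper are hypothetical, and the special meaning of a limit of 0 in MongoDB (no limit) is not modelled.

```javascript
// Hypothetical sample data; the titles are illustrative only.
const mycol = [
  { title: "MongoDB Overview" },
  { title: "NoSQL Overview" },
  { title: "Tutorials Point Overview" }
];

// Emulates db.mycol.find().skip(skipCount).limit(limitCount) on an array.
// Note: a limit of 0 is not treated as "no limit" here.
function skipAndLimit(docs, skipCount, limitCount) {
  return docs.slice(skipCount, skipCount + limitCount);
}

const firstTwo = skipAndLimit(mycol, 0, 2).map(d => d.title);
const secondOnly = skipAndLimit(mycol, 1, 1).map(d => d.title);
console.log(firstTwo, secondOnly);
```

Here firstTwo corresponds to find().limit(2), and secondOnly to skipping one document and then limiting the result to one.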
The ensureIndex() method also accepts a list of optional parameters, given below:
Parameter            Type            Description
background           Boolean         Builds the index in the background so that building an index does not block other database activities. Specify true to build in the background. The default value is false.
unique               Boolean         Creates a unique index so that the collection will not accept insertion of documents where the index key or keys match an existing value in the index. Specify true to create a unique index. The default value is false.
name                 string          The name of the index. If unspecified, MongoDB generates an index name by concatenating the names of the indexed fields and the sort order.
dropDups             Boolean         Creates a unique index on a field that may have duplicates. MongoDB indexes only the first occurrence of a key and removes all documents from the collection that contain subsequent occurrences of that key. Specify true to create a unique index. The default value is false.
sparse               Boolean         If true, the index only references documents with the specified field. These indexes use less space but behave differently in some situations (particularly sorts). The default value is false.
expireAfterSeconds   integer         Specifies a value, in seconds, as a TTL to control how long MongoDB retains documents in this collection.
v                    index version   The index version number. The default index version depends on the version of MongoDB running when creating the index.
weights              document        The weight is a number ranging from 1 to 99,999 and denotes the significance of the field relative to the other indexed fields in terms of the score.
default_language     string          For a text index, the language that determines the list of stop words and the rules for the stemmer and tokenizer. The default value is English.
language_override    string          For a text index, the name of the field in the document that contains the language to override the default language. The default value is language.
WHAT IS A COVERED QUERY
A covered query is a query in which:
all the fields in the query are a part of an index and
all the fields returned in the query are in the same index
Since all the fields present in the query are part of an index, MongoDB matches the query conditions and returns the result using the same index without actually looking inside documents. Since indexes are typically held in RAM, fetching data from indexes is much faster than fetching data by scanning documents. The following example illustrates this.
USING COVERED QUERIES
To test covered queries, consider the following document in users collection:
{"_id": ObjectId("53402597d852426020000002"),
"dob": "01-01-1991",
"gender": "M",
"name": "Tom Benzamin",
"user_name": "tombenzamin"
}
We will first create a compound index for users collection on fields gender and user_name using
following query:
>db.users.ensureIndex({gender:1,user_name:1})
Now, this index will cover the following query:
>db.users.find({gender:"M"},{user_name:1,_id:0})
That is to say that for the above query, MongoDB would not go looking into database documents.
Instead it would fetch the required data from indexed data which is very fast.
Since our index does not include _id field, we have explicitly excluded it from result set of our query
as MongoDB by default returns _id field in every query. So the following query would not have been
covered inside the index created above:
>db.users.find({gender:"M"},{user_name:1})
Lastly, remember that an index cannot cover a query if:
any of the indexed fields is an array
any of the indexed fields is a subdocument
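The two conditions for a covered query can be expressed as a small check. The sketch below is plain JavaScript with a hypothetical isCovered helper. It only models the field-membership rules and the default inclusion of _id, and it ignores the array and subdocument restrictions, which depend on the stored data.

```javascript
// Hypothetical helper: decides whether an index can cover a query, based on
// the two conditions for covered queries. Array and subdocument restrictions
// are not modelled here.
function isCovered(indexKeys, queryFields, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  // Fields the query returns: those projected with 1, plus _id unless excluded.
  const returned = Object.keys(projection).filter(f => projection[f] === 1);
  if (projection._id !== 0 && !returned.includes("_id")) {
    returned.push("_id");
  }
  const inIndex = field => indexed.has(field);
  return Object.keys(queryFields).every(inIndex) && returned.every(inIndex);
}

const index = { gender: 1, user_name: 1 };
const covered = isCovered(index, { gender: "M" }, { user_name: 1, _id: 0 });
const notCovered = isCovered(index, { gender: "M" }, { user_name: 1 });
console.log(covered, notCovered);
```

The second call is not covered because _id is returned by default and is not part of the compound index.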
Aggregation
Aggregations are operations that process data records and return computed results. MongoDB provides a rich set of aggregation operations that examine and perform calculations on the data sets. Like queries, aggregation operations in MongoDB use collections of documents as an input and return results in the form of one or more documents. The basic overview of aggregation is given below, followed by detailed examples.
THE AGGREGATE() METHOD
For aggregation in MongoDB, use the aggregate() method.
SYNTAX:
The basic syntax of the aggregate() method is as follows
The SQL equivalent query for the above use case will be:
select by_user, count(*) from mycol group by by_user
Below is a list of available aggregation expressions:
Expression   Description   Example
$sum    Sums up the defined value from all documents in the collection.    db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg    Calculates the average of all given values from all documents in the collection.    db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min    Gets the minimum of the corresponding values from all documents in the collection.
$last   Gets the last document from the source documents according to the grouping. Typically this only makes sense together with a previously applied “$sort” stage.
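What a $group stage with $sum computes can be sketched in plain JavaScript over an in-memory array. The sample documents and the groupBy helper below are hypothetical and only mirror the grouping logic, not MongoDB's aggregation pipeline.

```javascript
// Hypothetical sample documents standing in for the mycol collection.
const mycol = [
  { title: "MongoDB Overview", by_user: "tutorials point", likes: 100 },
  { title: "NoSQL Overview", by_user: "tutorials point", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 }
];

// Emulates a $group stage: group by keyField and fold each group with an accumulator.
function groupBy(docs, keyField, init, step) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const current = groups.has(key) ? groups.get(key) : init;
    groups.set(key, step(current, doc));
  }
  return [...groups].map(([key, value]) => ({ _id: key, value }));
}

// Like {$sum : 1}: count the documents per user (the SQL count(*) ... group by).
const counts = groupBy(mycol, "by_user", 0, acc => acc + 1);
// Like {$sum : "$likes"}: total likes per user.
const likes = groupBy(mycol, "by_user", 0, (acc, doc) => acc + doc.likes);
console.log(counts, likes);
```

The same helper could model $avg, $min or $last by swapping the accumulator function.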
To implement a normalized database structure in MongoDB, we use the concept of Referenced Relationships, also referred to as Manual References, in which we manually store the referenced document's id inside another document. However, in cases where a document contains references from different collections, we can use MongoDB DBRefs.
DBREFS VS MANUAL REFERENCES
As an example scenario where we would use DBRefs instead of Manual References, consider a
database where we are storing different types of addresses (home, office, mailing, etc) in different
collections (address_home, address_office, address_mailing, etc). Now, when a user collection's
document references an address, it also needs to specify which collection to look into based on the
address type. In such scenarios where a document references documents from many collections,
we should use DBRefs.
USING DBREFS
There are three fields in DBRefs:
$ref: This field specifies the collection of the referenced document
$id: This field specifies the _id field of the referenced document
$db: This is an optional field and contains the name of the database in which the referenced document resides
Consider a sample user document having DBRef field address as shown below:
{
"_id":ObjectId("53402597d852426020000002"),
"address": {
"$ref": "address_home",
"$id": ObjectId("534009e4d852427820000002"),
"$db": "test"},
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin"
}
The address DBRef field here specifies that the referenced address document lies in the address_home collection under the test database and has an id of 534009e4d852427820000002.
The following code dynamically looks in the collection specified by the $ref parameter (address_home in our case) for a document with the id specified by the $id parameter in the DBRef.
>var user = db.users.findOne({"name":"Tom Benzamin"})
>var dbRef = user.address
>db[dbRef.$ref].findOne({"_id":(dbRef.$id)})
The above code returns the following address document present in the address_home collection:
{
"_id" : ObjectId("534009e4d852427820000002"),
"building" : "22 A, Indiana Apt",
"pincode" : 123456,
"city" : "Los Angeles",
"state" : "California"
}
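The lookup logic can also be sketched outside the shell. The plain-JavaScript example below models the database as an object of arrays and follows a DBRef-shaped reference by hand. The db object and the resolveRef helper are hypothetical, ids are plain strings rather than ObjectId values, and the $db field is ignored because only one database is modelled.

```javascript
// Hypothetical in-memory database: collection name -> array of documents.
const db = {
  users: [
    {
      _id: "53402597d852426020000002",
      name: "Tom Benzamin",
      address: { $ref: "address_home", $id: "534009e4d852427820000002", $db: "test" }
    }
  ],
  address_home: [
    { _id: "534009e4d852427820000002", building: "22 A, Indiana Apt", city: "Los Angeles" }
  ],
  address_office: []
};

// Follow a DBRef-shaped object: $ref picks the collection, $id picks the document.
// The optional $db field is ignored because this sketch has a single database.
function resolveRef(database, ref) {
  const collection = database[ref.$ref] || [];
  return collection.find(doc => doc._id === ref.$id) || null;
}

const user = db.users.find(u => u.name === "Tom Benzamin");
const address = resolveRef(db, user.address);
console.log(address.city);
```

Because the collection name travels with the reference, the same helper works whether the address lives in address_home, address_office or any other collection.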
Advanced Indexing:
Consider the following document of users collection:
{
"address": {
"city": "Los Angeles",
"state": "California",
"pincode": "123"
},
"tags": [
"music",
"cricket",
"blogs"
],
"name": "Tom Benzamin"
}
Here, if we want to search user documents based on the tags field, we create an index on the tags array in the collection. Creating an index on an array in turn creates separate index entries for each of its elements. So in our case, when we create an index on the tags array, separate index entries will be created for its values music, cricket and blogs.
To create an index on tags array, use the following code:
>db.users.ensureIndex({"tags":1})
After creating the index, we can search on the tags field of the collection like this:
>db.users.find({tags:"cricket"})
To verify that proper indexing is used, use the following explain command:
>db.users.find({tags:"cricket"}).explain()
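The idea of one index entry per array element can be pictured with a small in-memory structure. The sketch below is plain JavaScript with hypothetical sample data and a hypothetical buildMultikeyIndex helper. It shows the concept only and says nothing about how MongoDB stores its indexes internally.

```javascript
// Hypothetical sample documents; _id values are illustrative.
const users = [
  { _id: 1, name: "Tom Benzamin", tags: ["music", "cricket", "blogs"] },
  { _id: 2, name: "Another User", tags: ["cricket", "movies"] }
];

// Build one index entry per array element: tag -> list of matching _id values.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(users, "tags");
console.log(tagIndex.get("cricket")); // ids of documents tagged "cricket"
console.log(tagIndex.get("music"));
```

A search for a single tag then becomes a direct lookup in the index instead of a scan over every document's tags array.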
INDEXING SUB-DOCUMENT FIELDS:
Suppose that we want to search documents based on the city, state and pincode fields. Since all these fields are part of the address sub-document field, we will create an index on all the fields of the sub-document.
To create an index on all three fields of the sub-document, use the following code:
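Fields inside a sub-document are addressed with dot notation, for example "address.city". The sketch below is plain JavaScript: the shell call in the comment is an assumed form that follows the ensureIndex() pattern used earlier in this document, and the getPath helper is hypothetical and only shows how a dotted path maps to a nested value.

```javascript
// Assumed mongo shell form, following the ensureIndex() usage shown earlier:
//   db.users.ensureIndex({"address.city":1, "address.state":1, "address.pincode":1})

const user = {
  address: { city: "Los Angeles", state: "California", pincode: "123" },
  name: "Tom Benzamin"
};

// Hypothetical helper: resolve a dotted path such as "address.city".
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

// The compound index key for a document is the tuple of its indexed field values.
const indexedFields = ["address.city", "address.state", "address.pincode"];
const indexKey = indexedFields.map(path => getPath(user, path));
console.log(indexKey);
```

A query on "address.city" alone, or on city and state together, uses the same dotted field names.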
Sometimes it is important to find a document whose content is known when the collection and field name are not. In such cases, searching the database means querying all the collections. Creating a text index on the fields of a collection makes such queries easy to run against the index. Text Search uses stemming techniques to look for the specified words in string fields, dropping stop words like a, an, the, etc. At present, MongoDB supports around 15 languages.
CREATING TEXT INDEX:
Consider the following document under the posts collection containing the post text and its tags:
{
"post_text": "enjoy the mongodb articles by tharun",
"tags": ["mongodb","tharun"]
}
We will create a text index on post_text field so that we can search inside our posts' text:
>db.posts.ensureIndex({post_text:"text"})
USING TEXT INDEX:
Now that we have created the text index on the post_text field, we will search for all the posts that have the word tharun in their text.
>db.posts.find({$text:{$search:"tharun"}})
The above command returned the following result documents for the search term 'tharun':
{
"_id" : ObjectId("53493d14d852429c10000002"),
"post_text" : "enjoy the mongodb articles by tharun",
"tags" : [ "mongodb", "tharun" ]
}
{
"_id" : ObjectId("53493d1fd852429c10000003"),
"post_text" : "writing tutorials on mongodb",
"tags" : [ "mongodb", "tutorial" ]
}
If you are using old versions of MongoDB, you have to use the following command:
>db.posts.runCommand("text",{search:" tharun "})
Using Text Search highly improves the search efficiency as compared to normal search.
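Why an index speeds up word searches can be seen in a small in-memory sketch. The plain-JavaScript example below tokenizes each post, drops a few stop words and builds an inverted index from words to posts. The stop-word list, the second sample post and the helper names are assumptions made for illustration, and stemming is left out entirely.

```javascript
// A small, assumed stop-word list; MongoDB's real lists are language specific.
const STOP_WORDS = new Set(["a", "an", "the", "by", "on", "of", "and"]);

// Split text into lower-case words and drop stop words (no stemming here).
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(word => word.length > 0 && !STOP_WORDS.has(word));
}

// Build an inverted index: word -> set of document positions.
function buildTextIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    for (const word of tokenize(doc[field])) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(position);
    }
  });
  return index;
}

const posts = [
  { post_text: "enjoy the mongodb articles by tharun" },
  { post_text: "an introduction to indexing" } // hypothetical second post
];
const textIndex = buildTextIndex(posts, "post_text");

// Look up a search term without scanning every post.
function search(term) {
  const hits = textIndex.get(term.toLowerCase()) || new Set();
  return [...hits].map(position => posts[position].post_text);
}

console.log(search("tharun"));
console.log(search("the")); // stop words are not indexed, so nothing is found
```

Each search is a single lookup in the index, whereas a normal search would have to read the text of every post.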
DELETING TEXT INDEX:
To delete an existing text index, first find the name of the index using the following query:
>db.posts.getIndexes()
After getting the name of your index from the above query, run the following command. Here, post_text_text is the name of the index.
>db.posts.dropIndex("post_text_text")
Indexing Limitations
EXTRA OVERHEAD:
Every index occupies some space and adds overhead to each insert, update and delete. So if you rarely use your collection for read operations, it makes sense not to use indexes.
RAM USAGE:
Since indexes are kept in RAM for fast access, make sure that the total size of the indexes does not exceed the available RAM. If the total size exceeds the available RAM, parts of the indexes have to be read from disk, causing a loss of performance.
QUERY LIMITATIONS:
Indexes cannot be used effectively in queries that use:
Regular expressions or negation operators like $nin, $not, etc.
Arithmetic operators like $mod, etc.
$where clause
Hence, it is always advisable to check the index usage for your queries.
INDEX KEY LIMITS:
Starting from version 2.6, MongoDB will not create an index if the value of an existing index field exceeds the index key limit.
INSERTING DOCUMENTS EXCEEDING INDEX KEY LIMIT:
MongoDB will not insert any document into an indexed collection if the indexed field value of this document exceeds the index key limit. The same is the case with the mongorestore and mongoimport utilities.
MAXIMUM RANGES:
A collection cannot have more than 64 indexes.
The length of the index name cannot be longer than 125 characters
A compound index can have maximum 31 fields indexed
Transactions In MongoDB:
MongoDB does not support multi-document atomic transactions. However, it does provide
atomic operations on a single document. So if a document has hundred fields the update statement
will either update all the fields or none, hence maintaining atomicity at document-level.
MODEL DATA FOR ATOMIC OPERATIONS
The recommended approach to maintain atomicity would be to keep all the related information
which is frequently updated together in a single document using embedded documents. This
would make sure that all the updates for a single document are atomic.
Consider the following products document:
{
"_id":1,
"product_name": "Samsung S3",
"category": "mobiles",
"product_total": 5,
"product_available": 3,
"product_bought_by": [
{
"customer": "john",
"date": "7-Jan-2014"
},
{
"customer": "mark",
"date": "8-Jan-2014"
}
]
}
In this document, we have embedded the information of the customers who buy the product in the product_bought_by field. Now, whenever a new customer buys the product, we first check whether the product is still available using the product_available field. If it is available, we reduce the value of the product_available field and insert the new customer's embedded document in the product_bought_by field. We use the findAndModify command for this functionality because it finds and updates the document in a single atomic operation.
Map-reduce is a data processing paradigm for condensing large volumes of data into useful
aggregated results. MongoDB uses mapReduce command for map-reduce operations.
MapReduce is generally used for processing large data sets.
Following is the syntax of the basic mapReduce command:
>db.collection.mapReduce(
function() {emit(key,value);}, //map function
function(key,values) {return reduceFunction}, //reduce function
{
out: collection,
query: document,
sort: document,
limit: number
}
)
The map-reduce function first queries the collection, then maps the result documents to emit key-value pairs, which are then reduced based on the keys that have multiple values.
In the above syntax:
map is a JavaScript function that maps a value to a key and emits a key-value pair
reduce is a JavaScript function that reduces or groups all the documents having the same key
out specifies the location of the map-reduce query result
query specifies the optional selection criteria for selecting documents
sort specifies the optional sort criteria
limit specifies the optional maximum number of documents to be returned
USING MAPREDUCE:
Consider the following document structure storing user posts. The document stores the user_name of the user and the status of the post.
{
"post_text": "tutorialspoint is an awesome website for tutorials",
"user_name": "mark",
"status":"active"
}
Now, we will use a mapReduce function on our posts collection to select all the active posts, group
them on the basis of user_name and then count the number of posts by each user using the
following code:
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)},
{
query:{status:"active"},
out:"post_total"
}
)
The above mapReduce query outputs the following result:
{
"result" : "post_total",
"timeMillis" : 9,
"counts" : {
"input" : 4,
"emit" : 4,
"reduce" : 2,
"output" : 2
},
"ok" : 1
}
The result shows that a total of 4 documents matched the query (status:"active"), the map function
emitted 4 documents with key-value pairs and finally the reduce function grouped mapped
documents having the same keys into 2.
To see the result of this mapReduce query, use the find operator:
>db.posts.mapReduce(
function() { emit(this.user_name,1); },
function(key, values) {return Array.sum(values)},
{
query:{status:"active"},
out:"post_total"
}
).find()
The above query gives the following result, which indicates that both users tom and mark have two posts in the active state:
{ "_id" : "tom", "value" : 2 }
{ "_id" : "mark", "value" : 2 }
In similar manner, MapReduce queries can be used to construct large complex aggregation
queries. The use of custom Javascript functions makes usage of MapReduce very flexible and
powerful.
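The query, map, group and reduce phases can be traced in a few lines of plain JavaScript. The sketch below is an in-memory imitation, not MongoDB's implementation: the mapReduce helper and the sample posts are hypothetical, and emit is passed to the map function as an argument instead of being a global as it is in the shell.

```javascript
// Hypothetical sample posts; two active posts each for tom and mark.
const posts = [
  { user_name: "tom", status: "active" },
  { user_name: "tom", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "active" },
  { user_name: "mark", status: "disabled" }
];

// Minimal in-memory map-reduce: filter, map (emit), group by key, reduce.
function mapReduce(docs, map, reduce, options = {}) {
  const query = options.query || (() => true);
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs.filter(query)) {
    map.call(doc, emit); // "this" is the current document, as in a MongoDB map function
  }
  return [...emitted].map(([key, values]) => ({
    _id: key,
    // Reduce is only needed for keys that have multiple values.
    value: values.length > 1 ? reduce(key, values) : values[0]
  }));
}

const postTotal = mapReduce(
  posts,
  function (emit) { emit(this.user_name, 1); },
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  { query: doc => doc.status === "active" }
);
console.log(postTotal);
```

The disabled post is filtered out before the map phase, so each user ends up with a count of two active posts.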
Operators, Database Commands and Mongo Shell Methods
For a reference to all the operators, database commands and mongo shell methods used in MongoDB, please refer to the following links. They contain complete information along with a few examples.