1
NoSQL & MongoDB Part I
Arindam Chatterjee
Jan 15, 2015
2
Introduction to NoSQL
• NoSQL stands for “Not only SQL”
• NoSQL is
– A class of non-relational database management systems
– Different from traditional relational database systems
– Designed for distributed data storage that
• typically does not require a fixed schema,
• avoids join operations and
• scales horizontally
• Used by Facebook, Google and other applications that handle large volumes of unstructured web application data
3
History of NoSQL
• RDBMS systems have limitations with respect to the following
– Scalability,
– Parallelization
– Cost
• Example: Google receives billions of requests a month across geographically distributed applications.
• These limitations led to research on the following systems
– GFS: Distributed file system
– Chubby: Distributed coordination system
– MapReduce: Parallel execution system
– Bigtable: Column-oriented database
4
NoSQL..Where to use
• NoSQL is useful in the following cases
– Online stores and portals like Amazon, where the transaction of an individual should not “lock” a database or part of a database
– Where a “committed” transaction is not critical (e.g. a buyer orders an item and someone else clicks on the same item at the same time; one of them may end up not getting the item if it is the last piece left. An “apology” mail and a refund can sort the matter out.)
– Where cost is a constraint
• NoSQL SHOULD NOT be used in the following cases
– Stock exchanges or banking, where transactions are critical and cached or stale data will just not work
– Other non-financial applications where completion of transactions is critical
5
Benefits of NoSQL
• Schemaless data representation:
– Almost all NoSQL implementations offer schema-less data representation. This means that we do not have to think too far ahead to define a structure and we can continue to evolve over time, including adding new fields or even nesting the data, for example, in case of JSON representation.
• Development time:
– Reduced development time, because one doesn’t have to deal with complex SQL queries and joins
• Speed:
– NoSQL databases are often faster than relational databases for simple reads and writes, because they avoid joins and relax transactional guarantees
• Ability to plan ahead for scalability:
– Applications can be quite elastic and can handle sudden spikes in load.
– Provides horizontal scalability and partitioning to new servers
6
List of NoSQL Databases
• Document Based: MongoDB, CouchDB, RavenDB, Terrastore
• Key Value: Redis, Membase, Voldemort
• XML based: BaseX, eXist
• Column based: BigTable, Hadoop/HBase, Cassandra, SimpleDB, Cloudera
• Graph based: Neo4j, FlockDB, InfiniteGraph
7
Storage Types for NoSQL Databases
• Column Oriented storage
– Data stored as columns as opposed to rows (in traditional RDBMS)
– Used for On Line Analytical Processing type databases
• Example: We want to store the following information
Employee ID   First Name   Last Name   Dept
1234          Asim         Das         HR
3242          Noel         David       Marketing
5678          Raj          Malhotra    Production
4543          Rohan        Singh       R&D
• Traditional RDBMS Approach: data serialized as follows
1234, Asim, Das, HR
3242, Noel, David, Marketing
5678, Raj, Malhotra, Production
4543, Rohan, Singh, R&D
• Column Oriented Approach: data stored as follows
1234, 3242, 5678, 4543
Asim, Noel, Raj, Rohan
Das, David, Malhotra, Singh
HR, Marketing, Production, R&D
• Advantages:
– New column can be added without worrying about filling up default values of existing rows
– Efficient for computing maxima, minima, averages and sums, specifically on large datasets
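The two layouts above can be sketched in plain JavaScript. This is only an illustration of the idea, not how any particular database stores data on disk; the function names serializeRows and serializeColumns and the short field names are made up for this sketch.

```javascript
// Employee data from the slide; the short field names are an assumption.
const employees = [
  { id: 1234, first: "Asim", last: "Das", dept: "HR" },
  { id: 3242, first: "Noel", last: "David", dept: "Marketing" },
  { id: 5678, first: "Raj", last: "Malhotra", dept: "Production" },
  { id: 4543, first: "Rohan", last: "Singh", dept: "R&D" },
];
const fields = ["id", "first", "last", "dept"];

// Row oriented: all fields of one record are stored next to each other.
function serializeRows(records) {
  return records.map((r) => fields.map((f) => r[f]).join(", "));
}

// Column oriented: all values of one field are stored next to each other.
function serializeColumns(records) {
  return fields.map((f) => records.map((r) => r[f]).join(", "));
}

const rowLayout = serializeRows(employees);
const columnLayout = serializeColumns(employees);
console.log(rowLayout);
console.log(columnLayout);
```

In the column layout, an aggregate over one field only needs to read one of the four lines, which is why this layout suits analytical queries.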
8
Storage Types for NoSQL Databases..2
• Document Oriented storage
– Allows the inserting, retrieving, and manipulating of semi-structured data
– Documents themselves act as records (or rows)
– Two records may have completely different sets of fields or columns
– The records may or may not adhere to a specific schema
– Most of the databases available under this category use XML, JSON or BSON data types
• Example: Different records contain different levels of Employee information as follows
Record 1
{ "EmployeeID": "SM1", "FirstName": "Anuj", "LastName": "Sharma", "Age": 45, "Salary": 10000000 }
Record 2
{
  "EmployeeID": "MM2",
  "FirstName": "Anand",
  "Age": 34,
  "Salary": 5000000,
  "Address": { "Line1": "123, 4th Street", "City": "Bangalore", "State": "Karnataka" },
  "Projects": [ "nosql-migration", "top-secret-007" ]
}
9
Storage Types for NoSQL Databases..3
• Key value storage
– Similar to document oriented storage with the following differences
– Unlike a document store that can create a key when a new document is inserted, a key-value store requires the key to be specified
– Unlike a document store where the value can be indexed and queried, for a key-value store, the value is opaque and as such, the key must be known to retrieve the value
• Advantages:
– Key-value stores are optimized for querying against keys.
– They serve as great in-memory caches.
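The difference between the two store types can be illustrated with a small plain-JavaScript sketch. It uses a Map as a stand-in for a key-value store and an array as a stand-in for a document collection; the key format "emp:SM1" is an assumption made for this example.

```javascript
// Key-value store: the value is an opaque blob, so the only lookup is by key.
const kvStore = new Map();
kvStore.set("emp:SM1", JSON.stringify({ FirstName: "Anuj", Age: 45 }));
kvStore.set("emp:MM2", JSON.stringify({ FirstName: "Anand", Age: 34 }));
const byKey = JSON.parse(kvStore.get("emp:MM2"));

// Document store: the store can look inside the value, so it can query on fields.
const documents = [
  { EmployeeID: "SM1", FirstName: "Anuj", Age: 45 },
  { EmployeeID: "MM2", FirstName: "Anand", Age: 34 },
];
const olderThan40 = documents.filter((doc) => doc.Age > 40);

console.log(byKey, olderThan40);
```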
10
RDBMS vs NoSQL
RDBMS
• Structured and organized data
• Structured query language (SQL)
• Data and its relationships are stored in separate tables
• Follows ACID rules
• Data Manipulation Language, Data Definition Language
• Tight consistency
NoSQL
• No declarative query language
• No predefined schema
• Key-value pair storage, column store, document store, graph databases
• Eventual consistency rather than ACID properties
• Unstructured and unpredictable data
• CAP theorem
• Prioritizes high performance, high availability and scalability
ACID: Atomic, Consistent, Isolated, Durable
BASE: Basically Available, Soft state, Eventual consistency
CAP: Consistency, Availability, Partition tolerance
11
CAP
• CAP theorem (Brewer’s Theorem): Three basic requirements which exist in a special relation when designing applications for a distributed architecture.
– Consistency: The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
– Availability: The system is always on (the service guarantees availability), with no downtime.
– Partition Tolerance: The system continues to function even when the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
• It is theoretically impossible to fulfill all 3 requirements C, A and P at the same time
• According to CAP, a distributed system can satisfy at most 2 of the 3 requirements
• Therefore the current NoSQL databases follow different combinations of C, A and P from the CAP theorem.
12
BASE
• A BASE system gives up strict consistency in exchange for availability
– Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
– Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
– Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
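Eventual consistency can be pictured with a toy model of two replicas. This is a deliberately simplified sketch in plain JavaScript, not how any real database replicates data; the names replicaA, replicaB, write and replicate are made up for the illustration.

```javascript
// Toy model: a write lands on one replica first and is copied to the other later.
const replicaA = new Map();
const replicaB = new Map();
const pending = []; // replication log not yet applied to replicaB

function write(key, value) {
  replicaA.set(key, value);
  pending.push([key, value]);
}

function replicate() {
  while (pending.length > 0) {
    const [key, value] = pending.shift();
    replicaB.set(key, value);
  }
}

write("stock:card", 14);
const staleRead = replicaB.get("stock:card"); // replicaB has not caught up yet
replicate();
const freshRead = replicaB.get("stock:card"); // both replicas now agree

console.log(staleRead, freshRead);
```

Between write() and replicate() a client reading from replicaB sees old state; once no new writes arrive and replication has run, both replicas agree.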
13
MongoDB
14
MongoDB
• Open Source database written in C++.
• Document Oriented database
– Example format: FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rihit", Age:8}]
• Used to store data for very high performance applications with unforeseen growth in data
– If load increases (more storage space or more processing power is needed), the data can be distributed to other nodes across computer networks (sharding)
• MongoDB supports the Map/Reduce framework for batch processing of data and aggregation operations
– Map: A master node takes an input, splits it into smaller sections and sends them to the associated nodes. These nodes may in turn split their section further and send the pieces to other nodes. Each node processes its piece of the problem and sends the result back to the master node.
– Reduce: The master node aggregates those results to produce the output.
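The Map and Reduce steps described above can be sketched in plain JavaScript on a single machine. This is not MongoDB's mapReduce API, only a model of the split / process / aggregate idea; the function names and the sample orders are assumptions made for this sketch.

```javascript
// Split the input into chunks, as the master node would before handing work out.
function splitIntoChunks(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Map step: each "node" computes a partial sum for its own chunk.
function mapChunk(chunk) {
  return chunk.reduce((sum, doc) => sum + doc.qty, 0);
}

// Reduce step: the master combines the partial results.
function reducePartials(partials) {
  return partials.reduce((total, partial) => total + partial, 0);
}

const orders = [{ qty: 5 }, { qty: 10 }, { qty: 15 }, { qty: 20 }, { qty: 25 }];
const partialSums = splitIntoChunks(orders, 2).map(mapChunk);
const totalQty = reducePartials(partialSums);

console.log(partialSums, totalQty);
```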
15
RDBMS vs. MongoDB
RDBMS          MongoDB
Record/Row     Document/Object
Table          Collection
Column         Key, field
Value          Value
Index          Index
Table join     Embedded documents and linking
16
NoSQL operations in MongoDB
• Creating Table (Collections)
SQL statement
CREATE TABLE users (
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
)
MongoDB statement
db.users.insert( { user_id: "abc123", age: 55, status: "A" } )
Alternatively,
db.createCollection("users")
In MongoDB, collections are implicitly created on the first insert() operation. The primary key _id is automatically added if the _id field is not specified.
Reference: See insert() and db.createCollection() for more information.
17
NoSQL operations in MongoDB
• Altering Table (Collections)
SQL statement
Adding a column
ALTER TABLE users ADD join_date DATETIME
Dropping a column
ALTER TABLE users DROP COLUMN join_date
MongoDB statement
Adding a field
db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } )
Dropping a field
db.users.update( { }, { $unset: { join_date: "" } }, { multi: true } )
Collections do not describe or enforce the structure of their documents; i.e. there is no structural alteration at the collection level for adding/removing fields
Reference: See the Data Modeling Considerations for MongoDB Applications, update(), $set and $unset for more information
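What adding and dropping a field across all documents amounts to can be modelled in plain JavaScript. The helpers setField and unsetField are made up for this sketch and only imitate $set and $unset with an empty filter and multi: true; they are not MongoDB functions.

```javascript
const users = [
  { user_id: "abc123", age: 55, status: "A" },
  { user_id: "abc001", age: 35, status: "U" },
];

// Like $set with an empty filter and multi: true: add the field to every document.
function setField(docs, field, value) {
  docs.forEach((doc) => {
    doc[field] = value;
  });
}

// Like $unset: delete the field from every document.
function unsetField(docs, field) {
  docs.forEach((doc) => {
    delete doc[field];
  });
}

setField(users, "join_date", new Date());
const hasJoinDate = users.every((doc) => "join_date" in doc);
unsetField(users, "join_date");
const stillHasJoinDate = users.some((doc) => "join_date" in doc);

console.log(hasJoinDate, stillHasJoinDate);
```

Note that only the documents change; nothing resembling a table definition is altered.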
18
NoSQL operations in MongoDB
• INSERT and SELECT operations
SQL statement
Inserting data
INSERT INTO users(user_id, age, status) VALUES ("abc001", 35, "U")
SELECT operation
SELECT * FROM users WHERE status = "A"
MongoDB statement
Inserting data
db.users.insert( { user_id: "abc001", age: 35, status: "U" } )
Find operation
db.users.find( { status: "A" } )
Reference: See insert() and find() for more information
Use pretty() to display data in a formatted way: db.users.find().pretty();
19
NoSQL operations in MongoDB
• UPDATE and DELETE operations
SQL statement
UPDATE
UPDATE users SET status = "C" WHERE age > 10
UPDATE users SET age = age + 5 WHERE status = "U"
DELETE
DELETE FROM users WHERE status = "D"
MongoDB statement
UPDATE
db.users.update( { age: { $gt: 10 } }, { $set: { status: "C" } }, { multi: true } )
db.users.update( { status: "U" }, { $inc: { age: 5 } }, { multi: true } )
REMOVE
db.users.remove( { status: "D" } )
Reference: See update(), $gt, $inc, $set and remove() for more information
20
More examples in MongoDB
• Run the database (Windows):
– Open Command prompt
– Go to the bin folder of the MongoDB-specific directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-10-04/bin)
– Run mongod.exe
• Connect to the database:
– Open Command prompt
– Go to the bin folder of the MongoDB-specific directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-10-04/bin)
– Run mongo
– A mongo “shell” will open
• Show database: show dbs
• Select a database: use <database name>
21
More examples in MongoDB..2
• Switch to database testData (use testData;)
• Task 1: Insert data directly: The following operation inserts a row/document in the collection testData
– db.testData.insert({ name : "OtherDB" } );
• Task 2: Insert data with JavaScript operations: The following operations insert 2 rows/documents in the collection testData
j = { name : "mongo" }
k = { x : 3 }
db.testData.insert( j );
db.testData.insert( k );
• Task 3: Check that the 3 records were inserted in the collection testData
– db.testData.find();
22
More examples in MongoDB..3
Inserting multiple documents using a for loop
• Task: Use the following loop from the mongo shell
– for (var i = 1; i <= 25; i++) db.testData.insert( { x : i } )
• Use find() to see the result. The mongo shell displays the first 20 documents; type it to see the rest
– db.testData.find()
Note: If the collection and database do not exist, MongoDB creates them implicitly before inserting documents.
23
More examples in MongoDB..4
Queries with conditions
• Task: In the above example, 25 rows were created. We want to show the rows where x is less than 15, and limit the display to the first 5 rows.
– db.testData.find( { "x": { $lt: 15 } } ).limit(5)
– { "x": { $lt: 15 } } is the condition: x < 15
– limit(5) limits the result to 5 rows
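The effect of the condition and the limit can be reproduced in plain JavaScript on an in-memory array. This sketch assumes the documents come back in insertion order and that the collection holds only the 25 documents from the loop; filter() and slice() stand in for find() and limit().

```javascript
// Same data the for loop on the previous slide inserts: { x: 1 } ... { x: 25 }.
const testData = [];
for (let i = 1; i <= 25; i++) {
  testData.push({ x: i });
}

// find({ x: { $lt: 15 } }) keeps matching documents; limit(5) keeps the first five.
const matching = testData.filter((doc) => doc.x < 15);
const firstFive = matching.slice(0, 5);

console.log(matching.length, firstFive);
```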
24
More examples in MongoDB..5
Inserting with explicit “_id”
• Task: Insert a record in a collection named “inventory” with explicit _id, type, item and quantity.
– db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } );
– Here _id: 10 is the explicit ID
Inserting with update() method
• Call the update() method with the upsert flag to create a new document if no document matches the update’s query criteria.
– db.inventory.update(
    { type: "book", item : "journal" },
    { $set : { qty: 10 } },
    { upsert : true }
  );
The above example creates a new document if no document in the inventory collection contains { type: "book", item : "journal" }, and assigns it a unique _id
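The upsert behaviour ("update if found, insert otherwise") can be modelled with a small plain-JavaScript function. The upsert helper and the nextId counter are made up for this sketch; in MongoDB the generated _id would be an ObjectId, not a number.

```javascript
const inventory = [{ _id: 10, type: "misc", item: "card", qty: 15 }];
let nextId = 11; // stand-in for the ObjectId MongoDB would generate

function upsert(docs, query, fieldsToSet) {
  // Find the first document whose fields equal every field in the query.
  const match = docs.find((doc) =>
    Object.keys(query).every((key) => doc[key] === query[key])
  );
  if (match) {
    Object.assign(match, fieldsToSet); // document exists: update it
    return "updated";
  }
  // No match: insert a new document built from the query and the $set fields.
  docs.push({ _id: nextId++, ...query, ...fieldsToSet });
  return "inserted";
}

const first = upsert(inventory, { type: "book", item: "journal" }, { qty: 10 });
const second = upsert(inventory, { type: "book", item: "journal" }, { qty: 12 });

console.log(first, second, inventory);
```

The first call finds nothing and inserts; the second call finds the document created by the first and only changes qty.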
25
More examples in MongoDB..6
Inserting with save() method
• To insert a document with the save() method, pass the method a document that does not contain the _id field, or a document that contains an _id value that does not exist in the collection.
– db.inventory.save( { type: "book", item: "notebook", qty: 40 } )
The above example creates a new document in the inventory collection, adds the _id field and assigns it a unique value
26
More examples in MongoDB..6
Conditional queries
• Task: Select all documents in the inventory collection where the value of the type field is either 'food' or 'snacks'
– db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } );
• Task: “AND” condition: specify an equality match on the field type AND a less than ($lt) comparison match on the field price
– db.inventory.find( { type: 'food', price: { $lt: 9.95 } } );
• Task: “OR” condition: the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 OR the value of the price field is less than ($lt) 9.95
– db.inventory.find(
    { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] }
  );
27
More examples in MongoDB..7
Compound queries (using “AND” and “OR” both)
• Task: Select all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:
– db.inventory.find( {
    type: 'food',
    $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ]
  } );
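The meaning of $in, the implicit “AND” and $or can be checked against a small in-memory array in plain JavaScript. The sample items are invented for this sketch, and filter() with ordinary boolean expressions stands in for find() with a query document.

```javascript
// Made-up sample data for illustration.
const items = [
  { item: "apple", type: "food", qty: 150, price: 12.5 },
  { item: "bread", type: "food", qty: 20, price: 4.0 },
  { item: "chips", type: "snacks", qty: 300, price: 2.5 },
  { item: "pen", type: "misc", qty: 500, price: 1.0 },
];

// { type: { $in: ['food', 'snacks'] } }
const foodOrSnacks = items.filter((d) => ["food", "snacks"].includes(d.type));

// { type: 'food', price: { $lt: 9.95 } }  (fields in one query document are ANDed)
const cheapFood = items.filter((d) => d.type === "food" && d.price < 9.95);

// { type: 'food', $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] }
const compound = items.filter(
  (d) => d.type === "food" && (d.qty > 100 || d.price < 9.95)
);

console.log(foodOrSnacks.length, cheapFood.length, compound.length);
```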
28
More examples in MongoDB..8
Matching on “subdocuments”
• When the field holds an embedded document (i.e. subdocument), we can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using “dot” notation, to specify values for individual fields in the subdocument.
• In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:
– db.inventory.find(
    { producer: { company: 'ABC123', address: '123 Street' } }
  );
• In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields
– db.inventory.find( { 'producer.company': 'ABC123' } );
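The difference between matching a whole subdocument and using dot notation can be seen in a plain-JavaScript model. The sample products are invented, and JSON.stringify is used here only as a simple order-sensitive comparison; it is an assumption of this sketch, not how MongoDB compares documents internally.

```javascript
// Made-up sample data: same company, but different field order or extra fields.
const products = [
  { item: "A", producer: { company: "ABC123", address: "123 Street" } },
  { item: "B", producer: { address: "123 Street", company: "ABC123" } },
  { item: "C", producer: { company: "ABC123", address: "123 Street", phone: "555" } },
];

// Whole-subdocument match: same fields, same values, same order.
const wanted = JSON.stringify({ company: "ABC123", address: "123 Street" });
const exact = products.filter((d) => JSON.stringify(d.producer) === wanted);

// Dot notation ('producer.company'): only the named field has to match.
const byCompany = products.filter(
  (d) => d.producer && d.producer.company === "ABC123"
);

console.log(exact.map((d) => d.item), byCompany.map((d) => d.item));
```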
29
More examples in MongoDB..9
Matching on Arrays
• To specify an equality match on an array, use the query document { <field>: <value> } where <value> is the array to match. Equality matches on the array require that the array field match exactly the specified <value>, including the element order.
• Exact Match: In the following example, the query matches all documents where the value of the field tags is an array that holds exactly three elements, 'fruit', 'food', and 'citrus', in this order:
– db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } );
• Matching Array Elements: In the following example, the query matches all documents where the value of the field tags is an array that contains 'fruit' as one of its elements:
– db.inventory.find( { tags: 'fruit' } );
• In the following example, the query uses the dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit'.
– db.inventory.find( { 'tags.0' : 'fruit' } )
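The three kinds of array match can be compared side by side in plain JavaScript. The sample items are invented for this sketch, and the filter() expressions only imitate what the corresponding query documents select.

```javascript
// Made-up sample data for illustration.
const items = [
  { item: "orange", tags: ["fruit", "food", "citrus"] },
  { item: "lemon", tags: ["food", "fruit", "citrus"] },
  { item: "rice", tags: ["food"] },
];
const wanted = ["fruit", "food", "citrus"];

// { tags: [ 'fruit', 'food', 'citrus' ] }: same elements in the same order.
const exact = items.filter(
  (d) => d.tags.length === wanted.length && d.tags.every((t, i) => t === wanted[i])
);

// { tags: 'fruit' }: the array contains the value somewhere.
const contains = items.filter((d) => d.tags.includes("fruit"));

// { 'tags.0': 'fruit' }: the first element equals the value.
const firstIs = items.filter((d) => d.tags[0] === "fruit");

console.log(exact.length, contains.length, firstIs.length);
```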
30
More examples in MongoDB..10
Array of subdocuments
• Match a Field in the Subdocument Using the Array Index: The following example selects all documents where the memos field contains an array whose first element (i.e. index is 0) is a subdocument with the field by with the value 'shipping':
– db.inventory.find( { 'memos.0.by': 'shipping' } )
• Match a Field without specifying Array Index: The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':
– db.inventory.find( { 'memos.by': 'shipping' } )
• Match multiple Fields: The following example uses dot notation to query for documents where the memos array has at least one subdocument with the field memo equal to 'on time' and at least one subdocument (not necessarily the same one) with the field by equal to 'shipping'. Use $elemMatch to require both conditions on the same subdocument:
– db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
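How several dot-notation conditions on an array of subdocuments combine can be shown with a plain-JavaScript model. The sample items are invented; the second filter imitates what an $elemMatch query would require, namely that a single array element satisfies both conditions.

```javascript
// Made-up sample data: in B the two values sit in different array elements.
const items = [
  { item: "A", memos: [{ memo: "on time", by: "shipping" }] },
  {
    item: "B",
    memos: [
      { memo: "on time", by: "billing" },
      { memo: "late", by: "shipping" },
    ],
  },
];

// Dot-notation conditions are checked independently across the array.
const dotNotation = items.filter(
  (d) =>
    d.memos.some((m) => m.memo === "on time") &&
    d.memos.some((m) => m.by === "shipping")
);

// $elemMatch-style: one element must satisfy both conditions.
const sameElement = items.filter((d) =>
  d.memos.some((m) => m.memo === "on time" && m.by === "shipping")
);

console.log(dotNotation.map((d) => d.item), sameElement.map((d) => d.item));
```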
31
More examples in MongoDB..11
Using findOne()
db.collection.findOne(<criteria>, <projection>)
• The above returns one document that satisfies the specified query criteria. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on the disk.
• The <projection> parameter takes a document in the following form
– { field1: <boolean>, field2: <boolean> ... }
– Boolean can be 1 (true, to include) or 0 (false, to exclude)
• Example: Create a collection named bios with multiple fields. Return the “name”, “contribs” and “_id” fields (_id is included by default):
db.bios.findOne(
    { },
    { name: 1, contribs: 1 }
)
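The effect of an inclusion projection can be modelled with a small plain-JavaScript function. The project helper and the sample bio document are made up for this sketch; it only covers the inclusion form shown above, where _id comes along unless it is explicitly set to 0.

```javascript
// Inclusion projection: keep the listed fields, plus _id unless it is set to 0.
function project(doc, projection) {
  const result = {};
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const field of Object.keys(projection)) {
    if (field !== "_id" && projection[field] === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

// Made-up sample document for illustration.
const bio = { _id: 1, name: "Ada", contribs: ["notes"], title: "Countess" };
const withId = project(bio, { name: 1, contribs: 1 });
const withoutId = project(bio, { name: 1, _id: 0 });

console.log(withId, withoutId);
```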
32
Exercise I
• Go to database “test”
• Insert data in a collection named userdetails with the following attributes
{
  "user_id" : "ABCDBWN",
  "password" : "ABCDBWN",
  "date_of_join" : "15/10/2010",
  "education" : "B.C.A.",
  "profession" : "DEVELOPER",
  "interest" : "MUSIC",
  "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"],
  "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"],
  "community_members" : [500, 200, 1500],
  "friends_id" : ["MMM123", "NNN123", "OOO123"],
  "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]
}
• View the inserted data using find() and pretty()
• Insert another set of data in the same collection with the following
{
  "user_id" : "testuser",
  "password" : "testpassword",
  "date_of_join" : "16/10/2010",
  "education" : "M.C.A.",
  "profession" : "CONSULTANT",
  "interest" : "MUSIC",
  "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"],
  "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"],
  "community_members" : [500, 200, 1500],
  "friends_id" : ["MMM123", "NNN123", "OOO123"],
  "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]
}
33
Exercise I..contd
• Use update() to change password to "Newpd" and date_of_join to "12/12/2010" for user_id "ABCDBWN"
• Fetch only the "user_id" for all documents from the collection 'userdetails' which hold the educational qualification "M.C.A."
• Fetch the "user_id", "password" and "date_of_join" for all documents from the collection 'userdetails' which hold the educational qualification "M.C.A."
• Remove one record from the collection userdetails where user_id is "testuser"
• Remove the entire collection userdetails using drop()