
Nosql part1 8th December

Jan 15, 2015

RuruChowdhury

Weekend Business Analytics Praxis

NoSQL & MongoDB Part I

Arindam Chatterjee


Introduction to NoSQL

• NoSQL stands for “Not only SQL”

• NoSQL is

– a class of non-relational database management systems, not a variant of SQL

– different from traditional relational database systems

– designed for distributed data storage that

• typically does not require a fixed schema,

• avoids join operations and

• scales horizontally

• Used by Facebook, Google and other applications that handle large volumes of unstructured web application data


History of NoSQL

• RDBMS systems have limitations with respect to the following

– Scalability,

– Parallelization

– Cost

• Example: Google handles billions of requests a month across applications that are geographically distributed.

• These limitations led to research on the following concepts

– GFS: Distributed file system

– Chubby: Distributed coordination system

– MapReduce: Parallel execution system

– Bigtable: Column-oriented database


NoSQL..Where to use

• NoSQL is useful in the following cases

– Online stores and portals like Amazon, where the transaction of an individual should not “lock” a database or part of a database

– Where a “committed” transaction is not critical (e.g. a buyer orders an item and someone else clicks for the same item at the same time; one of them may end up not getting the item if it is the last piece left. An “apology” mail and a refund can sort the matter out.)

– Where cost is a concern

• NoSQL SHOULD NOT be used in the following cases

– Stock exchanges or banking, where transactions are critical and cached or stale data will just not work

– Other non-financial applications where completion of transactions is critical


Benefits of NoSQL

• Schemaless data representation:

– Almost all NoSQL implementations offer schema-less data representation. This means that we do not have to think too far ahead to define a structure and we can continue to evolve over time, including adding new fields or even nesting the data, for example, in case of JSON representation.

• Development time:

– Reduced development time, because one doesn’t have to deal with complex SQL queries and joins

• Speed:

– For simple reads and writes, NoSQL databases can be much faster than relational databases, because they avoid joins and strict transactional overhead

• Ability to plan ahead for scalability:

– The applications can be quite elastic, can handle sudden spikes of load.

– Provides horizontal scalability and partitioning to new servers


List of NoSQL Databases

• Document Based: MongoDB, CouchDB, RavenDB, Terrastore

• Key Value: Redis, Membase, Voldemort

• XML based: BaseX, eXist

• Column based: BigTable, Hadoop/HBase, Cassandra, SimpleDB, Cloudera

• Graph based: Neo4j, FlockDB, InfiniteGraph


Storage Types for NoSQL Databases

• Column Oriented storage

– Data stored as columns as opposed to rows (in traditional RDBMS)

– Used for On Line Analytical Processing type databases

• Example: We want to store the following information

Employee ID    First Name    Last Name    Dept
1234           Asim          Das          HR
3242           Noel          David        Marketing
5678           Raj           Malhotra     Production
4543           Rohan         Singh        R&D

Traditional RDBMS Approach

• Data serialized as follows
1234, Asim, Das, HR
3242, Noel, David, Marketing
5678, Raj, Malhotra, Production
4543, Rohan, Singh, R&D

Column Oriented Approach

• Data stored as follows
1234, 3242, 5678, 4543
Asim, Noel, Raj, Rohan
Das, David, Malhotra, Singh
HR, Marketing, Production, R&D

• Advantages:

– A new column can be added without worrying about filling in default values for existing rows

– Efficient for computing maxima, minima, averages and sums, specifically on large datasets
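The two layouts can be compared with a small plain-JavaScript sketch. It is only an in-memory illustration of the idea, not how any particular database stores data on disk; the short field names and the extra salary column are assumptions made for the example.

```javascript
// Employee rows from the example table.
const rows = [
  { id: 1234, first: "Asim", last: "Das", dept: "HR" },
  { id: 3242, first: "Noel", last: "David", dept: "Marketing" },
  { id: 5678, first: "Raj", last: "Malhotra", dept: "Production" },
  { id: 4543, first: "Rohan", last: "Singh", dept: "R&D" },
];

// Row-oriented layout: all fields of one record sit next to each other.
const rowLayout = rows.map((r) => [r.id, r.first, r.last, r.dept]);

// Column-oriented layout: all values of one field sit next to each other.
const columnLayout = {
  id: rows.map((r) => r.id),
  first: rows.map((r) => r.first),
  last: rows.map((r) => r.last),
  dept: rows.map((r) => r.dept),
};

// An aggregate over one column reads only that column's array.
const maxId = Math.max(...columnLayout.id);

// Adding a column (hypothetical "salary") is a new array;
// the existing columns are not rewritten.
columnLayout.salary = new Array(rows.length).fill(null);

console.log(rowLayout[0].join(", "));
console.log(columnLayout.first.join(", "));
console.log(maxId);
```

In the row layout, the same aggregate would have to walk every record and skip over the fields it does not need.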


Storage Types for NoSQL Databases..2

• Document Oriented storage

– Allows inserting, retrieving, and manipulating semi-structured data

– Documents themselves act as records (or rows)

– Two records may have completely different sets of fields or columns

– The records may or may not adhere to a specific schema

– Most of the databases in this category use XML, JSON or BSON data types

• Example: Different records contain different levels of Employee information as follows

Record 1

{"EmployeeID": "SM1", "FirstName": "Anuj", "LastName": "Sharma", "Age": 45, "Salary": 10000000}

Record 2

{"EmployeeID": "MM2", "FirstName": "Anand", "Age": 34, "Salary": 5000000, "Address": {"Line1": "123, 4th Street", "City": "Bangalore", "State": "Karnataka"}, "Projects": ["nosql-migration", "top-secret-007"]}


Storage Types for NoSQL Databases..3

• Key value storage

– Similar to document oriented storage, with the following differences

– Unlike a document store that can create a key when a new document is inserted, a key-value store requires the key to be specified

– Unlike a document store where the value can be indexed and queried, for a key-value store the value is opaque and as such, the key must be known to retrieve the value

• Advantages:

– Key-value stores are optimized for querying against keys.

– They serve well as in-memory caches.
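A toy in-memory store makes the two differences concrete. This is a minimal sketch, not the API of Redis or any other product; the class name KeyValueStore and the key "session:42" are made up for the illustration.

```javascript
// A toy key-value store: the store never looks inside the value.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // The caller must supply the key; the store does not generate one.
  put(key, value) {
    if (key === undefined || key === null) {
      throw new Error("key is required");
    }
    // The value is kept as an opaque string, so it cannot be queried by content.
    this.data.set(key, JSON.stringify(value));
  }

  // Retrieval works only by exact key.
  get(key) {
    const raw = this.data.get(key);
    return raw === undefined ? undefined : JSON.parse(raw);
  }
}

const cache = new KeyValueStore();
cache.put("session:42", { user: "asim", cart: ["card"] });

console.log(cache.get("session:42").user);
console.log(cache.get("session:99")); // unknown key: nothing to return
```

There is deliberately no method for finding a session by user name: without the key, the value cannot be reached.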


RDBMS vs NoSQL

RDBMS

• Structured and organized data

• Structured query language (SQL)

• Data and its relationships are stored in separate tables

• Follows ACID rules

• Data Manipulation Language, Data Definition Language

• Tight consistency

NoSQL

• No declarative query language

• No predefined schema

• Key-value pair storage, column store, document store, graph databases

• Eventual consistency rather than ACID properties

• Unstructured and unpredictable data

• CAP theorem

• Prioritizes high performance, high availability and scalability

ACID: Atomic, Consistent, Isolated, Durable

BASE: Basically Available, Soft state, Eventual consistency

CAP: Consistency, Availability, Partition tolerance


CAP

• CAP theorem (Brewer’s Theorem): three basic requirements that constrain one another when designing applications for a distributed architecture.

– Consistency: The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.

– Availability: The system is always on (guaranteed service availability), with no downtime.

– Partition Tolerance: The system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.

• It is theoretically impossible to fulfill all 3 requirements C, A and P at the same time.

• A distributed system can therefore guarantee at most 2 of the 3 requirements.

• Current NoSQL databases therefore follow different combinations of C, A and P from the CAP theorem.


BASE

• A BASE system gives up on immediate consistency

– Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.

– Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.

– Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time.
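Soft state and eventual consistency can be illustrated with a toy model of one primary node and one replica. This is a sketch under simplifying assumptions (a single replica, replication triggered by hand); the class names Primary and Replica are invented for the example and do not correspond to any database API.

```javascript
// A replica only sees writes after they have been replicated to it.
class Replica {
  constructor() {
    this.data = {};
  }
  read(key) {
    return this.data[key];
  }
}

// The primary accepts writes immediately and queues them for the replica.
class Primary {
  constructor(replica) {
    this.data = {};
    this.replica = replica;
    this.pending = [];
  }
  write(key, value) {
    this.data[key] = value;
    this.pending.push([key, value]);
  }
  read(key) {
    return this.data[key];
  }
  // Apply the queued writes to the replica, in order.
  replicate() {
    for (const [key, value] of this.pending) {
      this.replica.data[key] = value;
    }
    this.pending = [];
  }
}

const replica = new Replica();
const primary = new Primary(replica);

primary.write("stock", 1);
primary.replicate();

primary.write("stock", 0);                // last item sold on the primary
const staleRead = replica.read("stock");  // replica has not caught up yet

primary.replicate();                      // no new writes arrive meanwhile
const freshRead = replica.read("stock");  // replica is now consistent

console.log(staleRead, freshRead);
```

The replica's state changes when replicate() runs even though no client wrote to it, which is the "soft state" in the definition above.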


MongoDB


MongoDB

• Open Source database written in C++.

• Document Oriented database

– Example format : FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rihit", Age:8}]

• Used to store data for very high performance applications with unforeseen growth in data

– If load increases (more storage space, more processing power), it can be distributed to other nodes across computer networks (sharding)

• MongoDB supports the Map/Reduce framework for batch processing of data and aggregation operations

– Map : A master node takes an input, splits it into smaller sections and sends them to the associated nodes. These nodes may in turn split their sections further and send them to other nodes. Each node processes its part of the problem and sends the result back to the master node.

– Reduce : The master node aggregates those results to produce the output.
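The split, map and reduce steps can be sketched in plain JavaScript on a single machine. This only illustrates the pattern; it does not use MongoDB's own mapReduce command, and the function names and the sample users are assumptions made for the example.

```javascript
// Master node: split the input into smaller sections.
function split(items, chunkSize) {
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Map step, run by each worker: count users per status in its own chunk.
function mapChunk(chunk) {
  const partial = {};
  for (const user of chunk) {
    partial[user.status] = (partial[user.status] || 0) + 1;
  }
  return partial;
}

// Reduce step, run by the master: merge the partial counts.
function reducePartials(partials) {
  const total = {};
  for (const partial of partials) {
    for (const [status, count] of Object.entries(partial)) {
      total[status] = (total[status] || 0) + count;
    }
  }
  return total;
}

const users = [
  { user_id: "u1", status: "A" },
  { user_id: "u2", status: "D" },
  { user_id: "u3", status: "A" },
  { user_id: "u4", status: "A" },
  { user_id: "u5", status: "D" },
];

const chunks = split(users, 2);          // three sections: 2 + 2 + 1 users
const partials = chunks.map(mapChunk);   // each section processed independently
const result = reducePartials(partials);

console.log(result); // { A: 3, D: 2 }
```

Because each call to mapChunk touches only its own section, the sections could be processed on different nodes in parallel.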


RDBMS vs. MongoDB

RDBMS           MongoDB
Record/Row      Document/Object
Table           Collection
Column          Key, field
Value           Value
Index           Index
Table join      Embedded documents and linking


NoSQL operations in MongoDB

• Creating Table (Collections)

Other SQL Schema

CREATE TABLE users (
    id MEDIUMINT NOT NULL AUTO_INCREMENT,
    user_id Varchar(30),
    age Number,
    status char(1),
    PRIMARY KEY (id)
)

MongoDB statement

db.users.insert( { user_id: "abc123", age: 55, status: "A" } )

Alternatively,

db.createCollection("users")

In MongoDB, collections are implicitly created on the first insert() operation. The primary key _id is automatically added if the _id field is not specified.

Reference: See insert() and db.createCollection() for more information.


NoSQL operations in MongoDB

• Altering Table (Collections)

Other SQL Schema

Adding a column

ALTER TABLE users ADD join_date DATETIME

Dropping a column

ALTER TABLE users DROP COLUMN join_date

MongoDB statement

Adding a field

db.users.update( { }, { $set: { join_date: new Date() } }, { multi: true } )

Dropping a field

db.users.update( { }, { $unset: { join_date: "" } }, { multi: true } )

Collections do not describe or enforce the structure of their documents; i.e. there is no structural alteration at the collection level for adding or removing fields.

Reference: See the Data Modeling Considerations for MongoDB Applications, update(), $set and $unset for more information.


NoSQL operations in MongoDB

• INSERT and SELECT operations

Other SQL Schema

Inserting data

INSERT INTO users(user_id, age, status) VALUES ("abc001", 35, "U")

SELECT operation

SELECT * FROM users WHERE status = "A"

MongoDB statement

Inserting data

db.users.insert( { user_id: "abc001", age: 35, status: "U" } )

Find operation

db.users.find( { status: "A" } )

Reference: See insert() and find() for more information.

Use pretty() to display data in a formatted way: db.users.find().pretty();


NoSQL operations in MongoDB

• UPDATE and DELETE operations

Other SQL Schema

UPDATE

UPDATE users SET status = "C" WHERE age > 10

UPDATE users SET age = age + 5 WHERE status = "U"

DELETE

DELETE FROM users WHERE status = "D"

MongoDB statement

UPDATE

db.users.update( { age: { $gt: 10 } }, { $set: { status: "C" } }, { multi: true } )

db.users.update( { status: "U" }, { $inc: { age: 5 } }, { multi: true } )

REMOVE

db.users.remove( { status: "D" } )

Reference: See update(), $gt, $inc, $set and remove() for more information.


More examples in MongoDB

• Run the database(Windows):

– Open Command prompt

– Go to the bin folder of the MongoDB-specific directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-10-04/bin)

– Run mongod.exe

• Connect to the database:

– Open Command prompt

– Go to the bin folder of the MongoDB-specific directory (e.g. mongodb-win32-x86_64-2008plus-v2.4-2013-10-04/bin)

– Run mongo

– A mongo “shell” will open

• Show databases: show dbs

• Select a database: use <database name>


More examples in MongoDB..2

• Switch to database testData (use testData;)

• Task 1: Insert data directly : The following operation inserts a row/document in the collection testData

– db.testData.insert({ name : "OtherDB" } );

• Task 2: Insert data with JavaScript operations : The following operations insert 2 rows/documents in the collection testData

j = { name : "mongo" }

k = { x : 3 }

db.testData.insert( j );

db.testData.insert( k );

• Task 3: Check that the 3 records have been inserted in the collection testData

– db.testData.find();


More examples in MongoDB..3

Inserting multiple documents using a For loop

• Task : use the following loop from the mongo shell

– for (var i = 1; i <= 25; i++) db.testData.insert( { x : i } )

• Use find() to see the result. 25 records will be shown

– db.testData.find()

Note: If the collection and database do not exist, MongoDB creates them implicitly before inserting documents.


More examples in MongoDB..4

Queries with conditions

• Task : In the above example, 25 rows were created. We want to show the rows where x is less than 15. We also want to limit the display to the first 5 rows.

– db.testData.find( { "x": { $lt: 15 } }).limit(5)

Here { $lt: 15 } expresses the condition x < 15, and limit(5) limits the result to 5 rows.
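For readers who think in plain JavaScript, the query behaves roughly like a filter followed by a slice. The sketch below runs against an in-memory array standing in for the 25 loop-created documents only; it does not talk to MongoDB, and it assumes the documents come back in insertion order.

```javascript
// Stand-in for the 25 documents created by the earlier loop.
const testData = [];
for (let i = 1; i <= 25; i++) {
  testData.push({ x: i });
}

// { x: { $lt: 15 } } keeps documents with x < 15;
// limit(5) keeps only the first five of those.
const shown = testData.filter((doc) => doc.x < 15).slice(0, 5);

console.log(shown.map((doc) => doc.x)); // [ 1, 2, 3, 4, 5 ]
```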


More examples in MongoDB..5

Inserting with explicit “id”

• Task : Insert a record in a collection named “inventory” with explicit _id, type, item and quantity.

– db.inventory.insert( { _id: 10, type: "misc", item: "card", qty: 15 } );

Inserting with update() method

• Call the update() method with the upsert flag to create a new document if no document matches the update’s query criteria.

–db.inventory.update(

{ type: "book", item : "journal" },

{ $set : { qty: 10 } },

{ upsert : true }

);

The above example creates a new document if no document in the inventory collection contains { type: "book", item : "journal" } and assigns a unique ID. If a matching document exists, its qty is set to 10 instead.
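The upsert behaviour (update if found, insert otherwise) can be sketched on an in-memory array. This is an illustration of the logic only, not MongoDB's implementation; the helper name upsert and the "generated-N" ids are hypothetical stand-ins for the _id values MongoDB would generate.

```javascript
let nextId = 1; // hypothetical counter standing in for a generated _id

// Update the first document matching `query`, or insert a new one.
function upsert(collection, query, fieldsToSet) {
  const matches = (doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value);

  const existing = collection.find(matches);
  if (existing) {
    Object.assign(existing, fieldsToSet); // like $set on the matched document
    return "updated";
  }

  // No match: the new document combines the query fields and the $set fields.
  collection.push({ _id: "generated-" + nextId++, ...query, ...fieldsToSet });
  return "inserted";
}

const inventory = [{ _id: 10, type: "misc", item: "card", qty: 15 }];

const first = upsert(inventory, { type: "book", item: "journal" }, { qty: 10 });
const second = upsert(inventory, { type: "book", item: "journal" }, { qty: 12 });

console.log(first, second, inventory.length);
```

The second call finds the document created by the first one, so it changes qty rather than adding a third document.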


More examples in MongoDB..6

Inserting with save() method

• To insert a document with the save() method, pass the method a document that does not contain the _id field or a document that contains an _id field that does not exist in the collection.

– db.inventory.save( { type: "book", item: "notebook", qty: 40 } )

The above example creates a new document in the inventory collection, adds the _id field and assigns a unique ID.


More examples in MongoDB..6

Conditional queries

• Task: Select all documents in the inventory collection where the value of the type field is either 'food' or 'snacks'

–db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } );

• Task: “AND” condition - specifying an equality match on the field type (value 'food') AND a less than ($lt) comparison match on the field price

–db.inventory.find( { type: 'food', price: { $lt: 9.95 } } );

• Task: “OR” condition- the query document selects all documents in the collection where the field qty has a value greater than ($gt) 100 OR the value of the price field is less than ($lt) 9.95

–db.inventory.find(

{ $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] }

);


More examples in MongoDB..7

Compound queries (using “AND” and “OR” both)

• Task: Select all documents in the collection where the value of the type field is 'food' and either the qty has a value greater than ($gt) 100 or the value of the price field is less than ($lt) 9.95:

–db.inventory.find( { type: 'food', $or: [ { qty: { $gt: 100 } },

{ price: { $lt: 9.95 } } ]

} );
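How such a query document is evaluated can be sketched with a tiny matcher written in plain JavaScript. It supports only the operators used on these slides ($lt, $gt, $in, $or) and is an illustration of the semantics, not MongoDB's query engine; the function names and the sample inventory are assumptions.

```javascript
// Evaluate one field condition: a plain value means equality,
// an object such as { $lt: 9.95 } means operator comparison.
function matchesCondition(value, condition) {
  if (condition !== null && typeof condition === "object" && !Array.isArray(condition)) {
    return Object.entries(condition).every(([op, operand]) => {
      if (op === "$lt") return value < operand;
      if (op === "$gt") return value > operand;
      if (op === "$in") return operand.includes(value);
      throw new Error("unsupported operator: " + op);
    });
  }
  return value === condition;
}

// Top-level keys are ANDed together; $or needs at least one sub-query to match.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) =>
    key === "$or"
      ? condition.some((sub) => matches(doc, sub))
      : matchesCondition(doc[key], condition)
  );
}

const inventory = [
  { item: "apple", type: "food", qty: 150, price: 12.5 },
  { item: "bread", type: "food", qty: 20, price: 4.0 },
  { item: "rice", type: "food", qty: 50, price: 20.0 },
  { item: "chips", type: "snacks", qty: 500, price: 2.0 },
];

// type is 'food' AND (qty > 100 OR price < 9.95)
const query = { type: "food", $or: [{ qty: { $gt: 100 } }, { price: { $lt: 9.95 } }] };
const found = inventory.filter((doc) => matches(doc, query)).map((doc) => doc.item);

console.log(found); // [ 'apple', 'bread' ]
```

Note that chips satisfies both $or branches but is still rejected, because the type condition is combined with the $or by AND.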


More examples in MongoDB..8

Matching on “subdocuments”

• When the field holds an embedded document (i.e. subdocument), we can either specify the entire subdocument as the value of a field, or “reach into” the subdocument using “dot” notation, to specify values for individual fields in the subdocument.

• In the following example, the query matches all documents where the value of the field producer is a subdocument that contains only the field company with the value 'ABC123' and the field address with the value '123 Street', in the exact order:

–db.inventory.find(

{ producer: { company: 'ABC123', address: '123 Street' }

});

• In the following example, the query uses the dot notation to match all documents where the value of the field producer is a subdocument that contains a field company with the value 'ABC123' and may contain other fields

–db.inventory.find( { 'producer.company': 'ABC123' } );
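The difference between matching the whole subdocument and reaching into it with dot notation can be sketched in plain JavaScript. The helper names getPath and sameSubdocument and the three sample documents are made up for this illustration; the comparison via JSON.stringify is just a simple way to require the same fields in the same order.

```javascript
// Follow a dotted path such as "producer.company" into nested objects.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

// Whole-subdocument equality: same fields, same values, same order.
function sameSubdocument(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

const docs = [
  { item: "A", producer: { company: "ABC123", address: "123 Street" } },
  { item: "B", producer: { address: "123 Street", company: "ABC123" } },
  { item: "C", producer: { company: "ABC123", address: "123 Street", phone: "555" } },
];

const wanted = { company: "ABC123", address: "123 Street" };

// Like { producer: { company: 'ABC123', address: '123 Street' } }
const exact = docs
  .filter((d) => sameSubdocument(d.producer, wanted))
  .map((d) => d.item);

// Like { 'producer.company': 'ABC123' }
const dotted = docs
  .filter((d) => getPath(d, "producer.company") === "ABC123")
  .map((d) => d.item);

console.log(exact);  // [ 'A' ]
console.log(dotted); // [ 'A', 'B', 'C' ]
```

Document B fails the exact match only because its fields are in a different order, and C fails because it has an extra field; both still match the dotted query.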


More examples in MongoDB..9

Matching on Arrays

• To specify equality match on an array, use the query document { <field>: <value> } where <value> is the array to match. Equality matches on the array require that the array field match exactly the specified <value>, including the element order.

• Exact Match: In the following example, the query matches all documents where the value of the field tags is an array that holds exactly three elements, 'fruit', 'food', and 'citrus', in this order:

– db.inventory.find( { tags: [ 'fruit', 'food', 'citrus' ] } );

• Matching Array Elements: In the following example, the query matches all documents where the value of the field tags is an array that contains 'fruit' as one of its elements:

–db.inventory.find( { tags: 'fruit' } );

• In the following example, the query uses the dot notation to match all documents where the value of the tags field is an array whose first element equals 'fruit'.

–db.inventory.find( { 'tags.0' : 'fruit' } )
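The three kinds of array match can be mirrored with plain JavaScript array operations. The sample documents below are invented for the illustration, and the code runs on an in-memory array rather than against MongoDB.

```javascript
const docs = [
  { item: "orange", tags: ["fruit", "food", "citrus"] },
  { item: "lemon", tags: ["citrus", "fruit", "food"] },
  { item: "apple", tags: ["fruit", "food"] },
  { item: "bread", tags: ["food"] },
];

// Same elements in the same order, nothing more and nothing less.
const sameArray = (a, b) =>
  a.length === b.length && a.every((value, i) => value === b[i]);

// Like { tags: [ 'fruit', 'food', 'citrus' ] }
const exact = docs
  .filter((d) => sameArray(d.tags, ["fruit", "food", "citrus"]))
  .map((d) => d.item);

// Like { tags: 'fruit' }: 'fruit' anywhere in the array
const contains = docs
  .filter((d) => d.tags.includes("fruit"))
  .map((d) => d.item);

// Like { 'tags.0': 'fruit' }: 'fruit' at index 0
const firstIsFruit = docs
  .filter((d) => d.tags[0] === "fruit")
  .map((d) => d.item);

console.log(exact);        // [ 'orange' ]
console.log(contains);     // [ 'orange', 'lemon', 'apple' ]
console.log(firstIsFruit); // [ 'orange', 'apple' ]
```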


More examples in MongoDB..10

Array of subdocuments

• Match a Field in the Subdocument Using the Array Index: The following example selects all documents where the memos field contains an array whose first element (i.e. index is 0) is a subdocument with the field by with the value 'shipping':

–db.inventory.find( { 'memos.0.by': 'shipping' } )

• Match a Field without specifying Array Index: The following example selects all documents where the memos field contains an array that contains at least one subdocument with the field by with the value 'shipping':

–db.inventory.find( { 'memos.by': 'shipping' } )

• Match multiple Fields: The following example uses dot notation to query for documents where the value of the memos field is an array in which some subdocument has the field memo equal to 'on time' and some subdocument (not necessarily the same one) has the field by equal to 'shipping'. To require both conditions on the same subdocument, use $elemMatch:

–db.inventory.find( { 'memos.memo': 'on time', 'memos.by': 'shipping' } )
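The sketch below mirrors these queries on an in-memory array. It also shows a subtlety worth checking against the MongoDB documentation: as far as I know, two separate dotted conditions may each be satisfied by a different array element, while $elemMatch requires a single element to satisfy both. The sample documents are invented for the illustration.

```javascript
const docs = [
  { item: "A", memos: [{ memo: "on time", by: "shipping" }] },
  {
    item: "B",
    memos: [
      { memo: "on time", by: "payment" },
      { memo: "delayed", by: "shipping" },
    ],
  },
  { item: "C", memos: [{ memo: "approved", by: "billing" }] },
];

const items = (list) => list.map((d) => d.item);

// Like { 'memos.0.by': 'shipping' }: only the first element is checked.
const firstByShipping = items(
  docs.filter((d) => d.memos.length > 0 && d.memos[0].by === "shipping")
);

// Like { 'memos.by': 'shipping' }: any element may match.
const anyByShipping = items(
  docs.filter((d) => d.memos.some((m) => m.by === "shipping"))
);

// Like { 'memos.memo': 'on time', 'memos.by': 'shipping' }:
// each condition is checked on its own, possibly against different elements.
const bothDotted = items(
  docs.filter(
    (d) =>
      d.memos.some((m) => m.memo === "on time") &&
      d.memos.some((m) => m.by === "shipping")
  )
);

// Like $elemMatch: one element must satisfy both conditions.
const sameElement = items(
  docs.filter((d) =>
    d.memos.some((m) => m.memo === "on time" && m.by === "shipping")
  )
);

console.log(firstByShipping, anyByShipping, bothDotted, sameElement);
```

Document B is the interesting case: it passes the two dotted conditions through two different memos, but no single memo is both 'on time' and by 'shipping'.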


More examples in MongoDB..11

Using findOne()

db.collection.findOne(<criteria>, <projection>)

• The above returns one document that satisfies the specified query criteria. If multiple documents satisfy the query, this method returns the first document according to the natural order, which reflects the order of documents on the disk.

• The <projection> parameter takes a document in the following form

–{ field1: <boolean>, field2: <boolean> ... }

–Boolean can be 1(true, to include) or 0(false, to exclude)

• Example: Create a collection named bios with multiple fields. Return “name”,“contribs” and “_id” fields:

db.bios.findOne(

{ },

{ name: 1, contribs: 1 }

)
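What an inclusion projection does to a document can be sketched in plain JavaScript. The helper below handles only the simple case shown here (fields marked 1, with _id kept unless it is set to 0); the function name project and the sample bio document are assumptions for the illustration.

```javascript
// Keep the fields marked 1; _id is kept too unless explicitly set to 0.
function project(doc, projection) {
  const out = {};
  if (projection._id !== 0 && "_id" in doc) {
    out._id = doc._id;
  }
  for (const [field, flag] of Object.entries(projection)) {
    if (flag === 1 && field in doc) {
      out[field] = doc[field];
    }
  }
  return out;
}

// Hypothetical document in the bios collection.
const bio = {
  _id: 1,
  name: { first: "Ada", last: "Example" },
  contribs: ["compilers", "databases"],
  awards: ["best paper"],
};

const withId = project(bio, { name: 1, contribs: 1 });
const withoutId = project(bio, { name: 1, contribs: 1, _id: 0 });

console.log(Object.keys(withId));    // [ '_id', 'name', 'contribs' ]
console.log(Object.keys(withoutId)); // [ 'name', 'contribs' ]
```

This is why the example returns the “_id” field even though the projection lists only name and contribs.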


Exercise I

• Go to database “test”

• Insert data in a collection named userdetails with the following attributes

{
  "user_id" : "ABCDBWN",
  "password" : "ABCDBWN",
  "date_of_join" : "15/10/2010",
  "education" : "B.C.A.",
  "profession" : "DEVELOPER",
  "interest" : "MUSIC",
  "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"],
  "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"],
  "community_members" : [500, 200, 1500],
  "friends_id" : ["MMM123", "NNN123", "OOO123"],
  "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]
}

• View the inserted data using find() and pretty()

• Insert another set of data in the same collection with the following

{
  "user_id" : "testuser",
  "password" : "testpassword",
  "date_of_join" : "16/10/2010",
  "education" : "M.C.A.",
  "profession" : "CONSULTANT",
  "interest" : "MUSIC",
  "community_name" : ["MODERN MUSIC", "CLASSICAL MUSIC", "WESTERN MUSIC"],
  "community_moder_id" : ["MR. BBB", "MR. JJJ", "MR MMM"],
  "community_members" : [500, 200, 1500],
  "friends_id" : ["MMM123", "NNN123", "OOO123"],
  "ban_friends_id" : ["BAN123", "BAN456", "BAN789"]
}


Exercise I..contd

• Use update() to change password to "Newpd" and date_of_join to "12/12/2010" for user_id "ABCDBWN"

• Fetch only the "user_id" for all documents from the collection userdetails which hold the educational qualification "M.C.A."

• Fetch the "user_id", "password" and "date_of_join" for all documents from the collection userdetails which hold the educational qualification "M.C.A."

• Remove one record from the collection userdetails where user_id is "testuser"

•Remove the entire collection userdetails using drop()