NoSQL & MongoDB..Part II Arindam Chatterjee

Nosql part 2

May 12, 2015



RuruChowdhury

Praxis Weekend Analytics
Transcript
Page 1: Nosql part 2

NoSQL & MongoDB..Part II

Arindam Chatterjee

Page 2: Nosql part 2

Indexes in MongoDB

• Indexes support the efficient resolution of queries in MongoDB.

–Without indexes, MongoDB must scan every document in a collection to select those documents that match the query statement.

–These collection scans are inefficient and require the mongod to process a large volume of data for each operation.

• Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form.

–The index stores the value of a specific field or set of fields, ordered by the value of the field.

• Indexes in MongoDB are similar to indexes in other database systems.

• MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.

Page 3: Nosql part 2

Indexes in MongoDB..2

• The following diagram illustrates a query that selects documents using an index.

MongoDB narrows the query by scanning only the range of index entries whose score values are less than 30.
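The range scan in the diagram can be modelled in a few lines of plain JavaScript (runs in Node.js, no MongoDB server needed). This is a toy sketch under simplifying assumptions: the index is a sorted array of [score, _id] pairs rather than MongoDB's B-tree, and the sample documents and the helpers buildIndex() and rangeLessThan() are hypothetical.

```javascript
// Hypothetical sample collection.
const collection = [
  { _id: 1, userid: "ca2", score: 75 },
  { _id: 2, userid: "a1", score: 18 },
  { _id: 3, userid: "xyz", score: 90 },
  { _id: 4, userid: "b3", score: 25 },
];

// Hypothetical helper: a toy single-field index as [key, _id] pairs sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => a[0] - b[0]);
}

// Hypothetical helper for { score: { $lt: bound } }: walk the sorted entries
// and stop at the first key >= bound instead of examining every document.
function rangeLessThan(index, bound) {
  const ids = [];
  for (const [key, id] of index) {
    if (key >= bound) break; // entries are sorted, so nothing later can match
    ids.push(id);
  }
  return ids;
}

const scoreIndex = buildIndex(collection, "score");
const matchingIds = rangeLessThan(scoreIndex, 30);
const matches = matchingIds.map((id) => collection.find((d) => d._id === id));
console.log(matches); // the documents with score 18 and 25
```

Only the two entries below the bound are visited before the loop stops. A collection scan would have to inspect all four documents.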

Page 4: Nosql part 2

Indexes in MongoDB..3

• MongoDB can use indexes to return documents sorted by the index key directly from the index without requiring an additional sort phase.

[Diagram: results read from the index in descending order]
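Here is a toy sketch of the same idea in plain JavaScript. Index entries are stored in key order, so an ascending sort is a forward walk and a descending sort is a backward walk, with no separate sort step. The array-based index, its data and the scanIndex() generator are hypothetical stand-ins for MongoDB's B-tree.

```javascript
// Hypothetical toy index on score: entries kept in ascending key order.
const scoreIndex = [
  [18, "a1"],
  [25, "b3"],
  [75, "ca2"],
  [90, "xyz"],
];

// Hypothetical helper: walk forwards for sort({ score: 1 }) or backwards for
// sort({ score: -1 }). No sort phase is needed because the order is stored.
function* scanIndex(index, direction) {
  if (direction === 1) {
    for (let i = 0; i < index.length; i++) yield index[i];
  } else {
    for (let i = index.length - 1; i >= 0; i--) yield index[i];
  }
}

const ascending = [...scanIndex(scoreIndex, 1)].map(([, id]) => id);
const descending = [...scanIndex(scoreIndex, -1)].map(([, id]) => id);
console.log(ascending);  // [ 'a1', 'b3', 'ca2', 'xyz' ]
console.log(descending); // [ 'xyz', 'ca2', 'b3', 'a1' ]
```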

Page 5: Nosql part 2

Indexes in MongoDB..4

Index Types

• Default _id

–All MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod creates an _id field with an ObjectId value.

–The _id index is unique and prevents clients from inserting two documents with the same value for the _id field.

• Single Field

–MongoDB supports user-defined indexes on a single field of a document.

Example: Index on the score field (ascending)
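The ObjectId mentioned under Default _id is a 12-byte value. Its first 4 bytes hold the creation time in seconds since the Unix epoch. The sketch below is a hypothetical helper in plain JavaScript that decodes that timestamp from the usual 24-character hex form. The sample ObjectId string is made up for illustration. In the mongo shell, ObjectId(...).getTimestamp() serves the same purpose.

```javascript
// Hypothetical helper: read the creation time stored in an ObjectId.
// The first 4 of its 12 bytes (8 hex characters) are a big-endian count
// of seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up ObjectId string, used only for illustration.
const id = "5551a3c0e4b0a1b2c3d4e5f6";
console.log(objectIdTimestamp(id).toISOString()); // 2015-05-12T06:54:56.000Z
```

Because the timestamp occupies the leading bytes, ObjectIds generated later usually sort after earlier ones in the _id index.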

Page 6: Nosql part 2

Indexes in MongoDB..5

Index Types

• Compound Index

–These are user-defined indexes on multiple fields.

Example: Diagram of a compound index on the userid field (ascending) and the score field (descending). The index sorts first by the userid field and then by the score field.

Page 7: Nosql part 2

Indexes in MongoDB..6

Index Types

• Multikey Index

–MongoDB uses multikey indexes to index the content stored in arrays.

–If we index a field that holds an array value, MongoDB creates separate index entries for every element of the array.

–These multikey indexes allow queries to select documents that contain arrays by matching on an element or elements of the arrays.

–MongoDB automatically creates a multikey index when the indexed field contains an array value; we do not need to specify the multikey type explicitly.

Page 8: Nosql part 2

Indexes in MongoDB..7

Index Types

• Multikey Index: Illustration

Diagram of a multikey index on the addr.zip field. The addr field contains an array of address documents. The address documents contain the zip field.
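A multikey index such as the one on addr.zip can be pictured as one index entry per array element. In this plain JavaScript sketch, a Map from zip to _id values stands in for MongoDB's sorted index. The users data and buildMultikeyIndex() are hypothetical.

```javascript
// Hypothetical documents whose addr field is an array of address subdocuments.
const users = [
  { _id: 1, userid: "xyz", addr: [{ zip: "10036" }, { zip: "94301" }] },
  { _id: 2, userid: "abc", addr: [{ zip: "10036" }] },
];

// Hypothetical helper: a toy multikey index on "addr.zip" with one entry per
// array element, so a single document can appear under several keys.
function buildMultikeyIndex(docs) {
  const index = new Map(); // zip -> array of _id values
  for (const doc of docs) {
    for (const address of doc.addr ?? []) {
      if (!index.has(address.zip)) index.set(address.zip, []);
      index.get(address.zip).push(doc._id);
    }
  }
  return index;
}

const zipIndex = buildMultikeyIndex(users);
// Comparable in spirit to db.users.find({ "addr.zip": "10036" }).
console.log(zipIndex.get("10036")); // [ 1, 2 ]
console.log(zipIndex.get("94301")); // [ 1 ]
```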

Page 9: Nosql part 2

Indexes in MongoDB..8

Other Index Types

• Geospatial Index

– MongoDB provides two special indexes: 2d indexes, which use planar geometry when returning results, and 2dsphere indexes, which use spherical geometry to return results.

• Text Index

– MongoDB provides a text index type (introduced as a beta feature in MongoDB 2.4) that supports searching for string content in a collection.

– These text indexes do not store language-specific stop words (e.g., “the”, “a”, “or”), and they stem the words in a collection so that only root words are stored.

• Hashed Index

– To support hash-based sharding, MongoDB provides a hashed index type, which indexes the hash of the value of a field. These indexes have a more random distribution of values along their range, but they support only equality matches and cannot support range-based queries.
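The equality-only restriction of hashed indexes follows from how they are keyed, and a small plain JavaScript model makes this visible. Two assumptions should be noted. First, FNV-1a is used as the hash only because it is short to write; MongoDB uses its own 64-bit hash. Second, the docs, fnv1a() and findEqual() are all hypothetical.

```javascript
// FNV-1a, 32-bit: a stand-in hash chosen for brevity, not MongoDB's hash.
// It hashes UTF-16 code units, which matches byte-wise FNV-1a for ASCII input.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return hash >>> 0; // as an unsigned 32-bit integer
}

// Hypothetical sample documents.
const docs = [
  { _id: 1, user: "alice" },
  { _id: 2, user: "bob" },
  { _id: 3, user: "carol" },
];

// Toy hashed index on "user": hash of the value -> list of _id values.
const hashedIndex = new Map();
for (const doc of docs) {
  const h = fnv1a(doc.user);
  if (!hashedIndex.has(h)) hashedIndex.set(h, []);
  hashedIndex.get(h).push(doc._id);
}

// Hypothetical helper for an equality match { user: value }: hash the value,
// look it up, then re-check the stored field because values can share a hash.
function findEqual(value) {
  const candidates = hashedIndex.get(fnv1a(value)) ?? [];
  return candidates.filter((id) => docs.find((d) => d._id === id).user === value);
}

console.log(findEqual("bob")); // [ 2 ]

// Hash order has no relation to value order, so a range query such as
// { user: { $gt: "b" } } cannot be answered by walking this index.
for (const d of docs) console.log(d.user, fnv1a(d.user).toString(16));
```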

Page 10: Nosql part 2

Indexes in MongoDB..9

Explicit creation of Index

• Using ensureIndex() from the shell

– The following creates an index on the phone-number field of the people collection:

• db.people.ensureIndex( { "phone-number": 1 } )

– The following operation creates an index on the item, category, and price fields of the products collection:

• db.products.ensureIndex( { item: 1, category: 1, price: 1 } )

– A unique constraint prevents applications from inserting documents that have duplicate values for the indexed fields. The following example creates a unique index on the "tax-id" field of the accounts collection to prevent storing multiple account records for the same legal entity:

• db.accounts.ensureIndex( { "tax-id": 1 }, { unique: true } )

– ensureIndex() only creates an index if an index of the same specification does not already exist. (From MongoDB 3.0, ensureIndex() is deprecated and is an alias for db.collection.createIndex().)
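The two behaviours above can be modelled in plain JavaScript. The first is creating an index only when no index with the same key specification exists. The second is rejecting duplicate keys under a unique index. ToyCollection is a hypothetical class for illustration, not a MongoDB driver API. Real index creation compares the full index specification and options; this sketch simplifies that to the key pattern.

```javascript
// Hypothetical toy collection, for illustration only (not a driver API).
class ToyCollection {
  constructor() {
    this.docs = [];
    this.indexes = new Map(); // key pattern (as JSON) -> { fields, unique }
  }

  // Create an index only if one with the same key pattern does not already
  // exist. Returns true if an index was created, false if it was a no-op.
  ensureIndex(keys, options = {}) {
    const spec = JSON.stringify(keys); // field order matters, as in MongoDB
    if (this.indexes.has(spec)) return false;
    this.indexes.set(spec, { fields: Object.keys(keys), unique: Boolean(options.unique) });
    return true;
  }

  // Reject a document whose values for a unique index's fields match the
  // values of an existing document.
  insert(doc) {
    for (const { fields, unique } of this.indexes.values()) {
      if (!unique) continue;
      const clash = this.docs.some((d) => fields.every((f) => d[f] === doc[f]));
      if (clash) throw new Error(`duplicate key for unique index on ${fields.join(", ")}`);
    }
    this.docs.push(doc);
  }
}

const accounts = new ToyCollection();
console.log(accounts.ensureIndex({ "tax-id": 1 }, { unique: true })); // true
console.log(accounts.ensureIndex({ "tax-id": 1 }, { unique: true })); // false (already exists)
accounts.insert({ name: "Acme Ltd", "tax-id": "AB-123" });
try {
  accounts.insert({ name: "Acme Limited", "tax-id": "AB-123" });
} catch (err) {
  console.log(err.message); // duplicate key for unique index on tax-id
}
```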

Page 11: Nosql part 2

Indexes in MongoDB..10

Indexing Strategies

• Create Indexes to Support Your Queries

– An index supports a query when the index contains all the fields scanned by the query. Creating indexes that support your queries can greatly increase query performance.

• Use Indexes to Sort Query Results

– To support efficient queries, choose the order of the index fields and the sort direction of each field so that they match the sort order your queries request.

• Ensure Indexes Fit in RAM

– When your index fits in RAM, the system can avoid reading the index from disk and you get the fastest processing.

• Create Queries that Ensure Selectivity

– Selectivity is the ability of a query to narrow results using the index. Selectivity allows MongoDB to use the index for a larger portion of the work associated with fulfilling the query.

Page 12: Nosql part 2

Indexes in MongoDB..11

• Indexes to Support Queries

– For commonly issued queries, create indexes. If a query searches multiple fields, create a compound index. Scanning an index is much faster than scanning a collection.

– Consider a posts collection containing blog posts. If we regularly issue a query that sorts on the author_name field, we can optimize the query by creating an index on the author_name field:

• db.posts.ensureIndex( { author_name : 1 } )

– If we regularly issue a query that sorts on the timestamp field, we can optimize the query by creating an index on the timestamp field:

• db.posts.ensureIndex( { timestamp : 1 } )

– To limit the results and reduce network load, we can use limit():

• db.posts.find().sort( { timestamp : -1 } ).limit(10)
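An index on timestamp helps sort({ timestamp: -1 }).limit(10) because the database can walk the index from its highest key and stop after ten entries, instead of sorting the whole collection. Here is a toy model in plain JavaScript, with hypothetical posts and a hypothetical latest() helper:

```javascript
// Hypothetical posts, one per minute, with numeric timestamps in seconds.
const posts = Array.from({ length: 25 }, (_, i) => ({
  _id: i + 1,
  timestamp: 1000000 + i * 60,
}));

// Toy ascending index on timestamp: [timestamp, _id] pairs in key order.
const timestampIndex = posts
  .map((p) => [p.timestamp, p._id])
  .sort((a, b) => a[0] - b[0]);

// Hypothetical helper modelling find().sort({ timestamp: -1 }).limit(n):
// walk the ascending index backwards and stop after n entries, so only n
// documents ever need to be fetched.
function latest(index, n) {
  const ids = [];
  for (let i = index.length - 1; i >= 0 && ids.length < n; i--) {
    ids.push(index[i][1]);
  }
  return ids;
}

console.log(latest(timestampIndex, 10).join(", "));
// 25, 24, 23, 22, 21, 20, 19, 18, 17, 16
```

An ascending index serves the descending sort equally well, because for a single-field index the direction only decides which end the walk starts from.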

Page 13: Nosql part 2

Indexes in MongoDB..12

• Index Administration

– Detailed information about indexes is stored in the system.indexes collection of each database.

– system.indexes is a reserved collection, so we cannot insert documents into it or remove documents from it directly. We can manipulate its documents only through ensureIndex() and the dropIndexes database command.

• Building Indexes in the Background

– Building indexes is time-consuming and resource-intensive. The {"background" : true} option builds the index in the background while the database continues to handle incoming requests.

• > db.people.ensureIndex({"username" : 1}, {"background" : true})

– If we do not include the “background” option, the database blocks all other requests while the index is being built.

– Creating an index on existing documents is faster than creating the index first and then inserting all of the documents.

Page 14: Nosql part 2

Indexes in MongoDB..12

• Dos and Don’ts

– Create indexes only on the keys required by your queries

• Indexes create additional overhead on the database

• Insert, update and delete operations become slow with too many indexes

– Index direction matters when an index has more than one key

• The indexes {"username" : 1, "age" : -1} and {"username" : 1, "age" : 1} are different indexes and support different sort orders

– There is a built-in maximum of 64 indexes per collection, which is more than almost any application should need.

– Drop an index with “dropIndexes” if it is no longer required

– Sometimes the most efficient solution is actually not to use an index. In general, if a query returns half or more of the collection, it is more efficient for the database to do a full collection scan than to look up the index and then the document for almost every entry.

Page 15: Nosql part 2

Exercise 2

• Insert records in collection userdetail

– {"username" : "smith", "age" : 48, "user_id" : 0 }

– {"username" : "smith", "age" : 30, "user_id" : 1 }

– {"username" : "john", "age" : 36, "user_id" : 2 }

– {"username" : "john", "age" : 18, "user_id" : 3 }

– {"username" : "joe", "age" : 36, "user_id" : 4 }

– {"username" : "john", "age" : 7, "user_id" : 5 }

– {"username" : "simon", "age" : 3, "user_id" : 6 }

– {"username" : "joe", "age" : 27, "user_id" : 7 }

– {"username" : "jacob", "age" : 17, "user_id" : 8 }

– {"username" : "sally", "age" : 52, "user_id" : 9 }

– {"username" : "simon", "age" : 59, "user_id" : 10 }

• Run the ensureIndex operation

– db.userdetail.ensureIndex({"username" : 1, "age" : -1})
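This plain JavaScript sketch shows the order in which the eleven exercise documents sit in the {"username" : 1, "age" : -1} index. It sorts them with a comparator built from the key pattern: ascending on username, and descending on age within equal usernames. compareByKeyPattern() is a hypothetical helper. It only compares values of the same type, whereas MongoDB additionally defines an order across BSON types.

```javascript
// The eleven documents from the exercise.
const userdetail = [
  { username: "smith", age: 48, user_id: 0 },
  { username: "smith", age: 30, user_id: 1 },
  { username: "john", age: 36, user_id: 2 },
  { username: "john", age: 18, user_id: 3 },
  { username: "joe", age: 36, user_id: 4 },
  { username: "john", age: 7, user_id: 5 },
  { username: "simon", age: 3, user_id: 6 },
  { username: "joe", age: 27, user_id: 7 },
  { username: "jacob", age: 17, user_id: 8 },
  { username: "sally", age: 52, user_id: 9 },
  { username: "simon", age: 59, user_id: 10 },
];

// Hypothetical helper: compare on each field of the key pattern in turn,
// flipping the result for fields marked -1 (descending).
function compareByKeyPattern(pattern) {
  const fields = Object.entries(pattern);
  return (a, b) => {
    for (const [field, dir] of fields) {
      if (a[field] < b[field]) return -dir;
      if (a[field] > b[field]) return dir;
    }
    return 0;
  };
}

const order = [...userdetail]
  .sort(compareByKeyPattern({ username: 1, age: -1 }))
  .map((d) => d.user_id);
console.log(order.join(", ")); // 8, 4, 7, 2, 3, 5, 9, 10, 6, 0, 1
```

This index can serve sort({ username: 1, age: -1 }) and, by walking backwards, sort({ username: -1, age: 1 }). It cannot serve sort({ username: 1, age: 1 }).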

Page 16: Nosql part 2

Data Modelling in MongoDB

Page 17: Nosql part 2

Data Modelling in MongoDB

• Unlike relational databases, MongoDB has a flexible schema: we need not declare a collection’s schema before inserting data.

• MongoDB’s collections do not enforce document structure.

• There are two ways of mapping relationships:

–References

–Embedded Documents

Example: References

• Both the “contact” and “access” documents contain a reference to the “user” document.

• These are normalized data models.

Page 18: Nosql part 2

Data Modelling in MongoDB..2

Example: Embedded Documents

“contact” and “access” are subdocuments embedded in the main document. This is a “denormalized” data model.

Page 19: Nosql part 2

Data Modelling in MongoDB..3

References vs. Embedded Documents

References: used when

• embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.

• we need to represent more complex many-to-many relationships.

• we need to model large hierarchical data sets.

Embedded documents: used when

• we have “contains” relationships between entities.

• we have one-to-many relationships between entities. In these relationships the “many” or child documents always appear with, or are viewed in the context of, the “one” or parent documents.

• applications need to store related pieces of information in the same database record.

Page 20: Nosql part 2

Data Modelling in MongoDB..4

One-to-many relationships: example where embedding is advantageous

Using References

{
  _id: "chat",
  name: "ABC Chat"
}

{
  patron_id: "chat",
  street: "10 Simla Street",
  city: "Kolkata",
  zip: 700006
}

{
  patron_id: "chat",
  street: "132 Lanka Street",
  city: "Mumbai",
  zip: 400032
}

Issue with the above: if the application frequently retrieves the address data together with the name information, it needs to issue multiple queries to resolve the references.

Using Embedded documents

{
  _id: "chat",
  name: "ABC Chat",
  addresses: [
    {
      street: "10 Simla Street",
      city: "Kolkata",
      zip: 700006
    },
    {
      street: "132 Lanka Street",
      city: "Mumbai",
      zip: 400032
    }
  ]
}

With the embedded data model, the application can retrieve the complete patron information with one query.
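The difference in round trips can be counted with a tiny in-memory model in plain JavaScript. The find() helper and its query counter are hypothetical stand-ins for database queries, and the data is the patron example above. Resolving the references takes two queries, while the embedded document needs only one.

```javascript
// Hypothetical stand-in for a database query that counts how often it runs.
let queryCount = 0;
function find(collection, predicate) {
  queryCount++;
  return collection.filter(predicate);
}

// Referenced model: patron and addresses live in separate documents.
const patrons = [{ _id: "chat", name: "ABC Chat" }];
const addresses = [
  { patron_id: "chat", street: "10 Simla Street", city: "Kolkata", zip: 700006 },
  { patron_id: "chat", street: "132 Lanka Street", city: "Mumbai", zip: 400032 },
];

// Embedded model: one document holds everything.
const patronsEmbedded = [
  {
    _id: "chat",
    name: "ABC Chat",
    addresses: [
      { street: "10 Simla Street", city: "Kolkata", zip: 700006 },
      { street: "132 Lanka Street", city: "Mumbai", zip: 400032 },
    ],
  },
];

// Referenced: one query for the patron, a second to resolve the references.
queryCount = 0;
const [patron] = find(patrons, (p) => p._id === "chat");
const patronAddresses = find(addresses, (a) => a.patron_id === patron._id);
const referencedQueries = queryCount;

// Embedded: the complete patron comes back from a single query.
queryCount = 0;
const [embedded] = find(patronsEmbedded, (p) => p._id === "chat");
const embeddedQueries = queryCount;

console.log(referencedQueries, embeddedQueries); // 2 1
console.log(patronAddresses.length === embedded.addresses.length); // true
```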

Page 21: Nosql part 2

Data Modelling in MongoDB..5

One-to-many relationships: example where referencing is advantageous

Using Embedded documents

{
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    location: "CA"
  }
}

{
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher: {
    name: "O'Reilly Media",
    location: "CA"
  }
}

Issue with the above: embedding leads to repetition of publisher data.

Using References

{
  _id: "oreilly",
  name: "O'Reilly Media",
  location: "CA"
}

{
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  author: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}

{
  _id: 234567890,
  title: "50 Tips and Tricks for MongoDB Developer",
  author: "Kristina Chodorow",
  published_date: ISODate("2011-05-06"),
  pages: 68,
  language: "English",
  publisher_id: "oreilly"
}

In the above example, the publisher information is kept in a separate document to avoid repetition.

Page 22: Nosql part 2

Data Modelling in MongoDB..6

Tree structure with parent references

Page 23: Nosql part 2

Data Modelling in MongoDB..7

• The following lines of code describe the tree structure on the previous slide

– db.categories.insert( { _id: "MongoDB", parent: "Databases" } )

– db.categories.insert( { _id: "dbm", parent: "Databases" } )

– db.categories.insert( { _id: "Databases", parent: "Programming" } )

– db.categories.insert( { _id: "Languages", parent: "Programming" } )

– db.categories.insert( { _id: "Programming", parent: "Books" } )

– db.categories.insert( { _id: "Books", parent: null } )

• The query to retrieve the parent of a node

– db.categories.findOne( { _id: "MongoDB" } ).parent;

• Query by the parent field to find its immediate children nodes

– db.categories.find( { parent: "Databases" } );

Modelling Tree structure with Parent reference
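The parent-reference queries can be extended to walk the whole path to the root, one lookup per level. Below is a plain JavaScript sketch using the same six categories, held in a Map as a stand-in for the categories collection. ancestors() and children() are hypothetical helpers. Against a real database, each step of the loop in ancestors() would be a separate findOne().

```javascript
// The six parent-reference documents from the slide, keyed by _id.
const categories = new Map([
  ["MongoDB", { _id: "MongoDB", parent: "Databases" }],
  ["dbm", { _id: "dbm", parent: "Databases" }],
  ["Databases", { _id: "Databases", parent: "Programming" }],
  ["Languages", { _id: "Languages", parent: "Programming" }],
  ["Programming", { _id: "Programming", parent: "Books" }],
  ["Books", { _id: "Books", parent: null }],
]);

// Hypothetical helper: follow parent links up to the root
// (the data has no cycles, so the loop always ends at "Books").
function ancestors(id) {
  const path = [];
  let node = categories.get(id);
  while (node && node.parent !== null) {
    path.push(node.parent);
    node = categories.get(node.parent);
  }
  return path;
}

// Hypothetical helper: immediate children, the equivalent of find({ parent: id }).
function children(id) {
  return [...categories.values()].filter((c) => c.parent === id).map((c) => c._id);
}

console.log(ancestors("MongoDB"));  // [ 'Databases', 'Programming', 'Books' ]
console.log(children("Databases")); // [ 'MongoDB', 'dbm' ]
```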

Page 24: Nosql part 2

Data Modelling in MongoDB..8

• The following lines of code describe the same tree structure using child references

– db.categories.insert( { _id: "MongoDB", children: [] } );

– db.categories.insert( { _id: "dbm", children: [] } );

– db.categories.insert( { _id: "Databases", children: [ "MongoDB", "dbm" ] } );

– db.categories.insert( { _id: "Languages", children: [] } );

– db.categories.insert( { _id: "Programming", children: [ "Databases", "Languages" ] } );

– db.categories.insert( { _id: "Books", children: [ "Programming" ] } );

• The query to retrieve the immediate children of a node

– db.categories.findOne( { _id: "Databases" } ).children;

• Query by the children field to find the parent of a node

– db.categories.find( { children: "MongoDB" } );

Modelling Tree structure with Child reference
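With child references, finding every descendant of a node means repeatedly reading children arrays, level by level. Below is a plain JavaScript sketch over the same categories, with hypothetical descendants() and parentOf() helpers. The includes() test mirrors how find({ children: "MongoDB" }) matches any document whose children array contains that value.

```javascript
// The six child-reference documents from the slide, keyed by _id.
const categories = new Map([
  ["MongoDB", { _id: "MongoDB", children: [] }],
  ["dbm", { _id: "dbm", children: [] }],
  ["Databases", { _id: "Databases", children: ["MongoDB", "dbm"] }],
  ["Languages", { _id: "Languages", children: [] }],
  ["Programming", { _id: "Programming", children: ["Databases", "Languages"] }],
  ["Books", { _id: "Books", children: ["Programming"] }],
]);

// Hypothetical helper: all descendants, breadth-first. Each node taken from
// the queue contributes its own children array to the queue.
function descendants(id) {
  const result = [];
  const queue = [...(categories.get(id)?.children ?? [])];
  while (queue.length > 0) {
    const next = queue.shift();
    result.push(next);
    queue.push(...categories.get(next).children);
  }
  return result;
}

// Hypothetical helper: parent lookup, the equivalent of find({ children: id }).
function parentOf(id) {
  return [...categories.values()].filter((c) => c.children.includes(id)).map((c) => c._id);
}

console.log(descendants("Programming")); // [ 'Databases', 'Languages', 'MongoDB', 'dbm' ]
console.log(parentOf("MongoDB"));        // [ 'Databases' ]
```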

Page 25: Nosql part 2

Data Modelling in MongoDB..8

• Example (Online purchase portal):

– Step I: Insert data, including the number of available copies, into a collection called “book”

– Step II: Check if the book is available during checkout

Code

– Step I:

db.book.insert ({

_id: 123456789,

title: "MongoDB: The Definitive Guide",

author: [ "Kristina Chodorow", "Mike Dirolf" ],

published_date: ISODate("2010-09-24"),

pages: 216,

language: "English",

publisher_id: "oreilly",

available: 3,

checkout: [ { by: "joe", date: ISODate("2012-10-15") } ]

});

Data Modelling for “Atomic” operations

Page 26: Nosql part 2

Data Modelling in MongoDB..9

Code

– Step II

db.book.findAndModify ( {

query: {

_id: 123456789,

available: { $gt: 0 }

},

update: {

$inc: { available: -1 },

$push: { checkout: { by: "abc", date: new Date() } }

}

} );

– In the above example, the db.collection.findAndModify() method is used to atomically determine whether a book is available for checkout and, if so, update it with the new checkout information.

– Embedding the available field and the checkout field within the same document ensures that the updates to these fields stay in sync.

Data Modelling for “Atomic” operations
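The check-then-update pattern can be modelled in plain JavaScript to show what findAndModify guarantees: the availability test and both updates apply to one document as a single step. In this sketch, checkoutBook() is a hypothetical helper. Its atomicity comes from JavaScript running the function synchronously; in MongoDB the same guarantee comes from single-document atomicity.

```javascript
// Hypothetical in-memory copy of the book document from Step I.
const book = {
  _id: 123456789,
  title: "MongoDB: The Definitive Guide",
  available: 3,
  checkout: [{ by: "joe", date: new Date("2012-10-15") }],
};

// Hypothetical helper modelling the findAndModify call: the query condition
// and both update operators apply to one document in one step.
function checkoutBook(doc, user) {
  if (!(doc.available > 0)) return false;            // query: available > 0 fails
  doc.available -= 1;                                 // $inc: { available: -1 }
  doc.checkout.push({ by: user, date: new Date() });  // $push onto checkout
  return true;
}

const results = ["abc", "def", "ghi", "jkl"].map((user) => checkoutBook(book, user));
console.log(results);        // [ true, true, true, false ]
console.log(book.available); // 0
```

The fourth request is refused rather than driving the count negative, because the check and the decrement cannot be separated.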

Page 27: Nosql part 2

Data Modelling in MongoDB..10

Example: Perform a keyword based search in a collection “volumes”

– Step I: Insert data in a collection “volumes”

db.volumes.insert ({

title : "Moby-Dick" ,

author : "Herman Melville" ,

published : 1851 ,

ISBN : "0451526996" ,

topics : [ "whaling" , "allegory" , "revenge" , "American" ,

"novel" , "nautical" , "voyage" , "Cape Cod" ]

});

In the above example, the topics array holds several keywords on which we can perform a keyword search.

– Step II: Create a multikey index on the topics array

db.volumes.ensureIndex( { topics: 1 } )

– Step III: Search based on keyword “voyage”

• db.volumes.findOne( { topics : "voyage" }, { title: 1 } )

Keyword based Search
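Here is a plain JavaScript sketch of the keyword lookup. A Map from each topic to document _id values stands in for the multikey index on topics. The returned object mimics the { title: 1 } projection, which keeps _id by default. The _id value, the findOneByTopic() helper, and storing ISBN as a string are assumptions for illustration. Matching is exact and case-sensitive, unlike a text index.

```javascript
// The Moby-Dick document, with a hypothetical _id added for illustration.
const volumes = [
  {
    _id: 1,
    title: "Moby-Dick",
    author: "Herman Melville",
    published: 1851,
    ISBN: "0451526996",
    topics: ["whaling", "allegory", "revenge", "American",
             "novel", "nautical", "voyage", "Cape Cod"],
  },
];

// Toy multikey index on topics: every array element becomes its own key.
const topicIndex = new Map();
for (const v of volumes) {
  for (const t of v.topics) {
    if (!topicIndex.has(t)) topicIndex.set(t, []);
    topicIndex.get(t).push(v._id);
  }
}

// Hypothetical helper modelling findOne({ topics: keyword }, { title: 1 }):
// look the keyword up, fetch the first match, and keep only _id and title.
function findOneByTopic(keyword) {
  const ids = topicIndex.get(keyword) ?? [];
  if (ids.length === 0) return null;
  const doc = volumes.find((v) => v._id === ids[0]);
  return { _id: doc._id, title: doc.title };
}

console.log(findOneByTopic("voyage")); // { _id: 1, title: 'Moby-Dick' }
console.log(findOneByTopic("space"));  // null
```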

Page 28: Nosql part 2

Exercise

• Create a collection named product for albums. An album can be one of several product types, including Audio Album and Movie.

• A record for an audio album can be created with the following attributes:

– Record 1 (music album): sku (string, unique identifier), type: "Audio Album", title: "Remembering Manna De", description: "By Music lovers", physical_description (weight, width, height, depth), pricing (list, retail, savings, pct_savings), details (title, artist, genre ("bengali modern", "bengali film"), tracks ("birth", "childhood", "growing up", "end"))

– Record 2 (movie): similar details, with a description pertaining to the movie (e.g., director, writer, music director, actors)

• Assignment

– Write a query to return all products with a discount greater than 10%

– Write a query which will return the documents for the albums of a specific genre, sorted in reverse chronological order

– Write a query which selects films that a particular actor starred in, sorted by issue date