MongoDB Danny Jackowitz SE521 4/10/13

MongoDB - cs.scranton.edubi/2014s-html/se521/MongoDB.pdf · Round 1: Schemas RDBMS Explicitly define schema before inserting data MongoDB Schema implicitly created on first insert

What is MongoDB?

● NoSQL database management system (DBMS)
● Name comes from "humongous"
  ○ Intended for large datasets
● Document-oriented
● Developed by 10gen
● Started in 2007
● Open-sourced in 2009
● Production-ready as of version 1.4 (now 2.4)

NoSQL or NoSQL?

● NoSQL is a popular buzzword
● "No SQL" or "Not only SQL"?
  ○ Some NoSQL DBMSs allow you to execute SQL (or close to it) commands
    ■ Ex. Cassandra Query Language
  ○ MongoDB does NOT!
    ■ Takes a completely different approach

DBMS Showdown

RDBMS vs. MongoDB

Round 1: Schemas

● RDBMS
  ○ Explicitly define schema before inserting data

    CREATE TABLE stuff (
      id int PRIMARY KEY,
      some_data varchar(64)
    );

● MongoDB
  ○ No schema to define; collection implicitly created on first insert
  ○ "_id" primary key automatically generated if not specified
  ○ Just throw data at Mongo, it can handle it!

Round 2: Tables

● RDBMS
  ○ Tables store rows of data
  ○ Data is organized by column
    ■ All rows in a table have the same column structure
● MongoDB
  ○ Collections store documents of data
  ○ Data is organized by fields
    ■ Documents in a collection need not have identical fields

Round 3: Joins

● RDBMS

    table_1 JOIN table_2 ON table_1.a = table_2.b

  ○ Returns a (logical) single table
● MongoDB
  ○ No such concept
  ○ Manual linking
    ■ Store _id of document within other document
    ■ "Join" on the client
  ○ Embedded documents
    ■ Denormalized data to remove need for join
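
The two MongoDB options (manual linking and embedded documents) can be sketched in plain JavaScript, with arrays standing in for collections. The data and the `joinOnClient` helper are hypothetical and only illustrate the idea; a real application would fetch each collection through a driver.

```javascript
// Plain arrays stand in for two collections (hypothetical data).
const authors = [
  { _id: 1, name: "Alan" },
  { _id: 2, name: "Grace" }
];

// Manual linking: each book stores the _id of its author.
const books = [
  { _id: 10, title: "Book A", author_id: 1 },
  { _id: 11, title: "Book B", author_id: 2 }
];

// "Join" on the client: look up the linked author for every book.
function joinOnClient(books, authors) {
  const byId = new Map(authors.map(a => [a._id, a]));
  return books.map(b => ({ title: b.title, author: byId.get(b.author_id).name }));
}

// Embedded alternative: the author is copied into the book, so no join is needed.
const embeddedBook = { _id: 10, title: "Book A", author: { name: "Alan" } };

const joined = joinOnClient(books, authors);
console.log(joined);
console.log(embeddedBook.author.name);
```

The linked form needs two lookups but keeps one copy of each author; the embedded form reads in one step at the cost of duplicated data.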

Round 4: Transactions

● RDBMS

    BEGIN;
    -- Do some stuff
    COMMIT;

● MongoDB
  ○ Atomic operations within a single document
  ○ No multi-document commit with rollback (as of version 2.4)

MongoDB Query Language

● No SQL!
● BSON
  ○ Binary JSON
  ○ JSON == JavaScript Object Notation
    ■ Key-value pairs

    {
      _id: ObjectId("5099803df3f4948bd2f98391"),
      name: { first: "Alan", last: "Turing" },
      birth: new Date('Jun 23, 1912'),
      death: new Date('Jun 07, 1954'),
      contribs: ["Turing machine", "Turing test"],
      views: NumberLong(1250000)
    }

Inserting Documents

    // insert a document held in a variable
    var record = { _id : 1, name : "mongo" }
    db.records.insert( record )

    // insert a document literal
    db.records.insert( { _id : 2, name : "mongo" } )

    // insert many documents using a JavaScript loop
    for (var i = 1; i <= 20; i++) {
      db.records.insert( { x : i } )
    }

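Retrieving Documents

    // find all
    db.records.find()
    // find specific (WHERE)
    db.records.find( { name : "mongo" } )

    // iterate over a cursor
    var cursor = db.records.find()
    while ( cursor.hasNext() ) {
      printjson( cursor.next() )
    }

    // array-style access (use a fresh cursor; the one above is exhausted)
    printjson( db.records.find()[0] )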

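Updating Documents

    // $set changes (or adds) a single field
    db.records.update( { _id : 1 }, { $set : { name : "mongodb" } } )

    // $unset removes a field; the value given is ignored
    db.records.update( { _id : 2 }, { $unset : { name : "ignored" } } )

    // fetch a document, change it on the client, save the whole document back
    var r = db.records.find( { name : "mongodb" } )
    r[0]["name"] = "mongo"
    db.records.save( r[0] )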
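
What `$set` and `$unset` do to a single document can be sketched in plain JavaScript. `applyUpdate` is a hypothetical helper written for illustration, not part of MongoDB, and it only handles these two operators on top-level fields.

```javascript
// Hypothetical helper mimicking the effect of $set and $unset on one document.
function applyUpdate(doc, update) {
  const result = Object.assign({}, doc);   // shallow copy, input left untouched
  for (const field of Object.keys(update.$set || {})) {
    result[field] = update.$set[field];    // add or overwrite the field
  }
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field];                  // the value passed to $unset is ignored
  }
  return result;
}

const afterSet = applyUpdate({ _id: 1, name: "mongo" }, { $set: { name: "mongodb" } });
const afterUnset = applyUpdate({ _id: 2, name: "mongo" }, { $unset: { name: "ignored" } });
console.log(afterSet);
console.log(afterUnset);
```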

Deleting Documents

    // delete specific documents
    db.records.remove( { name : "mongo" } )

    // delete all documents (an empty query matches everything)
    db.records.remove( {} )

    // delete collection
    db.records.drop()

Aggregation Framework

● db.collection.aggregate(...)
● Uses a pipeline system
  ○ Works like the UNIX pipeline
  ○ ls | grep "text" | more

    db.collection.aggregate(
      { $op1 : val1 },
      { $op2 : val2 },
      { $op3 : val3 }
    );
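
The pipeline idea can be sketched in plain JavaScript: each stage takes a list of documents and hands a new list to the next stage. The stage helpers, `runPipeline`, and the sample data are hypothetical and only mimic the shape of the framework.

```javascript
// Each stage is a function from an array of documents to a new array.
const match = pred => docs => docs.filter(pred);
const limit = n => docs => docs.slice(0, n);
const project = fields => docs =>
  docs.map(d => Object.fromEntries(fields.map(f => [f, d[f]])));

// Run the stages left to right, like commands joined by "|".
function runPipeline(docs, ...stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical sample documents.
const zips = [
  { city: "A", state: "PA", pop: 500 },
  { city: "B", state: "PA", pop: 2000 },
  { city: "C", state: "NY", pop: 3000 },
  { city: "D", state: "NY", pop: 4000 }
];

const out = runPipeline(
  zips,
  match(d => d.pop > 1000),
  project(["city"]),
  limit(2)
);
console.log(out);
```

Order matters, as in a shell pipeline: filtering early means later stages see fewer documents.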

Aggregation Framework

● $project
  ○ Include fields from the original document
  ○ Insert computed fields
  ○ Rename fields
  ○ Create and populate fields that hold sub-documents

    db.zips.aggregate(
      { $project : { city : 1, state : 1, _id : 0 } }
    )

Aggregation Framework

● $match
  ○ Can work with implied equality or any of the comparison operators
    ■ ==, !=, >, <, >=, <= (written as implied equality, $ne, $gt, $lt, $gte, $lte)

    db.zips.aggregate( { $match : { pop : 8000 } } )
    db.zips.aggregate( { $match : { pop : { $gt : 80000, $lte : 82000 } } } )
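
How a `$match` condition might be evaluated against one document can be sketched in plain JavaScript. The `matches` helper and the sample data are hypothetical, and the sketch covers only scalar fields with the operators listed above; MongoDB's real matching rules (arrays, nested documents, types) are richer.

```javascript
// Comparison operators from the slide, as plain functions.
const comparators = {
  $gt:  (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt:  (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne:  (a, b) => a !== b
};

// Hypothetical helper: does doc satisfy every condition in query?
function matches(doc, query) {
  return Object.keys(query).every(field => {
    const condition = query[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form: every listed operator must hold.
      return Object.keys(condition).every(op =>
        comparators[op](doc[field], condition[op]));
    }
    return doc[field] === condition;   // implied equality
  });
}

// Hypothetical sample documents.
const zips = [{ pop: 8000 }, { pop: 81000 }, { pop: 82000 }, { pop: 90000 }];

const exact = zips.filter(d => matches(d, { pop: 8000 }));
const ranged = zips.filter(d => matches(d, { pop: { $gt: 80000, $lte: 82000 } }));
console.log(exact, ranged);
```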

Aggregation Framework

● $limit
  ○ Restricts the number of documents that pass through the pipeline at this point

    db.zips.aggregate(
      { $match : { pop : { $gt : 80000, $lte : 82000 } } },
      { $limit : 2 }
    )

Aggregation Framework

● $unwind
  ○ Peels off the elements of an array individually
  ○ Returns one document for every member of the unwound array

    db.zips.aggregate(
      { $limit : 1 },
      { $project : { city : 1, state : 1, loc : 1, _id : 0 } },
      { $unwind : "$loc" }
    )
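
The effect of `$unwind` can be sketched in plain JavaScript. The `unwind` helper and the sample document (including its coordinates) are hypothetical; in this sketch, documents without an array in the named field simply produce no output.

```javascript
// Hypothetical helper: emit one copy of each document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const element of values) {
      // Copy the document, replacing the array with a single element.
      out.push(Object.assign({}, doc, { [field]: element }));
    }
  }
  return out;
}

// Hypothetical sample document with a two-element array.
const zip = { city: "SCRANTON", state: "PA", loc: [-75.6, 41.4] };

const unwound = unwind([zip], "loc");
console.log(unwound);
```

One input document with a two-element `loc` array becomes two output documents, each holding one of the elements.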

Aggregation Framework

● $group
  ○ Groups documents together for the purpose of calculating aggregate values based on a collection of documents

    db.zips.aggregate(
      { $group : {
          _id : "$state",
          totalPop : { $sum : "$pop" },
          avgPop : { $avg : "$pop" }
      } }
    )
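
The same grouping can be sketched in plain JavaScript to show what is being computed. `groupByState` and the sample data are hypothetical; the sketch hard-codes the `state` key and the sum/average accumulators from the slide.

```javascript
// Hypothetical helper: total and average population per state.
function groupByState(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.state) || { _id: doc.state, totalPop: 0, count: 0 };
    g.totalPop += doc.pop;   // running sum, like $sum
    g.count += 1;            // needed to derive the average, like $avg
    groups.set(doc.state, g);
  }
  return Array.from(groups.values()).map(g => ({
    _id: g._id,
    totalPop: g.totalPop,
    avgPop: g.totalPop / g.count
  }));
}

// Hypothetical sample documents.
const zips = [
  { state: "PA", pop: 100 },
  { state: "PA", pop: 300 },
  { state: "NY", pop: 50 }
];

const grouped = groupByState(zips);
console.log(grouped);
```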

Aggregation Framework

● $sort
  ○ Orders documents by one or more fields
  ○ 1 ascending, -1 descending

    db.zips.aggregate( { $sort : { state : 1, pop : -1 } } )
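
A multi-key sort with the 1 / -1 convention can be sketched in plain JavaScript. `sortBy` and the sample data are hypothetical; the sketch compares values with `<` and `>` only, which is enough for numbers and plain strings.

```javascript
// Hypothetical helper: sort by each key in spec order; 1 ascending, -1 descending.
function sortBy(docs, spec) {
  const keys = Object.keys(spec);
  return docs.slice().sort((a, b) => {   // slice() keeps the input unchanged
    for (const key of keys) {
      if (a[key] < b[key]) return -spec[key];
      if (a[key] > b[key]) return spec[key];
    }
    return 0;                            // equal on every key
  });
}

// Hypothetical sample documents.
const zips = [
  { state: "PA", pop: 100 },
  { state: "NY", pop: 50 },
  { state: "PA", pop: 300 }
];

const sorted = sortBy(zips, { state: 1, pop: -1 });
console.log(sorted);
```

Later keys only break ties left by earlier ones, so here population order applies within each state.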

More complex queries?

● MongoDB provides MANY other functions that allow for complex queries to be executed efficiently.

● Craigslist
  ○ Archiving (still RDBMS for active listings)
  ○ 2+ billion listings!
● SourceForge
  ○ All project and download pages
● Lots of gaming back ends
  ○ Disney, EA
  ○ Storing scores, stats, achievements, etc.

What's the catch?

● MongoDB is designed for non-relational data
● Faking relational loses efficiency
  ○ "Joining" on the client is slow
● Embedded documents to preserve speed
  ○ De-normalizes data
    ■ Consider books written by authors
    ■ Each book document has own embedded copy of author
    ■ Author changes contact info
    ■ Must update ALL books written by author!
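
The books-and-authors problem can be sketched in plain JavaScript, with an array standing in for the collection. The data and the `changeAuthorEmail` helper are hypothetical.

```javascript
// Each book carries its own embedded copy of the author (hypothetical data).
const books = [
  { _id: 1, title: "Book A", author: { name: "Alan", email: "alan@old.example" } },
  { _id: 2, title: "Book B", author: { name: "Alan", email: "alan@old.example" } },
  { _id: 3, title: "Book C", author: { name: "Grace", email: "grace@example.org" } }
];

// One change to the author means touching every book that embeds that author.
function changeAuthorEmail(books, authorName, newEmail) {
  let touched = 0;
  for (const book of books) {
    if (book.author.name === authorName) {
      book.author.email = newEmail;
      touched += 1;
    }
  }
  return touched;
}

const touched = changeAuthorEmail(books, "Alan", "alan@new.example");
console.log(touched);
```

With a normalized design the same change would be a single update to one author record; here the cost grows with the number of books.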

Conclusion

● MongoDB is awesome for non-relational data
  ○ Self-contained documents
● MongoDB is awesome for loosely structured data
  ○ Each document in collection can have different format
● MongoDB is awesome for (mostly) static data
  ○ Throw all the data at it
  ○ Normalization not as much of a concern
  ○ Super fast queries with indices, etc.

● MongoDB is NOT a replacement for RDBMSs

Configuring a MongoDB Cluster

● MongoDB intended as a distributed system
  ○ Different components run on different machines
● Three components
  ○ mongod
    ■ --configsvr
    ■ --replSet
    ■ --shardsvr
  ○ mongos
  ○ mongo

mongod

● "MongoDB Daemon"
● Primary daemon process
● Runs on every machine acting as data store
● Comparable to postgresql-server
● Defaults to port 27017
● Configuration server
  ○ Started with --configsvr
  ○ Special instance that stores all metadata for cluster
  ○ Defaults to port 27019

Replication

● Exact same data stored on multiple instances
● Primary vs. Secondary
  ○ Only primary accepts writes - propagates to secondaries
  ○ Fully Consistent (by default)
    ■ All reads and writes go through single primary
  ○ Asynchronous replication
● Failover
  ○ If primary fails, secondaries elect new primary
  ○ Election needs a majority of voting members, so use at least 3 members (primary + 2 secondaries)
● --replSet [name]
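
Why failover needs at least two secondaries follows from the majority rule for elections, which a few lines of JavaScript can show. The helper names are hypothetical, and the sketch assumes every member has one vote.

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function votesNeeded(members) {
  return Math.floor(members / 2) + 1;
}

// Can the members that are still reachable elect a new primary?
function canElect(members, reachable) {
  return reachable >= votesNeeded(members);
}

console.log(canElect(2, 1)); // primary + 1 secondary, primary fails: false
console.log(canElect(3, 2)); // primary + 2 secondaries, primary fails: true
```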

Sharding

● Partitions collections
  ○ Based on shard key
● Stores different portions on different machines
  ○ Ex. Storing transaction records
    ■ 1/1/10 - 12/31/10 -> server1
    ■ 1/1/11 - 12/31/11 -> server2
    ■ ...
● Easy scaling - add more racks!
● --shardsvr
  ○ Switches to port 27018
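
Range-based partitioning on a shard key can be sketched in plain JavaScript using the transaction-date example. The `chunks` table and `routeByDate` are hypothetical; in a real cluster this mapping lives in the cluster metadata and is not written by hand.

```javascript
// Hypothetical range table: each shard owns a half-open range [min, max) of the shard key.
const chunks = [
  { min: new Date(2010, 0, 1), max: new Date(2011, 0, 1), shard: "server1" },
  { min: new Date(2011, 0, 1), max: new Date(2012, 0, 1), shard: "server2" }
];

// Pick the shard whose range contains the given shard key value.
function routeByDate(date) {
  const chunk = chunks.find(c => date >= c.min && date < c.max);
  return chunk ? chunk.shard : null;   // null: no range covers this key
}

console.log(routeByDate(new Date(2010, 5, 15)));
console.log(routeByDate(new Date(2011, 11, 31)));
```

Adding capacity then amounts to adding another range and another server, without touching the data already placed.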

mongos

● "MongoDB Shard"
● Not a data store
● Routing service for shards
  ○ Knows what data is on which shard
  ○ Directs request to appropriate shard
● To user/application looks same as single mongod instance
  ○ Same interface as mongod
  ○ Same default port (27017)
  ○ Connect in same way

mongo

● Interactive shell interface
● Comparable to psql
● JavaScript
  ○ Can use loops, conditionals, etc. in queries

Our Architecture

● 4 machine cluster
  ○ server1
    ■ mongod --configsvr (27019)
    ■ mongod --shardsvr (27018)
    ■ mongos (27017)
  ○ server2
    ■ mongod --shardsvr --replSet rs0 (27018)
  ○ server3
    ■ mongod --shardsvr --replSet rs0 (27018)
  ○ server4
    ■ mongod --shardsvr --replSet rs0 (27018)

Starting Everything Up...

    server1:
      sudo -u mongodb mongod --configsvr
      sudo -u mongodb mongod --shardsvr

    server2, server3, server4:
      sudo -u mongodb mongod --shardsvr --replSet rs0

    server1:
      mongos --configdb 134.198.169.41

Setting Up Replication & Sharding

    server2 (or 3 or 4):
      mongo --port 27018
        rs.initiate()
        rs.add("134.198.169.43:27018")
        rs.add("134.198.169.44:27018")
        rs.conf()

    server1:
      mongo
        sh.addShard("rs0/134.198.169.42:27018")
        sh.addShard("134.198.169.41:27018")

... And Watching It Work

    sh.enableSharding("test")
    sh.shardCollection("test.shardtest", { _id : 1 })

    for (var i = 1; i <= 2000000; i++) {
      db.shardtest.insert( {
        _id : i,
        junk : "Some reasonably long text that will make this take up more space in the database and better illustrate sharding"
      } )
    }