MongoDB Workshop

Post on 10-May-2015

DESCRIPTION

Workshop held at NYC Open Data Meetup

Transcript

MONGODB WORKSHOP

{
    meetup: "NYC Open Data",
    presenters: ["Kannan Sankaran", "Roman Kubiak"],
    host: "Vivian",
    location: "ThoughtWorks",
    audience: "You guys"
}

MONGODB WORKSHOP

{
    meetup: "NYC Open Data",
    presenters: ["Kannan Sankaran", "Roman Kubiak"],
    host: "Vivian is awesome, THANK YOU",
    location: "ThoughtWorks is awesome, THANK YOU",
    audience: "You guys are awesome, THANK YOU"
}

OUR TOPICS

OVERVIEW OF DATABASES

WHAT IS MONGODB?

MONGODB, NOSQL, AND RELATIONAL DATABASES

A PEEK AT MONGODB COMMANDS

SHARDING AND REPLICATION IN MONGODB

FUTURE OF MONGODB AND US

DEMO

WORKSHOP

ARCHITECT

MONGO PIE

OVERVIEW OF DATABASES

ROWS

COLUMNS

TABLES

ORGANIZING DATA

DATA SPREAD OUT IN VARIOUS TABLES

DATA MAY BE RELATED

DATABASES AND THEIR GROWTH

1970s 1980s 1990s 2000s 2007

RELATIONAL DATABASES (RDBMS) CREATED

CLIENT/SERVER MODEL

STRUCTURED QUERY LANGUAGE (SQL) CREATED

RDBMS CONTINUE TO BE POPULAR

INTERNET ARRIVES

INTERNET GROWS

NoSQL DATABASES EMERGE

MONGODB CREATED

WHAT IS NoSQL?

A TWITTER HASHTAG: #nosql

NOSQL GENERALLY REFERS TO DATABASES THAT DO NOT HAVE A FIXED ROW-COLUMN DATA ORGANIZATION STRUCTURE.

WHAT IS MONGODB?

A HUMONGOUS NoSQL DB WHERE DATA IS ORGANIZED BY

DOCUMENTS, NOT ROWS

COLLECTIONS, NOT TABLES

WHAT IS A DOCUMENT?

A DOCUMENT IS LIKE A ROW…

{
    _id: ObjectId("12AB34CD56EF"),
    name: "Ed Brown",
    orderDate: "2-1-2014"
}

…BUT IT IS MORE FLEXIBLE

{
    _id: ObjectId("12AB34CD56EF"),
    name: "Ed Brown",
    orderDate: "2-1-2014",
    payments: {
        car: "100.50",
        hotel: "200"
    }
}

THAT LOOKS LIKE A DOCUMENT WITHIN ANOTHER DOCUMENT!

{
    _id: ObjectId("12AB34CD56EF"),
    name: "Ed Brown",
    orderDate: "2-1-2014",
    payments: {
        car: "100.50",
        hotel: "200"
    },
    tags: ["shirt", "tie"]
}

WHAT IS THIS? MULTIPLE VALUES WITHIN A COLUMN?

HOW LARGE CAN THIS DOCUMENT BE?

{
    _id: ObjectId("12AB34CD56EF"),
    name: "Ed Brown",
    orderDate: "2-1-2014",
    payments: {
        car: "100.50",
        hotel: "200"
    }
    ………
}

UP TO 16 MB

LEO TOLSTOY'S 1225-PAGE NOVEL WAR AND PEACE CAN FIT IN 1 DOCUMENT, AS IT IS ONLY AROUND 3 MB.

WELL, ALMOST!

ISN’T THAT JSON?

WHAT IS JSON?

WEB SERVER

MONGODB DATABASE

{
    "vehicle": "Chevy Malibu 2014",
    "price": { "min": 22340, "max": 29950 },
    "citympg": 25
}

{
    "make": "Chevy",
    "model": "Malibu",
    "year": 2014
}

WHAT IS JSON?

{
    vehicle: "car",
    make: "Malibu",
    color: "blue"
}

JAVASCRIPT OBJECT NOTATION

NAME-VALUE PAIRS

{
    name: "Kannan",
    gender: "male",
    favorites: {
        color: "blue"
    },
    interests: ["MongoDB", "R"]
}

MONGODB DOCUMENT

{
    _id: ObjectId("12AB34CD56EF"),
    name: "Kannan",
    gender: "male",
    favorites: {
        color: "blue"
    },
    interests: ["MongoDB", "R"],
    date: new Date()
}

WHAT IS A COLLECTION?

A GROUP OF DOCUMENTS

SIMILAR

{ _id: ObjectId("12AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014" }
{ _id: ObjectId("78AB34CD56EF"), name: "Roman Ku", orderDate: "2-1-2014" }
{ _id: ObjectId("56AB34CD56EF"), name: "Eva Green", orderDate: "2-1-2014" }

DIFFERENT

{ _id: ObjectId("34AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014", tags: ["shirt", "tie"] }
{ _id: ObjectId("90AB34CD56EF"), name: "Roman Ku", orderDate: "2-1-2014", payments: { car: "100.50", hotel: "200" } }
{ _id: ObjectId("13AB34CD56EF"), name: "Eva Green", orderDate: "2-1-2014" }

VERY DIFFERENT

{ _id: ObjectId("35AB34CD56EF"), name: "Ed Brown", orderDate: "2-1-2014" }
{ _id: ObjectId("79AB34CD56EF"), vehicle: "car", make: "Malibu", color: "blue" }
{ _id: ObjectId("57AB34CD56EF"), name: "Eva Green", orderDate: "2-1-2014", tags: ["shirt", "tie"] }

MONGODB IS...

A DOCUMENT-ORIENTED NOSQL DATABASE WHERE DATA CONSISTS OF DOCUMENTS STORED IN COLLECTIONS.

MONGODB FEATURES

EASY TO LEARN

DYNAMIC QUERY LANGUAGE
- SEARCH BY FIELDS, REGULAR EXPRESSIONS
- USER-DEFINED JAVASCRIPT FUNCTIONS
- AGGREGATION, INCLUDING MAP/REDUCE

INDEXING: SINGLE, COMPOUND, GEOSPATIAL

REPLICATION

LOAD BALANCING USING SHARDING

GRIDFS TO STORE FILES

MONGODB USAGE

CONTENT MANAGEMENT SYSTEMS

E-COMMERCE WEBSITES

LOG DATA AND HIERARCHICAL AGGREGATION

REAL-TIME ANALYTICS

MONGODB, NOSQL, AND RELATIONAL DATABASES

1970s 1980s 1990s 2000s 2007

BERKELEY INGRES

ORACLE

INFORMIX

DB2

SYBASE

SQL SERVER

MS ACCESS

POSTGRESQL

MYSQL

NETEZZA

GREENPLUM

VERTICA

MARIADB

MONGODB

DATABASE MANAGEMENT SYSTEMS

MOST SYSTEMS USE SOME FLAVOR OF SQL

RELATIONAL DATABASES WERE, AND STILL ARE, THE DE FACTO STANDARD IN MANY COMPANIES.

RELATIONAL DATABASE FEATURES

C.R.U.D. OPERATIONS

STRUCTURED QUERY LANGUAGE (SQL)

FIXED DATABASE SCHEMAS

NORMALIZATION

REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)

JOINS

TRANSACTIONS - A.C.I.D. PROPERTIES

INDEXES

IN THE LATE 90s/EARLY 2000s…

DOT COM BUBBLE

DOT COM BUST

WEB SERVICES

SOCIAL NETWORKS

GOOGLE, AMAZON

COMPUTER OWNERS/USERS

WEBSITE DATA COLLECTION

DATABASE SIZES

COMPUTING/STORAGE RESOURCES BECAME A CHALLENGE FOR THEN-SMALLER COMPANIES LIKE GOOGLE AND AMAZON THAT HAD LOTS OF DATA.

SCALE UP

MORE DISK SPACE

MORE RAM

MORE PROCESSORS

MORE EXPENSIVE

SINGLE POINT OF FAILURE

HARDWARE HAS LIMITS!

BIGGER MACHINE

SCALE OUT

LESS DISK SPACE

LESS RAM

FEWER PROCESSORS

LESS EXPENSIVE

NO SINGLE POINT OF FAILURE

HIGHER RELIABILITY DESPITE FAILURE OF INDIVIDUAL MACHINES

SMALLER MACHINES

RELATIONAL DATABASES WERE DESIGNED TO OPERATE ON A SINGLE MACHINE, AND SCALING THEM OUT POSED MANY CHALLENGES.

SPLITTING DATA FOR SCALE OUT

BY COLUMNS

BY ROWS

WORDPRESS MYSQL SCHEMA WITH 2 TABLES

A JOIN QUERY IN MYSQL

TABLES: WP_POSTS, WP_COMMENTS

SELECT p.post_author, p.post_date, c.comment_author, c.comment_date
FROM wp_posts AS p
INNER JOIN wp_comments AS c ON p.ID = c.comment_post_ID
WHERE p.ID = 1;

RESULT

SCALE OUT DATA BY ROWS

WP_POSTS: A, B

WP_COMMENTS: C, D

HOW COMPLICATED WOULD SCALING THIS BE?

JOINS MAY GET REALLY MESSY WITH MANY MACHINES (DISTRIBUTED JOINS)

TRANSACTIONS

TABLES: WP_POSTS, WP_COMMENTS

BEGIN TRANSACTION
TRY
    DELETE FROM wp_comments WHERE comment_post_ID = 1;
    DELETE FROM wp_posts WHERE ID = 1;
    COMMIT TRANSACTION
CATCH
    IF ERROR THEN ROLLBACK TRANSACTION
END TRANSACTION

MUST SATISFY A.C.I.D. PROPERTIES

TRANSACTIONS MAY TAKE A LONG TIME TO EXECUTE IF DATA IS ON DIFFERENT MACHINES (DISTRIBUTED TRANSACTIONS)

TO SPLIT THE DATA, A WHOLE BUNCH OF COMPROMISES MUST BE MADE IN RELATIONAL DATABASES

THIS GAVE RISE TO NON-RELATIONAL SOLUTIONS

GOOGLE

AMAZON

NoSQL SYSTEM CHARACTERISTICS

C.R.U.D. OPERATIONS

STRUCTURED QUERY LANGUAGE (SQL)

FIXED DATABASE SCHEMAS

NORMALIZATION

REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)

JOINS

TRANSACTIONS – LIMITED A.C.I.D. PROPERTIES

INDEXES

OPEN SOURCE

HOW IS THIS SCALABILITY ACHIEVED IN MONGODB?

STACKING THE DATA

WP_POSTS

WP_COMMENTS

NO NEED TO JOIN

{
    _id: 1,
    post_author: "Amy W",
    post_date: "1/1/2014",
    comments: [
        { comment_author: "bestguy", comment_date: "1/1/2014" },
        { comment_author: "baddie", comment_date: "1/10/2014" },
        { comment_author: "clever24", comment_date: "1/11/2014" }
    ]
}

NOW, EACH DOCUMENT CAN BE ON A DIFFERENT MACHINE
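The "stacking" idea can be sketched in plain JavaScript, with no MongoDB server involved. The row arrays and the embedComments helper below are hypothetical, made up for illustration and shaped like the wp_posts and wp_comments tables; this is one possible way to build the embedded document, not a migration tool.

```javascript
// Hypothetical rows, shaped like the wp_posts and wp_comments tables.
const posts = [
  { ID: 1, post_author: "Amy W", post_date: "1/1/2014" }
];
const comments = [
  { comment_post_ID: 1, comment_author: "bestguy", comment_date: "1/1/2014" },
  { comment_post_ID: 1, comment_author: "baddie", comment_date: "1/10/2014" },
  { comment_post_ID: 1, comment_author: "clever24", comment_date: "1/11/2014" }
];

// Build one document per post, with its comments embedded as an array.
// The foreign key (comment_post_ID) is no longer needed once embedded.
function embedComments(postRows, commentRows) {
  return postRows.map((post) => ({
    _id: post.ID,
    post_author: post.post_author,
    post_date: post.post_date,
    comments: commentRows
      .filter((c) => c.comment_post_ID === post.ID)
      .map(({ comment_author, comment_date }) => ({ comment_author, comment_date }))
  }));
}

const documents = embedComments(posts, comments);
console.log(JSON.stringify(documents[0], null, 2));
```

Reading a post together with its comments then takes one document lookup instead of a join across two tables.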

WHAT ABOUT TRANSACTIONS?

MONGODB DOES NOT SUPPORT MULTI-DOCUMENT TRANSACTIONS (AS OF THIS WORKSHOP; THEY WERE ADDED LATER, IN VERSION 4.0)

BUT A SINGLE-DOCUMENT UPDATE IS ATOMIC

{
    _id: 1,
    post_author: "Amy W",
    post_date: "1/1/2014",
    comments: [
        { comment_author: "bestguy", comment_date: "1/1/2014" },
        { comment_author: "baddie", comment_date: "1/10/2014" },
        { comment_author: "clever24", comment_date: "1/11/2014" }
    ]
}

THE KEY IS TO FOCUS ON THE DATA MODEL

MONGODB CHARACTERISTICS

C.R.U.D. OPERATIONS

STRUCTURED QUERY LANGUAGE (SQL), REPLACED BY A DYNAMIC QUERY LANGUAGE

FIXED DATABASE SCHEMAS, REPLACED BY FLEXIBLE DATABASE SCHEMAS

NORMALIZATION

REFERENTIAL INTEGRITY (E.G. FOREIGN KEYS, CONSTRAINTS)

JOINS

TRANSACTIONS – LIMITED A.C.I.D. PROPERTIES

INDEXES

OPEN SOURCE

WHEN NOT TO USE MONGODB

IF TRANSACTIONS ARE A MUST

IF JOINS ARE ABSOLUTELY NECESSARY

SOFTWARE PRODUCTS LIKE WORDPRESS THAT ALREADY HAVE TONS OF SUPPORT FOR RELATIONAL DATABASES

FOR MONGODB vs MYSQL ARGUMENTS, WATCH…

Source: http://www.youtube.com/watch?v=b2F-DItXtZs

A PEEK AT MONGODB COMMANDS

{ _id: ObjectId("A1234566789"), name: "Ed Brown", orderDate: "2-1-2014" }
{ _id: ObjectId("A1234566789"), name: "Roman Ku", orderDate: "1-1-2014" }
{ _id: ObjectId("A1234566789"), name: "Eva Green", orderDate: "10-12-2013" }

MONGODB IS A DOCUMENT-ORIENTED DATABASE

DOCUMENTS ARE INTERNALLY STORED AS BSON (BINARY JSON)
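One practical reason to store documents in a binary format with richer types is that plain JSON text has no date type. The sketch below uses only standard JavaScript (no MongoDB driver) to show a Date turning into a string after a JSON round trip. The doc object is a hypothetical example modeled on the documents above.

```javascript
// Hypothetical document; field names follow the slides above.
const doc = {
  name: "Kannan",
  interests: ["MongoDB", "R"],
  date: new Date(0) // fixed timestamp so the output is reproducible
};

// Serialize to JSON text and parse it back.
const jsonText = JSON.stringify(doc);
const roundTripped = JSON.parse(jsonText);

console.log(doc.date instanceof Date); // true
console.log(typeof roundTripped.date); // string
console.log(roundTripped.date);        // 1970-01-01T00:00:00.000Z
```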

MONGODB SYNTAX SEEMS TO BE BORROWED FROM…

- MYSQL

- JSON

- JAVASCRIPT

- UNIX

MONGODB SUPPORTS SEVERAL LANGUAGES

DRIVERS FOR

- PYTHON

- NODE.JS

- C#

- HADOOP

- R

AND MANY MORE

MONGODB TERMINOLOGY

RDBMS       MONGODB
DATABASE    DATABASE
TABLE       COLLECTION
ROW         DOCUMENT

A DATABASE CAN HAVE 1 OR MORE COLLECTIONS.

A COLLECTION CAN HAVE 1 OR MORE DOCUMENTS.

A DOCUMENT CAN HAVE 1 OR MORE NAME-VALUE PAIRS, AND/OR 1 OR MORE EMBEDDED DOCUMENTS.

MONGODB SUPPORTS SEVERAL DATA TYPES

STRING

NUMBER

BOOLEAN

ARRAY

DATE

EMBEDDED DOCUMENT

NULL

MONGODB OPERATIONS

C.R.U.D.

CREATE

READ

UPDATE

DELETE

CONNECTING TO MONGODB

MONGOD

MONGO ROBOMONGO

MONGO SHELL IS A JAVASCRIPT INTERPRETER.

ROBOMONGO HAS THE SAME JAVASCRIPT ENGINE AS THE MONGO SHELL.

IMPORT JSON TO MONGO COLLECTION

mongoimport -d tennis -c ParksNYC --type json --drop < ParksNYC.json

CREATE COLLECTION

SQL:

CREATE TABLE ParksNYC
(
    id int identity(1, 1),
    Prop_ID varchar(10),
    Name varchar(50) not null,
    Location varchar(20) not null,
    EstablishedOn datetime
)

CREATE DOCUMENT

SQL:

INSERT INTO ParksNYC (Prop_ID, Name, Location, EstablishedOn)
VALUES ('Q900', 'Ridge Park', '1843 Norman St.', '1/1/1970')

MONGODB:

db.ParksNYC.insert({
    Prop_ID: "Q900",
    Name: "Ridge Park",
    Location: "1843 Norman St.",
    EstablishedOn: "1/1/1970"
})

Prop_ID  Name        Location         EstablishedOn
Q900     Ridge Park  1843 Norman St.  1/1/1970

READ ALL DOCUMENTS

SQL:

SELECT * FROM ParksNYC

MONGODB:

db.ParksNYC.find()

READ SPECIFIC DOCUMENT

SQL:

SELECT * FROM ParksNYC
WHERE Name = 'Ridge Park'

MONGODB:

db.ParksNYC.find({
    Name: "Ridge Park"
})

READ FIRST DOCUMENT

SQL:

SELECT TOP 1 * FROM ParksNYC

MONGODB:

db.ParksNYC.findOne()

READ SPECIFIC FIELDS IN DOCUMENT

SQL:

SELECT id, Name FROM ParksNYC

MONGODB:

db.ParksNYC.find(
    { },
    { _id: 1, Name: 1 }
)

READ DOCUMENTS WITH RANGE CRITERIA

SQL:

SELECT id, Name FROM ParksNYC
WHERE Courts > 5 AND Courts <= 8

MONGODB:

db.ParksNYC.find({
    Courts: { $gt: 5, $lte: 8 }
})
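To make the meaning of the $gt and $lte operators concrete, here is a minimal plain-JavaScript sketch that applies the same range query to an in-memory array. It illustrates the semantics only and is not how MongoDB evaluates queries internally. The parks data, the comparators table, and the matches helper are all hypothetical.

```javascript
// Hypothetical sample documents; only Name and Courts matter here.
const parks = [
  { Name: "Ridge Park", Courts: 4 },
  { Name: "Forest Park", Courts: 6 },
  { Name: "Flushing Fields", Courts: 8 },
  { Name: "Central Park", Courts: 26 }
];

// The comparison operators used on the slide, plus their close relatives.
const comparators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound
};

// True when every operator listed for every field in the query holds.
function matches(doc, query) {
  return Object.entries(query).every(([field, conditions]) =>
    Object.entries(conditions).every(([op, bound]) =>
      comparators[op](doc[field], bound)
    )
  );
}

const query = { Courts: { $gt: 5, $lte: 8 } };
const found = parks.filter((park) => matches(park, query));
console.log(found.map((park) => park.Name)); // [ 'Forest Park', 'Flushing Fields' ]
```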

READ DOCUMENTS THAT START WITH A LETTER (REGULAR EXPRESSION)

SQL:

SELECT id, Name FROM ParksNYC
WHERE Name LIKE 'F%'

MONGODB:

db.ParksNYC.find({
    Name: /^F/
})
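The /^F/ pattern is an ordinary JavaScript regular expression, so its behavior can be checked without a database. The names below are hypothetical sample values. One difference worth knowing: the regular expression is case-sensitive unless the i flag is added, whereas the case sensitivity of SQL LIKE depends on the database collation.

```javascript
// Hypothetical park names used only to exercise the pattern.
const names = ["Ridge Park", "Forest Park", "Flushing Fields", "fort greene"];

// ^ anchors the match at the start of the string, like LIKE 'F%'.
const startsWithF = names.filter((name) => /^F/.test(name));
console.log(startsWithF); // [ 'Forest Park', 'Flushing Fields' ]

// The i flag makes the match case-insensitive.
const startsWithFAnyCase = names.filter((name) => /^F/i.test(name));
console.log(startsWithFAnyCase); // [ 'Forest Park', 'Flushing Fields', 'fort greene' ]
```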

UPDATE FIELD IN DOCUMENT

SQL:

UPDATE ParksNYC
SET VisitDate = '1/1/2014'

MONGODB:

db.ParksNYC.update(
    { },
    { $set: { VisitDate: "1/1/2014" } },
    { multi: true }
)

DELETE DOCUMENT

SQL:

DELETE FROM ParksNYC
WHERE Name = 'Ridge Park'

MONGODB:

db.ParksNYC.remove({
    Name: "Ridge Park"
})

GROUP BY AND SUM

SQL:

SELECT COUNT(Name) AS Parks_Number,
       SUM(Courts) AS Courts_Number
FROM ParksNYC
GROUP BY Accessible

MONGODB:

db.ParksNYC.aggregate([
    { $group: {
        _id: "$Accessible",
        Parks_Number: { $sum: 1 },
        Courts_Number: { $sum: "$Courts" }
    } }
])
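What the $group stage computes can be sketched in plain JavaScript: one output document per distinct Accessible value, with a running count and a running sum. The sample data and the groupByAccessible helper are hypothetical. This shows the result shape, not MongoDB's aggregation engine.

```javascript
// Hypothetical sample documents.
const parks = [
  { Name: "Ridge Park", Courts: 4, Accessible: "Y" },
  { Name: "Forest Park", Courts: 6, Accessible: "N" },
  { Name: "Flushing Fields", Courts: 8, Accessible: "Y" }
];

// Mirrors: { $group: { _id: "$Accessible",
//                      Parks_Number: { $sum: 1 },
//                      Courts_Number: { $sum: "$Courts" } } }
function groupByAccessible(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.Accessible;
    const group = groups.get(key) || { _id: key, Parks_Number: 0, Courts_Number: 0 };
    group.Parks_Number += 1;          // { $sum: 1 } counts documents
    group.Courts_Number += doc.Courts; // { $sum: "$Courts" } adds field values
    groups.set(key, group);
  }
  return [...groups.values()];
}

const summary = groupByAccessible(parks);
console.log(summary);
```

With this sample data, the "Y" group holds 2 parks and 12 courts, and the "N" group holds 1 park and 6 courts.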

SHARDING AND REPLICATION IN MONGODB

EACH DOCUMENT CAN BE ON A DIFFERENT MACHINE

HOW DOES MONGODB DO THIS?

AUTOSHARDING, FOR A COLLECTION

[DIAGRAM LABELS: MONGODB CLUSTER, MONGOS, CLIENT, MONGOD (x3); CLIENT, MONGOD]

SHARDING STEPS

1. ENABLE SHARDING ON DATABASE.
2. PICK A SHARD KEY FROM THE COLLECTION. MAKE SURE THE KEY IS
   - INDEXED
   - HIGH IN CARDINALITY, SO IT HAS MANY DISTINCT VALUES.
3. SIT BACK AND RELAX. MONGODB WILL AUTOMATICALLY DO THE SHARDING.

SHARDING WP_POSTS COLLECTION

{
    _id: 1,
    post_author: "Amy W",
    post_date: "1/1/2014",
    comments: [
        { comment_author: "bestguy", comment_date: "1/1/2014" },
        { comment_author: "baddie", comment_date: "1/10/2014" },
        { comment_author: "clever24", comment_date: "1/11/2014" }
    ]
}

SHARD KEY

BREAKING THE USERS INTO CHUNKS

$minKey .. Abba1234
Abba1235 .. CarlW
CarlZ .. FrankT
FrankY .. JackA
JackB .. LambV
LambW .. RobF
RobG .. TimA
TimB .. $maxKey

BREAKING THE RANGE INTO CHUNKS

$minKey .. Abba1234
Abba1235 .. CarlW
CarlZ .. FrankT
FrankY .. JackA
JackB .. LambV
LambW .. RobF
RobG .. TimA
TimB .. $maxKey

[DIAGRAM LABELS: CLIENT, MONGOS, MONGOD (x3): SHARD0000, SHARD0001, SHARD0002]
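How a router could map a shard key value to a chunk, and from there to a shard, can be sketched in plain JavaScript. The lower bounds come from the slide; each chunk is treated as covering its lower bound up to, but not including, the next chunk's lower bound. The assignment of chunks to shards and the findChunk helper are assumptions made for illustration, not MONGOS internals.

```javascript
// Chunk lower bounds from the slide. null stands in for $minKey.
// Which shard holds which chunk is a made-up assignment.
const chunks = [
  { min: null, shard: "shard0000" },
  { min: "Abba1235", shard: "shard0000" },
  { min: "CarlZ", shard: "shard0000" },
  { min: "FrankY", shard: "shard0001" },
  { min: "JackB", shard: "shard0001" },
  { min: "LambW", shard: "shard0001" },
  { min: "RobG", shard: "shard0002" },
  { min: "TimB", shard: "shard0002" }
];

// Return the last chunk whose lower bound is <= key.
// Chunks are sorted, so we can stop at the first bound above the key.
function findChunk(key) {
  let found = chunks[0]; // the $minKey chunk catches everything below "Abba1235"
  for (const chunk of chunks.slice(1)) {
    if (key >= chunk.min) {
      found = chunk;
    } else {
      break;
    }
  }
  return found;
}

console.log(findChunk("Dave").shard); // shard0000
console.log(findChunk("Kate").shard); // shard0001
console.log(findChunk("Zoe").shard);  // shard0002
```

A query that includes the shard key can be sent to one shard this way, while a query without it has to be sent to every shard.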

BENEFITS OF SHARDING

1. INCREASES AVAILABLE MEMORY.
2. REDUCES LOAD ON THE SERVER.
3. INCREASES HARD DISK SPACE.
4. LOCATION-BASED SHARD KEYS CAN PUT DATA CLOSE TO THE USERS AND KEEP RELATED DATA TOGETHER.

MASTER-SLAVE REPLICATION

[DIAGRAM: CLIENT AND A REPLICA SET OF THREE MONGOD INSTANCES LABELED MASTER, SLAVE, SLAVE]

ELECTION

[DIAGRAM: CLIENT AND A REPLICA SET SHOWING TWO MONGOD INSTANCES LABELED MASTER, SLAVE]

MINIMUM 3 MEMBERS TO FORM REPLICA SET

[DIAGRAM: CLIENT AND A REPLICA SET OF THREE MONGOD INSTANCES LABELED MASTER, SLAVE, SLAVE]

REPLICATION SOLVES THE PROBLEM OF AVAILABILITY AND FAULT TOLERANCE
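The three-member minimum follows from how elections work. The sketch below assumes the usual rule that a new primary (MASTER in the slides) needs votes from a strict majority of the set's voting members. The function names are hypothetical and the arithmetic is the whole point.

```javascript
// Smallest number of votes that is more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

console.log(majority(3));           // 2
console.log(canElectPrimary(3, 2)); // true: one of three members is down
console.log(canElectPrimary(2, 1)); // false: one of two members is down
```

With two members, losing either one leaves no majority, so the set cannot elect a primary. With three, the set survives the loss of any single member.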

FUTURE OF MONGODB AND US

COMPANIES USING MONGODB

MONGODB WINS AWARD

36 MOST VALUABLE STARTUPS ON EARTH

ORACLE

SQL SERVER

MONGODB

POSTGRESQL

?RIAK

NEO4J

POLYGLOT PERSISTENCE

GOOD TO KNOW BOTH SQL AND NOSQL

MYSQL

DREMEL

ARCHITECT

WHAT WE DID NOT COVER

SECURITY

BACKUP/RECOVERY

DATA MODELING

THANK YOU VERY MUCH

AND THANK YOU TO EVERYONE WHO HELPED US

DR. BILL HOWE, UNIVERSITY OF WASHINGTON

JASON CHEN, MONGODB RECRUITER

KRISTINA CHODOROW (DEFINITIVE GUIDE AUTHOR)

FRANCESCA KRIHELY (MONGODB COMMUNITY MANAGER)

DR. MARKUS SCHMIDBERGER, RMONGODB

JOHANNES BRANDSTETTER, MONGOSOUP (THE FIRST EUROPEAN PARTNER OF MONGODB TO PROVIDE MONGODB AS A SERVICE)

DR. RAMNATH VAIDYANATHAN, RCHARTS

REFERENCES

MongoDB: http://www.mongodb.org

Book: MongoDB: The Definitive Guide, Kristina Chodorow

Book: NoSQL Distilled, Pramod J. Sadalage and Martin Fowler

NoSQL: http://en.wikipedia.org/wiki/NoSQL

MongoDB Use Cases: http://www.mongodb.com/use-cases

First NoSQL Meetup Notes: http://developer.yahoo.com/blogs/ydn/notes-nosql-meetup-7663.html

Billion dollar club: http://graphics.wsj.com/billion-dollar-club/

Photos from Google

DEMO
