NOSQL INTRO & MONGODB: NOSQL INTRODUCTION AND USING MONGODB
Oct 18, 2014
REQUISITE SLIDE – WHO AM I?
- Brian Enochson
- SW Engineer who has worked as designer / developer on NoSQL (Mongo, Cassandra, Hadoop)
- Consultant – HBO, ACS, CIBER
- Specialize in SW development, architecture and training
Brian Enochson
Twitter @benochso
Google Plus https://plus.google.com/+BrianEnochson
Contact Me:
I am available for training, consulting & development.
AGENDA
Hour 1
• Installation of required software (will send out list before, but make sure all of class has what is needed)
• Introduction to Big Data
• Introduction to NoSQL
• Relational Database to NoSQL technology contrast & compare
• NoSQL landscape
• Exercise – install and use required software
AGENDA
Hour 2
• Introduction to MongoDB
• MongoDB Components, capabilities and common use cases
• JSON & BSON
• Documents, collections, references and Mongo ID
• Querying
• Other CRUD Operations
• Indexes
• Exercise – Design and populate MongoDB
AGENDA
Hour 3
• Data Modeling/Schema Design
• Replication & Sharding
• Exercise: Application Development Using MongoDB and Java
• Wrap-up and final Q & A
SOFTWARE
Later we will need
• MongoDB http://www.mongodb.org/downloads
• Java JDK
  • 1.6
• NetBeans, Eclipse or IntelliJ (with Maven support)
  • or Maven and any editor
• Our project
  • http://bit.ly/IVnTEb
    (or https://www.dropbox.com/sh/mwu6lltaljqq59z/PMWiw7ZPk3)
• Robomongo or MongoExplorer
BIG DATA
Why are databases like Mongo needed?
• To understand, we need to look at
  • The history of databases
  • How systems were built in the past
• Modern application architectures
  • Web scale
  • Data acquisition
• Other factors like cost of H/W
HISTORY OF THE DATABASE
• 1960’s – Hierarchical and Network type (IMS and CODASYL)
• 1970’s – Beginnings of theory behind relational model. Codd
• 1980’s – Rise of the relational model. SQL. E/R Model (Chen)
• 1990’s – Access/Excel and MySQL. ODMS began to appear
• 2000’s – Two forces: large enterprise and open source. Google and Amazon. CAP Theorem (more on that to come…)
• 2010’s – Emergence of NoSQL as an industry player and viable alternative
WHY WERE ALTERNATIVES NEEDED
• Developers today are faced with Internet scale
  • 100,000’s of users
  • Low cost of storage
  • Increased processing power
  • Ability to capture (and need for) millions of events. Caching solves it to an extent but brings other complexities
  • Real-time
  • Need to scale out and not up (add more low-cost machines vs. replace with a more powerful machine)
• Cost
  • Let’s not forget that for enterprise DBs Internet scale can become expensive
  • Open source DBs may solve license cost, but don’t ignore operational costs
A LOT OF DATA
Some facts from http://www.storagenewsletter.com/rubriques/market-reportsresearch/ibm-cmo-study/
Approximately 90 percent of all the real-time information being created today is unstructured data
Every day we create 2.5 quintillion bytes of data (a quintillion is 10 to the 18th, a 1 followed by 18 zeroes!!)
90 percent of the world's data today has been created in the last two years alone
RELATIONAL VS. NOSQL
• Relational
  • Divide into tables, relate via foreign keys, DB constraints, normalized data, the interface is SQL
• NoSQL
  • Store in schemaless format, redundancy encouraged, application access (your queries) determines the storage format. Interface varies and is optimized for the implementation, no forced DB constraints. The tradeoff is that you often get eventual consistency.
TRADEOFFS?
Luckily, due to the large number of compromises made when
attempting to scale their existing relational databases,
these tradeoffs were not so foreign or
distasteful as they might have been.
Greg Burd - https://www.usenix.org/legacy/publications/login/2011-10/openpdfs/Burd.pdf
3 V’S – DESCRIBING THE BIG DATA PROBLEM
Driving force in requiring new technology is often referred to as the “3 V Model”.
• High Volume – amount of data
• High Variety – range of data types and sources
• High Velocity – speed of data in and out
OK, maybe 4 V’s
• Veracity – is all the data applicable to the problem being analyzed?
NOSQL IS NOT BIG DATA
NoSQL != Big Data
NoSQL products were created to help solve the big data problem.
Big data is a much larger problem than just storage. Analysis tools like Hadoop, messaging systems like Kafka, real time processing engines like Storm and machine learning (Mahout) all help solve the big data problem.
NOSQL TYPES
Document DB
• MongoDB, CouchDB
Wide Column – Column Family
• Cassandra, HBase, Amazon SimpleDB
Key Value
• Riak, Redis, DynamoDB, Voldemort, MemcacheDB
Graph
• Neo4J, OrientDB
Search (also alternatives, normally used with *)
• Lucene, Solr, ElasticSearch
Many many many, many more! (http://nosql-database.org/)
CHOOSING THE RIGHT ONE…
Choosing the right NoSQL type and eventual product depends on…
Type of Data
• One key and a lot of data?
• High volume of data?
• Storing media, blobs?
• Document oriented?
• Tracking relationships?
• Combination?
• Multi-datacenter?
Type of Access
Volumes of Data (there is big data and there is BIG DATA)
Need Support/Services/Training
SOME BASICS
• ACID
• CAP Theorem
• BASE
ACID
YOU PROBABLY ALL HAVE HEARD OF ACID
• Atomic – All or None
• Consistency – What is written is valid
• Isolation – One operation at a time
• Durability – Once committed to the DB, it stays
This is the world we have lived in for a long time…
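CAP THEOREM (BREWER’S)
Many may have heard this one
CAP stands for Consistency, Availability and Partition Tolerance
• Consistency – every read sees the most recent write, so all nodes appear to hold the same data (not the same thing as the C in ACID)
• Availability – every request to a working node gets a response
• Partition Tolerance – the system keeps operating when the network splits nodes into groups that cannot reach each other; nothing short of total network failure may cause it to respond incorrectly
(Remember the visual guide to selecting a NoSQL database)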
So.. What does this mean?
** http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
YOU CAN ONLY HAVE 2 OF THEM
Or, better said in C* (Cassandra) terms: you can have Availability and Partition Tolerance AND Eventual Consistency.
Eventual consistency means that, once updates stop arriving, eventually all accesses will return the last updated value.
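To make "eventually all accesses will return the last updated value" concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how Cassandra or any real database is implemented: the `Replica` class, the `sync` function and the last-write-wins rule based on a logical timestamp are all assumptions made up for illustration.

```javascript
// Hypothetical in-memory replica: stores key -> { value, ts }.
class Replica {
  constructor() {
    this.data = new Map();
  }
  // Accept a write stamped with a logical timestamp.
  write(key, value, ts) {
    const current = this.data.get(key);
    // Last write wins: keep the entry with the highest timestamp.
    if (!current || ts > current.ts) {
      this.data.set(key, { value, ts });
    }
  }
  read(key) {
    const entry = this.data.get(key);
    return entry ? entry.value : undefined;
  }
}

// Replication pass: every replica replays its entries to every other one.
function sync(replicas) {
  for (const source of replicas) {
    for (const target of replicas) {
      if (source === target) continue;
      for (const [key, { value, ts }] of source.data) {
        target.write(key, value, ts);
      }
    }
  }
}

const a = new Replica();
const b = new Replica();

a.write("rating", "PG", 1);
sync([a, b]);                        // both replicas now agree on "PG"

a.write("rating", "PG13", 2);        // the update reaches replica a only
const staleRead = b.read("rating");  // b still answers, but with the old value

sync([a, b]);                        // replication catches up
const freshRead = b.read("rating");  // now b returns the last updated value
```

Replica `b` stays available the whole time; the price is the window in which it serves the older value.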
VISUAL GUIDE – USING THE CAP THEOREM
http://blog.nahurst.com/visual-guide-to-nosql-systems
BIG DATA WRAP UP
• So we are talking about large amounts of data
• High velocity of acquisition
• A lot of variety in what we need to store. We will worry later about how to handle it (or not)
• Need to scale and not break the bank
• Want the database to support agile development, not hinder it
STILL WRAPPING
• Maybe consider going relational if
• High transaction (FoundationDB?)
• Business Intelligence Systems (Hadoop may make this not true)
• Don’t be fooled by fear of losing ACID…
  http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html
And now, let’s look at
MongoDB
MONGO OVERVIEW
A few high-level points
• Document Oriented
• Storage format is JSON (actually BSON)
• Replication built in
• Master / slave architecture
• Strong querying support
• Name comes from "humongous"
MEET MONGO
• Open Source
• Schemaless
• Scalable
• Document Level Atomicity
• Easy Installation
• Relative Ease of Use
• Great (!!!!) Documentation
AND…
• No cross document transactions
• No joins
• Replication – master / slave
• Sharding
MONGO ADVANTAGE
* Credit – Dwight Merriman, Founder and CEO – MongoDB (was 10Gen)
DOCUMENT
In its simplest form, Mongo is a document-oriented database
• MongoDB stores all data in documents, which are JSON-style data structures composed of field-and-value pairs.
• MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents. BSON contains more data types than does JSON.
** For in-depth BSON information, see bsonspec.org.
WHAT DOES A DOCUMENT LOOK LIKE
{
"_id" : "52a602280f2e642811ce8478",
"ratingCode" : "PG13",
"country" : "USA",
"entityType" : "Rating"
}
MONGO DOCUMENTS
RULES FOR A DOCUMENT
Documents have the following rules:
The maximum BSON document size is 16 megabytes.
The field name _id is reserved for use as a primary key; its value must be unique in the collection.
The field names cannot start with the $ character.
The field names cannot contain the . character.
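The rules above can be checked in application code before a document is sent to the database. The sketch below is plain JavaScript and not part of any MongoDB driver: `validateDocument`, `MAX_DOC_BYTES` and the `existingIds` set are hypothetical names, the size check uses the byte length of the JSON text as a rough stand-in for the real BSON size, and applying the naming rules to nested documents is an assumption of the sketch.

```javascript
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16 megabytes

// Returns a list of rule violations (empty when the document passes).
function validateDocument(doc, existingIds) {
  const problems = [];

  // Rule: _id must be unique in the collection.
  if ("_id" in doc && existingIds.has(String(doc._id))) {
    problems.push("duplicate _id: " + doc._id);
  }

  // Rules: field names cannot start with $ or contain a dot.
  // Assumption: nested documents are checked the same way.
  const checkNames = (value, path) => {
    if (value === null || typeof value !== "object") return;
    for (const [name, child] of Object.entries(value)) {
      const where = path ? path + " > " + name : name;
      if (name.startsWith("$")) problems.push("field starts with $: " + where);
      if (name.includes(".")) problems.push("field contains a dot: " + where);
      checkNames(child, where);
    }
  };
  checkNames(doc, "");

  // Rule: 16 MB limit. Approximation: size of the JSON text, not of the BSON.
  const approxBytes = new TextEncoder().encode(JSON.stringify(doc)).length;
  if (approxBytes > MAX_DOC_BYTES) problems.push("document exceeds 16 MB");

  return problems;
}

// Usage with made-up data.
const existingIds = new Set(["52a602280f2e642811ce8478"]);
const good = validateDocument(
  { _id: "a1", ratingCode: "PG13", country: "USA" },
  existingIds
);
const bad = validateDocument(
  { _id: "52a602280f2e642811ce8478", "$code": 1, "a.b": 2 },
  existingIds
);
```

Here `good` is an empty list, while `bad` reports the duplicate `_id` and the two illegal field names.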
MONGO INSTALL
Windows
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-windows/
MAC
http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
Create Data Directory, Defaults
• C:\data\db
• /data/db/ (make sure you have permissions)
Or set it using --dbpath
C:\mongodb\bin\mongod.exe --dbpath d:\test\mongodb\data
START IT!
Database
mongod
Shell
mongo
show dbs
show collections
db.stats()
BASIC OPERATIONS
1_simpleinsert.txt
• Insert
• Find
  • Find all
  • Find one
  • Find with criteria
• Indexes
  • Explain()
MORE MONGO SHELL
2_arrays_sort.txt
• Embedded documents
• Limit, Sort
• Using regex in query
• Removing documents
• Drop collection
IMPORT / EXPORT
3_imp_exp.txt
Mongo provides tools for getting data in and out of the database
• Data can be exported to JSON files
• JSON files can then be imported
CONDITIONAL OPERATORS
4_cond_ops.txt
• $lt
• $gt
• $gte
• $lte
• $or
• Also $not, $exists, $type, $in
(for $type refer to http://docs.mongodb.org/manual/reference/operator/query/type/#_S_type )
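The operators listed above appear inside query documents, which are ordinary objects of the kind passed to `find()` in the shell. As a rough illustration of what they mean, the sketch below evaluates such query documents against an in-memory array in plain JavaScript. The `matches` function and the sample `posts` data are hypothetical, only a handful of operators are covered, and this is not MongoDB's own matching code.

```javascript
// Comparison operators applied to a single field value.
const fieldOps = {
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $in: (v, arg) => arg.includes(v),
  $exists: (v, arg) => (v !== undefined) === arg,
};

// True when doc satisfies every condition in the query document.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") {
      // $or takes an array of sub-queries; at least one must match.
      return cond.some((sub) => matches(doc, sub));
    }
    const value = doc[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // { field: { $op: arg, ... } }: every operator must hold.
      return Object.entries(cond).every(([op, arg]) => {
        if (!(op in fieldOps)) throw new Error("unsupported operator: " + op);
        return fieldOps[op](value, arg);
      });
    }
    // { field: literal } means equality.
    return value === cond;
  });
}

// Made-up sample data.
const posts = [
  { title: "Intro", likes: 5, author: "ann" },
  { title: "Sharding", likes: 42, author: "bob" },
  { title: "Indexes", likes: 17 },
];

const popular = posts.filter((p) => matches(p, { likes: { $gte: 10, $lt: 40 } }));
const annOrHot = posts.filter((p) =>
  matches(p, { $or: [{ author: "ann" }, { likes: { $gt: 40 } }] })
);
const anonymous = posts.filter((p) => matches(p, { author: { $exists: false } }));
```

Several operators on one field are combined with AND, which is why `{ $gte: 10, $lt: 40 }` expresses a range.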
ADMIN COMMANDS
5_admin.txt
• show dbs
• show collections
• db.stats()
• db.posts.stats()
• db.posts.drop()
• db.system.indexes.find()
DATA MODELING
• Remember, with NoSQL redundancy is not evil
• Applications ensure consistency, not the DB
• Applications join data; joins are not defined in the DB
• Data model is schema-less
• Data model is usually built to support queries
QUESTIONS TO ASK
• Your basic units of data (what would be a document)?
• How are these units grouped / related?
• How does Mongo let you query this data, what are the options?
• Finally, maybe most importantly, what are your application’s access patterns?
  • Reads vs. writes
  • Queries
  • Updates
  • Deletions
  • How structured is it
DATA MODEL - NORMALIZED
Normalized
• Similar to relational model.
• One collection per entity type
• Little or no redundancy
• Allows clean updates, familiar to many SQL users, easier to understand
NORMALIZED DOCUMENTS
REFERENCES
• From parent to child
  { name: "O'Reilly Media",
    books: [123456789, 234567890, ...] }
• From child to parent
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" }
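Because the database does not join for you, the application follows these references itself. The sketch below shows both directions in plain JavaScript over two in-memory arrays standing in for collections. The `_id: "oreilly"` on the publisher, the second book and the helper names `booksOf` and `publisherOf` are assumptions added for illustration.

```javascript
// Two hypothetical collections held as plain arrays.
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", books: [123456789, 234567890] },
];
const books = [
  { _id: 123456789, title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" },
  { _id: 234567890, title: "Example Book Two", publisher_id: "oreilly" },
];

// Parent to child: follow the array of book ids stored on the publisher.
function booksOf(publisher) {
  const byId = new Map(books.map((b) => [b._id, b]));
  return publisher.books.map((id) => byId.get(id)).filter(Boolean);
}

// Child to parent: follow publisher_id stored on the book.
function publisherOf(book) {
  return publishers.find((p) => p._id === book.publisher_id);
}

const titles = booksOf(publishers[0]).map((b) => b.title);
const owner = publisherOf(books[1]).name;
```

Against a real database each of these lookups would be a separate query, which is the cost of the normalized model.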
DATA MODEL - EMBEDDED
An often-used pattern in Mongo is to embed information as subdocuments.
• Used when there is a contains relationship
• Easier querying (when related data is often used together)
• Need to keep 16 MB document size in mind
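As a small illustration of the embedded style, the sketch below keeps a publisher's books inside the publisher document, so one read returns everything that belongs together. It is plain JavaScript with made-up sample data; `getPath` is a hypothetical helper that imitates a dotted-path lookup such as `books.title`, and the size figure is only the length of the JSON text, not the real BSON size.

```javascript
// A publisher with its books embedded as subdocuments (sample data is made up).
const publisher = {
  _id: "oreilly",
  name: "O'Reilly Media",
  books: [
    { title: "MongoDB: The Definitive Guide", pages: 432 },
    { title: "Example Book Two", pages: 176 },
  ],
};

// Resolve a dotted path; arrays are traversed element by element.
function getPath(value, path) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => getPath(item, path));
  }
  const [head, ...rest] = path.split(".");
  if (value === null || typeof value !== "object" || !(head in value)) return [];
  const next = value[head];
  if (rest.length === 0) return Array.isArray(next) ? next : [next];
  return getPath(next, rest.join("."));
}

const embeddedTitles = getPath(publisher, "books.title");

// Keep an eye on growth: embedded arrays count toward the 16 MB limit.
const approxBytes = new TextEncoder().encode(JSON.stringify(publisher)).length;
const fitsIn16MB = approxBytes < 16 * 1024 * 1024;
```

No second lookup is needed to get the titles, but every embedded book makes the parent document larger.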
EMBEDDED
OTHER CONSIDERATIONS FOR DATA MODELING
Many or few collections
• Many collections
  • As seen in normalized
  • Clean and little redundancy
  • May not provide best performance
  • May require frequent updates to application if new types added
• Multiple collections
  • Middle ground, partially normalized
• Not many collections
  • One large generic collection
  • Contains many types
  • Use type field
CONSIDERATION CONTINUED
• Document Growth – a document will be relocated if it exceeds its allocated size
• Atomicity
  • Atomic at document level
  • Consideration for insertions, removes and multi-document updates
• Sharding – collections distributed across mongod instances, uses a shard key
• Indexes – index fields that are often queried; indexes affect write performance slightly
• Consider using TTL to automatically expire documents
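Sharding by a shard key can be pictured as routing each document to one instance according to the value of a single field. The sketch below assumes range-based chunks with hand-picked boundaries. The `chunks` table, the `shardFor` function and the shard names are hypothetical, and a real cluster manages its ranges and balancing itself.

```javascript
// Hypothetical chunk table: each shard owns shard-key values below its upper bound.
const chunks = [
  { shard: "shard0", upperBound: "h" },  // keys < "h"
  { shard: "shard1", upperBound: "q" },  // "h" <= keys < "q"
  { shard: "shard2", upperBound: null }, // everything else (no upper bound)
];

// Pick the first chunk whose range contains the document's shard-key value.
function shardFor(doc, shardKey) {
  const key = doc[shardKey];
  const chunk = chunks.find((c) => c.upperBound === null || key < c.upperBound);
  return chunk.shard;
}

// Route a few made-up documents using "user" as the shard key.
const placement = {};
for (const doc of [{ user: "alice" }, { user: "maria" }, { user: "zed" }, { user: "bob" }]) {
  const shard = shardFor(doc, "user");
  (placement[shard] = placement[shard] || []).push(doc.user);
}
```

A query that includes the shard key can be sent to a single shard; one that does not has to ask all of them.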
COMMON USES FOR MONGO
Log Collection
https://code.google.com/p/log4mongo/
Caching
Queues / Messaging
Capped Collections - fixed-size collections that support high-throughput operations that insert, retrieve, and delete documents based on insertion order.
Analytics
Prototyping
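The capped-collection behavior mentioned above (fixed size, insertion order) is easy to picture with a small in-memory stand-in. The `CappedCollection` class below is hypothetical plain JavaScript, and it caps by number of documents, which is a simplifying assumption of the sketch.

```javascript
// Hypothetical fixed-size collection: keeps the newest maxDocs documents
// in insertion order.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document makes room for the newest one.
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }
  // Documents come back in insertion order.
  find() {
    return this.docs.slice();
  }
}

const log = new CappedCollection(3);
for (const line of ["start", "login", "query", "logout"]) {
  log.insert({ message: line });
}
const kept = log.find().map((d) => d.message);
```

After four inserts into a collection capped at three, the oldest entry ("start") is gone, which is what makes this shape a natural fit for logs and queues.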
MONGODB DEVELOPMENT WITH JAVA
Java driver
• Supplied by MongoDB itself
• Easy to set up
• Housed on Maven repo
EXAMPLE JAVA APP
Load Health Data
Query Data
Administrative Functions
OTHER JAVA OPTIONS
Morphia
Spring MongoDB
SOME OTHER COOL STUFF
Get MEAN
Mongo, Express, Angular and Node
http://bitnami.com/stack/mean
Can install, in a VM or even in the cloud
THE CLOUD
Database in the cloud
https://mongolab.com/
Can access using shell, GUI Mongo explorer, mongoimport, mongoexport and use in application
Amazon, Rackspace, Joyent or Azure
BOOKS
• MongoDB: The Definitive Guide, 2nd Edition
  By: Kristina Chodorow; Publisher: O'Reilly Media, Inc.; Pub. Date: May 23, 2013; Print ISBN-13: 978-1-4493-4468-9; Pages in Print Edition: 432
• MongoDB in Action
  By: Kyle Banker; Publisher: Manning Publications; Pub. Date: December 16, 2011; Print ISBN-10: 1-935182-87-0; Print ISBN-13: 978-1-935182-87-0; Pages in Print Edition: 312
• The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing
  By: Eelco Plugge; Peter Membrey; Tim Hawkins; Apress, September 2010; ISBN: 9781430230519; 327 pages
BOOKS CONT.
• MongoDB Applied Design Patterns
  By: Rick Copeland; Publisher: O'Reilly Media, Inc.; Pub. Date: March 18, 2013; Print ISBN-13: 978-1-4493-4004-9; Pages in Print Edition: 176
• MongoDB for Web Development (rough cut!)
  By: Mitch Pirtle; Publisher: Addison-Wesley Professional; Last Updated: 14-JUN-2013; Pub. Date: March 11, 2015 (Estimated); Print ISBN-10: 0-321-70533-5; Print ISBN-13: 978-0-321-70533-4; Pages in Print Edition: 360
• Instant MongoDB
  By: Amol Nayak; Publisher: Packt Publishing; Pub. Date: July 26, 2013; Print ISBN-13: 978-1-78216-970-3; Pages in Print Edition: 72
IMPORTANT SITES
• http://www.mongodb.org/
• https://mongolab.com/welcome/
• https://education.mongodb.com/
• http://blog.mongodb.org/
• http://stackoverflow.com/questions/tagged/mongodb
• http://bitnami.com/stack/mean
THAT’S ALL FOLKS
Questions?
Comments?
What other topics are of interest?
Thank You!!!!!!