What Exactly is NoSQL?
Document databases, Column-family stores, Key-value pairs, more
Shashank Tiwari
blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com
May 17, 2015
NoSQL?
NoSQL : Various Shapes and Sizes
• Document Databases
• Column-family Oriented Stores
• Key/value Data stores
• XML Databases
• Object Databases
• Graph Databases
Document Databases
• mostly MongoDB, a little CouchDB
What is a document db?
• One that stores documents
• Popular options:
• MongoDB -- C++
• CouchDB -- Erlang
• Also Amazon’s SimpleDB
• ...what exactly is a document?
In the real world
• (Source: http://guide.couchdb.org/draft/why.html)
In terms of JSON
• {"name": "John Doe", "zip": 10001}
What about db schema?
• Schema-less
• Different documents could be stored in a single collection
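A minimal sketch of what "schema-less" means in practice, using a plain Python list as a stand-in for a collection (this is not MongoDB itself). Only the John Doe document comes from the slides; the other documents and the `find` helper are hypothetical, added to show that documents with different fields can sit side by side and still be queried by example.

```python
# In-memory stand-in for a collection; this is not MongoDB itself.
location = [
    {"name": "John Doe", "zip": 10001},
    {"name": "Jane Roe", "zip": 94301, "phones": ["555-0100"]},  # extra field
    {"venue": "Palo Alto"},                                      # no name or zip
]

def find(collection, query):
    """Return documents whose fields equal every key/value pair in query."""
    return [doc for doc in collection
            if all(key in doc and doc[key] == value
                   for key, value in query.items())]

matches = find(location, {"zip": 10001})
print(matches)
```

Documents that lack a queried field are simply skipped, so no shared schema is needed for the query to work.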
Data types: MongoDB
• Essential JSON types:
• string
• integer
• boolean
• double
Data types: MongoDB (...cont)
• Additional JSON types
• null, array and object
• BSON types -- BSON is a binary-encoded serialization of JSON-like documents
• date, binary data, object id, regular expression and code
• (Reference: bsonspec.org)
A BSON example: object id
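As a rough illustration of what an object id carries: per the BSON specification an ObjectId is 12 bytes, and the first 4 bytes are a big-endian count of seconds since the Unix epoch. The sketch below decodes only that timestamp and treats the remaining bytes as opaque. The helper name `objectid_timestamp` is hypothetical; the id value is the one that appears in the Read Document output later in this deck.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Because the timestamp leads the id, object ids sort roughly by creation time.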
Data types: CouchDB
• Everything JSON
• Large objects: attachments
CRUD operations for documents
• Create
• Read
• Update
• Delete
MongoDB: Create Document
• use mydb
• w = {name: "John Doe", zip: 10001};
• db.location.save(w);
Create db and collection
• Lazily created
• Implicitly created
• use mydb
• db.collection.save(w)
MongoDB: Read Document
• db.location.find({zip: 10001});
• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
MongoDB: Read Document (...cont)
• db.location.find({name: "John Doe"});
• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
MongoDB: Update Document
• Atomic operations on single documents
• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
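The effect of `$set` can be sketched in plain Python, assuming an in-memory list in place of the collection. The `update_one` helper is hypothetical and models only the field-level semantics (the named fields change, the rest of the document is left alone, and only the first match is touched); it does not model the atomicity that the database provides.

```python
def update_one(collection, query, changes):
    """Apply a {"$set": {...}} change to the first document matching query."""
    for doc in collection:
        if all(key in doc and doc[key] == value for key, value in query.items()):
            doc.update(changes["$set"])  # only the named fields change
            return True
    return False

location = [{"name": "John Doe", "zip": 10001}]
updated = update_one(location, {"name": "John Doe"},
                     {"$set": {"name": "Jane Doe"}})
print(updated, location)
```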
Indexes(explain)
• db.ratings.find().explain();
Indexes(explain output)
• {
• "cursor" : "BasicCursor",
• "nscanned" : 1000209,
• "nscannedObjects" : 1000209,
• "n" : 1000209,
• "millis" : 1549,
• "indexBounds" : {
Indexes(ensure index)
• db.ratings.ensureIndex({ movie_id:1 });
• db.ratings.ensureIndex({ movie_id:-1 });
Indexes(explain when index used)
• {
• "cursor" : "BtreeCursor movie_id_1",
• "nscanned" : 2077,
• "nscannedObjects" : 2077,
• "n" : 2077,
• "millis" : 2,
• "indexBounds" : {
Indexes(get indexes)
• db.ratings.getIndexes();
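The drop in `nscanned` between the two explain outputs can be sketched without a database. The example below assumes a small, made-up ratings list and uses a Python dict in place of the B-tree that the slides' `BtreeCursor` refers to, so it shows only the equality-lookup benefit, not ordered or range access. All names are hypothetical.

```python
from collections import defaultdict

# Hypothetical ratings data: 1000 documents spread over 50 movies.
ratings = [{"movie_id": i % 50, "rating": (i % 5) + 1} for i in range(1000)]

def scan(collection, movie_id):
    """Full collection scan: every document is examined."""
    scanned = 0
    hits = []
    for doc in collection:
        scanned += 1
        if doc["movie_id"] == movie_id:
            hits.append(doc)
    return hits, scanned

def build_index(collection, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(collection):
        index[doc[field]].append(position)
    return index

def indexed_find(collection, index, movie_id):
    """Index lookup: only matching documents are examined."""
    positions = index.get(movie_id, [])
    return [collection[p] for p in positions], len(positions)

movie_index = build_index(ratings, "movie_id")
scan_hits, scan_count = scan(ratings, 7)
index_hits, index_count = indexed_find(ratings, movie_index, 7)
print(scan_count, index_count)
```

Both paths return the same documents; the difference is how many documents had to be examined to find them.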
Sorted Ordered Column-family Datastores
• Sorted
• Ordered
• Distributed
• Map
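The "sorted, ordered map" idea can be sketched on a single machine, leaving distribution aside. The class below is a hypothetical illustration, not how HBase or any other store is implemented: it keeps row keys in sorted order with `bisect`, so a range scan is a contiguous slice. The row keys and values are illustrative only.

```python
import bisect

class SortedRowStore:
    """Rows kept in row-key order so that range scans are contiguous."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> {column family -> {column -> value}}

    def put(self, row_key, family, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Yield (row_key, row) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]

store = SortedRowStore()
store.put("row_key_2", "location", "zip", "10001")
store.put("row_key_1", "name", "first_name", "Jolly")
store.put("row_key_1", "name", "last_name", "Goodfellow")
store.put("row_key_1", "location", "zip", "94301")
store.put("row_key_3", "name", "first_name", "Ajay")

scanned_keys = [key for key, _ in store.scan("row_key_1", "row_key_3")]
print(scanned_keys)
```

Rows come back in key order regardless of insertion order, which is why row-key design matters so much in this family of stores.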
Essential schema
Multi-dimensional View
A Map/Hash View
• {
• "row_key_1" : {
• "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
• "location" : { "zip": "94301" },
Architectural View (HBase)
The Persistence Mechanism
The underlying file format
Model Wrappers (The GAE Way)
• Python
• Model, Expando, PolyModel
• Java
• JDO, JPA
HBase Data Access
• Thrift + Avro
• Java API -- HTable, HBaseAdmin
• Hive (SQL like)
• MapReduce -- sink and/or source
Transactions
• Atomic row level
• GAE Entity Groups
Indexes
• Row ordered
• Secondary indexes
• GAE style multiple indexes
• thinking from output to query
Use cases
• Many of Google’s products
• Facebook Messaging
• StumbleUpon
• Open TSDB
• Mahalo, Ning, Meetup, Twitter, Yahoo!
• Lily -- open source CMS built on HBase & Solr
Brewer’s CAP Theorem
• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
Distributed Systems & Consistency (case: success)
Distributed Systems & Consistency (case: failure)
Binding by Transactions
Consistency Spectrum
Inconsistency Window
RWN Math
• R – Number of nodes that are read from.
• W – Number of nodes that are written to.
• N – Total number of nodes in the cluster.
• In general: R < N and W < N for higher availability
R + W > N
• Easy to determine consistent state
• R + W = 2N
• absolutely consistent, can provide ACID guarantees
• In all cases when R + W > N there is some overlap between read and write nodes.
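The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible read set of size R and write set of size W over N nodes and tests whether they always share at least one node. The function name `always_overlaps` is hypothetical.

```python
from itertools import combinations

def always_overlaps(n, r, w):
    """True if every read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))

print(always_overlaps(5, 3, 3))  # R + W = 6, greater than N = 5
print(always_overlaps(5, 2, 3))  # R + W = 5, not greater than N = 5
```

When R + W is at most N, it is possible to pick a read set that misses every node the latest write touched, which is where stale reads come from.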
R = 1, W = N
• more reads than writes
• W = N
• 1 node failure = entire system unavailable
R = N, W = 1
• W = 1
• Chance of data inconsistency quite high
• R = N
• Read only possible when all nodes in the cluster are available
R = W = ceiling ((N + 1)/2)
Effective quorum for eventual consistency
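A short worked example of the quorum formula, assuming R and W are both set to the majority size. The helper name `quorum_size` is hypothetical. For each cluster size it prints the quorum, confirms that R + W > N, and shows how many nodes can be down while a quorum is still reachable.

```python
import math

def quorum_size(n):
    """Smallest majority of n nodes: ceiling((n + 1) / 2)."""
    return math.ceil((n + 1) / 2)

for n in (3, 4, 5):
    q = quorum_size(n)
    # With R = W = q, reads and writes overlap and n - q nodes may be down.
    print(f"N={n}: R=W={q}, R+W>N is {q + q > n}, tolerates {n - q} failed node(s)")
```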
Eventual consistency variants
• Causal consistency -- A writes and informs B; B then always sees the updated value
• Read-your-writes consistency -- A writes a new value and never sees the old one
• Session consistency -- read-your-writes consistency within a client session
• Monotonic read consistency -- once a process has seen a new value, it never sees a previous one
• Monotonic write consistency -- serialize writes by the same process
Dynamo Techniques
• Consistent Hashing (Incremental scalability)
• Vector clocks (high availability for writes)
• Sloppy quorum and hinted handoff (recover from temporary failure)
• Gossip-based membership protocol (periodic, pairwise, inter-process interactions, low reliability, random peer selection)
• Anti-entropy using Merkle trees
• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
Consistent Hashing
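A minimal sketch of a consistent hash ring, to illustrate the "incremental scalability" point: when a node joins, only the keys that now belong to the new node move. This is an illustration under assumptions, not Dynamo's implementation. The `HashRing` class, the node names, the use of MD5 and the choice of 100 virtual points per node are all hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring; each node is placed at several virtual points."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def lookup(self, key):
        """Return the node owning the first point at or after the key's hash."""
        index = bisect.bisect_left(self._points, self._hash(key))
        if index == len(self._points):
            index = 0  # wrap around the ring
        return self._owner[self._points[index]]

keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.lookup(key) for key in keys}
ring.add("node-d")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With a naive `hash(key) % number_of_nodes` scheme, adding a fourth node would instead remap most keys.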
Vector clocks (a trivial example)
• 4 hackers: Joe, Hillary, Eric and Ajay decide to meet up
• Joe -- suggests Palo Alto (t0)
• Hillary and Eric -- decide to meet in Mountain View (t1)
• Eric and Ajay -- decide to meet in Los Altos (t2)
• Joe mails: PA, Hillary responds: Mtn View, Ajay responds: Los Altos (t3)
• both Hillary and Ajay say: Eric knows
Vector clocks (how it works)
• Venue : Palo Alto
• Vector Clock: Joe (ver 1)
• Venue: Mountain View
• Vector Clock: Joe (ver 1), Hillary (ver 1), Eric (ver 1)
• Venue: Los Altos
• Vector Clock: Joe (ver 1), Ajay (ver 1), Eric (ver 1)
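The comparison rule behind these clocks can be sketched in a few lines, assuming each clock is a dict from participant to version and a missing entry counts as 0. The function names are hypothetical; the three clocks are the ones listed on this slide.

```python
def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b entry-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two vector clocks as equal, ordered, or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a is newer"
    if b_ge:
        return "b is newer"
    return "concurrent"

palo_alto = {"Joe": 1}
mountain_view = {"Joe": 1, "Hillary": 1, "Eric": 1}
los_altos = {"Joe": 1, "Ajay": 1, "Eric": 1}

print(compare(mountain_view, palo_alto))
print(compare(mountain_view, los_altos))
```

Mountain View and Los Altos both descend from Palo Alto, but neither descends from the other, so the clocks alone cannot pick a winner and someone who took part in both decisions has to resolve the conflict.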
Vector clock (resolution)
• Venue : Palo Alto
• Vector Clock: Joe (ver 1)
• Venue: Mountain View
• Vector Clock: Joe (ver 1), Hillary (ver 1), Ajay (ver 0), Eric (ver 2)
• Venue: Los Altos
• Vector Clock: Joe (ver 1), Hillary (ver 0), Ajay (ver 1), Eric (ver 1)
CouchDB MVCC Style
• (Source: http://guide.couchdb.org/draft/consistency.html)
Key/value Stores
• Memcached
• Membase
• Redis
• Tokyo Cabinet
• Kyoto Cabinet
• Berkeley DB
Redis -- a key-value data structure server
• open source key-value store
• a data structure server
• values in key-value pairs can be strings, hashes, lists, sets, sorted sets
Where to find it?
• redis.io
• download a copy from http://redis.io/download
Who is building it?
• Core developers
• Salvatore Sanfilippo, twitter: @antirez
• Pieter Noordhuis, twitter: @pnoordhuis
• Main sponsor
• VMware
Written in
• ANSI C
• runs on POSIX compliant systems with no external dependencies
How can it be used?
• as an in memory data store
• with option to persist to disk
• in standalone mode or as a master-slave replicated set
• Redis cluster -- coming soon! (June 2011)
• as cache
Redis Architecture
Download and install
• curl -O http://redis.googlecode.com/files/redis-2.2.0-rc4.tar.gz
• (just a 436kb download)
• tar zxvf redis-2.2.0-rc4.tar.gz
• cd redis-2.2.0-rc4
• make && make install (installs in /usr/local/bin)
• make test (to be sure you installed it correctly)
Start the redis-server
• /usr/local/bin/redis-server
• ...Server started, Redis version 2.1.12
• ...The server is now ready to accept connections on port 6379
Connect with redis-cli
• /usr/local/bin/redis-cli
• redis> set key1 val1
• OK
• redis> get key1
• "val1"
String key-value pairs
• like memcached
• with persistence
• key and value -- binary-safe strings
Binary-safe?
• redis> set "a key _" "another value"
• OK
• redis> get "a key _"
• "another value"
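"Binary-safe" means that keys and values are treated as arbitrary byte sequences, so spaces, newlines and NUL bytes are all legal. The sketch below uses a plain Python dict keyed by `bytes` as a stand-in; it is not Redis and does not talk to a server, and the helper names and the second key are hypothetical.

```python
# A plain dict keyed by bytes stands in for the key space; this is not Redis.
store = {}

def set_value(key: bytes, value: bytes) -> str:
    store[key] = value
    return "OK"

def get_value(key: bytes):
    return store.get(key)

set_value(b"a key _", b"another value")
set_value(b"bin\x00key\n", bytes([0, 255, 10, 13]))  # NUL, 0xFF, LF, CR

print(get_value(b"a key _"))
print(get_value(b"bin\x00key\n"))
```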
Questions?
• blog: shanky.org | twitter: @tshanky