Page 1: NoSQL
Page 2: NoSQL

Agenda
● History
● Relational databases
● Horizontal vs vertical scaling
● CAP theorem
● Document databases
● Key-value databases
● Graph databases
● Column-family databases

Page 3: NoSQL

History
● Non SQL (not a traditional tabular database)
● Facebook, Google, Amazon, etc. (big data and real-time applications)
● Horizontal scaling is a problem in relational databases
● Not only SQL (SQL-like queries)

Page 4: NoSQL

Relational Databases :)
● MySQL, Oracle, SQL Server, Postgres, etc.
● Carpenter's hammer
● Easy & popular
● Avoid data duplication, at the cost of complex queries
● Atomicity (transactions)

Page 5: NoSQL

Relational Databases :(

● Defined schema, optional attributes (NULLs)
● Use joins to aggregate related data
● Large data VOLUME and a high rate of READs (scalability)

Page 6: NoSQL

Scaling

Page 7: NoSQL

Scaling

Page 8: NoSQL

Horizontal (Sharding)
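
A minimal sketch of how sharding might route keys, assuming hash-based partitioning over a fixed set of shards (the slide does not say which scheme it has in mind). The names `SHARDS`, `shard_for`, `put`, and `get` are hypothetical and exist only for this illustration.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to servers.
SHARDS = ["shard-a", "shard-b", "shard-c"]

# Each shard holds only its own slice of the data.
stores = {name: {} for name in SHARDS}


def shard_for(key: str) -> str:
    """Pick a shard by hashing the key, so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def put(key: str, value) -> None:
    stores[shard_for(key)][key] = value


def get(key: str):
    return stores[shard_for(key)].get(key)


put("user:1", {"name": "Alice"})
put("user:2", {"name": "Bob"})
print(shard_for("user:1"), get("user:1"))
```

With a plain modulo scheme like this one, changing the number of shards remaps most keys, which is one reason real systems often use other partitioning strategies.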

Page 9: NoSQL

Horizontal (Master-Slave Replication)

Page 10: NoSQL

CAP Theorem

● Consistency (all nodes see the same data at the same time)

Page 11: NoSQL

CAP Theorem

● Availability (every request receives a response, whether success or failure)

Page 12: NoSQL

CAP Theorem

● Partition tolerance (the system continues to operate despite network partitions between nodes)

Page 13: NoSQL
Page 14: NoSQL

Pick only “TWO”

Page 15: NoSQL

CAP Proof

Page 16: NoSQL

Eventually Consistent
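
A small sketch of what "eventually consistent" can look like, assuming a single master that ships writes to a replica asynchronously. The classes `Replica` and `Master` are hypothetical and do not model any particular database.

```python
class Replica:
    """A node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class Master(Replica):
    """Accepts writes and replicates them to replicas later."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = []  # writes not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Ship queued writes; a real system would do this in the background."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
master = Master([replica])

master.write("x", 1)
stale = replica.data.get("x")  # the replica has not caught up yet
master.replicate()
fresh = replica.data.get("x")  # the replica has converged
print(stale, fresh)  # prints: None 1
```

A read from the replica between the write and the replication step sees old data; once replication runs, all nodes agree again.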

Page 17: NoSQL

SQL Vs NoSQL

Relational Databases                          NoSQL Databases
Mostly vertical scaling, limited horizontal   Horizontal scaling
Consistent                                    Consistent or eventually consistent
Scalable reads                                Scalable reads/writes
Transactions on multiple tables               Difficult to support transactions
No partition tolerance                        Partition tolerance
Schema/tables                                 Schemaless
Flexible queries (joins)                      Limited queries

Page 18: NoSQL
Page 19: NoSQL
Page 20: NoSQL

1) Document Databases
● Simple & popular
● Close to relational databases
● MongoDB was a rising star in 2009

Page 21: NoSQL

1) Document Databases
● Simple & popular
● Seven Databases in Seven Weeks

Page 22: NoSQL

JSON Document Vs Row
● Document Vs Row
● Collection Vs Table
● Nesting instead of joins
● Query inside sub-documents
● Duplicate data to avoid joins
● Schemaless
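
The contrast above can be sketched with plain Python data structures. The sample data and the helper names `cities_via_join` and `find_by_city` are made up for illustration; this is not MongoDB's query API.

```python
# Relational style: two tables linked by a foreign key, joined at query time.
users = [{"id": 1, "name": "Alice"}]
addresses = [
    {"user_id": 1, "city": "Cairo"},
    {"user_id": 1, "city": "Berlin"},
]


def cities_via_join(user_id):
    """Look up a user's cities by matching rows across the two tables."""
    return [a["city"] for a in addresses if a["user_id"] == user_id]


# Document style: the addresses are nested inside the user document.
user_doc = {
    "_id": 1,
    "name": "Alice",
    "addresses": [{"city": "Cairo"}, {"city": "Berlin"}],
}

# Schemaless: the second document simply has no "addresses" field.
collection = [user_doc, {"_id": 2, "name": "Bob"}]


def find_by_city(docs, city):
    """Query inside the sub-documents; no join is needed."""
    return [
        doc for doc in docs
        if any(addr["city"] == city for addr in doc.get("addresses", []))
    ]


print(cities_via_join(1))
print([d["name"] for d in find_by_city(collection, "Berlin")])
```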

Page 23: NoSQL

MongoDB CP
● Consistency
● Master-Slave (elections)
● CouchDB is AP

Page 24: NoSQL

MongoDB Conclusion
● Simple
● Scalable
● Embedded document
● CP
● Built-in Geo-spatial support
● No joins
● May need to duplicate data
● Writes should go through master node

Page 25: NoSQL

2) Key-Value Databases
● Light & compact
● Hash table (values: text, blob, JSON, image, etc.)
● Reads are fast, writes are faster

Page 26: NoSQL

Key-Value Databases

● Redis Hash

Page 27: NoSQL

Redis Complex Data Types
● List

Page 28: NoSQL

Redis Complex Data Types
● Blocking List

Page 29: NoSQL

Redis Complex Data Types
● Publish-Subscribe

Page 30: NoSQL

Redis Complex Data Types
● Set

Page 31: NoSQL

Redis Complex Data Types
● Expiry Caching
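
To make the idea of expiry caching concrete, here is a pure-Python sketch of a key-value store with per-key time-to-live. It is an assumption-laden illustration, not how Redis is implemented and not the API of any Redis client; the class `ExpiringCache` and its injectable `clock` are hypothetical.

```python
import time


class ExpiringCache:
    """In-memory key-value store where each key may carry a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the example can control time
        self._items = {}     # key -> (value, expiry time or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # evict lazily, on read
            return None
        return value


# Use a fake clock so the example does not need to sleep.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("session", "abc", ttl=10)
before = cache.get("session")
now[0] = 11.0  # eleven seconds later
after = cache.get("session")
print(before, after)  # prints: abc None
```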

Page 32: NoSQL

Redis in Memory

● Data lives in memory; no immediate persistence by default

● Persist periodically by taking snapshots

Page 33: NoSQL

Redis CP
● Sharding (A, B, C)
● Replication A => A1, B => B1, C => C1
● If master B fails, B1 is promoted to master
● Redis is NOT strongly consistent (if both A and A1 fail)
● Riak is AP

Page 34: NoSQL

Redis Conclusion
● Light & compact
● Key-value
● Complex data types
● Fast in memory
● Dataset should be less than RAM size
● Transforming data, caching, messaging
● CP but not strongly consistent
● Flexible persistence levels
● Rarely used alone

Page 35: NoSQL

3) Graph Databases

● Directed graph

● Node has properties

● Relation has properties
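
The three points above describe a directed property graph. A minimal sketch, assuming an in-memory structure with hypothetical class names (`Node`, `Relation`, `Graph`) and made-up sample data; it is not the API of any graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relation:
    source: str
    target: str
    kind: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}      # node name -> Node
        self.relations = []  # directed edges

    def add_node(self, name, **properties):
        self.nodes[name] = Node(name, properties)

    def relate(self, source, target, kind, **properties):
        self.relations.append(Relation(source, target, kind, properties))

    def outgoing(self, name, kind):
        """Follow edges of the given kind in their direction only."""
        return [r.target for r in self.relations
                if r.source == name and r.kind == kind]


g = Graph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.relate("alice", "bob", "FOLLOWS", since=2012)
print(g.outgoing("alice", "FOLLOWS"), g.outgoing("bob", "FOLLOWS"))
```

Because the graph is directed, alice following bob does not imply the reverse.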

Page 36: NoSQL

Graph Databases

Page 37: NoSQL

Graph Databases

Page 38: NoSQL

Graph Databases (AP)

● Tens of billions of nodes and edges

● No sharding; the whole graph is replicated

● High availability over consistency

● A gold master is elected, but writes can go directly to slaves

● The community edition is free, but the full version is NOT

Page 39: NoSQL

4) Column-Family Databases

Row-oriented database:
● Many columns
● Disk seek operations
● Low compression rate

Page 40: NoSQL

Column-Family Databases

● In an RDBMS, writes are heavy, so each row is stored together as a unit

● In column stores, reads are heavy, so the values of each column are stored together
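
The two layouts can be compared side by side in a short sketch. The sample records and the helpers `append_row` and `count_by_country` are hypothetical; real engines store these layouts on disk rather than in Python lists.

```python
rows = [
    {"id": 1, "name": "Alice", "country": "EG"},
    {"id": 2, "name": "Bob", "country": "EG"},
    {"id": 3, "name": "Carol", "country": "DE"},
]

# Row-oriented: all fields of one record sit next to each other.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented: all values of one column sit next to each other.
column_store = {key: [r[key] for r in rows] for key in rows[0]}


def append_row(record):
    """A write touches one place in the row store but every column list."""
    row_store.append(tuple(record.values()))
    for key, value in record.items():
        column_store[key].append(value)


def count_by_country(country):
    """A read over one column scans only that column's values."""
    return column_store["country"].count(country)


append_row({"id": 4, "name": "Dave", "country": "DE"})
print(count_by_country("DE"))  # prints: 2
```

Keeping similar values together, as in the `country` list, is also what tends to make column layouts compress better.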

Page 41: NoSQL

HBase
● Database for HDFS (RDBMS vs files)
● Widely used with Hadoop
● Scalability! At least five nodes in production
● Facebook messaging system infrastructure (2010)

Page 42: NoSQL

HBase Column Family

Page 43: NoSQL

HBase Column Family
● Key-value pairs (map of maps)
● Column families should be defined, but the columns are schemaless
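
The "map of maps" idea can be sketched with nested dictionaries. This is only an illustration of the data model under assumed names (`COLUMN_FAMILIES`, `table`, `put`); it does not use or imitate the HBase client API.

```python
from collections import defaultdict

# Column families are fixed up front (hypothetical family names).
COLUMN_FAMILIES = {"info", "stats"}

# row key -> "family:qualifier" -> value
table = defaultdict(dict)


def put(row_key, family, qualifier, value):
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    # Qualifiers are free-form: any row may introduce new columns.
    table[row_key][f"{family}:{qualifier}"] = value


put("user1", "info", "name", "Alice")
put("user2", "info", "nickname", "bobby")  # a column that user1 does not have
print(dict(table))
```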

Page 44: NoSQL

HBase Versioning
● Versioning
● It becomes a map of maps of maps (asc, asc, desc)
● Garbage collector for expired data
● Everything is binary
● Compression rate

Page 45: NoSQL

FB Messaging Index Table
● The row keys are user IDs
● Column qualifiers are words that appear in that user’s messages
● Timestamps are message IDs of messages that contain that word
● Value is the offset of the word in the message
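
The index layout described above is a three-level map: user ID, then word, then message ID, with the offset as the value. A rough sketch follows, assuming word positions as offsets and whitespace tokenization; the names `index`, `index_message`, and `search` are hypothetical and this is not Facebook's actual code.

```python
from collections import defaultdict

# user ID -> word -> message ID (kept in the timestamp slot) -> offset of the word
index = defaultdict(lambda: defaultdict(dict))


def index_message(user_id, message_id, text):
    for offset, word in enumerate(text.lower().split()):
        # Keep the first offset if a word repeats within one message.
        index[user_id][word].setdefault(message_id, offset)


def search(user_id, word):
    """Return IDs of messages containing the word, highest ID first."""
    versions = index[user_id][word.lower()]
    return sorted(versions, reverse=True)


index_message("u1", 100, "hello world")
index_message("u1", 101, "goodbye world")
print(search("u1", "world"), index["u1"]["world"][101])  # prints: [101, 100] 1
```

Sorting message IDs in descending order mirrors the descending order of versions in the versioning slide.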

Page 46: NoSQL

HBase Vs Cassandra
● HBase runs on Hadoop, Cassandra is standalone
● HBase community is more active
● HBase is CP, Cassandra is AP
● Cassandra is more suitable for highly concurrent writes

Page 47: NoSQL

The right tool for the right job