Agenda
● History
● Relational databases
● Horizontal vs vertical scaling
● CAP theorem
● Document databases
● Key-value databases
● Graph databases
● Column-family databases
History
● Non-SQL: databases other than the traditional tabular (relational) kind
● Driven by Facebook, Google, Amazon, etc. (big data and real-time applications)
● Horizontal scaling is a problem in relational databases
● Now read as "Not only SQL" (many offer SQL-like queries)
Relational Databases :)
● MySQL, Oracle, SQL Server, Postgres, etc.
● The carpenter's hammer: the general-purpose default tool
● Easy & popular
● Normalization avoids data duplication, at the cost of complex queries
● Atomicity (transactions)
Relational Databases :(
● Fixed schema; optional attributes end up as NULLs
● Joins are needed to aggregate related data
● Struggle with large data VOLUME and high rates of READS (scalability)
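Scaling
● Vertical: a bigger server (more CPU, RAM, disk)
● Horizontal: more servers, with the data spread across them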
Horizontal (Sharding)
Horizontal (Master-Slave Replication)
CAP Theorem
● Consistency (all nodes see the same data at the same time)
● Availability (every request receives a response, either success or failure)
● Partition tolerance (the system continues to operate despite network partitions, such as lost or delayed messages between nodes)
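Pick only TWO (since network partitions cannot be ruled out in a distributed system, the real choice during a partition is between consistency and availability)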
CAP Proof
Eventually Consistent
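Eventual consistency means a replica may briefly return stale data but converges once pending updates are delivered. A minimal sketch, assuming asynchronous replication modeled as an explicit queue of pending updates (all names are hypothetical):

```python
from collections import deque


class Node:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}


class AsyncReplication:
    """Writes apply to the primary at once; the replica catches up later."""

    def __init__(self) -> None:
        self.primary = Node()
        self.replica = Node()
        self.pending: deque[tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value
        self.pending.append((key, value))  # shipped to the replica asynchronously

    def sync(self) -> None:
        # Deliver queued updates in order; afterwards both nodes agree.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value


db = AsyncReplication()
db.write("cart", "2 items")
stale = db.replica.data.get("cart")  # None: the update has not arrived yet
db.sync()
fresh = db.replica.data.get("cart")  # "2 items": the nodes have converged
```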
SQL vs NoSQL
Relational databases | NoSQL databases
Vertical scaling, limited horizontal scaling | Horizontal scaling
Consistent | Consistent or eventually consistent
Scalable reads | Scalable reads and writes
Transactions across multiple tables | Transactions are difficult to support
No partition tolerance | Partition tolerance
Schema/tables | Schemaless
Flexible queries (joins) | Limited queries
1) Document Databases
● Simple & popular
● Close to the relational model
● MongoDB was a rising star in 2009
● Reference: Seven Databases in Seven Weeks
JSON Document vs Row
● Document vs row
● Collection vs table
● Nesting instead of joins
● Queries can reach into sub-documents
● Data may be duplicated to avoid joins
● Schemaless
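The same order modeled both ways, with plain Python dicts standing in for normalized rows and for a JSON document (field names are hypothetical). The document nests its line items, so a query can look inside the sub-document without a join:

```python
# Relational style: orders and line items in separate tables; reading an order needs a join.
orders_table = [{"order_id": 1, "customer": "Alice"}]
items_table = [
    {"order_id": 1, "sku": "pen", "qty": 3},
    {"order_id": 1, "sku": "book", "qty": 1},
]


def order_with_items(order_id: int) -> dict:
    order = next(o for o in orders_table if o["order_id"] == order_id)
    items = [i for i in items_table if i["order_id"] == order_id]  # the "join"
    return {**order, "items": items}


# Document style: one self-contained document in an "orders" collection.
orders_collection = [
    {
        "_id": 1,
        "customer": {"name": "Alice"},  # embedded, and possibly duplicated across orders
        "items": [{"sku": "pen", "qty": 3}, {"sku": "book", "qty": 1}],
    }
]


def find_by_item(sku: str) -> list[dict]:
    """Match documents whose nested items array contains the given sku."""
    return [doc for doc in orders_collection if any(i["sku"] == sku for i in doc["items"])]


print(find_by_item("book")[0]["customer"]["name"])  # Alice
```

In MongoDB the equivalent lookup would be written with dot notation, for example a filter like `{"items.sku": "book"}`.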
MongoDB CP
● Favors consistency
● Master-slave replication with automatic elections
● By contrast, CouchDB is AP
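One way to see why elections make MongoDB lean CP: a node can become master only with votes from a strict majority of the replica set's voting members, so during a partition at most one side can elect a master and accept writes. A simplified sketch of that majority rule; it ignores election terms, member priorities, and how up to date each member is:

```python
def can_elect_master(reachable_members: int, total_voting_members: int) -> bool:
    """A candidate needs votes from a strict majority of all voting members."""
    return reachable_members > total_voting_members // 2


def partition_outcome(side_sizes: list[int]) -> list[bool]:
    """For each side of a network partition: can it elect a master (and accept writes)?"""
    total = sum(side_sizes)
    return [can_elect_master(size, total) for size in side_sizes]


# A 5-member set split 3 / 2: only the majority side keeps a writable master.
print(partition_outcome([3, 2]))  # [True, False]
# A 4-member set split 2 / 2: neither side has a majority, so no writes anywhere.
print(partition_outcome([2, 2]))  # [False, False]
```

Refusing writes on the minority side is the availability that a CP system gives up to stay consistent.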
MongoDB Conclusion
● Simple
● Scalable
● Embedded documents
● CP
● Built-in geospatial support
● No joins
● May need to duplicate data
● All writes go through the master node
2) Key-Value Databases
● Light & compact
● A hash table: keys map to values (text, blobs, JSON, images, etc.)
● Reads are fast, writes are even faster
Key-Value Databases
● Redis Hash
Redis Complex Data Types
● List
Redis Complex Data Types
● Blocking List
Redis Complex Data Types
● Publish-Subscribe
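In publish-subscribe, a message sent to a channel goes to every current subscriber and is not stored for later. A toy in-process broker illustrating that fan-out (names are hypothetical; Redis delivers over client connections instead of callbacks):

```python
from collections import defaultdict
from typing import Callable


class PubSub:
    """Toy broker: each published message goes to every current subscriber of the channel."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        """Deliver to all subscribers; returns the receiver count, like PUBLISH."""
        handlers = self._subscribers.get(channel, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = PubSub()
inbox_a: list[str] = []
inbox_b: list[str] = []
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
print(broker.publish("news", "v2 released"))         # 2
print(broker.publish("sports", "no one listening"))  # 0: the message is simply dropped
```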
Redis Complex Data Types
● Set
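Redis sets are unordered collections of unique members with server-side set algebra. A sketch with Python sets standing in for the `SADD`, `SINTER`, and `SUNION` commands (the tag/user data is hypothetical):

```python
sets: dict[str, set[str]] = {}


def sadd(key: str, *members: str) -> int:
    """Add members; returns how many were new, like SADD (duplicates are ignored)."""
    s = sets.setdefault(key, set())
    before = len(s)
    s.update(members)
    return len(s) - before


def sinter(*keys: str) -> set[str]:
    """Members present in every set, like SINTER."""
    groups = [sets.get(k, set()) for k in keys]
    return set.intersection(*groups) if groups else set()


def sunion(*keys: str) -> set[str]:
    """Members present in any of the sets, like SUNION."""
    return set().union(*(sets.get(k, set()) for k in keys))


sadd("tag:python", "alice", "bob", "carol")
sadd("tag:redis", "bob", "carol", "dave")
print(sorted(sinter("tag:python", "tag:redis")))  # ['bob', 'carol']
```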
Redis Complex Data Types
● Expiry Caching
Redis in Memory
● Data lives in memory; by default, writes are not persisted instantly
● Data is persisted periodically by taking snapshots
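A sketch of snapshot persistence, assuming JSON as the on-disk format (Redis writes its own binary RDB file, from a forked child process). Writing to a temporary file and renaming it means a crash never leaves a half-written snapshot. It also shows the trade-off: anything written after the last snapshot is lost on restart. The file name is hypothetical:

```python
import json
import os
import tempfile


def save_snapshot(data: dict[str, str], path: str) -> None:
    """Write the whole in-memory dataset to disk atomically (temp file + rename)."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(data, f)
    os.replace(tmp_path, path)  # readers see either the old or the new snapshot


def load_snapshot(path: str) -> dict[str, str]:
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


snapshot_dir = tempfile.mkdtemp()
snapshot_path = os.path.join(snapshot_dir, "dump.json")
memory = {"user:1": "Alice"}
save_snapshot(memory, snapshot_path)
memory["user:2"] = "Bob"  # written after the snapshot...
restored = load_snapshot(snapshot_path)
print(restored)           # {'user:1': 'Alice'} ...so it is lost on restart
```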
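Redis CP
● Sharding across masters A, B, C
● Replication: A => A1, B => B1, C => C1
● If master B fails, replica B1 is promoted to master
● Redis is NOT strongly consistent (replication is asynchronous, so acknowledged writes can be lost on failover); if both A and A1 fail, the cluster cannot continue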
● Riak is AP
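Redis Cluster shards by hashing each key into one of 16384 slots and assigning slot ranges to masters (A, B, C above). A sketch of the slot calculation as described in the Redis Cluster specification (CRC-16/XMODEM, plus `{hash tag}` handling); the even three-way slot split is an assumption standing in for however a real cluster is configured:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the CRC variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots; a non-empty {hash tag} is hashed instead of the key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


# Hypothetical layout: three masters, each owning a contiguous slot range.
SLOT_OWNERS = [(0, 5460, "A"), (5461, 10922, "B"), (10923, 16383, "C")]


def master_for(key: bytes) -> str:
    slot = hash_slot(key)
    return next(name for lo, hi, name in SLOT_OWNERS if lo <= slot <= hi)


# Keys sharing a hash tag land on the same master, so multi-key operations stay local.
print(master_for(b"{user:1000}.following") == master_for(b"{user:1000}.followers"))  # True
```

Because the number of slots is fixed, adding a master only moves some slots rather than rehashing every key.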
Redis Conclusion
● Light & compact
● Key-value
● Complex data types
● Fast, in memory
● Dataset should fit in RAM
● Use cases: transforming data, caching, messaging
● CP, but not strongly consistent
● Flexible persistence levels
● Rarely used alone
3) Graph Databases
● Directed graph
● Node has properties
● Relation has properties
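A minimal in-memory sketch of the property-graph model: directed edges, properties on both nodes and relations, and a traversal that follows relationships instead of joining tables. The classes and methods are hypothetical, not any graph database's API:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    props: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: dict[str, object] = field(default_factory=dict)


class PropertyGraph:
    """Toy directed property graph with per-node lists of outgoing edges."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.out_edges: dict[str, list[Edge]] = {}

    def add_node(self, node_id: str, **props: object) -> None:
        self.nodes[node_id] = Node(node_id, props)
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, src: str, dst: str, label: str, **props: object) -> None:
        self.out_edges[src].append(Edge(src, dst, label, props))

    def reachable(self, start: str, label: str, max_hops: int) -> set[str]:
        """Breadth-first traversal following only edges with this label, up to max_hops."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node_id, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for edge in self.out_edges.get(node_id, []):
                if edge.label == label and edge.dst not in seen:
                    seen.add(edge.dst)
                    frontier.append((edge.dst, hops + 1))
        return seen - {start}


g = PropertyGraph()
for name in ("alice", "bob", "carol"):
    g.add_node(name, kind="person")
g.add_edge("alice", "bob", "FOLLOWS", since=2015)
g.add_edge("bob", "carol", "FOLLOWS", since=2019)
print(sorted(g.reachable("alice", "FOLLOWS", max_hops=2)))  # ['bob', 'carol']
```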
Graph Databases (AP)
● Tens of billions of nodes and edges
● No sharding; the whole graph is replicated to every server
● Favors high availability over consistency
● Elects a master, but writes can also go directly to slaves
● The Community edition is free, but the full version is NOT
4) Column-Family Databases
Row-oriented database:
● Many columns per row
● Disk seek operations
● Low compression rate
Column-Family Databases
● An RDBMS targets write-heavy workloads, so whole rows are stored together in bulk
● Column stores target read-heavy workloads, so the values of each column are stored together
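To make the two layouts concrete, here is a sketch of the same records stored row-wise and in a pure columnar layout, plus run-length encoding as one example of why adjacent values of a single column compress well. Plain Python lists stand in for disk pages, and the data is hypothetical:

```python
rows = [
    {"id": 1, "country": "EG", "amount": 120},
    {"id": 2, "country": "EG", "amount": 80},
    {"id": 3, "country": "US", "amount": 200},
]

# Row layout: each record's fields sit together, so appending a record is one write.
row_layout = [tuple(r.values()) for r in rows]

# Column layout: each column's values sit together.
column_layout = {col: [r[col] for r in rows] for col in rows[0]}


def total_amount_row_store() -> int:
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in row_layout)


def total_amount_column_store() -> int:
    # Reads one contiguous list and never touches the other columns.
    return sum(column_layout["amount"])


def run_length_encode(values: list) -> list[tuple[object, int]]:
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    encoded: list[tuple[object, int]] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded


print(total_amount_row_store(), total_amount_column_store())  # 400 400
print(run_length_encode(column_layout["country"]))  # [('EG', 2), ('US', 1)]
```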
HBase
● Database on top of HDFS (what an RDBMS is to plain files)
● Widely used with Hadoop
● Scalability! At least five nodes in production
● Facebook messaging system infrastructure (2010)
HBase Column Family
● Key-value pairs (a map of maps)
● Column families must be defined up front, but the columns inside them are schemaless
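HBase Versioning
● Each cell can keep multiple timestamped versions
● The model becomes a map of maps of maps: row key (ascending) → column (ascending) → timestamp (descending)
● Garbage collection removes expired versions
● Everything is stored as binary
● High compression rate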
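The "map of maps of maps" can be sketched directly. A minimal Python model, assuming `family:qualifier` byte strings as column names and a single `max_versions` limit (HBase configures the version limit per column family). It trims extra versions eagerly on write, whereas HBase drops them during compactions. The class is hypothetical, not the HBase client API:

```python
from collections import defaultdict
from typing import Iterator


class VersionedTable:
    """Toy HBase-like table: row key -> column ("family:qualifier") -> versions."""

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        # Each cell is a list of (timestamp, value) pairs, newest first.
        self.rows: defaultdict[bytes, dict[bytes, list[tuple[int, bytes]]]] = defaultdict(dict)

    def put(self, row: bytes, column: bytes, ts: int, value: bytes) -> None:
        versions = self.rows[row].setdefault(column, [])
        versions[:] = [(t, v) for t, v in versions if t != ts]  # same timestamp overwrites
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # timestamps descending
        del versions[self.max_versions:]  # versions beyond the limit are discarded

    def get(self, row: bytes, column: bytes, versions: int = 1) -> list[tuple[int, bytes]]:
        """Newest `versions` cells for one column (default: only the latest)."""
        return self.rows.get(row, {}).get(column, [])[:versions]

    def scan(self) -> Iterator[tuple[bytes, bytes, int, bytes]]:
        """Yield cells ordered row asc, column asc, timestamp desc."""
        for row in sorted(self.rows):
            for column in sorted(self.rows[row]):
                for ts, value in self.rows[row][column]:
                    yield row, column, ts, value


t = VersionedTable(max_versions=2)
t.put(b"row2", b"info:name", 1, b"Bob")
t.put(b"row1", b"info:name", 1, b"Ann")
t.put(b"row1", b"info:name", 2, b"Anna")
t.put(b"row1", b"info:name", 3, b"Annie")  # evicts the ts=1 version
t.put(b"row1", b"info:age", 1, b"30")
print(t.get(b"row1", b"info:name"))  # [(3, b'Annie')]
```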
FB Messaging Index Table
● The row keys are user IDs
● Column qualifiers are words that appear in that user's messages
● Timestamps are message IDs of messages that contain that word
● Value is the offset of the word in the message
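A sketch of that layout as nested Python dicts: row key (user ID) → column qualifier (word) → timestamp (message ID) → value (offset). Tokenizing with a simple `\w+` regex, keeping only the first offset per message, and assuming message IDs grow over time are all illustrative choices, not details of Facebook's design:

```python
import re
from collections import defaultdict

# index[user_id][word][message_id] = offset of the word in that message
Index = dict[str, dict[str, dict[int, int]]]


def build_index(messages: list[tuple[str, int, str]]) -> Index:
    """messages: (user_id, message_id, text) triples; all names are hypothetical."""
    index: Index = defaultdict(lambda: defaultdict(dict))
    for user_id, message_id, text in messages:
        for match in re.finditer(r"\w+", text.lower()):
            # Keep the first occurrence of each word per message.
            index[user_id][match.group()].setdefault(message_id, match.start())
    return index


def search(index: Index, user_id: str, word: str) -> list[int]:
    """Message IDs containing the word, largest (newest) first, like HBase versions."""
    return sorted(index.get(user_id, {}).get(word.lower(), {}), reverse=True)


idx = build_index([
    ("user42", 1001, "Lunch at noon?"),
    ("user42", 1005, "Moved lunch to 1pm"),
    ("user7", 1002, "See you at lunch"),
])
print(search(idx, "user42", "lunch"))  # [1005, 1001]
```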
HBase vs Cassandra
● HBase runs on Hadoop; Cassandra is standalone
● The HBase community is more active
● HBase is CP; Cassandra is AP
● Cassandra is better suited to highly concurrent writes
The right tool for the right job