NOSQL DATABASES: MONGODB VS CASSANDRA


Dec 18, 2015


Theodora Booker
Transcript
Page 1

NOSQL DATABASES: MONGODB VS CASSANDRA

Page 2

INTRODUCTION

What is a database?

“… a repository with organized and structured data, …” (Abramova & Bernardino, 2013-07)

Data can be accessed using a DBMS (Database Management System).

What is a DBMS?

“DBMS can be defined as a collection of mechanisms that enables storage, edit and extraction of data” (Abramova & Bernardino, 2013-07)

Page 3

SQL

SQL (Structured Query Language) became the standard for data interaction and data manipulation.

Data is stored as a set of tables.

Accessing data from different tables at the same time is possible.
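
A minimal sketch of reading from two tables in a single query, using Python's built-in sqlite3 module. The authors and books tables and their contents are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical two-table schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books VALUES (1, 1, 'Notes'), (2, 2, 'Compilers'), (3, 1, 'Engines');
""")

# One query reads from both tables at once by joining on the shared key.
rows = conn.execute("""
    SELECT authors.name, books.title
    FROM books JOIN authors ON books.author_id = authors.id
    ORDER BY books.id
""").fetchall()
conn.close()

for name, title in rows:
    print(name, title)
```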

Page 4

NOSQL

Carlo Strozzi introduced the name NoSQL in 1998. At the time, it referred to an open-source database that did not use an SQL interface.

Strozzi prefers to call today's NoSQL “noseequel” or “NoRel”, since being non-relational is its principal difference from relational databases.

The term became popular after a conference held in San Francisco in 2009.

Why do we need NoSQL? In SQL databases, the efficiency of information extraction degrades as the amount of data stored and used grows.

Page 5

CAP THEOREM

The CAP theorem defines three guarantees: consistency, availability, and partition tolerance.

The principles of both relational and NoSQL databases derive from the CAP theorem.

Page 6

ACID

“ACID is a principle based on CAP theorem and used as set of rules for relational database transactions.“ (Abramova & Bernardino, 2013-07)

ACID guarantees: atomic, consistent, isolated, durable.

What if the amount of data is large? ACID may be hard to accomplish!
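
As a small illustration of the "atomic" guarantee, the sketch below uses Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are hypothetical. A transfer that would overdraw an account violates a CHECK constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may never become negative.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # overdraws alice, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```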

Page 7

BASE PRINCIPLE & NOSQL

BASE principle: basically available, soft state, eventually consistent.

BASE still follows the CAP theorem: if the system is distributed, two of the three guarantees must be chosen.

Page 8

TYPES OF NOSQL DATABASES

There are more than 150 different NoSQL databases. They are based on the same principles but differ in some characteristics.

Categories: key-value store, document store, column-family, and graph database.

Page 9

KEY-VALUE STORE

Data is stored as key-value pairs.

All keys are unique.

Data is accessed by looking up the value associated with a key.

A hash table holds all the keys so that values can be retrieved when needed.
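
The idea above can be sketched in a few lines of Python. The KeyValueStore class and the sample keys are hypothetical. A Python dict is itself a hash table, so it stands in for the hash that holds the keys.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a hash table (a dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key replaces its value, so keys stay unique.
        self._data[key] = value

    def get(self, key, default=None):
        # Access is always by key; the hash table finds the value directly.
        return self._data.get(key, default)

    def __len__(self):
        return len(self._data)


store = KeyValueStore()
store.put("user:1", "Ada")
store.put("user:1", "Grace")  # same key: the old value is overwritten
store.put("user:2", "Linus")
print(store.get("user:1"), len(store))
```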

Page 10

DOCUMENT STORE

Databases are defined as sets of key-value stores that are transformed into documents.

Each document is identified by a unique key.

Data can be accessed by key or by a specific value.
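
A minimal sketch of the two access paths, by key and by a specific value. The documents, field names, and helper functions are hypothetical and not taken from any particular database.

```python
# Each document is a set of key-value pairs, stored under a unique key.
documents = {
    "doc1": {"name": "Ada", "city": "London"},
    "doc2": {"name": "Grace", "city": "New York"},
    "doc3": {"name": "Alan", "city": "London"},
}


def find_by_key(key):
    """Return the document stored under `key`, or None if there is none."""
    return documents.get(key)


def find_by_value(field, value):
    """Return the keys of all documents whose `field` equals `value`."""
    return [key for key, doc in documents.items() if doc.get(field) == value]


print(find_by_key("doc2"))
print(find_by_value("city", "London"))
```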

Page 11

COLUMN FAMILY

Similar to the relational database model.

Structure: column, super-column, column family.

The structure of the database is defined by super-columns and column families.

Data access is accomplished by specifying the column family, key, and column in order to get the value, using the following structure:

<columnFamily>.<key>.<column> = <value>
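
The access pattern above can be mimicked with nested dictionaries. This is only a sketch: the put and get helpers and the sample data are hypothetical, and super-columns are left out for brevity.

```python
# column family -> row key -> column -> value
store = {}


def put(column_family, key, column, value):
    """Set <columnFamily>.<key>.<column> = <value>."""
    store.setdefault(column_family, {}).setdefault(key, {})[column] = value


def get(column_family, key, column):
    """Read the value at <columnFamily>.<key>.<column>."""
    return store[column_family][key][column]


put("users", "u1", "name", "Ada")
put("users", "u1", "email", "ada@example.com")
put("users", "u2", "name", "Grace")  # rows need not share the same columns

print(get("users", "u1", "name"))
```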

Page 12

GRAPH DATABASE

Graph databases are used when the data can be represented as a graph, for example in social networks.
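
To make the social-network example concrete, here is a small sketch that stores friendships as an adjacency list and answers a typical graph question, "who are the friends of my friends?". The people and the friends_of_friends helper are hypothetical.

```python
# Undirected friendship graph as an adjacency list: person -> set of friends.
friends = {
    "ana": {"ben", "cara"},
    "ben": {"ana", "dan"},
    "cara": {"ana"},
    "dan": {"ben"},
}


def friends_of_friends(person):
    """People two hops away who are not already direct friends."""
    direct = friends[person]
    two_hops = set()
    for friend in direct:
        two_hops |= friends[friend]
    return two_hops - direct - {person}


print(friends_of_friends("ana"))
```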

Page 13

MONGODB

“MongoDB is an open source NoSQL database developed in C++” (Abramova & Bernardino, 2013-07).

MongoDB is a document-store database. Documents are gathered into groups according to their structure.

In terms of the CAP theorem, MongoDB provides consistency and partition tolerance.

Page 14

MONGODB (CONT.)

Description:

Data is sent to disk every 60 seconds.

Everything is flushed to disk once new files are created.

Each document is identified by an “_id” field.

An index is created for the “_id” field.

Characteristics: durability and concurrency.

Page 15

MONGODB CHARACTERISTICS

Durability

Durability of data is accomplished by the creation of replicas, using a master-slave technique.

Master: read and write.

Slave: read only.

If the master goes down, the slave with the most recent data becomes the new master.

Replication is asynchronous.

Concurrency

Concurrency is handled with locks.
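
The master-slave behaviour described under Durability can be sketched as a toy model. It is not MongoDB's actual replication or election protocol. The Node and ReplicaSet classes are hypothetical, and "most recent data" is approximated here by the length of each node's write log.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []  # ordered list of writes this node has applied


class ReplicaSet:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def write(self, value):
        # Only the master accepts writes.
        self.master.log.append(value)

    def replicate(self, slave, up_to=None):
        # Asynchronous replication: a slave copies the master's log at some
        # later point, and may have received only part of it (up_to).
        end = len(self.master.log) if up_to is None else up_to
        slave.log = self.master.log[:end]

    def fail_over(self):
        # The master went down: promote the slave holding the most writes.
        new_master = max(self.slaves, key=lambda node: len(node.log))
        self.slaves.remove(new_master)
        self.master = new_master
        return new_master


a, b, c = Node("a"), Node("b"), Node("c")
replica_set = ReplicaSet(master=a, slaves=[b, c])
for value in (1, 2, 3):
    replica_set.write(value)
replica_set.replicate(b)           # b is fully caught up
replica_set.replicate(c, up_to=1)  # c lags behind
promoted = replica_set.fail_over()
print(promoted.name, promoted.log)
```

Because replication is asynchronous, a lagging slave such as c above can serve stale reads until it catches up.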

Page 16

CASSANDRA

“Cassandra is a NoSQL database developed by Apache Software Foundation; written in Java” (Abramova & Bernardino, 2013-07)

Similar to the usual relational model. The difference is that stored data can be semi-structured or unstructured.

In terms of the CAP theorem, Cassandra provides partition tolerance and high availability.

Designed to store large amounts of data and to handle huge volumes efficiently.

Page 17

CASSANDRA (CONT.)

Peer-to-peer architecture (no master), giving high availability and high scalability.

Data is replicated over multiple nodes in a cluster.

Replication factor (RF): the total number of replicas. RF = 1 means one copy of each row on one node. RF = 2 means two copies of each row, each on a different node.

Failed nodes are detected using “gossip” protocols and replaced with no downtime.
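
A rough sketch of what a replication factor means in practice: hash the row key to pick a starting node, then place the remaining copies on the next nodes around the ring. This is a simplification for illustration and not Cassandra's actual partitioner or replication strategy. The node names and the replicas_for helper are hypothetical.

```python
import hashlib


def replicas_for(key, nodes, replication_factor):
    """Return the nodes that hold a copy of the row stored under `key`."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # There cannot be more distinct replicas than there are nodes.
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4"]
rf1 = replicas_for("row-42", nodes, 1)  # one copy on one node
rf2 = replicas_for("row-42", nodes, 2)  # two copies on two different nodes
print(rf1, rf2)
```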

Page 18

CASSANDRA (CONT.)

Replication strategy:

Simple: single data center.

Network Topology: multiple data centers.

Cassandra characteristics:

Durability: two replication types are available, synchronous and asynchronous. All writes and redundancies are tracked through a commit log.

Indexing: “Each node maintains the indexes of the table it manages”

Data is manipulated using CQL (Cassandra Query Language).

Page 19

YCSB

“The YCSB – Yahoo! Cloud Serving Benchmark is one of the most used benchmarks to test NoSQL databases” (Abramova & Bernardino, 2013-07).

The YCSB client consists of two parts: a workload generator and a set of workloads.

Workloads are combinations of read, write, and update operations, performed on randomly chosen records.
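
The idea of a workload generator can be sketched as follows. This is not YCSB's own code. The generate_workload function and its parameters are hypothetical, and only reads and updates are modelled. The read proportions in the table are the ones named in the workload titles of this presentation (A, B, C, G, H).

```python
import random

# Read proportion per workload; the remainder of each mix is updates.
WORKLOADS = {"A": 0.50, "B": 0.95, "C": 1.00, "G": 0.05, "H": 0.00}


def generate_workload(num_ops, record_count, read_proportion, seed=0):
    """Return a list of (operation, record_id) pairs.

    Each operation is a read with probability `read_proportion`, otherwise
    an update, and targets a randomly chosen record.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    ops = []
    for _ in range(num_ops):
        op = "read" if rng.random() < read_proportion else "update"
        ops.append((op, rng.randrange(record_count)))
    return ops


ops_a = generate_workload(1000, record_count=50, read_proportion=WORKLOADS["A"])
reads = sum(1 for op, _ in ops_a if op == "read")
print(f"workload A: {reads} reads, {len(ops_a) - reads} updates")
```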

Page 20

WORKLOAD A: 50% READS & 50% UPDATES

Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 19

Page 21

WORKLOAD B: 95% READS & 5% UPDATES

Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Page 22

WORKLOAD C: 100% READS

Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Page 23

WORKLOAD F: READ-MODIFY-WRITE

Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Page 24

WORKLOAD G: 5% READS & 95% UPDATES

Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 20

Page 25

WORKLOAD H: 100% UPDATES

Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra. 21