Page 1: No sql landscape_nosqltips

The NoSQL Landscape

Objective – Reasonable understanding of the non-relational or NoSQL data stores and how they relate to RDBMS databases we are all used to working with.

Page 2: No sql landscape_nosqltips

About Me

Chief Architect – youwho.com

Former dot com CTO

NoSQL advocate

nosqltips.blogspot.com

@nosqltips on Twitter

Page 3: No sql landscape_nosqltips

Agenda

What is NoSQL?

Landscape

Vocabulary and concepts

CAP Theorem

SQL vs NoSQL comparison

Overview of each type w/ examples

Question and Answer

Page 9: No sql landscape_nosqltips

Vocabulary

CAP Theorem – Consistency, Availability, Partition tolerance

ACID – Atomic, Consistent, Isolated, Durable

BASE – Basically Available, Soft state, Eventually consistent

RDF – Resource Description Framework

Sharding – Partitioning data across distributed nodes

Web Scale – Google, Twitter, Facebook, etc

Page 11: No sql landscape_nosqltips

CAP Tuning

NRW

N: Number of Data Copies

R: Read Quorum

W: Write Quorum

Hard Consistency – RDBMS

Soft Consistency – No Guarantees

Eventual Consistency – Most NoSQL

Page 12: No sql landscape_nosqltips

CAP Tuning Chart

NRW setting – Outcome

N=3 – The "magic number" of data replicas

W=N, R=1 – Read optimized, strong consistency

W=1, R=N – Write optimized, strong consistency

W+R > N – Strong consistency on read and write

W+R <= N – Weak (eventual) consistency; a read may not see the latest data

N > W > 1 – Eventual consistency; most NoSQL data stores live here
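
The overlap rule in the chart (W+R > N versus W+R <= N) can be checked mechanically. The following is a minimal sketch, not any particular data store's API; the function name `classify_quorum` and the labels it returns are assumptions made for illustration.

```python
def classify_quorum(n: int, r: int, w: int) -> str:
    """Classify an (N, R, W) setting using the overlap rule R + W > N.

    Hypothetical helper for illustration; real stores expose these
    settings through their own configuration.
    """
    if n < 1 or not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("N must be >= 1, and R and W must be between 1 and N")
    if r + w > n:
        # Any read quorum shares at least one replica with any write
        # quorum, so a read always touches a copy of the latest write.
        return "strong"
    # Read and write quorums may not overlap; a read can miss the
    # latest write until the replicas converge.
    return "eventual"
```

With N=3, the settings (R=1, W=3), (R=3, W=1), and (R=2, W=2) all classify as strong, while (R=1, W=1) classifies as eventual.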

Page 13: No sql landscape_nosqltips

Eventual Consistency

All replicas have same data – eventually

Milliseconds to seconds

Not all applications are compatible

Various ways to ensure latest data

Vector Clocks, Read Repair, Gossiping

Application determines correct data
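
Vector clocks are one way to tell whether two replica versions are ordered or conflicting. The sketch below assumes a clock is a plain mapping from node name to counter; the names `compare` and `merge` are illustrative and not taken from any specific product.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of updates seen from that node


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return how clock a relates to clock b: before, after, equal, or concurrent."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        # Neither version descends from the other: a genuine conflict
        # that the application (or a merge rule) has to resolve.
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

When `compare` reports "concurrent", the store cannot pick a winner on its own, which is the case where the application determines the correct data.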

Page 15: No sql landscape_nosqltips

Comparison

SQL: Prefers big-box, self-redundant hardware | NoSQL: Prefers commodity hardware, distributed

SQL: Tries to keep things from breaking | NoSQL: Assumes things break or are already broken

SQL: Solidly in CA land | NoSQL: Mostly AP, some tunable

SQL: P is difficult and expensive | NoSQL: P is generally easy

SQL: Query by SQL | NoSQL: Custom API, sometimes SQL-like

SQL: Stored procedures | NoSQL: Map/Reduce

Page 16: No sql landscape_nosqltips

Comparison

SQL: ACID transactions | NoSQL: BASE transactions

SQL: Advanced indexing | NoSQL: Key only to advanced

SQL: Foreign key support | NoSQL: Usually none

SQL: Strong lock support | NoSQL: Usually none

SQL: Schema centric | NoSQL: Usually schema-less

SQL: API – usually JPA or JDBC | NoSQL: Depends on implementation

SQL: Strong access control | NoSQL: Usually none

Page 17: No sql landscape_nosqltips

Comparison

SQL: Complex disk store, random access | NoSQL: Usually append only, 1 seek, 1 read

SQL: Easy for dev with JPA/Hibernate/SQL | NoSQL: Puts more work on the application developer

SQL: Multi-platform | NoSQL: Favors Linux/Unix

SQL: General purpose | NoSQL: More special purpose

SQL: Strong commercial support | NoSQL: Ranges from strong to no commercial support

SQL: Great tool support | NoSQL: Not so much

Page 19: No sql landscape_nosqltips

Column Stores

Data stored by column instead of row

Schema-less

Non-relational, data is de-normalized

Column format stores sparse data efficiently

Column families are fixed by the table definition; columns within a family can vary

10,000+ columns by 100 million+ rows

Easy sharding (partitioning)

Usually not ACID compliant
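
A rough way to picture the BigTable-style model is a nested map: row key, then column family, then column, then value. The sketch below is an in-memory illustration only; the class name `ColumnFamilyTable` and its methods are hypothetical and do not correspond to the HBase or Cassandra client APIs.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


class ColumnFamilyTable:
    """Toy wide-column table: families are fixed up front, columns are free-form."""

    def __init__(self, families: Iterable[str]) -> None:
        self._families = frozenset(families)
        # row key -> family -> column -> value; absent cells take no space
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, family: str, column: str, value: Any) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._rows[row][family][column] = value

    def get(self, row: str, family: str, column: str, default: Any = None) -> Any:
        # .get() avoids creating empty rows as a side effect of reading
        return self._rows.get(row, {}).get(family, {}).get(column, default)

    def columns(self, row: str, family: str) -> List[str]:
        return sorted(self._rows.get(row, {}).get(family, {}))
```

Two rows in the same family can carry entirely different columns, which is how sparse data stays cheap: only the cells that exist are stored.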

Page 20: No sql landscape_nosqltips

Column stores

BigTable – Google, 2006 paper

Hadoop/HBase – Part of Apache Hadoop

Cassandra – Facebook, LAN/WAN replication

Hypertable – Pluggable DFS, HQL

Vertica – Full SQL implementation

Amazon SimpleDB – Cloud store

Page 21: No sql landscape_nosqltips

Document Stores

CAP tunable

Either key/value or bucket/key/value

Easy/Auto sharding - Consistent hashing

ACID compliance varies by product (e.g. CouchDB is ACID)

Not SQL compliant, maybe custom query

Easy implementation via map or custom API
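
Consistent hashing is what makes the automatic sharding mentioned above cheap: adding a node moves only the keys that now fall to that node. The following is a minimal sketch under stated assumptions: MD5 is used purely as a convenient hash, and the class name `HashRing`, the `vnodes` parameter, and the `node#i` labelling scheme are illustrative choices, not any product's implementation.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        # Each physical node is placed on the ring at several points
        # so that keys spread more evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        if not self._ring:
            raise ValueError("at least one node is required")
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Building a second ring with one extra node and comparing `node_for` results shows that a key either stays where it was or moves to the new node; no key moves between the existing nodes.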

Page 22: No sql landscape_nosqltips

Document stores

Amazon – Dynamo and S3 (cloud based)

Riak – CAP tunable, built in map/reduce

CouchDB – ACID, REST API

MongoDB – Indexing, query support

Voldemort – Java, pluggable serialization

MySQL – Key access, denormalize schema, kill indexes

Page 23: No sql landscape_nosqltips

Memory Stores

Mostly in the CA realm

P can be tough depending on implementation

Some are distributed, some local only

Usually key-value stores

Many are disk backed, append only files

Designed for very high-speed access

Page 24: No sql landscape_nosqltips

Memory stores

Couchbase – Membase + CouchDB

Memcached – Local map

Coherence – Commercial Oracle, distributed

Redis – Supports hash, list, set, and sorted set, data structure server

Tokyo/Kyoto Cabinet – disk backed map

Infinispan – JSR-107 (JCache) implementation

Scalaris – Erlang, strong consistency

Page 25: No sql landscape_nosqltips

Graph/Triple Store

Model relationships well, bi-directional

Node/edges – edges can be weighted or not

RDF Triple – subject -> predicate -> object, W3C standard for the semantic web

Many implement SPARQL, object API

Sharding can be difficult because of the connected nature of graphs

Schema-less – nodes, edges, properties

Fast set operations
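
The subject -> predicate -> object shape can be illustrated with a tiny in-memory store that answers wildcard patterns, loosely in the spirit of a SPARQL triple pattern. This is a sketch only; the class name `TripleStore` and the convention that `None` means "match anything" are assumptions, and real triple stores index the data rather than scanning it.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy triple store; match() scans every triple, so it only suits small data."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )
```

Asking `match(predicate="knows", obj="bob")` returns everyone who knows bob, while `match(subject="bob")` returns everything stated about bob; following an edge in either direction is the same kind of query.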

Page 26: No sql landscape_nosqltips

Graph/Triple Stores

Neo4j – ACID transactions, object API

AllegroGraph – Reference impl of SPARQL

Bigdata – dynamic sharding

Trinity – Microsoft Research

InfiniteGraph – Distributed, cross-platform

FlockDB – Twitter, fast set operations

InfoGrid – Object based, REST API

Page 27: No sql landscape_nosqltips

Interesting Integrations

Lucene – Document Store with Search as Query Language

Solr and Elasticsearch – Scalable Lucene

Riak Search – Erlang implementation of Lucene APIs

Solandra – Lucene on Cassandra backend

couchdb-lucene – CouchDB and Lucene integration

DistributedLucene – Lucene on Hadoop

Neo4j – Full Text Search on Graph Store

Page 28: No sql landscape_nosqltips

Worth Mentioning

Configuration DBs – ZooKeeper, Doozer

Distributed configuration, locks, synchronization

Used to make other apps scalable

XML DBs – eXist, BaseX, Xindice

XML only, XQuery, XPath, ACID, GUI support

Non-distributed

Page 31: No sql landscape_nosqltips

Case Study - HBase

Apache – part of Hadoop/HDFS

Requires ZooKeeper

Java based

Runs well on Amazon EC2

Excellent language support

Supports REST interface

Page 32: No sql landscape_nosqltips

HBase continued

Map/Reduce via Hadoop

Schema-less, column families fixed

Nearly unlimited columns and rows

HBQL – partial SQL + JDBC support

Some ACID support, atomicity, durability

Integration with Hive for data warehousing, ad-hoc query support - HiveQL

Page 33: No sql landscape_nosqltips

Case Study - Riak

Data Model – Bucket/Key/Value

Value has MIME type, byte[]

Value supports one-way Links, basic graph

Erlang, Protocol Buffers, REST interfaces

Pre/Post Commit Hooks

CAP Tunable per bucket

Map/Reduce – Erlang and JavaScript

Page 34: No sql landscape_nosqltips

Riak Continued

Vector Clocks

Read repair for R < N

Peer-to-Peer, Shared-Nothing Architecture

Replication across data centers

Pluggable storage

API for Most Languages + REST

Commercial Support
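
Read repair can be pictured as "read from R replicas, keep the newest value, and write it back to any contacted replica that was stale". The sketch below is a deliberate simplification and not Riak's implementation: it uses a plain integer version where Riak uses vector clocks, repairs only the replicas it contacted, and the function name `read_with_repair` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]  # (version, value); an integer stands in for a vector clock


def read_with_repair(
    replicas: List[Dict[str, Versioned]], key: str, r: int
) -> Optional[str]:
    """Read key from the first r replicas and repair any stale copies among them."""
    contacted = replicas[:r]
    found = [replica[key] for replica in contacted if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda item: item[0])
    for replica in contacted:
        current = replica.get(key)
        if current is None or current[0] < newest[0]:
            # Stale or missing copy: overwrite it with the newest version seen.
            replica[key] = newest
    return newest[1]
```

With R smaller than N, replicas outside the read quorum stay stale until a later read or another mechanism reaches them, which is why read repair is usually combined with other convergence techniques.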

Page 35: No sql landscape_nosqltips

Case Study - Redis

Supports hash, list, set, and sorted set

Fast set operations

Atomic updates

Everything stored in memory

Persistence to disk – periodic save, append only file, can be compacted

Good API support, JDBC subset driver
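
The append-only file idea can be sketched without touching Redis itself: every write is appended to a log, the in-memory state is rebuilt by replaying the log, and compaction rewrites the log so it holds one record per live key. This is an illustrative model only; the class name `AppendOnlyStore` and the record layout are assumptions, and the log is kept in a Python list rather than on disk.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, Optional[str]]  # (operation, key, value)


class AppendOnlyStore:
    """Toy key-value store whose state is fully recoverable from its log."""

    def __init__(self, log: Optional[Iterable[Record]] = None) -> None:
        self.log: List[Record] = []
        self.data: Dict[str, str] = {}
        for record in log or []:
            # Replaying an existing log rebuilds the in-memory state.
            self._apply(record)
            self.log.append(record)

    def _apply(self, record: Record) -> None:
        op, key, value = record
        if op == "set" and value is not None:
            self.data[key] = value
        elif op == "del":
            self.data.pop(key, None)
        else:
            raise ValueError(f"unknown or malformed record: {record!r}")

    def set(self, key: str, value: str) -> None:
        record: Record = ("set", key, value)
        self.log.append(record)  # append first, then apply
        self._apply(record)

    def delete(self, key: str) -> None:
        record: Record = ("del", key, None)
        self.log.append(record)
        self._apply(record)

    def compact(self) -> None:
        """Rewrite the log so it holds exactly one 'set' per live key."""
        self.log = [("set", key, value) for key, value in self.data.items()]
```

After many overwrites and deletes the log grows much larger than the data; `compact()` shrinks it while a fresh `AppendOnlyStore(store.log)` still reconstructs the same state.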

Page 36: No sql landscape_nosqltips

Redis Continued

Master-slave replication – read scalability, redundancy; slave can sync to disk

Can swap out values, keys must be in memory

Can be used as pub/sub messaging system

Can send multiple commands in single request

Built to be extremely fast

Supports very high speed atomic counters

Page 37: No sql landscape_nosqltips

Case Study - Neo4j

Java based – cross platform

ACID transactions

Durable persistence

Handles billions of nodes/edges on a single machine

Supports bulk data loading

Good language support

Page 38: No sql landscape_nosqltips

Neo4j Continued

Spatial index support

RDF triples/OWL/SPARQL support

Replication and HA – commercial version

Object oriented API

Sharding at client level

Dual open source and commercial license

Page 39: No sql landscape_nosqltips

Resources

fallabs.com/tokyocabinet

fallabs.com/kyotocabinet

redis.io

www.membase.org

neo4j.org

en.wikipedia.org/wiki/Triplestore

en.wikipedia.org/wiki/Graph_theory

research.microsoft.com/en-us/projects/trinity

Page 40: No sql landscape_nosqltips

Resources

www.jboss.org/infinispan

basho.com

nosqlpedia.com/wiki/Consistency_models_in_nonrelational_dbs

www.hypertable.org

project-voldemort.com

www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Page 41: No sql landscape_nosqltips

Resources

nosql-database.org

couchdb.apache.org

engineering.twitter.com/2010/05/introducing-flockdb.html

infinitegraph.com

http://www.w3.org/TR/rdf-concepts/