Navigating the Transition from Relational to NoSQL Technology
Dipti Borkar, Director, Product Management
SFDAMA presents
WHY TRANSITION TO NOSQL?
Changes in interactive software – the NoSQL driver
Survey: Two big drivers for NoSQL adoption

"What is the biggest data management problem driving your use of NoSQL in the coming year?"

Lack of flexibility/rigid schemas – 49%
Inability to scale out data – 35%
High latency/low performance – 29%
Costs – 16%
All of these – 12%
Other – 11%

Source: Couchbase NoSQL Survey, December 2011, n=1351
NoSQL catalog

Key-Value: memcached (cache, memory only); membase (database, memory/disk)
Data Structure: redis
Document: mongoDB, couchDB, couchbase (database, memory/disk)
Column: cassandra
Graph: Neo4j
Are you being impacted by these?

Schema rigidity problems
• Do you store serialized objects in the database?
• Do you have lots of sparse tables, with very few columns used by most rows?
• Do your application developers require frequent schema changes due to constantly changing data?
• Are you using your database as a key-value store?

Scalability problems
• Do you periodically need to upgrade to more powerful servers and scale up?
• Are you reaching the read/write throughput limit of a single database server?
• Is your server's read/write latency not meeting your SLA?
• Is your user base growing at a frightening pace?
DISTRIBUTED DOCUMENT DATABASES
Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All documents require a unique key
• Documents are stored as JSON or XML or their derivatives
• Content can be indexed and queried
• Auto-sharding for scaling, replication for high availability
{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "[email protected]",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": ["SERVER", "US-West", "API"]
  }
}
COMPARING DATA MODELS
(Figure: ER diagram – http://www.geneontology.org/images/diag-godb-er.jpg)
Relational vs Document data model

Relational data model
Highly structured table organization with rigidly defined data formats and record structure:

R1C1  R1C2  R1C3  R1C4
R2C1  R2C2  R2C3  R2C4
R3C1  R3C2  R3C3  R3C4
R4C1  R4C2  R4C3  R4C4

Document data model
Collection of complex documents with arbitrary, nested data formats and varying "record" format, each document like:

{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "[email protected]",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": ["SERVER", "US-West", "API"]
  }
}
Example: Error Logging Use case
Table 1: Error Log

KEY  ERR  DC       TIME
1    ERR  FK(DC2)  TIME
2    ERR  FK(DC2)  TIME
3    ERR  FK(DC2)  TIME
4    ERR  FK(DC3)  TIME

Table 2: Data Centers

KEY  LOC  NUM
1    DEN  303-223-2332
2    NYC  212-223-2332
3    SFO  415-223-2332
Document design with flexible schema

{ "ID": 1, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }

{ "ID": 4, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332" }

SCHEMA CHANGE

{ "ID": 5, "ERR": "Out of Memory", "TIME": "2004-09-16T23:59:58.75",
  "COMPONENT": "DMS", "SEV": "LEVEL1",
  "DC": "NYC", "NUM": "212-223-2332" }
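With a flexible schema, version handling moves into application code. A minimal Python sketch, using the field names from the example documents above (`format_error` is a hypothetical helper, not part of any database API), shows a reader that tolerates both the old and the new document shape:

```python
def format_error(doc):
    """Render an error-log document, tolerating fields added by a
    later schema revision (COMPONENT, SEV) that older docs lack."""
    line = f"[{doc['TIME']}] {doc['ERR']} at {doc['DC']} ({doc['NUM']})"
    if "SEV" in doc:  # present only in documents written after the schema change
        line += f" severity={doc['SEV']}"
    return line

old_doc = {"ID": 1, "ERR": "Out of Memory",
           "TIME": "2004-09-16T23:59:58.75", "DC": "NYC", "NUM": "212-223-2332"}
new_doc = {**old_doc, "ID": 5, "COMPONENT": "DMS", "SEV": "LEVEL1"}

print(format_error(old_doc))
print(format_error(new_doc))
```

No migration touches the old documents; the reader simply checks for the optional fields.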
Document modeling

When considering how to model data for a given application:
• Think of a logical container for the data
• Think of how data groups together

Q
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?
Document Design Options

• One document that contains all related data
  – Data is de-normalized
  – Better performance and scale
  – Eliminates client-side joins
• Separate documents for different object types, with cross references
  – Data duplication is reduced
  – Objects may not be co-located
  – Transactions are supported only on a document boundary
  – Most document databases do not support joins
Document ID / key selection

• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID-based document lookup is extremely fast
• Usually an ID can appear only once in a bucket

Options
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human readable)
• Matching prefixes (for multiple related objects)

Q
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?
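Hand-crafted keys with matching prefixes are easy to build in application code. A sketch of one possible convention (the helper names and key format below are illustrative, mirroring the blog example later in the deck):

```python
def blog_post_key(author, title):
    # Hypothetical hand-crafted, human-readable key; must be unique per bucket.
    return f"{author}_{title.replace(' ', '_')}"

def comment_key(post_key, n):
    # A matching prefix ties a comment document to its parent post.
    return f"comment{n}_{post_key}"

post_id = blog_post_key("jchris", "Hello World")      # "jchris_Hello_World"
first_comment_id = comment_key(post_id, 1)            # "comment1_jchris_Hello_World"
print(post_id, first_comment_id)
```

Because lookup by ID is the fast path, a predictable key scheme lets the application fetch related objects directly without a query.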
Example: Entities for a blog

• User profile – the main pointer into the user data
• Blog entries – badge settings, like a Twitter badge
• Blog posts – contain the blogs themselves
• Blog comments – comments from other users
Blog Document – Option 1 – Single document

{
  "_id": "jchris_Hello_World",
  "author": "jchris",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    { "format": "markdown", "body": "Awesome post!" },
    { "format": "markdown", "body": "Like it." }
  ]
}
Blog Document – Option 2 – Split into multiple docs

BLOG DOC
{
  "_id": "jchris_Hello_World",
  "author": "jchris",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    "comment1_jchris_Hello_World"
  ]
}

COMMENT
{
  "_id": "comment1_jchris_Hello_World",
  "format": "markdown",
  "body": "Awesome post!"
}
Threaded Comments

• You can imagine how to take this to a threaded list: the blog doc holds a list of comment IDs, the first comment holds a list of reply IDs, and so on.

Advantages
• Only fetch the data when you need it, for example when rendering part of a web page
• Spread the data and load across the entire cluster
COMPARING SCALING MODELS
Modern interactive software architecture
Application Scales Out Just add more commodity web servers
Database Scales Up Get a bigger, more complex server
Note – Relational database technology is great for what it is great for, but it is not great for this.
NoSQL database matches application logic tier architecture Data layer now scales with linear cost and constant performance.
Application Scales Out Just add more commodity web servers
Database Scales Out Just add more commodity data servers
Scaling out flattens the cost and performance curves.
NoSQL Database Servers
Other considerations
Accessing data
– No standards exist yet
– Typically via SDKs or over HTTP
– Check whether the programming language of your choice is supported

Consistency
– Consistent only at the document level
– Most document stores currently don't support multi-document transactions
– Analyze your application's needs

Availability
– Each node stores active and replica data (Couchbase)
– Each node is either a master or a slave (MongoDB)
Other considerations
Operations
– Monitoring the system
– Backup and restore
– Upgrades and maintenance
– Support

Scaling
– Ease of adding and reducing capacity
– Application availability on topology changes

Indexing and querying
– Secondary indexes (map functions)
– Aggregates and grouping (reduce functions)
– Basic querying
Is NoSQL the right choice for you?
Does your application need rich database functionality?

• Multi-document transactions
• Complex security needs – user roles, document-level security, authentication, authorization integration
• Complex joins across buckets / collections
• BI integration
• Extreme compression needs

If so, NoSQL may not be the right choice for your application.
WHERE IS NOSQL A GOOD FIT?
Performance-driven use cases

• Low latency
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with a very high mutation rate per document (temporal locality); working sets with heavy writes
Data-driven use cases

• Support for unlimited data growth
• Data with non-homogeneous structure
• Need to change data structures quickly and often
• 3rd-party or user-defined structure
• Variable-length documents
• Sparse data records
• Hierarchical data
BRIEF OVERVIEW: COUCHBASE SERVER
Couchbase Server
Simple. Fast. Elastic. NoSQL.

Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
Representative user list
Couchbase architecture
Data Manager (database operations)
– Membase EP engine (built-in memcached)
– CouchDB storage interface

Cluster Manager (cluster management; Erlang/OTP)
– On each node: heartbeat, process monitor, global singleton supervisor, configuration manager
– One per cluster: rebalance orchestrator, node health monitor, vBucket state and replication manager
– HTTP REST management API / Web UI
Couchbase deployment
(Diagram: a web application uses the Couchbase client library; data flows to the cluster, with a separate cluster-management path.)
Clustering with Couchbase

1. SET request arrives at the KEY's master server
2. SET acknowledgement returned to the application
3. The listener-sender replicates the SET to Replica Server 1 and Replica Server 2 for the KEY
4. The Couchbase storage engine persists the SET from RAM to disk
Basic Operation

§ Docs distributed evenly across servers in the cluster
§ Each server stores both active and replica docs
  § Only one server active at a time
§ Client library provides the app with a simple interface to the database
§ Cluster map provides a map of which server a doc is on
  § The app never needs to know
§ App reads, writes, and updates docs
§ Multiple app servers can access the same document at the same time
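The cluster map can be thought of as a hash of the document ID to a partition (vBucket) plus a partition-to-server table. A simplified sketch, assuming CRC32 as the key hash and an invented three-server layout (the real client library maintains the map for you):

```python
import zlib

NUM_VBUCKETS = 1024                       # Couchbase's default partition count
servers = ["server1", "server2", "server3"]

# Hypothetical cluster map: vBucket -> server holding the active copy.
vbucket_map = {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}

def server_for(doc_id):
    """Hash the doc ID to a vBucket, then look up the active server.
    The app never needs to know the topology; the client does this."""
    vb = zlib.crc32(doc_id.encode()) % NUM_VBUCKETS
    return vbucket_map[vb]

print(server_for("jchris_Hello_World"))
```

When nodes are added or fail, only the vBucket-to-server table changes; the key hash stays stable, which is what keeps doc movement minimal during rebalance.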
(Diagram: two app servers, each with the Couchbase client library and its cluster map, issue reads/writes/updates to a three-server cluster; each server holds active docs and replica docs, with a user-configured replica count of 1.)
Add Nodes

§ Two servers added to the cluster
  § One-click operation
§ Docs automatically rebalanced across the cluster
  § Even distribution of docs
  § Minimum doc movement
§ Cluster map updated
§ App database calls now distributed over a larger number of servers
(Diagram: Servers 4 and 5 join the cluster; active and replica docs are rebalanced across all five servers and both app servers' cluster maps are updated; replica count remains 1.)
Fail Over Node

§ App servers happily accessing docs on Server 3
§ Server fails
  § App server requests to Server 3 fail
  § Cluster detects the server has failed
    § Promotes replicas of its docs to active
    § Updates the cluster map
§ App server requests for those docs now go to the appropriate server
§ Typically a rebalance would follow
(Diagram: Server 3 fails; replicas of its docs on the surviving servers are promoted to active, both app servers' cluster maps are updated, and requests are redirected; replica count = 1.)
Reading and Writing

Reading data
Application server → Server: "Give me document A"
Server → Application server: "Here is document A" (document A is held in RAM and on disk)

Writing data
Application server → Server: "Please store document A"
Server → Application server: "OK, I stored document A" (document A lands in RAM, then is persisted to disk)
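From the application's point of view this exchange reduces to get/set calls against the client library. A toy in-memory stand-in (not the real SDK API; the class and method names are invented for illustration) captures the contract:

```python
class ToyClient:
    """In-memory stand-in for a document-database client library:
    writes are acknowledged once they land in RAM, reads come from RAM."""
    def __init__(self):
        self._ram = {}

    def set(self, key, doc):
        self._ram[key] = doc          # "OK, I stored document A"
        return "OK"

    def get(self, key):
        return self._ram.get(key)     # "Here is document A"

client = ToyClient()
client.set("A", {"body": "document A"})
print(client.get("A"))
```

The real library adds the cluster map, replication, and disk persistence behind this same simple surface.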
Flow of data when writing

Application servers write to Couchbase over the network. On each server the write lands in RAM, then flows into two queues: the replication queue (Couchbase transmitting replicas to other nodes) and the disk write queue (Couchbase writing to disk).
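This write path can be modeled with two queues. A sketch, with the queue names taken from the slide and the explicit draining step invented for illustration (the real server drains them asynchronously):

```python
from collections import deque

class WritePath:
    """Model of the write flow: a set lands in RAM, is acknowledged,
    and is then drained to the replication and disk write queues."""
    def __init__(self):
        self.ram = {}
        self.replication_queue = deque()   # -> replica servers
        self.disk_write_queue = deque()    # -> local disk
        self.disk = {}

    def set(self, key, doc):
        self.ram[key] = doc
        self.replication_queue.append((key, doc))
        self.disk_write_queue.append((key, doc))
        return "OK"                        # acknowledged before disk persistence

    def drain_disk_queue(self):
        # Stand-in for the asynchronous flusher writing queued items to disk.
        while self.disk_write_queue:
            key, doc = self.disk_write_queue.popleft()
            self.disk[key] = doc

path = WritePath()
path.set("doc_a", "A")
print("on disk before drain:", "doc_a" in path.disk)
path.drain_disk_queue()
print("on disk after drain:", "doc_a" in path.disk)
```

The point the slide makes is visible in the model: the acknowledgement returns as soon as the document is in RAM, while replication and disk persistence happen behind the queues.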