Navigating the Transition from relational to NoSQL - CloudCon Expo 2012

1

Navigating the Transition from Relational to NoSQL Technology

Dipti Borkar, Director, Product Management

2

WHY TRANSITION TO NOSQL?

3

Two big drivers for NoSQL adoption

•  Lack of flexibility / rigid schemas: 49%
•  Inability to scale out data: 35%
•  Performance challenges: 29%
•  Cost: 16%
•  All of these: 12%
•  Other: 11%

Source: Couchbase Survey, December 2011, n = 1351.

4

NoSQL catalog

(Figure: products grouped by data model, with rows separating Cache (memory only) from Database (memory/disk).)

•  Key-Value: memcached, membase
•  Data Structure: redis
•  Document: mongoDB, couchbase
•  Column: cassandra
•  Graph: Neo4j

5

DISTRIBUTED DOCUMENT DATABASES

6

Document Databases

•  Each record in the database is a self-describing document
•  Each document has an independent structure
•  Documents can be complex
•  All databases require a unique key
•  Documents are stored using JSON or XML or their derivatives
•  Content can be indexed and queried
•  Offer auto-sharding for scaling and replication for high-availability

{
  “UUID”: “21f7f8de-8051-5b89-86
  “Time”: “2011-04-01T13:01:02.42
  “Server”: “A2223E”,
  “Calling Server”: “A2213W”,
  “Type”: “E100”,
  “Initiating User”: “[email protected]”,
  “Details”: {
    “IP”: “10.1.1.22”,
    “API”: “InsertDVDQueueItem”,
    “Trace”: “cleansed”,
    “Tags”: [“SERVER”, “US-West”, “API”]
  }
}

7

COMPARING DATA MODELS

8

http://www.geneontology.org/images/diag-godb-er.jpg

9

Relational vs Document data model

Relational data model: Highly-structured table organization with rigidly-defined data formats and record structure.

Document data model: Collection of complex documents with arbitrary, nested data formats and varying “record” format.

(Figure: a table with columns C1 to C4 next to a set of JSON documents.)

10

Example: User Profile

User Info

KEY  First  ZIP_id  Last
1    Dipti  2       Borkar
2    Joe    2       Smith
3    Ali    2       Dodson
4    John   3       Doe

Address Info

ZIP_id  CITY  ZIP    STATE
1       DEN   30303  CO
2       MV    94040  CA
3       CHI   60609  IL
4       NY    10010  NY

Lookup for user 1: ZIP_id 2 resolves to MV, 94040, CA.

To get information about a specific user, you perform a join across two tables.
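The join described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names `user_info` and `address_info`, the column name `user_id` (standing in for KEY), and the helper `get_user_profile` are assumptions, not names from the talk.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_info (zip_id INTEGER PRIMARY KEY, city TEXT, zip TEXT, state TEXT);
    CREATE TABLE user_info (user_id INTEGER PRIMARY KEY, first TEXT, last TEXT,
                            zip_id INTEGER REFERENCES address_info(zip_id));
    INSERT INTO address_info VALUES (1, 'DEN', '30303', 'CO');
    INSERT INTO address_info VALUES (2, 'MV', '94040', 'CA');
    INSERT INTO address_info VALUES (3, 'CHI', '60609', 'IL');
    INSERT INTO address_info VALUES (4, 'NY', '10010', 'NY');
    INSERT INTO user_info VALUES (1, 'Dipti', 'Borkar', 2);
    INSERT INTO user_info VALUES (2, 'Joe', 'Smith', 2);
    INSERT INTO user_info VALUES (3, 'Ali', 'Dodson', 2);
    INSERT INTO user_info VALUES (4, 'John', 'Doe', 3);
""")


def get_user_profile(user_id):
    """Return (first, last, city, zip, state) for one user via a two-table join."""
    return conn.execute(
        """SELECT u.first, u.last, a.city, a.zip, a.state
           FROM user_info AS u
           JOIN address_info AS a ON a.zip_id = u.zip_id
           WHERE u.user_id = ?""",
        (user_id,),
    ).fetchone()


profile = get_user_profile(1)
```

Every profile read pays for the join, which is the cost the single-document model avoids.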

11

Document Example: User Profile

All data in a single document

{ “ID”: 1, “FIRST”: “Dipti”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” }

(Figure: the JSON document equals the user row plus the address row.)

12

Making a Change Using RDBMS

User Table

User ID  First  Last    Zip    Country ID
1        Dipti  Borkar  94040  001
2        Joe    Smith   94040  001
3        Ali    Dodson  94040  001
4        Sarah  Gorin   NW1    002
5        Bob    Young   30303  001
6        Nancy  Baker   10010  001
7        Ray    Jones   31311  001
8        Lee    Chen    V5V3M  008
. . .
50000    Doug   Moore   04252  001
50001    Mary   White   SW195  002
50002    Lisa   Clark   12425  001

Country Table

Country ID  Country name
001         USA
002         UK
003         Argentina
004         Australia
005         Aruba
006         Austria
007         Brazil
008         Canada
009         Chile
. . .
130         Portugal
131         Romania
132         Russia
133         Spain
134         Sweden

Photo Table

User ID  Photo ID  Comment  Country ID
2        d043      NYC      001
2        b054      Bday     007
5        c036      Miami    001
7        d072      Sunset   133
5002     e086      Spain    133

Status Table

User ID  Status ID  Text     Country ID
1        a42        At conf  134
4        b26        excited  007
5        c32        hockey   008
12       d83        Go A’s   001
5000     e34        sailing  005

Affiliations Table

User ID  Affl ID  Affl Name  Country ID
2        a42      Cal        001
4        b96      USC        001
7        c14      UW         001
8        e22      Oxford     002

13

Making the Same Change with a Document Database

{
  “ID”: 1,
  “FIRST”: “Dipti”,
  “LAST”: “Borkar”,
  “ZIP”: “94040”,
  “CITY”: “MV”,
  “STATE”: “CA”,
  “STATUS”: {
    “TEXT”: “At Conf”,
    “GEO_LOC”: “134”
  },
  “COUNTRY”: “USA”
}

Just add information to a document
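As a rough sketch of "just add information to a document", the snippet below extends the profile with the status and country fields using a plain Python dict. The helper name `add_status_and_country` is hypothetical, and writing the result back to a real database is left out.

```python
import json

# The original profile document from the slide.
doc = {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
       "ZIP": "94040", "CITY": "MV", "STATE": "CA"}


def add_status_and_country(profile, text, geo_loc, country):
    """Return a copy of the document with the new fields; no schema change is involved."""
    updated = dict(profile)
    updated["STATUS"] = {"TEXT": text, "GEO_LOC": geo_loc}
    updated["COUNTRY"] = country
    return updated


updated_doc = add_status_and_country(doc, "At Conf", "134", "USA")
# Serialized form that an application would store back under the same key.
serialized = json.dumps(updated_doc)
```

Other documents in the same bucket are untouched, so documents with and without the new fields can coexist.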

14

Document modeling

When considering how to model data for a given application
•  Think of a logical container for the data
•  Think of how data groups together

Questions to ask
•  Are these separate objects in the model layer?
•  Are these objects accessed together?
•  Do you need updates to these objects to be atomic?
•  Are multiple people editing these objects concurrently?

15

Document Design Options

•  One document that contains all related data
   –  Data is de-normalized
   –  Better performance and scale
   –  Eliminate client-side joins

•  Separate documents for different object types with cross references
   –  Data duplication is reduced
   –  Objects may not be co-located
   –  Transactions supported only on a document boundary
   –  Most document databases do not support joins

16

Document ID / Key selection

•  Similar to primary keys in relational databases
•  Documents are sharded based on the document ID
•  ID based document lookup is extremely fast
•  Usually an ID can only appear once in a bucket

Options
•  UUIDs, date-based IDs, numeric IDs
•  Hand-crafted (human readable)
•  Matching prefixes (for multiple related objects)

Questions to ask
•  Do you have a unique way of referencing objects?
•  Are related objects stored in separate documents?
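The key options listed above might look like this in Python. All four helper names and the `::` separator are assumptions made for illustration; only the hand-crafted key format follows the blog example later in the deck.

```python
import uuid
from datetime import datetime


def uuid_key():
    """Random UUID: unique without coordination, but not human readable."""
    return str(uuid.uuid4())


def date_based_key(prefix, when, seq):
    """Date-based ID built from a prefix, a date, and a zero-padded sequence number."""
    return f"{prefix}::{when:%Y-%m-%d}::{seq:06d}"


def handcrafted_key(author, title):
    """Human-readable key built from natural attributes of the object."""
    return f"{author}_{title.replace(' ', '_')}"


def related_key(parent_key, kind, n):
    """Key that shares the parent's key as a prefix, so related docs can be derived."""
    return f"{parent_key}::{kind}::{n}"


post_key = handcrafted_key("jchris", "Hello World")
comment_key = related_key(post_key, "comment", 1)
log_key = date_based_key("log", datetime(2011, 4, 1), 42)
```

With matching prefixes, an application that knows the post key can compute the keys of related documents without a query.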

17

Example: Entities for a Blog

•  User profile: the main pointer into the user data
   –  Blog entries
   –  Badge settings, like a Twitter badge

•  Blog posts: contains the blogs themselves

•  Blog comments: comments from other users

18




Blog Document – Option 1 – Single document

{
  “_id”: “jchris_Hello_World”,
  “author”: “jchris”,
  “type”: “post”,
  “title”: “Hello World”,
  “format”: “markdown”,
  “body”: “Hello from [Couchbase](http://couchbase.com).”,
  “html”: “<p>Hello from <a href=\“http: …
  “comments”: [
    { “format”: “markdown”, “body”: “Awesome post!” },
    { “format”: “markdown”, “body”: “Like it.” }
  ]
}

19

Blog Document – Option 2 - Split into multiple docs

BLOG DOC

{
  “_id”: “jchris_Hello_World”,
  “author”: “jchris”,
  “type”: “post”,
  “title”: “Hello World”,
  “format”: “markdown”,
  “body”: “Hello from [Couchbase](http://couchbase.com).”,
  “html”: “<p>Hello from <a href=\“http: …
  “comments”: [
    “comment1_jchris_Hello_World”
  ]
}

COMMENT

{
  “_id”: “comment1_jchris_Hello_World”,
  “format”: “markdown”,
  “body”: “Awesome post!”
}
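With the split layout, the application resolves the comment references itself. The sketch below uses a plain dict as a stand-in for the database's get-by-key interface; `store` and `load_post_with_comments` are hypothetical names, and the documents are abbreviated versions of the slide's examples.

```python
# A plain dict stands in for the key-value interface of a document database.
store = {
    "jchris_Hello_World": {
        "_id": "jchris_Hello_World",
        "author": "jchris",
        "type": "post",
        "title": "Hello World",
        "format": "markdown",
        "comments": ["comment1_jchris_Hello_World"],
    },
    "comment1_jchris_Hello_World": {
        "_id": "comment1_jchris_Hello_World",
        "format": "markdown",
        "body": "Awesome post!",
    },
}


def load_post_with_comments(kv, post_id):
    """Fetch the post, then fetch each referenced comment by key (a client-side join)."""
    post = dict(kv[post_id])  # copy so the stored document is not modified
    comment_ids = post.get("comments", [])
    post["comments"] = [kv[cid] for cid in comment_ids if cid in kv]
    return post


post = load_post_with_comments(store, "jchris_Hello_World")
```

A page that shows only the post can skip the second round of lookups entirely, which is the trade-off against the single-document option.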

20

Threaded Comments

•  You can imagine how to take this to a threaded list

(Figure: a Blog document holds a list of comments; the first comment holds its own list, which points to a reply to that comment and to more comments.)

Advantages
•  Only fetch the data when you need it
   –  For example, rendering part of a web page
•  Spread the data and load across the entire cluster

21

COMPARING SCALING MODEL

22

Relational Technology Scales Up

•  Web/App Server Tier: Application Scales Out. Just add more commodity web servers.
•  Relational Database: RDBMS Scales Up. Get a bigger, more complex server.
•  Expensive and disruptive sharding, doesn’t perform at web scale
•  Won’t scale beyond this point

(Figure: System Cost and Application Performance plotted against Users for each tier.)

23

Couchbase Server Scales Out Like App Tier

•  Web/App Server Tier: Application Scales Out. Just add more commodity web servers.
•  Couchbase Distributed Data Store: NoSQL Database Scales Out. Cost and performance mirrors app tier.
•  Scaling out flattens the cost and performance curves

(Figure: System Cost and Application Performance plotted against Users for each tier.)

24

EVALUATING NOSQL

25

The Process – From Evaluation to Go Live

1. Analyze your requirements
2. Find solutions / products that match key requirements
3. Execute a proof of concept / performance evaluation
4. Begin development of application
5. Deploy in staging and then production

No different from evaluating a relational database

New requirements → New solutions

26

1. Analyze your requirements

Common application requirements

•  Rapid application development
   –  Changing market needs
   –  Changing data needs
•  Scalability
   –  Unknown user demand
   –  Constantly growing throughput
•  Consistent Performance
   –  Low response time for better user experience
   –  High throughput to handle viral growth
•  Reliability
   –  Always online

27

2. Find solutions that match key requirements

NoSQL
•  Linear Scalability
•  Schema flexibility
•  High Performance

RDBMS
•  Multi-document transactions
•  Database Rollback
•  Complex security needs
•  Complex joins
•  Extreme compression needs

RDBMS and NoSQL
•  Both / depends on the data

28

3. Proof of concept / Performance evaluation

Prototype a workload
•  Look for consistent performance…
   –  Low response times / latency
      •  For better user experience
   –  High throughput
      •  To handle viral growth
      •  For resource efficiency
•  … across
   –  Read heavy / Write heavy / Mixed workloads
   –  Clusters of growing sizes
•  … and watch for
   –  Contention / heavy locking
   –  Linear scalability
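A prototype workload along these lines could start from a small timing harness like the one below. It is a sketch under stated assumptions: `run_workload` and its `reads_per_10` parameter are made up for illustration, and a Python dict stands in for the database client, so the numbers it produces say nothing about any real product.

```python
import statistics
import time


def run_workload(get, put, keys, reads_per_10=5):
    """Issue a fixed read/write mix against the given callables and time each operation."""
    latencies = []
    start = time.perf_counter()
    for i, key in enumerate(keys):
        op_start = time.perf_counter()
        # Deterministic mix: the first `reads_per_10` of every 10 operations are reads.
        if i % 10 < reads_per_10:
            get(key)
        else:
            put(key, {"n": i})
        latencies.append(time.perf_counter() - op_start)
    elapsed = max(time.perf_counter() - start, 1e-9)
    latencies.sort()
    return {
        "ops_per_sec": len(keys) / elapsed,
        "p50_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))] * 1000,
    }


# Stand-in store: swap get/put for real client calls during a proof of concept.
store = {}
keys = [f"k{i}" for i in range(1000)]
report = run_workload(store.get, store.__setitem__, keys, reads_per_10=8)
```

Running the same harness with read-heavy, write-heavy, and mixed ratios, and again as the cluster grows, gives the comparison the slide asks for: watch whether p99 latency stays flat and throughput grows with node count.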

29

3. Other considerations

Accessing data
   –  No standards exist yet
   –  Typically via SDKs or over HTTP
   –  Check if the programming language of your choice is supported

Consistency
   –  Consistent only at the document level
   –  Most document stores currently don’t support multi-document transactions
   –  Analyze your application needs

Availability
   –  Each node stores active and replica data (Couchbase)
   –  Each node is either a master or slave (MongoDB)

30

3. Other considerations

Operations
   –  Monitoring the system
   –  Backup and restore the system
   –  Upgrades and maintenance
   –  Support

Ease of Scaling
   –  Ease of adding and reducing capacity
   –  Single node type
   –  App availability on topology changes

Indexing and Querying
   –  Secondary indexes (Map functions)
   –  Aggregates, Grouping (Reduce functions)
   –  Basic querying
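To make the map and reduce roles concrete, here is a minimal in-memory sketch in Python. It does not use any database's view API; `map_users_by_state`, `build_index`, and `reduce_grouped` are hypothetical names, and the sample documents are invented.

```python
from collections import defaultdict

docs = [
    {"_id": "u1", "type": "user", "state": "CA"},
    {"_id": "u2", "type": "user", "state": "CA"},
    {"_id": "u3", "type": "user", "state": "NY"},
    {"_id": "p1", "type": "post", "title": "Hello World"},
]


def map_users_by_state(doc):
    """Map function: emit (key, value) pairs for documents that belong in the index."""
    if doc.get("type") == "user":
        yield doc["state"], 1


def build_index(documents, map_fn):
    """Secondary index: (key, doc_id, value) rows sorted by key, then doc ID."""
    rows = [(key, doc["_id"], value) for doc in documents for key, value in map_fn(doc)]
    return sorted(rows)


def reduce_grouped(index, reduce_fn):
    """Reduce function applied per key: groups the emitted values and aggregates them."""
    groups = defaultdict(list)
    for key, _doc_id, value in index:
        groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}


index = build_index(docs, map_users_by_state)
counts = reduce_grouped(index, sum)
```

The sorted index answers lookups by state, and the reduce step answers "how many users per state" without scanning the documents again.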

31

4. Begin development

Data Modeling and Document Design

32

5. Deploying to staging and production

•  Monitoring the system
   –  RESTful interfaces / Easy integration with monitoring tools
•  High-availability
   –  Replication
   –  Failover and Auto-failover
•  Always Online – even for maintenance tasks
   –  Database upgrades
   –  Software (OS) and Hardware upgrades
   –  Backup and restore
   –  Index building
   –  Compaction

33

Couchbase Server Admin Console

34

35

So are you being impacted by these?

Schema Rigidity problems
•  Do you store serialized objects in the database?
•  Do you have lots of sparse tables with very few columns being used by most rows?
•  Do you find that your application developers require schema changes frequently due to constantly changing data?
•  Are you using your database as a key-value store?

Scalability problems
•  Do you periodically need to upgrade systems to more powerful servers and scale up?
•  Are you reaching the read / write throughput limit of a single database server?
•  Is your server’s read / write latency not meeting your SLA?
•  Is your user base growing at a frightening pace?

36

Is NoSQL the right choice for you?

Does your application need rich database functionality?
•  Multi-document transactions
•  Complex security needs – user roles, document level security, authentication, authorization integration
•  Complex joins across bucket / collections
•  BI integration
•  Extreme compression needs

If so, NoSQL may not be the right choice for your application

37

WHERE IS NOSQL A GOOD FIT?

38

Performance driven use cases

•  Low latency
•  High throughput matters
•  Large number of users
•  Unknown demand with sudden growth of users/data
•  Predominantly direct document access
•  Workloads with very high mutation rate per document (temporal locality)
•  Working set with heavy writes

39

Data driven use cases

•  Support for unlimited data growth
•  Data with non-homogenous structure
•  Need to quickly and often change data structure
•  3rd party or user defined structure
•  Variable length documents
•  Sparse data records
•  Hierarchical data

40

BRIEF OVERVIEW COUCHBASE SERVER

41

Couchbase Server 2.0

NoSQL Distributed Document Database for interactive web applications

42

Couchbase Server

•  Easy Scalability: Grow cluster without application changes, without downtime, with a single click
•  Consistent, High Performance: Consistent sub-millisecond read and write response times with consistent high throughput
•  Always On 24x7x365: No downtime for software upgrades, hardware maintenance, etc.

43

Flexible Data Model

•  No need to worry about the database when changing your application
•  Records can have different structures; there is no fixed schema
•  Allows painless data model changes for rapid application development

{ “ID”: 1, “FIRST”: “Dipti”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” }

44

COUCHBASE SERVER ARCHITECTURE

45

Couchbase Server 2.0 Architecture

(Figure: a node split into a Data Manager and a Cluster Manager. Some cluster manager components run on each node, others run once per cluster.)

Data Manager
•  8092 Query API, Query Engine
•  11210 Memcapable 2.0
•  Moxi
•  Memcached
•  Couchbase EP Engine
•  Storage interface
•  New Persistence Layer

Cluster Manager (Erlang/OTP)
•  HTTP REST management API / Web UI (HTTP 8091)
•  Heartbeat
•  Process monitor
•  Global singleton supervisor
•  Configuration manager
•  Rebalance orchestrator
•  Node health monitor
•  vBucket state and replication manager
•  Erlang port mapper (4369)
•  Distributed Erlang (21100 - 21199)

46


47

Couchbase deployment

Data Flow

Cluster Management

Web Application

Couchbase Client Library

48

Single node - Couchbase Write Operation

(Figure: an App Server writes Doc 1 to a Couchbase Server Node. Doc 1 appears in the Managed Cache, in the Replication Queue that leads to another node, and in the Disk Queue that leads to Disk.)
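The write path in the figure can be modeled in a few lines of Python. This is a toy model, not Couchbase code: the `Node` class and its method names are hypothetical, and the assumption that a write is acknowledged once the document is in the managed cache is stated here for the sketch rather than taken from the slide.

```python
from collections import deque


class Node:
    """Toy model of one node: managed cache, disk queue, replication queue."""

    def __init__(self):
        self.cache = {}                   # managed cache (RAM)
        self.disk = {}                    # persisted documents
        self.disk_queue = deque()         # keys waiting to be persisted
        self.replication_queue = deque()  # keys waiting to be sent to another node

    def write(self, key, doc):
        """Store the doc in the cache and queue it for disk and replication."""
        self.cache[key] = doc
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        return "ok"  # assumption: acknowledged before persistence completes

    def flush_disk_queue(self):
        """Persist queued keys, always writing the latest cached version."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def drain_replication_queue(self, replica):
        """Copy queued keys into another node's cache."""
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.cache[key] = self.cache[key]


active, replica = Node(), Node()
status = active.write("doc1", {"v": 1})
in_cache_before_flush = "doc1" in active.cache and "doc1" not in active.disk
active.flush_disk_queue()
active.drain_replication_queue(replica)
```

An update follows the same path: the new version replaces the cached one, and the queues carry it to disk and to the other node.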

49

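Single node - Couchbase Update Operation

(Figure: the App Server sends an updated document, Doc 1’. Doc 1’ is shown alongside the earlier Doc 1 in the Managed Cache, in the Replication Queue that leads to another node, and in the Disk Queue that leads to Disk.)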

50

Single node - Couchbase Read Operation

(Figure: the App Server issues GET Doc 1 and receives Doc 1 from the Managed Cache. The Replication Queue, Disk Queue, and Disk are also shown.)

51

Single node - Couchbase Cache Eviction

(Figure: the App Server writes Docs 2 to 6 into the Managed Cache. Doc 1, which is already on Disk, is evicted from the cache.)

52

Single node – Couchbase Cache Miss

(Figure: the App Server issues GET Doc 1. The Managed Cache holds Docs 2 to 5 but not Doc 1, so Doc 1 is read from Disk into the cache and returned to the App Server.)
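Eviction and cache misses can be illustrated with a small least-recently-used cache in front of a dict that plays the role of disk. LRU is a stand-in chosen for brevity, not a claim about the eviction policy Couchbase Server uses, and `ManagedCache` is a hypothetical class name.

```python
from collections import OrderedDict


class ManagedCache:
    """Tiny LRU cache in front of a dict that plays the role of disk."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk
        self.ram = OrderedDict()
        self.misses = 0

    def put(self, key, doc):
        self.disk[key] = doc  # simplification: persist immediately, no disk queue
        self._cache(key, doc)

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as recently used
            return self.ram[key]
        self.misses += 1               # cache miss: fall back to disk
        doc = self.disk[key]
        self._cache(key, doc)
        return doc

    def _cache(self, key, doc):
        self.ram[key] = doc
        self.ram.move_to_end(key)
        while len(self.ram) > self.capacity:
            self.ram.popitem(last=False)  # evict the least recently used doc


cache = ManagedCache(capacity=5, disk={})
for n in range(1, 7):
    cache.put(f"doc{n}", {"n": n})
evicted_before_read = "doc1" not in cache.ram
doc1 = cache.get("doc1")
```

Only documents that are safely on disk can be dropped from RAM, which is why the sketch persists before caching.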

53

Cluster wide - Basic Operation

•  Docs distributed evenly across servers
•  Each server stores both active and replica docs
   –  Only one server active at a time
•  Client library provides app with simple interface to database
•  Cluster map provides map to which server doc is on
   –  App never needs to know
•  App reads, writes, updates docs
•  Multiple app servers can access same document at same time

(Figure: COUCHBASE SERVER CLUSTER with Servers 1 to 3, each holding ACTIVE and REPLICA docs; User Configured Replica Count = 1. App Servers 1 and 2 each use the Couchbase Client Library and its CLUSTER MAP to send READ/WRITE/UPDATE requests.)
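A cluster map of this kind can be sketched as a hash from document ID to partition, plus a table from partition to servers. The partition count, the CRC32-modulo hash, and the round-robin placement below are simplifying assumptions; the actual client hashing and placement logic may differ in detail.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key):
    """Hash a document ID to a partition (vBucket) number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers, replicas=1):
    """Give each vBucket an active server and `replicas` distinct replica servers."""
    cluster_map = {}
    for vb in range(NUM_VBUCKETS):
        first = vb % len(servers)
        # chain[0] holds the active copy; the remaining entries hold replicas.
        cluster_map[vb] = [servers[(first + i) % len(servers)]
                           for i in range(replicas + 1)]
    return cluster_map


def server_for(cluster_map, key):
    """Return the server holding the active copy of the document."""
    return cluster_map[vbucket_for(key)][0]


servers = ["server1", "server2", "server3"]
cluster_map = build_cluster_map(servers, replicas=1)
target = server_for(cluster_map, "doc5")
```

Because the client library holds the map, the application asks for a key and never needs to know which server answers.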

54

Cluster wide - Add Nodes to Cluster

•  Two servers added
   –  One-click operation
•  Docs automatically rebalanced across cluster
   –  Even distribution of docs
   –  Minimum doc movement
•  Cluster map updated
•  App database calls now distributed over larger number of servers

(Figure: Servers 4 and 5 join Servers 1 to 3. ACTIVE and REPLICA docs are spread over all five servers, and App Servers 1 and 2 send READ/WRITE/UPDATE requests across the larger cluster.)
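One way to get an even distribution with minimum movement is to move a partition only when its current owner holds more than its fair share. The function below is a sketch of that idea over active copies only; `rebalance` is a hypothetical helper, not the algorithm used by the rebalance orchestrator.

```python
from collections import Counter


def rebalance(active_map, servers):
    """Return a new vBucket -> server map spread evenly over `servers`,
    moving a vBucket only when its owner is gone or holds more than its share."""
    new_map = dict(active_map)
    base, extra = divmod(len(new_map), len(servers))
    # Target size per server: the first `extra` servers take one more vBucket.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    load = Counter({s: 0 for s in servers})
    load.update(owner for owner in new_map.values() if owner in target)
    # One free slot per vBucket that an under-loaded server still needs.
    free_slots = [s for s in servers for _ in range(max(0, target[s] - load[s]))]
    for vb in sorted(new_map):
        owner = new_map[vb]
        if owner not in target or load[owner] > target[owner]:
            if owner in target:
                load[owner] -= 1
            new_owner = free_slots.pop()
            new_map[vb] = new_owner
            load[new_owner] += 1
    return new_map


old_map = {vb: f"server{vb % 3 + 1}" for vb in range(1024)}
new_map = rebalance(old_map, [f"server{n}" for n in range(1, 6)])
moved = sum(1 for vb in old_map if old_map[vb] != new_map[vb])
```

Going from three to five servers in this example moves 409 of the 1024 partitions, which is exactly the share the two new servers need; nothing is shuffled between the original three.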



55

Cluster wide - Fail Over Node

•  App servers accessing docs
•  Requests to Server 3 fail
•  Cluster detects server failed
   –  Promotes replicas of docs to active
   –  Updates cluster map
•  Requests for docs now go to appropriate server
•  Typically rebalance would follow

(Figure: Servers 1 to 5 with ACTIVE and REPLICA docs. Server 3 fails, and replicas of its docs on the remaining servers become active.)
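Replica promotion can be sketched as an edit to the cluster map: drop the failed server from every partition's server list and let the next entry become active. The map layout (`[active, replica, ...]` per vBucket) and the `fail_over` helper are assumptions for illustration.

```python
def fail_over(cluster_map, failed):
    """Promote replicas for vBuckets whose active copy lived on `failed`.

    cluster_map maps vBucket -> [active, replica, ...]; a new map is returned.
    """
    new_map = {}
    for vb, chain in cluster_map.items():
        survivors = [server for server in chain if server != failed]
        if not survivors:
            raise RuntimeError(f"vBucket {vb} has no surviving copy")
        new_map[vb] = survivors  # survivors[0] is now the active copy
    return new_map


cluster_map = {
    0: ["server1", "server2"],
    1: ["server2", "server3"],
    2: ["server3", "server1"],
}
after_failover = fail_over(cluster_map, "server3")
```

After failover some partitions have no replica left, which is why a rebalance would typically follow to restore the configured replica count.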



56


Indexing and Querying

•  Indexing work is distributed amongst nodes
•  Large data set possible
•  Parallelize the effort
•  Each node has index for data stored on it
•  Queries combine the results from required nodes

(Figure: App Servers 1 and 2 send a Query to Servers 1 to 3, each of which holds ACTIVE and REPLICA docs.)
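The "each node has its own index, queries combine results" pattern is a scatter-gather. The sketch below runs a key-range query against per-node sorted indexes and merges the partial results; the index contents and function names are invented for illustration.

```python
import heapq

# Each node holds an index, sorted by key, over only the documents it stores.
node_indexes = {
    "server1": [("CA", "doc2"), ("NY", "doc5")],
    "server2": [("CA", "doc4"), ("IL", "doc7")],
    "server3": [("CO", "doc1"), ("NY", "doc8")],
}


def query_node(index, start_key, end_key):
    """Range scan of one node's local index (inclusive bounds)."""
    return [row for row in index if start_key <= row[0] <= end_key]


def scatter_gather(indexes, start_key, end_key):
    """Run the range query on every node and merge the sorted partial results."""
    partials = [query_node(index, start_key, end_key) for index in indexes.values()]
    return list(heapq.merge(*partials))


rows = scatter_gather(node_indexes, "CA", "IL")
```

Because each partial result is already sorted, the merge step is cheap, and index building is spread across the nodes that own the data.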

57

Cross Data Center Replication (XDCR)

(Figure: two Couchbase Server clusters, one in the NY data center and one in the SF data center. Each cluster has Servers 1 to 3, and each server holds ACTIVE docs in RAM and on DISK.)

58

THANK YOU

[email protected] @DBORKAR

59

60
