1 Naviga&ng the Transi&on from Rela&onal to NoSQL Technology Dip& Borkar Director, Product Management
1
Naviga&ng the Transi&on from Rela&onal to NoSQL Technology
Dip& Borkar Director, Product Management
2
WHY TRANSITION TO NOSQL?
3
Two big drivers for NoSQL adop&on
Lack of flexibility/ rigid schemas
Inability to scale out data
Performance challenges
Cost All of these Other
49%
35%
29%
16% 12% 11%
Source: Couchbase Survey, December 2011, n = 1351.
4
NoSQL catalog
Key-‐Value
memcached
membase
redis
Data Structure Document Column Graph
mongoDB
couchbase cassandra
Cache
(mem
ory on
ly)
Database
(mem
ory/disk)
Neo4j
5
DISTRIBUTED DOCUMENT DATABASES
6
Document Databases
• Each record in the database is a self-‐describing document
• Each document has an independent structure
• Documents can be complex • All databases require a unique key • Documents are stored using JSON or XML or their deriva&ves
• Content can be indexed and queried • Offer auto-‐sharding for scaling and replica&on for high-‐availability
{ “UUID”: “21f7f8de-‐8051-‐5b89-‐86“Time”: “2011-‐04-‐01T13:01:02.42“Server”: “A2223E”,“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[“SERVER”, “US-‐West”, “API”]
}}
7
COMPARING DATA MODELS
8 h]p://www.geneontology.org/images/diag-‐godb-‐er.jpg
9
Rela&onal vs Document data model
Rela&onal data model Document data model Collec&on of complex documents with arbitrary, nested data formats and
varying “record” format.
Highly-‐structured table organiza&on with rigidly-‐defined data formats and
record structure.
JSON JSON
JSON
C1 C2 C3 C4
{ }
10
Example: User Profile
Address Info
1 DEN 30303 CO
2 MV 94040 CA
3 CHI 60609 IL
User Info
KEY First ZIP_id Last
4 NY 10010 NY
1 Dip& 2 Borkar
2 Joe
2 Smith
3 Ali 2 Dodson
4 John 3 Doe
ZIP_id CITY ZIP STATE
1 2
2 MV 94040 CA
To get informa&on about specific user, you perform a join across two tables
11
All data in a single document
Document Example: User Profile
{ “ID”: 1, “FIRST”: “Dip&”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” }
JSON
= +
12
User ID First Last Zip
1 Dip& Borkar 94040
2 Joe Smith 94040
3 Ali Dodson 94040
4 Sarah Gorin NW1
5 Bob Young 30303
6 Nancy Baker 10010
7 Ray Jones 31311
8 Lee Chen V5V3M
• • •
50000 Doug Moore 04252
50001 Mary White SW195
50002 Lisa Clark 12425
Country ID
TEL3
001
Country ID Country name
001 USA
002 UK
003 Argen&na
004 Australia
005 Aruba
006 Austria
007 Brazil
008 Canada
009 Chile
• • •
130 Portugal
131 Romania
132 Russia
133 Spain
134 Sweden
User ID Photo ID Comment
2 d043 NYC
2 b054 Bday
5 c036 Miami
7 d072 Sunset
5002 e086 Spain
Photo Table
001
007
001
133
133
User ID Status ID Text
1 a42 At conf
4 b26 excited
5 c32 hockey
12 d83 Go A’s
5000 e34 sailing
Status Table
134
007
008
001
005
Country Table
User ID Affl ID Affl Name
2 a42 Cal
4 b96 USC
7 c14 UW
8 e22 Oxford
Affilia&ons Table Country
ID
001
001
001
002
Country ID
Country ID
001
001
002
001
001
001
008
001
002
001
User Table
. . .
Making a Change Using RDBMS
13
Making the Same Change with a Document Database
{ “ID”: 1, “FIRST”: “Dip&”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA”, “STATUS”: { “TEXT”: “At Conf”
}
}
“GEO_LOC”: “134” }, “COUNTRY”: ”USA”
Just add informa&on to a document
JSON
, }
14
When considering how to model data for a given applica&on • Think of a logical container for the data • Think of how data groups together
Document modeling
Q • Are these separate object in the model layer? • Are these objects accessed together? • Do you need updates to these objects to be atomic? • Are mul&ple people edi&ng these objects concurrently?
15
Document Design Op&ons
• One document that contains all related data – Data is de-‐normalized – Be]er performance and scale – Eliminate client-‐side joins
• Separate documents for different object types with cross references – Data duplica&on is reduced – Objects may not be co-‐located – Transac&ons supported only on a document boundary – Most document databases do not support joins
16
Document ID / Key selec&on
• Similar to primary keys in rela&onal databases • Documents are sharded based on the document ID • ID based document lookup is extremely fast • Usually an ID can only appear once in a bucket
Op&ons • UUIDs, date-‐based IDs, numeric IDs • Hand-‐crajed (human readable) • Matching prefixes (for mul&ple related objects)
Q • Do you have a unique way of referencing objects? • Are related objects stored in separate documents?
17
• User profile The main pointer into the user data • Blog entries • Badge sekngs, like a twi]er badge
• Blog posts Contains the blogs themselves
• Blog comments • Comments from other users
Example: En&&es for a Blog BLOG
18
{ “UUID”: “21f7f8de-‐8051-‐5b89-‐86“Time”: “2011-‐04-‐01T13:01:02.42“Server”: “A2223E”,“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[“SERVER”, “US-‐West”, “API”]
}}
Blog Document – Op&on 1 – Single document
{ !“_id”: “jchris_Hello_World”,!“author”: “jchris”, !“type”: “post”!“title”: “Hello World”,!“format”: “markdown”, !“body”: “Hello from [Couchbase](http://couchbase.com).”, !“html”: “<p>Hello from <a href=\“http: …!“comments”:[ ! [“format”: “markdown”, “body”:”Awesome post!”],! [“format”: “markdown”, “body”:”Like it.” ]! ]!}
19
Blog Document – Op&on 2 -‐ Split into mul&ple docs
{ “UUID”: “21f7f8de-‐8051-‐5b89-‐86“Time”: “2011-‐04-‐01T13:01:02.42“Server”: “A2223E”,“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[“SERVER”, “US-‐West”, “API”]
}}
{ !“_id”: “jchris_Hello_World”,!“author”: “jchris”, !“type”: “post”!“title”: “Hello World”,!“format”: “markdown”, !“body”: “Hello from [Couchbase](http://couchbase.com).”, !“html”: “<p>Hello from <a href=\“http: …!“comments”:[!
! “comment1_jchris_Hello_world”!! ]!
}!{ “UUID”: “21f7f8de-‐8051-‐5b89-‐86“Time”: “2011-‐04-‐01T13:01:02.42“Server”: “A2223E”,“Calling Server”: “A2213W”,“Type”: “E100”,“Initiating User”: “[email protected]”,“Details”:
{“IP”: “10.1.1.22”,“API”: “InsertDVDQueueItem”,“Trace”: “cleansed”,“Tags”:
[“SERVER”, “US-‐West”, “API”]
}}
{!“_id”: “comment1_jchris_Hello_World”,!“format”: “markdown”, !“body”:”Awesome post!” !}
BLOG DOC
COMMENT
20
• You can imagine how to take this to a threaded list
Threaded Comments
Blog First comment
Reply to comment
More Comments
List
List
Advantages • Only fetch the data when you need it • For example, rendering part of a web page
• Spread the data and load across the en&re cluster
21
COMPARING SCALING MODEL
22
RDBMS Scales Up Get a bigger, more complex server
Users
Applica&on Scales Out Just add more commodity web servers
Users
System Cost Applica&on Performance
Rela&onal Technology Scales Up
Rela&onal Database
Web/App Server Tier
Expensive and disrup&ve sharding, doesn’t perform at web scale
System Cost Applica&on Performance
Won’t scale beyond this point
23
Couchbase Server Scales Out Like App Tier
NoSQL Database Scales Out Cost and performance mirrors app &er
Users
Scaling out flatens the cost and performance curves
Couchbase Distributed Data Store
Web/App Server Tier
Applica&on Scales Out Just add more commodity web servers
Users
System Cost Applica&on Performance
Applica&on Performance System Cost
24
EVALUATING NOSQL
25
The Process – From Evalua&on to Go Live
Analyze your requirements
Find solu&ons / products that match key requirements
Execute a proof of concept / performance evalua&on
Begin development of applica&on Deploy in staging and then produc&on
1
2
3
4
5
No different from evalua&ng a rela&onal database
New requirements è New solu&ons
26
Analyze your requirements
• Rapid applica&on development
– Changing market needs – Changing data needs
• Scalability – Unknown user demand – Constantly growing throughput
• Consistent Performance – Low response &me for be]er user experience – High throughput to handle viral growth
• Reliability – Always online
1
Common applica&on requirements
27
Find solu&ons that match key requirements
• Linear Scalability • Schema flexibility • High Performance
2
NoSQL
RDBMS
RDBMS NoSQL
• Mul&-‐document transac&ons • Database Rollback • Complex security needs • Complex joins • Extreme compression needs
• Both / depends on the data
28
Proof of concept / Performance evalua&on
3
Prototype a workload • Look for consistent performance…
– Low response &mes / latency • For be]er user experience
– High throughput • To handle viral growth • For resource efficiency
• … across – Read heavy / Write heavy / Mixed workloads – Clusters of growing sizes
• … and watch for – Conten&on / heavy locking – Linear scalability
29
Other considera&ons
Accessing data – No standards exist yet – Typically via SDKs or over HTTP – Check if the programing language of your
choice is supported.
App Server
App Server
App Server
3
Consistency – Consistent only at the document level – Most documents stores currently don’t
support mul&-‐document transac&ons – Analyze your applica&on needs
Availability – Each node stores ac&ve and replica data
(Couchbase) – Each node is either a master or slave
(MongoDB)
30
Opera&ons – Monitoring the system – Backup and restore the system – Upgrades and maintenance – Support
App Server
App Server
Client
Other considera&ons 3
Ease of Scaling – Ease of adding and reducing capacity – Single node type – App availability on topology changes
Indexing and Querying – Secondary indexes (Map func&ons) – Aggregates Grouping (Reduce func&ons) – Basic querying
31
Begin development
4
Data Modeling and Document Design
32
Deploying to staging and produc&on
5
• Monitoring the system • RESTful interfaces / Easy integra&on with monitoring
tools
• High-‐availability • Replica&on • Failover and Auto-‐failover
• Always Online – even for maintenance tasks • Database upgrades • Sojware (OS) and Hardware upgrades • Backup and restore • Index building • Compac&on
33
Couchbase Server Admin Console
34
35
Q
Q
So are you being impacted by these?
Schema Rigidity problems • Do you store serialized objects in the database? • Do you have lots of sparse tables with very few columns being used by most rows?
• Do you find that your applica&on developers require schema changes frequently due to constantly changing data?
• Are you using your database as a key-‐value store?
Scalability problems • Do you periodically need to upgrade systems to more powerful servers and scale up?
• Are you reaching the read / write throughput limit of a single database server?
• Is your server’s read / write latency not mee&ng your SLA? • Is your user base growing at a frightening pace?
36
Is NoSQL the right choice for you?
Does your applica&on need rich database func&onality?
• Mul&-‐document transac&ons • Complex security needs – user roles, document level security, authen&ca&on, authoriza&on integra&on
• Complex joins across bucket / collec&ons • BI integra&on • Extreme compression needs
NoSQL may not be the right choice for your applica&on
37
WHERE IS NOSQL A GOOD FIT?
38
Performance driven use cases
• Low latency • High throughput ma]ers • Large number of users • Unknown demand with sudden growth of users/data
• Predominantly direct document access • Workloads with very high muta&on rate per document (temporal locality) Working set with heavy writes
39
Data driven use cases
• Support for unlimited data growth • Data with non-‐homogenous structure • Need to quickly and ojen change data structure • 3rd party or user defined structure • Variable length documents • Sparse data records • Hierarchical data
40
BRIEF OVERVIEW COUCHBASE SERVER
41
2.0�
NoSQL Distributed Document Database for interac&ve web applica&ons
Couchbase Server
42
Easy Scalability
Consistent, High Performance
Always On 24x7x365
Grow cluster without applica&on changes, without down&me with a single click
Consistent sub-‐millisecond read and write response &mes with consistent high throughput
No down&me for sovware upgrades, hardware maintenance, etc.
Couchbase Server
43
Flexible Data Model
• No need to worry about the database when changing your applica&on
• Records can have different structures, there is no fixed schema
• Allows painless data model changes for rapid applica&on development
{ “ID”: 1, “FIRST”: “Dip&”, “LAST”: “Borkar”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” }
JSON JSON
JSON JSON
44
COUCHBASE SERVER ARCHITECTURE
45
Couchbase Server 2.0 Architecture
Heartbeat
Process m
onito
r
Glob
al singleton supe
rviso
r
Confi
gura&o
n manager
on each node
Rebalance orchestrator
Nod
e he
alth m
onito
r
one per cluster
vBucket state and
replica&
on m
anager
htp RE
ST m
anagem
ent A
PI/W
eb UI
HTTP 8091
Erlang port mapper 4369
Distributed Erlang 21100 -‐ 21199
Erlang/OTP
storage interface
Couchbase EP Engine
11210 Memcapable 2.0
Moxi
11211 Memcapable 1.0
Memcached
New Persistence Layer
8092 Query API
Que
ry Engine
Data Manager Cluster Manager
46
Couchbase Server 2.0 Architecture
Heartbeat
Process m
onito
r
Glob
al singleton supe
rviso
r
Confi
gura&o
n manager
on each node
Rebalance orchestrator
Nod
e he
alth m
onito
r
one per cluster
vBucket state and
replica&
on m
anager
htp RE
ST m
anagem
ent A
PI/W
eb UI
HTTP 8091
Erlang port mapper 4369
Distributed Erlang 21100 -‐ 21199
Erlang/OTP
storage interface
Couchbase EP Engine
11210 Memcapable 2.0
Moxi
11211 Memcapable 1.0
Memcached
New Persistence Layer
8092 Query API
Que
ry Engine
47
Couchbase deployment
Data Flow
Cluster Management
Web Applica&on
Couchbase Client Library
48
3 3 2
Single node -‐ Couchbase Write Opera&on 2
Managed Cache
Disk Que
ue
Disk
Replica&on Queue
App Server
Couchbase Server Node
Doc 1 Doc 1
Doc 1
To other node
49
3 3 2
Single node -‐ Couchbase Update Opera&on 2
Managed Cache
Disk Que
ue
Replica&on Queue
App Server
Couchbase Server Node
Doc 1’
Doc 1
Doc 1’ Doc 1
Doc 1’
Disk
To other node
50
GET
Doc 1
3 3 2
Single node -‐ Couchbase Read Opera&on 2
Disk Que
ue
Replica&on Queue
App Server
Couchbase Server Node
Doc 1
Doc 1 Doc 1
Managed Cache
Disk
To other node
51
3 3 2
Single node -‐ Couchbase Cache Evic&on 2
Disk Que
ue
Replica&on Queue
App Server
Couchbase Server Node
Doc 1
Doc 6 Doc 5 Doc 4 Doc 3 Doc 2
Doc 1
Doc 6 Doc 5 Doc 4 Doc 3 Doc 2
Managed Cache
Disk
To other node
52
3 3 2
Single node – Couchbase Cache Miss 2
Disk Que
ue
Replica&on Queue
App Server
Couchbase Server Node
Doc 1
Doc 3 Doc 5 Doc 2 Doc 4
Doc 6 Doc 5 Doc 4 Doc 3 Doc 2
Doc 4
GET
Doc 1
Doc 1
Doc 1
Managed Cache
Disk
To other node
53
COUCHBASE SERVER CLUSTER
Cluster wide -‐ Basic Opera&on
• Docs distributed evenly across servers
• Each server stores both ac&ve and replica docs Only one server ac&ve at a &me
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on App never needs to know
• App reads, writes, updates docs
• Mul&ple app servers can access same document at same &me
User Configured Replica Count = 1
READ/WRITE/UPDATE
ACTIVE
Doc 5
Doc 2
Doc
Doc
Doc
SERVER 1 ACTIVE
Doc 4
Doc 7
Doc
Doc
Doc
SERVER 2
Doc 8
ACTIVE
Doc 1
Doc 2
Doc
Doc
Doc
REPLICA
Doc 4
Doc 1
Doc 8
Doc
Doc
Doc
REPLICA
Doc 6
Doc 3
Doc 2
Doc
Doc
Doc
REPLICA
Doc 7
Doc 9
Doc 5
Doc
Doc
Doc
SERVER 3
Doc 6
APP SERVER 1
COUCHBASE Client Library CLUSTER MAP
COUCHBASE Client Library CLUSTER MAP
APP SERVER 2
Doc 9
54
Cluster wide -‐ Add Nodes to Cluster
• Two servers added One-‐click opera&on
• Docs automa&cally rebalanced across cluster Even distribu&on of docs Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
REPLICA
ACTIVE
Doc 5
Doc 2
Doc
Doc
Doc 4
Doc 1
Doc
Doc
SERVER 1
REPLICA
ACTIVE
Doc 4
Doc 7
Doc
Doc
Doc 6
Doc 3
Doc
Doc
SERVER 2
REPLICA
ACTIVE
Doc 1
Doc 2
Doc
Doc
Doc 7
Doc 9
Doc
Doc
SERVER 3
SERVER 4
SERVER 5
REPLICA
ACTIVE
REPLICA
ACTIVE
Doc
Doc 8 Doc
Doc 9 Doc
Doc 2 Doc
Doc 8 Doc
Doc 5 Doc
Doc 6
READ/WRITE/UPDATE READ/WRITE/UPDATE
APP SERVER 1
COUCHBASE Client Library CLUSTER MAP
COUCHBASE Client Library CLUSTER MAP
APP SERVER 2
COUCHBASE SERVER CLUSTER
User Configured Replica Count = 1
55
Cluster wide -‐ Fail Over Node
REPLICA
ACTIVE
Doc 5
Doc 2
Doc
Doc
Doc 4
Doc 1
Doc
Doc
SERVER 1
REPLICA
ACTIVE
Doc 4
Doc 7
Doc
Doc
Doc 6
Doc 3
Doc
Doc
SERVER 2
REPLICA
ACTIVE
Doc 1
Doc 2
Doc
Doc
Doc 7
Doc 9
Doc
Doc
SERVER 3
SERVER 4
SERVER 5
REPLICA
ACTIVE
REPLICA
ACTIVE
Doc 9
Doc 8
Doc Doc 6 Doc
Doc
Doc 5 Doc
Doc 2
Doc 8 Doc
Doc
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed Promotes replicas of docs to ac&ve Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
Doc
Doc 1 Doc 3
APP SERVER 1
COUCHBASE Client Library CLUSTER MAP
COUCHBASE Client Library CLUSTER MAP
APP SERVER 2
User Configured Replica Count = 1
COUCHBASE SERVER CLUSTER
56
COUCHBASE SERVER CLUSTER
Indexing and Querying
User Configured Replica Count = 1
ACTIVE
Doc 5
Doc 2
Doc
Doc
Doc
SERVER 1
REPLICA
Doc 4
Doc 1
Doc 8
Doc
Doc
Doc
APP SERVER 1
COUCHBASE Client Library CLUSTER MAP
COUCHBASE Client Library CLUSTER MAP
APP SERVER 2
Doc 9
• Indexing work is distributed amongst nodes
• Large data set possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
ACTIVE
Doc 5
Doc 2
Doc
Doc
Doc
SERVER 2
REPLICA
Doc 4
Doc 1
Doc 8
Doc
Doc
Doc
Doc 9
ACTIVE
Doc 5
Doc 2
Doc
Doc
Doc
SERVER 3
REPLICA
Doc 4
Doc 1
Doc 8
Doc
Doc
Doc
Doc 9
Query
57
Cross Data Center Replica&on (XDCR)
COUCHBASE SERVER CLUSTER NY DATA CENTER
ACTIVE
Doc
Doc 2
SERVER 1
Doc 9
SERVER 2
SERVER 3
RAM
Doc Doc Doc
ACTIVE
Doc
Doc
Doc RAM
ACTIVE
Doc
Doc
Doc RAM
DISK
Doc Doc Doc
DISK
Doc Doc Doc
DISK
COUCHBASE SERVER CLUSTER SF DATA CENTER
ACTIVE
Doc
Doc 2
SERVER 1
Doc 9
SERVER 2
SERVER 3
RAM
Doc Doc Doc
ACTIVE
Doc
Doc
Doc RAM
ACTIVE
Doc
Doc
Doc RAM
DISK
Doc Doc Doc
DISK
Doc Doc Doc
DISK
59
60