Couchbase Overview
Aaron Benton
Monterey Bay Information Technologists Meetup 02.15.17
Feb 21, 2017
Agenda
What is NoSQL?
What is Couchbase?
Couchbase Architecture, SDK, Queries
Couchbase Mobile
Other Vendors
Who's using Couchbase?
Demo (if time permits)
What is NoSQL?
Non-Relational
Cluster Friendly
Generally Open-Source
21st Century
Schema-Less
Scaling
Scale Vertically (RDBMS):
Add resources to a single node in a system
Enhance the server (more CPU, more RAM, etc.)
High availability is difficult to implement
Scale Horizontally (NoSQL):
Add more nodes to a system
More servers, distributing load
High availability is easy to implement
Container friendly
Cattle vs. pets
Schemas
Relational:
Known Models
Fixed Fields
Data Types
Database Managed
Change can be difficult
Non-Relational:
Any type of data
Flexible
Application Managed
Change is easy
Types of NoSQL Databases
Key-Value: Redis, Riak, Memcached
Document: Couchbase, CouchDB, MongoDB
Column-Family: Cassandra, HBase, BigTable
Graph: Neo4J, Giraph, OrientDB
In Development…
Impedance Mismatch
Objects are assembled as a whole: Cart, Order, Product, Profile
Saving these objects requires: deconstructing, multiple rows, multiple tables
"The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being used by a program written in an object-oriented programming language or style, particularly when objects or class definitions are mapped in a straightforward way to database tables or relational schema." - Wikipedia
Relational Models
Database: [figure]
Code:
// order (structure / dictionary / object / map / etc.)
var order = {};
order['order_id'] = 3492843;
order['order_date'] = "2016-07-14T18:27:22.586Z";
order['products'] = [{
  'product_id': 78323,
  'quantity': 2,
  'price': 39.99,
  'sub_total': 79.98
}];
order['user_id'] = 123;
order['billing_address_1'] = "1302 Pleasant Ridge Rd";
order['billing_address_2'] = "";
order['billing_city'] = "Greensboro";
order['billing_region_code'] = "NC";
order['billing_postal_code'] = "27409";
order['billing_country_code'] = "US";
order['shipping_address_1'] = "1302 Pleasant Ridge Rd";
order['shipping_address_2'] = "";
order['shipping_city'] = "Greensboro";
order['shipping_region_code'] = "NC";
order['shipping_postal_code'] = "27409";
order['shipping_country_code'] = "US";
order['card_number'] = "3337151609084503";
order['expiration_month'] = 11;
order['expiration_year'] = 2019;
Non-Relational Models
Code:
// order (structure / dictionary / object / map / etc.)
var order = {};
order['order_id'] = 3492843;
order['order_date'] = "2016-07-14T18:27:22.586Z";
order['products'] = [{
  'product_id': 78323,
  'quantity': 2,
  'price': 39.99,
  'sub_total': 79.98
}];
order['user_id'] = 123;
order['billing_address_1'] = "1302 Pleasant Ridge Rd";
order['billing_address_2'] = "";
order['billing_city'] = "Greensboro";
order['billing_region_code'] = "NC";
order['billing_postal_code'] = "27409";
order['billing_country_code'] = "US";
order['shipping_address_1'] = "1302 Pleasant Ridge Rd";
order['shipping_address_2'] = "";
order['shipping_city'] = "Greensboro";
order['shipping_region_code'] = "NC";
order['shipping_postal_code'] = "27409";
order['shipping_country_code'] = "US";
order['card_number'] = "3337151609084503";
order['expiration_month'] = 11;
order['expiration_year'] = 2019;
Database (JSON):
{
  "order_id": 3492843,
  "order_date": "2016-07-14T18:27:22.586Z",
  "products": [{
    "product_id": 78323,
    "quantity": 2,
    "price": 39.99,
    "sub_total": 79.98
  }],
  "user_id": 123,
  "billing_address_1": "1302 Pleasant Ridge Rd",
  "billing_address_2": "",
  "billing_city": "Greensboro",
  "billing_region_code": "NC",
  "billing_postal_code": "27409",
  "billing_country_code": "US",
  "shipping_address_1": "1302 Pleasant Ridge Rd",
  "shipping_address_2": "",
  "shipping_city": "Greensboro",
  "shipping_region_code": "NC",
  "shipping_postal_code": "27409",
  "shipping_country_code": "US",
  "card_number": "3337151609084503",
  "expiration_month": 11,
  "expiration_year": 2019
}
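The point of the non-relational model is that the object is saved whole rather than deconstructed into rows. A minimal Python sketch of that idea follows, using a trimmed copy of the order above. The key format `order_<id>` and the variable names are assumptions made for illustration, and a plain JSON string stands in for the stored document.

```python
import json

# Trimmed copy of the order from the slide; nested products stay nested.
order = {
    "order_id": 3492843,
    "order_date": "2016-07-14T18:27:22.586Z",
    "products": [
        {"product_id": 78323, "quantity": 2, "price": 39.99, "sub_total": 79.98}
    ],
    "user_id": 123,
    "billing_city": "Greensboro",
}

# Hypothetical key naming scheme, not taken from the slides.
key = f"order_{order['order_id']}"

# One write stores the whole object; one read brings it back intact.
stored = json.dumps(order)
loaded = json.loads(stored)
```

No mapping layer is involved: the structure that the application works with is the structure that is stored.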
Transaction Processing
Relational: ACID
Atomicity
Consistency
Isolation
Durability
Non-Relational: BASE
Basically Available
Soft-State
Eventual Consistency
CAP Theorem
Only NoSQL?
Is SQL going away? No.
SQL databases, along with NoSQL databases, are tools to solve problems.
What is Couchbase?
History
High availability cache
Key-value store
Document database
Embedded database
Sync management
Couchbase Server
Couchbase Lite
Couchbase Sync Gateway
Data management for a broad range of use cases
Couchbase Tenets
Flexible data model
Consistent performance at scale
High availability
Easy, affordable scalability
24x365
Storing Data
Buckets
Couchbase Connectors
Architecture: Couchbase Node
Couchbase Server Node
Single-node type means easier administration and scaling
Single installation
Two major components/processes: data manager and cluster manager
Data manager:
C/C++
Layer consolidation of caching and persistence
Cluster manager:
Erlang/OTP
Administration UIs
Out-of-band for data requests
Couchbase Read Operation
[Diagram: APPLICATION SERVER, MANAGED CACHE, REPLICATION QUEUE, DISK QUEUE, DISK; GET DOC 1]
Reads out of cache are extremely fast
No other process/system to communicate with
Data connection is a TCP binary protocol
Couchbase Write Operation
[Diagram: APPLICATION SERVER, MANAGED CACHE, REPLICATION QUEUE, DISK QUEUE, DISK; DOC 1]
Writes are async by default
Application gets acknowledgement when successfully in RAM and can trade off waiting for replication or persistence per write
Replication to 1, 2 or 3 other nodes
Replication is RAM-based so extremely fast
Off-node replication is primary level of High Availability
Disk written to as fast as possible, no waiting
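The write path described above can be pictured with a toy model. This is a sketch only: the class `NodeSketch` and its methods are hypothetical names, a dict stands in for disk, and the background flusher is called by hand rather than running on its own.

```python
from collections import deque


class NodeSketch:
    """Toy model of one node's write path (hypothetical, not Couchbase code)."""

    def __init__(self):
        self.cache = {}                    # managed cache (RAM)
        self.disk = {}                     # persisted documents
        self.disk_queue = deque()          # keys waiting to be persisted
        self.replication_queue = deque()   # keys waiting to be replicated

    def write(self, key, value):
        # The write is acknowledged as soon as the document is in RAM;
        # persistence and replication are queued and happen afterwards.
        self.cache[key] = value
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        return "ack"

    def flush_disk_queue(self):
        # Runs in the background in a real system; invoked manually here.
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]


node = NodeSketch()
status = node.write("doc1", {"name": "DOC 1"})
persisted_before_flush = "doc1" in node.disk
node.flush_disk_queue()
persisted_after_flush = "doc1" in node.disk
```

The acknowledgement arrives before the document reaches disk, which is the trade-off the slide describes; a caller that needs stronger guarantees would wait for the queues to drain for that write.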
Couchbase Cache Ejection
[Diagram: APPLICATION SERVER, MANAGED CACHE, REPLICATION QUEUE, DISK QUEUE, DISK; DOC 1 to DOC 5]
Layer consolidation means read through and write through cache
Couchbase automatically removes data that has already been persisted from RAM
Couchbase Cache Miss
[Diagram: APPLICATION SERVER, MANAGED CACHE, REPLICATION QUEUE, DISK QUEUE, DISK; GET DOC 1; DOC 1 to DOC 5]
Layer consolidation means 1 single interface for App to talk to and get its data back as fast as possible
Separation of cache and disk allows for fastest access out of RAM while pulling data from disk in parallel
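Ejection and cache misses can be illustrated together with a small read-through cache. The sketch below is an assumption-heavy simplification: `CachedStore` is a hypothetical class, documents are persisted immediately, and the oldest cached entry is ejected first, which is not necessarily the policy Couchbase uses.

```python
class CachedStore:
    """Toy read-through cache over a dict standing in for disk (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}   # dicts keep insertion order: oldest entry first
        self.disk = {}

    def put(self, key, value):
        self.disk[key] = value           # persisted immediately for simplicity
        self._cache(key, value)

    def get(self, key):
        if key in self.cache:
            return self.cache[key], "hit"
        value = self.disk[key]           # cache miss: fetch from disk
        self._cache(key, value)
        return value, "miss"

    def _cache(self, key, value):
        self.cache.pop(key, None)
        self.cache[key] = value
        while len(self.cache) > self.capacity:
            # Eject the oldest entry; safe because it is already on disk.
            oldest = next(iter(self.cache))
            del self.cache[oldest]


store = CachedStore(capacity=4)
for n in range(1, 6):
    store.put(f"doc{n}", {"n": n})

first = store.get("doc1")    # doc1 was ejected when doc5 arrived
second = store.get("doc1")   # now served from the cache again
```

The application calls the same `get` in both cases, which mirrors the single-interface point on the slide.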
Cluster Overview
Scaling
Architecture: Couchbase SDK
Documents are integral to the SDKs.
All SDKs support JSON format
In addition: serialized objects, unquoted strings, binary pass-through
A Document contains:
Property    Description
ID          The bucket-unique identifier
Content     The value that is stored
Expiry      An expiration time
CAS         Check-and-Set identifier
Couchbase SDKs
Official SDKs: Java, .NET, Node.js, Python, PHP, C / C++, Go, Ruby
For each of these there is:
Full Document support
Interoperability
Common yet idiomatic Programming Model
Others: ColdFusion, Erlang, Perl, TCL, Clojure, Scala
JDBC and ODBC
Also fully REST accessible
Concurrency
Locking always happens at the document level and there are two types:
Pessimistic: No other actor can write to that document until it is released or a timeout is hit
Optimistic: Use CAS values to check if the document has changed since you last touched it and then act accordingly
In a distributed database, optimistic locking is a much more neighborly approach.
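A rough sketch of optimistic locking with CAS values follows. `TinyBucket` and `CasMismatch` are hypothetical names for an in-memory stand-in, not classes from any Couchbase SDK; the point is only the read, compare, retry pattern.

```python
class CasMismatch(Exception):
    """Raised when the document changed since the caller last read it."""


class TinyBucket:
    """In-memory stand-in for a bucket (hypothetical, not an SDK class)."""

    def __init__(self):
        self._docs = {}      # key -> (value, cas)
        self._next_cas = 1

    def get(self, key):
        return self._docs[key]            # returns (value, cas)

    def upsert(self, key, value):
        cas = self._next_cas
        self._next_cas += 1
        self._docs[key] = (value, cas)
        return cas

    def replace(self, key, value, cas):
        _, current = self._docs[key]
        if current != cas:
            raise CasMismatch(key)        # someone else wrote first
        return self.upsert(key, value)


bucket = TinyBucket()
bucket.upsert("user_1021", {"visits": 1})

_, cas_a = bucket.get("user_1021")        # actor A reads
_, cas_b = bucket.get("user_1021")        # actor B reads
bucket.replace("user_1021", {"visits": 2}, cas_b)   # actor B writes first

try:
    bucket.replace("user_1021", {"visits": 2}, cas_a)   # A's CAS is stale
    conflict = False
except CasMismatch:
    conflict = True

# Actor A re-reads the document and retries with the fresh CAS value.
current, cas_a = bucket.get("user_1021")
bucket.replace("user_1021", {"visits": current["visits"] + 1}, cas_a)
final_value, _ = bucket.get("user_1021")
```

Nobody is blocked while the other actor works; the loser of the race simply finds out at write time and retries.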
Architecture: Couchbase Cluster (Node and SDK Interaction)
Auto sharding: Bucket and vBuckets
[Diagram: data buckets divided into virtual buckets (vB) 1 … 1024]
A bucket is a logical, unique key space
Multiple buckets can exist within a single cluster of nodes
Each bucket has active and replica data sets (1, 2 or 3 extra copies)
Each data set has 1024 Virtual Buckets (vBuckets)
Each vBucket contains 1/1024th portion of the data set
vBuckets do not have a fixed physical server location
Mapping between the vBuckets and physical servers is called the cluster map
Document IDs (keys) always get hashed to the same vBucket
Couchbase SDKs look up the vBucket -> server mapping
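The key to vBucket to server lookup can be sketched in a few lines. Two assumptions are made here and should not be read as the actual client algorithm: the hash is plain CRC32 modulo 1024, and vBuckets are dealt out round-robin across servers. The function and server names are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(key: str) -> int:
    # Assumption: plain CRC32 modulo 1024; the real client hash may differ.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    # Assumption: vBuckets are dealt out round-robin across the servers.
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


servers = ["server1", "server2", "server3"]
cluster_map = build_cluster_map(servers)


def server_for(key: str) -> str:
    # The client hashes the key, then consults the cluster map.
    return cluster_map[vbucket_for(key)]


target = server_for("user_1021")
```

Because the hash depends only on the key, the same document ID always lands in the same vBucket; only the cluster map changes when vBuckets move between servers.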
Basic Operation
[Diagram: Couchbase Server 1, 2 and 3, each holding ACTIVE and REPLICA shards; shards 1 to 9 are spread across the three servers]
Application has single logical connection to cluster (client object)
Data is automatically sharded resulting in even document data distribution across cluster
Each vBucket replicated 1, 2 or 3 times (“peer-to-peer” replication)
Docs are automatically hashed by the client to a shard
Cluster map provides location of which server a shard is on
Every read/write/update/delete goes to same node for a given key
Strongly consistent data access (“read your own writes”)
A single Couchbase node can achieve 100k’s ops/sec so no need to scale reads
Cluster Map
Cluster Map – 2 nodes added
Rebalance
[Diagram: Couchbase Servers 4 and 5 are added alongside Servers 1, 2 and 3; ACTIVE and REPLICA shards are spread across all five servers; READ/WRITE/UPDATE]
Multiple nodes added or removed at once
One-click operation
Incremental movement of active and replica vBuckets and data
Client library updated via cluster map
Fully online operation, no downtime or loss of performance
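To give a feel for what a rebalance has to compute, here is a naive sketch that evens out vBucket ownership when two servers join a three-server cluster. The `rebalance` function and its quota scheme are assumptions for illustration, not the algorithm Couchbase uses, and replicas are ignored.

```python
from collections import Counter


def rebalance(old_map, servers):
    """Return a new vBucket -> server list with an even spread over servers."""
    base, extra = divmod(len(old_map), len(servers))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    new_map = list(old_map)
    to_move = []
    for vb, owner in enumerate(old_map):
        if quota.get(owner, 0) > 0:
            quota[owner] -= 1         # vBucket stays on its current server
        else:
            to_move.append(vb)        # owner is over quota or left the cluster
    for server, free in quota.items():
        for _ in range(free):
            new_map[to_move.pop()] = server
    return new_map


old_servers = ["server1", "server2", "server3"]
new_servers = old_servers + ["server4", "server5"]
old_map = [old_servers[vb % 3] for vb in range(1024)]

new_map = rebalance(old_map, new_servers)
moved = sum(1 for a, b in zip(old_map, new_map) if a != b)
counts = Counter(new_map)
```

Only the vBuckets that end up on the new servers move; everything else stays in place, and clients pick up the change by receiving the updated cluster map.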
Fail Over Node
[Diagram: Couchbase Servers 1 to 5 with ACTIVE and REPLICA shards]
When node goes down, some requests will fail
Failover is either automatic or manual
Client library is automatically updated via cluster map
Replicas not recreated to preserve stability
Best practice to replace node and rebalance
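The effect of a failover on the cluster map can be sketched as replica promotion. The `fail_over` function, the three-vBucket layout and the single replica per vBucket are all assumptions chosen to keep the example small.

```python
def fail_over(active, replica, failed):
    """Promote replicas for vBuckets whose active copy was on the failed node."""
    new_active, new_replica = {}, {}
    for vb in active:
        if active[vb] == failed:
            new_active[vb] = replica[vb]        # promote the replica copy
        else:
            new_active[vb] = active[vb]
            if replica[vb] != failed:
                new_replica[vb] = replica[vb]   # replica survives untouched
    return new_active, new_replica


# Hypothetical layout: vBucket id -> server holding that copy.
active = {1: "server1", 2: "server2", 3: "server3"}
replica = {1: "server2", 2: "server3", 3: "server1"}

new_active, new_replica = fail_over(active, replica, failed="server3")
```

After the failover, vBuckets 2 and 3 have no replica left, which matches the slide's note that replicas are not recreated until the node is replaced and the cluster is rebalanced.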
XDCR
Cross Datacenter Replication
Replication to other clusters
Bi-directional
Uni-directional
Filtered replication
Querying
Map / Reduce Views
In Couchbase, Map-Reduce is specifically used to create indexes
Map functions are applied to JSON documents and their output or "emit" data is stored in an index
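The map step can be imitated in Python to show how emitted rows become an index. Couchbase views are written as JavaScript functions; the generator below only mirrors their shape, and the function names and sample documents are assumptions.

```python
def map_by_last_name(doc, doc_id):
    # Mirrors the shape of a view map function: emit zero or more (key, value) rows.
    if doc.get("doc_type") == "user":
        yield doc["last_name"], doc_id


def build_index(docs, map_fn):
    # Apply the map function to every document and keep the rows sorted by key.
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc, doc_id):
            rows.append((key, value))
    return sorted(rows)


# Hypothetical sample documents.
docs = {
    "user_1021": {"doc_type": "user", "first_name": "John", "last_name": "Smith"},
    "user_197": {"doc_type": "user", "first_name": "Albin", "last_name": "Price"},
    "airline_2009": {"doc_type": "airline", "airline_name": "Delta Air Lines"},
}

index = build_index(docs, map_by_last_name)
```

Documents for which the map function emits nothing, such as the airline here, simply do not appear in the index.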
Querying
Looks like SQL…
SELECT first_name, last_name, children
FROM users
WHERE EVERY child IN children SATISFIES child.age > 10 END
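For readers new to collection predicates, the EVERY … SATISFIES clause behaves much like `all()` over a nested list. The sample users below are invented for illustration, and the sketch ignores how N1QL treats a MISSING `children` field.

```python
# Invented sample data; not from the slides.
users = [
    {"first_name": "Ann", "last_name": "Lee",
     "children": [{"age": 12}, {"age": 15}]},
    {"first_name": "Bob", "last_name": "Ray",
     "children": [{"age": 8}, {"age": 14}]},
]

# Keep a user only if every child is older than 10, then project three fields.
result = [
    {field: user[field] for field in ("first_name", "last_name", "children")}
    for user in users
    if all(child["age"] > 10 for child in user["children"])
]
```

Only the first user passes, because the second has a child aged 8.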
N1QL Examples
INSERT INTO ecommerce ( KEY, VALUE )
VALUES ("user_1021", {
  "user_id": 1021,
  "doc_type": "user",
  "first_name": "John",
  "last_name": "Smith",
  "email": "[email protected]"
})

UPDATE ecommerce
USE KEYS "user_1021"
SET email = "[email protected]", gender = "M", token = UUID()
RETURNING token

UPSERT INTO ecommerce ( KEY, VALUE )
VALUES ("user_1021", {
  "user_id": 1021,
  "doc_type": "user",
  "first_name": "John",
  "last_name": "Smith",
  "email": "[email protected]"
})

DELETE
FROM ecommerce
USE KEYS "user_1021"
N1QL Operators
Type          Support
Arithmetic    + - * / % -val
Collection    ANY EVERY ARRAY FIRST EXISTS IN WITHIN
Comparison    = == != <> > >= < <= (NOT) BETWEEN (NOT) LIKE IS (NOT) NULL IS (NOT) MISSING IS (NOT) VALUED
Conditional   CASE expression WHEN value THEN expression
Construction  Array [ value, value, ... ] Object { key:value, key:value, ... }
Logical       AND OR NOT
String        ||
N1QL Functions
Aggregate Functions
ARRAY_AGG(EXP)
ARRAY_AGG(DISTINCT EXP)
AVG(EXP)
AVG(DISTINCT EXP)
COUNT(*)
COUNT(EXP)
COUNT(DISTINCT EXP)
MAX(EXP)
MIN(EXP)
SUM(EXP)
SUM(DISTINCT EXP)
Object Functions
OBJECT_LENGTH(EXP)
OBJECT_NAMES(EXP)
OBJECT_PAIRS(EXP)
OBJECT_VALUES(EXP)
Conditionals - Unknowns
IFMISSING(EXP1, EXP2, …)
IFMISSINGORNULL(EXP1, EXP2, …)
IFNULL(EXP1, EXP2, …)
MISSINGIF(EXP1, EXP2)
NULLIF(EXP1, EXP2)
Conditionals - Numbers
IFINF(EXP1, EXP2, …)
IFNAN(EXP1, EXP2, …)
IFNANORINF(EXP1, EXP2, …)
NANIF(EXP1, EXP2)
NEGINFIF(EXP1, EXP2)
POSINFIF(EXP1, EXP2)
Comparison Functions
GREATEST(EXP1, EXP2)
LEAST(EXP1, EXP2)
Meta and UUID Functions
BASE64(EXP)
BASE64_ENCODE(EXP)
BASE64_DECODE(EXP)
META(EXP)
UUID()
Number Functions
ABS(EXP)
ACOS(EXP)
ASIN(EXP)
ATAN(EXP)
ATAN2(EXP1, EXP2)
CEIL(EXP)
COS(EXP)
DEGREES(EXP)
E()
EXP(EXP)
LN(EXP)
LOG(EXP)
FLOOR(EXP)
PI()
POWER(EXP1, EXP2)
RADIANS(EXP)
RANDOM([ EXP ])
ROUND(EXP [, DIGITS])
SIGN(EXP)
SIN(EXP)
SQRT(EXP)
TAN(EXP)
TRUNC(EXP [, DIGITS])
Type Checking Functions
ISARRAY(EXP)
ISATOM(EXP)
ISBOOLEAN(EXP)
ISNUMBER(EXP)
ISOBJECT(EXP)
ISSTRING(EXP)
TYPE(EXP)
Type Conversion Functions
TOARRAY(EXP)
TOATOM(EXP)
TOBOOLEAN(EXP)
TONUMBER(EXP)
TOOBJECT(EXP)
TOSTRING(EXP)
N1QL Functions
Array Functions
ARRAY_APPEND(EXP, VAL)
ARRAY_AVG(EXP)
ARRAY_CONCAT(EXP1, EXP2)
ARRAY_CONTAINS(EXP, VAL)
ARRAY_COUNT(EXP)
ARRAY_DISTINCT(EXP)
ARRAY_IFNULL(EXP)
ARRAY_LENGTH(EXP)
ARRAY_MAX(EXP)
ARRAY_MIN(EXP)
ARRAY_POSITION(EXP, VAL)
ARRAY_PREPEND(VAL, EXP)
ARRAY_PUT(EXP, VAL)
ARRAY_RANGE(START, END [, STEP])
ARRAY_REMOVE(EXP, VAL)
ARRAY_REPEAT(VAL, N)
ARRAY_REPLACE(EXP, VAL1, VAL2 [, N])
ARRAY_REVERSE(EXP)
ARRAY_SORT(EXP)
ARRAY_SUM(EXP)
Date Functions
CLOCK_MILLIS()
CLOCK_STR([ FMT ])
DATE_ADD_MILLIS(EXP, N, PART)
DATE_ADD_STR(EXP, N, PART)
DATE_DIFF_MILLIS(EXP1, EXP2, PART)
DATE_DIFF_STR(EXP1, EXP2, PART)
DATE_PART_MILLIS(EXP, PART)
DATE_PART_STR(EXP, PART)
DATE_TRUNC_MILLIS(EXP, PART)
DATE_TRUNC_STR(EXP, PART)
MILLIS(EXP)
STR_TO_MILLIS(EXP)
MILLIS_TO_STR(EXP [, FMT ])
MILLIS_TO_UTC(EXP [, FMT ])
MILLIS_TO_ZONE_NAME(EXP, TZ [, FMT ])
NOW_MILLIS()
NOW_STR([ FMT ])
STR_TO_UTC(EXP)
STR_TO_ZONE_NAME(EXP, TZ_NAME)
String Functions
CONTAINS(EXP, SUBSTRING)
INITCAP(EXP)
TITLE(EXP)
LENGTH(EXP)
LOWER(EXP)
LTRIM(EXP [, CHARACTERS ])
POSITION(EXP, SUBSTRING)
REPEAT(EXP, N)
REPLACE(EXP, SBSTR, REPL [, N ])
RTRIM(EXP [, CHARACTERS ])
SPLIT(EXP [, SEP ])
SUBSTR(EXP, POS [, LEN ])
TRIM(EXP [, CHARACTERS ])
UPPER(EXP)
Pattern Matching Functions
REGEXP_CONTAINS(EXP, PATTERN)
REGEXP_LIKE(EXP, PATTERN)
REGEXP_POSITION(EXP, PATTERN)
REGEXP_REPLACE(EXP, PTRN, REPL [, N ])
JSON Functions
DECODE_JSON(EXP)
ENCODE_JSON(EXP)
ENCODED_SIZE(EXP)
POLY_LENGTH(EXP)
Missing Attributes
Query:
SELECT u.user_id, u.email, u.name
FROM users AS u
USE KEYS "user_197"
Sample Document:
{
  "email": "[email protected]",
  "first_name": "Albin",
  "last_name": "Price",
  "user_id": 197,
  "username": "Eudora43"
}
Result:
[
  {
    "email": "[email protected]",
    "user_id": 197
  }
]
Missing Attributes
Query:
SELECT u.user_id, u.email,
  IFMISSING( u.name, u.first_name || ' ' || u.last_name ) AS name
FROM users AS u
USE KEYS "user_197"
Sample Document:
{
  "email": "[email protected]",
  "first_name": "Albin",
  "last_name": "Price",
  "user_id": 197,
  "username": "Eudora43"
}
Result:
[
  {
    "email": "[email protected]",
    "name": "Albin Price",
    "user_id": 197
  }
]
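The IFMISSING behavior above can be approximated in Python by treating an absent dictionary key as MISSING. The `MISSING` sentinel and the `if_missing` helper are hypothetical names for this sketch; the email field is left out of the sample because it is not needed for the fallback.

```python
MISSING = object()   # sentinel standing in for N1QL's MISSING


def if_missing(*values):
    # Return the first value that is not MISSING, or None if all are.
    for value in values:
        if value is not MISSING:
            return value
    return None


doc = {
    "first_name": "Albin",
    "last_name": "Price",
    "user_id": 197,
    "username": "Eudora43",
}

# The document has no "name" attribute, so the fallback expression is used.
name = if_missing(
    doc.get("name", MISSING),
    doc["first_name"] + " " + doc["last_name"],
)
```

With a schema-less store this kind of fallback is how a query copes with documents written before an attribute existed.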
JOINS
Query:
SELECT airlines.airline_id, airlines.airline_name,
  IFNULL( airlines.airline_iata, airlines.airline_icao ) AS airline_code
FROM `flight-data` AS codes
USE KEYS 'airline_code_DL'
INNER JOIN `flight-data` AS airlines
  ON KEYS 'airline_' || TOSTRING( codes.id )
LIMIT 1
Result:
[
  {
    "airline_code": "DL",
    "airline_id": 2009,
    "airline_name": "Delta Air Lines"
  }
]
Airline Lookup Document:
{
  "_id": "airline_code_DL",
  "code": "DL",
  "code_type": "iata",
  "designation": "airline",
  "doc_type": "code",
  "id": 2009
}
Airline Document:
{
  "_id": "airline_2009",
  "active": true,
  "airline_iata": "DL",
  "airline_icao": "DAL",
  "airline_id": 2009,
  "airline_name": "Delta Air Lines",
  "callsign": "DELTA",
  "doc_type": "airline",
  "iso_country": "US"
}
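The join above is a key lookup: the left document supplies the pieces of the right document's key. A sketch with a plain dict standing in for the `flight-data` bucket follows; the variable names are assumptions and only the fields the query touches are included.

```python
# A dict stands in for the bucket: document key -> document.
bucket = {
    "airline_code_DL": {"code": "DL", "doc_type": "code", "id": 2009},
    "airline_2009": {
        "airline_iata": "DL",
        "airline_icao": "DAL",
        "airline_id": 2009,
        "airline_name": "Delta Air Lines",
    },
}

codes = bucket["airline_code_DL"]                      # USE KEYS
airline = bucket.get("airline_" + str(codes["id"]))    # ON KEYS

result = []
if airline is not None:   # an INNER JOIN drops rows without a match
    iata = airline.get("airline_iata")
    result.append({
        # IFNULL: fall back to the ICAO code when the IATA code is null.
        "airline_code": iata if iata is not None else airline["airline_icao"],
        "airline_id": airline["airline_id"],
        "airline_name": airline["airline_name"],
    })
```

No index scan is needed for this style of join, since both sides are fetched directly by key.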
UNNEST
Query:
SELECT phone_numbers.*
FROM users AS u
USE KEYS 'user_764'
UNNEST u.phones AS phone_numbers
Sample Document:
{
  "_id": "user_764",
  "doc_type": "user",
  "user_id": 764,
  "first_name": "Geovanny",
  "last_name": "Parker",
  "phones": [
    { "type": "Mobile", "phone_number": "676.825.8926", "extension": null },
    { "type": "Mobile", "phone_number": "792.877.3144", "extension": "3644" },
    { "type": "Home", "phone_number": "(730) 490-6734", "extension": null }
  ]
}
Result:
[
  { "type": "Mobile", "phone_number": "676.825.8926", "extension": null },
  { "type": "Mobile", "phone_number": "792.877.3144", "extension": "3644" },
  { "type": "Home", "phone_number": "(730) 490-6734", "extension": null }
]
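UNNEST turns each element of a nested array into its own result row, paired with its parent document. A small Python sketch of that flattening follows; the `unnest` helper is a hypothetical name, and the document is a trimmed copy of the sample above.

```python
def unnest(docs, field):
    # Pair every parent document with each element of its nested array.
    for doc in docs:
        for element in doc.get(field, []):
            yield doc, element


user = {
    "user_id": 764,
    "phones": [
        {"type": "Mobile", "phone_number": "676.825.8926", "extension": None},
        {"type": "Mobile", "phone_number": "792.877.3144", "extension": "3644"},
        {"type": "Home", "phone_number": "(730) 490-6734", "extension": None},
    ],
}

# SELECT phone_numbers.* keeps only the fields of the unnested element.
rows = [dict(phone) for _, phone in unnest([user], "phones")]
```

One document with three phones yields three rows; a document with no `phones` array would yield none.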
NEST
Query:
SELECT u.first_name, u.last_name, user_phones
FROM users AS u
USE KEYS 'user_581'
INNER NEST users AS user_phones
  ON KEYS 'user_' || TOSTRING( u.user_id ) || '_phones'
Result:
[
  {
    "first_name": "Geovanny",
    "last_name": "Parker",
    "user_phones": [
      {
        "_id": "user_581_phones",
        "doc_type": "user-phones",
        "phones": [
          { "extension": null, "phone_number": "872-201-8963", "phone_type": "Mobile" },
          { "extension": "9324", "phone_number": "720.194.5604", "phone_type": "Other" }
        ],
        "user_id": 581
      }
    ]
  }
]
User Phones Document:
{
  "_id": "user_581_phones",
  "doc_type": "user-phones",
  "user_id": 581,
  "phones": [
    { "extension": null, "phone_number": "872-201-8963", "phone_type": "Mobile" },
    { "extension": "9324", "phone_number": "720.194.5604", "phone_type": "Other" }
  ]
}
User Document:
{
  "_id": "user_581",
  "doc_type": "user",
  "user_id": 581,
  "first_name": "Geovanny",
  "last_name": "Parker"
}
Mobile
Mobile Ecosystem
Showdown
MongoDB
Replication: Master - Slave
Primaries and Secondaries
Not all writes are local
Need for 3rd Party Cache
No Mobile Solution
Complex Topology
Needs constant monitoring
Database + Collections + BSON
Cluster with 2 replicas / backups
Distributed Load
Couchbase vs MongoDB
Who's Using Couchbase?
Companies
Profile Management @ Apple
Caching @ Walmart
Real Time Big Data @ PayPal
Digital Communication @ Viber
Field Service Application @ GE
Product Catalog @ Tesco
Mobile Travel App @ Ryanair
Demo Time
Questions