An Introduction to CouchDB (IPC11SE 2011-06-01)

May 11, 2015

David Zuelke

Presentation given at International PHP Conference Spring Edition 2011.
Transcript
Page 1: An Introduction to CouchDB (IPC11SE 2011-06-01)

AN INTRODUCTION TO COUCHDB

Page 2: An Introduction to CouchDB (IPC11SE 2011-06-01)

David Zülke

Page 3: An Introduction to CouchDB (IPC11SE 2011-06-01)

David Zuelke

Page 4: An Introduction to CouchDB (IPC11SE 2011-06-01)

http://en.wikipedia.org/wiki/File:München_Panorama.JPG

Page 5: An Introduction to CouchDB (IPC11SE 2011-06-01)

Founder

Page 7: An Introduction to CouchDB (IPC11SE 2011-06-01)

Lead Developer

Page 10: An Introduction to CouchDB (IPC11SE 2011-06-01)

COUCHDB IN THREE SLIDES

Full Of DIS IS SRS BSNS Bullet Points

Page 11: An Introduction to CouchDB (IPC11SE 2011-06-01)

COUCHDB STORES DOCUMENTS

• CouchDB stores documents with arbitrary keys and values

• Each document is identified by an ID and has a revision

• Documents can have file attachments

• Stored as JSON, so it’s easy to interface with
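
A minimal sketch of what such a document might look like. The `_id`, `_rev` and `_attachments` fields are CouchDB's own; every other field name and value here is made up for illustration, and the revision is abbreviated the same way as on the later slides.

```javascript
// Hypothetical product document. Only _id, _rev and _attachments are
// CouchDB-defined fields; the rest is arbitrary application data.
const doc = {
  _id: "123",
  _rev: "5-a72", // abbreviated; real revisions are longer
  type: "product",
  name: "Laser Beam",
  tags: ["evil", "shiny"],
  _attachments: {
    // attachment stub as it appears inside a fetched document
    "manual.pdf": { content_type: "application/pdf", length: 48213, stub: true }
  }
};

// Because it is plain JSON, it survives a round trip through any JSON library.
const wire = JSON.stringify(doc);
const parsed = JSON.parse(wire);
console.log(parsed._id, parsed._rev, Object.keys(parsed._attachments));
```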

Page 12: An Introduction to CouchDB (IPC11SE 2011-06-01)

COUCHDB SPEAKS HTTP

• CouchDB uses HTTP to communicate with clients & servers

• That means scalability

• That means a lot of kick ass stuff totally for free

• Caching

• Load Balancing

• Content Negotiation
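
As a rough illustration of the caching point: CouchDB sends a document's revision as its ETag, so a client can make a conditional GET. The sketch below only builds the request description and sends nothing. The helper name `buildGetRequest`, the database name `shop` and the document values are assumptions for this example; `localhost:5984` is CouchDB's default address.

```javascript
// Build (but do not send) a conditional GET for one document.
// buildGetRequest is a hypothetical helper, not part of any CouchDB client.
function buildGetRequest(baseUrl, db, id, cachedRev) {
  const url = `${baseUrl}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
  const headers = { Accept: "application/json" };
  if (cachedRev) {
    // The ETag of a document is its revision wrapped in double quotes.
    headers["If-None-Match"] = `"${cachedRev}"`;
  }
  return { method: "GET", url, headers };
}

const req = buildGetRequest("http://localhost:5984", "shop", "123", "5-a72");
console.log(req.method, req.url, req.headers);
```

The resulting object could be handed to any HTTP client; an unchanged document would then be answered with 304 Not Modified, which is also what lets ordinary HTTP caches sit in front of the database.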

Page 13: An Introduction to CouchDB (IPC11SE 2011-06-01)

COUCHDB USES MVCC

• Multiversion Concurrency Control

• When updating, you must supply a revision number

• Your change will be rejected if the revision is not the latest

• All writes are serialized

• No need for locks, but puts some responsibility on developers
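
The revision check can be sketched with a tiny in-memory store. This is a simulation for illustration only, not how CouchDB is implemented: `TinyStore` is a made-up class and the `N-sim` revision strings are a simplification of real revision IDs.

```javascript
// In-memory simulation of revision-checked writes (illustrative only).
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    // The write must name the latest revision (or none, for a new document).
    if (doc._rev !== currentRev) {
      return { ok: false, error: "conflict" };
    }
    const generation = currentRev ? parseInt(currentRev, 10) + 1 : 1;
    const stored = { ...doc, _rev: `${generation}-sim` };
    this.docs.set(doc._id, stored);
    return { ok: true, id: doc._id, rev: stored._rev };
  }
}

const store = new TinyStore();
const created = store.put({ _id: "123", name: "Laser Beam" });
// Two clients both read revision 1 and both try to update it.
const first = store.put({ _id: "123", _rev: created.rev, name: "Laser Beam Mk II" });
const second = store.put({ _id: "123", _rev: created.rev, name: "Laser Pointer" });
console.log(created, first, second);
```

The second writer is rejected and has to re-read the document and decide what to do, which is the "responsibility on developers" part. CouchDB itself signals this case with an HTTP 409 Conflict response.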

Page 14: An Introduction to CouchDB (IPC11SE 2011-06-01)

THE DETAILS

An In-Depth Look At What Makes CouchDB Different

Page 15: An Introduction to CouchDB (IPC11SE 2011-06-01)

Do you know the CAP theorem?

• consistency (crossed out: X)

• availability

• partition tolerance

Page 16: An Introduction to CouchDB (IPC11SE 2011-06-01)

“So, CouchDB does not have consistency of CAP?”

Page 17: An Introduction to CouchDB (IPC11SE 2011-06-01)

“Booh, that means my data will be inconsistent. Fail!”

Page 18: An Introduction to CouchDB (IPC11SE 2011-06-01)

psssshhh

Page 19: An Introduction to CouchDB (IPC11SE 2011-06-01)

YOUR MOM IS INCONSISTENT

Page 20: An Introduction to CouchDB (IPC11SE 2011-06-01)

CouchDB is eventually consistent

Page 21: An Introduction to CouchDB (IPC11SE 2011-06-01)

When replicating, conflicting revisions will be marked as such

Page 22: An Introduction to CouchDB (IPC11SE 2011-06-01)

These conflicts can then be resolved (by users, daemons, ...)

Page 23: An Introduction to CouchDB (IPC11SE 2011-06-01)

and everything will be fine \o/

Page 24: An Introduction to CouchDB (IPC11SE 2011-06-01)

which brings us to...

Page 25: An Introduction to CouchDB (IPC11SE 2011-06-01)

REPLICATION

• You can do Master-Master replication

• Conflicts are detected and marked automatically

• Conflicts are supposed to be resolved by applications

• Or by users, who usually know best what to do!
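
One way an application might resolve a marked conflict is sketched below. It assumes the application has already collected all conflicting revisions of a document (for example by fetching it with `?conflicts=true` and then fetching each listed revision). The "newest `updated_at` wins" rule and the `updated_at` field itself are assumptions for this example; a real application would pick whatever rule fits its data, or ask the user.

```javascript
// Hypothetical application-level resolver: keep the most recently updated
// revision and mark the others for deletion.
function resolveConflict(revisions) {
  // ISO 8601 timestamps in the same time zone sort correctly as strings.
  const sorted = [...revisions].sort((a, b) =>
    b.updated_at.localeCompare(a.updated_at)
  );
  const [winner, ...losers] = sorted;
  return {
    keep: winner,
    // Deleting a losing revision is done by writing a _deleted stub for it.
    remove: losers.map(d => ({ _id: d._id, _rev: d._rev, _deleted: true }))
  };
}

const resolution = resolveConflict([
  { _id: "123", _rev: "2-aaa", name: "Laser Beam", price: 100, updated_at: "2011-06-01T10:00:00Z" },
  { _id: "123", _rev: "2-bbb", name: "Laser Beam", price: 120, updated_at: "2011-06-01T10:05:00Z" }
]);
console.log(resolution);
```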

Page 26: An Introduction to CouchDB (IPC11SE 2011-06-01)

CouchDB is Ground Computing

Page 27: An Introduction to CouchDB (IPC11SE 2011-06-01)

Imagine a world where every computer runs CouchDB

Page 28: An Introduction to CouchDB (IPC11SE 2011-06-01)

Ubuntu One already does, to sync bookmarks etc!

Page 29: An Introduction to CouchDB (IPC11SE 2011-06-01)

MAP/REDUCE

Page 30: An Introduction to CouchDB (IPC11SE 2011-06-01)

BASIC PRINCIPLE: MAPPER

• The Mapper reads records and emits <key, value> pairs

• Example: Apache access.log

• Each line is a record

• Extract client IP address and number of bytes transferred

• Emit IP address as key, number of bytes as value

• For hourly rotating logs, the job can be split across 24 nodes*

* In practice, it’s a lot smarter than that
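
The access.log mapper described above could look roughly like this. It assumes log lines in Apache's common log format; the sample lines, the regular expression and the `mapLine` name are made up for this sketch, and `emit` is passed in as a plain callback.

```javascript
// Matches: ip ident user [timestamp] "request" status bytes
const LOG_LINE = /^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)/;

// Map one record (one log line) to a <client IP, bytes> pair.
function mapLine(line, emit) {
  const m = LOG_LINE.exec(line);
  if (!m) return; // skip lines that are not access log entries
  const bytes = m[2] === "-" ? 0 : parseInt(m[2], 10); // "-" means no body sent
  emit(m[1], bytes);
}

// Invented sample input for illustration.
const sampleLog = [
  '212.122.174.13 - - [01/Jun/2011:10:00:01 +0200] "GET /index.html HTTP/1.1" 200 18271',
  '74.119.8.111 - - [01/Jun/2011:10:00:02 +0200] "GET /logo.png HTTP/1.1" 200 91272',
  '212.122.174.13 - - [01/Jun/2011:10:00:03 +0200] "GET /favicon.ico HTTP/1.1" 304 -',
  'not a log line'
];

const pairs = [];
sampleLog.forEach(line => mapLine(line, (key, value) => pairs.push([key, value])));
console.log(pairs);
```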

Page 31: An Introduction to CouchDB (IPC11SE 2011-06-01)

BASIC PRINCIPLE: REDUCER

• A Reducer is given a key and all values for this specific key

• Even if there are many Mappers on many computers, the results are aggregated before they are handed to Reducers

• Example: Apache access.log

• The Reducer is called once for each client IP (that’s our key), with a list of values (transferred bytes)

• We simply sum up the bytes to get the total traffic per IP!

Page 32: An Introduction to CouchDB (IPC11SE 2011-06-01)

EXAMPLE OF MAPPED INPUT

IP Bytes

212.122.174.13 18271

212.122.174.13 191726

212.122.174.13 198

74.119.8.111 91272

74.119.8.111 8371

212.122.174.13 43

Page 33: An Introduction to CouchDB (IPC11SE 2011-06-01)

REDUCER WILL RECEIVE THIS

IP Bytes

212.122.174.13 18271

212.122.174.13 191726

212.122.174.13 198

212.122.174.13 43

74.119.8.111 91272

74.119.8.111 8371

Page 34: An Introduction to CouchDB (IPC11SE 2011-06-01)

AFTER REDUCTION

IP Bytes

212.122.174.13 210238

74.119.8.111 99643
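
The three tables can be reproduced with a few lines of plain JavaScript. This is only a single-process sketch of the grouping and reduction steps, using the mapped pairs from the slides as input; `groupByKey` and `reduceBytes` are names chosen for this example.

```javascript
// Mapped input, exactly as listed on the "EXAMPLE OF MAPPED INPUT" slide.
const mapped = [
  ["212.122.174.13", 18271],
  ["212.122.174.13", 191726],
  ["212.122.174.13", 198],
  ["74.119.8.111", 91272],
  ["74.119.8.111", 8371],
  ["212.122.174.13", 43]
];

// Collect all values that share a key, as happens before the Reducer runs.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// The Reducer: one key, all of its values, one result.
function reduceBytes(key, values) {
  return values.reduce((total, bytes) => total + bytes, 0);
}

const totals = {};
for (const [ip, values] of groupByKey(mapped)) {
  totals[ip] = reduceBytes(ip, values);
}
console.log(totals);
```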

Page 35: An Introduction to CouchDB (IPC11SE 2011-06-01)

COUCHDB INCREMENTAL MAPREDUCE

Page 36: An Introduction to CouchDB (IPC11SE 2011-06-01)

THE KEY DIFFERENCE

• Maps and Reduces are incremental:

• If one document changes, only that one document needs:

• mapping

• reduction

• Then a few new reduce runs are performed to compute the final result
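
The idea of re-mapping only the changed document can be sketched as follows. This is a deliberately simplified model: `IncrementalCount` is a made-up class that caches the rows each document emitted, whereas CouchDB keeps map results and intermediate reductions in its view index. The point shown is only that a change triggers one map call, not a full re-run.

```javascript
// Simplified model of an incrementally maintained view (illustrative only).
class IncrementalCount {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rowsByDoc = new Map(); // document id -> rows that document emitted
    this.mapCalls = 0;
  }

  // Called for a new or changed document: only this document is re-mapped.
  update(doc) {
    const rows = [];
    this.mapCalls += 1;
    this.mapFn(doc, (key, value) => rows.push([key, value]));
    this.rowsByDoc.set(doc._id, rows);
  }

  // Sum the cached rows per key; no map function is run here.
  reduce() {
    const totals = {};
    for (const rows of this.rowsByDoc.values()) {
      for (const [key, value] of rows) {
        totals[key] = (totals[key] || 0) + value;
      }
    }
    return totals;
  }
}

const view = new IncrementalCount((doc, emit) => {
  (doc.tags || []).forEach(tag => emit(tag, 1));
});

view.update({ _id: "a", tags: ["x", "y"] });
view.update({ _id: "b", tags: ["x"] });
view.update({ _id: "c", tags: [] });
const before = view.reduce();

// Document "a" changes: one more map call, the other two are left alone.
view.update({ _id: "a", tags: ["y", "z"] });
const after = view.reduce();
console.log(before, after, view.mapCalls);
```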

Page 37: An Introduction to CouchDB (IPC11SE 2011-06-01)

MAPPER: DOCS BY TAGS

function(doc) {
    if(doc.type == 'product') {
        // one row per tag, with the whole document as the value
        (doc.tags || []).forEach(function(tag) {
            emit(tag, doc);
        });
    }
}

Page 38: An Introduction to CouchDB (IPC11SE 2011-06-01)

MAPREDUCE: COUNT TAGS

function(doc) {
    if(doc.type == 'product') {
        // one row per tag, each counting as 1
        (doc.tags || []).forEach(function(tag) {
            emit(tag, 1);
        });
    }
}

function(key, values) {
    return sum(values);
}

Or use "_sum" as the reduce: a built-in CouchDB function, very efficient
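
To try the tag count outside of CouchDB, the two functions can be run against a handful of sample documents. The harness below is an assumption-laden sketch: `emit` and `sum` are local stand-ins for the helpers CouchDB provides to view functions, the sample documents are invented, and the reduce is called once per tag with a single key, whereas CouchDB actually calls it as `(keys, values, rereduce)`.

```javascript
// Local stand-ins for the helpers CouchDB provides inside view functions.
const rows = [];
function emit(key, value) { rows.push({ key, value }); }
function sum(values) { return values.reduce((a, b) => a + b, 0); }

// The map and reduce functions from the slide.
const map = function (doc) {
  if (doc.type == 'product') {
    (doc.tags || []).forEach(function (tag) {
      emit(tag, 1);
    });
  }
};
const reduce = function (key, values) {
  return sum(values);
};

// Invented sample documents.
const docs = [
  { _id: "123", type: "product", name: "Laser Beam", tags: ["evil", "shiny"] },
  { _id: "817", type: "product", name: "Rocketship", tags: ["cool", "shiny"] },
  { _id: "441", type: "product", name: "Sharks" },           // no tags
  { _id: "est", type: "category", name: "Evil Stuff" }       // ignored by map
];

docs.forEach(doc => map(doc));

// Group the emitted values by tag, then reduce each group.
const byTag = {};
rows.forEach(({ key, value }) => {
  (byTag[key] = byTag[key] || []).push(value);
});
const counts = {};
Object.keys(byTag).forEach(tag => {
  counts[tag] = reduce(tag, byTag[tag]);
});
console.log(counts);
```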

Page 39: An Introduction to CouchDB (IPC11SE 2011-06-01)

BUT WAIT!

There are no tables :(

Page 40: An Introduction to CouchDB (IPC11SE 2011-06-01)

so... how do you join data from related documents?

Page 41: An Introduction to CouchDB (IPC11SE 2011-06-01)

JOIN PRODUCTS WITH THEIR CATEGORIES

function(doc) {
    if(doc.type == 'product') {
        emit([doc._id, 0], doc);
        // a value with an _id makes CouchDB return that document
        // when the view is queried with include_docs=true
        emit([doc._id, 1], { _id: doc.category_id });
    }
}

["123", 0]    {_id: "123", _rev: "5-a72", type: "product", "name": "Laser Beam"}
["123", 1]    {_id: "est", _rev: "2-9af", type: "category", "name": "Evil Stuff"}

["817", 0]    {_id: "817", _rev: "2-aa8", type: "product", "name": "Rocketship"}
["817", 1]    {_id: "cst", _rev: "3-d8a", type: "category", "name": "Cool Stuff"}

["441", 0]    {_id: "441", _rev: "19-fdf", type: "product", "name": "Sharks"}
["441", 1]    {_id: "est", _rev: "2-9af", type: "category", "name": "Evil Stuff"}
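
How the category ends up next to its product can be simulated in a few lines. The sketch assumes the view is queried with `include_docs=true`, in which case a row whose value carries an `_id` gets that document attached instead of the emitting one. The in-memory `db` object, the `emit` stand-in and the sample data are made up for illustration.

```javascript
// Invented two-document database, keyed by _id.
const db = {
  "123": { _id: "123", type: "product", name: "Laser Beam", category_id: "est" },
  "est": { _id: "est", type: "category", name: "Evil Stuff" }
};

// Stand-in for CouchDB's emit(); records which document emitted the row.
const rows = [];
let currentId = null;
function emit(key, value) { rows.push({ id: currentId, key, value }); }

// The map function from the slide.
function map(doc) {
  if (doc.type == 'product') {
    emit([doc._id, 0], doc);
    emit([doc._id, 1], { _id: doc.category_id });
  }
}

for (const doc of Object.values(db)) {
  currentId = doc._id;
  map(doc);
}

// Simulated include_docs=true: prefer the document named by value._id,
// fall back to the document that emitted the row.
const withDocs = rows.map(row => {
  const linkedId = row.value && row.value._id ? row.value._id : row.id;
  return { ...row, doc: db[linkedId] };
});

withDocs.forEach(row => console.log(JSON.stringify(row.key), row.doc.name));
```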

Page 42: An Introduction to CouchDB (IPC11SE 2011-06-01)

JOIN CATEGORIES WITH ALL THEIR PRODUCTS

function(doc) {
    if(doc.type == 'category') {
        // the category itself sorts first within its group
        emit([doc._id, 0], doc);
    } else if(doc.type == 'product') {
        // each product sorts under its category
        emit([doc.category_id, doc._id], doc);
    }
}

["est", 0]        {_id: "est", _rev: "2-9af", type: "category", "name": "Evil Stuff"}
["est", "123"]    {_id: "123", _rev: "5-a72", type: "product", "name": "Laser Beam"}
["est", "441"]    {_id: "441", _rev: "19-fdf", type: "product", "name": "Sharks"}

["cst", 0]        {_id: "cst", _rev: "3-d8a", type: "category", "name": "Cool Stuff"}
["cst", "817"]    {_id: "817", _rev: "2-aa8", type: "product", "name": "Rocketship"}
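
The trick relies on key ordering: in CouchDB's view collation numbers sort before strings, so `[category, 0]` always comes before `[category, "some product id"]`. The sketch below imitates that with a simplified comparison that only handles numbers and plain ASCII strings (CouchDB's real collation covers more types and uses Unicode rules for strings). `compareKeys`, the `emit` stand-in and the sample documents are assumptions for this example.

```javascript
// Simplified key comparison: element by element, numbers before strings.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i], y = b[i];
    if (typeof x !== typeof y) return typeof x === "number" ? -1 : 1;
    if (x < y) return -1;
    if (x > y) return 1;
  }
  return a.length - b.length;
}

const rows = [];
function emit(key, value) { rows.push({ key, value }); }

// The map function from the slide.
function map(doc) {
  if (doc.type == 'category') {
    emit([doc._id, 0], doc);
  } else if (doc.type == 'product') {
    emit([doc.category_id, doc._id], doc);
  }
}

// Invented sample documents, deliberately in mixed order.
[
  { _id: "441", type: "product", name: "Sharks", category_id: "est" },
  { _id: "cst", type: "category", name: "Cool Stuff" },
  { _id: "123", type: "product", name: "Laser Beam", category_id: "est" },
  { _id: "est", type: "category", name: "Evil Stuff" },
  { _id: "817", type: "product", name: "Rocketship", category_id: "cst" }
].forEach(doc => map(doc));

rows.sort((r1, r2) => compareKeys(r1.key, r2.key));
const listing = rows.map(r => `${JSON.stringify(r.key)} ${r.value.name}`);
console.log(listing.join("\n"));
```

Because each category row is immediately followed by its products, a single key-range query on the view should be enough to fetch a category together with everything in it.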

Page 43: An Introduction to CouchDB (IPC11SE 2011-06-01)

BUT... BUT... WAIT!

How to guarantee a document's structure if it’s all schema-less?

Page 44: An Introduction to CouchDB (IPC11SE 2011-06-01)

VALIDATE DOCUMENTS

function (newDoc, savedDoc, userCtx) {

    // reject updates that change created_at on an existing document
    if(savedDoc && savedDoc.created_at != newDoc.created_at) {
        throw({forbidden: 'created_at is immutable'});
    }

    // every product needs a price
    if(newDoc.type == 'product') {
        if(!newDoc.price) {
            throw({forbidden: 'product must have a price'});
        }
    }

}

Page 45: An Introduction to CouchDB (IPC11SE 2011-06-01)

VALIDATE DOCUMENTS

function (newDoc, savedDoc, userCtx) {

    // throws a "forbidden" error unless the condition holds
    function require(beTrue, message) {
        if(!beTrue) throw({forbidden: message});
    }

    require(!savedDoc || savedDoc.created_at == newDoc.created_at,
        'created_at is immutable'
    );

    if(newDoc.type == 'product') {
        require(newDoc.price,
            'product must have a price'
        );
    }

}
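
A validation function like this can be exercised outside of CouchDB by calling it directly and catching what it throws. The harness below is a sketch: `validate` restates the rules from the slide (with the helper renamed to `check` to avoid clashing with Node's own `require`), and `tryWrite`, the sample documents and the `userCtx` value are made up for illustration.

```javascript
// Same rules as on the slide, written as a named function for testing.
function validate(newDoc, savedDoc, userCtx) {
  function check(beTrue, message) {
    if (!beTrue) throw { forbidden: message };
  }
  check(!savedDoc || savedDoc.created_at == newDoc.created_at,
    'created_at is immutable');
  if (newDoc.type == 'product') {
    check(newDoc.price, 'product must have a price');
  }
}

// Hypothetical helper: returns "ok" or the rejection message.
function tryWrite(newDoc, savedDoc) {
  try {
    validate(newDoc, savedDoc, { name: null, roles: [] });
    return "ok";
  } catch (e) {
    if (e && e.forbidden) return e.forbidden;
    throw e; // anything else is a real bug, not a validation failure
  }
}

const saved = { _id: "123", type: "product", price: 100, created_at: "2011-06-01" };

const results = [
  tryWrite({ _id: "123", type: "product", price: 120, created_at: "2011-06-01" }, saved),
  tryWrite({ _id: "123", type: "product", price: 120, created_at: "2011-06-02" }, saved),
  tryWrite({ _id: "999", type: "product", created_at: "2011-06-01" }, null),
  tryWrite({ _id: "est", type: "category", created_at: "2011-06-01" }, null)
];
console.log(results);
```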

Page 46: An Introduction to CouchDB (IPC11SE 2011-06-01)

LUCENE INTEGRATION

Full Control Over What Is Indexed, And How

Page 47: An Introduction to CouchDB (IPC11SE 2011-06-01)

COUCHAPP

Python Tool For Development And Deployment

Page 48: An Introduction to CouchDB (IPC11SE 2011-06-01)

DEMO TIME

Let’s Relax On The Couch

Page 49: An Introduction to CouchDB (IPC11SE 2011-06-01)

The End

Page 50: An Introduction to CouchDB (IPC11SE 2011-06-01)

FURTHER READING

• http://guide.couchdb.org/

• http://couchdb.apache.org/

• http://github.com/couchapp/couchapp

• http://github.com/rnewson/couchdb-lucene/

• http://www.couchbase.com/downloads/

• http://j.mp/oqbQs (E4X in CouchDB for XML parsing)

Page 51: An Introduction to CouchDB (IPC11SE 2011-06-01)

Questions?

Page 52: An Introduction to CouchDB (IPC11SE 2011-06-01)

THANK YOU!

This was http://joind.in/3521

by @dzuelke