NOSQL for Dummies Tobias Ivarsson Hacker @ Neo Technology twitter: @thobe / #neo4j email: [email protected] web: http://www.neo4j.org / web: http://www.thobe.org /

Sep 08, 2014


NOSQL introduction/overview session presented at Miracle Open World 2010, at Hotel Legoland in Denmark.
Transcript
Page 2: NOSQL for Dummies

Image credit: http://browsertoolkit.com/fault-tolerance.png

This is still the view a lot of people have of NOSQL.

Page 3: NOSQL for Dummies

NOSQL - Defined by what it is Not


๏“Any database that is not a Relational Database”

๏The term was coined at a meetup with the creators behind some prominent emerging databases

๏“Non-Relational Databases” might be more correct, but it’s a mouthful!

๏ ... then there was a conference ...

๏ ... and a mailing list ...

๏ ... the name caught on ...

๏ ... then there were more conferences ...

๏ ... and here we are!

Page 4: NOSQL for Dummies

NOSQL: What’s in the name...

Page 5: NOSQL for Dummies

NO to SQL

It’s not about saying that SQL should never be used, or that SQL is dead...

Page 6: NOSQL for Dummies

Not Only SQL

It’s about recognizing that for some problems other storage solutions are better suited!

Page 7: NOSQL for Dummies

NOSQL - Why now?

Four trends

Page 8: NOSQL for Dummies

Trend 1: Data size

[Chart: exabytes (10¹⁸ bytes) of data stored per year. 2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988. Data source: IDC 2007]

Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.

Page 9: NOSQL for Dummies

Trend 2: Connectedness

[Figure: information connectivity rising over time, 1990 to 2020, across web 1.0, web 2.0 and “web 3.0”. Labels in the figure: text documents, hypertext, blogs, RSS, wikis, user-generated content, tagging, folksonomies, ontologies, RDF, Giant Global Graph (GGG)]

Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingback, tagging groups all related data.

Page 10: NOSQL for Dummies

Trend 3: Semi-structure


๏ Individualization of content

• In the salary lists of the 1970s, all elements had exactly one job

• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?

๏All encompassing “entire world views”

• Store more data about each entity

๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
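The salary-list point can be made concrete with a small Python sketch. The names, job titles and the three-job limit are made up for illustration; the only claim is that a fixed layout has to guess the maximum number of jobs, while a semi-structured record stores as many as each person has.

```python
# Fixed schema: one row layout for everyone, with a guessed number of job columns.
FIXED_COLUMNS = ("name", "job_1", "job_2", "job_3")
MAX_JOBS = len(FIXED_COLUMNS) - 1


def to_fixed_row(name, jobs):
    """Fit a person into the fixed layout, failing when the guess was too low."""
    if len(jobs) > MAX_JOBS:
        raise ValueError(f"{name} has {len(jobs)} jobs, the schema allows {MAX_JOBS}")
    padding = [None] * (MAX_JOBS - len(jobs))
    return (name, *jobs, *padding)


def to_record(name, jobs, **extra):
    """Semi-structured: each record carries exactly the fields it needs."""
    return {"name": name, "jobs": list(jobs), **extra}


# Hypothetical people: one job fits the 1970s layout, four jobs do not.
clerk_1970s = to_fixed_row("Alice", ["clerk"])
freelancer = to_record(
    "Bob",
    ["developer", "teacher", "writer", "musician"],
    blog="http://example.org/bob",
)
```

Adding a fifth job to the record needs no schema change; adding it to the fixed layout means another column for every row.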

Page 11: NOSQL for Dummies

Trend 4: Architecture

1980s: Mainframe applications

[Diagram: one Application connected to one DB]

Page 12: NOSQL for Dummies

Trend 4: Architecture

1990s: Database as integration hub

[Diagram: three Applications sharing one DB]

Page 13: NOSQL for Dummies

Trend 4: Architecture

2000s: (moving towards) Decoupled services with their own backend

[Diagram: three Applications, each with its own DB]

Page 14: NOSQL for Dummies

Why NOSQL Now?

๏Trend 1: Size

๏Trend 2: Connectedness

๏Trend 3: Semi-structure

๏Trend 4: Architecture


Page 15: NOSQL for Dummies

RDBMS performance

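[Chart: performance (y-axis) against data complexity (x-axis), showing the relational database curve and the requirement of the application. Labels along the complexity axis: salary list, majority of webapps, social network, semantic trading, custom]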

We are building applications today that have size and load requirements that

Page 16: NOSQL for Dummies

Four emerging NOSQL categories


Page 17: NOSQL for Dummies

Key-Value stores


๏Focus on scaling to huge amounts of data

๏Designed to handle massive load

๏Based on Amazon’s Dynamo paper

๏Data model: (global) collection of Key-Value pairs

๏Dynamo ring partitioning and replication

๏Examples:

•Dynomite

•Voldemort

•Tokyo{Tyrant, Cabinet, etc...}

Page 18: NOSQL for Dummies

Key-Value stores

[Diagram: a ring with positions labeled A to G]

We find the position of each object by its key. Here the keys are the names of the objects, alphabetically sorted. Each object is replicated in a few other stores for redundancy; in this example we use 3 replicas.
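The placement rule described here could be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm of any particular product: the labels A to G are treated as store names, a key's position is simply its first letter (as on the slide), and the replicas are the next stores clockwise. Dynamo-style systems typically hash the key instead of using its alphabetical position.

```python
from bisect import bisect_left

STORES = ["A", "B", "C", "D", "E", "F", "G"]  # assumed store names, sorted around the ring
REPLICAS = 3  # the slide's example uses 3 replicas


def stores_for(key, stores=STORES, replicas=REPLICAS):
    """Return the stores holding `key`: the first store at or after the key's
    position on the ring, followed by its clockwise successors."""
    position = key[:1].upper()
    # Keys that sort after the last store wrap around to the first one.
    start = bisect_left(stores, position) % len(stores)
    return [stores[(start + i) % len(stores)] for i in range(replicas)]
```

With these assumptions a key such as "flounder" lands on store F and is replicated to G and A, so losing any single store still leaves two copies.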

Page 23: NOSQL for Dummies

BigTable clones

๏Like column oriented Relational Databases, but with a twist

๏Tables similar to an RDBMS, but handle semi-structured data

๏Based on Google’s BigTable paper

๏Data model:

‣Columns → column families → ACL

‣Datums keyed by: row, column, time, index

‣Row-range → tablet → distribution

๏Examples:

•HBase

•Hypertable

•Cassandra
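A toy version of the "datums keyed by row, column, time" idea, as a sketch only. The class name, the "family:qualifier" column naming and the sample data are assumptions for illustration; the slide's "index" dimension, ACLs and tablet distribution are left out.

```python
from collections import defaultdict


class SparseTable:
    """Toy table whose cells are addressed by (row, column, timestamp)."""

    def __init__(self):
        # row -> "family:qualifier" -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._rows.get(row, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def columns(self, row):
        """Rows are semi-structured: each one only has the columns written to it."""
        return sorted(self._rows.get(row, {}))


# Hypothetical sample data.
table = SparseTable()
table.put("thobe", "profile:name", 1, "Tobias")
table.put("thobe", "profile:name", 2, "Tobias Ivarsson")
table.put("mary", "contact:email", 1, "mary@example.org")
```

Two rows in the same table carry different columns, and old versions of a cell stay readable by timestamp.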

Page 24: NOSQL for Dummies

Document databases

๏Similar to Key-Value stores, but the DB knows what the Value is

๏ Inspired by Lotus Notes

๏Data model: Collections of Key-Value collections

๏Documents are often versioned

๏Examples:

•CouchDB

•MongoDB

•Redis
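What "the DB knows what the Value is" buys you can be sketched in a few lines. This is a minimal in-memory illustration, not the API of CouchDB or MongoDB; the class, its methods and the sample documents are hypothetical. Because values are structured documents rather than opaque blobs, the store can match on fields inside them, and it can keep earlier versions.

```python
class DocumentCollection:
    """Toy collection of key-value documents, each kept with its version history."""

    def __init__(self):
        self._docs = {}  # key -> list of versions, newest last

    def save(self, key, document):
        """Store a new version and return its version number (starting at 1)."""
        self._docs.setdefault(key, []).append(dict(document))
        return len(self._docs[key])

    def get(self, key, version=None):
        versions = self._docs[key]
        return versions[-1] if version is None else versions[version - 1]

    def find(self, **criteria):
        """Keys whose newest version matches every field in `criteria`."""
        return sorted(
            key
            for key, versions in self._docs.items()
            if all(versions[-1].get(f) == v for f, v in criteria.items())
        )


# Hypothetical sample documents.
posts = DocumentCollection()
posts.save("hello", {"title": "Hello Joe", "author": "thobe"})
posts.save("hello", {"title": "Hello Joe", "author": "thobe", "tags": ["intro"]})
posts.save("jython", {"title": "Modularizing Jython", "author": "thobe"})
```

A plain key-value store could only answer `get("hello")`; the `find(author="thobe")` query is what the document model adds.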


Page 25: NOSQL for Dummies

Graph databases

๏Focus on modeling the structure of data - interconnectivity

๏Scales to the complexity of the data

๏Inspired by mathematical Graph Theory ( G=(E,V) )

๏Data model: “Property Graph”

‣Nodes

‣Relationships/Edges between Nodes (first class)

‣Key-Value pairs on both

‣Possibly Edge Labels and/or Node/Edge Types

๏Examples:

•Neo4j

•AllegroGraph

•Sones graphDB

Page 31: NOSQL for Dummies

Property Graph model

[Diagram: three nodes joined by relationships labeled LIVES WITH, LOVES (twice), OWNS and DRIVES. Node properties: name: “James”, age: 32, twitter: “@spam”; name: “Mary”, age: 35; brand: “Volvo”, model: “V70”. One further label reads property type: “car”]

•Nodes

•Relationships between Nodes

•Relationships have Labels

•Relationships are directed, but traversed at equal speed in both directions

•The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)

•Nodes have key-value properties

•Relationships have key-value properties
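The model in these bullets fits in a short Python sketch. This is not the Neo4j API; the classes and helper functions are hypothetical. The property values come from the slide, but which node each relationship starts and ends at is an assumption, since the transcript only preserves the labels.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison; avoids recursing through the cyclic graph
class Node:
    properties: dict
    outgoing: list = field(default_factory=list)
    incoming: list = field(default_factory=list)


@dataclass(eq=False)
class Relationship:
    start: Node
    label: str
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, label, end, properties=None):
    """Create a directed, labeled relationship and register it on both ends."""
    rel = Relationship(start, label, end, properties or {})
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def neighbours(node, label, direction="both"):
    """Follow relationships with `label`; either direction is an equally cheap list scan."""
    found = []
    if direction in ("out", "both"):
        found += [r.end for r in node.outgoing if r.label == label]
    if direction in ("in", "both"):
        found += [r.start for r in node.incoming if r.label == label]
    return found


james = Node({"name": "James", "age": 32, "twitter": "@spam"})
mary = Node({"name": "Mary", "age": 35})
volvo = Node({"brand": "Volvo", "model": "V70"})

# Assumed endpoints for the slide's relationship labels.
relate(james, "LIVES WITH", mary)
relate(james, "LOVES", mary)
relate(mary, "LOVES", james)
owns = relate(mary, "OWNS", volvo, {"property type": "car"})
relate(james, "DRIVES", volvo)
```

LIVES WITH is stored once and read from either end with `direction="both"`, while LOVES is stored once per direction because the application treats its direction as meaningful.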

Page 34: NOSQL for Dummies

Graphs are whiteboard friendly

[Figure: whiteboard sketch of a domain model. Labels in the sketch: thobe, Wardrobe Strength, Joe project blog, Hello Joe, Neo4j performance analysis, Modularizing Jython]

Image credits: Tobias Ivarsson

An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.

Page 35: NOSQL for Dummies

Four emerging NOSQL categories

๏Key-Value stores

๏BigTable clones

๏Document databases

๏Graph databases


Page 36: NOSQL for Dummies

... and one that’s been around for a while

๏Object databases

•Neither gaining nor losing traction

•Not part of the NOSQL community

• Still a good solution to a lot of problems

• Focuses on matching object oriented programming paradigm

‣Simplicity to integrate

‣Ease of use


Page 38: NOSQL for Dummies

Scaling to size vs. Scaling to complexity

[Chart: size against complexity, with the four categories plotted in this order: Key/Value stores, Bigtable clones, Document databases, Graph databases. Annotations: “> 90% of use cases” and “Billions of nodes and relationships”]

Page 39: NOSQL for Dummies

Who is NOSQL?


A healthy mix of big players and independent vendors.

Page 40: NOSQL for Dummies

“Ok, it’s not a database. How do I query it?”


๏RESTful interfaces (HTTP as an access API)

๏Query languages other than SQL

•GQL - SQL-like QL for Google BigTable

• SPARQL - Query language for the Semantic Web

•Gremlin - the graph traversal language

• Sones Graph Query Language

๏Query APIs

•The Google BigTable DataStore API

•The Neo4j Traversal API

Page 41: NOSQL for Dummies

Why is the database RESTing?

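[Diagram: four hosts, http://one/, http://two/, http://three/ and http://four/. The resource http://one/fishie holds the text below, which links to a resource on another host.]

My best friend is http://three/flounder!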

Because hyperlinks make it possible to reference data on different hosts without hassle.

RESTful is really all about hypermedia!
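The hyperlink point can be sketched without a network. The four hosts are modeled as in-memory dictionaries; the host names and the fishie text come from the slide, while the content of the flounder resource, the link pattern and the function names are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Four hypothetical hosts, each a dict of path -> document text.
HOSTS = {
    "one": {"/fishie": "My best friend is http://three/flounder!"},
    "two": {},
    "three": {"/flounder": "Hello from host three"},
    "four": {},
}

# Simplified link pattern, good enough for the URLs in this example.
LINK = re.compile(r"http://[\w.-]+/[\w/-]*")


def fetch(url):
    """Stand-in for an HTTP GET: look the path up on the named host."""
    parts = urlsplit(url)
    return HOSTS[parts.netloc][parts.path]


def linked_documents(url):
    """Follow every hyperlink found in the document at `url`."""
    return {link: fetch(link) for link in LINK.findall(fetch(url))}
```

The document on host one never needs to know how host three stores its data; the URL alone is enough to reach it.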

Page 42: NOSQL for Dummies

How about Data Manipulation?

๏RESTful interfaces again (HTTP PUT, POST, DELETE)

๏Data Manipulation APIs

•Google BigTable DataStore API

•Neo4j GraphDatabase API

๏Serialization Formats

• JSON

•Thrift

• ProtoBuffers

•RDF
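As a rough illustration of RESTful manipulation with JSON as the serialization format, the sketch below only builds the HTTP requests and does not send them. The endpoint URL and the helper names are hypothetical and do not correspond to any specific database's REST API.

```python
import json
from urllib.request import Request

BASE = "http://localhost:8080/people"  # hypothetical endpoint, for illustration only


def put_document(key, document):
    """Build a PUT request that stores `document` as JSON under `key`."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{BASE}/{key}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def delete_document(key):
    """Build a DELETE request for the resource stored under `key`."""
    return Request(f"{BASE}/{key}", method="DELETE")


request = put_document("james", {"name": "James", "age": 32})
```

Passing such a request to `urllib.request.urlopen` would send it; the resource URL identifies the data, and the HTTP method says what to do with it.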


Page 43: NOSQL for Dummies

NOSQL in the Enterprise

๏Availability

๏Security

๏Correctness

๏Performance

This presentation does not cover Security. The interesting parts of Security are an application-layer issue anyway.

Page 44: NOSQL for Dummies

Availability

๏Replication

•Write to many

• (Multi-)Master to Slave replication

๏Master reelection

๏Failover

• Either by another machine taking over

• or by the client knowing to attempt a replica
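Client-side failover, the second option above, might look like this in outline. The `Store` class and the `StoreDown` exception are hypothetical stand-ins for real connections, and the sketch assumes every replica holds the same data.

```python
class StoreDown(Exception):
    """Raised when a store cannot be reached."""


class Store:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise StoreDown(self.name)
        return self.data[key]


def read_with_failover(key, stores):
    """Try the preferred store first, then each replica in turn.
    Returns the value and the name of the store that answered."""
    for store in stores:
        try:
            return store.read(key), store.name
        except StoreDown:
            continue  # this one is unavailable, attempt the next replica
    raise StoreDown("all replicas unavailable")
```

The client needs the list of replicas up front, which is the price of not relying on another machine to take over.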


Page 45: NOSQL for Dummies

Correctness

๏Brewer’s CAP theorem

•Most NOSQL databases sacrifice Consistency

‣Some use “read-correction”, treat read values as votes

๏Some NOSQL databases don’t have transactions

• Instead they have only atomic single operations

•This makes some operations impossible to implement
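One simple reading of "treat read values as votes" is a majority vote across replicas, followed by overwriting the replicas that lost. The sketch below assumes replicas are plain dictionaries with hashable values; real systems usually compare versions or timestamps as well, which is not shown here.

```python
from collections import Counter


def read_with_correction(key, replicas):
    """Read `key` from every replica, take the most common value, and
    write it back to replicas that disagreed or were missing the key."""
    votes = Counter(replica[key] for replica in replicas if key in replica)
    if not votes:
        raise KeyError(key)
    winner, _count = votes.most_common(1)[0]
    for replica in replicas:
        if replica.get(key) != winner:
            replica[key] = winner  # the "correction" step
    return winner
```

Each write here is a single atomic operation on one replica; nothing ties the corrections together into a transaction, which is the limitation the last bullets point at.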


Page 46: NOSQL for Dummies

Performance

๏This is where all the focus seems to be

๏A surprising number sacrifice Durability for performance

•On-disk durability

•Multiple-replicas durability

๏All NOSQL databases outperform RDBMSes

• ... in their particular niche ...


Page 47: NOSQL for Dummies

One database to rule them all

Image credits: The Lord of the Rings, New Line Cinema

Up until recently there was only one Database, the RDBMS. The days of a single database that rules them all are over.

Page 48: NOSQL for Dummies

Use best suited storage for each kind of data

The era of using RDBMSes for all problems is over. Instead we should use the database most suited for the problem at hand.

Page 49: NOSQL for Dummies

Polyglot persistence


... we could even use multiple databases in conjunction, and let each database handle the things it does best.

Page 50: NOSQL for Dummies

Polyglot persistence

SQL && NOSQL

All databases are welcome! SQL and NOSQL - it is Not Only SQL!

Page 51: NOSQL for Dummies

Summary

๏Two steps forward ( but first one step back... )

๏The era of a single DBMS is over

๏Use the right tool for the right job

๏Polyglot persistence happens already, and will grow more common

๏Solves different scalability issues

• Scale to size - huge amounts of data, many many machines

• Scale to complexity - handle complicated schemas, avoid being bogged down by deep JOINs

๏Driven by big players and independent vendors - healthy community


Page 52: NOSQL for Dummies

Open source implementations to play with!

๏Neo4j - talk to me, or visit http://neo4j.org/

๏CouchDB - http://couchdb.apache.org/

๏Cassandra - http://cassandra.apache.org/

๏Hadoop + HBase (clones GFS + BigTable) - http://hadoop.apache.org/

๏MongoDB - http://www.mongodb.org/

๏Redis - http://code.google.com/p/redis/

๏Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/

๏FlockDB - http://github.com/twitter/flockdb

๏ ... and more ...

Page 53: NOSQL for Dummies

http://neotechnology.com