NOSQL for Dummies
Tobias Ivarsson
Hacker @ Neo Technology
twitter: @thobe / #neo4j
email: [email protected]
web: http://www.neo4j.org/
web: http://www.thobe.org/
Sep 08, 2014
Image credit: http://browsertoolkit.com/fault-tolerance.png
This is still the view a lot of people have of NOSQL.
NOSQL - Defined by what it is Not
๏“Any database that is not a Relational Database”
๏The term was coined at a meetup with the creators behind some prominent emerging databases
๏“Non-Relational Databases” might be more correct, but it’s a mouthful!
๏ ... then there was a conference ...
๏ ... and a mailing list ...
๏ ... the name caught on ...
๏ ... then there were more conferences ...
๏ ... and here we are!
NOSQL: What’s in the name...
NO to SQL
It’s not about saying that SQL should never be used, or that SQL is dead...
Not Only SQL
It’s about recognizing that for some problems other storage solutions are better suited!
NOSQL - Why now?
Four trends
Trend 1: Data size
Chart: exabytes (10¹⁸ bytes) of data stored per year - 2006: 161, 2007: 253, 2008: 397, 2009: 623, 2010: 988
Data source: IDC 2007
Each year more and more digital data is created. Over two years we create more digital data than all the data created in history before that.
Trend 2: Connectedness
Figure: information connectivity plotted over time (1990, 2000, 2010, 2020; web 1.0, web 2.0, “web 3.0”). Milestones shown: Text documents, Hypertext, RSS, Blogs, Wikis, User-generated content, Tagging, Folksonomies, Ontologies, RDF, Giant Global Graph (GGG).
Over time data has evolved to be more and more interlinked and connected. Hypertext has links, blogs have pingback, tagging groups all related data.
Trend 3: Semi-structure
๏ Individualization of content
• In the salary lists of the 1970s, all elements had exactly one job
• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
๏All encompassing “entire world views”
• Store more data about each entity
๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
Trend 4: Architecture
1980s: Mainframe applications
Figure: one Application on top of one DB
1990s: Database as integration hub
Figure: several Applications sharing one DB
2000s: (moving towards) Decoupled services with their own backend
Figure: several Applications, each with its own DB
Why NOSQL Now?
๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture
RDBMS performance
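Figure: Performance (vertical axis) plotted against Data complexity (horizontal axis), comparing the curve of a Relational database with the Requirement of application. Application types marked in the figure: Salary List, Majority of Webapps, Social network, Semantic Trading, custom.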
We are building applications today that have size and load requirements that
Four emerging NOSQL categories
Key-Value stores
๏Focus on scaling to huge amounts of data
๏Designed to handle massive load
๏Based on Amazon’s Dynamo paper
๏Data model: (global) collection of Key-Value pairs
๏Dynamo ring partitioning and replication
๏Examples:
•Dynomite
•Voldemort
•Tokyo{Tyrant, Cabinet, etc...}
Figure: a ring of stores labeled A to G.
We find the position of each object by its key. Here the keys are the names of the objects, alphabetically sorted. Each object is replicated in a few other stores for redundancy; in this example we use 3 replicas.
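The ring lookup described above can be sketched in a few lines of Python. This is only an illustration, not the code of Dynamo or of any of the stores listed: the Ring class and its preference_list method are hypothetical names, the stores are named A to G as in the figure, and keys are placed by their first letter (as on the slide) instead of by a hash of the key, which is what real Dynamo-style systems typically use.

```python
from bisect import bisect_left


class Ring:
    """Toy Dynamo-style ring: stores sit on a circle, sorted by name."""

    def __init__(self, stores, replicas=3):
        self.stores = sorted(stores)
        self.replicas = replicas

    def preference_list(self, key):
        """Return the stores that should hold `key`.

        Assumption for illustration: the position of a key is its first
        letter, so "Carrot" lands on store C. The following stores on the
        ring (wrapping around after the last one) hold the extra replicas.
        """
        start = bisect_left(self.stores, key[:1].upper()) % len(self.stores)
        return [
            self.stores[(start + offset) % len(self.stores)]
            for offset in range(self.replicas)
        ]


ring = Ring("ABCDEFG", replicas=3)
for name in ("Carrot", "fig", "Zebra"):
    print(name, "->", ring.preference_list(name))
```

Because every client can compute the same list from the key alone, no central directory is needed to find an object.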
BigTable clones
๏Like column oriented Relational Databases, but with a twist
๏Tables similar to an RDBMS, but handles semi-structured data
๏Based on Google’s BigTable paper
๏Data model:
‣Columns → column families → ACL
‣Datums keyed by: row, column, time, index
‣Row-range → tablet → distribution
๏Examples:
•HBase
•Hypertable
•Cassandra
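To make the data model above concrete, here is a minimal in-memory sketch in Python of cells keyed by row, column and time, plus a row-range scan. The TinyTable class and its methods are hypothetical names chosen for this illustration; they are not the API of BigTable, HBase, Hypertable or Cassandra, and column families, ACLs and tablet distribution are left out.

```python
class TinyTable:
    """Toy table: each cell is addressed by (row, column) and keeps versions."""

    def __init__(self):
        # (row, "family:qualifier") -> list of (timestamp, value), newest first
        self._cells = {}

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column, at=None):
        """Return the newest value, or the newest value not later than `at`."""
        for timestamp, value in self._cells.get((row, column), []):
            if at is None or timestamp <= at:
                return value
        return None

    def row_range(self, start, end):
        """Return the sorted row keys in [start, end)."""
        return sorted({row for row, _ in self._cells if start <= row < end})


table = TinyTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 2, "<html>v2</html>")
table.put("org.example/about", "contents:html", 1, "<html>about</html>")
print(table.get("com.example/index", "contents:html"))
print(table.get("com.example/index", "contents:html", at=1))
print(table.row_range("com", "org"))
```

Keeping rows sorted by key is what makes it natural to cut a table into contiguous row ranges and hand each range to a different server.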
Document databases
๏Similar to Key-Value stores, but the DB knows what the Value is
๏ Inspired by Lotus Notes
๏Data model: Collections of Key-Value collections
๏Documents are often versioned
๏Examples:
•CouchDB
•MongoDB
•Redis
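The difference from a plain Key-Value store can be sketched as follows, assuming a toy in-memory store written in Python. DocumentStore, put and find are hypothetical names for this illustration and do not reflect the API of CouchDB, MongoDB or Redis. Because the store understands the value as a collection of key-value pairs, it can search inside documents, and it keeps a revision number per document.

```python
class DocumentStore:
    """Toy document store: values are dicts, and each document is versioned."""

    def __init__(self):
        self._docs = {}  # document id -> (revision, document)

    def put(self, doc_id, doc, expected_rev=None):
        """Store a document and return its new revision number.

        If `expected_rev` is given and does not match the stored revision,
        the write is rejected (a simple optimistic concurrency check).
        """
        current_rev = self._docs.get(doc_id, (0, None))[0]
        if expected_rev is not None and expected_rev != current_rev:
            raise ValueError("revision conflict")
        self._docs[doc_id] = (current_rev + 1, dict(doc))
        return current_rev + 1

    def find(self, **criteria):
        """Return the ids of documents whose fields match all criteria."""
        return sorted(
            doc_id
            for doc_id, (_, doc) in self._docs.items()
            if all(doc.get(key) == value for key, value in criteria.items())
        )


store = DocumentStore()
store.put("post-1", {"type": "post", "author": "thobe"})
store.put("comment-1", {"type": "comment", "author": "joe"})
print(store.find(type="post"))
print(store.put("post-1", {"type": "post", "author": "thobe", "draft": False},
                expected_rev=1))
```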
Graph databases
๏Focus on modeling the structure of data - interconnectivity
๏Scales to the complexity of the data
๏Inspired by mathematical Graph Theory ( G=(V,E) )
๏Data model: “Property Graph”
‣Nodes
‣Relationships/Edges between Nodes (first class)
‣Key-Value pairs on both
‣Possibly Edge Labels and/or Node/Edge Types
๏Examples:
•Neo4j
•AllegroGraph
•Sones graphDB
Property Graph model
Figure: an example graph with three nodes (name: “James”, age: 32, twitter: “@spam”; name: “Mary”, age: 35; brand: “Volvo”, model: “V70”), connected by relationships labeled LIVES WITH, LOVES (twice), OWNS and DRIVES. One relationship carries the property property type: “car”.
•Nodes
•Relationships between Nodes
•Relationships have Labels
•Relationships are directed, but traversed at equal speed in both directions
•The semantics of the direction is up to the application (LIVES WITH is symmetric, LOVES is not)
•Nodes have key-value properties
•Relationships have key-value properties
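A minimal sketch of this model in Python may help. It is not the Neo4j API: Node, Relationship, relate and neighbours are hypothetical names, and the endpoints of each relationship below (who OWNS and who DRIVES the car, and which relationship carries the property) are assumptions made for the example, since the extracted slide text lists only the labels and properties.

```python
class Node:
    def __init__(self, **properties):
        self.properties = properties
        self.outgoing = []  # relationships that start at this node
        self.incoming = []  # relationships that end at this node


class Relationship:
    def __init__(self, start, label, end, **properties):
        self.start, self.label, self.end = start, label, end
        self.properties = properties


def relate(start, label, end, **properties):
    """Create a directed, labeled relationship and index it on both nodes."""
    rel = Relationship(start, label, end, **properties)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel


def neighbours(node, label, direction="both"):
    """Follow relationships with `label` in the given direction(s)."""
    found = []
    if direction in ("out", "both"):
        found += [rel.end for rel in node.outgoing if rel.label == label]
    if direction in ("in", "both"):
        found += [rel.start for rel in node.incoming if rel.label == label]
    return found


james = Node(name="James", age=32, twitter="@spam")
mary = Node(name="Mary", age=35)
car = Node(brand="Volvo", model="V70")

relate(james, "LIVES WITH", mary)  # stored once, read in both directions
relate(james, "LOVES", mary)
relate(mary, "LOVES", james)       # LOVES needs one relationship per direction
relate(mary, "OWNS", car, property_type="car")  # assumed endpoints
relate(james, "DRIVES", car)                    # assumed endpoints

print([n.properties["name"] for n in neighbours(mary, "LIVES WITH")])
print([n.properties["name"] for n in neighbours(car, "DRIVES", "in")])
```

Since each relationship is registered on both of its nodes, following it backwards costs the same as following it forwards, which is why a symmetric relationship such as LIVES WITH only needs to be stored once.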
Graphs are whiteboard friendly
Image credits: Tobias Ivarsson
Figure: a domain model sketched on a whiteboard, the same model as an ER diagram with 1-to-many (1 / *) cardinalities, and the same model as a graph with example data (thobe, Wardrobe Strength, Joe project blog, Hello Joe, Neo4j performance analysis, Modularizing Jython).
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.
Four emerging NOSQL categories
๏Key-Value stores
๏BigTable clones
๏Document databases
๏Graph databases
... and one that’s been around for a while
๏Object databases
•Neither gaining nor losing traction
•Not part of the NOSQL community
• Still a good solution to a lot of problems
• Focuses on matching object oriented programming paradigm
‣Simplicity to integrate
‣Ease of use
Scaling to size vs. Scaling to complexity
Figure: the four categories placed on a Size axis and a Complexity axis: Key/Value stores, Bigtable clones, Document databases, Graph databases. Annotations: “> 90% of use cases” and “Billions of nodes and relationships”.
Who is NOSQL?
A healthy mix of big players and independent vendors.
“Ok, it’s not a database. How do I query it?”
๏RESTful interfaces (HTTP as an access API)
๏Query languages other than SQL
•GQL - SQL-like QL for Google BigTable
• SPARQL - Query language for the Semantic Web
•Gremlin - the graph traversal language
• Sones Graph Query Language
๏Query APIs
•The Google BigTable DataStore API
•The Neo4j Traversal API
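A query API of the traversal kind expresses a query as a walk over the data instead of a declarative statement. The following Python sketch shows the idea with a breadth-first walk to a maximum depth. The traverse function and the adjacency-dict representation are assumptions made for this illustration; this is not the Neo4j Traversal API.

```python
from collections import deque


def traverse(graph, start, max_depth):
    """Yield (node, depth) for every node within `max_depth` hops of `start`.

    `graph` maps each node to the nodes it is connected to. Nodes are
    visited breadth first, and each node is reported only once.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth > 0:
            yield node, depth
        if depth < max_depth:
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))


# Hypothetical data: who knows whom.
knows = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(list(traverse(knows, "a", max_depth=2)))
```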
Why is the database RESTing?
Figure: four hosts (http://one/, http://two/, http://three/, http://four/). The resource http://one/fishie contains the text “My best friend is http://three/flounder!”
Because hyperlinks make it possible to reference data on different hosts without hassle.
RESTful is really all about hypermedia!
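The point about hyperlinks can be sketched without any network access. In the Python example below, the HOSTS dictionary is a hypothetical stand-in for the hosts http://one/ and http://three/ from the figure, and get stands in for an HTTP GET; the field names name and best_friend are invented for the illustration.

```python
from urllib.parse import urlsplit

# Stand-ins for two separate database hosts, each serving its own resources.
HOSTS = {
    "one": {"/fishie": {"name": "fishie",
                        "best_friend": "http://three/flounder"}},
    "three": {"/flounder": {"name": "flounder"}},
}


def get(url):
    """Pretend HTTP GET: pick the host from the URL, then the resource."""
    parts = urlsplit(url)
    return HOSTS[parts.netloc][parts.path]


fishie = get("http://one/fishie")
friend = get(fishie["best_friend"])  # the link leads to a different host
print(friend["name"])
```

The client needs no knowledge of how the data is partitioned: the stored URL says which host to ask.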
How about Data Manipulation?
๏RESTful interfaces again (HTTP PUT, POST, DELETE)
๏Data Manipulation APIs
•Google BigTable DataStore API
•Neo4j GraphDatabase API
๏Serialization Formats
• JSON
•Thrift
• ProtoBuffers
•RDF
NOSQL in the Enterprise
๏Availability
๏Security
๏Correctness
๏Performance
This presentation does not cover Security. The interesting parts of Security are an application-layer issue anyway.
Availability
๏Replication
•Write to many
• (Multi-)Master to Slave replication
๏Master reelection
๏Failover
• Either by another machine taking over
• or by the client knowing to attempt a replica
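Client-side failover, the last option above, can be sketched as follows in Python. Store, Unavailable and read_with_failover are hypothetical names for this illustration and do not correspond to any particular database driver.

```python
class Unavailable(Exception):
    """Raised when a store cannot be reached."""


class Store:
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def read(self, key):
        if not self.up:
            raise Unavailable(self.name)
        return self.data[key]


def read_with_failover(stores, key):
    """Try the master first, then each replica in turn."""
    for store in stores:
        try:
            return store.read(key)
        except Unavailable:
            continue  # this machine is down, attempt the next one
    raise Unavailable("no store reachable for key %r" % (key,))


master = Store("master", {"answer": 42}, up=False)
replica = Store("replica", {"answer": 42})
print(read_with_failover([master, replica], "answer"))
```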
Correctness
๏Brewer’s CAP theorem
•Most NOSQL db’s sacrifice Consistency
‣Some use “read-correction”, treat read values as votes
๏Some NOSQL databases don’t have transactions
• Instead they have only atomic single operations
•This makes some operations impossible to implement
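The “read-correction” idea mentioned above, treating the values read from the replicas as votes, can be sketched as follows in Python. This is a simplification under stated assumptions: replicas are plain dictionaries, the most common value wins, and read_with_correction is a hypothetical name. Real systems usually compare versions or timestamps instead of counting raw values.

```python
from collections import Counter


def read_with_correction(replicas, key):
    """Read `key` from every replica, let the majority win, repair the rest."""
    votes = Counter(replica.get(key) for replica in replicas)
    winner, _ = votes.most_common(1)[0]
    for replica in replicas:
        if replica.get(key) != winner:
            replica[key] = winner  # bring the stale replica up to date
    return winner


replicas = [{"colour": "blue"}, {"colour": "blue"}, {"colour": "red"}]
print(read_with_correction(replicas, "colour"))
print(replicas[2])
```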
Performance
๏This is where all the focus seems to be
๏A surprising number sacrifice Durability for performance
•On-disk durability
•Multiple-replicas durability
๏All NOSQL databases outperform RDBMSes
• ... in their particular niche ...
One database to rule them all
Image credits: The Lord of the Rings, New Line Cinema
Until recently there was only one Database, the RDBMS. The days of a single database that rules them all are over.
Use best suited storage for each kind of data
The era of using RDBMSes for all problems is over. Instead we should use the database most suited for the problem at hand.
Polyglot persistence
... we could even use multiple databases in conjunction, and let each database handle the things it does best.
SQL && NOSQL
All databases are welcome! SQL and NOSQL - it is Not Only SQL!
Summary
๏Two steps forward ( but first one step back... )
๏The era of a single DBMS is over
๏Use the right tool for the right job
๏Polyglot persistence happens already, and will grow more common
๏Solves different scalability issues
• Scale to size - huge amounts of data, many many machines
• Scale to complexity - handle complicated schemas - avoid being bogged down by deep JOINs
๏Driven by big players and independent vendors - healthy community
Open source implementations to play with!
๏Neo4j - talk to me, or visit http://neo4j.org/
๏CouchDB - http://couchdb.apache.org/
๏Cassandra - http://cassandra.apache.org/
๏Hadoop + HBase (clones GFS + BigTable) - http://hadoop.apache.org/
๏MongoDB - http://www.mongodb.org/
๏Redis - http://code.google.com/p/redis/
๏Oracle Berkeley DB - http://www.oracle.com/database/berkeley-db/
๏FlockDB - http://github.com/twitter/flockdb
๏ ... and more ...
http://neotechnology.com