ADDING NOSQL TO YOUR ARSENAL
Jul 07, 2015
AKA TEN DATABASES IN HALF AN HOUR
DATABASE #1: SQL
THE INDUSTRY STANDARD
RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM)
• Schema-driven
• Set-based operations
• ACID transactionality
SCHEMA DRIVEN
Name Species
READ DATA OUT WITH SET-BASED OPERATIONS
EVERY ROW IS A “THING”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
“WHERE” (INTERSECTION)
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
UNIONS
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
5 Nemo
6 Moby Dick
7 Wanda
JOINS
Name Species Coolness Rating
1 Puss 0
2 Dinah 0
3 Einstein 10
4 Jess 0
CARTESIAN PRODUCTS
0 10
0 10
0 10
– RON ERNEST (& THE SQL COMMUNITY AT LARGE)
“Cursors are evil.”
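The set-based operations above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table names and the species values are assumptions (the slides show species as pictures), and the coolness ratings follow the JOINS slide.

```python
import sqlite3

# In-memory database; table names and species values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE sea_creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE coolness (species TEXT PRIMARY KEY, rating INTEGER);
    INSERT INTO pets VALUES
        (1, 'Puss', 'cat'), (2, 'Dinah', 'cat'), (3, 'Einstein', 'dog'), (4, 'Jess', 'cat');
    INSERT INTO sea_creatures VALUES
        (5, 'Nemo', 'fish'), (6, 'Moby Dick', 'whale'), (7, 'Wanda', 'fish');
    INSERT INTO coolness VALUES ('cat', 0), ('dog', 10);
""")

# WHERE: keep only the rows that satisfy a condition.
cats = [row[0] for row in conn.execute(
    "SELECT name FROM pets WHERE species = 'cat' ORDER BY id")]

# UNION: combine the rows of two result sets.
everyone = [row[0] for row in conn.execute(
    "SELECT name FROM pets UNION SELECT name FROM sea_creatures")]

# JOIN: match each pet with the rating for its species.
joined = conn.execute(
    "SELECT p.name, c.rating FROM pets p "
    "JOIN coolness c ON c.species = p.species ORDER BY p.id").fetchall()

# Cartesian product: every pet paired with every rating row.
product_size = conn.execute(
    "SELECT COUNT(*) FROM pets CROSS JOIN coolness").fetchone()[0]
```

Each query describes the whole result set at once; nothing loops over rows one at a time, which is the point of the "cursors are evil" quote.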
WRITE DATA IN WITH ACID
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
{ Donald, Pluto, Mickey }
Ducks aren’t mammals
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
The database is always in a valid state, as defined by a whole number of queries
regardless of: (1) invalid data;
(2) concurrent requests; (3) system failures
ACID
• Atomicity
• Consistency
• Isolation
• Durability
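A minimal sketch of the Donald, Pluto and Mickey example, using Python's built-in sqlite3 module. The table name, the species values and the CHECK constraint are assumptions made for illustration; the point is that one invalid row causes the whole batch to be rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint is what makes a duck "invalid data".
conn.execute(
    "CREATE TABLE mammals ("
    " name TEXT PRIMARY KEY,"
    " species TEXT NOT NULL CHECK (species IN ('cat', 'dog', 'mouse')))"
)
conn.executemany(
    "INSERT INTO mammals VALUES (?, ?)",
    [("Puss", "cat"), ("Dinah", "cat"), ("Einstein", "dog"), ("Jess", "cat")],
)
conn.commit()


def insert_all(connection, rows):
    """Insert every row or none of them."""
    try:
        # The connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with connection:
            connection.executemany("INSERT INTO mammals VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False


# Pluto and Mickey are valid, Donald is not, so nothing is written.
ok = insert_all(conn, [("Pluto", "dog"), ("Mickey", "mouse"), ("Donald", "duck")])
row_count = conn.execute("SELECT COUNT(*) FROM mammals").fetchone()[0]
```

After the failed batch the table still holds only the original four rows, which is the "whole number of queries" guarantee in practice.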
WHAT IS WRONG WITH SQL?
NOTHING*
* As long as you use it for the right job
– MASLOW’S HAMMER
“If all you have is a hammer, everything looks like a nail.”
TO COME
• 10 different ‘flavours’ of NoSQL Databases
• Just enough to whet the appetite!
DATABASE #2: MongoDB
DOCUMENT STORE
EVERY ROW IS A “THING”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
EVERY DOCUMENT IS A “THING”
NAME = PUSS, COOLNESS = 0
NAME = JESS, COOLNESS = 0
NAME = DINAH, COOLNESS = 0
NAME = EINSTEIN, COOLNESS = 10
BEWARE!
THAT’S THE POINT
DENORMALISED DATA
FOR EXAMPLE
EASY SHARDING
GEOSPATIAL INDEXES
SCHEMALESS
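To make "denormalised" and "schemaless" concrete, here is a sketch that uses plain Python dictionaries as stand-ins for documents. It is not MongoDB's API, and the "hobbies" field on one document is a hypothetical extra added to show that documents need not share a schema.

```python
# Each document carries its own coolness value, so no join against a
# species table is needed (denormalised data).
things = [
    {"name": "Puss", "coolness": 0},
    {"name": "Jess", "coolness": 0},
    {"name": "Dinah", "coolness": 0},
    # Hypothetical extra field: only this document has "hobbies" (schemaless).
    {"name": "Einstein", "coolness": 10, "hobbies": ["skateboard"]},
]

# Query by a field every document happens to have.
cool_names = [doc["name"] for doc in things if doc["coolness"] > 5]

# Query by a field only some documents have, without any schema change.
names_with_hobbies = [doc["name"] for doc in things if "hobbies" in doc]
```

The cost is the one the slides warn about: if a cat's coolness ever changes, every cat document has to be updated.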
DATABASE #3: Eloquera
OBJECT DATABASE
EVERY ROW IS A “THING”
Name Species
1 Puss
2 Dinah
3 Einstein
4 Jess
EVERY DOCUMENT IS A “THING”
NAME = PUSS, COOLNESS = 0
NAME = JESS, COOLNESS = 0
NAME = DINAH, COOLNESS = 0
NAME = EINSTEIN, COOLNESS = 10
EVERY OBJECT IS A “THING”
public class Thing
{
    public int coolness { get; set; }
    public string name { get; set; }
    public Species species { get; set; }
}
TRANSPARENCY TO THE DB
DATABASE #4: neo4j
GRAPH DATABASE
NEO4J
IMPLEMENTED BY…
THE DATA IS THE RELATIONS
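A small sketch of what "the data is the relations" means, in plain Python rather than neo4j's own query language. The KNOWS relationships between the animals are invented for illustration.

```python
from collections import deque

# Hypothetical directed relationships: (source, relation, destination).
relations = [
    ("Puss", "KNOWS", "Dinah"),
    ("Dinah", "KNOWS", "Einstein"),
    ("Einstein", "KNOWS", "Jess"),
]


def reachable(start, max_hops):
    """Breadth-first walk along relationships, up to max_hops away from start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand nodes that are already max_hops away
        for src, _relation, dst in relations:
            if src == node and dst not in seen:
                seen.add(dst)
                queue.append((dst, hops + 1))
    return seen - {start}


within_two_hops = reachable("Puss", max_hops=2)
```

In a relational database each extra hop is another self-join; in a graph database following a relationship is the basic operation.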
DATABASE #5: Voldemort
– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
“Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the
slightest outage has significant financial consequences and impacts customer trust.”
– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
“Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor
availability”
– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE
“Dynamo targets applications that operate with weaker consistency if this results in high
availability.”
CONSISTENCY
[Diagram: three nodes A, B and C]
DYNAMO IMPLEMENTATIONS
VOLDEMORT
KEY/VALUE STORE
store.put(key, value)
value = store.get(key)
store.delete(key)
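The trade between consistency and availability can be sketched with a toy store replicated over three nodes A, B and C. This is not Voldemort's API: the class, the `reachable` parameter and the single version counter are assumptions made for illustration (Dynamo-style stores use vector clocks), and delete is left out because it needs tombstones.

```python
class QuorumStore:
    """Toy replicated key/value store: a write needs W nodes, a read needs R."""

    def __init__(self, n=3, w=2, r=2):
        self.replicas = [{} for _ in range(n)]  # each maps key -> (version, value)
        self.w = w
        self.r = r
        self.clock = 0  # simplification: one global version counter

    def put(self, key, value, reachable):
        if len(reachable) < self.w:
            raise RuntimeError("not enough replicas for a write quorum")
        self.clock += 1
        for index in reachable:
            self.replicas[index][key] = (self.clock, value)

    def get(self, key, reachable):
        if len(reachable) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        answers = [self.replicas[i][key] for i in reachable if key in self.replicas[i]]
        if not answers:
            return None
        # The newest version among the replicas that answered wins.
        return max(answers, key=lambda answer: answer[0])[1]


A, B, C = 0, 1, 2
store = QuorumStore()
store.put("puss", "coolness=0", reachable=[A, B, C])
store.put("puss", "coolness=1", reachable=[A, B])  # C is down, the write still succeeds

stale_on_c = store.replicas[C]["puss"][1]       # C alone still holds the old value
fresh = store.get("puss", reachable=[B, C])     # the read quorum overlaps the write quorum
```

The store stays available while C is down, at the price that C is temporarily out of date; choosing W + R greater than N is what lets a quorum read still see the latest write.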
BEWARE: IT’S VERY LIMITED…
LOW LATENCY
HIGH AVAILABILITY
DATABASE #6: HBase/Hadoop
BIG DATA
WHEN TO USE HADOOP…
– CHRIS STUCCHIO
“Don't use Hadoop - your data isn't that big.”
LINEAR SCALABILITY
AUTOMATIC SHARDING AND STRONG CONSISTENCY
BUILT-IN EFFICIENT QUERY METHODS
DATABASE #7: Marmotta
LINKED MEDIA FRAMEWORK
– LINKED MEDIA GUIDELINES
Use URIs as names for things. Use HTTP URIs, so that people can look up those names.
– LINKED MEDIA GUIDELINES
When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
– LINKED MEDIA GUIDELINES
Include links to other URIs, so that they can discover more things.
COOL SKATING VIDEO
COOL SKATER
WINDSURFER (AKA COOL SKATER’S HUSBAND)
WRITEUP OF WINDSURFING EVENT
COOL SKATING EVENT
SPONSOR OF COOL SKATING EVENT
INTERVIEW WITH CEO OF SPONSOR
APACHE MARMOTTA
OUT OF THE BOX…
TRIPLE VALUE STORE
• Video A contains Alice McSkaterton
• Alice McSkaterton is married to Brock Windsurferling
• Article B contains Brock Windsurferling
• Engine says Video A is related to Article B
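The three statements above are subject/predicate/object triples, and the conclusion can be derived from them mechanically. The following is a plain-Python sketch of that reasoning, not Marmotta's API (which would be queried through standards such as SPARQL); the helper names are assumptions.

```python
# The triples from the slide: (subject, predicate, object).
triples = {
    ("Video A", "contains", "Alice McSkaterton"),
    ("Alice McSkaterton", "is married to", "Brock Windsurferling"),
    ("Article B", "contains", "Brock Windsurferling"),
}


def people_in(item):
    """Everyone that the given media item contains."""
    return {o for s, p, o in triples if s == item and p == "contains"}


def linked_people(person):
    """The person plus their spouse, treating marriage as symmetric."""
    spouses = {o for s, p, o in triples if s == person and p == "is married to"}
    spouses |= {s for s, p, o in triples if o == person and p == "is married to"}
    return {person} | spouses


def related(item_a, item_b):
    """Two items are related if they feature the same person or a married couple."""
    in_b = people_in(item_b)
    return any(linked_people(person) & in_b for person in people_in(item_a))


video_a_related_to_article_b = related("Video A", "Article B")
```

Nothing in the data says the video and the article are related; the link only appears by following the triples.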
DATABASE #8: ElasticSearch
DOCUMENT STORE
EVERY DOCUMENT IS A “THING”
NAME = PUSS, COOLNESS = 0
NAME = JESS, COOLNESS = 0
NAME = DINAH, COOLNESS = 0
NAME = EINSTEIN, COOLNESS = 10
APACHE LUCENE
“Apache Lucene is a high-performance, full-featured text search engine library … It is a
technology suitable for nearly any application that requires full-text search”
FOCUSED AROUND TEXT SEARCHING QUERIES
{ "query": { "match": { "hobbies": "skateboard" } } }
{ "query": { "fuzzy": { "hobbies": "skateboarig" } } }
{ "query": { "match": { "hobbies": { "query": "writing reddit comments", "type": "phrase" } } } }
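The fuzzy query above finds "skateboard" even though the search term is misspelled. A rough sketch of the idea, assuming fuzziness is measured as an edit distance between terms: the functions, the sample documents and the limit of two edits are illustrative assumptions, not ElasticSearch's or Lucene's implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, documents, field, max_edits=2):
    """Return documents where some value in `field` is within max_edits of term."""
    return [
        doc for doc in documents
        if any(edit_distance(term, word) <= max_edits for word in doc.get(field, []))
    ]


# Hypothetical documents for illustration.
documents = [
    {"name": "Einstein", "hobbies": ["skateboard"]},
    {"name": "Puss", "hobbies": ["fencing"]},
]
hits = fuzzy_match("skateboarig", documents, "hobbies")
```

A search engine does this against an index rather than by scanning every document, but the notion of "close enough" is the same.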
DATABASE #9: TempoDB
TIME SERIES DATABASE
TIMESTAMP/VALUE PAIRS
Timestamp Value
2014-06-10T12:00:00+0100 17
2014-06-10T12:15:00+0100 17
2014-06-10T12:30:00+0100 20
2014-06-10T12:45:00+0100 22
2014-06-10T13:00:00+0100 24
2014-06-10T13:15:00+0100 28
2014-06-10T13:30:00+0100 32
TIME SERIES DATABASE AS A SERVICE
TEMPODB
SPECIALISED QUERIES
TIME ROLLUPS
Timestamp Value
2014-06-10T12:00:00+0100 17
2014-06-10T12:15:00+0100 17
2014-06-10T12:30:00+0100 20
2014-06-10T12:45:00+0100 22
2014-06-10T13:00:00+0100 24
2014-06-10T13:15:00+0100 28
2014-06-10T13:45:00+0100 36
Timestamp Average Max Min
2014-06-10T12:00:00+0100 35 36 17
2014-06-11T12:00:00+0100 21 22 20
2014-06-12T12:30:00+0100 20.5 21 19
2014-06-13T12:45:00+0100 20 20 20
2014-06-14T13:00:00+0100 18.5 19 18
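A rollup collapses many raw readings into one summary row per time bucket. The sketch below does this per day in plain Python, using the raw readings listed for 2014-06-10; the function name and the choice of a daily bucket are assumptions, and this is not TempoDB's API.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Raw readings from the slide for a single day.
readings = [
    ("2014-06-10T12:00:00+0100", 17),
    ("2014-06-10T12:15:00+0100", 17),
    ("2014-06-10T12:30:00+0100", 20),
    ("2014-06-10T12:45:00+0100", 22),
    ("2014-06-10T13:00:00+0100", 24),
    ("2014-06-10T13:15:00+0100", 28),
    ("2014-06-10T13:45:00+0100", 36),
]


def rollup_by_day(rows):
    """Group readings by calendar day and summarise each group."""
    buckets = defaultdict(list)
    for stamp, value in rows:
        day = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z").date()
        buckets[day].append(value)
    return {
        day: {"average": mean(values), "max": max(values), "min": min(values)}
        for day, values in sorted(buckets.items())
    }


daily = rollup_by_day(readings)
```

A time series database does this kind of aggregation on the server, so the client asks for daily figures rather than downloading every reading.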
TEMPORAL INTERPOLATION
Timestamp Value
2014-06-10T12:00:00+0100 17
2014-06-10T12:15:00+0100 17
2014-06-10T12:30:00+0100 20
2014-06-10T12:45:00+0100 22
2014-06-10T13:00:00+0100 24
2014-06-10T13:15:00+0100 28
2014-06-10T13:45:00+0100 36
Timestamp Value
2014-06-10T12:00:00+0100 17
2014-06-10T12:15:00+0100 17
2014-06-10T12:30:00+0100 20
2014-06-10T12:45:00+0100 22
2014-06-10T13:00:00+0100 24
2014-06-10T13:15:00+0100 28
2014-06-10T13:30:00+0100 31.5
2014-06-10T13:45:00+0100 36
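Interpolation fills in the reading that is missing at 13:30. The sketch below uses straight-line interpolation between the neighbouring readings, which is an assumption: the slides do not say which method TempoDB uses, and a different method would give a slightly different value for the filled-in slot.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S%z"

# The readings from the slide, with no value at 13:30.
readings = [
    ("2014-06-10T12:00:00+0100", 17),
    ("2014-06-10T12:15:00+0100", 17),
    ("2014-06-10T12:30:00+0100", 20),
    ("2014-06-10T12:45:00+0100", 22),
    ("2014-06-10T13:00:00+0100", 24),
    ("2014-06-10T13:15:00+0100", 28),
    ("2014-06-10T13:45:00+0100", 36),
]


def interpolate(rows, step=timedelta(minutes=15)):
    """Insert linearly interpolated points wherever a step is missing."""
    points = [(datetime.strptime(stamp, FMT), value) for stamp, value in rows]
    filled = [points[0]]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        t = t0 + step
        while t < t1:
            fraction = (t - t0) / (t1 - t0)  # how far t lies between t0 and t1
            filled.append((t, v0 + (v1 - v0) * fraction))
            t += step
        filled.append((t1, v1))
    return [(t.strftime(FMT), value) for t, value in filled]


filled = interpolate(readings)
```

The result has one row per 15-minute slot, so downstream code can assume evenly spaced data.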
DATABASE #10: PostgreSQL
ALL THE GOOD STUFF OF SQL
– PETER WAYNER
“The smart NoSQL developers simply noted that NoSQL stood for "Not Only SQL." If the masses
misinterpreted the acronym, that was their problem.”
OPEN SOURCE AND MATURE
FOREIGN DATA WRAPPERS
neo4j, e.g. Patient Data
File Store, e.g. Academic Results
Legacy Oracle System, e.g. Clinical Trials
SELECT STUFF FROM NEO4J JOIN STUFF FROM FILE STORE JOIN STUFF FROM ORACLE
FOREIGN DATA WRAPPERS
• SQL Database Wrappers
• NoSQL Databases (Mongo, neo4j etc.)
• Hadoop
• Files (JSON, FixedLengthText)
• Web services
In conclusion…
REASONS TO USE OTHER DATABASES
• Geospatial indexes
• Schemaless data for query-time efficiency
• Transparent Sharding
• Transparency to the database backend
• More intuitive for the domain
• Cheap ‘joins’
• Low latency for simple data
• High availability in distributed systems
• Dealing with very large datasets
• Meeting standards such as Linked Media
• Support for time series data
• Pre-built text searching functionality
• Interface for other data sources
ANY QUESTIONS?
THANK YOU…