ADDING NOSQL TO YOUR ARSENAL
Page 1: 10 d bs in 30 minutes

ADDING NOSQL TO YOUR ARSENAL

Page 3: 10 d bs in 30 minutes

AKA

TEN DATABASES IN HALF AN HOUR

Page 4: 10 d bs in 30 minutes

DATABASE #1: SQL

Page 5: 10 d bs in 30 minutes

THE INDUSTRY STANDARD

Page 6: 10 d bs in 30 minutes

RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM)

Page 7: 10 d bs in 30 minutes

RDBMS

• Schema-driven

• Set-based operations

• ACID transactionality

Page 8: 10 d bs in 30 minutes

SCHEMA DRIVEN

Page 9: 10 d bs in 30 minutes

Name Species

Page 10: 10 d bs in 30 minutes

READ DATA OUT WITH SET-BASED OPERATIONS

Page 11: 10 d bs in 30 minutes

EVERY ROW IS A “THING”

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 12: 10 d bs in 30 minutes

“WHERE” (INTERSECTION)

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 13: 10 d bs in 30 minutes

UNIONS

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

5 Nemo

6 Moby Dick

7 Wanda

Page 14: 10 d bs in 30 minutes

JOINS

Name Species Species Coolness Rating

1 Puss 0

2 Dinah 0

3 Einstein 10

4 Jess 0

Page 15: 10 d bs in 30 minutes

CARTESIAN PRODUCTS

0 10

0 10

0 10

Page 17: 10 d bs in 30 minutes

“Cursors are evil.”

– RON ERNEST (& THE SQL COMMUNITY AT LARGE)
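The set-based operations from the preceding slides (WHERE, UNION, JOIN and the Cartesian product) can be sketched with Python's built-in sqlite3 module. The names and coolness ratings come from the slides; the species values and the table names (`pets`, `sea_creatures`, `coolness`) are assumptions, because the slide images did not survive in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE sea_creatures (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    CREATE TABLE coolness (species TEXT PRIMARY KEY, rating INTEGER);
    -- Species values are assumed for illustration.
    INSERT INTO pets VALUES
        (1, 'Puss', 'cat'), (2, 'Dinah', 'cat'),
        (3, 'Einstein', 'dog'), (4, 'Jess', 'cat');
    INSERT INTO sea_creatures VALUES
        (5, 'Nemo', 'fish'), (6, 'Moby Dick', 'whale'), (7, 'Wanda', 'fish');
    INSERT INTO coolness VALUES ('cat', 0), ('dog', 10);
""")

# WHERE: keep only the rows that satisfy a predicate.
where_rows = conn.execute(
    "SELECT name FROM pets WHERE species = 'cat' ORDER BY id"
).fetchall()

# UNION: combine two result sets with the same shape.
union_rows = conn.execute(
    "SELECT id, name FROM pets UNION SELECT id, name FROM sea_creatures ORDER BY id"
).fetchall()

# JOIN: match each pet to the coolness rating of its species.
join_rows = conn.execute(
    "SELECT p.name, c.rating FROM pets p "
    "JOIN coolness c ON c.species = p.species ORDER BY p.id"
).fetchall()

# Cartesian product: every pet paired with every coolness row.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM pets CROSS JOIN coolness"
).fetchone()[0]
```

Each statement describes a whole result set in one go, which is the style the "cursors are evil" quote argues for over row-by-row loops.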

Page 18: 10 d bs in 30 minutes

WRITE DATA IN WITH ACID

Page 19: 10 d bs in 30 minutes

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 20: 10 d bs in 30 minutes

{ Donald, Pluto, Mickey }

Page 21: 10 d bs in 30 minutes

Ducks aren’t mammals

Page 22: 10 d bs in 30 minutes

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 23: 10 d bs in 30 minutes

The database is always in a valid state, as defined by a whole number of queries, regardless of: (1) invalid data; (2) concurrent requests; (3) system failures.

Page 27: 10 d bs in 30 minutes

ACID

• Atomicity

• Consistency

• Isolation

• Durability
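The "Ducks aren't mammals" slides can be sketched as an atomic write with Python's built-in sqlite3 module. This is a minimal illustration, not the talk's own demo: the `mammals` table, its CHECK constraint and the species values are assumptions. The point is that when one row in the batch is invalid, none of the batch is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the CHECK constraint stands in for "must be a mammal".
conn.execute(
    "CREATE TABLE mammals ("
    " name TEXT PRIMARY KEY,"
    " species TEXT NOT NULL CHECK (species <> 'duck'))"
)
conn.commit()


def insert_all(connection, rows):
    """Insert every row or none of them; return True on success."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception escapes the block.
        with connection:
            connection.executemany(
                "INSERT INTO mammals (name, species) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = insert_all(
    conn, [("Mickey", "mouse"), ("Pluto", "dog"), ("Donald", "duck")]
)
count = conn.execute("SELECT COUNT(*) FROM mammals").fetchone()[0]
```

Mickey and Pluto are valid on their own, but because Donald violates the constraint the whole batch is rolled back and the table stays empty.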

Page 28: 10 d bs in 30 minutes

WHAT IS WRONG WITH SQL?

Page 29: 10 d bs in 30 minutes

NOTHING

Page 30: 10 d bs in 30 minutes

NOTHING*

* As long as you use it for the right job

Page 31: 10 d bs in 30 minutes

“If all you have is a hammer, everything looks like a nail.”

– MASLOW’S HAMMER

Page 32: 10 d bs in 30 minutes

TO COME

• 10 different ‘flavours’ of NoSQL Databases

• Just enough to whet the appetite!

Page 33: 10 d bs in 30 minutes

DATABASE #2: MongoDB

Page 34: 10 d bs in 30 minutes

DOCUMENT STORE

Page 35: 10 d bs in 30 minutes

EVERY ROW IS A “THING”

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 36: 10 d bs in 30 minutes

EVERY ROW (DOCUMENT) IS A “THING”

NAME = PUSS, COOLNESS = 0

NAME = JESS, COOLNESS = 0

NAME = DINAH, COOLNESS = 0

NAME = EINSTEIN, COOLNESS = 10

Page 37: 10 d bs in 30 minutes

BEWARE!

Page 38: 10 d bs in 30 minutes

THAT’S THE POINT

Page 39: 10 d bs in 30 minutes

FOR EXAMPLE: DENORMALISED DATA

Page 40: 10 d bs in 30 minutes

EVERY ROW (DOCUMENT) IS A “THING”

NAME = PUSS, COOLNESS = 0

NAME = JESS, COOLNESS = 0

NAME = DINAH, COOLNESS = 0

NAME = EINSTEIN, COOLNESS = 10

Page 41: 10 d bs in 30 minutes
Page 42: 10 d bs in 30 minutes

EASY SHARDING

Page 43: 10 d bs in 30 minutes
Page 44: 10 d bs in 30 minutes

GEOSPATIAL INDEXES

Page 45: 10 d bs in 30 minutes
Page 46: 10 d bs in 30 minutes

SCHEMALESS
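The denormalised, schemaless documents described in this section can be sketched with plain Python dictionaries. This deliberately does not use a MongoDB driver; it only shows the shape of the data. The species values and the extra `tricks` field are assumptions made for illustration.

```python
import json

# Normalised, relational-style source data (species values are assumed).
pets = [
    {"id": 1, "name": "Puss", "species": "cat"},
    {"id": 2, "name": "Dinah", "species": "cat"},
    {"id": 3, "name": "Einstein", "species": "dog"},
    {"id": 4, "name": "Jess", "species": "cat"},
]
coolness_by_species = {"cat": 0, "dog": 10}


def to_document(pet):
    """Copy the coolness rating into the pet itself, so reads need no join."""
    return {
        "name": pet["name"],
        "species": pet["species"],
        "coolness": coolness_by_species[pet["species"]],
    }


documents = [to_document(pet) for pet in pets]

# Schemaless: one document may carry a field the others lack (hypothetical field).
documents[2]["tricks"] = ["time travel"]

# A query is a filter over self-contained documents.
cool_names = [doc["name"] for doc in documents if doc["coolness"] > 5]
print(json.dumps(documents[2]))
```

The trade-off is the one the "BEWARE!" slide hints at: the coolness rating is now duplicated in every document, so changing it means updating each copy.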

Page 47: 10 d bs in 30 minutes

DATABASE #3: Eloquera

Page 48: 10 d bs in 30 minutes

OBJECT DATABASE

Page 49: 10 d bs in 30 minutes

EVERY ROW IS A “THING”

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 50: 10 d bs in 30 minutes

EVERY ROW (DOCUMENT) IS A “THING”

NAME = PUSS, COOLNESS = 0

NAME = JESS, COOLNESS = 0

NAME = DINAH, COOLNESS = 0

NAME = EINSTEIN, COOLNESS = 10

Page 51: 10 d bs in 30 minutes

EVERY ROW (OBJECT) IS A “THING”

public class Thing
{
    public int coolness { get; set; }
    public string name { get; set; }
    public Species species { get; set; }
}

Page 52: 10 d bs in 30 minutes

TRANSPARENCY TO THE DB

Page 53: 10 d bs in 30 minutes

DATABASE #4: neo4j

Page 54: 10 d bs in 30 minutes

GRAPH DATABASE

Page 55: 10 d bs in 30 minutes
Page 56: 10 d bs in 30 minutes

IMPLEMENTED BY… NEO4J

Page 57: 10 d bs in 30 minutes
Page 58: 10 d bs in 30 minutes

THE DATA IS THE RELATIONS

Page 59: 10 d bs in 30 minutes
Page 60: 10 d bs in 30 minutes
Page 61: 10 d bs in 30 minutes
Page 62: 10 d bs in 30 minutes

DATABASE #5: Voldemort

Page 63: 10 d bs in 30 minutes

“Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the slightest outage has significant financial consequences and impacts customer trust.”

– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE

Page 64: 10 d bs in 30 minutes

“Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability”

– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE

Page 65: 10 d bs in 30 minutes

“Dynamo targets applications that operate with weaker consistency if this results in high availability.”

– DYNAMO: AMAZON’S HIGHLY AVAILABLE KEY-VALUE STORE

Page 66: 10 d bs in 30 minutes

CONSISTENCY

[Diagram: three nodes labelled A, B and C]

Page 68: 10 d bs in 30 minutes

DYNAMO IMPLEMENTATIONS

Page 69: 10 d bs in 30 minutes

VOLDEMORT

Page 70: 10 d bs in 30 minutes

KEY/VALUE STORE

Page 71: 10 d bs in 30 minutes

store.put(key, value)

Page 72: 10 d bs in 30 minutes

value = store.get(key)

Page 73: 10 d bs in 30 minutes

store.delete(key)
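The three calls above are the whole interface. A minimal in-memory sketch in Python shows how little there is to it; the `InMemoryStore` class is hypothetical and is not Voldemort's client API, and it leaves out everything that makes a Dynamo-style store interesting in practice (partitioning, replication, versioning).

```python
from typing import Dict, Optional


class InMemoryStore:
    """Toy key/value store exposing only put, get and delete."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # Writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        # A missing key yields None rather than raising.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Report whether there was anything to delete.
        return self._data.pop(key, None) is not None


store = InMemoryStore()
store.put("puss", "coolness=0")
value = store.get("puss")
deleted = store.delete("puss")
after_delete = store.get("puss")
```

There is no way to ask "which keys have coolness 0?" without knowing the keys in advance, which is the limitation the next slide warns about.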

Page 74: 10 d bs in 30 minutes

BEWARE: IT’S VERY LIMITED…

Page 75: 10 d bs in 30 minutes

LOW LATENCY

Page 76: 10 d bs in 30 minutes

HIGH AVAILABILITY

Page 77: 10 d bs in 30 minutes

DATABASE #6: HBase/Hadoop

Page 78: 10 d bs in 30 minutes
Page 79: 10 d bs in 30 minutes
Page 80: 10 d bs in 30 minutes
Page 81: 10 d bs in 30 minutes
Page 82: 10 d bs in 30 minutes
Page 83: 10 d bs in 30 minutes
Page 84: 10 d bs in 30 minutes
Page 85: 10 d bs in 30 minutes

WHEN TO USE HADOOP… BIG DATA

Page 86: 10 d bs in 30 minutes

“Don't use Hadoop - your data isn't that big.”

– CHRIS STUCCHIO

Page 87: 10 d bs in 30 minutes
Page 88: 10 d bs in 30 minutes

LINEAR SCALABILITY

Page 89: 10 d bs in 30 minutes

AUTOMATIC SHARDING AND STRONG CONSISTENCY

Page 90: 10 d bs in 30 minutes

BUILT-IN EFFICIENT QUERY METHODS

Page 91: 10 d bs in 30 minutes

DATABASE #7: Marmotta

Page 92: 10 d bs in 30 minutes

LINKED MEDIA FRAMEWORK

Page 93: 10 d bs in 30 minutes

Use URIs as names for things. Use HTTP URIs, so that people can look up those names.

– LINKED MEDIA GUIDELINES

Page 94: 10 d bs in 30 minutes

When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

– LINKED MEDIA GUIDELINES

Page 95: 10 d bs in 30 minutes

Include links to other URIs, so that they can discover more things.

– LINKED MEDIA GUIDELINES

Page 96: 10 d bs in 30 minutes
Page 97: 10 d bs in 30 minutes

COOL SKATING VIDEO

Page 98: 10 d bs in 30 minutes

COOL SKATING VIDEO

COOL SKATER

COOL SKATING EVENT

Page 99: 10 d bs in 30 minutes

COOL SKATING VIDEO

COOL SKATER

WINDSURFER (AKA COOL SKATER’S HUSBAND)

COOL SKATING EVENT

SPONSOR OF COOL SKATING EVENT

Page 100: 10 d bs in 30 minutes

COOL SKATING VIDEO

COOL SKATER

WINDSURFER (AKA COOL SKATER’S HUSBAND)

WRITE-UP OF WINDSURFING EVENT

COOL SKATING EVENT

SPONSOR OF COOL SKATING EVENT

INTERVIEW WITH CEO OF SPONSOR

Page 101: 10 d bs in 30 minutes

OUT OF THE BOX… APACHE MARMOTTA

Page 102: 10 d bs in 30 minutes

TRIPLE VALUE STORE

Page 103: 10 d bs in 30 minutes

TRIPLE VALUE STORE

• Video A contains Alice McSkaterton

• Alice McSkaterton is married to Brock Windsurferling

• Article B contains Brock Windsurferling

Page 104: 10 d bs in 30 minutes

TRIPLE VALUE STORE

• Video A contains Alice McSkaterton

• Alice McSkaterton is married to Brock Windsurferling

• Article B contains Brock Windsurferling

• ENGINE SAYS VIDEO A IS RELATED TO ARTICLE B
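The inference on this slide can be sketched in Python with triples held as (subject, predicate, object) tuples. This is only an illustration of the idea: the `infer_related` function and its single hard-coded rule are assumptions, not Marmotta's reasoner, which works on RDF data and its own rule language.

```python
# The three facts from the slide, as (subject, predicate, object) triples.
triples = {
    ("Video A", "contains", "Alice McSkaterton"),
    ("Alice McSkaterton", "is married to", "Brock Windsurferling"),
    ("Article B", "contains", "Brock Windsurferling"),
}


def infer_related(facts):
    """Two items are related if they contain people who are married to each other."""
    contains = [(s, o) for s, p, o in facts if p == "contains"]
    married = {(s, o) for s, p, o in facts if p == "is married to"}
    inferred = set()
    for item_x, person_x in contains:
        for item_y, person_y in contains:
            if item_x == item_y:
                continue
            # Marriage is symmetric, so check both directions.
            if (person_x, person_y) in married or (person_y, person_x) in married:
                inferred.add((item_x, "is related to", item_y))
    return inferred


new_facts = infer_related(triples)
```

The derived facts are themselves triples, so they can be stored alongside the originals and used by further rules.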

Page 105: 10 d bs in 30 minutes

DATABASE #8: ElasticSearch

Page 106: 10 d bs in 30 minutes

DOCUMENT STORE

Page 107: 10 d bs in 30 minutes

EVERY ROW (DOCUMENT) IS A “THING”

NAME = PUSS, COOLNESS = 0

NAME = JESS, COOLNESS = 0

NAME = DINAH, COOLNESS = 0

NAME = EINSTEIN, COOLNESS = 10

Page 108: 10 d bs in 30 minutes

APACHE LUCENE

Page 109: 10 d bs in 30 minutes

“Apache Lucene is a high-performance, full-featured text search engine library … It is a technology suitable for nearly any application that requires full-text search”

Page 110: 10 d bs in 30 minutes

FOCUSED AROUND TEXT SEARCHING QUERIES

Page 111: 10 d bs in 30 minutes

{ "query": { "match": {"hobbies": "skateboard"} } }

Page 112: 10 d bs in 30 minutes

{ "query": { "fuzzy": {"hobbies": "skateboarig"} } }

Page 113: 10 d bs in 30 minutes

{ "query": { "match": {"hobbies": {"query": "writing reddit comments", "type": "phrase"}} } }

Page 114: 10 d bs in 30 minutes

DATABASE #9: TempoDB

Page 115: 10 d bs in 30 minutes

TIME SERIES DATABASE

Page 116: 10 d bs in 30 minutes

TIMESTAMP/VALUE PAIRS

Page 117: 10 d bs in 30 minutes
Page 118: 10 d bs in 30 minutes

Timestamp Value

2014-06-10T12:00:00+0100 17

2014-06-10T12:15:00+0100 17

2014-06-10T12:30:00+0100 20

2014-06-10T12:45:00+0100 22

2014-06-10T13:00:00+0100 24

2014-06-10T13:15:00+0100 28

2014-06-10T13:30:00+0100 32

Page 119: 10 d bs in 30 minutes

TEMPODB

TIME SERIES DATABASE AS A SERVICE

Page 120: 10 d bs in 30 minutes

SPECIALISED QUERIES

Page 121: 10 d bs in 30 minutes

TIME ROLLUPS

Page 122: 10 d bs in 30 minutes

Timestamp Value

2014-06-10T12:00:00+0100 17

2014-06-10T12:15:00+0100 17

2014-06-10T12:30:00+0100 20

2014-06-10T12:45:00+0100 22

2014-06-10T13:00:00+0100 24

2014-06-10T13:15:00+0100 28

2014-06-10T13:45:00+0100 36

Page 123: 10 d bs in 30 minutes

Timestamp Average Max Min

2014-06-10T12:00:00+0100 35 36 17

2014-06-11T12:00:00+0100 21 22 20

2014-06-12T12:30:00+0100 20.5 21 19

2014-06-13T12:45:00+0100 20 20 20

2014-06-14T13:00:00+0100 18.5 19 18
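A rollup collapses many raw readings into one summary row per time bucket. The sketch below uses only the Python standard library and rolls the seven readings from the earlier slide up by hour rather than by day; the `hourly_rollup` function is hypothetical and is not TempoDB's API.

```python
from collections import defaultdict
from datetime import datetime

readings = [
    ("2014-06-10T12:00:00+0100", 17),
    ("2014-06-10T12:15:00+0100", 17),
    ("2014-06-10T12:30:00+0100", 20),
    ("2014-06-10T12:45:00+0100", 22),
    ("2014-06-10T13:00:00+0100", 24),
    ("2014-06-10T13:15:00+0100", 28),
    ("2014-06-10T13:45:00+0100", 36),
]


def hourly_rollup(rows):
    """Group readings by the hour they fall in and summarise each group."""
    buckets = defaultdict(list)
    for stamp, value in rows:
        moment = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        # Truncate the timestamp to the start of its hour.
        buckets[moment.replace(minute=0, second=0)].append(value)
    return {
        hour.isoformat(): {
            "average": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
        }
        for hour, values in sorted(buckets.items())
    }


rollup = hourly_rollup(readings)
```

Changing the truncation step (for example, also zeroing the hour) would give a daily rollup of the kind shown in the table above.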

Page 124: 10 d bs in 30 minutes

TEMPORAL INTERPOLATION

Page 125: 10 d bs in 30 minutes

Timestamp Value

2014-06-10T12:00:00+0100 17

2014-06-10T12:15:00+0100 17

2014-06-10T12:30:00+0100 20

2014-06-10T12:45:00+0100 22

2014-06-10T13:00:00+0100 24

2014-06-10T13:15:00+0100 28

2014-06-10T13:45:00+0100 36

Page 126: 10 d bs in 30 minutes

Timestamp Value

2014-06-10T12:00:00+0100 17

2014-06-10T12:15:00+0100 17

2014-06-10T12:30:00+0100 20

2014-06-10T12:45:00+0100 22

2014-06-10T13:00:00+0100 24

2014-06-10T13:15:00+0100 28

2014-06-10T13:30:00+0100 31.5

2014-06-10T13:45:00+0100 36
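Filling in the missing 13:30 reading can be sketched as straight-line interpolation between the neighbouring points. This is an assumption about the method: the `interpolate` function is hypothetical, and plain linear interpolation gives 32 for 13:30 rather than the 31.5 shown on the slide, so the slide's figure presumably comes from a different method or is illustrative.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def interpolate(rows, stamp):
    """Estimate the value at `stamp` on the line between its two neighbours."""
    target = datetime.strptime(stamp, FORMAT)
    points = sorted((datetime.strptime(s, FORMAT), v) for s, v in rows)
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= target <= t1:
            # Fraction of the way from t0 to t1, between 0 and 1.
            fraction = (target - t0) / (t1 - t0)
            return v0 + (v1 - v0) * fraction
    raise ValueError("timestamp lies outside the recorded range")


readings = [
    ("2014-06-10T13:00:00+0100", 24),
    ("2014-06-10T13:15:00+0100", 28),
    ("2014-06-10T13:45:00+0100", 36),
]
estimate = interpolate(readings, "2014-06-10T13:30:00+0100")
```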

Page 127: 10 d bs in 30 minutes

DATABASE #10: PostgreSQL

Page 128: 10 d bs in 30 minutes

ALL THE GOOD STUFF OF SQL

Page 129: 10 d bs in 30 minutes

“The smart NoSQL developers simply noted that NoSQL stood for "Not Only SQL." If the masses misinterpreted the acronym, that was their problem.”

– PETER WAYNER

Page 130: 10 d bs in 30 minutes

OPEN SOURCE AND MATURE

Page 131: 10 d bs in 30 minutes

FOREIGN DATA WRAPPERS

Page 132: 10 d bs in 30 minutes

FOREIGN DATA WRAPPERS

neo4j

File Store

Legacy Oracle System

Page 134: 10 d bs in 30 minutes

FOREIGN DATA WRAPPERS

neo4j

File Store

Legacy Oracle System

SELECT STUFF FROM NEO4J
JOIN STUFF FROM FILE STORE
JOIN STUFF FROM ORACLE

Page 135: 10 d bs in 30 minutes

FOREIGN DATA WRAPPERS

neo4j (e.g. Patient Data)

File Store (e.g. Academic Results)

Legacy Oracle System (e.g. Clinical Trials)

SELECT STUFF FROM NEO4J
JOIN STUFF FROM FILE STORE
JOIN STUFF FROM ORACLE

Page 136: 10 d bs in 30 minutes

FOREIGN DATA WRAPPERS

• SQL Database Wrappers

• NoSQL Databases (Mongo, neo4j etc.)

• Hadoop

• Files (JSON, FixedLengthText)

• Web services

• Twitter

Page 137: 10 d bs in 30 minutes

In conclusion…

Page 138: 10 d bs in 30 minutes

REASONS TO USE OTHER DATABASES

• Geospatial indexes

• Schemaless data for query-time efficiency

• Transparent Sharding

• Be transparent to the database backend.

• More intuitive for the domain

• Cheap ‘joins’

• Low latency for simple data

• High availability in distributed systems

• Dealing with very large datasets

• Meeting standards such as Linked Media

• Support for time series databases

• Utilise pre-built text searching functionality.

• Interface for other data sources

Page 139: 10 d bs in 30 minutes

THANK YOU… ANY QUESTIONS?