
C* Summit 2013: Comparing Architectures: Cassandra vs the Field by Sameer Farooqui

Dec 05, 2014


Have you wondered what actually happens when you submit a write to Cassandra? This vendor-agnostic technical talk covers the internals of Cassandra's read and write paths and compares them to other NoSQL stores, especially HBase, so you can pick the right database for your project. Topics include consistency levels, memtables/memstores, SSTables/HFiles, Bloom filters, block indexes, data distribution partitioners and optimal use cases.
Transcript
Page 1

COMPARING ARCHITECTURES: CASSANDRA VS. THE FIELD

BY SAMEER FAROOQUI

SAMEER@BLUEPLASTIC.COM

blueplastic.com/c.pdf

linkedin.com/in/blueplastic/

@blueplastic

http://youtu.be/ziqx2hJY8Hg

#Cassandra13

Page 2

NoSQL Options

Key -> Value: Riak, Redis, Memcached DB, Berkeley DB, Hamster DB, Amazon Dynamo, Voldemort, FoundationDB, LevelDB, Tokyo Cabinet

Key -> Doc: MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, Elasticsearch

Column Family: Cassandra, HBase, Hypertable, Amazon SimpleDB, Accumulo, HPCC, Cloudata

Graph: Neo4J, Infinite Graph, OrientDB, FlockDB, Gremlin, Titan

~Real Time: Storm, Impala, Stinger/Tez, Drill, Solr/Lucene

Page 3

Key -> Value

Key (ID)    Value (Name)
0001        Winston Smith
0002        Julia
0003        O'Brien
0004        Emmanuel Goldstein

(The value can also be an object, blob, JSON, XML, etc.)

- Simple API: get, put, delete
- K/V pairs are stored in containers called buckets
- Consistency only for a single key
- Very fast lookups and good scalability (sharding)
- All access via primary key

Use cases: Content caching, web session info, user profiles, preferences, shopping carts

Don't use for: Querying by data, multi-operation transactions, relationships between data
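To make the get/put/delete API concrete, here is a minimal in-memory sketch in Java. The `BucketStore` class is hypothetical (it is not Riak's or any other product's API) and leaves out everything that makes a real store interesting: persistence, sharding and replication.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory key-value store: K/V pairs live in named buckets.
class BucketStore {
    private final Map<String, Map<String, String>> buckets = new HashMap<>();

    // put: create the bucket on first use, then store (or overwrite) the value
    void put(String bucket, String key, String value) {
        buckets.computeIfAbsent(bucket, b -> new HashMap<>()).put(key, value);
    }

    // get: all access is by primary key; returns null if the key is absent
    String get(String bucket, String key) {
        Map<String, String> b = buckets.get(bucket);
        return b == null ? null : b.get(key);
    }

    // delete: remove a single key; returns true if something was removed
    boolean delete(String bucket, String key) {
        Map<String, String> b = buckets.get(bucket);
        return b != null && b.remove(key) != null;
    }
}

class BucketStoreDemo {
    public static void main(String[] args) {
        BucketStore store = new BucketStore();
        store.put("characters", "0001", "Winston Smith");
        store.put("characters", "0002", "Julia");
        System.out.println(store.get("characters", "0001")); // Winston Smith
        store.delete("characters", "0002");
        System.out.println(store.get("characters", "0002")); // null
    }
}
```

Note what is missing: there is no way to ask which keys hold the value "Julia" without scanning every bucket, which is why querying by data is on the don't-use list.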

Page 4

Key -> Document

- Like K/V, but the value is examinable
- Documents: XML, JSON, BSON, etc.
- The structure of stored docs should be similar, but doesn't need to be identical
- Tolerant of incomplete data

Key: 0001
Value: {"firstname": "Nuru",
        "lastname": "Abdalla",
        "location": "Uganda",
        "languages": ["English", "Swahili"],
        "mother": "Aziza",
        "father": "Mufa",
        "refugee_camp": "camp-10",
        "picture": "01010110"}

Key: 0039
Value: {"firstname": "Dee",
        "location": "Uganda",
        "languages": "Swahili",
        "refugee_camp": "camp-54",
        "picture": "01010110"}

Use cases: Event logging, content management systems, blogging platforms, web analytics

Don't use for: Complex transactions spanning different operations, applications with a strict schema
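Because the store can see inside the value, it can answer queries by field. A minimal Java sketch, assuming documents are modeled as `Map<String, Object>` (a stand-in for JSON/BSON) inside a hypothetical `DocStore` class. A query simply skips documents that lack the field, which is the "tolerant of incomplete data" property above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical document store: the value is a field map the store can examine.
class DocStore {
    private final Map<String, Map<String, Object>> docs = new LinkedHashMap<>();

    void put(String key, Map<String, Object> doc) {
        docs.put(key, doc);
    }

    // Query by field value; documents missing the field just don't match.
    List<String> findKeys(String field, Object value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : docs.entrySet()) {
            if (Objects.equals(e.getValue().get(field), value)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }
}

class DocStoreDemo {
    public static void main(String[] args) {
        DocStore store = new DocStore();
        store.put("0001", Map.of("firstname", "Nuru", "lastname", "Abdalla",
                "location", "Uganda", "refugee_camp", "camp-10"));
        store.put("0039", Map.of("firstname", "Dee",          // no lastname field
                "location", "Uganda", "refugee_camp", "camp-54"));
        System.out.println(store.findKeys("location", "Uganda"));  // [0001, 0039]
        System.out.println(store.findKeys("lastname", "Abdalla")); // [0001]
    }
}
```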

Page 5

Graph Databases

Use cases: Connected data (social networks), shortest path, recommendation engines, routing/dispatch/location services (node = location/address)

Drawbacks: Not easy to cluster/scale beyond one node; some queries have to traverse the entire graph

[Diagram: example graph whose nodes include phone numbers (407-666-4012, 407-384-4924, 415-242-9492, 407-336-1193), +44 numbers, an IMSI number and GPS coordinates]
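Shortest path, one of the use cases above, comes down to a graph traversal. A minimal breadth-first search sketch in Java over an in-memory adjacency list. The `CallGraph` class is hypothetical, the edges are unweighted and made up for illustration (they are not read from the diagram), and a real graph database would store and index the edges itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical unweighted, undirected graph stored as an adjacency list.
class CallGraph {
    private final Map<String, List<String>> adj = new HashMap<>();

    void addEdge(String a, String b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search: the first time 'to' is dequeued, it was reached
    // along a path with the fewest edges. Returns an empty list if unreachable.
    List<String> shortestPath(String from, String to) {
        Map<String, String> parent = new HashMap<>();   // also marks visited nodes
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : adj.getOrDefault(node, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}

class CallGraphDemo {
    public static void main(String[] args) {
        CallGraph g = new CallGraph();
        g.addEdge("407-666-4012", "407-384-4924");
        g.addEdge("407-384-4924", "415-242-9492");
        g.addEdge("407-666-4012", "407-336-1193");
        g.addEdge("407-336-1193", "407-384-4924");
        // [407-666-4012, 407-384-4924, 415-242-9492]
        System.out.println(g.shortestPath("407-666-4012", "415-242-9492"));
    }
}
```

BFS is enough for unweighted edges; weighted edges (for example road distances in a dispatch service) would call for Dijkstra's algorithm instead.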

Page 6

~ Real Time: Storm, Impala, Stinger/Tez, Drill, Spark/Shark

- Distributed, real-time computation system / stream processing
- For running a continuous query on data streams and streaming the results to clients (continuous computation)
- Still emerging; most are in alpha or beta stages
- Example: count hash tags (#)

[Diagram: Storm topology in which a Spout feeds a Bolt]
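In Storm's model, a spout emits a stream of tuples and bolts process them. The sketch below does not use the Storm API; it is a plain-Java stand-in (the `HashtagCountBolt` name is hypothetical) showing the continuous-computation idea: counts are updated as each tweet arrives, so a current answer is available after every tuple rather than only at the end of a batch job.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a bolt (not the Storm API): it receives one tuple
// (a tweet) at a time and keeps running hash-tag counts.
class HashtagCountBolt {
    private final Map<String, Integer> counts = new HashMap<>();

    void execute(String tweet) {
        for (String token : tweet.split("\\s+")) {
            if (token.length() > 1 && token.startsWith("#")) {
                counts.merge(token.toLowerCase(), 1, Integer::sum);
            }
        }
    }

    int count(String tag) {
        return counts.getOrDefault(tag.toLowerCase(), 0);
    }
}

class HashtagDemo {
    public static void main(String[] args) {
        // Stand-in for a spout: in real life an unbounded source of tweets.
        List<String> spout = List.of(
                "Heading to #Cassandra13",
                "Great talk on LSM trees #cassandra13 #nosql",
                "#NoSQL all the things");
        HashtagCountBolt bolt = new HashtagCountBolt();
        for (String tweet : spout) {
            bolt.execute(tweet);
            // An up-to-date result exists after every tuple, not just at the end.
            System.out.println("#cassandra13 so far: " + bolt.count("#cassandra13"));
        }
    }
}
```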

Page 7

Column Family

[Diagram: table Table-Name-Alpha with rows ROW-1 to ROW-6, split into Region-1 and Region-2, and three column families: Col Fam 1 (columns C1-C4), Col Fam 2 (columns A-D) and Col Fam 3 (columns 1-4). One cell holds two versions, v1=Z and v2=K.]

(Table, Row Key, Col. Family, Column, Timestamp) → Value (Z)
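The coordinate-to-value mapping above can be read as a set of nested sorted maps. A minimal Java sketch for a single table, assuming a hypothetical `CfTable` class: rows, column families and columns are kept sorted, and each cell keeps its timestamped versions newest-first, so a plain read returns the latest version while a timestamped read can still reach older ones. The cell coordinates in the demo are illustrative; only the two versions, v1=Z and v2=K, come from the diagram.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// (Row Key, Col. Family, Column, Timestamp) -> Value for one table, as nested sorted maps.
class CfTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // versions sorted by timestamp, newest first
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Latest version of the cell, or null if the cell does not exist.
    String get(String row, String family, String column) {
        NavigableMap<Long, String> versions = versions(row, family, column);
        return versions == null || versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Newest version with timestamp <= asOf. Keys are in descending order, so
    // ceilingEntry(asOf) is the greatest timestamp that is <= asOf.
    String get(String row, String family, String column, long asOf) {
        NavigableMap<Long, String> versions = versions(row, family, column);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.ceilingEntry(asOf);
        return e == null ? null : e.getValue();
    }

    private NavigableMap<Long, String> versions(String row, String family, String column) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }
}

class CfTableDemo {
    public static void main(String[] args) {
        CfTable t = new CfTable();                                 // plays the role of Table-Name-Alpha
        t.put("ROW-1", "ColFam2", "B", 1L, "Z");                   // v1
        t.put("ROW-1", "ColFam2", "B", 2L, "K");                   // v2
        System.out.println(t.get("ROW-1", "ColFam2", "B"));        // K (latest)
        System.out.println(t.get("ROW-1", "ColFam2", "B", 1L));    // Z (as of timestamp 1)
    }
}
```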

Page 8

Column Family

- Know your read (R) and write (W) queries up front
- Design the data model and system architecture to optimally fulfill those queries
- It is important to understand the architecture fundamentals

Page 9

How to pick a CF database

Page 10

How to pick a CF database

[Google Trends chart]

Page 11

How to pick a CF database

[Google Trends chart of regional interest. Two lists of top regions are shown: South Korea, India, USA, China, Russia; and Netherlands, South Korea, Belgium, China, Taiwan]

Page 12

How to pick a CF database

- Check activity on the Apache user mailing lists

Month       Apache Cassandra   Apache HBase
Jan 2013    739                783
Feb 2013    714                797
Mar 2013    837                692
Apr 2013    730                741
May 2013    567                636

Page 13

[Diagram: Cassandra builds on Google's BigTable (paper published Nov 2006) and Amazon's Dynamo (paper published Oct 2007); labels: Storage Engine, Data Model]

Page 15

Both Cassandra and HBase:

• Written in Java
• Column-family-oriented databases
• Have reached 1,000+ nodes in production
• Very low latency reads and writes
• Use log-structured merge trees
• Atomic at the row level
• No support for joins, transactions or foreign keys

Page 16

Cassandra:
• Peer-to-peer architecture
• Tunable consistency
• Secondary indexes available
• Writes to ext3/4
• Conflict resolution handled during reads
• N-way writes
• Random and ordered sorting of row keys supported

vs.

HBase:
• Master / slave architecture
• Strict consistency
• No native secondary index support
• Writes to HDFS
• Conflict resolution handled during writes
• Pipelined write
• Ordered/lexicographical sorting of row keys

Page 17

Amazon.com’s Dynamo Use Cases

Services that only need primary-key access to the data store:
- Best seller lists
- Customer preferences
- Sales rank
- Product catalog
- Session management

No need for:
- Complex SQL queries
- Operations spanning multiple data items

• The shopping cart service must always allow customers to add and remove items
• If there are 2 conflicting versions of a write, the application should be able to pull both writes and merge them
• Designed for apps that “need tight control over the tradeoffs between availability, consistency, cost-effectiveness and performance”
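The merge in the second bullet is left to the application. For a shopping cart, one simple policy, sketched below in Java with a hypothetical `CartMerge` helper, is to take the union of the conflicting versions so that no "add to cart" is ever lost.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical application-side reconciliation of two conflicting cart versions.
class CartMerge {
    // Union merge: every item added in either version survives.
    static Set<String> merge(Set<String> versionA, Set<String> versionB) {
        Set<String> merged = new TreeSet<>(versionA);
        merged.addAll(versionB);
        return merged;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("book", "pen");    // written via one replica
        Set<String> b = Set.of("book", "lamp");   // written concurrently via another
        System.out.println(merge(a, b));          // [book, lamp, pen]
    }
}
```

The trade-off of a plain union is that an item removed in only one of the versions reappears after the merge.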

Page 18

Google’s BigTable Use Cases

60 products at Google once used BigTable, including:
- Gmail
- YouTube
- Google Earth
- Google Finance
- Google Analytics
- Personalized Search

• Must be able to store the entire web crawl data
• Rely on GFS for replication and data availability
• Strong integration with MapReduce

Page 19

[Diagram: a client and an eight-node Cassandra ring (nodes 1-8)]

- Gossip runs every second on each node, exchanging state with up to 3 other nodes
- Used to discover location and state information about other nodes
- The Phi Accrual Failure Detector detects failures via a suspicion level on a continuous scale
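A minimal Java sketch of the phi accrual idea, assuming heartbeat inter-arrival times are roughly exponentially distributed (a simplification; the `PhiAccrualDetector` name and the window size are illustrative). Under that assumption the probability of hearing nothing for t ms is e^(-t/mean), so phi = -log10(e^(-t/mean)) = t / (mean * ln 10). A phi of 1 means about a 10% chance that a live peer's heartbeat would be this late, a phi of 2 about 1%, and so on. A node would be marked down once phi crosses a threshold; Cassandra exposes this as `phi_convict_threshold` in cassandra.yaml.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a phi accrual failure detector for a single peer.
class PhiAccrualDetector {
    private final int windowSize;                        // how many recent intervals to remember
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0;
    private long lastHeartbeat = -1;                     // -1 = no heartbeat seen yet

    PhiAccrualDetector(int windowSize) { this.windowSize = windowSize; }

    // Record a heartbeat (e.g. fresher gossip state for this peer) at nowMillis.
    void heartbeat(long nowMillis) {
        if (lastHeartbeat >= 0) {
            long interval = nowMillis - lastHeartbeat;
            if (interval > 0) {
                intervals.addLast(interval);
                intervalSum += interval;
                if (intervals.size() > windowSize) intervalSum -= intervals.removeFirst();
            }
        }
        lastHeartbeat = nowMillis;
    }

    // Suspicion level: 0 right after a heartbeat, growing continuously with silence.
    double phi(long nowMillis) {
        if (intervals.isEmpty()) return 0.0;             // nothing learned yet
        double mean = (double) intervalSum / intervals.size();
        double silence = nowMillis - lastHeartbeat;
        return silence / (mean * Math.log(10.0));
    }
}

class PhiDemo {
    public static void main(String[] args) {
        PhiAccrualDetector d = new PhiAccrualDetector(100);
        for (long t = 0; t <= 10_000; t += 1_000) d.heartbeat(t);  // one heartbeat per second
        System.out.printf("phi after 0.5 s of silence: %.2f%n", d.phi(10_500));
        System.out.printf("phi after 20 s of silence: %.2f%n", d.phi(30_000));
    }
}
```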

Page 20

[Diagram: HBase cluster layout. Master machines run the NameNode (NN), Standby NameNode, JobTracker (JT), HBase Master (HM), HBase Master Standby and ZooKeeper. Each slave machine runs a DataNode (DN), a TaskTracker (TT), MapReduce task slots (M, R) and a RegionServer (RS); disks are annotated "OS", "2 TB each" and "SATA RAID". A client connects to the cluster.]

Page 21

Effort to deploy

Cassandra:
- One monolithic database install (1 JVM per node) + 1 log file and 1 config file (YAML)
- No single points of failure, so no standby master nodes
- Good default settings

HBase:
- More complex to deploy (multiple JVMs per node) + many log files and many config files (XMLs)
- More moving parts: HDFS, HBase, MapReduce, Passive NameNode, Standby HBase Master, ZooKeeper
- Default settings usually need tweaking

Page 22

Where to write? (HBase)

[Diagram: the client asks ZooKeeper where -ROOT- is; -ROOT- says which .META. region (.META.1, .META.2 or .META.3) to go to; .META. says which RegionServer (RS a, b or c) hosts the region for the row; the write goes to that RegionServer and is replicated synchronously via HDFS.]

- -ROOT- and .META. locations are cached by the client
- No control over replication or consistency for each write!
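The caching step is what keeps this lookup chain cheap. A minimal Java sketch of a client-side region-location cache, assuming hypothetical `Region` and `RegionLocator` classes: regions are contiguous, sorted row-key ranges, so a `TreeMap` keyed by region start key plus `floorEntry` finds the candidate region, and only a cache miss pays for the ZooKeeper -> -ROOT- -> .META. round trips (modeled here as a plain function). Handling of stale entries after a region moves or splits is omitted.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// A region covers row keys in [startKey, endKey); endKey == null means "to the end of the table".
class Region {
    final String startKey, endKey, server;

    Region(String startKey, String endKey, String server) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.server = server;
    }

    boolean contains(String rowKey) {
        return rowKey.compareTo(startKey) >= 0 && (endKey == null || rowKey.compareTo(endKey) < 0);
    }
}

// Hypothetical client-side cache of region locations.
class RegionLocator {
    private final TreeMap<String, Region> cache = new TreeMap<>();   // start key -> region
    private final Function<String, Region> metaLookup;               // stand-in for ZooKeeper -> -ROOT- -> .META.
    int metaLookups = 0;                                             // counts cache misses

    RegionLocator(Function<String, Region> metaLookup) { this.metaLookup = metaLookup; }

    String locate(String rowKey) {
        // Candidate: the cached region with the greatest start key <= rowKey.
        Map.Entry<String, Region> hit = cache.floorEntry(rowKey);
        if (hit != null && hit.getValue().contains(rowKey)) return hit.getValue().server;
        Region region = metaLookup.apply(rowKey);   // cache miss: walk the lookup chain
        metaLookups++;
        cache.put(region.startKey, region);
        return region.server;
    }
}

class RegionLocatorDemo {
    public static void main(String[] args) {
        // Pretend .META. contents: two regions split at row key "m".
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("", new Region("", "m", "RS a"));
        meta.put("m", new Region("m", null, "RS b"));
        RegionLocator locator = new RegionLocator(row -> meta.floorEntry(row).getValue());
        System.out.println(locator.locate("apple"));   // RS a (miss, then cached)
        System.out.println(locator.locate("banana"));  // RS a (served from cache)
        System.out.println(locator.locate("zebra"));   // RS b (miss)
        System.out.println(locator.metaLookups);       // 2
    }
}
```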

Page 23

Where to write? (Cassandra)

[Diagram: eight-node ring (nodes 1-8); the client sends the write to a coordinator node, which forwards it to replicas R1, R2 and R3]

Replication Factor = 3
Consistency = 1

Page 24

Where to write? (Cassandra)

[Diagram: eight-node ring (nodes 1-8); the client sends the write to a coordinator node, which forwards it to replicas R1, R2 and R3]

Replication Factor = 3
Consistency = 2

Page 25

Where to write? (Cassandra)

[Diagram: eight-node ring (nodes 1-8); the client sends the write to a coordinator node, which forwards it to replicas R1, R2, R3 and R4]

Replication Factor = 4
Consistency = 2
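A minimal Java sketch of both halves of these slides, under stated assumptions: the integer tokens, node names and the `TokenRing` class are hypothetical, and placement follows a SimpleStrategy-style rule (the first replica is the first node clockwise from the key's token, the rest are the next nodes around the ring; Cassandra also offers topology-aware strategies). The coordinator forwards the write to all RF replicas but reports success to the client once C of them acknowledge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();   // token -> node

    void addNode(long token, String node) { ring.put(token, node); }

    // Walk clockwise from the key's token, collecting RF distinct nodes.
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) return replicas;
        Long token = ring.ceilingKey(keyToken);
        if (token == null) token = ring.firstKey();                // wrap around
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) token = ring.firstKey();
        }
        return replicas;
    }

    // The write goes to every replica; the client sees success once
    // `consistency` replicas (here: the live ones) have acknowledged.
    static boolean writeSucceeds(List<String> replicas, Set<String> liveNodes, int consistency) {
        int acks = 0;
        for (String replica : replicas) {
            if (liveNodes.contains(replica)) acks++;
        }
        return acks >= consistency;
    }
}

class TokenRingDemo {
    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        for (int i = 0; i < 8; i++) ring.addNode(i * 100L, "node" + (i + 1));
        List<String> replicas = ring.replicasFor(250L, 3);
        System.out.println(replicas);                                     // [node4, node5, node6]
        Set<String> live = Set.of("node4", "node6");                      // node5 is down
        System.out.println(TokenRing.writeSucceeds(replicas, live, 1));   // true
        System.out.println(TokenRing.writeSucceeds(replicas, live, 2));   // true
        System.out.println(TokenRing.writeSucceeds(replicas, live, 3));   // false
    }
}
```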

Page 26

Strong Consistency Costs

- Write to 3 nodes (RF = 3, C = 2): reads must hit at least 2 nodes to guarantee strong consistency
- Write to 3 nodes (RF = 3, C = 3): reads need only 1 node to guarantee strong consistency
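Both cases follow from the overlap rule R + W > RF: if the replicas read and the replicas that acknowledged the write must share at least one node, the read is guaranteed to reach a replica holding the latest write. The cost is that a larger W makes writes fail as soon as fewer than W replicas are up, and a larger R does the same for reads. A small Java sketch of the arithmetic (the class and method names are illustrative):

```java
// Quorum overlap rule: reads are guaranteed to overlap the latest acknowledged
// write when the read set (R) and write set (W) must intersect: R + W > RF.
class ConsistencyMath {
    static boolean overlaps(int rf, int writeCl, int readCl) {
        return readCl + writeCl > rf;
    }

    // Smallest read consistency level that still overlaps every write set.
    static int minReadForStrong(int rf, int writeCl) {
        return Math.max(1, rf - writeCl + 1);
    }

    public static void main(String[] args) {
        System.out.println(minReadForStrong(3, 2)); // 2  (RF = 3, C = 2)
        System.out.println(minReadForStrong(3, 3)); // 1  (RF = 3, C = 3)
        System.out.println(overlaps(3, 1, 1));      // false: a read may miss the write
    }
}
```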

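Page 27

Log Structured Merge Trees: C* vs. HBase

(Table, Row Key, Col. Family, Column, Timestamp) → Value (Z)

[Diagram: on each node, a write (Z) is appended to a log on disk (Commit Log in C*, WAL in HBase) and also stored in memory inside the JVM (Memtable in C*, Memstore in HBase) alongside cells A, B, C, D; a flush writes the in-memory contents to disk as an SSTable (C*) or HFile (HBase).]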

Page 28

Log Structured Merge Trees: C* vs. HBase

[Diagram: the same write path after repeated flushes; each flush of the Memtable/Memstore adds another SSTable (C*) or HFile (HBase) on disk.]
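A toy Java sketch of the write path on these two slides, using a hypothetical `ToyLsmTree` class with no real disk I/O and no compaction: each put is appended to a commit log and applied to a sorted in-memory table; when that table reaches a size threshold it is flushed as an immutable sorted table, and reads consult the memtable first and then the flushed tables from newest to oldest.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy log-structured merge tree: commit log + sorted memtable, flushed to
// immutable sorted tables (standing in for SSTables/HFiles) at a size threshold.
class ToyLsmTree {
    private final List<String> commitLog = new ArrayList<>();      // stand-in for the on-disk commit log / WAL
    private TreeMap<String, String> memtable = new TreeMap<>();    // memtable (C*) / memstore (HBase)
    private final List<NavigableMap<String, String>> sstables = new ArrayList<>(); // oldest first
    private final int flushThreshold;

    ToyLsmTree(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        commitLog.add(key + "=" + value);   // 1. append to the log (durability, in a real system)
        memtable.put(key, value);           // 2. apply to the sorted in-memory table
        if (memtable.size() >= flushThreshold) flush();
    }

    // 3. Write the memtable out as an immutable sorted table and start a new one.
    void flush() {
        if (memtable.isEmpty()) return;
        sstables.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear();                  // flushed writes no longer need replaying
    }

    // Reads check the memtable first, then the flushed tables from newest to oldest.
    String get(String key) {
        String value = memtable.get(key);
        if (value != null) return value;
        for (int i = sstables.size() - 1; i >= 0; i--) {
            value = sstables.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    int sstableCount() { return sstables.size(); }
}

class ToyLsmDemo {
    public static void main(String[] args) {
        ToyLsmTree lsm = new ToyLsmTree(2);
        lsm.put("row1", "A");
        lsm.put("row2", "B");    // memtable full -> flushed as table #1
        lsm.put("row1", "Z");    // newer value lives in the memtable
        System.out.println(lsm.get("row1") + " " + lsm.get("row2") + " " + lsm.sstableCount()); // Z B 1
    }
}
```

Because flushed tables are never modified in place, an update such as row1=Z simply shadows the older value; reclaiming that space is the job of compaction.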

Page 29

Flush Details

[Diagram: a flush writes the memstore/memtable cells (Z, A, B, C, D) to an HFile/SSTable together with a Bloom filter and a block index]

- Bloom filters can be keyed on the row key only (R) or on row key + column (R + C)
- In HBase, the Bloom filter (BF) and block index (BI) are stored inside the HFile
- In C*, the data, Bloom filter and index are written to separate files
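A minimal Java sketch of how the two structures cut disk reads on a lookup, with hypothetical `SimpleBloomFilter` and `BlockIndex` classes (the double-hashing scheme is an illustrative choice, not what either database uses). The Bloom filter can say "definitely not in this file", so the file is skipped entirely; otherwise the sparse block index, which holds the first row key of each block, points to the single block worth reading.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Bloom filter: k bit positions per key via double hashing.
// False positives are possible; false negatives are not.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    private static int fnv1a(String key) {              // second hash: 32-bit FNV-1a
        int h = 0x811c9dc5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    private int position(String key, int i) {
        return Math.floorMod(key.hashCode() + i * fnv1a(key), size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) bits.set(position(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) return false;   // definitely absent
        }
        return true;                                          // possibly present
    }
}

// Sparse block index: first row key of each block -> block number.
class BlockIndex {
    private final TreeMap<String, Integer> firstKeyToBlock = new TreeMap<>();

    void addBlock(String firstKey, int blockNumber) { firstKeyToBlock.put(firstKey, blockNumber); }

    // Only the block with the greatest first key <= key can hold the key.
    Integer candidateBlock(String key) {
        Map.Entry<String, Integer> e = firstKeyToBlock.floorEntry(key);
        return e == null ? null : e.getValue();
    }
}

class FlushLookupDemo {
    public static void main(String[] args) {
        SimpleBloomFilter bloom = new SimpleBloomFilter(1024, 3);
        BlockIndex index = new BlockIndex();
        for (String k : List.of("A", "B", "C", "D", "Z")) bloom.add(k);
        index.addBlock("A", 0);   // block 0 holds A, B
        index.addBlock("C", 1);   // block 1 holds C, D
        index.addBlock("Z", 2);   // block 2 holds Z
        for (String key : List.of("D", "Q")) {
            if (!bloom.mightContain(key)) {
                System.out.println(key + ": skip this file");
            } else {
                System.out.println(key + ": read block " + index.candidateBlock(key));
            }
        }
    }
}
```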

Page 30

Flush per Column Family

C*:
- Supported

HBase:
- Flushes all Column Families together
- Unnecessary flushing puts more network pressure on HBase, since HFiles have to be replicated to 2 other HDFS nodes
- Flush per CF is under development via JIRA 3149

Page 31

Secondary Indexes

C*:
- Native support for secondary indexes

HBase:
- No native secondary indexes
- But a trigger can be launched after a put to keep a secondary index (another CF) up to date, without putting the burden on the client
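A minimal Java sketch of that pattern. It deliberately does not use the HBase coprocessor API: `PostPutObserver`, `ObservedTable` and `SecondaryIndex` are hypothetical plain-Java stand-ins. After every put, the hook updates an index that maps a column's value to the row keys holding it, and removes the stale entry when a value changes, so the client only ever writes to the main table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical post-put hook (a stand-in for a coprocessor, not the HBase API).
interface PostPutObserver {
    void postPut(String row, String column, String oldValue, String newValue);
}

class ObservedTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    private final List<PostPutObserver> observers = new ArrayList<>();

    void addObserver(PostPutObserver o) { observers.add(o); }

    void put(String row, String column, String value) {
        String old = rows.computeIfAbsent(row, r -> new HashMap<>()).put(column, value);
        for (PostPutObserver o : observers) o.postPut(row, column, old, value);  // server-side "trigger"
    }
}

// Keeps "column value -> row keys" up to date for one indexed column.
class SecondaryIndex implements PostPutObserver {
    private final String indexedColumn;
    private final Map<String, Set<String>> valueToRows = new HashMap<>();

    SecondaryIndex(String indexedColumn) { this.indexedColumn = indexedColumn; }

    @Override
    public void postPut(String row, String column, String oldValue, String newValue) {
        if (!column.equals(indexedColumn)) return;
        if (oldValue != null) {                          // drop the stale index entry
            Set<String> stale = valueToRows.get(oldValue);
            if (stale != null) stale.remove(row);
        }
        valueToRows.computeIfAbsent(newValue, v -> new TreeSet<>()).add(row);
    }

    Set<String> rowsWith(String value) {
        return valueToRows.getOrDefault(value, Collections.emptySet());
    }
}

class SecondaryIndexDemo {
    public static void main(String[] args) {
        ObservedTable people = new ObservedTable();
        SecondaryIndex byCamp = new SecondaryIndex("refugee_camp");
        people.addObserver(byCamp);
        people.put("0001", "refugee_camp", "camp-10");
        people.put("0039", "refugee_camp", "camp-54");
        System.out.println(byCamp.rowsWith("camp-54"));  // [0039]
    }
}
```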

Page 32

SSD Support

C*:
- It is possible to place just the SSTables on SSD: in the YAML file (cassandra.yaml), set commitlog_directory to spinning disks and data_file_directories to SSD
- See Rick Branson’s talk: youtube.com/watch?v=zQdDi9pdf3I

HBase:
- Not possible to tell HDFS to store only the WAL or only the HFiles on SSD
- There is some support for this in the MapR and Intel distributions
- Apache HDFS JIRAs 2832 & 4672 have preliminary discussions

Page 33

Compactions

C*:
- Tiered and Leveled
- For leveled, see J. Ellis’s blog post: datastax.com/dev/blog/leveled-compaction-in-apache-cassandra

HBase:
- Only Tiered
- Note: many new algorithms and improvements are coming in HBase 0.95, like Stripe Compactions (JIRA 7667): https://issues.apache.org/jira/secure/attachment/12575449/Stripe%20compactions.pdf

Page 34

Reading after disk failure

C*:
- Reads can simply be fulfilled from another node natively

HBase:
- After a disk failure, the slave machine reads the missing data from a remote disk until compaction happens, so region reads can be slow

Page 35

Data Partitioning

C*:
- Supports ordered partitioner and random partitioner

HBase:
- Only supports ordered partitioner
- Row key range scans possible
- It is possible to externally MD5-hash the row key and add the hash to the row key: md5-rowkey
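A minimal Java sketch of the md5-rowkey trick using the JDK's `MessageDigest` (the `SaltedRowKey` class name is hypothetical). Prefixing each key with its own MD5 hash spreads sequential keys such as user0001, user0002, ... across the sorted key space instead of piling them into one region.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Prefix a row key with the hex MD5 of itself: "md5-rowkey".
class SaltedRowKey {
    static String md5Prefixed(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex + "-" + rowKey;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Adjacent natural keys end up far apart in the sorted key space.
        for (String key : new String[] {"user0001", "user0002", "user0003"}) {
            System.out.println(md5Prefixed(key));
        }
    }
}
```

The prefix is deterministic, so a client can still do a point get by recomputing it, but range scans in the natural key order are no longer possible.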

Page 36

Triggers / Coprocessors

C*:
- Under development for C* 2.0, JIRA 1311

HBase:
- Supported by Coprocessors (so after a get/put/delete on a column family, a trigger can be executed)
- Triggers are coded as Java classes

Page 37

Compare & Set

C*:
- Under development for C* 2.0

HBase:
- Supported

Page 38

Multi-Datacenter/DR Support

C*:
- Very mature and well tested
- Synchronous or asynchronous replication to DR
- Recovery Point Objective (RPO) can be 0

HBase:
- Not as robust
- Only asynchronous replication to DR
- Recovery Point Objective (RPO) cannot be 0

Page 39

blueplastic.com/c.pdf

Sameer Farooqui

sameer@blueplastic.com

- Freelance Big Data consultant and trainer
- Taught 50+ courses on Hadoop, HBase, Cassandra and OpenStack
- DataStax authorized training partner

Ex: Hortonworks, Accenture R&D, Symantec

linkedin.com/in/blueplastic/

@blueplastic

http://youtu.be/ziqx2hJY8Hg

#Cassandra13