Cassandra: Beyond Bigtable
Jonathan Ellis, CTO, DataStax
©2013 DataStax Confidential. Do not distribute without consent.

Cassandra: Beyond Bigtable · Bigtable + Dynamo •LSMT / SSTables •Runtime “column” (cell) definition •Schema-agnostic •Size-tiered compaction •Gossip-based cluster status

Jan 05, 2020

Five years of Cassandra

[Timeline figure. Axis ticks: Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13. Releases marked: 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ..., 2.0, and DSE]

Bigtable + Dynamo

•LSMT / SSTables
•Runtime “column” (cell) definition
•Schema-agnostic
•Size-tiered compaction
•Gossip-based cluster status + failure detection
•Hinted handoff
•Read repair
•Anti-entropy repair
•Eventually consistent

... with some differences

•SuperColumns
•Indexes
•Timestamp-based conflict resolution: http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
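Timestamp-based conflict resolution can be illustrated with a minimal sketch, assuming each cell carries a writer-supplied timestamp and replicas reconcile by keeping the cell with the highest one. `Cell`, `reconcile`, and `merge_rows` are hypothetical names, and the tie-break on value is an assumption chosen only so that every replica picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # supplied by the writer, e.g. microseconds since epoch


def reconcile(a: Cell, b: Cell) -> Cell:
    """Return the winning cell: highest timestamp, ties broken by value."""
    # Comparing (timestamp, value) makes the result independent of argument
    # order, so replicas converge no matter which update arrives first.
    return max(a, b, key=lambda c: (c.timestamp, c.value))


def merge_rows(r1: dict, r2: dict) -> dict:
    """Merge two replica views of one row, column by column."""
    merged = dict(r1)
    for name, cell in r2.items():
        merged[name] = reconcile(merged[name], cell) if name in merged else cell
    return merged
```

Because the merge is per column and order-independent, two replicas that exchange their views in either direction end up with the same row.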

Bigtable-inspired API

list<ColumnOrSuperColumn> get_slice(
    1:required binary key,
    2:required ColumnParent column_parent,
    3:required SlicePredicate predicate,
    4:required ConsistencyLevel consistency_level)

Two years ago

•CQL: native protocol, prepared statements
•Triggers
•Entity groups
•Smarter range queries enabling Hive predicate push-down
•Blue sky: streaming / CEP
•Ease Of Use

User defined types

CREATE TYPE address (
    street text,
    city text,
    zip_code int,
    phones set<text>
)

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    addresses map<text, address>
)

SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}

Collection indexing

CREATE TABLE songs (
    id uuid PRIMARY KEY,
    artist text,
    album text,
    title text,
    data blob,
    tags set<text>
);

CREATE INDEX song_tags_idx ON songs(tags);

SELECT * FROM songs WHERE 'blues' IN tags;

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind

Cassandra is a...

•Partitioned row store with extensions
•Typed document database
•Object database
•?

Paxos / CAS

Session 1:

SELECT * FROM users
WHERE username = 'jbellis'

[empty resultset]

INSERT INTO users (...)
VALUES ('jbellis', ...)

Session 2:

SELECT * FROM users
WHERE username = 'jbellis'

[empty resultset]

INSERT INTO users (...)
VALUES ('jbellis', ...)

Prepare / promise

[Diagram, two panels: a leader and three replicas]

Propose / accept

[Diagram, two panels: a leader and three replicas]

Read / results

[Diagram, two panels: a leader and three replicas]

Commit / acknowledge

[Diagram, two panels: a leader and three replicas]

Paxos state

CREATE TABLE paxos (
    row_key blob,
    cf_id UUID,
    in_progress_ballot timeuuid,
    proposal_ballot timeuuid,
    proposal blob,
    most_recent_commit_at timeuuid,
    most_recent_commit blob,
    PRIMARY KEY (row_key, cf_id)
)

Implications

•4 round trips vs 1 for normal updates
•Paxos state is durable
•Linearizable consistency with no leader election or failover
•ConsistencyLevel.SERIAL
•http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
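The four round trips can be sketched as a toy, in-memory compare-and-set. This is a simplified illustration under stated assumptions, not Cassandra's implementation: ballots are plain integers rather than timeuuids, the phases run in the order prepare, read, propose, commit (an assumption about how they compose), and recovery of a proposal left in flight by a failed leader is omitted. `Replica` and `insert_if_not_exists` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Replica:
    promised: int = 0                         # highest ballot promised so far
    accepted: Optional[Tuple] = None          # (ballot, key, value) awaiting commit
    rows: dict = field(default_factory=dict)  # committed data

    def prepare(self, ballot: int) -> bool:
        # Promise only if this ballot is newer than anything seen before.
        if ballot > self.promised:
            self.promised = ballot
            return True
        return False

    def propose(self, ballot: int, key, value) -> bool:
        # Accept unless a newer ballot has been promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, key, value)
            return True
        return False

    def commit(self, key, value) -> None:
        self.rows[key] = value
        self.accepted = None


def has_quorum(replies, n: int) -> bool:
    return sum(1 for ok in replies if ok) > n // 2


def insert_if_not_exists(replicas, ballot: int, key, value) -> bool:
    n = len(replicas)
    # Round trip 1: prepare / promise
    if not has_quorum([r.prepare(ballot) for r in replicas], n):
        return False
    # Round trip 2: read / results, to evaluate the IF NOT EXISTS condition
    if any(key in r.rows for r in replicas):
        return False
    # Round trip 3: propose / accept
    if not has_quorum([r.propose(ballot, key, value) for r in replicas], n):
        return False
    # Round trip 4: commit / acknowledge
    for r in replicas:
        r.commit(key, value)
    return True
```

With three replicas, a first call for `"jbellis"` succeeds and a second call with a higher ballot returns `False` at the read step, which is the outcome the two racing sessions need.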

Syntax

UPDATE USERS
SET email = '[email protected]', ...
WHERE username = 'jbellis'
IF email = '[email protected]';

INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', '[email protected]', ... )
IF NOT EXISTS;

Triggers

CREATE TRIGGER <name> ON <table>
USING <classname>;

Trigger implementation

class MyTrigger implements ITrigger
{
    public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
    {
        ...
    }
}

Atomicity?

Batches

[Diagram sequence: a coordinator node, red, yellow, and blue replicas, and a batchlog node; in the later frames one node is marked X]
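The slides show only the diagram, so the following is a minimal sketch of the general batchlog idea as it is usually described: persist the whole batch on another node first, apply the mutations, then clear the log entry, and let the batchlog node replay anything left behind. All class and function names are hypothetical, and `crash_after` exists only to simulate a coordinator dying mid-batch.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, key, value):
        self.rows[key] = value  # idempotent, so replaying a mutation is harmless


class BatchlogNode:
    def __init__(self):
        self.pending = {}  # batch_id -> list of (replica, key, value)

    def store(self, batch_id, mutations):
        self.pending[batch_id] = list(mutations)

    def remove(self, batch_id):
        self.pending.pop(batch_id, None)

    def replay(self):
        """Re-apply every batch whose coordinator never confirmed completion."""
        for batch_id in list(self.pending):
            for replica, key, value in self.pending[batch_id]:
                replica.apply(key, value)
            self.remove(batch_id)


def write_batch(batchlog, batch_id, mutations, crash_after=None):
    """Coordinator logic; returns False if the simulated crash interrupts it."""
    batchlog.store(batch_id, mutations)       # 1. persist the whole batch first
    for i, (replica, key, value) in enumerate(mutations):
        if crash_after is not None and i >= crash_after:
            return False                      # coordinator died mid-batch
        replica.apply(key, value)             # 2. deliver each mutation
    batchlog.remove(batch_id)                 # 3. done, drop the log entry
    return True
```

If the coordinator stops after reaching only the red replica, calling `replay()` on the batchlog node brings yellow and blue up to date, so the batch ends up applied everywhere or (if the log write itself never happened) nowhere.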

On-Heap/Off-Heap

[Diagram: inside the Java process, on-heap memory is managed by GC; off-heap memory is not managed by GC]

Read path (per sstable)

[Diagram, built up one component per frame across Memory and Disk: bloom filter, partition key cache, partition summary, partition index, compression offsets, data]
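The first stop on the read path, the bloom filter, can be sketched as follows. This is a generic bloom filter, not Cassandra's implementation: the SHA-256 double-hashing scheme, the class name, and `read_partition` are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions from two 64-bit halves of one digest
        # (double hashing). Illustrative only.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def read_partition(sstables, key: bytes):
    """Check each sstable's in-memory filter before touching its data."""
    results = []
    for bloom, data in sstables:
        if bloom.might_contain(key) and key in data:
            results.append(data[key])
    return results
```

A negative answer lets the read skip that sstable entirely, which is why the filter sits in memory ahead of every other structure in the diagram.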

Off heap in 2.0

•Partition key bloom filter: 1-2GB per billion partitions
•Compression metadata: ~1-3GB per TB compressed
•Partition index summary (depends on rows per partition)

Compaction

•Size-tiered
•Leveled
•Others?

Size-tiered compaction
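A minimal sketch of the size-tiered idea: group sstables of similar size into buckets and merge a bucket once it holds enough of them. The parameter names `bucket_low`, `bucket_high`, and `min_threshold` and their defaults mirror commonly documented size-tiered options, but the functions themselves are hypothetical and simplified.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes so each bucket holds sizes near its running average."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough, so start a new tier.
            buckets.append([size])
    return buckets


def pick_compaction(sizes, min_threshold=4):
    """Return the first bucket with enough sstables to merge, or None."""
    for bucket in bucket_by_size(sizes):
        if len(bucket) >= min_threshold:
            return bucket
    return None
```

For example, sizes `[10, 11, 9, 12, 100, 105, 400]` form three tiers, and only the tier of four small sstables qualifies for compaction.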

Leveled compaction

[Diagram: sstables arranged in levels L0 to L5]

Sad leveled compaction

[Diagram: levels L0 to L5]

STCS in L0

[Diagram: levels L0 to L5]

HLL and compaction

Data-aware compaction?

•Append-only workloads
  •No compaction necessary in trivial case; still needed for clustered scans
•Append-mostly workloads?
  •Bounded window for out-of-order updates

Rapid Read Protection

NONE

Typical reads

[Diagram sequence: a client, a coordinator, and three replicas that are 40%, 90%, and 30% busy]

A failure

[Diagram sequence: a client, a coordinator, and replicas that are 40%, 90%, and 30% busy; one node is marked X and the final frame is labeled "timeout"]

Failure with read protection

[Diagram sequence: a client, a coordinator, and replicas that are 40%, 90%, and 30% busy; one node is marked X and the final frame is labeled "success"]
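The difference between the two failure sequences can be sketched with a small deterministic model. It assumes read protection means the coordinator sends a second request to another replica once the first has been silent for some threshold; `coordinate_read`, its parameters, and the millisecond values used with it are hypothetical.

```python
def coordinate_read(latencies_ms, timeout_ms, speculate_after_ms=None):
    """
    latencies_ms: response time of each replica in preference order;
                  None means the replica never answers.
    speculate_after_ms: wait this long before also asking the next replica;
                  None disables the extra request.
    Returns (outcome, elapsed_ms).
    """
    first = latencies_ms[0]
    fast_enough = first is not None and first <= timeout_ms
    if speculate_after_ms is None:
        # No protection: the read depends entirely on the first replica.
        return ("success", first) if fast_enough else ("timeout", timeout_ms)
    if fast_enough and first <= speculate_after_ms:
        return ("success", first)
    # The first replica is slow or dead, so the second is asked as well
    # and whichever answer arrives first wins.
    candidates = []
    if first is not None:
        candidates.append(first)
    if len(latencies_ms) > 1 and latencies_ms[1] is not None:
        candidates.append(speculate_after_ms + latencies_ms[1])
    best = min(candidates, default=None)
    if best is not None and best <= timeout_ms:
        return ("success", best)
    return ("timeout", timeout_ms)
```

With a dead first replica, the unprotected read times out, while the protected read succeeds after the threshold plus the second replica's latency.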

Latency (mid-compaction)

More-efficient repair

Tombstones!

DELETE FROM users
WHERE username = 'jbellis'

[Diagram: a coordinator and three replicas, each replica labeled jbellis]

When can we purge?

•gc_grace_seconds
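The purge rule can be sketched as a simple time check: a tombstone becomes eligible for removal once `gc_grace_seconds` have passed since the delete. The function names and the cell representation are hypothetical, the 10-day default is the commonly documented one, and the sketch ignores any further conditions a real compaction applies (such as older data living in sstables outside the compaction).

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days


def can_purge(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once the grace period since the delete has fully elapsed."""
    return now >= deleted_at + gc_grace_seconds


def compact(cells, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """
    cells: dict of key -> ("live", value) or ("tombstone", deleted_at).
    Returns the cells that survive compaction at time `now`.
    """
    kept = {}
    for key, (kind, payload) in cells.items():
        if kind == "tombstone" and can_purge(payload, now, gc_grace_seconds):
            continue  # old enough: drop the tombstone for good
        kept[key] = (kind, payload)
    return kept
```

The grace period gives every replica time to learn about the delete before the marker disappears; purging earlier risks a replica that missed the delete bringing the old value back.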

Pain points

•Easy to write a query that is O(N) in the number of tombstones
•Tombstones must be read-repaired
•Clumsy hammer
  •tombstone_warn_threshold: 1000
  •tombstone_failure_threshold: 100000
•http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
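A minimal sketch of why a queue-like read is O(N) in tombstones and how the two thresholds act as a guard. The scan loop, the `TombstoneOverwhelmingError` class, and the return shape are hypothetical; only the two threshold names and their values come from the slide.

```python
class TombstoneOverwhelmingError(Exception):
    """Raised in this sketch when a scan crosses the failure threshold."""


def scan(cells, limit, warn_threshold=1000, failure_threshold=100000):
    """
    cells: iterable of ("live", value) or ("tombstone", None) in clustering order.
    Returns (live_values, tombstones_scanned, warned).
    """
    live, tombstones = [], 0
    for kind, value in cells:
        if kind == "tombstone":
            # Deleted cells still have to be read and skipped one by one.
            tombstones += 1
            if tombstones > failure_threshold:
                raise TombstoneOverwhelmingError(
                    f"scanned more than {failure_threshold} tombstones")
            continue
        live.append(value)
        if len(live) == limit:
            break
    return live, tombstones, tombstones > warn_threshold
```

A queue that deletes from the head forces every "give me the first item" read to walk all earlier tombstones, so a `limit` of 1 can still cost thousands of cells.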
