How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Apr 16, 2017

Page 1: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

HOW CASSANDRA DELETES DATA

Alain Rodriguez

Page 2: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

• Tombstone issues

• Why tombstones

• Tombstone removal

Page 3: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Introduction

Page 4: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

About The Last Pickle

Page 5: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

About The Last Pickle and Alain Rodriguez

Page 7: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016
Page 8: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016
Page 10: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

About deletes in Cassandra

Deleted data in Cassandra does not just disappear; instead, a tombstone is added.

Page 12: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

OK, so what’s the matter? Why this talk?

Tombstones are needed in Cassandra, not an issue…

…until an SSTable or a query result looks like this…

Page 13: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016
Page 15: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

OK, so what’s the matter? Why this talk?

Then we see this come up on the user mailing list and in other community tools.

So I thought I could share about this topic.

thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

Page 16: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Tombstone issues

Page 19: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Tombstone issues: impacts

The read path: reading tombstones induces latencies, timeouts, or exceptions

The disk space: tombstones can fill up the disk (100%)

I am facing one of these issues; is it caused by tombstones?

Page 22: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Tombstone issues: Read Path

grep -i -e "ERROR" -e "WARN" /var/log/cassandra/system.log

WARN [SharedPool-Worker-7] 2016-07-16 16:31:09,048 SliceQueryFilter.java:319 - Read 276 live and 1104 tombstone cells in mykeyspace.mytable for key: ItV9kZC8mFNiSvYM8AwufBU8tTtJkW5dUH5MNcq1H18 (see tombstone_warn_threshold). 500 columns were requested, slices=[-]

ERROR [ReadStage:290729] 2016-07-16 17:00:18,708 SliceQueryFilter.java (line 206) Scanned over 100000 tombstones in mykeyspace.mytable; query aborted (see tombstone_failure_threshold)

ERROR [ReadStage:290729] 2016-04-22 17:00:18,709 CassandraDaemon.java (line 258) Exception in thread Thread[ReadStage:290729,5,main]
java.lang.RuntimeException: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
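Once grep has found the warnings, it can help to total them per table. The following is a minimal sketch, assuming the warning wording shown on this slide (other Cassandra versions may phrase it differently); the function name and the sample list are illustrative, not part of the talk.

```python
import re
from collections import Counter

# Matches the tombstone warning as worded on the slide. This wording is an
# assumption tied to that Cassandra version; adjust it for other versions.
TOMBSTONE_WARN = re.compile(
    r"Read (?P<live>\d+) live and (?P<tombstones>\d+) tombstone cells "
    r"in (?P<table>[\w.]+)"
)


def tombstones_per_table(lines):
    """Sum the tombstone cells reported by warning lines, per keyspace.table."""
    totals = Counter()
    for line in lines:
        match = TOMBSTONE_WARN.search(line)
        if match:
            totals[match.group("table")] += int(match.group("tombstones"))
    return totals


# The two log lines from the slide, used here as sample input.
sample_lines = [
    "WARN [SharedPool-Worker-7] 2016-07-16 16:31:09,048 SliceQueryFilter.java:319 - "
    "Read 276 live and 1104 tombstone cells in mykeyspace.mytable for key: "
    "ItV9kZC8mFNiSvYM8AwufBU8tTtJkW5dUH5MNcq1H18 (see tombstone_warn_threshold). "
    "500 columns were requested, slices=[-]",
    "ERROR [ReadStage:290729] 2016-07-16 17:00:18,708 SliceQueryFilter.java (line 206) "
    "Scanned over 100000 tombstones in mykeyspace.mytable; query aborted "
    "(see tombstone_failure_threshold)",
]

for table, count in tombstones_per_table(sample_lines).items():
    print(table, count)
```

In practice the lines would come from iterating over /var/log/cassandra/system.log instead of the sample list.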

Page 23: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Tombstone issues: Read Path

tombstoneScannedHistogram metric

Through nodetool cfstats, JMX…

Page 24: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Tombstone issues: Read Path

tombstoneScannedHistogram metric

Through nodetool cfstats, JMX, or a plugged-in monitoring tool such as Datadog, Grafana, SPM, OpsCenter… (commercial or free)

Page 26: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Tombstone issues: Disk space

The DroppableTombstoneRatio metric provides useful information.

Available through the sstablemetadata tool, JMX, and plugged-in monitoring tools such as Datadog, Grafana, SPM, OpsCenter, etc.

It is possible to write a script to check the ratio of the biggest SSTables, for example.
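One possible shape for such a script is sketched below. It assumes that sstablemetadata prints a line starting with "Estimated droppable tombstones:" and that the tool is on the PATH; the helper names and file names are hypothetical.

```python
import re
import subprocess

# Assumed format of the relevant sstablemetadata output line.
RATIO_LINE = re.compile(
    r"Estimated droppable tombstones:\s*([0-9.]+(?:[eE][-+]?[0-9]+)?)"
)


def droppable_ratio(metadata_output):
    """Extract the estimated droppable tombstone ratio, or None if absent."""
    match = RATIO_LINE.search(metadata_output)
    return float(match.group(1)) if match else None


def read_metadata(data_file):
    """Run sstablemetadata on one SSTable data file and return its text output."""
    result = subprocess.run(
        ["sstablemetadata", data_file],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def biggest_sstables_ratio(sstables, top=5):
    """Return (path, ratio) for the `top` largest SSTables.

    `sstables` is an iterable of (path, size_in_bytes, metadata_output) tuples.
    """
    biggest = sorted(sstables, key=lambda s: s[1], reverse=True)[:top]
    return [(path, droppable_ratio(output)) for path, _size, output in biggest]
```

On a node, the tuples could be built with os.path.getsize and read_metadata for each *-Data.db file of the table.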

Page 27: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why tombstones? I want to remove data!

Page 29: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Cassandra write path

[Diagram: write path in a Cassandra node. A client write goes to the memtable (memory) and the commit log (disk). The memtable is flushed to SSTables on disk, which are immutable. Client reads are served by the same node.]
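Because flushed SSTables are immutable, a delete cannot erase a value that is already on disk; it can only write a newer marker that hides it. The toy model below sketches this idea. It is a simplification: the Node class is hypothetical, and newer data wins by flush order here, whereas Cassandra compares write timestamps.

```python
from types import MappingProxyType

TOMBSTONE = object()  # marker written in place of a deleted value


class Node:
    """Toy single-node write path: a memtable plus immutable SSTables."""

    def __init__(self):
        self.memtable = {}
        self.sstables = []  # oldest first, each one a read-only mapping

    def write(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.memtable[key] = TOMBSTONE

    def flush(self):
        # The memtable becomes a new immutable SSTable.
        if self.memtable:
            self.sstables.append(MappingProxyType(dict(self.memtable)))
            self.memtable = {}

    def read(self, key):
        # Newest source first: the memtable, then SSTables from newest to oldest.
        for source in [self.memtable, *reversed(self.sstables)]:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None


node = Node()
node.write("k", "A")
node.flush()
node.delete("k")
node.flush()
print(node.read("k"))       # the tombstone hides the older value
print(len(node.sstables))   # both SSTables are still on disk
```

The old value "A" remains in the first SSTable until a compaction rewrites the files, which is why deletes initially add data instead of removing it.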

Page 30: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Distributed system

Cassandra is a distributed system

Distributed deletes are tricky !

Page 34: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Cassandra consistency

[Diagram: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2.
1. The client writes "A"; the three replicas do not hold it yet (?, ?, ?).
2. Two replicas store "A" and acknowledge the write (A, A, ?).
3. The client reads "A"; two replicas acknowledge the read.]

Strong consistency

Page 35: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Cassandra consistency & availability

[Diagram: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2. One of the three replicas is down (A, A, Down). The client still writes "A" and reads "A", each acknowledged by the two remaining replicas.]

High availability
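The numbers on these slides follow from simple arithmetic. A small sketch, with function names chosen for illustration: a quorum is a majority of the replicas, reads and writes overlap on at least one replica when read CL + write CL > RF, and an operation still succeeds with RF - CL replicas down.

```python
def quorum(replication_factor):
    """Majority of the replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(replication_factor, write_cl, read_cl):
    """True when every read overlaps every acknowledged write on some replica."""
    return write_cl + read_cl > replication_factor


def tolerated_down_replicas(replication_factor, consistency_level):
    """How many replicas can be down while the operation still succeeds."""
    return replication_factor - consistency_level


rf = 3
cl = quorum(rf)
print(cl)                                   # 2, as on the slides
print(is_strongly_consistent(rf, cl, cl))   # True: 2 + 2 > 3
print(tolerated_down_replicas(rf, cl))      # 1 replica may be down
```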

Page 39: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Distributed deletes

[Diagram, WITHOUT tombstones: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2.
1. All three replicas hold "A" (A, A, A).
2. The client deletes "A".
3. Two replicas remove "A" and acknowledge the delete; the third replica still holds "A".
4. The client reads "A".]

Wrong

Page 40: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Distributed deletes

[Diagram, WITHOUT tombstones: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2.
1. All three replicas hold "A" (A, A, A).
2. The client deletes "A".
3. Two replicas remove "A" and acknowledge the delete; the third replica still holds "A".
4. The client reads "empty".]

Correct

Page 44: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Why Tombstones: Distributed deletes

[Diagram, WITH tombstones: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2. A* = tombstone on A.
1. All three replicas hold "A" (A, A, A).
2. The client deletes "A".
3. Two replicas store the tombstone and acknowledge the delete (A*, A*, A).
4. The client reads "A*", meaning "empty".]

Correct
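The difference between the two cases can be sketched with a toy quorum read. This is one reading of the slides, not Cassandra's actual read path: each replica answers with a (value, write timestamp) cell or nothing, the coordinator keeps the newest cell, and a tombstone is reported as empty. All names are illustrative.

```python
from itertools import combinations

TOMBSTONE = "A*"


def reconcile(responses):
    """Pick the newest cell among replica responses; None means empty."""
    cells = [cell for cell in responses if cell is not None]
    if not cells:
        return None
    value, _timestamp = max(cells, key=lambda cell: cell[1])
    return None if value == TOMBSTONE else value


def quorum_read_results(replicas, quorum=2):
    """Every result a quorum read can return, depending on which replicas answer."""
    return {reconcile(group) for group in combinations(replicas, quorum)}


# State after the delete was acknowledged by two of the three replicas.
without_tombstones = [None, None, ("A", 1)]
with_tombstones = [(TOMBSTONE, 2), (TOMBSTONE, 2), ("A", 1)]

print(quorum_read_results(without_tombstones))  # both empty and "A" are possible
print(quorum_read_results(with_tombstones))     # always empty
```

Without tombstones, an absent value cannot be told apart from a value that was never written, so a read that reaches the third replica returns "A". With tombstones, the delete carries a newer timestamp and wins the comparison.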

Page 45: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Cool story, but I really want to remove the data !

Tombstone removal!

Page 47: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

When are tombstones removed?

When should tombstones be removed?
• Once the tombstone is fully replicated
• When deleted data has been removed

When are tombstones actually removed?
• After gc_grace_seconds
• During compactions

IF all the deleted data and the tombstone itself are involved in the compaction
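The two conditions above can be written as a small predicate. This is a simplified sketch, not Cassandra's implementation: SSTables are modelled as sets of keys, and "involved" is taken to mean that no SSTable outside the compaction still contains the key. The SSTable class and function name are hypothetical; 864000 seconds (10 days) is the default gc_grace_seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    keys: frozenset


def can_drop_tombstone(key, deletion_time, now, gc_grace_seconds,
                       compacting, all_sstables):
    """True if a compaction over `compacting` may drop the tombstone for `key`."""
    # Condition 1: gc_grace_seconds has elapsed since the delete.
    if now < deletion_time + gc_grace_seconds:
        return False
    # Condition 2: no SSTable left out of the compaction may still hold data
    # for this key, otherwise that data would reappear.
    left_out = [s for s in all_sstables if s not in compacting]
    return not any(key in s.keys for s in left_out)


a = SSTable("a", frozenset({"k1"}))
b = SSTable("b", frozenset({"k1", "k2"}))
c = SSTable("c", frozenset({"k3"}))

print(can_drop_tombstone("k1", 0, 864000, 864000, [a, b], [a, b, c]))  # True
print(can_drop_tombstone("k1", 0, 864000, 864000, [a], [a, b, c]))     # False: b overlaps
```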

Page 49: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

How tombstones are removed: Compaction!

[Diagram: write path in a Cassandra node (client write, memtable in memory, commit log and immutable SSTables on disk, flush, client read). The 4 SSTables on disk are being compacted.]

Page 50: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

How tombstones are removed: Compaction!

[Diagram: the same Cassandra node after compaction. A single immutable SSTable remains on disk next to the commit log.]

Page 53: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Implications in the real world

• No compaction = no eviction
• Add TTLs or deletes, and tombstones stack up (up to 100%)

• Overlapping SSTables = no eviction
• Fragmented data = eviction unlikely
• LCS: tombstone level ≠ data level = no eviction

• TTL << gc_grace_seconds = high % of useless data

Page 54: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Good news:

Cassandra community and Committers are Awesome!

Page 55: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Issue: No compaction = No eviction

CASSANDRA-3442: tombstone_threshold (C* 1.2.b1)

Compaction option, default: tombstone_threshold = 0.2 (ratio: 20% has been deleted)

Single SSTable compaction triggered based on an estimate!

Low risk: worst case -> no-op

Page 56: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Issue: Tombstone compaction loop!

CASSANDRA-4022: Check for key overlaps (C* 1.2.b1)

Internals improvement, not an option:

Estimated droppable tombstone ratio improved

Now considers key overlap with other SSTables

Page 57: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Issue: Tombstone compaction loop!

CASSANDRA-4781: tombstone_compaction_interval (C* 1.2.b2)

Compaction option, default: tombstone_compaction_interval = 86400 (in seconds = 1 day)

Definitely prevents loops

Page 58: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Issue: Compacting to remove tombstone is expensive

CASSANDRA-5228: Expired SSTables (C*2.0.b1)

Internals improvement, not an option

Effective with time series, DTCS / TWCS and TTLs!

Page 59: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Issue: Tombstone compactions not triggering

CASSANDRA-6563: unchecked_tombstone_compaction (C* 2.0.9)

Compaction option, default: unchecked_tombstone_compaction = false

The CASSANDRA-4022 overlap check becomes optional

Page 60: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning !

Issue: Overlapping preventing efficient tombstone compactions

CASSANDRA-7019: provide_overlapping_tombstones (C* 3.10)

Compaction option, default: provide_overlapping_tombstones = NONE (CELL / ROW / NONE)

Risky:
• Not yet released, so not really tested
• Heavier tombstone compactions
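These compaction options are set per table, inside the table's compaction map. The helper below is a hypothetical convenience that renders the corresponding CQL statement; the table name and strategy are placeholders, and the values shown are simply the defaults quoted on the slides. The compaction map is set as a whole, so the strategy class is included alongside the options.

```python
def alter_compaction_cql(table, strategy, **options):
    """Render an ALTER TABLE statement that sets the compaction sub-options."""
    settings = {"class": strategy}
    # CQL expects every sub-option as a quoted string, e.g. 'false' or '0.2'.
    settings.update({name: str(value).lower() for name, value in options.items()})
    body = ", ".join(f"'{name}': '{value}'" for name, value in settings.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


statement = alter_compaction_cql(
    "mykeyspace.mytable",            # placeholder table name
    "SizeTieredCompactionStrategy",  # placeholder: use the table's own strategy
    tombstone_threshold=0.2,
    tombstone_compaction_interval=86400,
    unchecked_tombstone_compaction=False,
)
print(statement)
```

The resulting statement can be run through cqlsh or a driver session. Options such as provide_overlapping_tombstones would be passed the same way on versions that support them.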

Page 61: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning - Tombstone distribution!

[Diagram, tombstones not replicated: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2. A* = tombstone on A. After the client deletes "A", two replicas hold the tombstone and the third still holds "A" (A*, A*, A). The client reads "A*".]

Correct

Page 62: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning - Tombstone distribution!

Case where a node fails + no repair = case without tombstones

[Diagram, tombstones not replicated: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2. A* = tombstone on A. Labels: A*, A, "A* removed". The client reads "A".]

Wrong

Page 63: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning - Tombstone distribution!

Case where a node fails + no repair = case without tombstones = zombie data!

[Diagram, tombstones not replicated: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2. A* = tombstone on A. Labels: A, "A* removed". The client reads "A".]

Wrong
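The zombie scenario can be replayed with a toy model. This is a sketch under simplifying assumptions: replicas are plain dictionaries, repair copies the newest cell everywhere, purging stands in for a compaction after gc_grace_seconds, and the final read consults all replicas. The function names and the small time values are made up for the example.

```python
def cell(value, timestamp, deleted=False):
    return {"value": value, "timestamp": timestamp, "deleted": deleted}


def read(key, replicas):
    """Newest cell wins; a tombstone reads as empty (None)."""
    cells = [replica[key] for replica in replicas if key in replica]
    if not cells:
        return None
    newest = max(cells, key=lambda c: c["timestamp"])
    return None if newest["deleted"] else newest["value"]


def repair(key, replicas):
    """Copy the newest cell for `key` to every replica."""
    cells = [replica[key] for replica in replicas if key in replica]
    if cells:
        newest = max(cells, key=lambda c: c["timestamp"])
        for replica in replicas:
            replica[key] = newest


def purge(replica, now, gc_grace_seconds):
    """Drop tombstones older than gc_grace_seconds, as a compaction would."""
    expired = [k for k, c in replica.items()
               if c["deleted"] and now >= c["timestamp"] + gc_grace_seconds]
    for key in expired:
        del replica[key]


def scenario(run_repair):
    gc_grace = 10  # toy value, in arbitrary time units
    replicas = [{"k": cell("A", 1)} for _ in range(3)]
    tombstone = cell(None, 5, deleted=True)
    replicas[0]["k"] = tombstone
    replicas[1]["k"] = tombstone  # the third replica missed the delete
    if run_repair:
        repair("k", replicas)
    for replica in replicas:
        purge(replica, now=20, gc_grace_seconds=gc_grace)
    return read("k", replicas)


print(scenario(run_repair=False))  # "A" comes back: zombie data
print(scenario(run_repair=True))   # None: the delete reached every replica first
```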

Page 65: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Some tuning - Tombstone distribution!

CASSANDRA-6434 (C* 3.0.b1):

only_purge_repaired_tombstones (Default: False)

[Diagram, tombstones not replicated: Cassandra cluster, 4 nodes, RF = 3, Write CL = Quorum = 2, Read CL = Quorum = 2. A* = tombstone on A. Two replicas hold the tombstone and the third still holds "A" (A*, A*, A). A* is not removed. The client reads "A*", meaning "empty".]

Correct

Limitation

Repair failing or no repair = permanent tombstone

Page 66: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016
Page 67: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Conclusion

Page 68: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Things we know about tombstones

• Tombstones come from deletes and TTLs
• Tombstones fit the Cassandra write path
• Tombstones ensure consistency

• Reading tombstones is expensive and can produce failures
• Tombstones take space on disk and might be tricky to remove
• Tombstones need to be distributed before being removed

Page 69: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Takeaways

• Model data and workflow to avoid reading many tombstones

• Deleted data = repair table within gc_grace_seconds

• Monitor tombstones, keep control! (Set some alerts ?)

• Use compaction options to tackle problems, there is always a way.

• Is there no way? Ask, or create a Jira and keep improving Cassandra!

Page 70: How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Summit 2016

Thank you

Questions?

thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html