Improving Tombstone Compactions in Apache Cassandra (James Witschey & Philip Thompson, DataStax) | C* Summit 2016

Jan 06, 2017

Transcript
Page 1: Improving Tombstone Compactions in Apache Cassandra (James Witschey & Philip Thompson, DataStax) | C* Summit 2016

Improving Tombstone Compactions in Apache Cassandra

Jim Witschey

Philip Thompson

Page 2

What are Tombstones

Page 3

C* Read/Write Path

commit log

Memtable

SSTable

Write

Page 4

C* Read/Write Path

Page 5

Tombstones

How do we handle deletes?

Page 6

Tombstones

Deletion artifact to handle consistency issues
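A minimal sketch of why a delete has to be written as a marker rather than simply removing data. It assumes a simplified last-write-wins model in which a tombstone is a cell with no value; the `Cell`, `reconcile`, and `read` names are hypothetical and are not Cassandra's internal API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (hypothetical encoding)
    timestamp: int        # write timestamp

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(versions: Iterable[Cell]) -> Cell:
    """Pick the winning version: newest timestamp, tombstone wins a tie."""
    return max(versions, key=lambda c: (c.timestamp, c.is_tombstone))


def read(versions: Iterable[Cell]) -> Optional[str]:
    """Return the live value, or None if the newest version is a delete."""
    winner = reconcile(versions)
    return None if winner.is_tombstone else winner.value


# Replica A saw the write and the delete; replica B missed the delete.
replica_a = [Cell("alice", 100), Cell(None, 200)]
replica_b = [Cell("alice", 100)]

merged_with_tombstone = read(replica_a + replica_b)
merged_without_tombstone = read([replica_a[0]] + replica_b)
```

With the tombstone present, the merged read is empty. If the tombstone is dropped before replica B learns about the delete, the old value wins again, which is the "zombie data" problem.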

Page 7

Tombstones

Safe to purge after gc_grace_seconds
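The rule above can be sketched as a time comparison. This is a simplification: it assumes times in seconds, uses the default `gc_grace_seconds` of 864000 (10 days), and ignores the additional requirement that no other SSTable still holds older data the tombstone shadows. The function name is hypothetical.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, the default table setting


def is_purgeable(deletion_time: int, now: int,
                 gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True once the tombstone is older than the grace period.

    Simplified: the real check also depends on overlapping SSTables.
    """
    return now - gc_grace_seconds > deletion_time


# A tombstone written at t=0 is kept through day 10 and purgeable after it.
still_kept = is_purgeable(deletion_time=0, now=864_000)
now_purgeable = is_purgeable(deletion_time=0, now=864_001)
```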

Page 8

Why Tombstones are Terrible

Page 9

Tombstones are Terrible for Queries

• Tombstones read during a query are not visible to the dev/client

• OOMs possible

Page 10

Tombstones are Terrible for Operators (You!)

• Zombie Data from Repair, or lost disks, or restored nodes, or lots of stupid reasons

• Must repair within gc_grace_seconds!

• No disk space!

Page 11

What is CASSANDRA-7019?

“Improve Tombstone Compactions”

Page 12

Pre CASSANDRA-7019

• Single-SSTable tombstone compactions, triggered by the percentage of tombstones

• Major compactions

• Both have limitations
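The percentage-based trigger for single-SSTable tombstone compactions can be sketched as below. The defaults mirror the `tombstone_threshold` (0.2) and `tombstone_compaction_interval` (86400 seconds) compaction subproperties; the function itself is a hypothetical simplification and leaves out the overlap checks Cassandra performs before actually dropping anything.

```python
def worth_tombstone_compaction(droppable_ratio: float,
                               sstable_age_seconds: int,
                               tombstone_threshold: float = 0.2,
                               tombstone_compaction_interval: int = 86_400) -> bool:
    """Decide whether one SSTable is a candidate for a tombstone compaction.

    Simplified sketch: an SSTable qualifies when it is older than the
    interval and its estimated droppable-tombstone ratio exceeds the threshold.
    """
    if sstable_age_seconds <= tombstone_compaction_interval:
        return False  # too new, leave it alone
    return droppable_ratio > tombstone_threshold


old_and_full = worth_tombstone_compaction(0.35, sstable_age_seconds=3 * 86_400)
old_but_clean = worth_tombstone_compaction(0.05, sstable_age_seconds=3 * 86_400)
too_new = worth_tombstone_compaction(0.90, sstable_age_seconds=3_600)
```

One limitation follows directly from the sketch: the decision looks at a single SSTable, so tombstones whose shadowed data lives in other SSTables may not free any space.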

Page 13

CASSANDRA-7272

Major LCS Compaction

Page 14

CASSANDRA-7019

What was our goal?

Page 15

CASSANDRA-7019

A new algorithm for tombstone compactions

Page 16

nodetool garbagecollect

Trigger a tombstone purging compaction
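A small helper that builds the command line without running it. The `-g` (granularity, `ROW` or `CELL`) and `-j` (jobs) flags reflect my understanding of the `nodetool garbagecollect` usage in releases that include CASSANDRA-7019; confirm with `nodetool help garbagecollect` on your version. The keyspace `ks` and table `events` are hypothetical.

```python
from typing import List, Optional, Sequence


def garbagecollect_command(keyspace: str,
                           tables: Sequence[str] = (),
                           granularity: str = "ROW",
                           jobs: Optional[int] = None) -> List[str]:
    """Build an argv list for `nodetool garbagecollect` (does not execute it)."""
    if granularity not in ("ROW", "CELL"):
        raise ValueError("granularity must be 'ROW' or 'CELL'")
    cmd = ["nodetool", "garbagecollect", "-g", granularity]
    if jobs is not None:
        cmd += ["-j", str(jobs)]
    cmd.append(keyspace)
    cmd.extend(tables)
    return cmd


# Hypothetical keyspace and table names, for illustration only.
command = garbagecollect_command("ks", ["events"], granularity="CELL", jobs=2)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a node where `nodetool` is on the path.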

Page 17

What great new things can we do now?

Page 18

Disk space vs. Perf

• cassandra-stress with the new CASSANDRA-7019 options

• 50% Inserts

• 33% Reads

• 3% partition deletes

• 6% row deletes

• 6% cell deletes

Page 19

Disk space vs. Perf

[Figures not preserved in this transcript: one panel each for the None, ROW, and CELL settings]
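The three settings compared here appear to correspond to the `provide_overlapping_tombstones` compaction subproperty (`NONE`, `ROW`, `CELL`); the slides do not name the option, so treat that mapping as an assumption. The sketch below only builds a CQL statement as a string. The keyspace, table, and compaction class are hypothetical placeholders.

```python
VALID_MODES = ("NONE", "ROW", "CELL")


def alter_table_cql(keyspace: str, table: str, mode: str,
                    compaction_class: str = "SizeTieredCompactionStrategy") -> str:
    """Build an ALTER TABLE statement that sets the tombstone option.

    Assumes the option name `provide_overlapping_tombstones`; check the
    compaction documentation for your Cassandra version before using it.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    return (
        f"ALTER TABLE {keyspace}.{table} WITH compaction = "
        f"{{'class': '{compaction_class}', "
        f"'provide_overlapping_tombstones': '{mode}'}};"
    )


# Hypothetical keyspace and table names, for illustration only.
statement = alter_table_cql("ks", "events", "ROW")
```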

Page 20

Using nodetool garbagecollect

• Use it as needed during non-peak load!

• Reclaim all your disk space, while not upsetting your users!