
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Feb 14, 2017


aaronmorton
Transcript
Page 1: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

SF CASSANDRA USERS MARCH 2016

CQL PERFORMANCE WITH APACHE CASSANDRA 3.0

Aaron Morton

@aaronmorton

CEO

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Page 2: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

About The Last Pickle.

Work with clients to deliver and improve Apache Cassandra based solutions.

Apache Cassandra Committer and DataStax MVPs.

Based in New Zealand, Australia, France & USA.

Page 3: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

How We Got Here

Storage Engine 3.0

Write Path

Read Path

Page 4: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

How We Got Here

Way back in 2011…

Page 5: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

2011

Blog: Cassandra Query Plans

http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html

Page 6: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

2012

Talk: Technical Deep Dive - Query Performance

https://www.youtube.com/watch?v=gomOKhMV0zc

Page 7: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

2012

Explain Read & Write performance in 45 minutes.

Page 8: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Skip Forward to 2016

Blog: Introduction To The Apache Cassandra 3.x Storage Engine

http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

Page 9: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Skip Forward to 2016

“Why don’t I do another talk about Cassandra performance.”

Page 10: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Skip Forward to 2016

It was a busy 4 years…

Page 11: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Skip Forward to 2016

CQL 3, Collection Types, UDTs, UDFs, UDAs, Materialised Views, Triggers, SASI,…

Page 12: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Skip Forward to 2016

Explain Read & Write performance in 45 minutes.

Page 13: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

So Let's Avoid

CQL 3, Collection Types, UDTs, UDFs, UDAs, Materialised Views, Triggers, SASI,…

Page 14: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

How We Got Here

Storage Engine 3.0

Write Path

Read Path

Page 15: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

High Level Storage Engine 3.0

Page 16: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Files

Data.db Index.db Filter.db

Page 17: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Files

CompressionInfo.db Statistics.db Digest.crc32

CRC.db Summary.db TOC.txt

Page 18: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

CQL Recap

create table my_table (
    partition_1 text,
    cluster_1 text,
    foo text,
    bar text,
    baz text,
    PRIMARY KEY (partition_1, cluster_1)
);

Page 19: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

CQL Recap

WARNING: FAKE DATA AHEAD

Page 20: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

CQL With Thrift Pre 3.0

[default@dev] list my_table;
-------------------
RowKey: part_a
=> (column=clust_a:, value=, timestamp=1357…739000)
=> (column=clust_a:foo, value=some foo, timestamp=1357…739000)
=> (column=clust_a:bar, value=and bar, timestamp=1357…739000)
=> (column=clust_a:baz, value=no baz, timestamp=1357…739000)
=> (column=clust_b:, value=, timestamp=1357…739000)
=> (column=clust_b:foo, value=no foo, timestamp=1357…739000)
=> (column=clust_b:bar, value=no bar, timestamp=1357…739000)
=> (column=clust_b:baz, value=lots baz, timestamp=1357…739000)

Page 21: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

CQL Pre 3.0

Clustering Keys Repeated

Column Names Repeated

Timestamps Repeated

Fixed Width Encoding

No Knowledge Of Row Contents

Page 22: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Improvements

Delta Encoding

Variable Int Encoding

Clustering Written Once

Aggregated Metadata

Cell Presence

Page 23: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

SerializationHeader

For each SSTable*.

Stored in each SSTable.

Held in memory.

Page 24: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

SerializationHeader

public class SerializationHeader
{
    private final AbstractType<?> keyType;
    private final List<AbstractType<?>> clusteringTypes;

    private final PartitionColumns columns;
    private final EncodingStats stats;
    …
}

Page 25: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

EncodingStats

Collected on the fly by the Memtable.

Page 26: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

EncodingStats

public class EncodingStats
{
    public final long minTimestamp;
    public final int minLocalDeletionTime;
    public final int minTTL;
    …
}
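To make "collected on the fly" concrete, here is a minimal sketch of a collector that keeps running minimums as writes arrive. The class name `MinStatsCollector` and its methods are hypothetical names for this example, not Cassandra's API; the sketch only shows the idea that the minimums are updated per write and later serve as the base values for delta encoding.

```java
public class MinStatsCollector {
    // Running minimums; start at the largest value so the first update always wins.
    private long minTimestamp = Long.MAX_VALUE;
    private int minLocalDeletionTime = Integer.MAX_VALUE;
    private int minTTL = Integer.MAX_VALUE;

    // Called once per written cell or row (hypothetical hook for this sketch).
    void update(long timestamp, int localDeletionTime, int ttl) {
        minTimestamp = Math.min(minTimestamp, timestamp);
        minLocalDeletionTime = Math.min(minLocalDeletionTime, localDeletionTime);
        minTTL = Math.min(minTTL, ttl);
    }

    long minTimestamp() { return minTimestamp; }
    int minLocalDeletionTime() { return minLocalDeletionTime; }
    int minTTL() { return minTTL; }

    public static void main(String[] args) {
        MinStatsCollector stats = new MinStatsCollector();
        stats.update(1000L, 50, 30); // made-up values
        stats.update(900L, 70, 10);
        System.out.println(stats.minTimestamp() + " "
                + stats.minLocalDeletionTime() + " " + stats.minTTL());
    }
}
```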

Page 27: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

SerializationHeader

public class SerializationHeader
{
    public void writeTimestamp(long timestamp, DataOutputPlus out) throws IOException
    {
        out.writeUnsignedVInt(timestamp - stats.minTimestamp);
    }
    …
}

Page 28: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

VIntCoding

public class VIntCoding
{
    public static void writeUnsignedVInt(long value, DataOutput output) throws IOException
    {
        int size = VIntCoding.computeUnsignedVIntSize(value);
        if (size == 1)
        {
            output.write((int)value);
            return;
        }

        output.write(VIntCoding.encodeVInt(value, size), 0, size);
    }
}
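The combined effect of delta encoding and variable-length integers can be sketched as follows. This is an illustration under stated assumptions, not Cassandra's byte layout: it uses a LEB128-style varint (7 payload bits per byte) as a stand-in for `VIntCoding`, and the class name `DeltaVIntSketch`, its methods, and the timestamp values are all hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class DeltaVIntSketch {
    // LEB128-style unsigned varint: 7 payload bits per byte,
    // high bit set means "more bytes follow". Not Cassandra's exact format.
    static byte[] encodeUnsigned(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Inverse of encodeUnsigned for a buffer holding exactly one varint.
    static long decodeUnsigned(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= ((long) (b & 0x7F)) << shift;
            shift += 7;
        }
        return result;
    }

    // Delta encoding: store the distance from the minimum, not the full value.
    static byte[] encodeTimestamp(long timestamp, long minTimestamp) {
        return encodeUnsigned(timestamp - minTimestamp);
    }

    public static void main(String[] args) {
        long minTimestamp = 1_450_000_000_000_000L; // made-up base value
        long timestamp = minTimestamp + 300;        // made-up write time
        byte[] encoded = encodeTimestamp(timestamp, minTimestamp);
        System.out.println("fixed width: " + Long.BYTES
                + " bytes, delta varint: " + encoded.length + " bytes");
    }
}
```

Timestamps written close together produce small deltas, so most of them fit in one or two bytes instead of a fixed eight.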

Page 29: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Improvements

Delta Encoding

Variable Int Encoding

Clustering Written Once

Aggregated Metadata

Cell Presence

Page 30: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

CQL With Thrift Pre 3.0

[default@dev] list my_table;
-------------------
RowKey: part_a
=> (column=clust_a:, value=, timestamp=1357…739000)
=> (column=clust_a:foo, value=some foo, timestamp=1357…739000)
=> (column=clust_a:bar, value=and bar, timestamp=1357…739000)
=> (column=clust_a:baz, value=no baz, timestamp=1357…739000)
=> (column=clust_b:, value=, timestamp=1357…739000)
=> (column=clust_b:foo, value=no foo, timestamp=1357…739000)
=> (column=clust_b:bar, value=no bar, timestamp=1357…739000)
=> (column=clust_b:baz, value=lots baz, timestamp=1357…739000)

Page 31: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Data.db

Page 32: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Partition Header

Partition Key

Partition Deletion Information

Page 33: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Partition Header

Page 34: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Row

Clustering Information

Row Level Liveness

Row Level Deletion

Column Presence

Columns

Page 35: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Row

Page 36: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Clustering Block

Clustering Cell Presence

Clustering Cells

Page 37: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Clustering Block

Page 38: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Improvements

Delta Encoding

Variable Int Encoding

Clustering Written Once

Aggregated Cell Metadata

Cell Presence

Page 39: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

CQL With Thrift Pre 3.0

[default@dev] list my_table;
-------------------
RowKey: part_a
=> (column=clust_a:, value=, timestamp=1357…739000)
=> (column=clust_a:foo, value=some foo, timestamp=1357…739000)
=> (column=clust_a:bar, value=and bar, timestamp=1357…739000)
=> (column=clust_a:baz, value=no baz, timestamp=1357…739000)
=> (column=clust_b:, value=, timestamp=1357…739000)
=> (column=clust_b:foo, value=no foo, timestamp=1357…739000)
=> (column=clust_b:bar, value=no bar, timestamp=1357…739000)
=> (column=clust_b:baz, value=lots baz, timestamp=1357…739000)

Page 40: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Aggregated Cell Metadata

Only store the Cell Timestamp, TTL, and Local Deletion Time if they differ from the Row's.
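One way to picture this rule is a per-cell flags value that records which metadata fields match the row and can therefore be skipped. The sketch below is illustrative only: the flag constants, class name, and methods are hypothetical and do not reflect Cassandra's actual flag values, and Local Deletion Time is left out for brevity since it would be handled the same way.

```java
public class CellMetadataSketch {
    // Hypothetical flag bits for this sketch only.
    static final int USE_ROW_TIMESTAMP = 0x01;
    static final int USE_ROW_TTL = 0x02;

    // Decide which cell fields can be inherited from the row.
    static int flagsFor(long cellTimestamp, int cellTtl, long rowTimestamp, int rowTtl) {
        int flags = 0;
        if (cellTimestamp == rowTimestamp) flags |= USE_ROW_TIMESTAMP;
        if (cellTtl == rowTtl) flags |= USE_ROW_TTL;
        return flags;
    }

    // Number of metadata fields that still have to be written for the cell.
    static int fieldsToWrite(int flags) {
        int fields = 0;
        if ((flags & USE_ROW_TIMESTAMP) == 0) fields++;
        if ((flags & USE_ROW_TTL) == 0) fields++;
        return fields;
    }

    public static void main(String[] args) {
        // Cell written together with its row: nothing extra to store.
        int same = flagsFor(1000L, 0, 1000L, 0);
        // Cell updated later than the rest of the row: only its timestamp is stored.
        int later = flagsFor(2000L, 0, 1000L, 0);
        System.out.println(fieldsToWrite(same) + " vs " + fieldsToWrite(later));
    }
}
```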

Page 41: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Aggregated Cell Metadata

Simple Cell Component                        Byte Size
Flags                                        1
Optional Cell Timestamp (delta)              varint 1…n
Optional Cell Local Deletion Time (delta)    varint 1…n
Optional Cell TTL (delta)                    varint 1…n
Optional Cell Value                          See Below

Fixed Width Cell Value                       Byte Size
Value                                        1…n

Variable Width Cell Value                    Byte Size
Value Length                                 varint 1…n
Value                                        1…n

Apache Cassandra 3.0 Storage Engine

Page 42: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Improvements

Delta Encoding

Variable Int Encoding

Clustering Written Once

Aggregated Cell Metadata

Cell Presence

Page 43: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Cell Presence

The SSTable stores the list of Cells in this SSTable.

Each Row stores a bitmap of the Cells in this Row, with reference to the SSTable's list.
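The relationship between the SSTable-level list and the per-Row bitmap can be sketched like this. It is a simplified illustration using `java.util.BitSet`; the class and method names are hypothetical, and the real on-disk encoding of the bitmap is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class CellPresenceSketch {
    // Build a row bitmap: bit i is set when the row has a cell for sstableColumns[i].
    static BitSet presence(List<String> sstableColumns, List<String> rowColumns) {
        BitSet bits = new BitSet(sstableColumns.size());
        for (String column : rowColumns) {
            int index = sstableColumns.indexOf(column);
            if (index < 0) {
                throw new IllegalArgumentException("unknown column " + column);
            }
            bits.set(index);
        }
        return bits;
    }

    // Recover the row's column names from the bitmap plus the shared list.
    static List<String> columnsOf(List<String> sstableColumns, BitSet bits) {
        List<String> present = new ArrayList<String>();
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            present.add(sstableColumns.get(i));
        }
        return present;
    }

    public static void main(String[] args) {
        // Column names are stored once for the whole SSTable.
        List<String> sstableColumns = Arrays.asList("bar", "baz", "foo");
        // A row that only has values for foo and bar.
        BitSet row = presence(sstableColumns, Arrays.asList("foo", "bar"));
        System.out.println(columnsOf(sstableColumns, row));
    }
}
```

The column names are written once per SSTable, and each Row needs only a few bits to say which of them it contains.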

Page 44: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Storage Engine 3.0 Row

Page 45: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Remember Where We Came From

[default@dev] list my_table;
-------------------
RowKey: part_a
=> (column=clust_a:, value=, timestamp=1357…739000)
=> (column=clust_a:foo, value=some foo, timestamp=1357…739000)
=> (column=clust_a:bar, value=and bar, timestamp=1357…739000)
=> (column=clust_a:baz, value=no baz, timestamp=1357…739000)
=> (column=clust_b:, value=, timestamp=1357…739000)
=> (column=clust_b:foo, value=no foo, timestamp=1357…739000)
=> (column=clust_b:bar, value=no bar, timestamp=1357…739000)
=> (column=clust_b:baz, value=lots baz, timestamp=1357…739000)

Page 46: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

How We Got Here

Storage Engine 3.0

Write Path

Read Path

Page 47: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Write Path

Commit Log

Merge Into Memtable

Page 48: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Commit Log

Allocate space in the current commit log segment.

Page 49: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Allocate Segment

o.a.c.m.CommitLog.WaitingOnSegmentAllocation.95thPercentile

Page 50: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Merge Into Memtable

Find the Partition.

Loop trying to update the Rows in it using CAS.

Page 51: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Merge Into Memtable

If more than 10MB of allocations are wasted, move to pessimistic locking on the Partition object.
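The pattern described on these two slides, an optimistic compare-and-set loop that falls back to a lock once too much work has been thrown away, can be sketched as below. This is a simplified illustration, not the Memtable code: the class `CasPartitionSketch`, the copy-on-write `HashMap`, and the caller-supplied size estimate are assumptions made for the example. Only the 10MB threshold comes from the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class CasPartitionSketch {
    // Threshold from the slide; everything else here is illustrative.
    static final long WASTE_LIMIT_BYTES = 10L * 1024 * 1024;

    private final AtomicReference<Map<String, String>> rows =
            new AtomicReference<Map<String, String>>(new HashMap<String, String>());
    private final AtomicLong wastedBytes = new AtomicLong();
    private final Object lock = new Object();

    void upsert(String clustering, String value, long approxSizeBytes) {
        if (wastedBytes.get() > WASTE_LIMIT_BYTES) {
            // Too many failed attempts so far: serialize writers on a lock.
            synchronized (lock) {
                casLoop(clustering, value, approxSizeBytes);
            }
        } else {
            casLoop(clustering, value, approxSizeBytes);
        }
    }

    private void casLoop(String clustering, String value, long approxSizeBytes) {
        while (true) {
            Map<String, String> current = rows.get();
            // Build the updated copy off to the side.
            Map<String, String> updated = new HashMap<String, String>(current);
            updated.put(clustering, value);
            if (rows.compareAndSet(current, updated)) {
                return;
            }
            // Another writer won the race; the copy we built is wasted.
            wastedBytes.addAndGet(approxSizeBytes);
        }
    }

    Map<String, String> snapshot() { return rows.get(); }
    long wastedBytes() { return wastedBytes.get(); }

    public static void main(String[] args) {
        CasPartitionSketch partition = new CasPartitionSketch();
        partition.upsert("clust_a", "some foo", 64);
        partition.upsert("clust_b", "no foo", 64);
        System.out.println(partition.snapshot() + " wasted=" + partition.wastedBytes());
    }
}
```

Under low contention the loop succeeds on the first attempt and no lock is taken; the lock only comes into play for partitions that many writers update at once.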

Page 52: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

How We Got Here

Storage Engine 3.0

Write Path

Read Path

Page 53: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Read Paths

Ignoring Index Read paths.

Page 54: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Read Commands

PartitionRangeReadCommand

SinglePartitionReadCommand

Page 55: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

AbstractClusteringIndexFilter

ClusteringIndexNamesFilter (When we know the column names.)

ClusteringIndexSliceFilter (When we do not know the column names.)

Page 56: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

ClusteringIndexNamesFilter

When we know what Columns to select, we know when the search is over.

Page 57: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

ClusteringIndexNamesFilter

1. Get Partition From Memtables.
2. Filter named columns into a temporary result.
3. Select SSTables that may contain Partition Key.
4. Order in descending timestamp order.
5. Read from SSTables in order.

Page 58: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Names Filter Short Circuits

If the result has a Partition Deletion newer than the next SSTable's max timestamp.

Stop Search.

Page 59: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Names Filter Short Circuits

If all Columns have been read and the max timestamp of the next SSTable is less than the min timestamp of the selected Columns.

Stop Search.
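The timestamp short circuit above can be sketched as a loop over SSTables ordered by descending max timestamp. This is a simplified model, not the Cassandra read path: the `SSTable` class here is a hypothetical stand-in that holds one partition's column timestamps in a map, and `countSSTablesRead` only reports how many SSTables would be touched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NamesFilterSketch {
    // Hypothetical stand-in for an SSTable: its max timestamp plus the
    // write timestamp of each column it holds for the partition.
    static final class SSTable {
        final long maxTimestamp;
        final Map<String, Long> cellTimestamps;

        SSTable(long maxTimestamp, Map<String, Long> cellTimestamps) {
            this.maxTimestamp = maxTimestamp;
            this.cellTimestamps = cellTimestamps;
        }
    }

    static int countSSTablesRead(List<SSTable> sstables, Set<String> wanted) {
        List<SSTable> ordered = new ArrayList<SSTable>(sstables);
        // Newest data first.
        ordered.sort(Comparator.comparingLong((SSTable s) -> s.maxTimestamp).reversed());

        Map<String, Long> result = new HashMap<String, Long>();
        int read = 0;
        for (SSTable sstable : ordered) {
            // Short circuit: every wanted column is already in the result and
            // this SSTable cannot hold anything newer than what was selected.
            if (result.keySet().containsAll(wanted)
                    && sstable.maxTimestamp < Collections.min(result.values())) {
                break;
            }
            read++;
            for (String column : wanted) {
                Long timestamp = sstable.cellTimestamps.get(column);
                if (timestamp != null) {
                    result.merge(column, timestamp, Long::max);
                }
            }
        }
        return read;
    }

    public static void main(String[] args) {
        Map<String, Long> newest = new HashMap<String, Long>();
        newest.put("foo", 300L);
        newest.put("bar", 290L);
        Map<String, Long> older = new HashMap<String, Long>();
        older.put("foo", 200L);

        List<SSTable> sstables = new ArrayList<SSTable>();
        sstables.add(new SSTable(200L, older));
        sstables.add(new SSTable(300L, newest));

        Set<String> wanted = new HashSet<String>();
        wanted.add("foo");
        wanted.add("bar");
        System.out.println("SSTables read: " + countSSTablesRead(sstables, wanted));
    }
}
```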

Page 60: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Names Filter Short Circuits

Note: list of Columns remaining to select is pruned after every SSTable is read based on max timestamp.

Page 61: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Names Filter Short Circuits

If search clustering value not within clustering range in the SSTable.

Skip SSTable.

Page 62: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Names Filter Short Circuits

If SSTable Cell not in search set.

Skip reading value.

Page 63: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

ClusteringIndexSliceFilter

When we do not know which columns to select, the search ends when it is exhausted.

Page 64: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

ClusteringIndexSliceFilter

Used with:

Distinct.

Not all clustering columns restricted.

Page 65: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

ClusteringIndexSliceFilter

1. Get Partition From Memtables.
2. Create Iterators for Partitions.
3. Select SSTables that may contain Partition Key.
4. Order in reverse max timestamp order.
5. Create Iterators for SSTables in order.
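Once the iterators from the steps above exist, their rows have to be reconciled by clustering value. The sketch below shows that reconciliation in a deliberately simplified form: it merges eagerly into a `TreeMap` rather than lazily through iterators, and the `Cell` class, the `merge` method, and the sample data are hypothetical. The rule it illustrates is that for the same clustering value, the cell with the newest timestamp wins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class SliceMergeSketch {
    // Hypothetical cell: a value and the timestamp it was written at.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merge sources (Memtable and SSTables) keyed by clustering value.
    // For duplicate keys the cell with the higher timestamp is kept.
    static SortedMap<String, Cell> merge(List<SortedMap<String, Cell>> sources) {
        SortedMap<String, Cell> merged = new TreeMap<String, Cell>();
        for (SortedMap<String, Cell> source : sources) {
            for (Map.Entry<String, Cell> entry : source.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (existing, incoming) ->
                                incoming.timestamp > existing.timestamp ? incoming : existing);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        SortedMap<String, Cell> memtable = new TreeMap<String, Cell>();
        memtable.put("clust_b", new Cell("new value", 200L));

        SortedMap<String, Cell> sstable = new TreeMap<String, Cell>();
        sstable.put("clust_a", new Cell("only on disk", 100L));
        sstable.put("clust_b", new Cell("old value", 100L));

        List<SortedMap<String, Cell>> sources = new ArrayList<SortedMap<String, Cell>>();
        sources.add(memtable);
        sources.add(sstable);

        for (Map.Entry<String, Cell> entry : merge(sources).entrySet()) {
            System.out.println(entry.getKey() + " => " + entry.getValue().value);
        }
    }
}
```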

Page 66: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Slice Filter Short Circuits

If the SSTable max timestamp is before the max seen Partition Deletion timestamp.

Stop Search.

Page 67: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Slice Filter Short Circuits

If search clustering value not within clustering range in the SSTable.

Skip SSTable.

Page 68: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

So…

3.x is awesome.

Start using it as soon as possible.

Page 69: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Thanks.

Page 70: Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X

Aaron Morton

@aaronmorton

Co-Founder & Principal Consultant

www.thelastpickle.com