Page 1: Cassandra 2.1

©2013 DataStax Confidential. Do not distribute without consent.

Jonathan Ellis

CTO, DataStax

Project Chair, Apache Cassandra

Cassandra 2.1 (mostly)


Page 2: Cassandra 2.1

Five Years of Cassandra

[Timeline figure, Jul-08 to Jul-13 (axis ticks: Jun-09, Mar-10, Jan-11, Nov-11, Sep-12, Jul-13): releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2..., 2.0, and DSE]

Page 3: Cassandra 2.1

Core values

• Massively scalable
• High performance
• Reliable/Available

[Figure labels: Cassandra, HBase, Redis, MySQL]

Page 4: Cassandra 2.1

New Core Value

• Massively scalable
• High performance
• Reliable/Available
• Productivity + ease of use

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

CREATE INDEX ON users(state);

SELECT * FROM users
WHERE state='Texas' AND birth_date > 1950;

Page 7: Cassandra 2.1

Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

CREATE TABLE users_addresses (
  user_id uuid REFERENCES users,
  email text
);

SELECT *
FROM users NATURAL JOIN users_addresses;

X (crossed out on the slide: CQL has no REFERENCES or JOIN)

Page 9: Cassandra 2.1

Collections

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int,
  email_addresses set<text>
);

UPDATE users
SET email_addresses = email_addresses + {'[email protected]', '[email protected]'};

Page 10: Cassandra 2.1

Cassandra 2.0

Page 15: Cassandra 2.1

Race condition

Client 1:

SELECT name
FROM users
WHERE username = 'pmcfadin';

(0 rows)

INSERT INTO users
  (username, name, email,
   password, created_date)
VALUES ('pmcfadin',
        'Patrick McFadin',
        ['[email protected]'],
        'ba27e03fd9...',
        '2011-06-20 13:50:00');

Client 2:

SELECT name
FROM users
WHERE username = 'pmcfadin';

(0 rows)

INSERT INTO users
  (username, name, email,
   password, created_date)
VALUES ('pmcfadin',
        'Patrick McFadin',
        ['[email protected]'],
        'ea24e13ad9...',
        '2011-06-20 13:50:01');

This one wins

Page 16: Cassandra 2.1

Lightweight transactions

Page 20: Cassandra 2.1

Lightweight transactions

INSERT INTO users
  (username, name, email,
   password, created_date)
VALUES ('pmcfadin',
        'Patrick McFadin',
        ['[email protected]'],
        'ba27e03fd9...',
        '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users
  (username, name, email,
   password, created_date)
VALUES ('pmcfadin',
        'Patrick McFadin',
        ['[email protected]'],
        'ea24e13ad9...',
        '2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
     False | pmcfadin | 2011-06-20 ... | Patrick McFadin
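
The difference between the check-then-insert race and IF NOT EXISTS can be sketched with an in-memory stand-in for the users table. This is a minimal Python illustration, not Cassandra's implementation: the UserTable class and its methods are hypothetical, and a single lock stands in for the Paxos round that serializes conditional writes.

```python
import threading


class UserTable:
    """Hypothetical in-memory stand-in for the users table (not a Cassandra API)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stands in for the Paxos round

    def select(self, username):
        return self._rows.get(username)

    def insert(self, username, row):
        # Plain INSERT behaves as an upsert, so the last write silently wins.
        self._rows[username] = row

    def insert_if_not_exists(self, username, row):
        # Conditional INSERT: the check and the write happen as one atomic step.
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                return False, existing  # [applied] = False, plus the existing row
            self._rows[username] = row
            return True, None           # [applied] = True


# Race: both clients see (0 rows) before either one inserts.
racy = UserTable()
seen_by_client_1 = racy.select("pmcfadin")
seen_by_client_2 = racy.select("pmcfadin")
if seen_by_client_1 is None:
    racy.insert("pmcfadin", {"password": "ba27e03fd9..."})
if seen_by_client_2 is None:
    racy.insert("pmcfadin", {"password": "ea24e13ad9..."})
race_winner = racy.select("pmcfadin")["password"]

# Lightweight transaction: the second conditional insert is rejected.
safe = UserTable()
first = safe.insert_if_not_exists("pmcfadin", {"password": "ba27e03fd9..."})
second = safe.insert_if_not_exists("pmcfadin", {"password": "ea24e13ad9..."})
print(race_winner, first, second)
```

In the racy run the second client's row overwrites the first without either client noticing; in the conditional run the second client gets the existing row back and can react to it.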

Page 21: Cassandra 2.1

Atomic log appends with LWT

CREATE TABLE log (
  log_name text,
  seq int static,
  logged_at timeuuid,
  entry text,
  primary key (log_name, logged_at)
);

INSERT INTO log (log_name, seq)
VALUES ('foo', 0);

Page 22: Cassandra 2.1

Atomic log appends with LWT

BEGIN BATCH

UPDATE log SET seq = 1
WHERE log_name = 'foo'
IF seq = 0;

INSERT INTO log (log_name, logged_at, entry)
VALUES ('foo', now(), 'test');

APPLY BATCH;
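
The batch above behaves like a compare-and-set on the partition's static seq column: the entry is appended only if seq still has the value the writer expects. A minimal Python sketch of that logic follows; the append_entry helper and the dict layout are hypothetical stand-ins for the log partition, not Cassandra structures.

```python
def append_entry(log, expected_seq, entry):
    """Append entry only if the partition's seq still equals expected_seq.

    Mirrors the batch: UPDATE ... IF seq = expected_seq plus the INSERT,
    applied together or not at all. `log` is a hypothetical dict of the
    form {"seq": int, "entries": list}.
    """
    if log["seq"] != expected_seq:
        return False  # condition failed: nothing in the batch is applied
    log["seq"] = expected_seq + 1
    log["entries"].append(entry)
    return True


# Corresponds to: INSERT INTO log (log_name, seq) VALUES ('foo', 0);
log = {"seq": 0, "entries": []}

applied_first = append_entry(log, 0, "test")        # succeeds, seq becomes 1
applied_stale = append_entry(log, 0, "late write")  # stale seq, rejected
applied_next = append_entry(log, 1, "second")       # writer re-read seq = 1
print(applied_first, applied_stale, applied_next, log)
```

A writer whose append is rejected would re-read seq and retry with the new value, which is what keeps concurrent appends ordered.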

Page 23: Cassandra 2.1

Details

• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
• Paxos state is durable + quorum based
• Paxos Made Simple
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• 4 round trips vs 1 for normal updates
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis

Page 24: Cassandra 2.1

Reads in a cluster

Client Coordinator

40% busy

90% busy

30% busy

Page 29: Cassandra 2.1

A failure

Client Coordinator

40% busy

90% busy

30% busy

Page 32: Cassandra 2.1

A failure

Client Coordinator

40% busy

90% busy

30% busy

X

Page 33: Cassandra 2.1

A failure

Client Coordinator

40% busy

90% busy

30% busy

X

timeout

Page 34: Cassandra 2.1

Rapid read protection

Client Coordinator

40% busy

90% busy

30% busy

Page 37: Cassandra 2.1

Rapid read protection

Client Coordinator

40% busy

90% busy

30% busy

X

Page 40: Cassandra 2.1

Rapid read protection

Client Coordinator

40% busy

90% busy

30% busy

X

success
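
The two sequences above (a failed replica leading to a timeout, versus a redundant request leading to success) can be sketched as a small deterministic model. This is a Python illustration under simplifying assumptions: a read needs one replica response, latencies are fixed numbers, and the coordinate_read function and its parameters are hypothetical rather than Cassandra internals.

```python
def coordinate_read(latencies_ms, timeout_ms, speculate_after_ms=None):
    """Return (outcome, elapsed_ms) for a read that needs one replica response.

    latencies_ms lists replica response times in the order the coordinator
    would try them; None means the replica never answers. When
    speculate_after_ms is set, a redundant request goes to the next replica
    once the previous one has been silent for that long.
    """
    best = None   # earliest completion time seen so far
    sent_at = 0   # time at which the current request is sent
    for latency in latencies_ms:
        if sent_at >= timeout_ms:
            break
        if latency is not None:
            done = sent_at + latency
            if best is None or done < best:
                best = done
        if speculate_after_ms is None:
            break  # no speculation: only the first replica is asked
        if best is not None and best <= sent_at + speculate_after_ms:
            break  # a response arrives before the next speculative send
        sent_at += speculate_after_ms
    if best is not None and best <= timeout_ms:
        return "success", best
    return "timeout", timeout_ms


# First replica has failed; the second would answer in 12 ms.
without_protection = coordinate_read([None, 12], timeout_ms=1000)
with_protection = coordinate_read([None, 12], timeout_ms=1000, speculate_after_ms=50)
healthy = coordinate_read([5, 12], timeout_ms=1000, speculate_after_ms=50)
print(without_protection, with_protection, healthy)
```

The cost of the protection is the extra request sent to the second replica, which is why a healthy read that answers quickly never triggers it in this model.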

Page 41: Cassandra 2.1

Rapid Read Protection

NONE

Page 42: Cassandra 2.1

Latency (mid-compaction)

Page 43: Cassandra 2.1

Cold data

10,000 req/s 5,000 req/s

4,000 req/s 10 req/s

Page 45: Cassandra 2.1

Cold data compaction

10 req/s

10,000 req/s

Page 46: Cassandra 2.1

Cassandra 2.1

Page 47: Cassandra 2.1

User defined types

CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
);

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, address>
);

SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}

Page 48: Cassandra 2.1

Collection indexing

CREATE TABLE songs (
  id uuid PRIMARY KEY,
  artist text,
  album text,
  title text,
  data blob,
  tags set<text>
);

CREATE INDEX song_tags_idx ON songs(tags);

SELECT * FROM songs WHERE tags CONTAINS 'blues';

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind

Page 49: Cassandra 2.1

(UDT indexing?)

Page 50: Cassandra 2.1

Counters++

Page 58: Cassandra 2.1

Counters++

• simpler implementation, no more edge cases
• possible to properly repair now
• significantly less garbage and internode traffic generated
• better performance for 99% of uses
  • RF>1, replicate_on_write=true
• topology changes not leading to data loss (#4071)
• commitlog now 100% safe to replay (#4417)
• Internal format overhaul still coming in 3.0 (#6506)

Page 59: Cassandra 2.1

What hasn’t changed

Page 65: Cassandra 2.1

What hasn’t changed

• same API
• same average throughput
• same restrictions on mixing counter and non-counter columns
• same restrictions on mixing counter and non-counter updates
• same restrictions on counter deletes
• same retry limitations

Page 66: Cassandra 2.1

Writes (low contention)

Page 67: Cassandra 2.1

Writes (high contention)

Page 68: Cassandra 2.1

Data directories (2.0)

/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-CompressionInfo.db
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Data.db
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Filter.db
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Index.db
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Statistics.db
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Summary.db
/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-TOC.txt

Page 69: Cassandra 2.1

Data directories (2.1)

/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-CompressionInfo.db
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Data.db
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Filter.db
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Index.db
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Statistics.db
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Summary.db
/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-TOC.txt

Page 70: Cassandra 2.1

Inefficient bloom filters

[Figure: two images joined by "+", followed by "= ?"; images not preserved]

Page 71: Cassandra 2.1

Inefficient bloom filters

[Figure: two images joined by "+", followed by "="; images not preserved]

Page 74: Cassandra 2.1

HyperLogLog applied

Page 75: Cassandra 2.1

HLL and compaction
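
One way to read these slides: when compaction merges sstables whose partitions overlap, adding up the inputs' key counts overestimates how many keys the output holds, so its bloom filter ends up larger than needed. If each sstable carries a HyperLogLog sketch of its keys, the sketches can be merged to estimate the distinct count instead. The following is a minimal Python sketch of that idea under those assumptions; the class, key names, and sizes are illustrative and this is not Cassandra's implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch for illustration (2**p registers)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, key):
        # 64-bit hash: the top p bits pick a register, the rest feed the rank.
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit within the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # Union of two sketches: element-wise maximum of the registers.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Two hypothetical sstables sharing 2,000 of their 6,000 keys each.
sstable_a, sstable_b = HyperLogLog(), HyperLogLog()
for i in range(0, 6000):
    sstable_a.add(f"key-{i}")
for i in range(4000, 10000):
    sstable_b.add(f"key-{i}")

naive_total = sstable_a.estimate() + sstable_b.estimate()   # counts overlap twice
merged_total = sstable_a.merge(sstable_b).estimate()        # close to 10,000 distinct keys
print(round(naive_total), round(merged_total))
```

With 4,096 registers the estimate typically lands within a few percent of the true distinct count, which is enough to size a bloom filter for the compacted output.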

Page 78: Cassandra 2.1

More-efficient repair

Page 87: Cassandra 2.1

Implications for LCS (and STCS)

Page 88: Cassandra 2.1

The new query cache

Page 89: Cassandra 2.1

The new row cache

CREATE TABLE notifications (
  target_user text,
  notification_id timeuuid,
  source_id uuid,
  source_type text,
  activity text,
  PRIMARY KEY (target_user, notification_id)
)
WITH CLUSTERING ORDER BY (notification_id DESC)
  AND caching = 'rows_only'
  AND rows_per_partition_to_cache = '3';

Page 90: Cassandra 2.1

The new row cache

 target_user | notification_id | source_id | source_type | activity
-------------+-----------------+-----------+-------------+-------------------------
 nick        | e1bd2bcb-       | d972b679- | photo       | jbellis liked
 nick        | 321998c-        | d972b679- | photo       | rbranson commented
 nick        | ea1c5d35-       | 88a049d5- | user        | mbulman created account
 nick        | 5321998c-       | 64613f27- | photo       | jbellis commented
 nick        | 07581439-       | 076eab7e- | user        | thobbs created account
 rbranson    | 1c34467a-       | f04e309f- | user        | jbellis created account
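
With a per-partition row limit of 3, only the head of each partition would be held in the row cache. A minimal Python sketch of that selection follows, assuming the rows arrive in the order the slide lists them; the build_row_cache helper is hypothetical and not Cassandra code.

```python
from collections import defaultdict


def build_row_cache(rows, rows_per_partition):
    """Keep only the first N rows of each partition, in the order given.

    `rows` is a list of (partition_key, row) pairs already sorted the way
    the table stores them.
    """
    cache = defaultdict(list)
    for partition_key, row in rows:
        if len(cache[partition_key]) < rows_per_partition:
            cache[partition_key].append(row)
    return dict(cache)


rows = [
    ("nick", "jbellis liked"),
    ("nick", "rbranson commented"),
    ("nick", "mbulman created account"),
    ("nick", "jbellis commented"),
    ("nick", "thobbs created account"),
    ("rbranson", "jbellis created account"),
]
cache = build_row_cache(rows, rows_per_partition=3)
print(cache)
```

Here the cache holds three of nick's five notifications and rbranson's single one, so a query for the most recent notifications can be served from memory without caching whole partitions.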

Page 92: Cassandra 2.1

Read performance

Page 93: Cassandra 2.1

Reads post-compaction

Page 94: Cassandra 2.1

Questions?