Big Data Grows Up
A (re)introduction to Cassandra
Robbie Strickland
Jan 15, 2015
Who am I?
Robbie Strickland
Software Development Manager
The Weather Channel
[email protected]
@dont_use_twitter
Who am I?
● Cassandra user/contributor since 2010
● … it was at release 0.5 back then
● 4 years? Oracle DBAs aren’t impressed
● Done lots of dumb stuff with Cassandra
● … and some really awesome stuff too
Cassandra in 2010
Cassandra in 2010
Cassandra in 2014
Why Cassandra?
It’s fast:
● No locks
● Tunable consistency
● Sequential R/W
● Decentralized
Why Cassandra?
It scales (linearly):
● Multi data center
● No SPOF
● DHT
● Hadoop integration
Why Cassandra?
It’s fault tolerant:
● Automatic replication
● Masterless
● Failed nodes replaced with ease
What’s different?
… a lot in the last year (ish)
What’s new?
● Virtual nodes
● O(n) data moved off-heap
● CQL3 (and defining schemas)
● Native protocol/driver
● Collections
● Lightweight transactions
● Compaction throttling that actually works
What’s gone?
● Manual token management
● Supercolumns
● Thrift (if you use the native driver)
● Directly managing storage rows
What’s still the same?
● Still not an RDBMS
● Still no joins (see above)
● Still no ad-hoc queries (see above again)
● Still requires a denormalized data model (^^; see the sketch below)
● Still need to know what the heck you’re doing
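Since there are no joins or ad-hoc queries, "denormalized" means writing the same data into one table per query path. A minimal sketch with hypothetical table names (not from the deck):

CREATE TABLE Books_by_author (
  author varchar,
  year int,
  title varchar,
  PRIMARY KEY (author, year, title)
);

CREATE TABLE Books_by_year (
  year int,
  author varchar,
  title varchar,
  PRIMARY KEY (year, author, title)
);

-- every write goes to both tables; each query reads exactly one
INSERT INTO Books_by_author (author, year, title) VALUES ('Tom Clancy', 1987, 'Patriot Games');
INSERT INTO Books_by_year (author, year, title) VALUES ('Tom Clancy', 1987, 'Patriot Games');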
Token Management
Linear scalability without the migraine
The old way
● 1 token per node
● Assigned manually
● Adding nodes == reassignment of all tokens
● Node rebuild heavily taxes a few nodes
(ring diagram: cluster with no vnodes, one token per node A–F)
… enter Vnodes
● n tokens per node
● Assigned magically
● Adding nodes == painless
● Node rebuild distributed across many nodes
(ring diagram: cluster with vnodes, tokens A–N scattered around the ring)
Node rebuild without Vnodes
Node rebuild with Vnodes
Going Off-heap
because the JVM sometimes sucks
Why go off-heap
● GC overhead
● JVM no good with big heap sizes
● GC overhead
● GC overhead
● GC overhead
O(n) data structures
● Row cache
● Bloom filters
● Compression offsets
● Partition summary
… all these are moved off-heap
New memory allocation
(diagram: memory layout)
● Off-heap (native): row cache, bloom filters, compression offsets, partition summary
● JVM heap: partition key cache
Death of a (Thrift) Salesman
Or, how to build a killer data store without a crappy interface
Reasons not to ditch Thrift
● Lots of client libraries still use it
● You finally got it installed
● You didn’t know there was another choice
● It sucks less than many alternatives
… in spite of all those benefits, you really should ditch Thrift because:
● It requires your entire result set to fit into RAM on both client and server
● The native protocol is better, faster, and supports all the new features
● Thrift-based client libraries are always a step behind
● It’s going away eventually
… and did I mention ...
It requires your entire result set to fit into RAM
on both client and server!!!
Requesting too much data
Going Native
really catchy tag line here
Native protocol
● It’s binary, making it lighter weight
● It supports cursors (FTW!)
● It supports prepared statements
● Cluster awareness built-in
● Either synchronous or asynchronous ops
● Only supports CQL-based operations
● Can be used side-by-side with Thrift
Native drivers
from DataStax:
● Java
● C#
● Python
… other community supported drivers available
Native query example

val cluster = Cluster.builder()
  .addContactPoints(host1, host2, host3)
  .build()
val session = cluster.connect()

val insert = session.prepare(
  "INSERT INTO myKsp.myTable (myKey, col1, col2) VALUES (?, ?, ?)")
val select = session.prepare(
  "SELECT * FROM myKsp.myTable WHERE myKey = ?")

session.execute(insert.bind(myKey, col1, col2))
val result = session.execute(select.bind(myKey))
Wait, was that SQL?!!
Or, how to make Cassandra more awesome while simultaneously irritating early adopters
Introducing CQL3
● Because the first two attempts sucked
● Stands for “Cassandra Query Language”
● Looks a heck of a lot like SQL
● … but isn’t
● Substantially lowers the learning curve
● … but also makes it easier to screw up
● An abstraction over the storage rows
Storage rows

[default@unknown] create keyspace Library;
[default@unknown] use Library;
[default@Library] create column family Books
...   with comparator=UTF8Type
...   and key_validation_class=UTF8Type
...   and default_validation_class=UTF8Type;
[default@Library] set Books['Patriot Games']['author'] = 'Tom Clancy';
[default@Library] set Books['Patriot Games']['year'] = '1987';
[default@Library] list Books;
RowKey: Patriot Games
=> (name=author, value=Tom Clancy, timestamp=1393102991499000)
=> (name=year, value=1987, timestamp=1393103015955000)
Storage rows - composites

[default@Library] create column family Authors
...   with key_validation_class=UTF8Type
...   and comparator='CompositeType(LongType,UTF8Type,UTF8Type)'
...   and default_validation_class=UTF8Type;
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:ISBN'] = '0-399-13241-4';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:ISBN'] = '0-399-13825-0';
[default@Library] list Authors;
RowKey: Tom Clancy
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4, timestamp=1393104011458000)
=> (name=1987:Patriot Games:publisher, value=Putnam, timestamp=1393103948577000)
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0, timestamp=1393104109214000)
=> (name=1993:Without Remorse:publisher, value=Putnam, timestamp=1393104083773000)
CQL - simple intro

cqlsh> CREATE KEYSPACE Library WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1};
cqlsh> use Library;
cqlsh:library> CREATE TABLE Books (
           ...   title varchar,
           ...   author varchar,
           ...   year int,
           ...   PRIMARY KEY (title)
           ... );
cqlsh:library> INSERT INTO Books (title, author, year) VALUES ('Patriot Games', 'Tom Clancy', 1987);
cqlsh:library> INSERT INTO Books (title, author, year) VALUES ('Without Remorse', 'Tom Clancy', 1993);
CQL - simple intro
Storage rows:
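The storage view isn't reproduced here, but given the inserts above it is roughly one storage row per title with one column per non-key CQL column (a sketch, timestamps omitted):

RowKey: Patriot Games
=> (name=author, value=Tom Clancy, ...)
=> (name=year, value=1987, ...)
RowKey: Without Remorse
=> (name=author, value=Tom Clancy, ...)
=> (name=year, value=1993, ...)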
CQL - composite key

CREATE TABLE Authors (
  name varchar,
  year int,
  title varchar,
  publisher varchar,
  ISBN varchar,
  PRIMARY KEY (name, year, title)
)
CQL - composite key
Storage rows:
Keys and Filters
● Ad hoc queries are NOT supported
● Query by key
● Key must include all potential filter columns
● Must include partition key in filter
● Subsequent filters must be in order
● Only last filter can be a range
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (title)
)
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author, title)
)
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author, year)
)
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (year, author)
)
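Illustrative queries (not from the deck) showing what the last two key layouts allow and reject:

-- PRIMARY KEY (author, year): partition by author, cluster by year
SELECT * FROM Books WHERE author = 'Tom Clancy';                   -- ok
SELECT * FROM Books WHERE author = 'Tom Clancy' AND year >= 1990;  -- ok: range on the last filter
SELECT * FROM Books WHERE year = 1987;                             -- rejected: partition key missing

-- PRIMARY KEY (year, author): partition by year, cluster by author
SELECT * FROM Books WHERE year = 1987 AND author = 'Tom Clancy';   -- ok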
Secondary Indexes
● Allows query-by-value
● CREATE INDEX myIdx ON myTable (myCol)
● Works well on low cardinality fields
● Won’t scale for high cardinality fields
● Don’t overuse it -- not a quick fix for a bad data model
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (author)
)

CREATE INDEX Books_year ON Books(year)
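With the index in place, a query by value becomes possible (illustrative query, not from the deck):

SELECT * FROM Books WHERE year = 1987;  -- served by the Books_year index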
Composite Partition Keys
● PRIMARY KEY((year, author), title)
● Creates a more granular shard key
● Can be useful to make certain queries more efficient, or to better distribute data
● Updates sharing a partition key are atomic and isolated
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY ((year, author), title)
)
Example - Books table

CREATE TABLE Books (
  title varchar,
  author varchar,
  year int,
  PRIMARY KEY (year, author, title)
)
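Illustrative queries against the composite-partition-key version: both components of the partition key must be supplied.

-- PRIMARY KEY ((year, author), title)
SELECT * FROM Books WHERE year = 1987 AND author = 'Tom Clancy';  -- ok: full partition key
SELECT * FROM Books WHERE year = 1987;                            -- rejected: partial partition key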
Collections
denormalization done well
Supported types
● Sets - ordered naturally
● Lists - ordered by index
● Maps - key/value pairs
Caveats
● Max 64k items in a collection
● Max 64k size per item
● Collections are read in their entirety, so keep them small (see the sketch below)
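A minimal sketch of the three collection types in a hypothetical table (names are illustrative, not from the deck):

CREATE TABLE Readers (
  login varchar PRIMARY KEY,
  favorite_titles set<varchar>,
  reading_list list<varchar>,
  ratings map<varchar, int>
);

UPDATE Readers SET favorite_titles = favorite_titles + {'Patriot Games'} WHERE login = 'rs_atl';
UPDATE Readers SET reading_list = reading_list + ['Without Remorse'] WHERE login = 'rs_atl';
UPDATE Readers SET ratings['Patriot Games'] = 5 WHERE login = 'rs_atl';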
Sets
(storage diagram: column name = set name + item value; the cell value is empty)

Lists
(storage diagram: column name = list name + ordering metadata; cell value = list item value)

Maps
(storage diagram: column name = map name + key; cell value = map value)
TRON
(tracing on)
Using tracing
● In cqlsh, “tracing on” (example below)
● … enjoy!
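For instance (a hypothetical cqlsh session; the trace output itself is not reproduced):

cqlsh:library> TRACING ON;
cqlsh:library> SELECT * FROM Books WHERE title = 'Patriot Games';

cqlsh then prints the rows, followed by a step-by-step trace with per-stage timings.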
Example
Antipattern

CREATE TABLE WorkQueue (
  name varchar,
  time bigint,
  workItem varchar,
  PRIMARY KEY (name, time)
)

… do a bunch of inserts ...

SELECT * FROM WorkQueue WHERE name='ToDo' ORDER BY time ASC;
DELETE FROM WorkQueue WHERE name='ToDo' AND time=[some_time]
Antipattern - enqueue
Antipattern - dequeue
Antipattern
20k tombstones!! 13ms of 17ms spent reading tombstones
Lightweight Transactions
(no it’s not ACID)
Primer
● Supports basic Compare-and-Set ops
● Provides linearizable consistency
● … aka serial isolation
● Uses “Paxos light” under the hood
● Still expensive -- four round trips!
● For most cases quorum reads/writes will be sufficient
Usage

INSERT INTO Users (login, name)
VALUES ('rs_atl', 'Robbie Strickland')
IF NOT EXISTS;

UPDATE Users
SET password='super_secure_password'
WHERE login='rs_atl'
IF reset_token='some_reset_token';
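Conditional statements report whether they were applied; re-running the INSERT above would look roughly like this (a sketch, output formatting approximate):

cqlsh> INSERT INTO Users (login, name) VALUES ('rs_atl', 'Robbie Strickland') IF NOT EXISTS;

 [applied]
-----------
      True

cqlsh> INSERT INTO Users (login, name) VALUES ('rs_atl', 'Robbie Strickland') IF NOT EXISTS;

 [applied] | login  | name
-----------+--------+-------------------
     False | rs_atl | Robbie Strickland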
Other cool stuff
● Triggers (experimental)
● Batching multiple requests (sketch below)
● Leveled compaction
● Configuration via CQL
● Gossip-based rack/DC configuration
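For example, batching several statements into a single request (an illustrative CQL batch, not from the deck):

BEGIN BATCH
  INSERT INTO Books (title, author, year) VALUES ('Patriot Games', 'Tom Clancy', 1987);
  INSERT INTO Books (title, author, year) VALUES ('Without Remorse', 'Tom Clancy', 1993);
APPLY BATCH;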
Thank you!
Robbie Strickland
Software Development Manager
The Weather Channel
[email protected]
@dont_use_twitter