Cassandra 2.0 to 2.1

Cassandra 2.0 to 2.1. Codebits, Lisbon, April 2014

www.datastax.com @DataStaxEU

About Me

©2014 DataStax. Do not distribute without consent. @DataStaxEU

Johnny Miller Solutions Architect

•  @CyanMiller

•  www.linkedin.com/in/johnnymiller

We are hiring www.datastax.com/careers

@DataStaxCareers

DataStax - Introduction


•  Founded in April 2010

•  We drive Apache Cassandra™

•  400+ customers (25 of the Fortune 100)

•  200+ employees

•  Home to Apache Cassandra™ Chair & most committers

•  Contribute ~ 90% of code into Apache Cassandra™ code base

•  Headquartered in San Francisco Bay area

•  European headquarters established in London

•  Offices in France and Germany

Our Goal

To be the first and best database choice for online applications

Why DataStax?


DataStax supports both the open source community and enterprises.

Open Source/Community
•  Apache Cassandra (employ Cassandra chair and 90+% of the committers)
•  DataStax Community Edition
•  DataStax OpsCenter
•  DataStax DevCenter
•  DataStax Drivers/Connectors
•  Online Documentation
•  Online Training
•  Mailing lists and forums

Enterprise Software
•  DataStax Enterprise Edition
•  Certified Cassandra
•  Built-in Analytics
•  Built-in Enterprise Search
•  Enterprise Security
•  DataStax OpsCenter
•  Expert Support
•  Consultative Help
•  Professional Training

History of Cassandra


Cassandra Adoption


Source: http://db-engines.com/en/ranking, April 2014

Core Values


•  Massive Scalability
•  High Performance
•  Reliability/Availability

Performance and Scale


“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”

Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2012. Benchmark paper presented at the Very Large Database Conference, 2012. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

End Point Independent NoSQL Benchmark
Highest in throughput… Lowest in latency…
http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf

Netflix Cloud Benchmark…
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

Performance and Scale


Cassandra works for small to huge deployments.

•  Cassandra @ Netflix
    •  80+ Clusters
    •  2500+ nodes
    •  4 Data Centres (Amazon Regions)
    •  > 1 Trillion transactions per day

•  Cassandra @ eBay
    •  >250TB of data, dozens of nodes, multiple data centres
    •  > 6 billion writes, > 5 billion reads per day

Source: http://planetcassandra.org

Availability


•  Cassandra was designed with the understanding that system/hardware failures can and do occur

•  Peer-to-peer, distributed system
•  All nodes the same – masterless with no single point of failure
•  Read/Write-anywhere and across data centres

“Cassandra, our distributed cloud persistence store which is distributed across all zones and regions, dealt with the loss of one third of its regional nodes without any loss of data or availability”. http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html

“During Hurricane Sandy, we lost an entire data center. Completely. Lost. It. Our application fail-over resulted in us losing just a few moments of serving requests for a particular region of the country, but our data in Cassandra never went offline.” http://planetcassandra.org/blog/post/outbrain-touches-over-80-of-all-us-online-users-with-help-from-cassandra/

Cassandra 1.2


New Core Value


•  Massive Scalability
•  High Performance
•  Reliability/Availability
•  Ease of Use

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    country text,
    birth_date int
);

CREATE INDEX ON users(country);

SELECT * FROM users
WHERE country='Portugal'
  AND birth_date > 1950;

Cluster cluster = Cluster.builder()
    .addContactPoints("10.158.02.40", "10.158.02.44")
    .build();

Session session = cluster.connect("akeyspace");

session.execute(
    "INSERT INTO user (username, password) " +
    "VALUES('johnny', 'password1234')");

CQL3 Delivers


"Coming from a relational database background we found the transition to Cassandra to be very straightforward. There are a few simple key concepts one must grasp at first but ever since it’s been smooth sailing for us.”

- Boris Wolf, Comcast

Find out more: •  Introduction to CQL3 and Data Modeling

Slides: http://bit.ly/jpm_003, Video: http://bit.ly/jpm_004 [Cassandra Meetup, Helsinki, Feb 2014]

Native Drivers and Protocol


Traditionally, Cassandra clients (Hector, Astyanax [1], etc.) were developed using Thrift. Cassandra 1.2 introduced CQL3, the CQL native protocol and native drivers, and with them a new, easier way of using Cassandra.

Why?
•  Easier to develop and model
•  Best practices for building modern distributed applications
•  Integrated tools and experience
•  Enable Cassandra to evolve more easily and support new features

[1] Astyanax is being updated to include the native driver: https://github.com/Netflix/astyanax/wiki/Astyanax-over-Java-Driver

Native Drivers


•  Java
•  C#
•  Python
•  C++ (beta)
•  ODBC (beta)
•  Clojure
•  Erlang
•  Node.js
•  Ruby
•  Plus many, many more….

Get them here: http://www.datastax.com/download

Find out more: •  Going Native With Apache Cassandra

http://bit.ly/jpm_001 [QCon, London 2014]

Asynchronous Read


ResultSetFuture future = session.executeAsync( "SELECT * FROM user");

for (Row row : future.get()) {

String userName = row.getString("username");

String password = row.getString("password");

}

Note: The future returned implements Guava's ListenableFuture interface. This means you can use all of Guava's Futures [1] methods!

[1] http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/Futures.html

Read with Callbacks


final ResultSetFuture future =

session.executeAsync("SELECT * FROM user");

future.addListener(new Runnable() {

public void run() {

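// get() throws checked exceptions that run() cannot declare; the future is already complete here
for (Row row : future.getUninterruptibly()) {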

String userName = row.getString("username");

String password = row.getString("password");

}

}

}, executor);

Parallelize Calls


int queryCount = 99;

List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();

for (int i=0; i<queryCount; i++) {

futures.add(

session.executeAsync("SELECT * FROM user "

+"WHERE username = '"+i+"'"));

}

for(ResultSetFuture future : futures) {

for (Row row : future.getUninterruptibly()) {

//do something

}

}
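The two-phase pattern on this slide (submit everything first, then collect) can be tried without a cluster. The following is only a sketch using the JDK's own executor and futures; the lookup method and the FanOutSketch class are hypothetical stand-ins for session.executeAsync and are not part of the driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FanOutSketch {
    // Hypothetical stand-in for one query; a real client would call session.executeAsync(...).
    static String lookup(int userId) {
        return "user-" + userId;
    }

    static List<String> queryAll(int queryCount) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            // Phase 1: submit every request before waiting on any of them.
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < queryCount; i++) {
                final int id = i;
                Callable<String> task = () -> lookup(id);
                futures.add(pool.submit(task));
            }
            // Phase 2: collect the results in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(queryAll(99).size() + " results");
    }
}
```

Waiting only after all requests are in flight is what lets the total latency approach that of the slowest single query rather than the sum of all of them.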

Query Tracing


•  You can turn tracing on or off for queries with the TRACING ON | OFF command.

•  This can help you understand what Cassandra is doing and identify any performance problems.

Find out more: •  http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

Also worth noting…


•  Atomic Batches
•  CQL3 Authentication Support
•  CQL3 Collections Data Type
•  Virtual Nodes (vnodes)
•  JBOD improvements
•  Parallel leveled compaction
•  LZ4 compression

Plus much, much more….

Cassandra 2.0 DataStax Enterprise 4.0


Lightweight Transactions (LWT)


Why?
•  Solve a class of race conditions in Cassandra that you would otherwise need to install an external locking manager to solve.

Syntax:

INSERT INTO customer_account (customerID, customer_email)
VALUES ('Johnny', '[email protected]')
IF NOT EXISTS;

UPDATE customer_account
SET customer_email='[email protected]'
WHERE customerID='Johnny'
IF customer_email='[email protected]';

Example Use Case:
•  Registering a user

Race Condition


SELECT name
FROM users
WHERE username = 'johnny';

(0 rows)

SELECT name
FROM users
WHERE username = 'johnny';

(0 rows)

INSERT INTO users
    (username, name, email, password, created_date)
VALUES ('johnny', 'Johnny Miller', ['[email protected]'],
    'ba27e03fd9...', '2011-06-20 13:50:00');

INSERT INTO users
    (username, name, email, password, created_date)
VALUES ('johnny', 'Johnny Miller', ['[email protected]'],
    'ea24e13ad9...', '2011-06-20 13:50:01');

This one wins!

Lightweight Transactions


INSERT INTO users
    (username, name, email, password, created_date)
VALUES ('johnny', 'Johnny Miller', ['[email protected]'],
    'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users
    (username, name, email, password, created_date)
VALUES ('johnny', 'Johnny Miller', ['[email protected]'],
    'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+----------------
     False | johnny   | 2011-06-20 ... | Johnny Miller

Lightweight Transactions


•  Uses the Paxos algorithm
•  All operations are quorum-based, i.e. we can lose nodes and it's still going to work!
•  See Paxos Made Simple - http://bit.ly/paxosmadesimple
•  Consequences of Lightweight Transactions
    •  4 round trips vs. 1 for normal updates
    •  Operations are done on a per-partition basis
    •  Will be going across data centres to obtain consensus
    •  Cassandra user will need read and write access, i.e. you get back the row!

Great for 1% of your app, but eventual consistency is still your friend!

Find out more: •  http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0 •  Eventual Consistency != Hopeful Consistency

http://www.youtube.com/watch?v=A6qzx_HE3EU
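The semantics of IF NOT EXISTS and UPDATE ... IF are those of a compare-and-set. As a rough single-process analogy only (this is not Paxos and involves no replicas), the sketch below models them with the JDK's ConcurrentMap; the class and method names are made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConditionalWriteSketch {
    // username -> name; stands in for one table keyed by username.
    private final ConcurrentMap<String, String> users = new ConcurrentHashMap<>();

    // Analogue of INSERT ... IF NOT EXISTS: the check and the write are one atomic step.
    boolean insertIfNotExists(String username, String name) {
        return users.putIfAbsent(username, name) == null;
    }

    // Analogue of UPDATE ... IF name = expected: applied only if the current value matches.
    boolean updateIf(String username, String expected, String newName) {
        return users.replace(username, expected, newName);
    }

    String nameOf(String username) {
        return users.get(username);
    }

    public static void main(String[] args) {
        ConditionalWriteSketch table = new ConditionalWriteSketch();
        System.out.println(table.insertIfNotExists("johnny", "Johnny Miller")); // true: applied
        System.out.println(table.insertIfNotExists("johnny", "Someone Else"));  // false: row exists
        System.out.println(table.nameOf("johnny"));                             // Johnny Miller
    }
}
```

In Cassandra the same guarantee has to hold across replicas, which is where the extra round trips listed above come from.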

Batch Statements and LWT


BEGIN BATCH
    UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1;
    UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4;
APPLY BATCH;

•  Allows you to group multiple conditional updates in a batch as long as all those updates apply to the same partition

Triggers


CREATE TRIGGER <name> ON <table> USING <classname>;

class MyTrigger implements ITrigger {
    public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update) {
        ...
    }
}

•  The trigger defined on a table fires before a requested DML statement occurs

•  You place the trigger code in a lib/triggers subdirectory of the Cassandra installation directory

•  A full working example can be found in the Cassandra examples/triggers directory

•  EXPERIMENTAL: Expect changes in Cassandra 2.1

Find out more:
•  http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support

In-Memory Tables (DataStax Enterprise 4.0)


CREATE TABLE users (
    uid text,
    fname text,
    lname text,
    PRIMARY KEY (uid)
) WITH compaction={'class': 'MemoryOnlyStrategy', 'size_limit_in_mb':100}
AND memtable_flush_period_in_ms=3600000;

•  We expect that in-memory column families will be on average 20-50% faster, with significantly less observed variance on read queries.
•  A great use case is workloads with a lot of overwrites
•  Caution: more tables = more memory = GC death spiral

Find out more: •  http://www.datastax.com/2014/02/why-we-added-in-memory-to-cassandra

Static Columns


A static column is a special column that is shared by all the rows of the same partition.

CREATE TABLE foo (
    x text,
    y bigint,
    t bigint static,
    z bigint,
    PRIMARY KEY (x, y) );

INSERT INTO foo (x,y,t, z) VALUES ('a', 1, 1, 10);
INSERT INTO foo (x,y,t, z) VALUES ('a', 2, 2, 20);

SELECT * from foo;

 x | y | t | z
---+---+---+----
 a | 1 | 2 | 10
 a | 2 | 2 | 20
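Why both rows show t = 2 is easier to see with a toy model of one partition. The sketch below is an assumption-laden illustration, not Cassandra's storage format: the Partition class and its fields are invented, with one slot for the static value and a sorted map for the clustered rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StaticColumnSketch {
    // One partition of table foo (a single value of x).
    static final class Partition {
        Long t;                                            // static column: one value per partition
        final TreeMap<Long, Long> zByY = new TreeMap<>();  // regular column z, keyed by clustering column y

        void insert(long y, long tValue, long z) {
            t = tValue;       // overwrites the value that every row in the partition sees
            zByY.put(y, z);
        }

        List<String> selectAll(String x) {
            List<String> rows = new ArrayList<>();
            for (Map.Entry<Long, Long> row : zByY.entrySet()) {
                rows.add(x + " | " + row.getKey() + " | " + t + " | " + row.getValue());
            }
            return rows;
        }
    }

    public static void main(String[] args) {
        Partition a = new Partition();
        a.insert(1, 1, 10);
        a.insert(2, 2, 20);
        for (String row : a.selectAll("a")) {
            System.out.println(row);   // a | 1 | 2 | 10, then a | 2 | 2 | 20
        }
    }
}
```

The second insert replaces the single shared t, so the first row reports the newer value as well.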

Static Columns


•  Considerations
    •  Use them when you want to store some per-partition “static” information alongside clustered rows and still want to be able to query both of those with a single SELECT.
    •  Only columns that are not part of the PRIMARY KEY can be static.
    •  Only tables with at least one clustering column can have static columns.
    •  Tables with the COMPACT STORAGE option cannot have static columns.

No more CQL2


•  CQL2 is not supported any more.
•  CQL2 has been discouraged for a while, and if you are still using it, do not upgrade until you have rewritten your application to use CQL3.

Clustered columns can be indexed


CREATE TABLE foo (
    a int,
    b int,
    c int,
    PRIMARY KEY (a, b)
);

•  It was previously impossible to create an index on the ‘b’ column, since that column was a special clustered column.

•  This restriction has now been lifted and you can create indexes on clustered columns just as if they were regular CQL columns.

CREATE INDEX ON foo (b);

Conditional create/drop ks/table/index statements in CQL3


•  You can now use IF EXISTS and IF NOT EXISTS conditionals for dropping and creating tables and keyspaces.

Automatic Paging


•  This is great!
•  Historically it has been difficult to get huge result sets out of Cassandra. It has generally been necessary to explicitly enumerate your row keys in reasonably small batches (1000 rows or so per batch would be common).
•  This feature now allows you to get huge result sets (including “select * from table”) and have the server automatically page the results, while the client can simply iterate over the entire result set.
•  This should remove a very common cause of OOMs (out of memory exceptions), and should make data exploration much easier.

Paging (before)


CREATE TABLE timeline (
    user_id uuid,
    tweet_id timeuuid,
    tweet_author uuid,
    tweet_body text,
    PRIMARY KEY (user_id, tweet_id)
);

SELECT *
FROM timeline
WHERE (user_id = :last_key
    AND tweet_id > :last_tweet)
   OR token(user_id) > token(:last_key)
LIMIT 100

Paging (after)


SELECT * FROM timeline
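What the driver does behind that one-line query can be pictured as an iterator that fetches the next page only when the current one is used up. The sketch below is a simplified model under stated assumptions: fetchPage is a hypothetical server call, and it pages by offset, whereas the real protocol passes an opaque paging state.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class PagingSketch {
    // Hypothetical server side: returns at most pageSize rows starting at offset.
    static List<Integer> fetchPage(List<Integer> table, int offset, int pageSize) {
        int end = Math.min(table.size(), offset + pageSize);
        return offset >= end ? Collections.<Integer>emptyList() : table.subList(offset, end);
    }

    // Client side: looks like a plain iterator, but holds only one page in memory at a time.
    static Iterator<Integer> pagedIterator(final List<Integer> table, final int pageSize) {
        return new Iterator<Integer>() {
            private List<Integer> page = Collections.emptyList();
            private int posInPage = 0;
            private int nextOffset = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                if (posInPage < page.size()) return true;
                if (exhausted) return false;
                page = fetchPage(table, nextOffset, pageSize);   // fetch lazily, on demand
                nextOffset += page.size();
                posInPage = 0;
                if (page.size() < pageSize) exhausted = true;    // a short page is the last page
                return !page.isEmpty();
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return page.get(posInPage++);
            }
        };
    }

    static int count(Iterator<Integer> rows) {
        int n = 0;
        while (rows.hasNext()) {
            rows.next();
            n++;
        }
        return n;
    }
}
```

Because only one page is held at a time, memory use is bounded by the page size rather than by the size of the table.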

Thrift


•  Replace Thrift HsHa with LMAX Disruptor based implementation

•  Because of the substantial changes at the Thrift transport layer, be sure to update your app to use Thrift clients compatible with Cassandra 2.0, and test your application thoroughly before going to production.

Streaming


•  This is a major rewrite of the Cassandra streaming protocol, and should be much more robust and reliable than the previous implementation.

•  It includes: •  several performance optimizations

•  multiple parallel sstable streaming

•  better logging

•  more metrics

Reduce request latency with rapid retry protection/eager retries


•  This should substantially help with your 95th-99th percentile latency.
•  By rapidly detecting that a query was sent to a slow node, this feature will greatly speed up performing a retry on another node.
•  There is new metadata associated with each table: speculative_retry='99.0PERCENTILE' //default
•  Be careful – retries will have an effect on what throughput you can achieve in your cluster.

ALTER TABLE users WITH speculative_retry = '10ms';

ALTER TABLE users WITH speculative_retry = '99percentile';
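The idea behind a fixed threshold such as '10ms' can be sketched with JDK futures: ask one replica, and if no answer has arrived by the threshold, ask a second one and take whichever answers first. Everything here (askReplica, the replica names, the delays) is hypothetical and only simulates the behaviour; it is not how Cassandra's coordinator is implemented.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SpeculativeRetrySketch {
    // Hypothetical replica: answers with its own name after a fixed delay.
    static CompletableFuture<String> askReplica(ScheduledExecutorService timer, String name, long delayMs) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        timer.schedule(() -> { reply.complete(name); }, delayMs, TimeUnit.MILLISECONDS);
        return reply;
    }

    // Returns the name of the replica whose answer arrived first.
    static String read(long primaryDelayMs, long backupDelayMs, long retryAfterMs) throws Exception {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
        try {
            CompletableFuture<String> winner = new CompletableFuture<>();
            askReplica(timer, "primary", primaryDelayMs).thenAccept(winner::complete);
            // If the primary has not answered by the threshold, also ask a second replica.
            timer.schedule(() -> {
                if (!winner.isDone()) {
                    askReplica(timer, "backup", backupDelayMs).thenAccept(winner::complete);
                }
            }, retryAfterMs, TimeUnit.MILLISECONDS);
            return winner.get(5, TimeUnit.SECONDS);
        } finally {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read(500, 10, 50));   // slow primary: the backup answers first
        System.out.println(read(10, 10, 200));   // fast primary: no retry is sent
    }
}
```

The extra request in the slow case is the throughput cost the slide warns about: a lower threshold trims tail latency but sends more duplicate reads.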

Official way to disable compactions


•  nodetool disableautocompaction
•  nodetool enableautocompaction

Remove row-level bloom filters


•  This should be a largely invisible change since there was never a noticeable performance improvement from having these bloom filters.

•  However, you will see a reduction in memory usage as a result.

add default_time_to_live


•  This has been a long-requested Cassandra feature and makes auto-expiring data easier.

•  You can have a single per-table TTL that will always be set unless overridden by the client.

•  It also allows for significant performance optimizations on the server side.

New network topology snitch for mixed ec2/other envs


•  There is a new snitch (YamlFileNetworkTopologySnitch) and a new yaml file (cassandra-topology.yaml) that will be used if you select it.

•  This snitch should probably be used for any cluster that spans both EC2 and non-EC2 environments.

Removed compatibility with pre-1.2.5 sstables and network messages


•  This is very important as it means that you must upgrade to Cassandra 1.2.6 (or equivalent DSE) or later before upgrading to Cassandra 2.0.x or DSE 4.0.x.

Improve memory use defaults


•  Memtables now use ¼ your heap by default instead of ⅓.

•  Additionally, the write timeout has been dramatically lowered to 2 seconds from 10 seconds, and the read timeout has been changed to 5 seconds.

add SHOW SESSION <tracing-session> command


•  If you aren’t already using tracing to debug your dev and production clusters, then start doing so.

•  It’s one of the most powerful tools that you have at your disposal to understand what is going on.

•  This lets you explicitly specify which session you want to display the output for.

•  Previously you would have had to manually query it from the system_traces.sessions and system_traces.events tables.

Single-pass compaction


•  This should noticeably improve the performance of compaction since Cassandra no longer has to read through each sstable twice.

Compact hottest sstables first and optionally omit coldest from compaction entirely


•  Read-coldness (how [in]frequently a row is read) is now used in consideration of compaction.

•  If you have a lot of cold data, this could greatly reduce the amount of unnecessary re-compaction.

Leveled compaction performs size-tiered compactions in L0


•  If LCS gets behind, read performance deteriorates as we have to check bloom filters on many sstables in L0.

•  For wide rows, this can mean having to seek for each one since the BF doesn't help us reject much.

•  Performing size-tiered compaction in L0 will mitigate this until we can catch up on merging it into higher levels

New CQL-aware SSTableWriter


•  Prior to Cassandra 2.0.4, it was possible to write SSTables for CQL3 tables, but only with a lot of difficulty.

•  Particularly with complex schemas, this is very complicated and error prone, and should be deprecated as an approach.

•  Instead the new CQL3 aware SSTableWriter should be used:

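String schema = "CREATE TABLE foo (c1 int, c2 text, c3 float, PRIMARY KEY (c1, c2))";
String insert = "INSERT INTO foo(c1, c2, c3) VALUES (?, ?, ?)";

// The builder method is forTable(); "for" is a reserved word in Java.
CQLSSTableWriter writer = CQLSSTableWriter.builder()
    .inDirectory("path/to/directory")   // placeholder output directory, required by the builder
    .forTable(schema)
    .using(insert)
    .build();

writer.addRow(3, "foo", 2.3f);
writer.addRow(1, "bar", 0.0f);
writer.close();   // flush the remaining rows to disk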

Plus more….


•  Java7 is now required!
•  Tracking statistics on clustered columns allows eliminating unnecessary sstables from the read path
•  Faster partition index lookups and cache reads by improving performance of off-heap memory
•  Faster reads of compressed data by switching from CRC32 to Adler checksums
•  JEMalloc support for off-heap allocation
•  The potentially dangerous countPendingHints JMX call has been replaced by a Hints Created metric
•  The on-heap partition cache (“row cache”) has been removed
•  Vnodes are on by default in Cassandra (off by default in DataStax Enterprise).

And more……

Find out more…


•  Cassandra 2.0 documentation http://www.datastax.com/documentation/cassandra/2.0/

•  DataStax Enterprise 4.0 documentation http://www.datastax.com/documentation/datastax_enterprise/4.0/

•  What’s new in Cassandra 2.0 http://www.datastax.com/wp-content/uploads/2013/09/WP-DataStax-WhatsNewC2.0.pdf

•  New CQL features in Cassandra 2.0.6 http://www.datastax.com/dev/blog/cql-in-2-0-6

•  What’s under the hood in Cassandra 2.0 http://www.datastax.com/dev/blog/whats-under-the-hood-in-cassandra-2-0

•  Facebook’s Cassandra paper, annotated and compared to Apache Cassandra 2.0 http://www.datastax.com/documentation/articles/cassandra/cassandrathenandnow.html

Cassandra 2.1


User Defined Types


CREATE TYPE address (
    street text,
    city text,
    zip_code int,
    phones set<text>
);

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    addresses map<text, address>
);

SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name   | addresses.city | addresses.phones
----------+--------+----------------+------------------------------
 63bf691f | johnny | London         | {'0201234567', '0796622222'}

User Defined Types


Considerations
•  You cannot update only parts of a UDT value; you have to overwrite the whole thing every time (a limitation of the current implementation, which may change).
•  A UDT value is always read in its entirety under the hood (as of the current implementation at least).
•  UDTs are not meant to store large and complex "documents" as of their current implementation, but rather to help make the denormalization of small amounts of data more convenient and flexible.
•  It is possible to use a UDT as the type of any CQL column, including clustering ones.

Find out more: •  http://www.datastax.com/dev/blog/cql-in-2-1

Secondary indexes on collections


CREATE TABLE songs (
    id uuid PRIMARY KEY,
    artist text,
    album text,
    title text,
    data blob,
    tags set<text>
);

CREATE INDEX song_tags_idx ON songs(tags);

SELECT * FROM songs WHERE tags CONTAINS 'blues';

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
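Conceptually, an index on a collection maps each element to the rows whose collection contains it, so CONTAINS becomes a lookup instead of a scan. The sketch below illustrates that idea only; the class is invented, the second song is made-up sample data, and Cassandra's real secondary indexes are maintained per node rather than in a single map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CollectionIndexSketch {
    // Inverted index: tag -> ids of the songs whose tag set contains it.
    private final Map<String, Set<String>> idsByTag = new HashMap<>();

    void insert(String songId, Set<String> tags) {
        for (String tag : tags) {
            idsByTag.computeIfAbsent(tag, k -> new TreeSet<>()).add(songId);
        }
    }

    // Analogue of: SELECT id FROM songs WHERE tags CONTAINS ?
    Set<String> whereTagsContains(String tag) {
        Set<String> ids = idsByTag.get(tag);
        return ids == null ? Collections.<String>emptySet() : ids;
    }

    public static void main(String[] args) {
        CollectionIndexSketch songs = new CollectionIndexSketch();
        songs.insert("5027b27e", new TreeSet<>(Arrays.asList("acoustic", "blues")));
        songs.insert("hypothetical-id", new TreeSet<>(Arrays.asList("rock")));  // made-up row
        System.out.println(songs.whereTagsContains("blues"));   // [5027b27e]
    }
}
```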


Secondary indexes on map keys


•  If you prefer indexing the map keys, you can do so by creating a KEYS index and by using CONTAINS KEY

CREATE TABLE products (
    id int PRIMARY KEY,
    description text,
    price int,
    categories set<text>,
    features map<text, text>
);

CREATE INDEX feat_key_index ON products(KEYS(features));

SELECT id, description
FROM products
WHERE features CONTAINS KEY 'refresh-rate';

 id    | description
-------+-----------------------------
 34134 | 120-inch 1080p 3D plasma TV

Counters++


•  simpler implementation, no more edge cases
•  possible to properly repair now
•  significantly less garbage and internode traffic generated
•  better performance for 99% of uses

Row Cache


CREATE TABLE notifications (
    target_user text,
    notification_id timeuuid,
    source_id uuid,
    source_type text,
    activity text,
    PRIMARY KEY (target_user, notification_id)
)
WITH CLUSTERING ORDER BY (notification_id DESC)
AND caching = 'rows_only'
AND rows_per_partition_to_cache = '3';

Thrift post-Cassandra 2.1


•  There is a proposal to freeze Thrift starting with 2.1.0
•  http://bit.ly/freezethrift
•  Thrift will be retained for backwards compatibility, but there will be no new features or changes to the Thrift API after 2.1.0

“CQL3 is almost two years old now and has proved to be the better API that Cassandra needed. CQL drivers have caught up with and passed the Thrift ones in terms of features, performance, and usability. CQL is easier to learn and more productive than Thrift.”
- Jonathan Ellis, Apache Cassandra Chair

2.1 Roadmap


•  Beta1 - 20th Feb
•  Beta2 - ?
•  RC - ?
•  Final release currently planned for mid-2014

Find Out More


DataStax:
•  http://www.datastax.com
Getting Started:
•  http://www.datastax.com/documentation/gettingstarted/index.html
Training:
•  http://www.datastax.com/training
Downloads:
•  http://www.datastax.com/download
Documentation:
•  http://www.datastax.com/docs
Developer Blog:
•  http://www.datastax.com/dev/blog
Community Site:
•  http://planetcassandra.org
Webinars:
•  http://planetcassandra.org/Learn/CassandraCommunityWebinars

