Cassandra 3.0 Awesomeness
Jon Haddad, Technical Evangelist, DataStax
@rustyrazorblade
©2013 DataStax Confidential. Do not distribute without consent.
2.2 Stuff First
User Defined Functions
• Apply functions to data in a table
• Defined as Java or JavaScript
• Bring your own Python/Ruby
CREATE OR REPLACE FUNCTION fLog (input double)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'return Double.valueOf(Math.log(input.doubleValue()));';
Functions in Action
cqlsh:test> create table blah2 ( id double primary key, name text );
cqlsh:test> insert into blah2 (id, name) values (1.0334343, 'jon');
cqlsh:test> select flog(id) from blah2;

 test.flog(id)
---------------
      0.032888
(1 rows)
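The fLog function above is CALLED ON NULL INPUT, so a null id would reach input.doubleValue() and fail. A minimal sketch of two variants follows, assuming UDFs are enabled on the node. The names fLogSafe and fLogJs are hypothetical and not part of the original deck. The first lets Cassandra skip the call for null arguments; the second is the same function written in JavaScript.

```cql
-- Hypothetical null-safe variant: Cassandra returns null without
-- invoking the body when the argument is null.
CREATE OR REPLACE FUNCTION fLogSafe (input double)
    RETURNS NULL ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'return Math.log(input);';

-- Hypothetical JavaScript variant: the value of the last
-- expression is used as the result.
CREATE OR REPLACE FUNCTION fLogJs (input double)
    RETURNS NULL ON NULL INPUT
    RETURNS double
    LANGUAGE javascript
    AS 'Math.log(input);';

-- Usage against the blah2 table from the slide above.
SELECT fLogSafe(id), fLogJs(id) FROM blah2;
```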
Aggregates
• Several built-ins: min(), max(), avg(), count(), sum()
• Can provide user defined aggregates
• Defined as Java or JavaScript
• Do not aggregate across partitions
• Enable in cassandra.yaml
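A user defined aggregate is built from a state function (called once per row) and an optional final function (called once at the end). The sketch below follows that pattern for an integer average. The names avgState, avgFinal and sensor_average are illustrative assumptions, and it presumes enable_user_defined_functions: true is set in cassandra.yaml.

```cql
-- State function: accumulates (count, sum) in a tuple, skipping nulls.
CREATE OR REPLACE FUNCTION avgState (state tuple<int, bigint>, val int)
    CALLED ON NULL INPUT
    RETURNS tuple<int, bigint>
    LANGUAGE java
    AS 'if (val != null) { state.setInt(0, state.getInt(0) + 1); state.setLong(1, state.getLong(1) + val.intValue()); } return state;';

-- Final function: turns (count, sum) into an average, or null if no rows.
CREATE OR REPLACE FUNCTION avgFinal (state tuple<int, bigint>)
    CALLED ON NULL INPUT
    RETURNS double
    LANGUAGE java
    AS 'if (state.getInt(0) == 0) return null; double r = state.getLong(1); r /= state.getInt(0); return Double.valueOf(r);';

-- The aggregate wires the two functions together with a starting state.
CREATE OR REPLACE AGGREGATE sensor_average (int)
    SFUNC avgState
    STYPE tuple<int, bigint>
    FINALFUNC avgFinal
    INITCOND (0, 0);

-- Usage: restrict to one partition, per the advice above
-- (sensor_data is the table defined later in this deck).
SELECT sensor_average(value) FROM sensor_data WHERE sensor_id = 1;
```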
Native JSON Support
INSERT INTO mytable JSON '{"myKey": 0, "value": 0}';
SELECT JSON name, occupation FROM users WHERE userid = 199;
3.0 Stuff
G1GC
• Improvement over ParNew+CMS, which is hard to tune
• CASSANDRA-8150
• G1 has more predictable pauses
• Better latency
• Many new-gen and old-gen regions
• G1 is adaptive to usage
[Diagram: heap regions labeled E (Eden), S (Survivor) and O (Old); legend: Eden, S0, S1, Old Gen]
Improved vnode allocation
• Previous method allocated tokens randomly
• vnode problems: increased sockets, repairs take longer
• New clusters can allocate fewer vnodes (4-12)
• CASSANDRA-7032
Pre 3.0 Hints
• Cassandra is a pretty bad queue
• Pre 3.0 hints are a queue
• Generates lots of tombstones
• Can result in instability
CREATE TABLE system.hints (
    target_id uuid,
    hint_id timeuuid,
    message_version int,
    mutation blob,
    PRIMARY KEY (target_id, hint_id, message_version)
) WITH COMPACT STORAGE;
3.0 Hints
• CASSANDRA-9427
• Write hints to a file instead
• Removes overhead of compaction
• No longer using C* as a queue
Materialized Views
Cassandra Data Modeling
 sensor_id | timestamp | value
-----------+-----------+-------
         1 |         1 |     1
         1 |         2 |     2
         2 |         1 |     2
         2 |         2 |     1
create table sensor_data (
    sensor_id int,
    timestamp int,
    value int,
    primary key (sensor_id, timestamp)
);
Cool… but…
• What if we want to query sensor data by timestamp?
• We can't efficiently query on timestamp
• Need to maintain 2 tables
• In 3.0, use materialized views
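For contrast, the manual two-table approach mentioned above might look like the sketch below. The table name sensor_data_by_timestamp is a hypothetical choice, and the logged batch is one common way to keep both tables in step, not necessarily what the speaker used.

```cql
-- Second, hand-maintained table keyed by timestamp.
CREATE TABLE sensor_data_by_timestamp (
    timestamp int,
    sensor_id int,
    value int,
    PRIMARY KEY (timestamp, sensor_id)
);

-- The application must write every reading twice.
BEGIN BATCH
    INSERT INTO sensor_data (sensor_id, timestamp, value) VALUES (1, 1, 1);
    INSERT INTO sensor_data_by_timestamp (timestamp, sensor_id, value) VALUES (1, 1, 1);
APPLY BATCH;

-- Queries by timestamp now hit a single partition.
SELECT sensor_id, value FROM sensor_data_by_timestamp WHERE timestamp = 1;
```

Every write path in the application has to remember the second insert, which is the bookkeeping a materialized view takes over.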
CREATE MATERIALIZED VIEW sensor_by_value AS
    SELECT value, timestamp, sensor_id
    FROM sensor_data
    WHERE value IS NOT NULL
      AND timestamp IS NOT NULL
      AND sensor_id IS NOT NULL
    PRIMARY KEY (timestamp, sensor_id, value);
Materialized View
• Table managed for you
• Updated async behind the scenes
• Built automatically when created
• Can't be mixed w/ functions yet
• CASSANDRA-9664
cqlsh:test> select * from sensor_by_value;
 timestamp | sensor_id | value
-----------+-----------+-------
         1 |         1 |     1
         1 |         2 |     2
         2 |         1 |     2
         2 |         2 |     1
(4 rows)
cqlsh:test> select * from sensor_by_value where timestamp = 1;
 timestamp | sensor_id | value
-----------+-----------+-------
         1 |         1 |     1
         1 |         2 |     2
(2 rows)
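A short sketch of how the view behaves once it exists, assuming the sensor_data table and sensor_by_value view from the slides above. The inserted values are made up for illustration, and because the view is updated asynchronously the new row may take a moment to appear.

```cql
-- Write to the base table only; Cassandra maintains the view.
INSERT INTO sensor_data (sensor_id, timestamp, value) VALUES (3, 1, 7);

-- Read by timestamp through the view (its partition key is timestamp).
SELECT sensor_id, value FROM sensor_by_value WHERE timestamp = 1;

-- Views are read-only, so a direct write like this one is rejected:
-- INSERT INTO sensor_by_value (timestamp, sensor_id, value) VALUES (1, 4, 9);

-- Dropping the view leaves the data in sensor_data untouched.
DROP MATERIALIZED VIEW sensor_by_value;
```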
New Storage Engine
Pre 3.0
• Clustering keys are repeated for each cell
• Timestamps are repeated in each cell
• TTLs are... you get the idea
• Rows are a bolted-on construct, only known by a convention
• Lots of wasted space
• Lots of repetition
Storage in 3.0
• Rows are a first class entity
• Timestamps and TTLs can be stored on the Row
• Clustering keys are not repeated
• Conversion to iterators for memory efficiency