Christopher Batey @chbatey 2.2 & 3.0

Cassandra London - 2.2 and 3.0

Aug 12, 2015

Transcript
Page 1: Cassandra London - 2.2 and 3.0

Christopher Batey@chbatey

2.2 & 3.0

Page 2: Cassandra London - 2.2 and 3.0

@chbatey

First comes a blog
• Each new feature has a vastly more detailed blog post:

http://christopher-batey.blogspot.co.uk/

Page 3: Cassandra London - 2.2 and 3.0

@chbatey

Where did 2.2 come from?

Page 4: Cassandra London - 2.2 and 3.0

@chbatey

Don't start Thrift rpc by default (CASSANDRA-9319)

Page 5: Cassandra London - 2.2 and 3.0

@chbatey

New features
• 2.2
- JSON
- User defined functions
- User defined aggregates
- The small print
• 3.0
- New storage engine
- A new way to denormalise/duplicate

Page 6: Cassandra London - 2.2 and 3.0

@chbatey

So who’s taken some data out of C* and serialised it as JSON?

Page 7: Cassandra London - 2.2 and 3.0

@chbatey

Hello JSON

CREATE TABLE user (
  username text PRIMARY KEY,
  first_name text,
  last_name text,
  emails set<text>,
  country text);

INSERT INTO user JSON '{"username": "chbatey", "first_name": "Christopher", "last_name": "Batey", "emails": ["[email protected]"]}';

Page 8: Cassandra London - 2.2 and 3.0

@chbatey

Goodbye JSON

Page 9: Cassandra London - 2.2 and 3.0

@chbatey

JSON + User Defined Types

CREATE TYPE movie (
  title text,
  time timestamp,
  description text);

ALTER TABLE user ADD movies set<frozen<movie>>;

UPDATE user SET movies = {{ title: 'Batman', time: '2011-02-03T04:05:00+0000', description: 'This film rocks' }} WHERE username = 'chbatey';

Page 10: Cassandra London - 2.2 and 3.0

@chbatey

Out it comes

Page 11: Cassandra London - 2.2 and 3.0

@chbatey

Cassandra HTTP Wrapper?

@RequestMapping(method = {RequestMethod.POST}, value = "/{keyspace}/{table}", consumes = "application/json")
public ResponseEntity<String> store(@PathVariable String keyspace,
                                    @PathVariable String table,
                                    @RequestBody String body) {
    session.execute(String.format("insert into %s.%s JSON '%s'", keyspace, table, body));
    return ResponseEntity.ok("OK");
}

Keyspace Table

Raw JSON

curl --header "Content-Type: application/json" -X POST -v "localhost:8080/twotwo/user" --data '{"username": "trev2", "country": null, "emails": ["[email protected]", "[email protected]"], "first_name": "trevor", "last_name": "bunting", "movies": null}'
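The handler above splices the path variables and the request body straight into a CQL string, so a body containing a single quote would break (or change) the statement. One possible guard is sketched below. `JsonInsertBuilder` is a hypothetical helper, not part of the talk or of any driver, and the identifier rule it enforces is an assumption made for the sketch.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from the talk): builds the INSERT ... JSON
// statement used by the wrapper while guarding against malformed input.
class JsonInsertBuilder {
    // Assumption: only unquoted CQL identifiers are accepted
    // (a letter followed by letters, digits or underscores).
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    static String build(String keyspace, String table, String json) {
        if (!IDENTIFIER.matcher(keyspace).matches() || !IDENTIFIER.matcher(table).matches()) {
            throw new IllegalArgumentException("Invalid keyspace or table name");
        }
        // CQL escapes a single quote inside a string literal by doubling it.
        String escaped = json.replace("'", "''");
        return String.format("INSERT INTO %s.%s JSON '%s'", keyspace, table, escaped);
    }

    public static void main(String[] args) {
        System.out.println(build("twotwo", "user", "{\"username\": \"o'neil\"}"));
    }
}
```

Where the driver in use supports it, binding the JSON document as a statement parameter is probably the cleaner option, since it avoids hand-written escaping altogether.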

Page 12: Cassandra London - 2.2 and 3.0

@chbatey

User defined functions
• Run code on the server !Dangerous!
• Java + JavaScript supported out of the box
• javax.script implementations should work

Page 13: Cassandra London - 2.2 and 3.0

@chbatey

UDF example

CREATE TABLE user (
  username text PRIMARY KEY,
  first_name text,
  last_name text,
  emails set<text>,
  country text);

Page 14: Cassandra London - 2.2 and 3.0

@chbatey

Concat

CREATE FUNCTION name ( first_name text, last_name text )
  CALLED ON NULL INPUT
  RETURNS text
  LANGUAGE java
  AS ' return first_name + " " + last_name; ';

cqlsh:twotwo> select name(first_name, last_name) FROM user;

 twotwo.name(first_name, last_name)
------------------------------------
 Christopher Batey
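Because the function is declared CALLED ON NULL INPUT, its body also runs for rows where a name column is null. The plain-Java sketch below shows what the same expression does in that case. The class and the `nameOrNull` variant are hypothetical and only illustrate the behaviour.

```java
// Plain-Java model of the UDF body; names here are illustrative only.
class NameFunctionSketch {
    // Same expression as the UDF body: a null argument is rendered as "null".
    static String name(String firstName, String lastName) {
        return firstName + " " + lastName;
    }

    // Hypothetical null-safe variant: returns null if either part is missing.
    static String nameOrNull(String firstName, String lastName) {
        if (firstName == null || lastName == null) {
            return null;
        }
        return firstName + " " + lastName;
    }

    public static void main(String[] args) {
        System.out.println(name("Christopher", "Batey"));
        System.out.println(name(null, "Batey"));
        System.out.println(nameOrNull(null, "Batey"));
    }
}
```

CQL also accepts RETURNS NULL ON NULL INPUT in place of CALLED ON NULL INPUT, which skips the body and yields null whenever an argument is null.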

Page 15: Cassandra London - 2.2 and 3.0

@chbatey

User defined aggregates

CREATE AGGREGATE average ( int )
  SFUNC averageState
  STYPE tuple<int,bigint>
  FINALFUNC averageFinal
  INITCOND (0, 0);

Called for every row

State passed between

Initial state

Return type (CQL)

Optional function called on final state

Page 16: Cassandra London - 2.2 and 3.0

@chbatey

State function

CREATE FUNCTION averageState ( state tuple<int,bigint>, val int )
  CALLED ON NULL INPUT
  RETURNS tuple<int,bigint>
  LANGUAGE java
  AS '
    if (val != null) {
      state.setInt(0, state.getInt(0)+1);
      state.setLong(1, state.getLong(1)+val.intValue());
    }
    return state;
  ';

Type Columns

Page 17: Cassandra London - 2.2 and 3.0

@chbatey

Final function

CREATE FUNCTION averageFinal ( state tuple<int,bigint> )
  CALLED ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS '
    if (state.getInt(0) == 0) return null;
    double r = (double) state.getLong(1) / state.getInt(0);
    return Double.valueOf(r);
  ';

State type

Overall return type
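To see how the three pieces fit together, here is a plain-Java model of the evaluation: start from INITCOND, call the state function once per row, then hand the final state to the final function. It is a sketch of the semantics only, not Cassandra's implementation, and the class and `State` type are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the average aggregate; names are illustrative only.
class AverageAggregateSketch {
    // Mirrors STYPE tuple<int,bigint>: a row count and a running sum.
    static final class State {
        int count;
        long sum;
    }

    // SFUNC: called once per row; null values are skipped.
    static State averageState(State state, Integer val) {
        if (val != null) {
            state.count += 1;
            state.sum += val;
        }
        return state;
    }

    // FINALFUNC: turns the final state into the result; null when nothing was counted.
    static Double averageFinal(State state) {
        if (state.count == 0) {
            return null;
        }
        return (double) state.sum / state.count;
    }

    // INITCOND (0, 0), then a fold over the selected values.
    static Double average(List<Integer> values) {
        State state = new State();
        for (Integer v : values) {
            state = averageState(state, v);
        }
        return averageFinal(state);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2, null, 4)));
    }
}
```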

Page 18: Cassandra London - 2.2 and 3.0

@chbatey

Putting it all together

Page 19: Cassandra London - 2.2 and 3.0

@chbatey

Customer events

CREATE FUNCTION countEventTypes( state map<text, int>, type text )
  CALLED ON NULL INPUT
  RETURNS map<text, int>
  LANGUAGE java
  AS '
    Integer count = (Integer) state.get(type);
    if (count == null) count = 1;
    else count = count + 1;
    state.put(type, count);
    return state;
  ';

CREATE AGGREGATE count_by_type(text)
  SFUNC countEventTypes
  STYPE map<text, int>
  INITCOND {};
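The same fold can be modelled in plain Java to show what count_by_type produces for a handful of rows. This is a sketch of the semantics only. The class name and the event type values ("BUY", "VIEW") are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of count_by_type; names and sample values are illustrative.
class CountByTypeSketch {
    // SFUNC: bump the counter for this event type.
    static Map<String, Integer> countEventTypes(Map<String, Integer> state, String type) {
        Integer count = state.get(type);
        state.put(type, count == null ? 1 : count + 1);
        return state;
    }

    // INITCOND {} then one SFUNC call per row; with no FINALFUNC the state is the result.
    static Map<String, Integer> countByType(List<String> eventTypes) {
        Map<String, Integer> state = new HashMap<>();
        for (String type : eventTypes) {
            state = countEventTypes(state, type);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(countByType(Arrays.asList("BUY", "VIEW", "BUY")));
    }
}
```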

Page 20: Cassandra London - 2.2 and 3.0

@chbatey

Customer events

Page 21: Cassandra London - 2.2 and 3.0

@chbatey

Built in aggregates
• count
• max
• min
• avg
• sum

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java

Page 22: Cassandra London - 2.2 and 3.0

@chbatey

Built in time functions

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java

Page 23: Cassandra London - 2.2 and 3.0

@chbatey

Built in aggregates in action

Page 24: Cassandra London - 2.2 and 3.0

@chbatey

“Materialised views” with Spark

Page 25: Cassandra London - 2.2 and 3.0

@chbatey

Pure C*

Page 26: Cassandra London - 2.2 and 3.0

@chbatey

Small print
• Compressed commit log
• Resumable bootstrapping
• Stop individual compactions
• New types
- smallint - short
- tinyint - byte
- date - time
• Warnings now sent back to client
- batch too large

Page 27: Cassandra London - 2.2 and 3.0

@chbatey

Time

Page 28: Cassandra London - 2.2 and 3.0

@chbatey

Customer events table

CREATE TABLE IF NOT EXISTS customer_events (
  customer_id text,
  staff_id text,
  store_type text,
  time timeuuid,
  event_type text,
  PRIMARY KEY (customer_id, time));

CREATE INDEX ON customer_events (staff_id);

Page 29: Cassandra London - 2.2 and 3.0

@chbatey

Indexes to the rescue?

customer_id  time                 staff_id
chbatey      2015-03-03 08:52:45  trevor
chbatey      2015-03-03 08:52:54  trevor
chbatey      2015-03-03 08:53:11  bill
chbatey      2015-03-03 08:53:18  bill
rusty        2015-03-03 08:56:57  bill
rusty        2015-03-03 08:57:02  bill
rusty        2015-03-03 08:57:20  trevor

staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey
bill      rusty
bill      rusty
trevor    rusty

Page 30: Cassandra London - 2.2 and 3.0

@chbatey

Indexes to the rescue?

A (chbatey)

staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey

B (rusty)

staff_id  customer_id
bill      rusty
bill      rusty
trevor    rusty

customer_events table

customer_id  time                 staff_id
chbatey      2015-03-03 08:52:45  trevor
chbatey      2015-03-03 08:52:54  trevor
chbatey      2015-03-03 08:53:11  bill
chbatey      2015-03-03 08:53:18  bill
rusty        2015-03-03 08:56:57  bill
rusty        2015-03-03 08:57:02  bill
rusty        2015-03-03 08:57:20  trevor

staff_id index

staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey
bill      rusty
bill      rusty
trevor    rusty
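The split shown here reflects the fact that each node indexes only the rows it stores, so a lookup by staff_id alone cannot be routed to a single node. The plain-Java sketch below models that fan-out. The `Node` class and the method names are hypothetical, and real clusters add replication and paging that are ignored here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of per-node secondary indexes; all names are illustrative.
class LocalIndexSketch {
    // One node: keeps an index over only the rows it stores itself.
    static final class Node {
        final Map<String, List<String>> staffIndex = new HashMap<>(); // staff_id -> customer_ids

        void store(String customerId, String staffId) {
            staffIndex.computeIfAbsent(staffId, k -> new ArrayList<>()).add(customerId);
        }
    }

    // A query by staff_id carries no partition key, so every node is consulted.
    static List<String> customersServedBy(List<Node> nodes, String staffId) {
        List<String> result = new ArrayList<>();
        for (Node node : nodes) {
            result.addAll(node.staffIndex.getOrDefault(staffId, new ArrayList<>()));
        }
        return result;
    }

    public static void main(String[] args) {
        Node a = new Node();
        a.store("chbatey", "trevor");
        a.store("chbatey", "bill");
        Node b = new Node();
        b.store("rusty", "bill");
        List<Node> nodes = new ArrayList<>();
        nodes.add(a);
        nodes.add(b);
        System.out.println(customersServedBy(nodes, "bill"));
    }
}
```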

Page 31: Cassandra London - 2.2 and 3.0

@chbatey

Do it yourself index

CREATE TABLE IF NOT EXISTS customer_events (
  customer_id text,
  staff_id text,
  store_type text,
  time timeuuid,
  event_type text,
  PRIMARY KEY (customer_id, time));

CREATE TABLE IF NOT EXISTS customer_events_by_staff (
  customer_id text,
  staff_id text,
  store_type text,
  time timeuuid,
  event_type text,
  PRIMARY KEY (staff_id, time));

Page 32: Cassandra London - 2.2 and 3.0

@chbatey

1.2 Logged batches

[Diagram: client, C, BATCH LOG and two BL-R boxes]

BL-R: Batch log replica

Page 33: Cassandra London - 2.2 and 3.0

@chbatey

Pattern
• Write only:
- Duplicate with a different primary key
- (Optional) Logged batch for eventual consistency
• Full updates:
- No real difference
• Partial updates:
- No staff id in update?
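The write-only case can be modelled in a few lines of plain Java: the two maps stand in for customer_events and customer_events_by_staff, and every insert goes to both. This is an in-memory sketch with made-up class names and event values, not driver code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the duplicated tables; all names are illustrative.
class DuplicatedWriteSketch {
    static final class Event {
        final String customerId;
        final String staffId;
        final String eventType;

        Event(String customerId, String staffId, String eventType) {
            this.customerId = customerId;
            this.staffId = staffId;
            this.eventType = eventType;
        }
    }

    // Stands in for customer_events: partitioned by customer_id.
    final Map<String, List<Event>> byCustomer = new HashMap<>();
    // Stands in for customer_events_by_staff: same rows, partitioned by staff_id.
    final Map<String, List<Event>> byStaff = new HashMap<>();

    // Write-only case: the application writes each event to both tables.
    void insert(Event e) {
        byCustomer.computeIfAbsent(e.customerId, k -> new ArrayList<>()).add(e);
        byStaff.computeIfAbsent(e.staffId, k -> new ArrayList<>()).add(e);
    }

    public static void main(String[] args) {
        DuplicatedWriteSketch tables = new DuplicatedWriteSketch();
        tables.insert(new Event("chbatey", "trevor", "BUY"));
        tables.insert(new Event("chbatey", "bill", "VIEW"));
        tables.insert(new Event("rusty", "bill", "BUY"));
        System.out.println(tables.byStaff.get("bill").size());
    }
}
```

The partial update problem shows up directly in this model: an update that carries only the customer id and time has no key to look up in byStaff, so the application would need to read the staff id first.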

Page 34: Cassandra London - 2.2 and 3.0

@chbatey

Page 35: Cassandra London - 2.2 and 3.0

@chbatey

KillrWeather data model

Page 36: Cassandra London - 2.2 and 3.0

@chbatey

KillrWeather data model

Page 37: Cassandra London - 2.2 and 3.0

@chbatey

KillrWeather data model

Page 38: Cassandra London - 2.2 and 3.0

@chbatey

KillrWeather data model

INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip ) values ('station1', 2012, 12, 25, 1, 'GB', 'Cumbria', 14.0, 20) ;

INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip ) values ('station2', 2012, 12, 25, 1, 'GB', 'Cumbria', 4.0, 2) ;

INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature, one_hour_precip ) values ('station3', 2012, 12, 25, 1, 'GB', 'Greater London', 16.0, 10) ;

Page 39: Cassandra London - 2.2 and 3.0

@chbatey

Querying by state?

Page 40: Cassandra London - 2.2 and 3.0

@chbatey

Combining aggregates + MVs

Page 41: Cassandra London - 2.2 and 3.0

@chbatey

Including the month

Page 42: Cassandra London - 2.2 and 3.0

@chbatey

Fine print - all subject to change
• Primary key columns + one other in your MV primary key
• Unused primary key columns are added to the end of your MV PK
• If part of your MV primary key is NULL then the row won't appear in the materialised view
• This is not free!
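The first two rules can be expressed as a small function: the MV key may use base primary key columns plus at most one other column, and any base key columns left out are appended at the end. The sketch below is only a model of those rules as stated on the slide. The class name is made up, and the base key (wsid, year, month, day, hour) used in the example is an assumption about raw_weather_data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Model of the MV primary key rules from the slide; names are illustrative.
class MvKeySketch {
    // Returns the chosen columns first, then any base primary key columns
    // that were not used, in their original order.
    static List<String> mvPrimaryKey(List<String> basePk, List<String> chosen) {
        long extra = chosen.stream().filter(c -> !basePk.contains(c)).count();
        if (extra > 1) {
            throw new IllegalArgumentException("At most one non-primary-key column is allowed");
        }
        List<String> result = new ArrayList<>(chosen);
        for (String column : basePk) {
            if (!result.contains(column)) {
                result.add(column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed base key for raw_weather_data; not shown in the transcript.
        List<String> basePk = Arrays.asList("wsid", "year", "month", "day", "hour");
        System.out.println(mvPrimaryKey(basePk, Arrays.asList("state_code")));
    }
}
```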

Page 43: Cassandra London - 2.2 and 3.0

@chbatey

Conclusions
• We still denormalise and duplicate to achieve scalability and performance
• We just let C* do it for us :)

Page 44: Cassandra London - 2.2 and 3.0

@chbatey

• Robert Stupp (Contentteam AG) - UDA/Fs
• Carl Yeksigian (DataStax) - Materialised views
• Jason Brown - Gossip
• Christos Kalantzis (Netflix) - Chaos Monkey & Cassandra