Coursera, Cassandra, Java Drivers
Biography
Daniel Chia @DanielJHChia
Software Engineer, Infrastructure Team
1 Introduction
2 Why We Chose Cassandra
3 Example Use Cases
4 Pain Points
5 Java Drivers
Coursera
Web iOS Android
Why Cassandra
Coursera Tech Stack
• 100% AWS
• MySQL + Cassandra
• Service-oriented
Consistently Fast Latencies
Availability
Scalability
Use Case #1
• Resume video where you left off
• High write volume
• TTL data
CREATE TABLE video_progress_kvs_basic (
  user_id int,
  course_id varchar,
  video_id varchar,
  viewed_up_to bigint,
  updated_at bigint,
  PRIMARY KEY ((user_id, course_id, video_id))
);
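The access pattern behind this table (overwrite the position on every update, read it back by the full key, let old rows expire) can be sketched in plain Java. This is an in-memory stand-in for illustration, not driver code: the class name, the method names, and the 30-day TTL are assumptions that do not appear in the slides.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// In-memory stand-in for the video_progress_kvs_basic table (illustration only).
class VideoProgressSketch {
    // Hypothetical retention period; the slides do not state the real TTL.
    static final long TTL_MILLIS = 30L * 24 * 60 * 60 * 1000;

    // Value stored per key: the playback position and the expiry time.
    private static final class Entry {
        final long viewedUpTo;
        final long expiresAt;

        Entry(long viewedUpTo, long expiresAt) {
            this.viewedUpTo = viewedUpTo;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<List<Object>, Entry> rows = new HashMap<>();

    // Mirrors the composite partition key (user_id, course_id, video_id).
    private static List<Object> key(int userId, String courseId, String videoId) {
        return Arrays.<Object>asList(userId, courseId, videoId);
    }

    // Upsert: a later write for the same key replaces the earlier one,
    // roughly what an INSERT ... USING TTL does in CQL.
    void save(int userId, String courseId, String videoId, long viewedUpTo, long nowMillis) {
        rows.put(key(userId, courseId, videoId), new Entry(viewedUpTo, nowMillis + TTL_MILLIS));
    }

    // Point read by the full key; an expired row behaves as if it was never written.
    OptionalLong resumePosition(int userId, String courseId, String videoId, long nowMillis) {
        Entry entry = rows.get(key(userId, courseId, videoId));
        if (entry == null || nowMillis >= entry.expiresAt) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(entry.viewedUpTo);
    }
}
```

Every read and write names the whole key, so no scan or sort is ever needed, which is the shape of workload the table definition is built for.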
Use Case #2: Media Asset Service
Use Case #3: Video Workflows
Input.mp4
Step 1: Audio
Step 2: Low Res Video
Step 3: High Res Video
Assembly 1: Crash
Assembly 2: Ok
Assembly 3: Crash
Assembly 4: Ok
Assembly 5: Ok
CREATE TABLE transloadit_workflow (
  workflow_id text,
  step_id text,
  assembly_id text,
  step_details text,
  step_payload map<text, text>,
  step_status text,
  PRIMARY KEY (workflow_id, step_id, assembly_id)
);
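With PRIMARY KEY (workflow_id, step_id, assembly_id), workflow_id is the partition key and step_id and assembly_id are clustering columns, so all attempts for one workflow live together, ordered by step and then by assembly. A rough way to picture that layout is a map of sorted maps. The sketch below is an in-memory analogy only; the class, the method names, and the "ok" / "crash" status strings are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy for the transloadit_workflow table layout (illustration only).
class WorkflowTableSketch {
    // workflow_id (partition) -> step_id (sorted) -> assembly_id (sorted) -> step_status
    private final Map<String, TreeMap<String, TreeMap<String, String>>> partitions = new HashMap<>();

    // One row per (workflow, step, assembly); re-recording the same triple overwrites it.
    void recordAttempt(String workflowId, String stepId, String assemblyId, String status) {
        partitions.computeIfAbsent(workflowId, k -> new TreeMap<>())
                  .computeIfAbsent(stepId, k -> new TreeMap<>())
                  .put(assemblyId, status);
    }

    // Comparable to: SELECT ... WHERE workflow_id = ? AND step_id = ?
    // Results come back in assembly_id order.
    List<String> attempts(String workflowId, String stepId) {
        List<String> out = new ArrayList<>();
        TreeMap<String, TreeMap<String, String>> steps = partitions.get(workflowId);
        if (steps == null || !steps.containsKey(stepId)) {
            return out;
        }
        for (Map.Entry<String, String> e : steps.get(stepId).entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    // True if any attempt for the step finished with the (hypothetical) "ok" status.
    boolean stepSucceeded(String workflowId, String stepId) {
        TreeMap<String, TreeMap<String, String>> steps = partitions.get(workflowId);
        return steps != null
            && steps.containsKey(stepId)
            && steps.get(stepId).containsValue("ok");
    }
}
```

Retrying a crashed step simply adds another row under the same workflow and step, so the history of attempts is kept without any read-before-write.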
Looking Back
Cassandra - Initial Pain Points
• Can’t execute arbitrary queries
  • Filtering, sorting, etc.
• Can’t be abused as an OLAP database
• Worries about ‘eventual’ consistency
Gotchas
• Lots of truly ad-hoc queries are hard
  • Don’t use C* directly to explore your data. (Spark?)
• Sorting and filtering can be hard
  • Consider Solr / ElasticSearch
  • Or even MySQL, depending on load / importance
Helpful Things
• Data modeling consulting
• Monitoring
• Data access layer for common use cases
Java Drivers
Best Practices
• Driver Choice
• Cluster / Connection Setup
• Executing Queries
DataStax Java Drivers
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class Scratch {
    static Cluster cluster;

    public static void main(String args[]) {
        cluster = Cluster.builder()
            .addContactPoint("cassandra")
            .build();

        readRow("asset:QoMqLLyCEeSOi3paAormVw");

        cluster.close();
    }

    static void readRow(String id) {
        Session session = cluster.connect("asset");

        ResultSet result = session.execute(
            "SELECT * from asset_kvs_timestamp where part_key = ?", id);

        System.out.println(result.one());
        session.close();
    }
}
cluster = Cluster.builder()
    .addContactPoint("cassandra")
    .build();
LoadBalancingPolicy policy = new TokenAwarePolicy(
    new DCAwareRoundRobinPolicy());

cluster = Cluster.builder()
    .addContactPoint("cassandra")
    .withLoadBalancingPolicy(policy)
    .build();
cluster = Cluster.builder()
    .addContactPoint("cassandra")
    .withLoadBalancingPolicy(policy)
    .withRetryPolicy(retryPolicy)
    .build();
Default Retry Policy
• Retries read if enough replicas alive, but data fetch failed.
• Retries write only for batched writes.
• Retries next host on Unavailable. 2.0.11+ or 2.1.7 (JAVA-709)
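The three rules above can be written down as a small decision table. The sketch below is a simplified model in plain Java, not the driver's own RetryPolicy classes: the enum, the method names, and the parameters are hypothetical. It also assumes that each rule fires at most once per query and that "batched writes" refers to the batch log write; both points are my reading of the driver's default behavior and are not stated on the slide.

```java
// Simplified model of the default retry rules (illustration only, not the driver API).
class RetryDecisionSketch {
    enum Decision { RETRY_SAME_HOST, TRY_NEXT_HOST, RETHROW }

    // Read timeout: retry only if enough replicas answered but the data itself was not fetched.
    static Decision onReadTimeout(int requiredResponses, int receivedResponses,
                                  boolean dataRetrieved, int retriesSoFar) {
        if (retriesSoFar != 0) {
            return Decision.RETHROW; // assumption: at most one retry per query
        }
        boolean enoughReplicas = receivedResponses >= requiredResponses;
        return (enoughReplicas && !dataRetrieved) ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    // Write timeout: retry only for the batch log write, which is safe to repeat.
    static Decision onWriteTimeout(boolean batchLogWrite, int retriesSoFar) {
        if (retriesSoFar != 0) {
            return Decision.RETHROW;
        }
        return batchLogWrite ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    // Unavailable: the coordinator may be the one cut off, so try another host once.
    static Decision onUnavailable(int retriesSoFar) {
        return retriesSoFar == 0 ? Decision.TRY_NEXT_HOST : Decision.RETHROW;
    }
}
```

The conservative stance on writes matters because a timed-out write may still have been applied, so blindly repeating a non-idempotent write is risky.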
Share Session!
public static void main(String args[]) {
    cluster = Cluster.builder()
        .addContactPoint("cassandra").build();

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    readRow("asset:7i2ClbKnEeSk_npaAormVw");
    readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

    cluster.close();
}

// Anti-pattern: opens and closes a new Session on every call.
static void readRow(String id) {
    Session session = cluster.connect("asset");

    ResultSet result = session.execute(
        "SELECT * from asset_kvs_timestamp where part_key = ?", id);

    System.out.println(result.one());
    session.close();
}
static Session session;  // one Session shared by every query

public static void main(String args[]) {
    cluster = Cluster.builder()
        .addContactPoint("cassandra").build();

    session = cluster.connect();

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    readRow("asset:7i2ClbKnEeSk_npaAormVw");
    readRow("asset:KS1vywpGEeWKtzoMw4q1xg");

    session.close();
    cluster.close();
}

static void readRow(String id) {
    ResultSet result = session.execute(
        "SELECT * from asset.asset_kvs_timestamp where part_key = ?", id);

    System.out.println(result.one());
}
Use prepared statements
• If doing query more than once
• Better performance
• Token aware routing
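Preparing only pays off if each query string is prepared once and the result is reused. One way to enforce that is a small cache keyed by the CQL text. The sketch below is generic plain Java with a hypothetical class name; it takes the "prepare" step as a function so it does not depend on any driver class. With the driver, that function would presumably be something like session::prepare, which is an assumption about how you would wire it in rather than something the slides show.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Prepare-once cache keyed by query text (illustration only).
// P stands in for whatever type the prepare step returns.
class PreparedStatementCacheSketch<P> {
    private final ConcurrentMap<String, P> cache = new ConcurrentHashMap<>();
    private final Function<String, P> preparer;

    PreparedStatementCacheSketch(Function<String, P> preparer) {
        this.preparer = preparer;
    }

    // Returns the cached statement, running the prepare step only on first use of this query text.
    P get(String cql) {
        return cache.computeIfAbsent(cql, preparer);
    }
}
```

Because computeIfAbsent on a ConcurrentHashMap runs the mapping function at most once per key, concurrent callers asking for the same query share a single prepared statement.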
static PreparedStatement statement;

public static void main(String args[]) {
    …
    session = cluster.connect();
    // Prepare once, then bind and execute as often as needed.
    statement = session.prepare(
        "SELECT * from asset.asset_kvs_timestamp where part_key = ?");

    readRow("asset:QoMqLLyCEeSOi3paAormVw");
    …
}

static void readRow(String id) {
    BoundStatement bound = statement.bind().setString("part_key", id);
    ResultSet result = session.execute(bound);

    System.out.println(result.one());
}
There Be Dragons.. JAVA-420
statement = session.prepare(
    "SELECT part_key, time_key, content from asset.asset_kvs_timestamp where part_key = ?");
Always specify columns explicitly for prepared statements!
Consider Async
static List<String> readRows(List<String> ids) {
    // Synchronous: each query blocks until the previous one has returned.
    return ids.stream().map(id -> {
        BoundStatement bound = statement.bind().setString("part_key", id);
        ResultSet result = session.execute(bound);
        return result.one().getString("c_enc");
    }).collect(Collectors.toList());
}
Async..
static ListenableFuture<List<String>> readRowsAsync(List<String> ids) {
    // Issue every query first; nothing blocks inside the loop.
    List<ListenableFuture<String>> futures = ids.stream().map(id -> {
        BoundStatement bound = statement.bind().setString("part_key", id);
        ResultSetFuture future = session.executeAsync(bound);

        return Futures.transform(future,
            (ResultSet result) -> result.one().getString("c_enc"));
    }).collect(Collectors.toList());

    // Combine the per-row futures into one future of all results.
    return Futures.allAsList(futures);
}
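The same fan-out pattern can be tried without a cluster by swapping the driver's Guava futures for the JDK's CompletableFuture. The sketch below is that substitution, not the driver API: the class name, readAllAsync, and the lookup parameter are hypothetical, and the lookup function stands in for session.executeAsync plus the row mapping.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// JDK-only analogue of the "start all queries, then gather" pattern (illustration only).
class AsyncFanOutSketch {
    static <K, V> CompletableFuture<List<V>> readAllAsync(
            List<K> ids, Function<K, CompletableFuture<V>> lookup) {
        // Start every lookup before waiting on any of them.
        List<CompletableFuture<V>> futures =
            ids.stream().map(lookup).collect(Collectors.toList());

        // allOf completes once every lookup has completed, and fails if any of them failed.
        return CompletableFuture
            .allOf(futures.toArray(new CompletableFuture<?>[0]))
            // join() does not block here because each future is already complete.
            .thenApply(done -> futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }
}
```

Results come back in the same order as the input ids, regardless of which lookup finishes first, which matches what Futures.allAsList provides in the Guava version.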
http://www.datastax.com/dev/blog/java-driver-async-queries
Thank you
Cassandra Summit 2016 September 7-9 San Jose, CA
Get 15% Off with Code: MeetupPromo Cassandrasummit.org