Building a Cassandra Based Application From 0 to Deploy Patrick McFadin Solution Architect at DataStax Wednesday, November 7, 12

Cassandra data modeling talk

Jan 15, 2015



Patrick McFadin

This is similar to the talk I did for the Cassandra Summit, but all examples are in CQL 3.
Transcript
Page 1

Building a Cassandra Based Application

From 0 to Deploy

Patrick McFadin, Solution Architect at DataStax

Wednesday, November 7, 12

Page 2

Me

• Solution Architect at DataStax, THE Cassandra company

• Cassandra user since 0.7

• Follow me here: @PatrickMcFadin

Page 3

Goals

• Take a new application concept

• What is the data model?

• Express that in CQL 3

• Some sample code

Page 4

The Plan

• Conceptualize a new application

• Identify the entity tables

• Identify query tables

• Code. Rinse. Repeat.

• Deploy

Page 5

Start with a concept

• Video sharing website

www.killrvideos.com

[Wireframe of a video page: Video Title, the video (a cat drawing, "Meow"), Recommended, Ads by Google, Description, Comments, Username, Rating, Tags (Foo, Bar), and an "Upload New!" link. *Cat drawing by goodrob13 on Flickr]

Page 6

Break down the features

• Post a video

• View a video

• Add a comment

• Rate a video

• Tag a video

Page 7

Create Entity Tables

Basic storage unit

Page 8

Users

CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email varchar,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);

[Diagram: one row per Username, with columns password, firstname, lastname, created_date, email]

• Similar to an RDBMS table, with fairly fixed columns

• Username is unique (it is the primary key)

• Use secondary indexes on firstname and lastname for lookup

• Adding columns with Cassandra is super easy
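Because username is the entire primary key, there is exactly one row per username. A Cassandra write is an upsert, though: inserting a username that already exists does not fail, it overwrites the columns it names. The sketch below is a minimal in-memory model of that behaviour; the class and method names (UsersTableSketch, insertIfAbsent) are hypothetical, and the check-then-insert in insertIfAbsent only holds within a single process, not against concurrent writers in a real cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the users table: one row per username,
// because username is the entire PRIMARY KEY. Models write semantics only.
final class UsersTableSketch {
    // username -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Like a CQL INSERT: an existing username is not rejected; the named
    // columns are overwritten and any other columns are left as they were.
    void insert(String username, Map<String, String> columns) {
        rows.computeIfAbsent(username, k -> new HashMap<>()).putAll(columns);
    }

    // Application-side uniqueness check before creating an account.
    // Single-process only: not safe against concurrent writers in a cluster.
    boolean insertIfAbsent(String username, Map<String, String> columns) {
        if (rows.containsKey(username)) {
            return false;
        }
        insert(username, columns);
        return true;
    }

    Map<String, String> get(String username) {
        return rows.get(username);
    }

    int size() {
        return rows.size();
    }
}
```

Two inserts for "pmcfadin" leave a single row holding the columns from both writes, which is why "username is unique" has to be enforced by the application when accounts are created.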

Page 9

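Users: The insert code

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// Assumes a field: static final StringSerializer stringSerializer = StringSerializer.get();
static void setUser(User user, Keyspace keyspace) {
    // Create a mutator that allows you to talk to Cassandra
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    try {
        // Use the mutator to queue inserts into our table, keyed by username
        mutator.addInsertion(user.getUsername(), "users",
                HFactory.createStringColumn("firstname", user.getFirstname()));
        mutator.addInsertion(user.getUsername(), "users",
                HFactory.createStringColumn("lastname", user.getLastname()));
        mutator.addInsertion(user.getUsername(), "users",
                HFactory.createStringColumn("password", user.getPassword()));

        // Once the mutator is ready, send all queued insertions to Cassandra
        mutator.execute();
    } catch (HectorException he) {
        he.printStackTrace();
    }
}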
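Each addInsertion call above queues one column for the row keyed by username, and execute() sends them together. In CQL 3 the same write is a single INSERT that names those columns. The hypothetical helper below only builds that statement text with ? bind markers; preparing and binding it would be the job of a CQL driver, which is not shown and is not part of the original code.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds the CQL 3 INSERT equivalent to the Hector mutator
// (one addInsertion per column, all for the same row key).
final class CqlInsertSketch {
    static String insertStatement(String table, List<String> columns) {
        // One bind marker per column, e.g. "?, ?, ?, ?"
        String markers = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + markers + ")";
    }

    public static void main(String[] args) {
        System.out.println(insertStatement("users",
                List.of("username", "firstname", "lastname", "password")));
    }
}
```

The username appears as a regular column in the INSERT because, in CQL 3, the row key is just the primary key column.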

Page 10

Videos (one-to-many)

CREATE TABLE videos (
    videoid uuid,
    videoname varchar,
    username varchar,
    description varchar,
    tags varchar,
    upload_date timestamp,
    PRIMARY KEY (videoid, videoname)
);

[Diagram: one row per VideoId<UUID>, with columns tags, videoname, username, upload_date, description]

• Use a UUID as a row key for uniqueness

• Allows for same video names

• Tags should be stored in some sort of delimited format

• Index on username may not be the best plan
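Two of the rules above can be sketched with the Java standard library alone: UUID.randomUUID() gives a unique row key even when two videos share a name, and the tags column holds one comma-delimited string, matching the split(",") used when a video is read back. The class name VideoTagsSketch is hypothetical, and the sketch assumes tags never contain a comma.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical helpers for the videos table.
final class VideoTagsSketch {
    // Random (version 4) UUID: distinct keys even for identical video names.
    static UUID newVideoId() {
        return UUID.randomUUID();
    }

    // Store tags as one delimited varchar. Assumes no tag contains a comma.
    static String encodeTags(List<String> tags) {
        return String.join(",", tags);
    }

    // Read tags back; a missing or empty column yields an empty list.
    static List<String> decodeTags(String stored) {
        if (stored == null || stored.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(stored.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }
}
```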

Page 11

Videos: The get code

import java.util.UUID;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

// Assumes fields uuidSerializer (UUIDSerializer.get()) and
// stringSerializer (StringSerializer.get()) are defined elsewhere
static Video getVideoByUUID(UUID videoId, Keyspace keyspace) {
    Video video = new Video();

    // Create a slice query. We'll be getting specific column names
    SliceQuery<UUID, String, String> sliceQuery = HFactory.createSliceQuery(
            keyspace, uuidSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("videos");
    sliceQuery.setKey(videoId);
    sliceQuery.setColumnNames("videoname", "username", "description", "tags");

    // Execute the query and get the list of columns
    ColumnSlice<String, String> result = sliceQuery.execute().get();

    // Get each column by name and add them to our video object
    video.setVideoName(result.getColumnByName("videoname").getValue());
    video.setUsername(result.getColumnByName("username").getValue());
    video.setDescription(result.getColumnByName("description").getValue());
    video.setTags(result.getColumnByName("tags").getValue().split(","));

    return video;
}

Page 12

Comments (many-to-many)

CREATE TABLE comments (
    videoid uuid,
    username varchar,
    comment_ts timestamp,
    comment varchar,
    PRIMARY KEY (videoid, username, comment_ts)
);

[Diagram: one row per VideoId<UUID>, with columns username, comment_ts, comment]

• Videos have many comments

• Comments have many users

• Order is by username, then comment_ts (the clustering columns), not insertion order

• Use getSlice() to pull some or all of the comments

Page 13

Comments... pt 2

• This is what’s really going on

• VideoID is the key

• The composite of username and comment_ts is the column name

• One column per comment

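[Diagram: a wide row keyed by VideoId<UUID>. Each column is named username:comment_ts and holds one comment (username:comment_ts → comment, username:comment_ts → comment, ...); columns are kept sorted by that composite name, i.e. by username and then by time.]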
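Because the clustering columns are declared as (username, comment_ts), the cells in one VideoId row are sorted by username first and by comment_ts only within each user, not in insertion order. The sketch below models one such wide row with a TreeMap whose comparator follows that declaration order; the class and record names are hypothetical, and records need Java 16 or later. A strictly time-ordered list across all users would need comment_ts to lead the clustering key, which is a variation the slide does not show.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one comments partition (a single VideoId).
final class CommentsRowSketch {
    // Composite column name: username:comment_ts
    record ClusteringKey(String username, long commentTs) {}

    // Sort like the clustering columns: username first, then comment_ts.
    private static final Comparator<ClusteringKey> ORDER =
            Comparator.comparing(ClusteringKey::username)
                      .thenComparingLong(ClusteringKey::commentTs);

    private final TreeMap<ClusteringKey, String> cells = new TreeMap<>(ORDER);

    void addComment(String username, long commentTs, String comment) {
        cells.put(new ClusteringKey(username, commentTs), comment);
    }

    // A full slice of the row returns comments in stored (sorted) order.
    List<String> allComments() {
        return new ArrayList<>(cells.values());
    }
}
```

Adding a comment from "zoe" and then a later one from "amy" returns amy's comment first, since "amy" sorts before "zoe".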

Page 14

Ratings

CREATE TABLE video_rating (
    videoid uuid,
    rating_count counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);

[Diagram: one row per VideoId<UUID>, with counter columns rating_count and rating_total]

• Use counters so each rating is a single update call

• rating_count is how many ratings were given

• rating_total is the sum of all ratings

• Ex: rating_count = 5, rating_total = 23, avg rating = 23/5 = 4.6
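The average is never stored; it is derived at read time from the two counters. In CQL each rating would be one counter update, for example UPDATE video_rating SET rating_count = rating_count + 1, rating_total = rating_total + ? WHERE videoid = ?. The sketch below is a hypothetical in-memory stand-in that uses AtomicLong in place of Cassandra counter columns and reproduces the slide's 23 / 5 example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for one video_rating row.
// AtomicLong plays the role of a Cassandra counter column.
final class VideoRatingSketch {
    private final AtomicLong ratingCount = new AtomicLong();
    private final AtomicLong ratingTotal = new AtomicLong();

    // One rating increments the count by 1 and the total by the star value.
    void rate(int stars) {
        ratingCount.incrementAndGet();
        ratingTotal.addAndGet(stars);
    }

    // Average = rating_total / rating_count, computed when read.
    double average() {
        long count = ratingCount.get();
        return count == 0 ? 0.0 : (double) ratingTotal.get() / count;
    }

    public static void main(String[] args) {
        VideoRatingSketch r = new VideoRatingSketch();
        for (int stars : new int[] {5, 5, 5, 4, 4}) { // 5 ratings totalling 23
            r.rate(stars);
        }
        System.out.println(r.average());
    }
}
```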

Page 15

Video Events

CREATE TABLE video_event (
    videoid_username varchar,
    event varchar,
    event_timestamp timestamp,
    video_timestamp bigint,
    PRIMARY KEY (videoid_username, event_timestamp, event)
) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);

[Diagram: one row per VideoId:Username, with columns start_<timestamp>, stop_<timestamp>, start_<timestamp>, ..., each holding video_<timestamp>; columns run from latest to oldest]

• Track viewing events

• Combine Video ID and Username for a unique row

• Stop time can be used to pick up where they left off

• Great for usage analytics later

• Reverse comparator! (newest events first)
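The viewing-event pattern can be sketched in memory: the row key joins the video id and username, events are kept newest first (the effect of CLUSTERING ORDER BY (event_timestamp DESC)), and the resume point is the video_timestamp of the most recent stop event. All names below are hypothetical; the sketch keys events by event_timestamp alone rather than by (event_timestamp, event), and the record needs Java 16 or later.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical model of one video_event partition.
final class VideoEventsSketch {
    record Event(String type, long videoTimestamp) {}

    // Row key in the slide's "VideoId:Username" form.
    static String rowKey(String videoId, String username) {
        return videoId + ":" + username;
    }

    // event_timestamp -> event, iterated newest first (DESC clustering order).
    private final TreeMap<Long, Event> events = new TreeMap<>(Comparator.reverseOrder());

    void addEvent(long eventTimestamp, String type, long videoTimestamp) {
        events.put(eventTimestamp, new Event(type, videoTimestamp));
    }

    // Where to resume playback: video_timestamp of the newest "stop", else 0.
    long resumePosition() {
        for (Event e : events.values()) { // newest to oldest
            if (e.type().equals("stop")) {
                return e.videoTimestamp();
            }
        }
        return 0L;
    }
}
```

Because the newest event comes first, finding the last stop only reads from the head of the row instead of scanning the whole viewing history.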

Page 16

Create Query Tables

Indexes to support fast lookups

Page 17

Index table principles

• Lookup by rowkey

• Indexed

• Cached (most times)

[Diagram: a column of row keys, RowKey1 through RowKey12; a lookup goes straight to RowKey5]

Page 18

Index table principles

• Get row by the key

• Slice. Get data in one pass

• Cached (sometimes)

[Diagram: RowKey5 with columns Col1 through Col8; a GetSlice from Col3 to Col6 returns Col3, Col4, Col5 and Col6 in one sequential read]
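A column slice works because the columns within a row are stored sorted by name, so a slice from Col3 to Col6 is one contiguous range read in a single pass. As an in-memory analogy only (the class name is hypothetical), TreeMap.subMap returns the same inclusive range from a sorted map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical analogy for a column slice over one row:
// column name -> value, kept in sorted order by a NavigableMap.
final class ColumnSliceSketch {
    // Inclusive on both ends, like a slice from a start column to a finish column.
    static List<String> slice(NavigableMap<String, String> row, String from, String to) {
        return new ArrayList<>(row.subMap(from, true, to, true).keySet());
    }
}
```

With columns Col1 through Col8 in the row, slice(row, "Col3", "Col6") returns Col3, Col4, Col5 and Col6 without touching the other columns.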

Page 19

Video by Username

CREATE TABLE username_video_index (
    username varchar,
    videoid uuid,
    upload_date timestamp,
    video_name varchar,
    PRIMARY KEY (username, videoid, upload_date)
);

[Diagram: a wide row keyed by Username, with columns VideoId:<timestamp> .. VideoId:<timestamp>]

• Username is unique

• One column for each new video uploaded

• Column slice for a time span, from x to y (this needs upload_date ahead of videoid in the clustering key)

• VideoId is added at the same time a Video record is added
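A "from x to y" time slice is one contiguous range only when upload_date comes before videoid in the clustering key. With PRIMARY KEY (username, videoid, upload_date) as written, a user's columns are sorted by videoid first. The sketch below therefore assumes the ordering PRIMARY KEY (username, upload_date, videoid) and models one user's row in memory; the class and record names are hypothetical, timestamps are plain longs, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical model of one username_video_index partition, ASSUMING the
// clustering order (upload_date, videoid) rather than the slide's (videoid, upload_date).
final class UserVideosSketch {
    record Key(long uploadDate, UUID videoId) {}

    private static final Comparator<Key> ORDER =
            Comparator.comparingLong(Key::uploadDate).thenComparing(Key::videoId);

    private final TreeMap<Key, String> videos = new TreeMap<>(ORDER); // -> video_name

    void add(long uploadDate, UUID videoId, String videoName) {
        videos.put(new Key(uploadDate, videoId), videoName);
    }

    // Names of videos uploaded in [from, to], oldest first: one contiguous slice.
    List<String> uploadedBetween(long from, long to) {
        // Smallest and largest UUIDs under java.util.UUID's (signed) compareTo.
        Key lo = new Key(from, new UUID(Long.MIN_VALUE, Long.MIN_VALUE));
        Key hi = new Key(to, new UUID(Long.MAX_VALUE, Long.MAX_VALUE));
        return new ArrayList<>(videos.subMap(lo, true, hi, true).values());
    }
}
```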

Page 20

Video by Tag

CREATE TABLE tag_index (
    tag varchar,
    videoid uuid,
    timestamp timestamp,
    PRIMARY KEY (tag, videoid)
);

[Diagram: one row per tag, with columns VideoId .. VideoId, each holding a timestamp]

• Tag is unique regardless of video

• Great for “List videos with X tag”

• Tags have to be updated in Video and Tag at the same time

• Index integrity is maintained in app logic
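Since Cassandra does not tie tag_index to the tags column of videos, the application has to update both whenever a video's tags change, including removing index entries for tags that were dropped. The sketch below models that bookkeeping in memory with two maps; all names are hypothetical, and in a real deployment the two updates would be separate writes that can partially fail.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of keeping video tags and tag_index in step
// from application code.
final class TagIndexSketch {
    private final Map<String, Set<String>> videoTags = new HashMap<>(); // videoid -> tags
    private final Map<String, Set<String>> tagIndex = new HashMap<>();  // tag -> videoids

    void setTags(String videoId, Set<String> newTags) {
        Set<String> old = videoTags.getOrDefault(videoId, Set.of());
        // Remove index entries for tags the video no longer has.
        for (String tag : old) {
            if (!newTags.contains(tag)) {
                Set<String> ids = tagIndex.get(tag);
                ids.remove(videoId);
                if (ids.isEmpty()) {
                    tagIndex.remove(tag);
                }
            }
        }
        // Add index entries for every current tag.
        for (String tag : newTags) {
            tagIndex.computeIfAbsent(tag, t -> new HashSet<>()).add(videoId);
        }
        videoTags.put(videoId, new HashSet<>(newTags));
    }

    // "List videos with X tag": a single lookup by the tag key.
    Set<String> videosWithTag(String tag) {
        return tagIndex.getOrDefault(tag, Set.of());
    }
}
```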

Page 21

Deployment

• Replication factor?

• Multi-datacenter?

• Cost?

Page 22

Deployment

• Today != tomorrow

• Scale when needed

• Have expansion plan ready

Page 23

DataStax Enterprise

• Analytics - Hadoop

• Search - Solr

Page 24

Hadoop

• Embedded with Cassandra

• No single point of failure

• Uses native Cassandra data

• Hive, Pig, Mahout

Page 25

Solr

• Embedded with Cassandra

• Fast reverse-index

• Shards Solr by key range

Page 26

OpsCenter

Page 27

Thank you!

Connect with me at @PatrickMcFadin or LinkedIn
