©2014 DataStax Confidential. Do not distribute without consent. @rstml Rustam Aliyev Solution Architect Deep dive into CQL and CQL improvements in Cassandra 2.1 1

Deep dive into CQL

Dec 01, 2014


Technology

Rustam Aliyev

How the Cassandra database represents various CQL types at the storage layer.
Transcript
Page 1: Deep dive into CQL

©2014 DataStax Confidential. Do not distribute without consent.

@rstml

Rustam Aliyev Solution Architect

Deep dive into CQL and CQL improvements in Cassandra 2.1


Page 2: Deep dive into CQL

What is CQL?

* Cassandra Query Language (CQL)

* SQL-like language for communicating with Cassandra

* Simpler than the Thrift API

* An abstraction layer that hides implementation details

This is what we want to understand

Page 3: Deep dive into CQL

Use Case

* Messaging Application

* Group Conversations

* Attachments

Page 4: Deep dive into CQL

Simple CQL Table

CREATE TABLE messages (
    conversation_id uuid,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
);

Page 5: Deep dive into CQL

TimeUUID

* Also known as a Version 1 UUID

* Sortable

Timestamp to Microsecond + UUID = TimeUUID

04d580b0-9412-11e3-baa8-0800200c9a66 = 12 February 2014 13:18:06 GMT

http://www.famkruithof.net/uuid/uuidgen
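As a rough illustration of how a TimeUUID carries its timestamp, the sketch below decodes the example UUID from this slide with Python's standard `uuid` module. The helper name `timeuuid_to_datetime` is an assumption made for this example. The value it prints is in UTC, so the hour may differ from the slide's figure if that was rendered in a local time zone.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals since 1582-10-15.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(value: str) -> datetime:
    """Extract the embedded timestamp from a version 1 UUID (hypothetical helper)."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is the 60-bit timestamp in 100 ns units; convert to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded.isoformat())
```

Because the timestamp occupies the leading fields of the UUID, comparing TimeUUIDs by their embedded time gives chronological order, which is what makes them sortable.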

Page 6: Deep dive into CQL

Primary Key

CREATE TABLE messages (
    conversation_id uuid,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
);

Partition Key: conversation_id. Clustering Column: message_id.

* Also Primary Index

Page 7: Deep dive into CQL

Partition Key conversation_id: 04d580b0-9412-…9a66

Replica

* Determines partition (and replicas)

* Remaining columns are stored on the determined partition

RF=3
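The way a partition key selects its replicas can be sketched roughly as follows. This is a simplified model, not Cassandra's implementation: MD5 stands in for the partitioner's hash, the six-node ring and its node names are hypothetical, and replicas are simply the next RF nodes clockwise on the ring.

```python
import bisect
import hashlib
import uuid

def token(partition_key: bytes) -> int:
    # Stand-in hash (assumption): MD5 of the key bytes, read as a 128-bit integer.
    return int.from_bytes(hashlib.md5(partition_key).digest(), "big")

def replicas(partition_key: bytes, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """ring is a list of (token, node) pairs sorted by token."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]

# Hypothetical six-node ring with evenly spaced tokens.
ring = sorted((i * (2**128 // 6), f"node{i}") for i in range(6))
key = uuid.UUID("04d580b0-9412-3a00-93d1-46196ee79a66").bytes
owners = replicas(key, ring, rf=3)
print(owners)
```

Every row that shares this conversation_id hashes to the same token, so the whole conversation lands on the same three nodes.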

Page 8: Deep dive into CQL

Clustering Column

Merged, Sorted and Stored Sequentially

04d580b0-9412-…9a66

2013-04-03 07:01:00 content: Hi! sender: [email protected]

2013-04-03 07:03:20 content: Hello! sender: tom@example…

2013-04-03 07:04:52 content: Where are you? sender: [email protected]

2013-04-03 07:05:01 content: in Istanbul sender: tom@example…

2013-04-03 07:06:32 content: wow! how come sender: [email protected]

* Data on disk is ordered based on Clustering Column

* Efficient retrieval with range queries (slice)

SELECT * FROM messages
WHERE conversation_id = 04d580b0-9412-…9a66
  AND message_id > minTimeuuid('2013-04-03 07:04:00')
  AND message_id < maxTimeuuid('2013-04-03 07:10:00');
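To see why sorted storage makes this range query cheap, here is a small sketch that models one partition as a sorted list and answers the slice with two binary searches. Plain datetimes stand in for TimeUUIDs, and `slice_between` is a name invented for this example.

```python
import bisect
from datetime import datetime

# One partition: rows kept sorted by clustering key (datetimes stand in for TimeUUIDs).
partition = [
    (datetime(2013, 4, 3, 7, 1, 0), "Hi!"),
    (datetime(2013, 4, 3, 7, 3, 20), "Hello!"),
    (datetime(2013, 4, 3, 7, 4, 52), "Where are you?"),
    (datetime(2013, 4, 3, 7, 5, 1), "in Istanbul"),
    (datetime(2013, 4, 3, 7, 6, 32), "wow! how come"),
]

def slice_between(rows, start, end):
    """Return rows with start < key < end; the result is one contiguous run."""
    keys = [k for k, _ in rows]
    lo = bisect.bisect_right(keys, start)  # first key strictly greater than start
    hi = bisect.bisect_left(keys, end)     # first key not less than end
    return rows[lo:hi]

result = slice_between(partition, datetime(2013, 4, 3, 7, 4), datetime(2013, 4, 3, 7, 10))
for when, content in result:
    print(when, content)
```

The matching rows sit next to each other, so the read is a single sequential scan rather than a scatter of lookups.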

Page 9: Deep dive into CQL

Data on Disk

Partition Key (Row Key)

Column Name 1 Column Value 1

Column Name 2 Column Value 2

Column Name 3 Column Value 3

...

Column Name N Column Value N

Page 10: Deep dive into CQL

Data on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello!

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

...

Clustering Column (message_id) Column Name Column Value

Partition Key (conversation_id)

INSERT INTO messages (conversation_id, message_id, content, sender)
VALUES (04d580b0-9412-3a00-93d1-46196ee79a66, 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f, 'Hello!', '[email protected]');
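The mapping from one CQL row to the cells shown above can be sketched as below. This mirrors the slide's colon-separated notation only; real cell names are binary composites, and both the function name and the sender address are placeholders chosen for this example.

```python
def row_to_cells(clustering_key: str, columns: dict[str, str]) -> list[tuple[str, str]]:
    """Flatten one CQL row into (cell name, cell value) pairs (illustrative helper)."""
    # Row marker: clustering key with an empty column name and an empty value.
    cells = [(f"{clustering_key}:", "")]
    # One cell per regular column, named <clustering key>:<column name>.
    for name in sorted(columns):
        cells.append((f"{clustering_key}:{name}", columns[name]))
    return cells

cells = row_to_cells(
    "2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f",
    {"sender": "user@example.com", "content": "Hello!"},  # placeholder sender
)
for name, value in cells:
    print(name, value)
```

All cells of a row share the clustering key as a prefix, which is why rows of the same partition end up stored side by side in clustering order.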

Page 11: Deep dive into CQL

Order of Clustering Keys

CREATE TABLE messages (
    conversation_id uuid,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

* We only need the most recent N messages

* Storing messages in reverse TimeUUID order will speed up queries
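A minimal sketch of why descending order helps: when the newest message is stored first, "the most recent N" is just the head of the partition. The sample data and the `latest` helper are assumptions for illustration, with datetimes standing in for TimeUUIDs.

```python
from datetime import datetime

# Hypothetical partition contents: (clustering key, content).
messages = [
    (datetime(2013, 4, 3, 7, 1, 0), "Hi!"),
    (datetime(2013, 4, 3, 7, 3, 20), "Hello!"),
    (datetime(2013, 4, 3, 7, 4, 52), "Where are you?"),
]

# Models CLUSTERING ORDER BY (message_id DESC): newest message comes first.
stored_desc = sorted(messages, key=lambda m: m[0], reverse=True)

def latest(stored, n):
    """The most recent n messages are simply the first n entries."""
    return stored[:n]

newest = latest(stored_desc, 2)
print(newest)
```

In CQL terms this corresponds to a SELECT on the partition with LIMIT N, read from the start of the partition.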

Page 12: Deep dive into CQL

Static Columns

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
);

* Let’s add conversation owner (admin)

* Owner is related to conversation (Partition Key) not message (Clustering Key)

Page 13: Deep dive into CQL

Static Columns

UPDATE messages SET conversation_owner = '[email protected]' WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66;

* The same UPDATE on a non-static column will fail, because the WHERE clause does not specify the clustering column (message_id)

Page 14: Deep dive into CQL

Static Columns on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello!

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

...

Static Column
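The layout above can be modelled roughly as follows: a static cell has no clustering key, so it sorts ahead of every row in the partition and is shared by all of them. The dictionary model, the `read_row` helper and the owner address are assumptions made for this sketch.

```python
STATIC = ""  # static cells carry no clustering key; the slide renders this prefix as ":null:"

# One partition, keyed by (clustering key, column name).
partition = {
    (STATIC, "conversation_owner"): "owner@example.com",  # placeholder address
    ("dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f", "content"): "Hi!",
    ("2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f", "content"): "Hello!",
}

def read_row(cells, message_id):
    """Merge a row's own cells with the partition-wide static cells."""
    row = {col: val for (ck, col), val in cells.items() if ck in (STATIC, message_id)}
    row["message_id"] = message_id
    return row

first = read_row(partition, "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f")
second = read_row(partition, "2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f")
print(first)
print(second)
```

The owner is stored once per partition, yet every row read from that partition reports it.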

Page 15: Deep dive into CQL

Collections: Set

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    PRIMARY KEY (conversation_id, message_id)
);

* We want to keep message recipients

* The list of recipients may vary as people join and leave the conversation

Page 16: Deep dive into CQL

Collections: Set

UPDATE messages
SET recipients = {'[email protected]', '[email protected]'}
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 17: Deep dive into CQL

Set on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

Set
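As a sketch of the set layout above: each element becomes part of a cell name and the cell value stays empty. The helper name and the addresses are placeholders, and the colon-joined strings only mirror the slide's notation.

```python
def set_to_cells(message_id, column, elements):
    """One cell per distinct element; the element lives in the cell name."""
    return sorted((f"{message_id}:{column}:{e}", "") for e in set(elements))

cells = set_to_cells(
    "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f",
    "recipients",
    ["bob@example.com", "alice@example.com", "bob@example.com"],  # placeholder addresses
)
for name, value in cells:
    print(repr(name), repr(value))
```

Since cell names are unique within a partition, writing the same element twice lands on the same cell, which gives set semantics without reading the existing value first.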

Page 18: Deep dive into CQL

Collections: Map

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    attachments map<text,text>,
    PRIMARY KEY (conversation_id, message_id)
);

* Let’s add attachments to the message

* Each attachment has a name and a location (URI)

Page 19: Deep dive into CQL

Collections: Map

UPDATE messages
SET attachments = {'picture.png':'http://cdn.exmpl.com/1234.png', 'audio.wav':'http://cdn.exmpl.com/5678.wav'}
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 20: Deep dive into CQL

Map on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:picture.png http://cdn.exmpl.com/1234.png

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:audio.wav http://cdn.exmpl.com/5678.wav

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

Map Name Key Value
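The map layout can be sketched the same way: the map key joins the cell name and the map value becomes the cell value. The helper name is an assumption for this example, and the replacement URI `9999.png` is invented purely to show an overwrite.

```python
def map_to_cells(message_id, column, entries):
    """One cell per map entry, named <message id>:<column>:<map key>."""
    return {f"{message_id}:{column}:{key}": value for key, value in entries.items()}

mid = "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f"
cells = map_to_cells(mid, "attachments", {
    "picture.png": "http://cdn.exmpl.com/1234.png",
    "audio.wav": "http://cdn.exmpl.com/5678.wav",
})

# Overwriting one key rewrites exactly one cell; the other entry is untouched.
cells.update(map_to_cells(mid, "attachments", {"picture.png": "http://cdn.exmpl.com/9999.png"}))
for name in sorted(cells):
    print(name, cells[name])
```

Each entry is an independent cell, so a single key can be added or replaced without rewriting the rest of the map.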

Page 21: Deep dive into CQL

Collections: List

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    attachments map<text,text>,
    seen_by list<text>,
    PRIMARY KEY (conversation_id, message_id)
);

* We want to know which participants have seen the message, and preserve the order in which they saw it

Page 22: Deep dive into CQL

Collections: List

UPDATE messages
SET seen_by = ['[email protected]', '[email protected]']
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 23: Deep dive into CQL

List on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c10-f487-11e2-801f-df9895e5d0f8 [email protected]

dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c11-f487-11e2-801f-df9895e5d0f8 [email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

List Name Element ID (TimeUUID) Value
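A rough model of the list layout: every element gets its own time-based ID in the cell name, so insertion order is kept and duplicate values can coexist. Python's `uuid.uuid1()` stands in for the element ID Cassandra generates, and the helper name and participant names are placeholders.

```python
import uuid

def list_to_cells(message_id, column, items):
    """One cell per element, named <message id>:<column>:<element TimeUUID>."""
    cells = []
    for item in items:
        element_id = uuid.uuid1()  # stand-in (assumption) for the server-assigned element ID
        cells.append((f"{message_id}:{column}:{element_id}", item))
    return cells

cells = list_to_cells(
    "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f",
    "seen_by",
    ["tom", "alice", "tom"],  # placeholder participants; the duplicate is intentional
)
for name, value in cells:
    print(name, value)
```

Unlike a set, the element value is stored as the cell value, so the same value can appear more than once under different element IDs.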

Page 24: Deep dive into CQL

User Defined Types (UDT)

* New in Cassandra 2.1

* Let’s add more attributes to attachments

CREATE TYPE attachment (
    size int,
    mime text,
    uri text
);

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    seen_by list<text>,
    attachments map<text, frozen<attachment>>,
    PRIMARY KEY (conversation_id, message_id)
);

Page 25: Deep dive into CQL

User Defined Types

UPDATE messages
SET attachments = attachments + { 'picture.png': { size: 10240, mime: 'image/png', uri: 'http://cdn.exmpl.com/1234.png' }}
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 26: Deep dive into CQL

UDT on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-...-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-...-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-...-7f7f7f7f7f7f:attachments:picture.png 10240:'image/png':'http://cdn.exmpl.com/1234.png'

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

Map Key UDT Value
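The point of the layout above is that the whole UDT value is stored as a single cell value. The sketch below packs the three fields into one byte string using a length-prefixed encoding. The exact wire format here is an assumption chosen for illustration, and `pack_attachment` / `unpack_attachment` are hypothetical names.

```python
import struct

def pack_attachment(size: int, mime: str, uri: str) -> bytes:
    """Serialize all fields into one blob: each field is a 4-byte length plus its bytes."""
    fields = [struct.pack(">i", size), mime.encode("utf-8"), uri.encode("utf-8")]
    return b"".join(struct.pack(">i", len(f)) + f for f in fields)

def unpack_attachment(blob: bytes):
    """Inverse of pack_attachment: walk the blob field by field."""
    fields, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from(">i", blob, pos)
        pos += 4
        fields.append(blob[pos:pos + length])
        pos += length
    (size,) = struct.unpack(">i", fields[0])
    return size, fields[1].decode("utf-8"), fields[2].decode("utf-8")

blob = pack_attachment(10240, "image/png", "http://cdn.exmpl.com/1234.png")
restored = unpack_attachment(blob)
print(len(blob), restored)
```

Because the fields travel together in one value, changing a single field means rewriting the whole attachment value for that map key.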

Page 27: Deep dive into CQL

Secondary Indexes

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    seen_by list<text>,
    attachments map<text,text>,
    PRIMARY KEY (conversation_id, message_id)
);

* What if we want to look up messages by sender?

CREATE INDEX sender_idx ON messages(sender);

Page 28: Deep dive into CQL

Secondary Indexes

Page 29: Deep dive into CQL

Secondary Indexes Internally

sender_idx {
    "[email protected]" {
        54bbfd0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null,
        df04610f-9c02-11e2-7f7f-7f7f7f7f7f7f : null
    },
    "[email protected]" {
        a82e4b0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null
    }
}

* Each node keeps a reverse index for its local data only
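A simplified model of this node-local index is sketched below. The `Node` class, the `find_by_sender` helper and the sample keys and addresses are all assumptions for illustration. The key consequence is that an indexed lookup has to consult every node.

```python
from collections import defaultdict

class Node:
    """Hypothetical node holding some rows plus a reverse index over them."""
    def __init__(self):
        self.rows = {}                       # row key -> sender
        self.sender_idx = defaultdict(set)   # sender -> keys of local rows only

    def insert(self, key, sender):
        old = self.rows.get(key)
        if old is not None:
            self.sender_idx[old].discard(key)  # drop the stale index entry
        self.rows[key] = sender
        self.sender_idx[sender].add(key)

def find_by_sender(nodes, sender):
    # No single node holds the full index, so the lookup fans out to all of them.
    hits = set()
    for node in nodes:
        hits |= node.sender_idx.get(sender, set())
    return hits

a, b = Node(), Node()
a.insert("msg-1", "tom@example.com")    # placeholder keys and addresses
b.insert("msg-2", "tom@example.com")
b.insert("msg-3", "alice@example.com")
found = find_by_sender([a, b], "tom@example.com")
print(sorted(found))
```

Writes stay cheap because the index entry lives next to the data it describes, while reads pay for contacting many nodes.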

Page 30: Deep dive into CQL

Indexes on Collections

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    seen_by list<text>,
    attachments map<text,text>,
    PRIMARY KEY (conversation_id, message_id)
);

* New in Cassandra 2.1

CREATE INDEX recipients_idx ON messages(recipients);
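An index on a collection can be pictured as one index entry per element, which is what lets a query ask whether a set contains a given value (in CQL, a `WHERE recipients CONTAINS ...` condition). The sketch below is a toy model with hypothetical helper names and placeholder data.

```python
from collections import defaultdict

recipients_idx = defaultdict(set)  # element -> keys of rows whose set contains it

def index_row(row_key, recipients):
    """Add one index entry for every element of the row's set."""
    for recipient in recipients:
        recipients_idx[recipient].add(row_key)

def rows_containing(recipient):
    return recipients_idx.get(recipient, set())

index_row("msg-1", {"tom@example.com", "alice@example.com"})  # placeholder data
index_row("msg-2", {"alice@example.com"})
matches = rows_containing("alice@example.com")
print(sorted(matches))
```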

Page 31: Deep dive into CQL

Indexes on Collections

Page 32: Deep dive into CQL

Way more information

* 5 minute interviews

* Use cases

* Free training!

www.planetcassandra.org

Page 33: Deep dive into CQL

Questions?