Things You Should Be Doing When Using Cassandra Drivers

Rebecca Mills, Junior Evangelist at DataStax, @rebccamills

Jul 18, 2015

What do I do?

•  Create awareness of open source Cassandra

•  Develop content

•  Identify problems newcomers might be encountering

•  Develop strategies and material to make that first use easier

Of course all this extends to drivers!

•  Learning and playing with the drivers as much as I can

•  Developing “Getting Started” tutorials for drivers in various programming languages

•  Making it my mission to bring the details to light

So How Can We Communicate with Cassandra in “X” Language?

We have what you need!

•  DataStax provides drivers for Java, Python, and C#

•  Fresh out of the oven: Ruby, Node.js, and C++

•  Also loads of open source drivers to choose from

•  Check out the Planet Cassandra Client Drivers section

Let’s get into some of the basics of smart Cassandra driver usage:

1. One Cluster instance per cluster

•  Configure important aspects of how connections and queries will be handled:

•  Contact points
•  Retry policies
•  Load balancing policies

from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'],
                  compression=True,
                  load_balancing_policy=TokenAwarePolicy(
                      DCAwareRoundRobinPolicy(local_dc='US_EAST')))

2. One Session per keyspace

•  Query execution, connection pooling
•  Long-lived object
•  Not to be used in a short-lived request/response fashion
•  Share the same cluster and session instances across your application

Cluster & Session

cluster = Cluster(['10.1.1.3', '10.1.1.4', '10.1.1.5'],
                  compression=True,
                  load_balancing_policy=TokenAwarePolicy(
                      DCAwareRoundRobinPolicy(local_dc='US_EAST')))

session = cluster.connect('demo')
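One way to follow the "long-lived, shared session" advice is to keep a small cache of sessions keyed by keyspace. The sketch below is only an illustration: `SessionRegistry` is a hypothetical helper, not part of any driver, and `fake_connect` stands in for `cluster.connect` so the example runs without a cluster.

```python
import threading


class SessionRegistry:
    """Caches one long-lived session per keyspace (hypothetical helper)."""

    def __init__(self, connect):
        # `connect` is any callable that takes a keyspace name. With the
        # DataStax Python driver this would be `cluster.connect`.
        self._connect = connect
        self._sessions = {}
        self._lock = threading.Lock()

    def get(self, keyspace):
        # Create the session on first use, then hand back the same object.
        with self._lock:
            if keyspace not in self._sessions:
                self._sessions[keyspace] = self._connect(keyspace)
            return self._sessions[keyspace]


# Stand-in for cluster.connect (assumption, for illustration only).
connect_calls = []


def fake_connect(keyspace):
    connect_calls.append(keyspace)
    return object()


registry = SessionRegistry(fake_connect)
first = registry.get('demo')
second = registry.get('demo')
```

Both calls return the same object and the connect function runs only once, which is the behavior you want instead of opening a session per request.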

3. Use Prepared Statements

•  If you execute a statement more than once

•  Has multiple benefits

•  Prepare once, bind and execute multiple times

•  We’ll talk more about this soon!

Cool

Useful

Deep Dives:

•  Prepared Statements
•  Load Balancing Policies
•  Retry Policies
•  Connection Pooling
•  Async API

Why use Prepared Statements?

•  More performant than using strings
•  Will be parsed only once on the server
•  We expect you to use them with repeated queries in production
•  Avoid CQL injection

Prepared Statements

Consider a string:

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Jones', 35, 'Austin', '[email protected]', 'Bob')
""")

Prepared Statements

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Smith', 24, 'Tampa', '[email protected]', 'Bob')
""")

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Power', 45, 'New York', '[email protected]', 'Kate')
""")

session.execute("""
    INSERT INTO users (lastname, age, city, email, firstname)
    VALUES ('Renolds', 33, 'Miami', '[email protected]', 'Carl')
""")

Prepared Statements

Now the same, as a prepared statement:

prepared_stmt = session.prepare(
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES (?, ?, ?, ?, ?)")

bound_stmt = prepared_stmt.bind(['Jones', 35, 'Austin', '[email protected]', 'Bob'])

result = session.execute(bound_stmt)

What’s the difference?

Prepared Statements

•  INSERT with strings: Client → Cassandra, entire query string. Large amount of data, parse cost.

•  INSERT with PreparedStatements: Client → Cassandra, query ID & bound values. Smaller amount of data, no parsing.
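The difference can be mimicked with a toy model. The class below is not the Cassandra protocol or any driver API; `ToyServer` and its methods are invented for illustration, and "bytes" are simply character counts. It only shows where the parse cost and payload size come from.

```python
class ToyServer:
    """Toy stand-in for a node; not the real Cassandra protocol."""

    def __init__(self):
        self.parse_count = 0
        self.prepared = {}  # statement id -> parsed template
        self.rows = []

    def _parse(self, cql):
        # Pretend parsing is the expensive step and count how often it runs.
        self.parse_count += 1
        return cql

    def execute_string(self, cql):
        # Plain string: the whole query is sent and parsed on every call.
        self.rows.append(self._parse(cql))
        return len(cql)  # characters on the wire, in this toy

    def prepare(self, cql):
        # Parse once, remember the result, hand back a short id.
        stmt_id = len(self.prepared)
        self.prepared[stmt_id] = self._parse(cql)
        return stmt_id

    def execute_prepared(self, stmt_id, values):
        # Only the id and the bound values travel; nothing is parsed.
        self.rows.append((self.prepared[stmt_id], values))
        return len(str(stmt_id)) + sum(len(str(v)) for v in values)


users = [('Jones', 35), ('Smith', 24), ('Power', 45)]

string_server = ToyServer()
string_bytes = sum(
    string_server.execute_string(
        "INSERT INTO users (lastname, age) VALUES ('%s', %d)" % user)
    for user in users)

prepared_server = ToyServer()
stmt = prepared_server.prepare(
    "INSERT INTO users (lastname, age) VALUES (?, ?)")
prepared_bytes = sum(
    prepared_server.execute_prepared(stmt, user) for user in users)
```

In this model the string path parses three times and the prepared path parses once, while sending far fewer characters per insert.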

So what does that mean to me?

Speed!

Prepared Statements

http://techblog.netflix.com/2013/12/astyanax-update.html

Prepared Statements

Putting a prepared statement in a for loop is an anti-pattern:

for (int i = 0; i < 10; i++) {
    // Anti-pattern: the statement is prepared again on every iteration
    PreparedStatement ps = session.prepare("UPDATE user SET disabled = 1 WHERE id = ?");
    session.execute(ps.bind(i));
}
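The fix is to prepare once outside the loop and only bind and execute inside it. Here is a minimal Python sketch of that shape; `disable_users` is a hypothetical function name, and `FakeSession` and `FakePrepared` are test doubles written for this example so it runs without a cluster. With the real Python driver, `session.prepare` and `PreparedStatement.bind` play the same roles.

```python
def disable_users(session, user_ids):
    # Prepare once, outside the loop ...
    prepared = session.prepare("UPDATE user SET disabled = 1 WHERE id = ?")
    for user_id in user_ids:
        # ... then bind and execute once per row.
        session.execute(prepared.bind([user_id]))


class FakePrepared:
    """Test double for a prepared statement (illustration only)."""

    def __init__(self, query):
        self.query = query

    def bind(self, values):
        return (self.query, tuple(values))


class FakeSession:
    """Test double that counts calls so the sketch runs without a cluster."""

    def __init__(self):
        self.prepare_calls = 0
        self.executed = []

    def prepare(self, query):
        self.prepare_calls += 1
        return FakePrepared(query)

    def execute(self, statement):
        self.executed.append(statement)


fake = FakeSession()
disable_users(fake, range(10))
```

Ten updates are executed, but the statement is prepared exactly once.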

Load Balancing

•  A load balancing policy determines which node an insert or query is sent to.

•  A client can read from or write to any node, but that is sometimes inefficient.

•  If a node receives a read or write for data owned by another node, it coordinates that request for the client.

•  We can use a load balancing policy to control that behavior.

Load Balancing deep dive

Using this example:

Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")
    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
    .withLoadBalancingPolicy(
        new TokenAwarePolicy(
            new DCAwareRoundRobinPolicy()))
    .build();

Example data model

CREATE TABLE users (
    username text PRIMARY KEY,
    firstName text,
    lastName text
);

INSERT INTO users (username, firstName, lastName)
VALUES ('rmills', 'Rebecca', 'Mills');

INSERT INTO users (username, firstName, lastName)
VALUES ('pmcfadin', 'Patrick', 'McFadin');

Discover cluster

Client: .addContactPoint("10.0.0.1")

Ring (RF=3):

Node       Token range
10.0.0.1   00-25
10.0.0.2   26-50
10.0.0.3   51-75
10.0.0.4   76-100

Populate connection pool

[Diagram: client and the four-node ring in DC1]

Node       Primary   Replica   Replica
10.0.0.1   00-25     76-100    51-75
10.0.0.2   26-50     00-25     76-100
10.0.0.3   51-75     26-50     00-25
10.0.0.4   76-100    51-75     26-50

Request for data

[Diagram: client and the four-node ring in DC1]

SELECT firstName
FROM users
WHERE userName = 'rmills';

rmills → Murmur3 hash → Token = 15

Token Aware

.withLoadBalancingPolicy(new TokenAwarePolicy(...))

[Diagram: client and the four-node ring in DC1, shown over several steps]

SELECT firstName
FROM users
WHERE userName = 'rmills';

Token = 15

Which node? Token 15 falls in the range 00-25, so the candidates are the nodes that hold that range: 10.0.0.1 (primary), 10.0.0.2 and 10.0.0.3 (replicas).
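The "which node?" lookup can be written down as a small function over the ring from the slides. This is a toy model under stated assumptions: the 0-100 token space is the slides' simplification (real Murmur3 tokens span a much larger signed range), replicas are assumed to be the next nodes clockwise, and `RING` and `replicas_for` are names made up for this sketch.

```python
# Token ranges from the slides (inclusive bounds), primary owner per range.
RING = [
    ((0, 25), '10.0.0.1'),
    ((26, 50), '10.0.0.2'),
    ((51, 75), '10.0.0.3'),
    ((76, 100), '10.0.0.4'),
]
REPLICATION_FACTOR = 3


def replicas_for(token):
    """Return the primary and the next RF-1 nodes clockwise on the toy ring."""
    for index, ((low, high), _node) in enumerate(RING):
        if low <= token <= high:
            return [RING[(index + step) % len(RING)][1]
                    for step in range(REPLICATION_FACTOR)]
    raise ValueError('token outside the toy ring: %r' % token)


rmills_replicas = replicas_for(15)
pmcfadin_replicas = replicas_for(77)
```

For token 15 this yields 10.0.0.1, 10.0.0.2 and 10.0.0.3, matching the replica table; a token-aware policy restricts its choice to that list.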

Token Aware

.withLoadBalancingPolicy(
    new TokenAwarePolicy(
        new DCAwareRoundRobinPolicy()))

[Diagram: client and the four-node ring in DC1, shown over several steps]

SELECT firstName
FROM users
WHERE userName = 'rmills';

Token Aware - Retry

[Diagram: the same ring and query, with the labels Retry and Timeout]

Without Token Aware

Using this modified example:

Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")
    .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
    .withLoadBalancingPolicy(
        new DCAwareRoundRobinPolicy())
    .build();

Request for data

SELECT firstName
FROM users
WHERE userName = 'pmcfadin';

pmcfadin → Murmur3 hash → Token = 77

No Token Aware

.withLoadBalancingPolicy(
    new DCAwareRoundRobinPolicy())

[Diagram: client and the four-node ring in DC1, shown over several steps]

Data placement

Token 77 falls in the range 76-100, which is held by 10.0.0.4 (primary), 10.0.0.1 and 10.0.0.2 (replicas).

Standard Round Robin

Round robin picks the next node regardless of the token. When that node holds no replica for the token (10.0.0.3 in this ring), it has to coordinate the request with the nodes that do.
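To make the extra hop concrete, the sketch below counts how many requests for token 77 land on a node that holds no replica, first with plain rotation and then with rotation restricted to replicas. It is a simplified model, not driver code: the function names are invented for illustration, and the replica set is read off the slides' table.

```python
from itertools import cycle

NODES = ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4']
# Replicas for token 77 (range 76-100), read off the slides' replica table.
REPLICAS_FOR_77 = {'10.0.0.4', '10.0.0.1', '10.0.0.2'}


def count_coordinated(hosts_tried, replicas):
    """Count requests that land on a node holding no replica."""
    return sum(1 for host in hosts_tried if host not in replicas)


def round_robin(nodes, requests):
    # Plain rotation over every node, ignoring where the data lives.
    rotation = cycle(nodes)
    return [next(rotation) for _ in range(requests)]


def token_aware(nodes, replicas, requests):
    # Same rotation, but restricted to nodes that own the data.
    rotation = cycle([node for node in nodes if node in replicas])
    return [next(rotation) for _ in range(requests)]


plain = round_robin(NODES, 8)
aware = token_aware(NODES, REPLICAS_FOR_77, 8)
plain_extra_hops = count_coordinated(plain, REPLICAS_FOR_77)
aware_extra_hops = count_coordinated(aware, REPLICAS_FOR_77)
```

Over eight requests, plain rotation sends two to 10.0.0.3, which must forward them; the token-aware variant sends none there.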

Load Balancing

•  Default before Java driver 2.0.2: RoundRobinPolicy
•  Now: TokenAwarePolicy, which adds token awareness to a child policy
•  Acts as a filter, wraps around another policy
•  Used to reduce network hops, as only replicas will be considered

Load Balancing - Whitelist

•  Ensures only the hosts from a provided list are used

•  Wraps a child policy

•  Used to limit the effects of automatic peer discovery

•  Executes queries only on a given list of hosts
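The "wraps a child policy" idea can be sketched as one object filtering the query plan produced by another. The classes below are toys written for this example; their names and the `query_plan` method are assumptions for illustration and are not the drivers' own policy classes.

```python
class RoundRobin:
    """Toy child policy: yields every known host, rotating the start."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.start = 0

    def query_plan(self):
        count = len(self.hosts)
        plan = [self.hosts[(self.start + i) % count] for i in range(count)]
        self.start = (self.start + 1) % count
        return plan


class Whitelist:
    """Toy wrapper: keeps the child's ordering, drops hosts not on the list."""

    def __init__(self, child, allowed):
        self.child = child
        self.allowed = set(allowed)

    def query_plan(self):
        return [host for host in self.child.query_plan()
                if host in self.allowed]


policy = Whitelist(
    RoundRobin(['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4']),
    allowed=['10.0.0.1', '10.0.0.2'])
first_plan = policy.query_plan()
second_plan = policy.query_plan()
```

The child still rotates over all four hosts, including any found by peer discovery, but only the two whitelisted hosts ever appear in a plan.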

Asynchronous Statements

•  Native binary protocol supports request pipelining

•  A single connection can be used for several simultaneous and independent request/response exchanges

Asynchronous Statements

•  No need to wait for a query to complete and return rows directly: non-blocking IO

•  The method returns a future object almost immediately

[Diagram: Client and Node]

Asynchronous Statements

import logging
from cassandra import ReadTimeout

log = logging.getLogger(__name__)

query = "SELECT * FROM users WHERE lastname=%s"
future = session.execute_async(query, [lastname])

# ... do some other work

try:
    rows = future.result()
    user = rows[0]
    print(user.name, user.age)
except ReadTimeout:
    log.exception("Query timed out:")

Asynchronous Statements

# build a list of futures
futures = []
query = "SELECT * FROM users WHERE user_id=%s"
for user_id in ids_to_fetch:
    futures.append(session.execute_async(query, [user_id]))

# wait for them to complete and use the results
for future in futures:
    rows = future.result()
    print(rows[0].name, rows[0].age)
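The same submit-everything-then-collect pattern can be tried without a cluster using Python's standard library. This sketch uses `concurrent.futures` rather than the driver's own future type, and `fake_query`, `FAKE_USERS` and `fetch_all` are stand-ins invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Made-up rows keyed by a made-up id, standing in for the users table.
FAKE_USERS = {1: ('Bob', 35), 2: ('Kate', 45), 3: ('Carl', 33)}


def fake_query(user_id):
    # Stand-in for a network round trip to a node.
    time.sleep(0.05)
    return FAKE_USERS[user_id]


def fetch_all(user_ids):
    with ThreadPoolExecutor(max_workers=max(1, len(user_ids))) as pool:
        # Submit everything first; each submit returns a future right away.
        futures = [pool.submit(fake_query, uid) for uid in user_ids]
        # Only now block, collecting results in submission order.
        return [future.result() for future in futures]


results = fetch_all([1, 2, 3])
```

Because all requests are in flight before the first `result()` call, the waits overlap instead of adding up.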

Where can I download the drivers?

Planet Cassandra

•  A great place for Apache Cassandra resources!

•  Blog posts, webinars, tutorials, and much, much more!

•  Also a great place for your driver needs

Thank You!

Twitter: @rebccamills