Page 1: Cassandra Essentials Day Cambridge


Page 2: Cassandra Essentials Day Cambridge


Page 3: Cassandra Essentials Day Cambridge

We specialize in addressing 5 major KPIs:
• Velocity (achieve goals faster)
• Security
• Availability
• Performance
• Cost
We also help with strategies to optimize those KPIs.


Page 4: Cassandra Essentials Day Cambridge

• Explosion in data volumes
  – More types of data tracked
  – Higher detail levels
  – Retention over time
• Need to manage with existing staff
• Globally-distributed data
• Rise of public clouds for real business use
• Limited budgets and resources


Page 5: Cassandra Essentials Day Cambridge


Page 6: Cassandra Essentials Day Cambridge

• Peer-to-peer architecture
• No single master or directory node
• Data divided into partitions with primary keys
• Allocated to a node using a hash function
• Clients can connect to any node
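The hash-based allocation described on this slide can be sketched in a few lines of Python. This is an illustration only, not Cassandra's implementation: the `Ring` class and node names are hypothetical, each node gets a single token, and MD5 stands in for the real partitioner's hash function.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to an integer token (MD5 is a stand-in hash for this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical token ring: each node owns the range that ends at its token."""

    def __init__(self, nodes):
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


ring = Ring(["node-a", "node-b", "node-c"])
for key in ["billy", "alice", "carol"]:
    print(key, "->", ring.owner(key))
```

Because the owner is computed from the key alone, any node (or client) holding the list of nodes can work out where a partition lives, which is why no master or directory node is needed.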


Page 7: Cassandra Essentials Day Cambridge

• Additional nodes bring extra storage and processing capacity
• Vertical scaling too
• No built-in size limits
• Data distributed by hash function - no central directory required
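One way to see why hash-based distribution scales out cleanly: in a simple token ring, adding a node takes over part of one existing range, so only the keys in that range move. The sketch below assumes one token per node and uses MD5 as a stand-in hash; the function and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to an integer token (MD5 is a stand-in hash for this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def owner(nodes, partition_key: str) -> str:
    """Return the node whose token is the first at or after the key's token."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(partition_key))
    return ring[i % len(ring)][1]


keys = [f"user-{i}" for i in range(1000)]
before = {k: owner(["node-a", "node-b", "node-c"], k) for k in keys}
after = {k: owner(["node-a", "node-b", "node-c", "node-d"], k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")

# Every key that moved now lives on the new node; all other keys stayed put.
assert all(after[k] == "node-d" for k in moved)
```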


Page 8: Cassandra Essentials Day Cambridge


Page 9: Cassandra Essentials Day Cambridge

• Log-structured data format appends all changes
• Particularly fast for writes
• No read-before-write
• High concurrency through load distribution
• Smart routing to nearest replica


Page 10: Cassandra Essentials Day Cambridge

Log-structured data is a new concept for people coming from relational databases. For speed, we never look at the current data when writing. That means that inserts and updates are the same operation. In this example, Billy has his first name updated to William. The last line is a deletion, or "tombstone" as it is called in Cassandra, which records that a deletion has taken place.
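The slide's example can be approximated with a small append-only log. This is a minimal sketch of the idea, not Cassandra's storage engine: the `LogEntry` type, the `user1` key, and the `last_name` column are hypothetical, and a value of `None` stands in for a tombstone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    timestamp: int          # write time; the newest entry wins
    key: str                # primary key of the row
    column: str
    value: Optional[str]    # None marks a tombstone (deletion)


def read_row(log, key):
    """Merge a key's entries at read time: the newest entry per column wins."""
    latest = {}
    for entry in log:
        if entry.key != key:
            continue
        current = latest.get(entry.column)
        if current is None or entry.timestamp >= current.timestamp:
            latest[entry.column] = entry
    # Columns whose newest entry is a tombstone are treated as deleted.
    return {c: e.value for c, e in latest.items() if e.value is not None}


log = []
# Insert and update are the same operation: append, never read first.
log.append(LogEntry(1, "user1", "first_name", "Billy"))
log.append(LogEntry(1, "user1", "last_name", "Smith"))
log.append(LogEntry(2, "user1", "first_name", "William"))
log.append(LogEntry(3, "user1", "last_name", None))  # tombstone

print(read_row(log, "user1"))  # {'first_name': 'William'}
```

The write path only ever appends, which is what makes it fast; the cost of reconciling versions is deferred to reads.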


Page 11: Cassandra Essentials Day Cambridge

• Data is replicated across multiple nodes
• Administrator can control how many
• All nodes can serve data
• Peer-to-peer "gossip" of state information
• Automatic retry and routing around failed nodes
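A simple way to picture replica placement is to start at the node that owns the key's token and keep walking the ring until enough distinct nodes have been chosen. The sketch below assumes one token per node and a single datacenter; `replicas`, the node names, and the MD5 stand-in hash are illustrative assumptions rather than Cassandra's actual code.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to an integer token (MD5 is a stand-in hash for this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def replicas(nodes, partition_key: str, replication_factor: int):
    """Pick the owning node plus the next nodes clockwise on the ring."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key))
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(replicas(nodes, "billy", replication_factor=3))
```

The replication factor is just a parameter here, mirroring the point that the administrator controls how many copies exist.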


Page 12: Cassandra Essentials Day Cambridge

• Supports multiple geographic regions
• Simultaneously active in all
• Automatic replication
• Can run across both your own datacenter and the cloud
• Can be used for workload isolation too


Page 13: Cassandra Essentials Day Cambridge

The easiest way to show this is with an example. Assume a replication factor of 3 on a three-node cluster, so all data is on all nodes. An insert arrives on node A, and the data will be replicated everywhere. But do you wait until all nodes respond? What if node C is down? OK, so wait for only one write to be acknowledged. What if C comes back up? You read old data. The fix: force reads on a majority of nodes (a quorum) for a strong consistency model. The trade-off is overhead: every read is done twice, and every write waits for two acknowledgements.


Page 14: Cassandra Essentials Day Cambridge

What if node C is down? A will hang. OK, so wait for only one write to be acknowledged. What if C comes back up? You read old data. The fix: force reads on a majority of nodes (a quorum) for a strong consistency model. The trade-off is overhead: every read is done twice, and every write waits for two acknowledgements.


Page 15: Cassandra Essentials Day Cambridge

What if C comes back up? You read old data (though Cassandra has mechanisms to help it catch up quickly). The fix: force reads on a majority of nodes (a quorum) for a strong consistency model. The trade-off is overhead: every read is done twice, and every write waits for two acknowledgements. You can set this at the statement level.
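The scenario in these notes can be replayed with a toy simulation. It is a sketch under simplifying assumptions: three in-memory replicas, timestamps supplied by the caller, and none of the catch-up mechanisms mentioned above. The `Replica` class and the `write` and `read` helpers are hypothetical names, not a driver API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)


def write(replicas, key, value, timestamp, required_acks):
    """Send the write to every live replica; succeed if enough acknowledge."""
    acks = 0
    for r in replicas:
        if r.up:
            r.data[key] = (timestamp, value)
            acks += 1
    return acks >= required_acks


def read(replicas_asked, key):
    """Return the newest value among the replicas consulted."""
    answers = [r.data[key] for r in replicas_asked if r.up and key in r.data]
    return max(answers)[1] if answers else None


a, b, c = Replica("A"), Replica("B"), Replica("C")
cluster = [a, b, c]
write(cluster, "name", "Billy", timestamp=1, required_acks=2)

c.up = False                  # C goes down
write(cluster, "name", "William", timestamp=2, required_acks=2)
c.up = True                   # C returns with stale data

print(read([c], "name"))      # Billy: reading one replica can return old data
print(read([b, c], "name"))   # William: a quorum read overlaps the quorum write
```

With a replication factor of 3, any two replicas that answer a read must include at least one of the two that acknowledged the write, which is why reading and writing at quorum gives the strong consistency described above.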


Page 16: Cassandra Essentials Day Cambridge


Page 17: Cassandra Essentials Day Cambridge


Page 18: Cassandra Essentials Day Cambridge

This afternoon you'll hear a lot about data modeling in Cassandra. A common theme: don't treat it like an Excel spreadsheet. Cassandra doesn't do joins between tables. Instead, data nesting can store similar data together.


Page 19: Cassandra Essentials Day Cambridge

• Selects ideally on primary key
• Duplicate data for frequently-used queries
• But ad-hoc queries could have hundreds of combinations
• Not efficient to duplicate at this level
• Need to run aggregate functions: SUM(), GROUP BY, etc.
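The trade-off on this slide can be illustrated with plain Python dictionaries standing in for tables. The table names, the `record_order` helper, and the sample orders are all hypothetical; the point is only that each frequent query gets its own copy of the data keyed the way that query needs, while an ad-hoc aggregate has to scan everything.

```python
from collections import defaultdict

# Two hypothetical "tables", each keyed for exactly one frequent query.
orders_by_customer = defaultdict(list)   # partition key: customer_id
orders_by_product = defaultdict(list)    # partition key: product_id


def record_order(order_id, customer_id, product_id, amount):
    """Write the same order twice so each query is a single key lookup."""
    row = {"order_id": order_id, "customer_id": customer_id,
           "product_id": product_id, "amount": amount}
    orders_by_customer[customer_id].append(row)
    orders_by_product[product_id].append(row)


record_order(1, "alice", "widget", 20)
record_order(2, "alice", "gadget", 35)
record_order(3, "bob", "widget", 20)

# Frequent queries: direct lookups on the partition key.
print([o["order_id"] for o in orders_by_customer["alice"]])   # [1, 2]
print([o["order_id"] for o in orders_by_product["widget"]])   # [1, 3]

# An ad-hoc aggregate has no matching table, so it scans every partition.
total = sum(o["amount"] for rows in orders_by_customer.values() for o in rows)
print(total)  # 75
```

Adding a table per query works for a handful of known access patterns, but it does not scale to hundreds of ad-hoc combinations, which is the gap the slide points at.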


Page 20: Cassandra Essentials Day Cambridge


Page 21: Cassandra Essentials Day Cambridge


Page 22: Cassandra Essentials Day Cambridge

DataStax OpsCenter: dashboarding and monitoring tool. One-click management operations like rebalancing. See the status of your entire cluster in one place.


Page 23: Cassandra Essentials Day Cambridge

DataStax DevCenter: integrated development environment. Visual view of database structure. Interactive query execution.


Page 24: Cassandra Essentials Day Cambridge


Page 25: Cassandra Essentials Day Cambridge


Page 26: Cassandra Essentials Day Cambridge


Page 27: Cassandra Essentials Day Cambridge

If you're not quite so skilled in Cassandra, Azure, or both, Pythian can help (pass on to Jeff?)


Page 28: Cassandra Essentials Day Cambridge

The result is competitive advantage from addressing most companies' top KPIs (velocity, efficiency, security, performance, and availability) through continuous business transformation and operational excellence.

Page 29: Cassandra Essentials Day Cambridge
Page 30: Cassandra Essentials Day Cambridge


Page 31: Cassandra Essentials Day Cambridge
