Apache Cassandra Architecture Internals

Apr 16, 2017


Bhuvan Rawal
Page 1: Apache cassandra architecture internals

APACHE CASSANDRA: Architecture & Internals

BHUVAN RAWAL

SNAPDEAL.COM

Page 2: Apache cassandra architecture internals

CASSANDRA - AN OVERVIEW

NOSQL-DATABASE.ORG

> MASSIVELY SCALABLE

> PARTITIONED ROW STORE

> MASTERLESS ARCHITECTURE

> LINEAR SCALABILITY

> NO SINGLE POINT OF FAILURE

> MULTIPLE DC SUPPORT OUT OF THE BOX

Page 3: Apache cassandra architecture internals

2008: Open sourced by Facebook on Google Code.

2009: Became an Apache Incubator project.

2010: Gained top-level project status at Apache.

Page 4: Apache cassandra architecture internals

KEY FEATURES

GENERAL PURPOSE: Can be adapted for different classes of use cases

AVAILABLE: Can remain available despite the loss of a node, rack, or DC

DISTRIBUTED: Seamless distribution across datacentres across continents

Page 5: Apache cassandra architecture internals

KEY TUNABLES

JVM Heap & GC Algorithms

Compaction Strategy

Key Cache Size

Row Cache

Compression Chunk Size

Speculative Retries

Throughput vs Latency tuning
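
Several of the tunables above (compaction strategy, row and key caching, compression chunk size, speculative retries) are set per table through CQL table options, while JVM heap and GC settings live in the node's JVM configuration. As a minimal sketch, the helper below renders a dictionary of table options into an ALTER TABLE statement. The helper names, the keyspace `shop`, the table `orders`, and the chosen values are hypothetical and only for illustration; check the option names against the Cassandra version in use.

```python
def cql_literal(value):
    """Render a Python value as a CQL literal (strings, numbers, flat maps)."""
    if isinstance(value, dict):
        items = ", ".join(f"'{k}': {cql_literal(v)}" for k, v in value.items())
        return "{" + items + "}"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value) + "'"


def alter_table_cql(keyspace, table, options):
    """Build an ALTER TABLE ... WITH statement from a dict of table options."""
    clauses = " AND ".join(
        f"{name} = {cql_literal(value)}" for name, value in options.items()
    )
    return f"ALTER TABLE {keyspace}.{table} WITH {clauses};"


# Illustrative values only; sensible settings depend on the workload.
example_options = {
    "compaction": {"class": "LeveledCompactionStrategy"},
    "compression": {"class": "LZ4Compressor", "chunk_length_in_kb": 64},
    "caching": {"keys": "ALL", "rows_per_partition": "NONE"},
    "speculative_retry": "99PERCENTILE",
}

statement = alter_table_cql("shop", "orders", example_options)
print(statement)
```

The generated statement can be reviewed and run through cqlsh or a driver; the sketch only builds the string and does not talk to a cluster.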

Page 6: Apache cassandra architecture internals

SOME USERS

Cassandra is the most popular wide column store (source: Wikipedia)

Deployed by 400+ Fortune 500 firms

667 companies verified on Siftery

Apple: 100,000+ node deployment

Netflix: 95% of data on Cassandra

Uber: 20 Cassandra clusters, soon to be 100

Spotify: 100+ production clusters

Page 7: Apache cassandra architecture internals

PARTITIONER

Determines how data is distributed across the nodes

Must be the same on every node in the cluster

ByteOrderedPartitioner (ordered)

RandomPartitioner

Murmur3Partitioner
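
The partitioner's job can be sketched as two steps: hash the partition key to a token, then find the node whose token range covers it. The sketch below is loosely modelled on a RandomPartitioner-style MD5 token space; it is not Cassandra's implementation, and the function names, node names, and the two-node ring are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token in [0, RING_SIZE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of(token: int, ring: List[Tuple[int, str]]) -> str:
    """Return the node owning `token`.

    `ring` is a list of (node_token, node_name) sorted by token. A node owns
    the range (previous_node_token, node_token], wrapping around at the end.
    """
    tokens = [node_token for node_token, _ in ring]
    idx = bisect.bisect_left(tokens, token)  # first node token >= key token
    return ring[idx % len(ring)][1]


# Hypothetical two-node ring.
ring = [(0, "node-a"), (RING_SIZE // 2, "node-b")]
print(owner_of(token_for("user:42"), ring))
```

Because every node must compute the same token for the same key, all nodes have to use the same partitioner, which is why it is a cluster-wide setting.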

Page 8: Apache cassandra architecture internals

SNITCH

Tells Cassandra which rack and datacentre each node belongs to

Lets replicas be spread widely enough to survive failures

Failure modes: Node -> Rack -> DC -> Region

Tries its best not to place two replicas of the same data in the same rack
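
The rack-avoidance idea above can be illustrated with a simplified placement routine: walk the ring clockwise from the node that owns the token, prefer nodes on racks not yet used, and fall back to already-used racks only when there are not enough distinct ones. This is a sketch of the idea, not the actual replication strategy code; `place_replicas`, the node names, and the rack map are hypothetical.

```python
from typing import Dict, List


def place_replicas(ring: List[str], start: int,
                   rack_of: Dict[str, str], rf: int) -> List[str]:
    """Pick up to `rf` replicas, walking the ring clockwise from `start`."""
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    replicas: List[str] = []
    used_racks = set()
    skipped: List[str] = []  # nodes passed over because their rack was taken

    for node in ordered:
        if len(replicas) == rf:
            break
        if rack_of[node] in used_racks:
            skipped.append(node)
        else:
            replicas.append(node)
            used_racks.add(rack_of[node])

    # Not enough distinct racks: reuse racks rather than under-replicate.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


ring = ["n1", "n2", "n3", "n4"]
rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack2"}
print(place_replicas(ring, 0, rack_of, 2))
```

With two racks and a replication factor of 2, the routine skips `n2` (same rack as `n1`) and picks `n3`, so losing either rack still leaves one replica.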

Page 9: Apache cassandra architecture internals

GOSSIP PROTOCOL

status

health

tokens

schema version

data size

phi_threshold
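
The phi threshold listed above relates to accrual failure detection: rather than a fixed timeout, a node computes a suspicion level (phi) from how late a peer's gossip heartbeat is compared with its usual interval. The sketch below assumes exponentially distributed heartbeat intervals, in which case phi = elapsed / (mean_interval * ln 10). The function names are hypothetical, and the default threshold of 8 mirrors the default of `phi_convict_threshold` in cassandra.yaml.

```python
import math
from typing import Sequence


def phi(elapsed: float, intervals: Sequence[float]) -> float:
    """Suspicion level for a peer last heard from `elapsed` seconds ago.

    Assumes exponentially distributed heartbeat intervals, so
    phi = -log10(P(interval > elapsed)) = elapsed / (mean * ln 10).
    """
    mean = sum(intervals) / len(intervals)
    return elapsed / (mean * math.log(10))


def is_suspect(elapsed: float, intervals: Sequence[float],
               threshold: float = 8.0) -> bool:
    """True when phi exceeds the conviction threshold."""
    return phi(elapsed, intervals) > threshold


recent_intervals = [1.0, 1.0, 1.0]  # seconds between recent heartbeats
print(phi(10.0, recent_intervals), is_suspect(10.0, recent_intervals))
print(phi(20.0, recent_intervals), is_suspect(20.0, recent_intervals))
```

A higher threshold makes the detector slower to mark a node down but more tolerant of network jitter; a lower one reacts faster at the cost of more false positives.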

Page 10: Apache cassandra architecture internals

DATA MODEL

As with most databases, the data model is the key to successful deployments & scalability

Test thoroughly in a staging environment

Avoid client-side joins as far as possible

Materialized views: a boon for automated denormalization

Keep partition sizes bounded so that oversized partitions do not degrade the cluster
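
One common way to keep partitions bounded is to add a bucket (for example a time window) to the partition key. The sketch below estimates a partition's size and derives a bucketed key. The 100 MB target is an assumed rule of thumb rather than a Cassandra constant, and all names here are hypothetical.

```python
import math

# Assumed rule-of-thumb ceiling for one partition; tune for the workload.
TARGET_PARTITION_BYTES = 100 * 1024 * 1024


def estimated_partition_bytes(rows: int, avg_row_bytes: int) -> int:
    """Rough partition size: row count times average row size."""
    return rows * avg_row_bytes


def buckets_needed(rows: int, avg_row_bytes: int,
                   target: int = TARGET_PARTITION_BYTES) -> int:
    """How many buckets keep each partition under the target size."""
    size = estimated_partition_bytes(rows, avg_row_bytes)
    return max(1, math.ceil(size / target))


def bucketed_partition_key(entity_id: str, event_epoch_seconds: int,
                           bucket_seconds: int):
    """Composite partition key: the entity plus the time bucket of the event."""
    return (entity_id, event_epoch_seconds // bucket_seconds)


print(buckets_needed(1_000_000, 1024))
print(bucketed_partition_key("sensor-1", 3 * 86400 + 5, 86400))
```

The trade-off is that a read spanning several buckets has to query several partitions, so the bucket width should match the most common query range.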

Page 13: Apache cassandra architecture internals

ANALYTICS SUPPORT

DataStax Driver for Spark:

-> Reads localized data off Cassandra nodes

-> Support for Hadoop

-> Pig, Hive, Sqoop, Mahout

-> Solr integration

Page 14: Apache cassandra architecture internals

STORAGE

-> Memtable

-> SSTable - Sorted String Table

-> Index

-> Partition Summary

-> Bloom Filter

-> Compression
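
Of the storage components above, the Bloom filter is the easiest to show in isolation: it lets a read skip an SSTable that definitely does not contain a partition key, at the cost of occasional false positives. The class below is a minimal sketch of the data structure, not Cassandra's implementation; it uses SHA-256 as a stand-in hash, and the sizes are arbitrary.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


sstable_filter = BloomFilter()
sstable_filter.add("user:42")
print(sstable_filter.might_contain("user:42"))
```

A key that was added always reports as possibly present, so a negative answer is safe grounds for skipping the SSTable without touching its index or data.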

Page 15: Apache cassandra architecture internals

FELLOW DATASTORES

HBASE

RIAK

MONGODB

AEROSPIKE

BIGTABLE

SCYLLA

Page 16: Apache cassandra architecture internals

THANK YOU! Bhuvan Rawal