Cassandra: A Decentralized Structured Storage System
Avinash Lakshman, Prashant Malik (Facebook)
Presented by Sameera Nelson
Jan 15, 2015
Outline …
Introduction
Data Model
System Architecture
Bootstrapping & Scaling
Local Persistence
Conclusion
What is Cassandra?
Distributed Storage System
Manages Structured Data
Highly available, no single point of failure (SPoF)
Not a Relational Data Model
Handles high write throughput
◦ Without sacrificing read efficiency
Motivation
Operational Requirements in Facebook
◦ Performance
◦ Reliability/ Dealing with Failures
◦ Efficiency
◦ Continuous Growth
Application
◦ Inbox Search problem at Facebook
Related Work
Google File System
◦ Distributed FS, single master/slave
Ficus/ Coda
◦ Distributed FS
Farsite
◦ Distributed FS, No centralized server
Bayou
◦ Distributed Relational DB System
Dynamo
◦ Distributed Storage system
Data Model
Figure from Eben Hewitt’s slides.
• Table
◦ Multidimensional map indexed by key
• Columns
◦ Grouped into Column Families
◦ Simple
◦ Super (Nested Column Families)
• Column has
◦ Name / Value / Timestamp
Data Model
Supported Operations
insert(table, key, rowMutation)
get(table, key, columnName)
delete(table, key, columnName)
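The three operations above can be sketched over a nested map. This is a minimal illustration, not Cassandra's API: the class name `Table`, the separate `family` and `column` arguments, and the "latest timestamp wins" rule on insert are assumptions made for the example.

```python
import time


class Table:
    """Hypothetical in-memory table: key -> column family -> column -> (value, timestamp)."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row_mutation, timestamp=None):
        # row_mutation maps column family -> {column name: value}.
        ts = time.time() if timestamp is None else timestamp
        row = self._rows.setdefault(key, {})
        for family, columns in row_mutation.items():
            stored = row.setdefault(family, {})
            for name, value in columns.items():
                current = stored.get(name)
                # Assumption: a write only replaces a column with an older or equal timestamp.
                if current is None or ts >= current[1]:
                    stored[name] = (value, ts)

    def get(self, key, family, column):
        entry = self._rows.get(key, {}).get(family, {}).get(column)
        return None if entry is None else entry[0]

    def delete(self, key, family, column):
        self._rows.get(key, {}).get(family, {}).pop(column, None)


users = Table()
users.insert("1745", {"name": {"fname": "john", "lname": "smith"}}, timestamp=1)
print(users.get("1745", "name", "fname"))
```

Each stored column carries the name / value / timestamp triple from the data model slide; the timestamp is what lets replicas decide which write is newer.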
Query Language
CREATE TABLE users
( user_id int PRIMARY KEY,
fname text,
lname text );
INSERT INTO users
(user_id, fname, lname) VALUES (1745, 'john', 'smith');
SELECT * FROM users;
System Architecture
Fully Distributed … No Single Point of Failure
Cassandra Architecture
Partitioning
◦ Data distribution across nodes
Replication
◦ Data duplication across nodes
Cluster Membership
◦ Node management in cluster (adding/deleting nodes)
Partitioning
The Token Ring
Partitioning
◦ Partitions data using consistent hashing
◦ Each key is assigned to the relevant partition
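A minimal sketch of the token ring idea: each node holds a token, and a key belongs to the first node whose token is at or after the key's hash, wrapping around at the end of the ring. The class name `TokenRing`, the use of MD5, and the small token values are assumptions chosen for illustration only.

```python
import bisect
import hashlib


class TokenRing:
    """Hypothetical consistent-hashing ring: sorted tokens, each owned by one node."""

    def __init__(self):
        self._tokens = []
        self._owners = {}

    def add_node(self, node, token):
        bisect.insort(self._tokens, token)
        self._owners[token] = node

    @staticmethod
    def hash_key(key):
        # MD5 is an illustrative choice of hash function.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for_position(self, position):
        # First token >= position; wrap to the first token if none is found.
        i = bisect.bisect_left(self._tokens, position)
        if i == len(self._tokens):
            i = 0
        return self._owners[self._tokens[i]]

    def node_for_key(self, key):
        return self.node_for_position(self.hash_key(key))


ring = TokenRing()
for name, token in [("a", 100), ("b", 200), ("c", 300)]:
    ring.add_node(name, token)
print(ring.node_for_position(150))  # owned by "b"
print(ring.node_for_position(350))  # wraps around to "a"
```

Adding or removing a node only changes ownership of the range next to that node's token, which is the main reason for using consistent hashing here.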
Replication
Based on configured replication factor
Replication
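Different Replication Policies
◦ Rack Unaware
Replicate at the N-1 nodes that follow the coordinator on the ring
◦ Rack Aware
Uses Zookeeper to elect a leader that coordinates replica placement
◦ Data center Aware
Similar to Rack Aware, with the leader chosen at the datacenter level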
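The Rack Unaware policy can be sketched as a walk around the ring: the coordinator keeps one copy and the next N-1 nodes hold the others. The function name and the list-based ring representation are assumptions for this example; the rack-aware and datacenter-aware policies are not modeled.

```python
def rack_unaware_replicas(ring_nodes, coordinator_index, replication_factor):
    """Return the coordinator plus its N-1 successors in ring order.

    ring_nodes is a list of node names already sorted by token (hypothetical layout).
    """
    count = min(replication_factor, len(ring_nodes))
    return [
        ring_nodes[(coordinator_index + step) % len(ring_nodes)]
        for step in range(count)
    ]


nodes = ["a", "b", "c", "d"]
print(rack_unaware_replicas(nodes, coordinator_index=3, replication_factor=3))
```

With a replication factor of 3 and coordinator "d", the walk wraps past the end of the list and picks "a" and "b" as the other two replicas.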
Cluster Membership
Based on Scuttlebutt
Efficient gossip-based mechanism
Inspired by real-life rumor spreading
Anti-entropy protocol
◦ Repairs replicated data by comparing and reconciling differences
Cluster Membership
Gossip Based
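A simplified sketch of gossip-based membership: every node keeps a map of heartbeat versions, periodically exchanges it with a random peer, and keeps the higher version for each entry. This is a generic push-pull gossip round written for illustration, not the Scuttlebutt protocol itself; the function names and the state layout are assumptions.

```python
import random


def merge(local, remote):
    """Reconcile two views by keeping the higher heartbeat version per node."""
    for node, version in remote.items():
        if version > local.get(node, -1):
            local[node] = version


def gossip_round(states, rng):
    """Each node exchanges its view with one randomly chosen peer (push-pull)."""
    names = list(states)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merge(states[peer], states[name])
        merge(states[name], states[peer])


# Every node starts out knowing only its own heartbeat version.
states = {name: {name: 1} for name in ["a", "b", "c", "d"]}
rng = random.Random(0)
rounds = 0
while any(len(view) < len(states) for view in states.values()):
    gossip_round(states, rng)
    rounds += 1
print(rounds, states["a"])
```

The compare-and-reconcile step in `merge` is the same idea as the anti-entropy bullet above, applied to membership state instead of replicated data.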
Cluster Membership
Failure Detection
◦ Accrual Failure Detector
If a node is faulty, the suspicion level Φ increases over time.
Φ(t) → k as t → k, where k is a threshold variable
◦ If node is correct
Φ(t) = 0
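A sketch of how a suspicion level can be computed from heartbeat arrival times. It assumes inter-arrival times follow an exponential distribution and defines Φ as the negative base-10 logarithm of the probability that the next heartbeat is still to come; the class name, window size, and method names are hypothetical.

```python
import math
from collections import deque


class AccrualFailureDetector:
    """Hypothetical accrual detector: phi grows the longer a heartbeat is overdue."""

    def __init__(self, window_size=1000):
        self._intervals = deque(maxlen=window_size)  # recent inter-arrival times
        self._last_heartbeat = None

    def heartbeat(self, now):
        if self._last_heartbeat is not None:
            self._intervals.append(now - self._last_heartbeat)
        self._last_heartbeat = now

    def phi(self, now):
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_heartbeat
        # Exponential model: P(no heartbeat yet) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = AccrualFailureDetector()
for t in [0.0, 1.0, 2.0, 3.0]:
    detector.heartbeat(t)
print(detector.phi(3.0), detector.phi(4.0), detector.phi(10.0))
```

A node would be treated as suspect once `phi` crosses an application-chosen threshold, which is what makes the detector adjustable to network and load conditions.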
Bootstrapping & Scaling
Bootstrapping
◦ Node selects a random token
◦ Token is persisted locally and gossiped to the cluster
Scaling
◦ Cassandra bootstrap algorithm is initiated by an operator
◦ A new node takes over a split of the range owned by a heavily loaded node
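One way to picture the scaling step: the new node is given a token in the middle of the range owned by the most heavily loaded node, so it takes over roughly half of that range. The ring size, the load metric, and the function name are assumptions for this sketch.

```python
RING_SIZE = 2 ** 32  # hypothetical token space


def token_for_new_node(tokens, load):
    """Pick a token that splits the range of the most heavily loaded node.

    tokens maps node -> token, load maps node -> load figure (both hypothetical).
    A node with token T owns the range (predecessor token, T].
    """
    ordered = sorted(tokens.items(), key=lambda item: item[1])
    names = [name for name, _ in ordered]
    heaviest = max(load, key=load.get)
    index = names.index(heaviest)
    end = ordered[index][1]
    start = ordered[index - 1][1]  # index -1 wraps to the last node on the ring
    width = (end - start) % RING_SIZE
    if width == 0:
        width = RING_SIZE  # a single node owns the whole ring
    return (start + width // 2) % RING_SIZE


tokens = {"a": 100, "b": 200, "c": 300}
load = {"a": 1, "b": 9, "c": 2}
print(token_for_new_node(tokens, load))  # midpoint of b's range (100, 200]
```

After the new node joins with this token, it owns the lower half of the split range and the heavily loaded node keeps the upper half.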
Local Persistence
Write Operation
◦ In-memory data is flushed to disk after a threshold
◦ Entries are written sequentially, with an index per data file
◦ Data files are merged in the background
◦ Rolling commit logs
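The write path described above can be sketched with in-memory stand-ins for the on-disk structures. The class name `Store`, the entry-count threshold, and the dictionary layout of a data file are assumptions made for illustration; real data files and commit logs live on disk.

```python
class Store:
    """Hypothetical write path: commit log, in-memory table, flushed data files."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []      # append-only log of pending writes
        self.memtable = {}        # in-memory data
        self.data_files = []      # flushed files, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        entries = sorted(self.memtable.items())  # sequential, sorted entries
        index = {key: pos for pos, (key, _) in enumerate(entries)}
        self.data_files.append({"entries": entries, "index": index})
        self.memtable = {}
        self.commit_log = []  # stand-in for rolling over to a new log segment

    def merge_data_files(self):
        merged = {}
        for data_file in self.data_files:  # newer files overwrite older values
            merged.update(data_file["entries"])
        entries = sorted(merged.items())
        index = {key: pos for pos, (key, _) in enumerate(entries)}
        self.data_files = [{"entries": entries, "index": index}]


store = Store(flush_threshold=2)
for key, value in [("k1", "v1"), ("k2", "v2"), ("k1", "v1b"), ("k3", "v3")]:
    store.write(key, value)
store.merge_data_files()
print(store.data_files[0]["entries"])
```

Because every flush writes a complete sorted file, disk writes stay sequential; the cost is that several files accumulate and need periodic merging.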
Local Persistence
Read Operation
◦ Indexes all data on primary key
◦ Maintains column indices
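A sketch of a read that follows the order described in the Cassandra paper: check the in-memory data first, then the data files from newest to oldest, using each file's key index to jump to the entry. The function name and the dictionary layout of a data file are assumptions for this example.

```python
def read(key, memtable, data_files):
    """Look up a key in memory first, then in data files from newest to oldest.

    Each data file is a dict with sorted "entries" and an "index" mapping
    key -> position in entries (hypothetical layout).
    """
    if key in memtable:
        return memtable[key]
    for data_file in reversed(data_files):  # data_files is ordered oldest first
        position = data_file["index"].get(key)
        if position is not None:
            return data_file["entries"][position][1]
    return None


older = {"entries": [("k1", "v1"), ("k2", "v2")], "index": {"k1": 0, "k2": 1}}
newer = {"entries": [("k1", "v1b")], "index": {"k1": 0}}
memtable = {"k3": "v3"}
print(read("k1", memtable, [older, newer]))
```

Searching newest first means the most recent value for a key is found without scanning older files that hold stale copies.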
Conclusion
Proven high scalability, performance, and wide applicability
Very high update throughput with low latency
Future work
◦ Adding compression
◦ Support atomicity across keys
◦ Secondary index support
Thank You