Transcript
Breaking free from relational databases
Matija Gobec @mad_max0204
SmartCat @SmartCat_io
Agenda
What’s wrong with RDBMS
What we need today
Introduction to Cassandra
Use-cases and benefits
Learning process and common mistakes
Migrating to Cassandra
What’s wrong with RDBMS
Reliable and we understand it
Been around for ages
Development and administration tools
Worked great for a few decades
What’s wrong with RDBMS
Requires expensive server hardware
Doesn’t scale
Limited fault tolerance
Can’t handle big amounts of data
Architectural limitations
What’s wrong with RDBMS
[Diagram: one master, two slaves]
Network split
Hardware failure
Latency
What’s wrong with RDBMS
[Diagram: two masters, one slave, two question marks]
What we need today
90% of the world's data was generated over the last two years
Data impacts business
Systems that can learn and adapt
Personalized experience
We store everything
What we need today
Availability
Scalability
Fault tolerance
Performance
ACID vs BASE
ACID: Atomic, Consistent, Isolated, Durable
BASE: Basically Available, Soft state, Eventually consistent
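The slide contrasts the two models without saying how "eventually consistent" is tuned in practice. A minimal sketch, assuming the usual replica-overlap rule (a read is guaranteed to see the latest acknowledged write when read replicas plus write replicas exceed the replication factor). The function names are hypothetical and nothing here calls a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for the given replication factor."""
    return replication_factor // 2 + 1


def quorums_overlap(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when any set of read replicas must share a replica with any set of write replicas."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor, for illustration only
    # One replica on each side: a read may miss the latest write.
    print(quorums_overlap(rf, write_acks=1, read_acks=1))
    # Majority on both sides: read and write sets always intersect.
    print(quorums_overlap(rf, write_acks=quorum(rf), read_acks=quorum(rf)))
```

The trade-off is latency and availability: the more replicas each request waits for, the stronger the guarantee and the fewer node failures the request can tolerate.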
Cassandra - introduction
Row partitioned storage
Fast, scalable and fault tolerant
Shared-nothing, masterless architecture
Active everywhere design
Native multi-datacenter support
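To make "row partitioned" and "masterless" more concrete, here is a rough sketch of a token ring: each node owns a position on a hash ring, and a partition key is stored on the next few nodes clockwise from its own hash. This is a simplification under stated assumptions: MD5 stands in for the real partitioner, each node has a single token, and the class and node names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring: every node can locate the replicas for any key."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # One token per node; sorting by token gives the ring order.
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        # Assumption: MD5 is used only as a convenient, stable hash for the sketch.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def replicas(self, partition_key: str):
        """Return the nodes that hold the given partition, walking clockwise."""
        size = len(self._ring)
        start = bisect.bisect_left(self._tokens, self._token(partition_key)) % size
        count = min(self.replication_factor, size)
        return [self._ring[(start + i) % size][1] for i in range(count)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    print(ring.replicas("playlist-1"))
```

Because placement is a pure function of the key and the ring, no master is needed to route a request: any node that knows the ring can act as coordinator.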
Cassandra - architecture
[Diagram sequence: client contact, client request, client response]
[Diagram: one cluster spanning two datacenters, DC1 and DC2]
Cassandra - data layout
[Diagram: a partition, with its partition key and K:V cells]
[Diagram: the same partition with a clustering key added]
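The layout in the diagrams can be approximated in a few lines: the partition key selects a partition, and rows inside it are kept sorted by clustering key, each row being a set of K:V cells. This is only an in-memory illustration with hypothetical names, not how the storage engine is implemented.

```python
import bisect
from collections import defaultdict


class Table:
    """Toy model: partition key -> rows sorted by clustering key -> K:V cells."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def upsert(self, partition_key, clustering_key, cells):
        """Insert a row, or merge cells into the existing row with the same keys."""
        rows = self._partitions[partition_key]
        keys = [key for key, _ in rows]
        index = bisect.bisect_left(keys, clustering_key)
        if index < len(rows) and rows[index][0] == clustering_key:
            rows[index] = (clustering_key, {**rows[index][1], **cells})
        else:
            rows.insert(index, (clustering_key, dict(cells)))

    def read_partition(self, partition_key):
        """Return all rows of one partition, already in clustering order."""
        return list(self._partitions.get(partition_key, []))


if __name__ == "__main__":
    playlists = Table()
    playlists.upsert("playlist-1", 2, {"title": "Second song"})
    playlists.upsert("playlist-1", 1, {"title": "First song"})
    for song_order, cells in playlists.read_partition("playlist-1"):
        print(song_order, cells)
```

Reading a whole partition in clustering order is the cheap operation in this model, which is why tables are designed around the queries they must serve.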
Cassandra - use cases
IoT applications
Product catalogs and retail apps
Activity tracking
Fraud detection
Messaging
Analytics and recommendation engines
...
Cassandra - benefits
Reliable storage
High performance
Easy scaling on commodity hardware
Solves problems by design
But at what cost?
An arm and a leg?
Learning process
CQL looks like SQL
CREATE TABLE songs (
    id uuid PRIMARY KEY,
    title text,
    album text,
    artist text,
    data blob
);
CREATE TABLE playlists (
    id uuid,
    song_order int,
    song_id uuid,
    title text,
    album text,
    artist text,
    PRIMARY KEY (id, song_order)
);
BUT IT’S NOT!!!
Learning process
I’ll create a data model based on my relational data model
(especially while migrating)
WRONG!!!
Learning process
I can read from the database what I have just written
(write read antipattern)
WRONG!!!
Learning process
I’ll read from the database to calculate what I write
(read write antipattern)
WRONG AGAIN!!!
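The slide does not spell out why read-before-write is a problem. One common reason, assumed here, is the lost update: with no isolation between the read and the write, two clients can both read the old value and overwrite each other. The store below is a hypothetical in-memory stand-in, and the interleaving is scripted so the outcome is deterministic.

```python
class Store:
    """Hypothetical key-value store with no isolation between read and write."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key, 0)

    def write(self, key, value):
        self._data[key] = value


def interleaved_increments(store, key):
    """Two clients each try to add 1 using read-then-write, interleaved."""
    seen_by_a = store.read(key)      # client A reads 0
    seen_by_b = store.read(key)      # client B also reads 0
    store.write(key, seen_by_a + 1)  # client A writes 1
    store.write(key, seen_by_b + 1)  # client B overwrites with 1
    return store.read(key)


if __name__ == "__main__":
    # Two increments were issued, but only one survives.
    print(interleaved_increments(Store(), "play_count"))
```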
Learning process
Secondary indexes are your friend
(at least Mongo didn’t mind)
WRONG!!!
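The deck does not name the alternative to secondary indexes. A frequently used one, assumed here, is to write the same data into a second table keyed by the attribute you want to query. The sketch models that idea in memory; the class and table names are hypothetical.

```python
from collections import defaultdict


class SongCatalog:
    """Toy model of one table per query instead of a secondary index."""

    def __init__(self):
        self.songs_by_id = {}                     # serves lookups by id
        self.songs_by_artist = defaultdict(dict)  # serves lookups by artist

    def add_song(self, song_id, title, artist):
        row = {"id": song_id, "title": title, "artist": artist}
        # Every write goes to both tables, so each read touches a single partition.
        self.songs_by_id[song_id] = row
        self.songs_by_artist[artist][song_id] = row

    def find_by_artist(self, artist):
        return list(self.songs_by_artist.get(artist, {}).values())


if __name__ == "__main__":
    catalog = SongCatalog()
    catalog.add_song(1, "Song A", "Artist X")
    catalog.add_song(2, "Song B", "Artist Y")
    catalog.add_song(3, "Song C", "Artist X")
    print([song["title"] for song in catalog.find_by_artist("Artist X")])
```

The cost is duplicated data and extra writes, which is the trade this approach makes in exchange for reads that go straight to one partition.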
Migrating from RDBMS
Understand how data is queried
Conceptual model is reusable
Run in parallel (leverage MQ)
Start developing with 3 nodes
Leverage parallel execution
You cannot beat the laws of physics
MongoDB to Cassandra
MongoDB uses “relational” model
MongoDB is more flexible for R&D
Don’t measure performance of single nodes
Don’t use secondary indexes
Don’t use MongoDB
When not to use Cassandra
When you don’t need scalability
When you have a lot of updates
When you need query flexibility
When you don’t know what you need
Action points
Data modeling is query based
Understand physical data layout
Respect eventual consistency
Have fun
Thank you
Matija Gobec @mad_max0204
matija.gobec@smartcat.io
SmartCat www.smartcat.io @SmartCat_io
https://github.com/smartcat-labs