Predicting Replicated Database Scalability
Sameh Elnikety, Microsoft Research
Steven Dropsho, Google Inc.
Emmanuel Cecchet, Univ. of Mass.
Willy Zwaenepoel, EPFL
Feb 14, 2016
• Environment
– E-commerce website
– DB throughput is 500 tps
• Is 5000 tps achievable?
– Yes: use 10 replicas
– Yes: use 16 replicas
– No: faster machines needed
• How does a tx workload scale on a replicated db?
Motivation
[Figure: single DBMS]
[Figure: multi-master (Replica 1, Replica 2, Replica 3) and single-master (Master, Slave 1, Slave 2) architectures]
Background: Multi-Master
[Figure: Load Balancer; Replica 1, Replica 2, Replica 3; label: Standalone DBMS]
Read Tx
[Figure: Load Balancer, transaction T, Replica 1, Replica 2, Replica 3]
Read tx does not change DB state
Update Tx
[Figure: Load Balancer, certifier (Cert), transaction T, writesets (ws), Replica 1, Replica 2, Replica 3]
Update tx changes DB state
Additional Replica
[Figure: Load Balancer, certifier (Cert), transaction T, writesets (ws), Replica 1, Replica 2, Replica 3, Replica 4]
• Standalone DBMS
– Service demands
• Multi-master system
– Service demands
– Queuing model
• Experimental validation
Coming Up …
• Required
– readonly tx: R
– update tx: W
• Transaction load
– readonly tx: R
– update tx: W / (1 - A1)
Standalone DBMS
[Figure: single DBMS]
Abort probability is A1. Submit W / (1 - A1) update tx.
Committed tx: W
Aborted tx: W ∙ A1 / (1 - A1)
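A small worked example may make the accounting concrete. The numbers W = 100 and A1 = 0.2 are illustrative assumptions, not values from the talk; the formulas are the ones on the slide.

```latex
\begin{align*}
\text{Submitted update tx} &= \frac{W}{1 - A_1} = \frac{100}{1 - 0.2} = 125 \\
\text{Aborted update tx}   &= \frac{W \cdot A_1}{1 - A_1} = \frac{100 \cdot 0.2}{0.8} = 25 \\
\text{Committed update tx} &= 125 - 25 = 100 = W
\end{align*}
```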
Standalone DBMS
[Figure: single DBMS]
$$Load(1) = R \cdot rc + \frac{W}{1 - A_1} \cdot wc$$
• Required
– readonly tx: R
– update tx: W
• Transaction load
– readonly tx: R
– update tx: W / (1 - A1)
Service Demand
$$Load(1) = R \cdot rc + \frac{W}{1 - A_1} \cdot wc$$
$$D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$$
• Required (whole system of N replicas)
– Readonly tx: N ∙ R
– Update tx: N ∙ W
• Transaction load per replica
– Readonly tx: R
– Update tx: W / (1 - AN)
– Writeset: W ∙ (N - 1)
Multi-Master with N Replicas
$$Load_{MM}(N) = R \cdot rc + \frac{W}{1 - A_N} \cdot wc + W \cdot (N - 1) \cdot ws$$
MM Service Demand
$$Load_{MM}(N) = R \cdot rc + \frac{W}{1 - A_N} \cdot wc + W \cdot (N - 1) \cdot ws$$
$$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$$
Explosive cost!
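The two service-demand expressions, D(1) = Pr ∙ rc + Pw / (1 - A1) ∙ wc and D_MM(N) = Pr ∙ rc + Pw / (1 - AN) ∙ wc + Pw ∙ (N - 1) ∙ ws, can be written as a short Python sketch. The function and parameter names are hypothetical choices made for illustration; rc, wc and ws are taken to be the costs of a readonly tx, an update tx and a writeset at one resource.

```python
def standalone_demand(pr, pw, rc, wc, a1):
    """D(1): average service demand per transaction on a standalone DBMS.

    pr, pw: fractions of readonly and update transactions.
    rc, wc: cost of one readonly / update transaction.
    a1:     standalone abort probability (0 <= a1 < 1).
    """
    if not 0.0 <= a1 < 1.0:
        raise ValueError("a1 must be in [0, 1)")
    return pr * rc + pw / (1.0 - a1) * wc


def multi_master_demand(n, pr, pw, rc, wc, ws, a_n):
    """D_MM(N): average service demand per transaction at one of n replicas.

    ws:  cost of applying one remote writeset.
    a_n: abort probability with n replicas (0 <= a_n < 1).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= a_n < 1.0:
        raise ValueError("a_n must be in [0, 1)")
    local = pr * rc + pw / (1.0 - a_n) * wc
    remote_writesets = pw * (n - 1) * ws  # grows linearly with n
    return local + remote_writesets


if __name__ == "__main__":
    # Illustrative numbers only (not measurements from the talk).
    for n in (1, 2, 4, 8, 16):
        print(n, multi_master_demand(n, 0.8, 0.2, 10.0, 20.0, 5.0, 0.01))
```

With one replica and equal abort probabilities the two functions agree, which is a quick sanity check on the writeset term.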
Compare: Standalone vs MM
• Standalone:
$$D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$$
• Multi-Master:
$$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$$
Explosive cost!
Readonly Workload
• Standalone:
$$D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$$
• Multi-Master:
$$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$$
Explosive cost!
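As a worked special case, assume a purely readonly workload, so Pw = 0 and (assuming the two fractions sum to one) Pr = 1. Substituting into the standalone and multi-master service demands shows that the per-replica demand does not depend on N:

```latex
\begin{align*}
D(1)      &= 1 \cdot rc + \frac{0}{1 - A_1} \cdot wc = rc \\
D_{MM}(N) &= 1 \cdot rc + \frac{0}{1 - A_N} \cdot wc + 0 \cdot (N - 1) \cdot ws = rc
\end{align*}
```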
Update Workload
• Standalone:
$$D(1) = Pr \cdot rc + \frac{Pw}{1 - A_1} \cdot wc$$
• Multi-Master:
$$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$$
Explosive cost!
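As a worked special case, assume a purely update workload, so Pr = 0 and Pw = 1 (again assuming the two fractions sum to one). The per-replica demand then grows with N through the writeset term:

```latex
\begin{align*}
D(1)      &= \frac{wc}{1 - A_1} \\
D_{MM}(N) &= \frac{wc}{1 - A_N} + (N - 1) \cdot ws
\end{align*}
```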
Closed-Loop Queuing Model
[Figure: closed-loop queuing network with N replicas. Each replica (Replica i) has CPU and Disk service centers. Delay centers: think time (TT), load balancer & network delay (LB), certifier delay (Cert, on the path labeled Pw).]
• Standard algorithm
• Iterates over the number of clients
• Inputs:
– Number of clients
– Service demand at service centers
– Delay time at delay centers
• Outputs:
– Response time
– Throughput
Mean Value Analysis (MVA)
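The inputs and outputs listed for MVA map onto a short recursion. The following is a minimal sketch of the textbook exact single-class MVA algorithm for a closed network, not necessarily the authors' implementation. The function name `mva` and its parameter names are hypothetical, and the returned response time is assumed to include the non-think delays (such as load balancer and certifier) but not the think time.

```python
def mva(num_clients, service_demands, think_time=0.0, delay_times=()):
    """Exact single-class Mean Value Analysis for a closed queuing network.

    num_clients:     number of clients circulating in the closed loop.
    service_demands: demand at each queuing service center (e.g. CPU, disk).
    think_time:      client think time (a delay center outside response time).
    delay_times:     other pure delays (assumed: load balancer, certifier).
    Returns (response_time, throughput).
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    queue_lengths = [0.0] * len(service_demands)
    fixed_delay = sum(delay_times)
    response_time = throughput = 0.0
    # Iterate over the number of clients, from 1 up to num_clients.
    for n in range(1, num_clients + 1):
        # An arriving client sees the queue left by a network with n - 1 clients.
        residence = [d * (1.0 + q)
                     for d, q in zip(service_demands, queue_lengths)]
        response_time = sum(residence) + fixed_delay
        throughput = n / (think_time + response_time)  # Little's law
        queue_lengths = [throughput * r for r in residence]
    return response_time, throughput


if __name__ == "__main__":
    # Illustrative numbers only: CPU and disk demands in seconds.
    for clients in (1, 10, 50):
        print(clients, mva(clients, [0.010, 0.020], think_time=1.0,
                           delay_times=[0.001]))
```

Each iteration reuses the queue lengths computed for one fewer client, which is why the algorithm iterates over the number of clients.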
Using the Model
[Figure: the closed-loop queuing model with N replicas; each replica has CPU and Disk service centers, with think time (TT), load balancer & network delay (LB), and certifier delay (Cert) as delay centers]
• Copy of database
• Log all txs, (Pr : Pw)
• Python script replays txs
– Readonly (rc)
– Updates (wc)
• Writesets
– Instrument db with triggers
– Play txs to log writesets
– Play writesets (ws)
Standalone Profiling (Offline)
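The profiling steps can be sketched in Python. The slide only says that transactions are logged and replayed by a Python script; how the costs are measured is not stated, so this sketch assumes a log of per-transaction labels and the common approach of dividing a resource's busy time by the number of completed items. All names here are hypothetical.

```python
from collections import Counter


def transaction_mix(log):
    """Return (Pr, Pw) from an iterable of 'read' / 'update' labels."""
    counts = Counter(log)
    total = counts["read"] + counts["update"]
    if total == 0:
        raise ValueError("log contains no read or update transactions")
    return counts["read"] / total, counts["update"] / total


def service_demand(busy_seconds, completed):
    """Average resource time per completed item during a replay run.

    busy_seconds: time the resource (e.g. CPU or disk) was busy.
    completed:    number of transactions or writesets replayed.
    """
    if completed <= 0:
        raise ValueError("completed must be positive")
    return busy_seconds / completed


if __name__ == "__main__":
    # Illustrative log and measurements only.
    pr, pw = transaction_mix(["read", "read", "read", "update"])
    rc = service_demand(busy_seconds=2.0, completed=100)  # readonly replay
    wc = service_demand(busy_seconds=3.0, completed=50)   # update replay
    ws = service_demand(busy_seconds=0.5, completed=50)   # writeset replay
    print(pr, pw, rc, wc, ws)
```

Running the three replays separately is what lets rc, wc and ws be estimated independently from one standalone machine.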
MM Service Demand
$$D_{MM}(N) = Pr \cdot rc + \frac{Pw}{1 - A_N} \cdot wc + Pw \cdot (N - 1) \cdot ws$$
Explosive cost!
Abort Probability
• Predicting abort probability is hard
• Single-master
– No prediction needed
– Measure offline on master
• Multi-master
– Approximate using
$$(1 - A_N) = (1 - A_1)^{N \cdot CW(N) / L(1)}$$
– Sensitivity analysis in the paper
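The multi-master approximation, read here as (1 - AN) = (1 - A1)^(N ∙ CW(N) / L(1)), can be evaluated numerically. This is a minimal sketch under that reading of the slide's formula; CW(N) and L(1) are treated as given inputs, and the function and parameter names are hypothetical.

```python
def multi_master_abort_probability(a1, n, cw_n, l1):
    """Approximate A_N from the standalone abort probability A_1.

    Implements (1 - A_N) = (1 - A_1) ** (n * cw_n / l1).
    a1:   standalone abort probability (0 <= a1 < 1).
    n:    number of replicas.
    cw_n: the CW(N) term of the formula, supplied by the caller.
    l1:   the L(1) term of the formula, supplied by the caller.
    """
    if not 0.0 <= a1 < 1.0:
        raise ValueError("a1 must be in [0, 1)")
    if n < 1 or cw_n < 0 or l1 <= 0:
        raise ValueError("need n >= 1, cw_n >= 0 and l1 > 0")
    return 1.0 - (1.0 - a1) ** (n * cw_n / l1)


if __name__ == "__main__":
    # Illustrative inputs only: 1% standalone aborts, CW(N) equal to L(1).
    for n in (1, 2, 4, 8, 16):
        print(n, multi_master_abort_probability(0.01, n, 1.0, 1.0))
```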
Using the Model
[Figure: the closed-loop queuing model with N replicas (CPU and Disk service centers; think time, load balancer & network delay, and certifier delay as delay centers), annotated with: # clients, think time; 1.5 ∙ fsync(); 1 ms]
• Compare
– Measured performance vs model predictions
• Environment
– Linux cluster running PostgreSQL
• TPC-W workload
– Browsing (5% update txs)
– Shopping (20% update txs)
– Ordering (50% update txs)
• RUBiS workload
– Browsing (0% update txs)
– Bidding (20% update txs)
Experimental Validation
Multi-Master TPC-W Performance
[Figure: throughput and response time plots. Annotations: Browsing, 5% updates: 15.7 X; Ordering, 50% updates: 6.7 X; 15%]
Multi-Master RUBiS Performance
[Figure: throughput and response time plots. Annotations: Browsing, 0% updates: 16 X; Bidding, 20% updates: 3.4 X]
• Database system
– Snapshot isolation
– No hotspots
– Low abort rates
• Server system
– Scalable server (no thrashing)
• Queuing model & MVA
– Exponential distribution for service demands
Model Assumptions
• Models
– Single-Master
– Multi-Master
• Experimental results
– TPC-W
– RUBiS
• Sensitivity analysis
– Abort rates
– Certifier delay
Check Out the Paper
Urgaonkar, Pacifici, Shenoy, Spreitzer, Tantawi. “An analytical model for multi-tier internet services and its applications.” SIGMETRICS 2005.
Related Work
• Derived an analytical model
– Predicts workload scalability
• Implemented replicated systems
– Multi-master
– Single-master
• Experimental validation
– TPC-W
– RUBiS
– Throughput predictions match within 15%
Conclusions
• Questions?
Thank you!