MoSQL: An Elastic Storage Engine For MySQL
Alexander Tomic, Daniele Sciascia, Fernando Pedone
University of Lugano, Switzerland
March 20, 2013
ACM SAC 2013 - Dependable and Distributed Systems Track
1 Introduction
2 System Design: MySQL Servers; Storage Nodes; Certifier
3 Performance: TPC-C
4 Conclusions: Future Work; Appendix: Similar Offerings to MoSQL; Appendix: B+Tree Details
MySQL is a popular open-source RDBMS at the core of many web-based applications (part of the “LAMP” stack)
Typical approaches to scaling MySQL in the wild (e.g. sharding, asynchronous replication) provide weak guarantees and are inflexible [1]
Elasticity is highly desirable in a cyclical world where over-provisioning and energy costs are significant
Strong guarantees (serializability) make development much easier
[1] Since the original master’s thesis in Sept 2011, however, some commercial offerings have attempted to remedy this. Details in the appendix.
What do we define as “elastic”?
Add/remove servers to/from a running system
Ideally little performance impact
Get Good Things like higher throughput, reduced latency, increased system capacity
SQL (90’s) -> NoSQL (00’s) -> NewSQL (10’s)
SQL transactions are great, but legacy RDBMS architectures are too slow and inflexible
“NoSQL” systems of various flavours attempted to fill the void (Dynamo, BigTable, etc.), but pushed significant complexity up to app developers
Re-emergence of the (semi-)relational model in contemporary systems such as Spanner and Megastore (Google)
Ultimately, no panacea but the usual game of tradeoffs
Three Layer Architecture of MoSQL
MySQL Servers
MySQL has a storage engine interface that enables different storage strategies to be implemented
Serves as a translator from SQL to our storage layer API
Multiple MySQL “servers” can be connected arbitrarily to storage nodes
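The translator role can be illustrated with a small sketch. It is not MoSQL's code: a real MySQL storage engine is written in C++ against MySQL's storage engine interface, and the class names, the `table/pk` key encoding, the JSON row format, and the `put`/`get` calls below are all assumptions made for illustration.

```python
import json


class KeyValueClient:
    """Stand-in for the storage layer API (hypothetical; just an in-memory dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)


class ToyEngine:
    """Translates row-level calls into key-value calls (illustrative only)."""

    def __init__(self, store: KeyValueClient):
        self.store = store

    @staticmethod
    def _key(table: str, pk) -> str:
        # Assumed encoding: table name plus primary key.
        return f"{table}/{pk}"

    def write_row(self, table: str, pk, row: dict) -> None:
        self.store.put(self._key(table, pk), json.dumps(row).encode())

    def read_row(self, table: str, pk):
        raw = self.store.get(self._key(table, pk))
        return None if raw is None else json.loads(raw)


store = KeyValueClient()
engine_1 = ToyEngine(store)
engine_2 = ToyEngine(store)  # a second "server" attached to the same storage

# Made-up sample row, written through one server and read through the other.
engine_1.write_row("warehouse", 7, {"w_id": 7, "w_name": "Lugano"})
row = engine_2.read_row("warehouse", 7)
```

Because the engines hold no row data themselves in this sketch, any number of them can share one storage layer, which is the property the slide relies on.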
Storage Nodes
Multi-version, indexed key-value storage layer
Keys distributed among nodes using consistent hashing
Keys can be cached; storage nodes can be started as cache-only
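A minimal sketch of consistent hashing, to show why adding a node relocates only a fraction of the keys. The slides do not describe MoSQL's ring, so the SHA-256 hash, the virtual-node count, and all names here are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position at or after hash(key)."""
        i = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"row:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.lookup(k) for k in keys}

# Keys whose owner changed; each of them now belongs to the new node.
moved = [k for k in keys if before[k] != after[k]]
```

Keys that did not land on `node-d` keep their previous owner, so existing nodes never exchange data among themselves when the system grows.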
Certifier
Checks whether entries read by a committing update tx are up-to-date at time of commit
Propagates new entries created by a committing tx to nodes
Read-only tx do not require certification; updates proceed optimistically
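The certification check can be sketched as follows. This is a simplified model under stated assumptions, not the MoSQL certifier: versions are plain integers (0 meaning "never written"), the `Transaction` and `Certifier` names are hypothetical, and propagation of new entries to storage nodes is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical record of what a transaction read and intends to write."""
    readset: dict = field(default_factory=dict)   # key -> version observed
    writeset: dict = field(default_factory=dict)  # key -> new value


class Certifier:
    """Toy certifier tracking the latest committed version of each key."""

    def __init__(self):
        self.committed_version = {}  # key -> latest committed version
        self.next_version = 1

    def certify(self, tx: Transaction) -> bool:
        # Read-only transactions need no check.
        if not tx.writeset:
            return True
        # Abort if any entry read has been overwritten since it was read.
        for key, seen in tx.readset.items():
            if self.committed_version.get(key, 0) != seen:
                return False
        # Commit: stamp every written entry with a new version.
        version = self.next_version
        self.next_version += 1
        for key in tx.writeset:
            self.committed_version[key] = version
        return True


certifier = Certifier()
t1 = Transaction(readset={"x": 0}, writeset={"x": "a"})
t2 = Transaction(readset={"x": 0}, writeset={"x": "b"})

r1 = certifier.certify(t1)  # commits: x is still at the version t1 read
r2 = certifier.certify(t2)  # aborts: t1 has overwritten x in the meantime
```

Both transactions ran without locks; the conflict is only detected when the second one tries to commit, which is what "optimistically" refers to.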
Performance
Experimental Configuration for n-node MoSQL
TPC-C Throughput vs. InnoDB
Increasing cost of using disk:
[Figure: throughput (TpmC, 0K to 80K) versus number of warehouses (10, 20, 40, 80, 160; 10 warehouses per node in MoSQL), comparing MoSQL, MySQL (InnoDB), and Ideal]
TPC-C Latency
Large stock-level transactions read from many nodes:
[Figure: latency (s, 0 to 0.6) versus number of nodes (0 to 16; 10 warehouses per node) for the Delivery, New Order, Order Status, Payment, and Stock Level transactions]
Remote Reads and New-Order Throughput for 4 and 8 Nodes
From a cold start, inner B+Tree nodes must be cached
[Figure: TpmC and remote read requests versus time (sec, 0 to 550), for 4 nodes and 8 nodes]
Adding Two Storage Nodes Online
60 warehouses; 8 clients added every 12 seconds; volatile storage nodes added at t = 72 and t = 108
Appendix
Related Work
ElasTraS (UCSB): Elastic data store providing transactional multi-key access to data
ecStore (NU Singapore): Peer-to-peer elastic storage with range-query and tx support; neither ecStore nor ElasTraS supports full SQL transactions
Spanner (Google): Semi-relational model with wide-area tx, but depends on specialized hardware providing globally-meaningful timestamps
Megastore (Google): Semi-relational wide-area tx but with low latency within small partitions; 2PC used for cross-partition tx
MySQL Specific
GenieDB: A storage engine for MySQL with a geo-replicated storage layer. Does not appear to offer elasticity.
Xeround: A cloud database service for MySQL applications promising elastic storage for MySQL. Based on a quick look at the patent and whitepaper they have available for download, ACID compliance is provided through a quorum-based approach.
Parelastic: Claims many of the features that MoSQL provides, including elasticity. I would have to register in order to get the whitepaper, but judging from the patent they have received, it looks superficially like some kind of middleware approach not unlike Sprint.
MySQL “Compatible”
Clustrix: Shared-nothing system claiming MySQL compatibility and ACID compliance. Engine written from the bottom up to be distributed, using push-down of compiled query fragments to individual nodes, apparently enabling better concurrency.
Scalebase: Another example of Sprint-like middleware that resides between the application and “demoted” RDBMS nodes and manages transactions and the distribution of data across nodes.
Intalio: Claims elastic scalability and compatibility with a number of different RDBMSs, so it would appear to be some sort of Sprint-like middleware, but details are a bit scarce.
B+Tree and Row Data
Boxes (a) to (i) are key-values.
(a) 100 120 /
(b) 95 /    (c) 100 105 /    (d) 120 125 /
(e) 95 <raw data>    (f) 100 <raw data>    (g) 105 <raw data>    (h) 120 <raw data>    (i) 125 <raw data>
Some Unnecessary Aborts
Consider concurrent tx: t1 = INSERT .. (60) and t2 = INSERT .. (130). The writesets of t1 and t2 are (a) and (a,d). Both contain (a), so t1 will be aborted if certified after t2, even though the two inserts touch different rows.
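The abort can be replayed with a toy certification function. The sketch assumes that each transaction read the B+Tree nodes it writes at the same initial snapshot (version 0) and that a node's version is the commit number of its last writer; neither detail is stated on the slide, and the function and variable names are hypothetical.

```python
def certify(latest: dict, seen: dict, writes: set, commit_no: int) -> bool:
    """Commit if every node version in `seen` is still the latest; else abort."""
    if any(latest.get(node, 0) != version for node, version in seen.items()):
        return False
    for node in writes:
        latest[node] = commit_no
    return True


latest = {}  # B+Tree node label -> commit number of its latest version

# Both transactions start from the same snapshot (every node at version 0).
t1_seen, t1_writes = {"a": 0}, {"a"}                # t1 = INSERT .. (60)
t2_seen, t2_writes = {"a": 0, "d": 0}, {"a", "d"}   # t2 = INSERT .. (130)

t2_ok = certify(latest, t2_seen, t2_writes, commit_no=1)  # commits
t1_ok = certify(latest, t1_seen, t1_writes, commit_no=2)  # aborts on node (a)
```

The abort is "unnecessary" in the sense that certification works at the granularity of tree nodes: the rows 60 and 130 never conflict, but the shared node (a) does.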