Towards Multi-Tenant Performance SLOs
Willis Lang*, Srinath Shankar+, Jignesh M. Patel*, Ajay Kalhan^
*University of Wisconsin-Madison, +Microsoft Gray Systems Lab, ^Microsoft Corp.
To appear in ICDE 2012
Jan 03, 2016
Overall Operating Costs of Providing Cloud Services are High
The dominant costs are servers and power: 57% and 31% of the total, respectively
Monthly Cost of a 46,000-Server Data Center [Hamilton, 2011]
Servers: $1,852,778
Networking: $260,039
Power: $1,007,651
Infrastructure: $130,019
Share of total: Servers & Power 88%, Networking 8%, Infrastructure 4%
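The percentages on this slide follow directly from the four monthly cost figures. This minimal sketch recomputes them. It only divides each category by the total and adds no data beyond the slide.

```python
# Monthly costs (USD) of a 46,000-server data center, as listed above [Hamilton, 2011].
MONTHLY_COSTS = {
    "Servers": 1_852_778,
    "Networking": 260_039,
    "Power": 1_007_651,
    "Infrastructure": 130_019,
}

def cost_shares(costs: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total monthly cost, as a percentage."""
    total = sum(costs.values())
    return {name: 100.0 * amount / total for name, amount in costs.items()}

shares = cost_shares(MONTHLY_COSTS)
for name, pct in shares.items():
    print(f"{name:15s} {pct:5.1f}%")
# Servers and power together dominate the bill.
print(f"{'Server & Power':15s} {shares['Servers'] + shares['Power']:5.1f}%")
```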
Performance Service Level Objectives and Managing Cloud Costs
Each tenant could get its own server, and with it high performance
Tenants have performance objectives
Goal: consolidate tenants onto the fewest servers (maximize the degree of multi-tenancy) while still meeting their performance objectives
Figure: trade-off between Performance per Tenant and Data Center Costs
An Optimization Problem
Find: (1) tenant scheduling policies and (2) hardware provisioning policies
Such that costs are minimized and the performance objectives are met
Given: groups of tenants with different performance objectives, and a number of server configurations
Figure: tenant groups labeled High Perf and Low Perf
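One simple way to make this optimization problem concrete is to treat it as a min-cost covering problem. Each candidate server "packing" (a server type plus a tenant mix assumed to meet both SLOs) has a monthly cost, and we pick a multiset of packings that hosts at least the required number of H and L tenants at minimum total cost. This is a sketch under stated assumptions and not the authors' framework. The monthly prices come from the hardware setup slide, but the per-server tenant counts in `PACKINGS` are made up for illustration. The exhaustive memoized search is only practical for small tenant counts.

```python
import math
from functools import lru_cache

# Hypothetical per-server packings: (label, monthly cost in $, #H tenants, #L tenants).
# Each packing is ASSUMED to meet both SLOs (e.g., a point read off a server
# characterization). Tenant counts are illustrative, not measured.
PACKINGS = [
    ("ssdC mixed", 125, 2, 4),
    ("diskC mixed", 111, 2, 2),
    ("ssdC L-only", 125, 0, 10),
]

def min_cost_plan(n_h: int, n_l: int, packings=PACKINGS):
    """Return (cost, plan): the cheapest multiset of packings hosting >= n_h H and >= n_l L tenants."""

    @lru_cache(maxsize=None)
    def solve(h: int, l: int):
        if h <= 0 and l <= 0:
            return 0, ()
        best = (math.inf, ())  # stays infeasible unless some packing makes progress
        for label, cost, ph, pl in packings:
            # Only use packings that reduce a still-unmet requirement (guarantees termination).
            if (h > 0 and ph > 0) or (l > 0 and pl > 0):
                sub_cost, sub_plan = solve(max(h - ph, 0), max(l - pl, 0))
                if cost + sub_cost < best[0]:
                    best = (cost + sub_cost, (label,) + sub_plan)
        return best

    return solve(n_h, n_l)

cost, plan = min_cost_plan(4, 10)
print(cost, sorted(plan))
```

For large scenarios such as thousands of tenants, an integer-programming solver would be the natural replacement for this exhaustive search.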
Multi-Tenant Scheduling
Performance objective: TPC-C throughput (H tenants: 100 tps each; L tenants: 10 tps each)
Want to maximize the degree of multi-tenancy without breaking the SLOs
What if we also have different server types available?
#H tenants   #L tenants   Avg H perf (tps each)   Avg L perf (tps each)
1            1            2000                    2000
5            20           900                     110
20           40           130                     30
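The table can be checked against the two objectives mechanically. This sketch tests each measured configuration and reports the headroom over the 100 tps and 10 tps targets. Because the table reports per-class averages, the check covers averages only and does not guarantee that every individual tenant met its objective.

```python
# Measured averages from the table above: (#H, #L, avg H tps each, avg L tps each).
MEASUREMENTS = [
    (1, 1, 2000, 2000),
    (5, 20, 900, 110),
    (20, 40, 130, 30),
]

H_SLO_TPS = 100  # performance objective of an H tenant
L_SLO_TPS = 10   # performance objective of an L tenant

def meets_slos(avg_h_tps: float, avg_l_tps: float) -> bool:
    """True if the average H and L throughput both meet their objectives."""
    return avg_h_tps >= H_SLO_TPS and avg_l_tps >= L_SLO_TPS

for n_h, n_l, h_tps, l_tps in MEASUREMENTS:
    print(f"{n_h:>2} H + {n_l:>2} L: SLOs met={meets_slos(h_tps, l_tps)}, "
          f"H headroom {h_tps / H_SLO_TPS:.1f}x, L headroom {l_tps / L_SLO_TPS:.1f}x")
```

Even at 20 H + 40 L, both classes still exceed their targets, which is why the question becomes how far consolidation can be pushed.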
Hardware Setup
Both configurations: 2 x Intel Nehalem L5630, 32GB DDR3, RAID with battery-backed cache, 1 x 10k RPM SAS for OS/software, plus:
“diskC” - $4000 ($111 per month). Data: 2 x 10k RPM SAS 300GB; Log: 1 x 10k RPM SAS 300GB
“ssdC” - $4500 ($125 per month). Data: 2 x Crucial C300 256GB; Log: 1 x Crucial C300 256GB
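The listed monthly costs are consistent with spreading each purchase price evenly over 36 months. That amortization period is our inference from the numbers and is not stated on the slide. This sketch assumes straight-line amortization with no interest and no residual value.

```python
# Purchase price and listed monthly cost of each server configuration.
SERVERS = {"diskC": (4000, 111), "ssdC": (4500, 125)}

# ASSUMPTION: 3-year straight-line amortization, inferred from the listed numbers.
AMORTIZATION_MONTHS = 36

def monthly_cost(price: float, months: int = AMORTIZATION_MONTHS) -> float:
    """Straight-line amortized monthly cost of a server (no interest, no residual value)."""
    return price / months

for name, (price, listed) in SERVERS.items():
    print(f"{name}: ${price} / {AMORTIZATION_MONTHS} months = "
          f"${monthly_cost(price):.2f} per month (listed: ${listed})")
```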
Software Setup
SQL Server 2012
Each tenant of the ‘H’ performance class gets its own database within one SQL Server instance
Each database in SQL Server has its own physical data and log files
Each tenant of the ‘L’ performance class gets its own database within a separate SQL Server instance
Performance is controlled by provisioning memory to each SQL Server instance (not by using VMs)
Benchmark each server type to find the maximum degree of multi-tenancy that still meets the performance objectives
Systematically reduce the number of ‘H’ tenants and, for each, steadily increase the number of ‘L’ tenants until a performance objective fails
Server characterizing function:
Both perf objectives met
Some perf objective fails
Heterogeneous SLO Characterization
Figure: panels diskC and ssdC; x-axis: Number of L (10tps) Tenants (0 to 100); y-axis: Number of H (100tps) Tenants (0 to 25)
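The benchmarking procedure above can be sketched as a loop over H counts. Here the server characterizing function is a hypothetical oracle `meets_slos(n_h, n_l)` that, in practice, would run the TPC-C workload on one server and report whether both performance objectives are met. `toy_oracle` is a made-up capacity model for demonstration only, not measured data. The inner loop assumes that feasibility is monotone in the number of L tenants, so once adding an L tenant breaks an objective, adding more will not fix it.

```python
from typing import Callable

# Hypothetical oracle: benchmark n_h H tenants and n_l L tenants on one server and
# return True if both performance objectives are met, False if some objective fails.
Oracle = Callable[[int, int], bool]

def characterize(meets_slos: Oracle, max_h: int, max_l: int) -> dict[int, int]:
    """For each H count (from max_h down to 0), find the largest L count that still meets both SLOs.

    Returns {n_h: largest feasible n_l}; H counts that fail even with no L tenants are omitted.
    """
    frontier = {}
    for n_h in range(max_h, -1, -1):  # systematically reduce H tenants
        if not meets_slos(n_h, 0):
            continue
        n_l = 0
        while n_l < max_l and meets_slos(n_h, n_l + 1):  # steadily add L tenants
            n_l += 1
        frontier[n_h] = n_l
    return frontier

# Toy stand-in for a real benchmark (made-up model, NOT measured data):
# each H tenant uses 10 units and each L tenant 1 unit of a 200-unit server.
def toy_oracle(n_h: int, n_l: int) -> bool:
    return 10 * n_h + n_l <= 200

print(characterize(toy_oracle, max_h=25, max_l=100))
```

The resulting frontier plays the role of one curve in the characterization figure: every (H, L) point on or below it is assumed to be a safe tenant mix for that server type.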
Applying Our Optimization Framework
Figure: characterization curves for ssdC and diskC; x-axis: Number of L (10tps) Tenants (0 to 140); y-axis: Number of H (100tps) Tenants (0 to 40)
Scenario: 10,000 tenants: 2,000 x 100tps and 8,000 x 10tps
Optimal solution: 94 ssdC servers, each hosting 20 100tps tenants and 38 10tps tenants; plus 5 diskC servers, each hosting 20 100tps tenants and 25 10tps tenants; plus 43 ssdC servers, each hosting 100 10tps tenants
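Using the per-server monthly costs from the hardware setup ($125 for ssdC, $111 for diskC), the monthly server cost of this solution can be tallied directly. This sketch covers server cost only. Power, networking, and infrastructure are excluded.

```python
# Monthly cost per server, from the hardware setup slide.
MONTHLY_COST = {"ssdC": 125, "diskC": 111}

# Server groups in the reported optimal solution: (server type, number of servers).
OPTIMAL_SOLUTION = [
    ("ssdC", 94),   # 20 100tps + 38 10tps tenants each
    ("diskC", 5),   # 20 100tps + 25 10tps tenants each
    ("ssdC", 43),   # 100 10tps tenants each
]

def total_monthly_cost(groups, prices=MONTHLY_COST) -> int:
    """Sum of (monthly price x server count) over all server groups."""
    return sum(prices[kind] * count for kind, count in groups)

print(total_monthly_cost(OPTIMAL_SOLUTION))
```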
Applying Our Optimization Framework
Figure: Monthly Server Costs ($0 to $30,000) for three policies: Optimal, Only diskC, and Tenant Segregated (legend: ssdC for 100tps tenants, diskC for 10tps tenants)
Summary
We have presented an optimization framework that tells a Database-as-a-Service provider how to meet performance Service Level Objectives while minimizing cluster infrastructure costs
An optimization framework to determine the optimal tenant scheduling and server provisioning in light of tenant performance goals [ICDE 2012]
Complex parallel analytic workloads cause non-linear speedup and force low-power server clusters to be much larger and more expensive than traditional clusters [DaMoN 2010 Best Paper]
Parallel data processing bottlenecks (such as network bandwidth) and algorithmic choices are causes of energy inefficiency [Under Submission]
The computational complexity of MapReduce (MR) jobs affects the ability to save energy by using smaller clusters [VLDB 2010]
By exploiting existing replication schemes, an elegant relationship between load balancing and energy efficiency can be leveraged [SIGMOD Record 2009]
Demonstrated that energy consumption can be reduced, at a controlled cost in performance, using hardware mechanisms (e.g., CPU frequency/voltage scaling and memory parking) and algorithmic choices [CIDR 2009, IEEE DEB 2011]
Thesis Research
Figure (research overview, plotted as Performance vs. Data Center Costs): Cluster Design, Performance in the Cloud; Low-Power Server Hardware; Cluster-level Performance and Energy Consumption; Node-local Performance and Energy Consumption; Characterizing Performance vs. Energy and Server Costs
Publications: CIDR 09, IEEE DEB 11; VLDB 10, SIGMOD Rec 09; DaMoN 10, Under Submission; ICDE 12
Acknowledgements
Special thanks to David DeWitt, Jeff Naughton, Alan Halverson, Eric Robinson, Rimma Nehme, Dimitris Tsirogiannis, Nikhil Teletia, Chris Ré
Funded by a grant from Microsoft Gray Systems Lab
Memory-based resource governor
Example: 2 performance goals, 100tps and 10tps; 20 tenants pay for 100tps and 30 tenants pay for 10tps
The aggregate memory for all 100tps tenants:
Similarly, for 10tps tenants:
Simplicity vs Cost
Method       ssdC SKU     diskC SKU
Optimal      Hetero SLO   Hetero SLO
ssdC-only    Hetero SLO   NA
diskC-only   NA           Hetero SLO
ssdC-H       High-perf    Low-perf
ssdC-L       Low-perf     High-perf
Figure: relative cost (Rel. Cost, 0.0 to 2.0) of each method for tenant mixes of 20% 100tps/80% 10tps, 50% 100tps/50% 10tps, and 80% 100tps/20% 10tps; two panels: diskC cost -10% vs ssdC, and diskC cost -30% vs ssdC
None of these heuristic methods consistently provides solutions close to those of the optimal method.
Log Disk Bottlenecks
Figure: Average Log Write Wait Time (ms, 0 to 14) and TPS Achieved by One 100tps Tenant (0 to 180), for tenant mixes <# 1tps tnt>/<# 100tps tnt> of 200/0, 175/1, 150/1, 125/1, 100/1, and 75/1