Performance Variability of Production Cloud Services
by Allen Metcalfe, Josh Mercer, Aaron Wagner, and Spence Southard
Discussion
1. Background
2. Problem
3. Cloud Services
4. Quick Statistics Review
5. Benchmarking Results
6. Critical Analysis
7. Areas of Future Research
8. Summary
Background: Cloud Computing
● Owned, operated, and maintained by an independent vendor.
● Services usually sold as:
o Infrastructure as a Service
o Platform as a Service
o Software as a Service
● Deployed as many VMs operating on one physical machine.
Goals of Cloud Computing
● Performance
● Cost
● Flexibility/Scalability
Background Scenario
A small mobile app development company is expanding its application by harnessing the additional compute and storage capacity of traditional computers where smart devices fall short.
Two choices:
1. Maintain dedicated servers
2. Obtain servers through a cloud vendor
Problem: Performance Variability
● Dependability:
o Machine downtime
● I/O Sharing:
o Contention for disk writes
● Performance Stability:
o Overutilization of resources
HealthCare.gov (Still loading)
Problem Analysis
How does variance impact performance?
How do we define what is an acceptable level
of performance variance?
What trends and seasonal factors may
contribute to large scale variance?
Research Overview
● At the time of the study, no other investigation into cloud performance variance existed.
● Study the long-term variability of performance for production cloud systems:
o Amazon Web Services
o Google App Engine
Google App Engine (GAE)
● Python, Java, PHP, Go
Amazon Web Services (AWS)
● Python, Java, PHP, Ruby, .NET
Cloud Providers & Services Tested
Google App Engine
● Python runtime environment
● Datastore storage
● Memcache caches query results
● URL Fetch issues HTTP/HTTPS requests
Amazon
● EC2 virtual machines
● S3 storage
● SDB database
● SQS message queue processing
● FPS payment processing
All tests run using CloudStatus.com
Google App Engine - Test Parameters
● Google Run Service
o Calculate Fibonacci
● Google Datastore Service
o Create Time
o Read Time
o Delete Time
● Google Memcache Service
o Get Time
o Put Time
o Response Time
● Google URL Fetch Service
o Response Time (api.facebook.com, api.hi5.com, api.myspace.com, ebay.com, s3.amazonaws.com, and paypal.com)
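The URL Fetch probes above time an HTTP round trip to each target site. As a rough illustration (not CloudStatus's actual probe), such a response-time measurement can be sketched with Python's standard library; the function name is hypothetical:

```python
# Hypothetical sketch of a URL-fetch response-time probe, using only
# the Python standard library.
import time
import urllib.request

def fetch_response_time(url, timeout=10):
    """Return wall-clock seconds taken to issue a GET and read the body."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start
```

A monitoring service would run a probe like this on a fixed schedule and record the samples for later aggregation.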
Amazon Web Services - Test Parameters
● Amazon EC2
o Deployment Latency
● Amazon S3
o Get Throughput
o Put Throughput
● Amazon SDB
o Query Response Time
o Update Latency
● Amazon SQS
o Average Lag Time
● Amazon FPS
o Response Time
Statistics Review: Quartiles
Quartiles:
● Q1: the value below which 25% of measurements fall
● Q2 (the median): the value below which 50% of measurements fall
● Q3: the value below which 75% of measurements fall
● IQR (interquartile range): the distance Q3 - Q1
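These definitions can be checked with Python's standard library; the sample data below is made up for illustration:

```python
# Sketch: computing quartiles and the IQR with the standard library.
from statistics import quantiles

samples = [12, 15, 11, 14, 90, 13, 12, 16, 14, 13]

# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(samples, n=4)  # 12.0, 13.5, 15.25 for this data
iqr = q3 - q1                         # interquartile range: 3.25
```

Note that the single outlier (90) barely moves the quartiles, which is why quartile-based measures are preferred over the mean when characterizing noisy performance data.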
Statistics Review: Measuring Variability
Two heuristic measures of variability (used because no industry-standard measures exist):
1. If the mean deviates from the median by more than 10% of the IQR, the values skew toward one end of the range.
2. If the median is less than half of the IQR (i.e., Q2 < 0.5 × IQR), most measurements are highly variable.
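A minimal sketch of these two heuristics, assuming the thresholds stated on the slide (the function name is hypothetical):

```python
# Sketch: the two variability heuristics from the slide.
from statistics import mean, quantiles

def variability_flags(samples):
    """Return (skewed, highly_variable) per the slide's two heuristics."""
    q1, q2, q3 = quantiles(samples, n=4)
    iqr = q3 - q1
    skewed = abs(mean(samples) - q2) > 0.10 * iqr  # heuristic 1
    highly_variable = q2 < 0.5 * iqr               # heuristic 2
    return skewed, highly_variable
```

For a flat series both flags stay False; a series dominated by a few large values trips both.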
EC2 Deployment Time
Amazon Get EU HI Hourly
Amazon Get EU HI Monthly
Amazon Get US HI Monthly
Amazon SDB Update Time
Amazon SQS Query Time
Amazon FPS Payment Time
Google Python Run
Google Datastore
Google Memcache
Google URL Fetch
Performance penalty scenarios
● Researchers generated hypothetical use-case models comparing cloud performance against traditional parallel processing environments.
● Simulations are based on the data from the previous graphs and show a penalty factor versus traditional environments at the measured user load.
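The slides do not give the penalty-factor formula; one common reading (assumed here, not taken from the study) is the ratio of cloud completion time to a traditional environment's completion time at each measured load level:

```python
# Hypothetical sketch: penalty factor as the ratio of cloud completion
# time to traditional completion time, one value per load level.
def penalty_factors(cloud_times, traditional_times):
    """Element-wise ratio; values above 1.0 mean the cloud is slower."""
    return [c / t for c, t in zip(cloud_times, traditional_times)]
```

Under this reading, a penalty factor of 2.0 means the cloud service took twice as long as the traditional environment at that load.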
Amazon FPS penalty for payment processing
Amazon SDB for Social Games
Google Datastore for Social Games
Critical Analysis
● This research should be repeated each year.
● The authors' conclusion is limited to variance caused by high user load.
● Vendors do not publish their capacity increases, making inferences difficult.
Summary
● Cloud services may incur high performance variability due to:
o System size
o Workload variability
o Virtualization overhead
o Resource-time sharing
● Seasonal cloud variance is present,
exhibiting yearly and daily patterns
● Performance variability varies greatly across
application types
Future Research
● Repeated research to verify seasonal
variance
● Techniques for minimizing variance
● Dynamic cloud resizing
Any Questions?
Slides available at:
http://bit.ly/1vdpM6B