Adaptive Provisioning of Stream Processing Systems in the Cloud
Javier Cerviño #1, Eva Kalyvianaki *2, Joaquín Salvachúa #3, Peter Pietzuch *4
# Universidad Politécnica de Madrid, * Imperial College London
SMDB 2012
– Amazon EC2
  • Public cloud provider
  • Infrastructure as a Service
  • Images and Virtual Machines
Related work
• Cloud Stream Processing [Kleiminger et al, SMDB’11]
• Cloud network performance
  – Can cloud and Internet paths support streaming data into cloud DCs? [Barker et al, MMSys’07], [Wang et al, INFOCOM’10], [Jackson et al, CLOUDCOM’10]
• Cloud computation performance
  – Can best-effort VMs support low-latency, low-jitter and high-throughput stream processing? [Barker et al, MMSys’07]
  – Is the computational power of Amazon EC2 VMs sufficient for standard stream processing tasks? [Dittrich et al, VLDB’10]
Contributions
• Explore the suitability of cloud infrastructures for stream processing (case study on Amazon EC2)
  – Measure network and processing latencies, jitter and throughput
• An adaptive algorithm to allocate cloud resources on demand
  – Resizes the number of VMs in a DSPS deployment
• Algorithm evaluation
  – Deploying the algorithm as part of a DSPS on Amazon EC2
Adaptive Cloud Stream Processing
• Elastic stream processing system that scales the number of VMs to input stream rates
• Goals
  – Low latency at a given throughput
  – Keep VMs operating at their maximum processing capacity
• Workload is partitioned and balanced across multiple VMs
• Many VMs are available to scale up and down to workload demands
• Collector gathers results from engines and processes additional queries
[Diagram: stream sources (source 1, source 2), VMs each running an engine, and a collector. Legend: Sub-query 1, Sub-query 2, Stream source.]
Adaptive Cloud Stream Processing Algorithm I
[Diagram: N virtual machines, each running Esper and reporting its processing rate (Proc. Rate), feed the tuple submitter. Labelled elements: Σ, Input Rate − Proc Rate, / N, Extra Rate, Average Rate.]
• Gathering and calculation
  – Gathers processing rates from VMs
  – Obtains
    • Total extra processing rate (Extra rate)
    • Average processing rate per VM (Average rate)
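The gathering step above can be sketched as follows. This is a minimal Python sketch, assuming that the input stream is split equally across the VMs and that the extra rate is the sum of per-VM shortfalls (expected rate minus measured processing rate). The function name `gather_rates`, its parameters and the sample rates are hypothetical and do not come from the slides.

```python
def gather_rates(total_in_rate, proc_rates):
    """Return (extra_rate, average_rate) for one invocation.

    total_in_rate: aggregate input rate in tuples/s.
    proc_rates: measured processing rate of each deployed VM in tuples/s.
    """
    n = len(proc_rates)
    if n == 0:
        raise ValueError("at least one VM must be deployed")
    # Assumption: the input stream is split equally across the N VMs.
    expected_per_vm = total_in_rate // n
    # Extra rate: input the VMs were expected to process but did not.
    extra_rate = sum(expected_per_vm - rate for rate in proc_rates)
    # Average rate: mean processing rate actually achieved per VM.
    average_rate = sum(proc_rates) / n
    return extra_rate, average_rate


# Hypothetical sample: 3000 tuples/s arriving at three VMs.
extra, average = gather_rates(3000, [1000, 800, 900])
print(extra, average)
```

A positive extra rate means that some tuples were not processed, which is the condition the decision stage checks.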
Adaptive Cloud Stream Processing Algorithm II
[Flowchart: inputs Extra Rate and Average Rate; decision "Extra Rate > 0?" with Yes (scale up) and No (scale down) branches. Labelled elements: Σ, /, N, Store Average Rate, / Input Rate, Return, N’.]
• Decision stage
  – Calculates the new number of machines (N’)
  – Scale up
    • Stores the average rate as the maximum average rate
  – Scale down
    • Uses the last maximum average rate
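The decision stage can be illustrated with a short Python sketch. It is not the authors' implementation: the slides do not spell out the arithmetic, so the sketch assumes that scaling up adds enough VMs to absorb the extra rate at the observed average rate, and that scaling down divides the input rate by the last stored maximum average rate. The function `decide_vm_count` and its parameters are hypothetical names.

```python
import math


def decide_vm_count(total_in_rate, n, extra_rate, average_rate, max_average_rate):
    """Return (new_n, max_average_rate) for one invocation.

    max_average_rate is the per-VM rate stored by the last scale-up,
    or None if the system has never been observed in overload.
    """
    if extra_rate > 0:
        if average_rate <= 0:
            raise ValueError("average_rate must be positive in overload")
        # Overloaded: the VMs are saturated, so the observed average rate
        # is stored as the maximum rate a single VM can sustain.
        max_average_rate = average_rate
        # Assumption: add enough VMs to absorb the unprocessed input.
        new_n = n + math.ceil(extra_rate / max_average_rate)
    elif max_average_rate is not None:
        # No overload: size the deployment from the last stored maximum.
        new_n = max(1, math.ceil(total_in_rate / max_average_rate))
    else:
        # No overload seen yet, so there is no capacity estimate to use.
        new_n = n
    return new_n, max_average_rate


# Hypothetical overload: 300 tuples/s unprocessed at 900 tuples/s per VM.
n1, stored = decide_vm_count(3000, 3, 300, 900.0, None)
# Later the input drops to 1500 tuples/s with no unprocessed input.
n2, stored = decide_vm_count(1500, n1, 0, 375.0, stored)
print(n1, n2, stored)
```

Storing the rate only under overload reflects the slide's point that the average rate is a trustworthy capacity estimate only when the VMs are saturated.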
Fig. 5. Increase in throughput with different instance sizes on Amazon EC2. (Different shades/colours correspond to different VMs.)
Fig. 6. Elastic DSPS with query partitioning across processing VMs
ture itself does not increase end-to-end latency significantly. Therefore, it is preferable to deploy stream processing engines at cloud sites within close network proximity to sources.
Second, it is important to consider that jitter suffers from high outliers that can be orders of magnitude above the average. Typically systems compensate for jitter through buffering or discarding of late-arriving data. In our experiments, discarding delayed data items would have resulted in a small percentage of lost data (approx. 3%).
In summary, when deploying a DSPS in a public cloud, it is necessary to understand the trade-offs when scaling to different numbers of VMs. A challenging issue is to decide on the right number of VMs and their instance types to support a given stream processing workload. After deployment, it is necessary to monitor the performance of processing VMs, and if they show decreasing throughput, to scale out to more VMs.
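As an illustration of the discarding strategy, the following is a minimal Python sketch, assuming that each item carries a measured end-to-end delay in milliseconds and that the operator chooses a fixed deadline. The function `drop_late_items`, the deadline and the sample delays are hypothetical and are not measurements from the experiments.

```python
def drop_late_items(delays_ms, deadline_ms):
    """Keep items that arrive within the deadline and report the loss fraction."""
    if not delays_ms:
        return [], 0.0
    on_time = [d for d in delays_ms if d <= deadline_ms]
    lost_fraction = (len(delays_ms) - len(on_time)) / len(delays_ms)
    return on_time, lost_fraction


# Hypothetical delays with one outlier far above the typical value.
delays = [12, 15, 11, 14, 950, 13, 12, 16, 13, 14]
kept, lost = drop_late_items(delays, deadline_ms=50)
print(len(kept), lost)
```

The deadline trades data loss against latency: a tighter deadline bounds the delay of emitted results but discards more of the outliers.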
IV. ADAPTIVE CLOUD STREAM PROCESSING
We now present an adaptive algorithm to scale the number of VMs required to deploy a DSPS in the cloud. Our goal is to build an elastic stream processing system that resizes the number of VMs in response to input stream rates. The aim is to maintain low latency at a given throughput, while keeping VMs operating at their maximum processing capacity. We assume that a workload can be partitioned among multiple VMs, balancing streams equally across them. We also assume that there are always sufficiently many VMs available to scale up to workload demands.
Algorithm 1 Adaptive provisioning of a cloud-based DSPS
Require: totalInRate, N, maxRatePerVM
Ensure: N′ s.t. projRatePerVM × N′ = totalInRate
1: expRatePerVM = ⌊totalInRate/N⌋
2: totalExtraRateForVMs = 0; totalProcRate = 0
3: for all deployed VMs do

As shown in Fig. 6, we assume that the DSPS executes a query, which can be decomposed across multiple VMs by splitting the query into sub-queries, each processing a sub-stream on a given engine. The input stream can be equally partitioned into sub-streams. For example, queries that compute aggregate and topK functions are naturally decomposable in this fashion. The results from sub-queries are then sent to a collector that merges them by executing another sub-query, emitting the overall query result. We further assume that load shedding is employed by the DSPS in overloaded conditions to sustain low-latency processing.
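The decomposition of a topK query into sub-queries and a collector can be sketched in Python. This is an illustrative sketch only, assuming a topK query over the largest values and round-robin partitioning. The function names (`partition`, `sub_query_top_k`, `collector_top_k`) and the sample stream are hypothetical and are not part of the DSPS described here.

```python
import heapq


def partition(stream, n):
    """Split a stream into n sub-streams of almost equal size, round-robin."""
    return [stream[i::n] for i in range(n)]


def sub_query_top_k(sub_stream, k):
    """Engine side: the k largest values of one sub-stream."""
    return heapq.nlargest(k, sub_stream)


def collector_top_k(partial_results, k):
    """Collector side: merge the partial top-k lists into the overall top-k."""
    merged = [value for partial in partial_results for value in partial]
    return heapq.nlargest(k, merged)


# Hypothetical input stream split across three engines.
stream = [7, 42, 3, 19, 88, 5, 61, 23, 10]
partials = [sub_query_top_k(s, 3) for s in partition(stream, 3)]
print(collector_top_k(partials, 3))
```

The merge is exact because every value in the overall top-k must also be in the top-k of its own sub-stream, so the collector only needs the partial results, not the full sub-streams.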
Our proposed provisioning algorithm uses a black-box approach, i.e. it is independent of the specifics of queries running in the DSPS. It scales the number of VMs used solely based on measurements of input stream rates. It detects an overload condition when a decrease in the processing rate of input data occurs because of discarded data tuples due to load shedding. The algorithm is invoked periodically and calculates the new number of VMs that are needed to support the current workload demand. This number can be larger than (when the system is overloaded and requires more resources), smaller than (when the system has spare capacity) or equal to the current number of engines. The aim is to maintain the required number of VMs, operating almost at their maximum capacity.
A. Algorithm
We present the provisioning algorithm more formally in Alg. 1. The algorithm takes as input the aggregate rate of the input stream, totalInRate, and the number of VMs currently used by the DSPS, N. It also takes maxRatePerVM, which is the maximum rate that a single VM can process, from previous invocations based on measurements in overload conditions. The algorithm takes a conservative approach, in which it gradually increases the number of VMs to reach the required set for sustainable processing. The output of the algorithm is the number of VMs, N′, that is needed to sustain totalInRate. In this case, totalInRate is divided equally among VMs and each handles projRatePerVM.
The algorithm initially estimates the stream rate each VM