San Jose State University
SJSU ScholarWorks
Master's Projects
Master's Theses and Graduate Research
Spring 5-16-2019

Predictive Analysis for Cloud Infrastructure Metrics
Paridhi Agrawal, San Jose State University

Follow this and additional works at: https://scholarworks.sjsu.edu/etd_projects
Part of the Artificial Intelligence and Robotics Commons, the Databases and Information Systems Commons, and the OS and Networks Commons

Recommended Citation
Agrawal, Paridhi, "Predictive Analysis for Cloud Infrastructure Metrics" (2019). Master's Projects. 672.
DOI: https://doi.org/10.31979/etd.pyt6-p9j5
https://scholarworks.sjsu.edu/etd_projects/672

This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks. For more information, please contact [email protected].
If RMSE is zero, then there are no errors in the predictions.
In this research, functions from Python's sklearn.metrics library were used to compute the error metrics for the experiments.
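As a concrete illustration (a sketch, not code from the project), both metrics can be computed with sklearn.metrics as follows; the array values are hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted CPU-utilization values.
actual = np.array([42.0, 45.5, 50.1, 47.3])
predicted = np.array([41.2, 46.0, 49.0, 48.1])

mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))  # RMSE is the square root of MSE
print("MAE: %.3f, RMSE: %.3f" % (mae, rmse))
```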
In the next chapter, the implementation methodology is discussed along with an overview of the workflow diagram.
CHAPTER 4
Overview of the Implementation Design
This chapter provides an overview of the experimentation design developed for this research. The objective is to perform predictive analysis on cloud infrastructure metrics in order to forecast upcoming application workload or resource demands. The result of this analysis can be used by resource provisioning systems to make timely provisioning decisions. An overall system-level design can be seen in Figure 14.
Figure 14: System level design
4.1 Implementation Methodology
In this section, a detailed overview is presented of how the prediction model for the resource provisioning system is implemented.
The general steps to implement a forecasting model are:
1. Loading the data: Load the collected dataset into the system. In this research, real-world workload time-series data for Autodesk's ACM and OSS services is used.
2. Data Pre-processing: Clean and pre-process the time series. Typical pre-processing steps include extracting a uni-variate series, obtaining the timestamps, and converting them to the datatype required by the model.
3. Time series analysis: In this step, we analyze the given time series to identify and remove trend, seasonality, and noise, forming a stationary time series.
(a) Stationarity test: It is easier to make accurate predictions for a stationary time series. Therefore, it is essential to perform a stationarity test on the given series. If the series fails the test, we proceed to stationarize it.
(b) Stationarize the series: In this step we stationarize the series using the techniques discussed in Chapter 3. The added benefit of this step is that we can determine the d value (integration order) if differencing is used to stationarize the series.
4. Generate ACF and PACF plots: Plotting the ACF and PACF graphs can help in determining the orders for the forecasting model. This step also provides insights into the time series.
5. Identify the values for the model orders p and q: Using the ACF and PACF graphs, identify whether the series is AR or MA and determine the p and q values for building the forecast model.
6. Build the forecast model: Build a time-series forecasting model based on the parameters determined in the previous step. This step also involves splitting the processed data into training and validation sets. The model is fitted on the training set.
7. Forecast using the validation set: In this step, we test the forecast model on the validation set and predict future values of the time series.
8. Evaluate the results: To verify model performance, evaluate the prediction results using the MAE and RMSE metrics. If the results are poor, repeat steps 4 through 8 with hyper-parameter tuning.
The discussed methodology is depicted in the workflow diagram in Figure 15.
In this research, the forecasting models are implemented using Python's statsmodels.tsa library, which provides a rich set of modules for performing time-series analysis.
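To make these steps concrete, the following is a minimal sketch of steps 1 through 7, assuming the metrics have been exported to a hypothetical CSV file named cpu_utilization.csv with timestamp and value columns. It uses the statsmodels 0.x API that was current for Python 2.7; newer releases move ARIMA to statsmodels.tsa.arima.model.

```python
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_model import ARIMA  # older API; see note above

# Steps 1-2: load and pre-process (file name and column names are assumptions).
df = pd.read_csv("cpu_utilization.csv", parse_dates=["timestamp"], index_col="timestamp")
series = df["value"].astype(float)

# Step 3: ADF stationarity test; a large p-value suggests the series is non-stationary.
if adfuller(series)[1] > 0.05:
    series = series.diff().dropna()  # difference once, i.e., d = 1

# Step 4: inspect ACF/PACF plots to choose p and q.
plot_acf(series, lags=40)
plot_pacf(series, lags=40)

# Steps 5-7: split 70:30, fit, and forecast (the order shown is a placeholder).
split = int(len(series) * 0.7)
train, test = series[:split], series[split:]
model = ARIMA(train, order=(2, 1, 0)).fit(disp=0)
forecast = model.forecast(steps=len(test))[0]
```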
Figure 15: Implementation workflow
CHAPTER 5
Experiments and Results
This chapter covers the experimental setup, details of the experiments performed,
and their results. The purpose of this research was to perform analysis of cloud
infrastructure metrics to understand the following:
• Can we accurately predict future workload/resource demands for a cloud-hosted service?
• Can predictive analysis on infrastructure metrics help in preventing SLA violations?
These questions are answered by the experimental results. The techniques used to perform predictive analysis on the cloud metrics are ARIMA, MA, and AR.
5.1 Experimental Design
Each cloud infrastructure metric time series is analyzed and forecasted using the above-mentioned techniques. We limit the analysis to two sets of experiments per
series and display the results. In the first set of experiments, we analyze the time
series for stationarity and forecast the series using forecasting models. In the next set
of experiments, we apply stationarity techniques to the given time series and get the
forecasting results. In each set of experiments, we use RMSE and MAE metrics to
evaluate the prediction results.
Finally, we compare the results of each forecasting technique to understand which
time series model best applies to the given time series.
5.2 Experimental requirements
The requirements to implement the dynamic resource provisioning model are as
follows:
• An AWS cloud-hosted application.
• A virtual machine running a Unix operating system.
• Load-testing scripts, to run load tests on the application's cloud infrastructure.
• AWS CloudWatch, to gather metrics data.
• Python version 2.7 installed.
• The statsmodels.tsa and sklearn.metrics libraries, to implement time-series forecasting techniques on the collected data.
5.3 Experiments
In this section, we discuss the predictive analysis performed on the following infrastructure metrics for the ACM and OSS services.
5.3.1 Analysis on ACM CPU-utilization metrics:
The ACM CPU utilization dataset is an hourly time series; Figure 16 is a visualization of this series.

Figure 16: ACM CPU utilization workload

The ACM series was tested for stationarity using the rolling statistics plot and the ADF and KPSS techniques. Results for these tests are given in Figure 17, Figure 18, and Figure 19.
Figure 17: ACM CPU utilization Rolling statistics plot
Figure 18: ACM CPU utilization ADF test
Both the ADF (Fig. 18) and KPSS (Fig. 19) tests show that the series is stationary, as the test statistic value is smaller than the 1% critical value. Thus, we can say with 99% confidence that the time series is stationary. The rolling statistics plot in Figure 17, however, shows some trend in the series, as the variance changes over time. The following experiments were performed to stationarize the series and forecast future values.
Figure 19: ACM CPU utilization KPSS test
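For reference, the three checks can be run along the following lines. This is a sketch assuming series is a pandas Series holding the hourly CPU metrics; the 24-hour rolling window is an assumed choice.

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss

# Rolling statistics: plot the mean and standard deviation over a sliding window.
window = 24  # hours; the window size is an assumption for this hourly series
series.plot(label="original")
series.rolling(window).mean().plot(label="rolling mean")
series.rolling(window).std().plot(label="rolling std")
plt.legend()
plt.show()

# ADF: the null hypothesis is a unit root (non-stationary).
adf_result = adfuller(series)
print("ADF statistic: %.3f, critical values: %s" % (adf_result[0], adf_result[4]))

# KPSS: the null hypothesis is stationarity, the reverse of ADF.
kpss_result = kpss(series)
print("KPSS statistic: %.3f, critical values: %s" % (kpss_result[0], kpss_result[3]))
```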
Details of the experiments are given below.
1. Without applying any transformation: Since the series is stationary, it
can be used for forecasting directly. This experiment serves as the baseline for
the predictive analysis.
• Analyzing ACF and PACF plots: The ACF and PACF plots in Figure 20 show that this is a pure AR series, as it has a slowly decaying ACF and the PACF cuts off after lag = 2. The shaded region of the ACF and PACF plots denotes the upper and lower bounds of the 95% confidence interval. Thus, p = 2 and q = 0. Using these model parameters, the forecasting models were built. The series is split into training and test sets in a 70:30 ratio; Table 1 shows the split.

Figure 20: ACF and PACF for ACM CPU utilization
Table 1: Data Split

Data          Number of intervals used
Training Set  672
Testing Set   269
• Forecasting results:
(a) AR model: The AR model was developed using the statsmodels.tsa.ar_model Python 2.7 library. The model fitted on the training data is validated by making predictions on the test data, with a lag window of p = 2. Figure 21 shows the results from this model, and Table 2 highlights the model performance against the evaluation metrics.

Figure 21: AR model predictions results
Table 2: Prediction results for AR

Evaluation Metric  Value
RMSE               5.838
MAE                1.458
(b) MA model: The MA model was developed using the statsmodels.tsa Python 2.7 library. The model fitted on the training data is validated by making predictions on the test data; a lag window of q = 3 was chosen for the experiment. Figure 22 shows the results from this model.
Figure 22: MA model predictions results
Table 3 highlights the model performance against the evaluation metrics.
Table 3: Prediction results for MA

Evaluation Metric  Value
RMSE               3.425
MAE                0.866
(c) ARIMA model: The ARIMA model was developed using the statsmodels.tsa.arima_model Python 2.7 library. The model fitted on the training data is validated by making predictions on the test data. The model parameters used were p = 2, d = 1, q = 0. Figure 23 shows the results from this model, and Table 4 highlights the model performance against the evaluation metrics.

Figure 23: ARIMA model predictions results
Table 4: Prediction results for ARIMA

Evaluation Metric  Value
RMSE               6.822
MAE                1.473
All the models were able to forecast the future workload and showed satisfactory results, as RMSE and MAE values closer to 0 are considered good prediction results. De-trending the data and removing seasonality will help to improve the results further. (A combined code sketch of both experiments appears after Figure 29.)
2. Using log transform: To reduce the effect of trend and seasonality, a log transform is applied to the time series. The stationarity tests for the resulting series are shown in Figure 24 and Figure 25. The data split remains the same as in the previous experiment.
Figure 24: ADF test on log transformed CPU metrics
Figure 25: KPSS test on log transformed CPU metrics
• Analyzing ACF and PACF plots: The ACF and PACF plots in Figure 26 show that this is still a pure AR series, as it has a slowly decaying ACF and the PACF cuts off after lag = 2. Thus, p = 2 and q = 0. Using these model parameters, the forecasting models were built. The series is split into training and test sets in a 70:30 ratio; Table 5 shows the split.
Figure 26: ACF and PACF for log transformed CPU utilization
Table 5: Data Split

Data          Number of intervals used
Training Set  672
Testing Set   269
• Forecasting results:
(a) AR model: The AR model was developed using the statsmodels.tsa.ar_model Python 2.7 library. The model fitted on the training data is validated by making predictions on the test data, with a lag window of p = 2. Figure 27 shows the results from this model, and Table 6 highlights the model performance against the evaluation metrics.
Figure 27: AR model predictions results
Table 6: Prediction results for AR

Evaluation Metric  Value
RMSE               0.418
MAE                0.211
(b) MA model: The MA model was developed using the statsmodels.tsa Python 2.7 library. The model fitted on the training data is validated by making predictions on the test data; a lag window of q = 5 was chosen for the experiment. Figure 28 shows the results from this model.
Table 7 highlights the model performance against the evaluation metrics.
Table 7: Prediction results for MA

Evaluation Metric  Value
RMSE               0.425
MAE                0.235
Figure 28: MA model predictions results
(c) ARIMA model: The ARIMA model was developed using the statsmodels.tsa.arima_model Python 2.7 library. The model fitted on the training data is validated by making predictions on the test data. The model parameters used were p = 2, d = 1, q = 0. Figure 29 shows the results from this model, and Table 8 highlights the model performance against the evaluation metrics.
Table 8: Prediction results for ARIMA

Evaluation Metric  Value
RMSE               0.415
MAE                0.197
The evaluation results for all the models show a tremendous improvement. This is because the log transform stabilizes the time series, which helps the forecasting models make more accurate predictions.
Figure 29: ARIMA model predictions results
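To make both experiments concrete, the following is a rough combined sketch under the same old-API assumptions as before (statsmodels 0.x on Python 2.7), with train and test taken as NumPy arrays from the 70:30 split in Table 1. Note that if the experiment-2 errors are computed on the log scale, they are not directly comparable to the baseline values.

```python
import numpy as np
from statsmodels.tsa.ar_model import AR
from statsmodels.tsa.arima_model import ARMA, ARIMA

# Experiment 1: fit the three models directly on the raw series.
ar_pred = AR(train).fit(maxlag=2).predict(start=len(train),
                                          end=len(train) + len(test) - 1)
ma_pred = ARMA(train, order=(0, 3)).fit(disp=0).forecast(steps=len(test))[0]
arima_pred = ARIMA(train, order=(2, 1, 0)).fit(disp=0).forecast(steps=len(test))[0]

# Experiment 2: log-transform first (assumes strictly positive values), then
# invert the transform to report predictions on the original scale.
log_train = np.log(train)
log_forecast = ARIMA(log_train, order=(2, 1, 0)).fit(disp=0).forecast(steps=len(test))[0]
forecast = np.exp(log_forecast)
```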
The results from this experiment successfully answer the project goals. It can be seen from the prediction plots that we can forecast the future resource workload for any given infrastructure metric by performing predictive analysis with the above machine learning techniques.
5.4 Comparison of Forecast models
The results from the models can be compared to see which model performs best overall. Figure 30 shows the RMSE results of ARIMA vs. MA vs. AR for both experiments. It can be seen that AR and ARIMA differ only slightly and both give accurate prediction results overall. Each of the models performed better after log transformation, and the MA model performed better than the other models in the baseline experiment. Figure 31 compares the MAE results of these models; the relative ranking remains the same. Overall, ARIMA had the least error in experiment 2 and can be used as the machine learning technique to perform predictive analysis on cloud infrastructure metrics.
Figure 30: Comparison of model RMSE results
Figure 31: Comparison of model MAE results
Since these models were able to accurately predict the future workload, the predicted values can be provided to a resource provisioning system to make timely decisions about when to trigger the resource provisioning task, as sketched below. Hence, this can lead to reduced SLA violations.
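As a hypothetical illustration of how a provisioning system might consume such a forecast (the threshold value and the scale_out function are assumptions, not part of this project's implementation):

```python
# Hypothetical glue between the forecaster and a provisioning system;
# forecast holds the model's predicted utilization for upcoming intervals.
CPU_THRESHOLD = 80.0  # percent utilization assumed to risk an SLA violation

def scale_out():
    # Placeholder: in practice this would call an autoscaling API.
    print("Triggering resource provisioning ahead of predicted demand.")

if max(forecast) > CPU_THRESHOLD:
    scale_out()
```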
Additional experiments are available in Appendix A.
CHAPTER 6
Conclusion
The dynamic nature of the cloud poses the biggest challenge in resource management in cloud computing. Historically, both static and dynamic approaches have been successful in the cloud industry. Though reactive approaches tend to be used commonly throughout the industry, they fail to capture the heterogeneity of the cloud infrastructure [10]. The literature agrees that pro-active resource provisioning, being dynamic, overcomes the problem of over-utilization and guarantees QoS with minimal SLA violations. The experiments in this work extract real-world resource utilization traces, test for stationarity, stationarize the series, and apply forecasting techniques to predict resource utilization. These experiments showcase that machine learning techniques such as time-series forecasting can be leveraged to develop a pro-active resource provisioning system, as they allow us to make accurate resource demand predictions. Finally, machine learning techniques for time-series prediction need to be explored further and adopted industry-wide. Their impact can be meaningful in this research area, as they help to understand the inter-connections between an application's past workload and its current QoS requirements. Such a solution will help in balancing the SLA obligations of the cloud provider against the QoS requirements of the cloud user.
Future work on this research could apply supervised learning and deep learning techniques to predict future workload. The Long Short-Term Memory (LSTM) neural network is designed to interpolate hidden patterns in a long sequence of observations [30]. LSTMs can be used to model time-series data and help uncover hidden patterns in the series. The time-series forecasting methods used here are unsupervised learning techniques; in the future, supervised learning techniques such as SVMs could also be leveraged to model temporal data and make meaningful predictions.
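As a pointer for this future work, the following is a minimal sketch of an LSTM forecaster using Keras; the library choice, window length, and layer sizes are all assumptions and were not part of this project.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def make_windows(values, window=24):
    # Turn a 1-D series into (samples, window, 1) inputs and next-step targets.
    X = np.array([values[i:i + window] for i in range(len(values) - window)])
    y = values[window:]
    return X[..., np.newaxis], y

X, y = make_windows(series.values)  # series: the utilization data used earlier

model = Sequential([
    LSTM(32, input_shape=(X.shape[1], 1)),  # 32 units is an arbitrary choice
    Dense(1),                                # predict the next utilization value
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```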
LIST OF REFERENCES
[1] F. P. Miller, A. F. Vandome, and J. McBrewster, Amazon Web Services. Alpha Press, 2010.
[2] H. Fernandez, G. Pierre, and T. Kielmann, "Autoscaling web applications in heterogeneous cloud infrastructures," in 2014 IEEE International Conference on Cloud Engineering, March 2014, pp. 195-204.
[3] C. Wu, Y. Lee, K. Huang, and K. Lai, "A framework for proactive resource allocation in IaaS clouds," in 2017 International Conference on Applied System Innovation (ICASI), May 2017, pp. 496-499.
[4] H. S. Guruprasad and B. H. Bhavani, "Resource provisioning techniques in cloud computing environment: A survey," International Journal of Research in Computer and Communication Technology, vol. 3, pp. 395-401, 03 2014.
[5] N. Roy, A. Dubey, and A. Gokhale, "Efficient autoscaling in the cloud using predictive models for workload forecasting," in 2011 IEEE 4th International Conference on Cloud Computing, July 2011, pp. 500-507.
[6] F. J. Baldán, S. Ramírez-Gallego, C. Bergmeir, F. Herrera, and J. M. Benítez, "A forecasting methodology for workload forecasting in cloud systems," IEEE Transactions on Cloud Computing, vol. 6, no. 4, pp. 929-941, Oct 2018.
[7] Y. Hu, B. Deng, and F. Peng, "Autoscaling prediction models for cloud resource provisioning," in 2016 2nd IEEE International Conference on Computer and Communications (ICCC), Oct 2016, pp. 1364-1369.
[8] W. Iqbal, M. Dailey, and D. Carrera, "SLA-driven adaptive resource management for web applications on a heterogeneous compute cloud," in Proceedings of the 1st International Conference on Cloud Computing, ser. CloudCom '09. Berlin, Heidelberg: Springer-Verlag, 2009, pp. 243-253. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-10665-1_22
[9] "1998 World Cup web site access logs," http://ita.ee.lbl.gov/html/contrib/WorldCup.html.
[10] S. Islam, J. Keung, K. Lee, and A. Liu, "Empirical prediction models for adaptive resource provisioning in the cloud," Future Gener. Comput. Syst., vol. 28, no. 1, pp. 155-162, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1016/j.future.2011.05.027
[11] C. Vecchiola, R. N. Calheiros, D. Karunamoorthy, and R. Buyya, "Deadline-driven provisioning of resources for scientific applications in hybrid clouds with Aneka," Future Gener. Comput. Syst., vol. 28, no. 1, pp. 58-65, Jan. 2012. [Online]. Available: http://dx.doi.org/10.1016/j.future.2011.05.008
[12] C. Li and L. Y. Li, "Optimal resource provisioning for cloud computing environment," J. Supercomput., vol. 62, no. 2, pp. 989-1022, Nov. 2012. [Online]. Available: http://dx.doi.org/10.1007/s11227-012-0775-9
[13] B. Stone, "Inauguration day on Twitter," https://blog.twitter.com/official/en_us/a/2009/inauguration-day-on-twitter.html, 2009.
[14] R. Hu, J. Jiang, G. Liu, and L. Wang, "Efficient resources provisioning based on load forecasting in cloud," The Scientific World Journal, vol. 2014, p. 321231, 02 2014.
[15] M. Mahjoub, A. Mdhaffar, R. B. Halima, and M. Jmaiel, "A comparative study of the current cloud computing technologies and offers," in Proceedings of the 2011 First International Symposium on Network Cloud Computing and Applications, ser. NCCA '11. Washington, DC, USA: IEEE Computer Society, 2011, pp. 131-134. [Online]. Available: https://doi.org/10.1109/NCCA.2011.28
[16] C. Bunch, V. Arora, N. Chohan, C. Krintz, S. Hegde, and A. Srivastava, "A pluggable autoscaling service for open cloud PaaS systems," in Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing, ser. UCC '12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 191-194. [Online]. Available: http://dx.doi.org/10.1109/UCC.2012.12
[17] G. E. P. Box and G. Jenkins, Time Series Analysis, Forecasting and Control. San Francisco, CA, USA: Holden-Day, Inc., 1990.
[18] P. A. Dinda and D. R. O'Hallaron, "An evaluation of linear models for host load prediction," in Proceedings of the Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469), Aug 1999, pp. 87-96.
[19] "Autodesk Inc.," www.autodesk.com, (Accessed on 04/26/2019).
[23] Y. Cui, Q. Chen, and J. Yang, "Automatic in-vivo evolution of kernel policies for better performance," CoRR, vol. abs/1508.06356, 2015.
[24] A. Jain, "A comprehensive beginner's guide to create a time series forecast," https://www.analyticsvidhya.com/blog/2016/02/time-series-forecasting-codes-python/, 02 2016, (Accessed on 04/06/2019).
[26] D. Kwiatkowski, P. Phillips, P. Schmidt, and Y. Shin, "Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?" Journal of Econometrics, vol. 54, pp. 159-178, 10 1992.
[27] J. Salvi, "Significance of ACF and PACF plots in time series analysis," https://towardsdatascience.com/significance-of-acf-and-pacf-plots-in-time-series-analysis-2fa11a5d10a8.
[28] R. Vink, "Algorithm breakdown: AR, MA and ARIMA models," https://www.ritchievink.com/blog/2018/09/26/algorithm-breakdown-ar-ma-and-arima-models/, 09 2018, (Accessed on 02/28/2019).
[30] N. Sinha, "Understanding LSTM," https://towardsdatascience.com/understanding-lstm-and-its-quick-implementation-in-keras-for-sentiment-analysis-af410fd85b47, (Accessed on 04/19/2019).