Research Article
An Anomaly Detection Algorithm of Cloud Platform Based on Self-Organizing Maps
Jun Liu,1 Shuyu Chen,2 Zhen Zhou,1 and Tianshu Wu1

1College of Computer Science, Chongqing University, Chongqing 400044, China
2College of Software Engineering, Chongqing University, Chongqing 400044, China

Correspondence should be addressed to Jun Liu; liujuncqcs@163.com

Received 24 November 2015; Accepted 6 March 2016

Academic Editor: Yassir T. Makkawi

Copyright © 2016 Jun Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Virtual machines (VM) on a Cloud platform can be influenced by a variety of factors which can lead to decreased performance and downtime, affecting the reliability of the Cloud platform. Traditional anomaly detection algorithms and strategies for Cloud platforms have some flaws in their accuracy of detection, detection speed, and adaptability. In this paper, a dynamic and adaptive anomaly detection algorithm based on Self-Organizing Maps (SOM) for virtual machines is proposed. A unified modeling method based on SOM to detect the machine performance within the detection region is presented, which avoids the cost of modeling a single virtual machine and enhances the detection speed and reliability of large-scale virtual machines in the Cloud platform. The important parameters that affect the modeling speed are optimized in the SOM process to significantly improve the accuracy of the SOM modeling and therefore the anomaly detection accuracy of the virtual machine.
1 Introduction
As Cloud computing applications become increasingly mature, more and more industries and enterprises are deploying increasing numbers of applications within Cloud platforms in order to improve efficiency and provide on-demand services where resources are limited. Virtual machines for computing and resource storage are core to a Cloud platform and are essential to ensure the normal operation of various businesses [1, 2]. However, as the number of applications increases, the scale of the Cloud platform is expanding. Resource competition, resource sharing, and load balancing within the Cloud platform reduce the stability of virtual machines, which leads directly to a decrease in the reliability of the entire Cloud platform [3–7]. Therefore, anomaly detection of virtual machines is an important method for durable and reliable operation of a Cloud platform.
At present, the main methods of virtual machine anomaly detection on Cloud platforms are to collect system operation logs and various performance metrics of the virtual machine status and then determine the anomaly using anomaly detection methods such as statistics, clustering, classification, and nearest neighbor.
The statistical anomaly detection method is based on a probabilistic model and makes certain assumptions about the data [8]. However, in real Cloud platforms the distribution of the data is usually unpredictable, which means that the statistics-based method has low detection rates and thus may be unsuitable. Clustering-based methods group similar virtual machine states together and consider any states which are distant from the cluster center to be abnormal [9, 10]. Since this method does not need a priori knowledge of the data distribution, its accuracy is better than that of the statistics-based method. However, it is difficult to choose a reasonable clustering algorithm in clustering-based methods. Self-Organizing Maps (SOM) [11, 12], k-means [13, 14], and expectation maximization [15] are three commonly used clustering algorithms in anomaly detection. The classification-based algorithms mainly include neural networks [16, 17], Bayesian networks [18, 19], and support vector machines [20–22]. The main drawback of these algorithms is the high training cost and the complexity of the implementation. The neighbor-based algorithm detects anomalies based on clustering or the similarity of the data. However, the main disadvantage of this algorithm is that the
Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2016, Article ID 3570305, 9 pages, http://dx.doi.org/10.1155/2016/3570305
recognition rate decreases when the normal dataset that is being detected does not have enough neighbors.
In a Cloud environment, the performance and running status of a virtual machine are represented mainly by performance metrics. The performance metrics include five primary metric groups: CPU, memory, disk, network, and process [23]. These metrics can determine whether a virtual machine is abnormal. Reference [23] has a more detailed explanation of the performance metrics of virtual machines.
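As an illustration of how such metrics can be assembled for detection, the following sketch orders one raw sample of the five metric groups into a fixed-length, min-max-normalized vector; the raw values and scaling bounds are hypothetical, not taken from the paper.

```python
# Illustrative sketch: represent one VM status sample as a normalized
# performance vector SS over the five primary metric groups named above.
# The raw values and scaling bounds are hypothetical.

METRICS = ["cpu", "memory", "disk", "network", "process"]

def to_vector(sample):
    """Order a raw metric sample (dict) into the fixed-length vector SS."""
    return [float(sample[m]) for m in METRICS]

def normalize(ss, lo, hi):
    """Min-max scale each component so metrics with different units are comparable."""
    return [(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(ss, lo, hi)]

raw = {"cpu": 42.0, "memory": 63.0, "disk": 18.0, "network": 120.0, "process": 85.0}
ss = normalize(to_vector(raw), lo=[0] * 5, hi=[100, 100, 100, 1000, 500])
print(ss)  # every component now lies in [0, 1]
```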
This paper proposes an SOM-based anomaly detection algorithm, which is based on determining the various performance metrics of each virtual machine. This algorithm differs from traditional strategies in that the virtual machines with similar running environments are divided into detection domains and each domain is trained iteratively in the SOM network. This enables reasonable adaptation of the training to the large-scale virtual machines in the Cloud platform and overcomes the shortcoming of traditional methods, where each virtual machine is treated as a separate training sample. In addition, two important parameters of the SOM network training are optimized, which greatly reduces the training time of the SOM network on the performance metrics of the virtual machines and enhances the efficiency and accuracy of anomaly detection of the virtual machines in the Cloud platform.
Various experiments were conducted in order to verify the efficiency and accuracy of the SOM-based anomaly detection algorithm. The results show that the sample training speed and the detection accuracy are significantly improved by the proposed algorithm.
The rest of this paper is organized as follows. Section 2 describes existing anomaly detection methods. Section 3 describes the SOM-based virtual machine anomaly detection algorithm. Section 4 shows the performance evaluation. Finally, Section 5 lists the conclusions derived from the experiments.
2 Related Work
Current anomaly detection methods are mainly based on classification, clustering, statistics, and nearest neighbor methods [24]. These methods will now be introduced.
The classification-based method obtains a classifier model from a set of selected data and then uses the model to classify new data [25, 26]. Shin and Kim proposed a hybrid classification method that combines the One-Class SVM [27, 28] with the nearest mean classifier (NMC) [29]; a highly flexible nonlinear correlation model can be easily classified by the nonlinear kernel function in this method [30–32]. This method introduces a feature subset selection algorithm, which not only reduces the number of classification dimensions but also improves the performance of the classifier. However, its main disadvantages are slow training and the potential for misclassification.
The clustering-based method is an unsupervised learning method [26, 33]. SNN [34], ROCK [35], and DBSCAN [36] are three typical clustering-based anomaly detection methods. All three methods assume that normal samples are within a single cluster within the dataset and abnormal samples are outside any cluster. However, if a cluster is formed by the anomalous data after a period of clustering, then the anomalies cannot be recognized properly. Additionally, it is important for the clustering that the width of the cluster is accurately selected. The advantages of the clustering-based approach are that a priori knowledge of the data distribution is not required and that it can be used for incremental modeling. For example, for anomaly detection of virtual machines, a newly collected virtual machine sample can be analyzed by a model already trained for anomaly detection.
A typical nearest neighbor based approach is proposed by Breunig et al. [37], using a local outlier factor for data abnormality detection. Any data point that requires analysis is associated with a local outlier factor, which is the ratio of the average local density of the k nearest neighbors to the local density of the data point itself. The local density is the volume of the data-centric sphere containing the k nearest neighbors divided by k. If a data point is abnormal, then its local density should be significantly different from the local density of its nearest neighbors.
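The local-outlier-factor idea described above can be sketched as follows. This is a deliberately simplified illustration (plain inverse-mean-distance densities rather than Breunig et al.'s full reachability-distance formulation), with toy 2-D points.

```python
# Simplified sketch of the local-outlier-factor idea: density is taken as the
# inverse of the mean distance to the k nearest neighbors, and the outlier
# factor is the ratio of the neighbors' average density to the point's own.
import math

def knn(points, p, k):
    """The k points nearest to p (excluding p itself)."""
    return sorted((q for q in points if q != p),
                  key=lambda q: math.dist(p, q))[:k]

def density(points, p, k):
    neighbors = knn(points, p, k)
    mean_d = sum(math.dist(p, q) for q in neighbors) / k
    return 1.0 / mean_d if mean_d > 0 else float("inf")

def outlier_factor(points, p, k=3):
    neighbors = knn(points, p, k)
    avg_neighbor_density = sum(density(points, q, k) for q in neighbors) / k
    return avg_neighbor_density / density(points, p, k)

cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
points = cluster + [(8, 8)]               # (8, 8) sits far from the cluster
print(outlier_factor(points, (8, 8)))     # much greater than 1 -> anomalous
print(outlier_factor(points, (0.5, 0.5))) # close to 1 -> normal
```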
The statistics-based approach is an earlier anomaly detection method, which is usually based on the assumption that an anomaly is an observation point that is not generated by an assumed model and is partly or completely irrelevant [37]. Ye and Chen [24, 38] used χ² statistics to detect anomalies in the operating system. Assuming that the normal data under training are subject to a multivariate normal distribution, the statistic is

χ² = Σ_{i=1}^{n} (X_i − E_i)² / E_i,  (1)

where X_i is the observed value of the ith variable, E_i is the expected value of the ith variable (obtained from the training data), and n is the number of variables. A large value of χ² indicates that the observed data deviate significantly from the training profile, that is, an anomaly.
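Equation (1) translates directly into code; the observed and expected values below are hypothetical.

```python
# Direct implementation of the chi-square statistic in (1): X_i are observed
# values, E_i are expected values estimated from normal training data.
# The sample values here are hypothetical.

def chi_square(observed, expected):
    """chi^2 = sum_i (X_i - E_i)^2 / E_i over the n monitored variables."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))

expected = [50.0, 60.0, 30.0]      # E_i learned from normal training data
normal   = [52.0, 58.0, 29.0]      # close to expectation -> small chi^2
abnormal = [95.0, 10.0, 80.0]      # far from expectation -> large chi^2

print(chi_square(normal, expected))    # small value
print(chi_square(abnormal, expected))  # large value, flagged as an anomaly
```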
3 SOM-Based Virtual Machine Anomaly Detection Algorithm

In a Cloud platform, virtual machines with a similar running environment have similar system performance. The SOM-based virtual machine anomaly detection algorithm is aimed at Cloud platforms that have a large number of virtual machines. In this paper, we partition virtual machines with similar running environments; that is, we assign a set of virtual machines with similar properties to the same detection domain. This avoids the requirement for SOM network modeling for every single virtual machine, significantly reducing the modeling time and the training cost. For instance, when the proposed method is not used, 100 SOM network models need to be built for 100 virtual machines; with the proposed method, 100 virtual machines with similar running environments need only one SOM network model. In addition, a SOM network can be trained more accurately by collecting 100 samples than by training using one sample only.
After partition of the virtual machines, SOM network training is used in every domain. In this paper the two most
[Figure 1: SOM anomaly detection logic diagram. The set of detected VMs is partitioned into detection domains based on the similarities of the VM running environments; running environment vectors (RE) are collected for each VM, system performance vectors (SS) are collected for each VM in a detection domain, SOM-based anomaly detection runs per domain, and the results of anomaly detection are produced.]
important parameters, the training width and the learning-rate factor, are optimized to enhance the training speed. The flow chart of the anomaly detection algorithm is shown in Figure 1.
3.1 SOM-Based State Modeling of the Virtual Machine. Because prior knowledge of similar performance for virtual machine classification is unknown, the k-medoids method is used in this paper for the initial classification; that is, the VMs on the Cloud platform are divided into multiple detection domains. The k-medoids method is chosen because, compared with the k-means algorithm, k-medoids is less susceptible to noise.
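The initial partition step can be sketched with a minimal PAM-style k-medoids loop; the 2-D running-environment vectors and the Euclidean distance measure are illustrative assumptions, not the paper's exact setup.

```python
# Minimal k-medoids (PAM-style) sketch for the initial partition of VMs into
# detection domains by running-environment similarity. The 2-D environment
# vectors below are hypothetical.
import math
import random

def assign(points, medoids):
    """Assign every point to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return clusters

def k_medoids(points, k, iters=20, seed=0):
    medoids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = assign(points, medoids)
        # re-pick each medoid as the member with the lowest total in-cluster distance
        new_medoids = [min(c, key=lambda cand: sum(math.dist(cand, q) for q in c))
                       for c in clusters.values() if c]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

envs = [(1, 1), (1.2, 0.9), (0.8, 1.1),      # domain A: similar environments
        (9, 9), (9.1, 8.8), (8.7, 9.2)]      # domain B
medoids, clusters = k_medoids(envs, k=2)
print(medoids)  # one representative VM per detection domain
```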
The SOM network is generated in each detection domain using the SOM algorithm. The network is constructed as a two-dimensional (N × N) neuron array. Each neuron can be represented as n_ij, i, j = 1, 2, 3, …, N, and each neuron is associated with a weight vector, defined as W_ij(w_1, w_2, w_3, …, w_m), where i is the row subscript and j is the column subscript. The dimension m of a weight vector is the same as the dimension of the samples used to train the SOM network. The training set used in this paper includes the CPU utilization, which reflects the running state of the virtual machine, its memory utilization, and its network throughput. These performance metrics are described by a vector defined as SS(ss_1, ss_2, ss_3, …, ss_m).
The modeling of a specific detection domain in SOM requires periodic measurement and adequate collection of the training data. The collected performance vector SS ∈ R^m can be considered a random variable within the performance sample space. The VM performance samples collected within a certain time series can be expressed as SS_t (where t = 1, 2, 3, …, n). The iterative training of the samples collected within this time series is the modeling process of the SOM virtual machine.
Therefore, the detection domain modeling algorithm can be summarized as follows.
Step 1 (initialization of the SOM network). SOM neurons are represented by weight vectors W_ij(0), i, j = 1, 2, 3, …, N, where i and j indicate the location of the neuron in the SOM network. In this paper the weight vectors W_ij(0) are initialized randomly in the SOM network.
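Step 1 can be sketched as follows; the grid size and metric dimension are illustrative.

```python
# Sketch of Step 1: an N x N SOM grid where each neuron n_ij holds a randomly
# initialized m-dimensional weight vector W_ij(0). Grid size and dimension
# here are illustrative.
import random

def init_som(n, m, seed=0):
    """W[i][j] is the weight vector of neuron n_ij, components drawn in [0, 1)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(m)] for _ in range(n)] for _ in range(n)]

W = init_som(n=13, m=3)                 # 13 x 13 grid, 3 performance metrics
print(len(W), len(W[0]), len(W[0][0]))  # 13 13 3
```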
Step 2 (defining the training space of the SOM network for training sample SS_t). When a training sample SS_t at time t is added to the SOM network, the most suitable neuron needs to be found to serve as the training center of the neighborhood. For SS_t at time t, the most suitable neuron C can be found using

C = argmin_{(i,j)} ||SS_t − W_ij(t − 1)||, t = 1, 2, 3, …,  (2)

and C will be the training center in the SOM network after SS_t is added.
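A minimal sketch of the search in (2): the best-matching neuron C is the grid coordinate whose weight vector is closest to the sample in Euclidean distance. The toy grid values are hypothetical.

```python
# Sketch of (2): find the training center C for sample SS_t as the grid
# coordinate whose weight vector W_ij is nearest to SS_t.
import math

def best_matching_unit(W, ss):
    """Return (i, j) of C = argmin_{(i,j)} ||SS_t - W_ij||."""
    n = len(W)
    return min(((i, j) for i in range(n) for j in range(n)),
               key=lambda c: math.dist(W[c[0]][c[1]], ss))

# toy 2 x 2 grid with 2-D weight vectors
W = [[[0.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [1.0, 1.0]]]
print(best_matching_unit(W, [0.9, 0.8]))  # -> (1, 1)
```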
After the training center C is defined using (2), we need to set the training neighborhood. According to the definition of SOM, to ensure convergence of the training process of the SOM network, the training neighborhood can be defined as

P = H_(ij),C(t, ||l_C − (i, j)||),  (3)

where P is a function of the training neighborhood, that is, a monotonically decreasing function of the parameter ||l_C − (i, j)|| and the number of training iterations t; l_C is the coordinate vector of the training center C in the SOM network, and (i, j) is the coordinate vector of neuron node n_ij in the SOM network.
Due to its effective smoothing a Gaussian function is used
as the neighborhood training function in this paper, which is defined as follows:

H_(ij),C = α(t) · exp(−||l_C − (i, j)||² / (2σ²(t))).  (4)

In (4), α(t) represents the learning-rate factor, which determines the fitting ability of the SOM network for the training sample SS_t in the training process; σ(t) represents the width of the neighborhood, which determines the range of influence of a single training sample SS_t on the SOM network. According to SOM theory, to ensure convergence of the training process, α(t) and σ(t) should both be monotonically decreasing functions of the number of training iterations t.
Step 3 (SOM network training based on training sample SS_t). The training neighborhood was defined in Step 2. The neurons within the training neighborhood of the SOM network are trained based on the training sample SS_t according to (5). The fitting equation is defined as follows:

W_ij(t) = W_ij(t − 1) + H_(ij),C · (SS_t − W_ij(t − 1)).  (5)
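Equations (4) and (5) combine into one training iteration; the following sketch assumes α(t) and σ(t) are supplied externally by the schedules discussed in Section 3.2.

```python
# One training iteration combining the Gaussian neighborhood (4) with the
# standard SOM update rule (5). alpha and sigma are passed in, not
# scheduled here.
import math

def neighborhood(c, ij, alpha, sigma):
    """H_(ij),C = alpha(t) * exp(-||l_C - (i, j)||^2 / (2 * sigma(t)^2))."""
    d2 = (c[0] - ij[0]) ** 2 + (c[1] - ij[1]) ** 2
    return alpha * math.exp(-d2 / (2 * sigma ** 2))

def train_step(W, ss, c, alpha, sigma):
    """Pull every weight vector toward SS_t, weighted by its distance to C."""
    n = len(W)
    for i in range(n):
        for j in range(n):
            h = neighborhood(c, (i, j), alpha, sigma)
            W[i][j] = [w + h * (s - w) for w, s in zip(W[i][j], ss)]

# toy example: a 1 x 1 grid; the center neuron moves halfway toward the sample
W = [[[0.0, 0.0]]]
train_step(W, [1.0, 1.0], c=(0, 0), alpha=0.5, sigma=1.0)
print(W[0][0])  # [0.5, 0.5]
```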
After the training process is completed using (5), the convergence of the training process needs to be verified. The process is convergent if every neuron's weight vector in the SOM network has stabilized. The method is described in detail below.

Assume that there is a neuron n_ij in the SOM network and that the time index of its latest training sample is t_l^(ij). Meanwhile, assume that there is a sufficiently small real number ε; the convergence of the training process of the SOM network can then be checked using

d(t) = (1/N²) Σ_{i,j=1}^{N} ||W_ij(t_l^(ij)) − W_ij(t_l^(ij) − 1)|| < ε.  (6)

In (6), d(t) represents the average deviation between the latest fitting state and the previous value for every neuron weight vector in the SOM network after t training samples have been used. Obviously, when d(t) < ε the weight vectors W_ij have stabilized, indicating that the iterative training process can be stopped. When d(t) ≥ ε, further collection of training samples is required and Steps 2 and 3 need to be repeated.
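The convergence test around (6) can be sketched as an average weight-vector deviation; the tolerance ε and the toy weights below are illustrative.

```python
# Sketch of the convergence test (6): d(t) averages, over all neurons, the
# deviation between each weight vector after its latest update and its
# previous value; training stops once d(t) < epsilon.
import math

def mean_deviation(W_prev, W_curr):
    """d(t): average ||W_ij(latest) - W_ij(previous)|| over the N x N grid."""
    n = len(W_curr)
    total = sum(math.dist(W_prev[i][j], W_curr[i][j])
                for i in range(n) for j in range(n))
    return total / (n * n)

eps = 1e-3
W_prev = [[[0.50, 0.50]]]
W_curr = [[[0.5004, 0.5003]]]
print(mean_deviation(W_prev, W_curr) < eps)  # True -> stop iterating
```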
3.2 Parameter Setting in the SOM-Based Modeling Process. The SOM network modeling process is an iterative fitting process that mainly consists of two stages: the initial ordered stage and the convergence stage. There are two important parameters in the training neighborhood function H_(ij),C: the width of the training neighborhood σ(t) and the learning-rate factor α(t). Correct setting of these two parameters plays an important role in preventing the SOM network training from getting trapped in a metastable state. The processes for setting these two parameters are as follows.
(1) Setting the Width of the Training Neighborhood σ(t). Based on the principle of SOM, σ(t) is a monotonically decreasing function of t. At the beginning of the training process, the value of σ(t) should be set so that the radius of the neighborhood defined by H_(ij),C can reach at least half the diameter of the SOM network [39]. In this paper the value is set to N/2.
Since H_(ij),C is a monotonically decreasing function of ||l_C − (i, j)||, it can be seen from (4) that, when the other variables remain unchanged, the value of H_(ij),C is small if the neuronal node is distant from the training center. Additionally, a smaller H_(ij),C has a lower influence on the neuronal node n_ij in the fitting process; when the value of H_(ij),C is small enough, the neuron node n_ij is effectively unaffected. Therefore, although there is no clear boundary for the training neighborhood defined by the Gaussian function in this paper, the influential range of a single training sample SS_t on the training of the SOM network can still be limited.

Assume that f is a sufficiently small threshold of H_(ij),C. When H_(ij),C < f, the current iteration step has no influence on neuronal node n_ij, while when H_(ij),C > f, the current iteration step will influence n_ij.
Therefore, when t = 1, at the beginning of the SOM network training process, the lower bound of σ(t) can be determined based on the threshold f and (4). The detailed derivation is as follows.

When t = 1, assume that ||l_C − (i, j)|| = N/2; the lower bound of σ(1) is then determined by the following inequality derivation:

α(1) · exp(−(N/2)² / (2σ²(1))) ≥ f
⇒ ln α(1) − (N/2)² / (2σ²(1)) ≥ ln f
⇒ (N/2)² / (2σ²(1)) ≤ ln(α(1)/f)
⇒ σ²(1) ≥ N² / (8 · ln(α(1)/f))
⇒ σ(1) ≥ N / (2√(2 · ln(α(1)/f))), since σ(1) > 0.  (7)
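The bound in (7) can be checked numerically: when σ(1) equals its lower bound, a neuron at distance N/2 from the training center receives neighborhood weight exactly f. The values α(1) = 1 and f = 0.05 follow the text; N = 13 is one of the network sizes evaluated later.

```python
# Numerical check of (7): with alpha(1) = 1 and f = 0.05, setting sigma(1) to
# its lower bound N / (2 * sqrt(2 * ln(alpha(1)/f))) makes the Gaussian
# neighborhood weight at distance N/2 equal exactly f.
import math

N, alpha1, f = 13, 1.0, 0.05
sigma1 = N / (2 * math.sqrt(2 * math.log(alpha1 / f)))

h_at_edge = alpha1 * math.exp(-(N / 2) ** 2 / (2 * sigma1 ** 2))
print(round(h_at_edge, 6))  # 0.05
```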
Based on this derivation, the lower bound of σ(1) can be determined by (7), where the threshold f = 0.05 in this paper. The following discussion will describe the value of α(1) used
for setting α(t). According to (7), σ(t) in H_(ij),C of the initial ordered stage can be defined as follows:

σ(t) = N / (2√(2 · ln(α(1)/f))) · exp(−(t − 1)/t), t = 1, 2, 3, …, 1000.  (8)
When the iteration of the SOM network training is gradually converging, the size of the training neighborhood defined by H_(ij),C should be constant and cover the nearest neighborhood of the training center C in the SOM network. In this paper, the nearest neighborhood, that is, the nearest four neurons around neuron C in all four directions (up, down, left, and right) in the SOM network, is shown in Figure 2.
(2) Setting the Learning-Rate Factor α(t). Since α(t) is a monotonically decreasing function of t, the range of α(t) is 0.2 < α(t) < 1 in the initial ordered stage of the SOM training process and 0 < α(t) < 0.2 in the convergent stage. Then α(t) can be set to

α(t) = exp(−(t − 1)/t), t = 1, 2, 3, …, 1000,
α(t) = 0.2 · exp(−(t − 1)/t), t > 1000.  (9)
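The schedules (8) and (9) can be written directly as functions of t; the 1000-iteration stage boundary and f = 0.05 follow the text above, and the grid size N = 13 in the usage line is illustrative.

```python
# The parameter schedules (8) and (9) as functions of the iteration index t.
# f = 0.05 and the 1000-iteration boundary between the ordered and convergent
# stages follow the text above.
import math

def alpha(t):
    """Learning-rate factor (9)."""
    base = math.exp(-(t - 1) / t)
    return base if t <= 1000 else 0.2 * base

def sigma(t, N, f=0.05):
    """Neighborhood width (8) for the initial ordered stage."""
    return N / (2 * math.sqrt(2 * math.log(alpha(1) / f))) * math.exp(-(t - 1) / t)

print(alpha(1), round(alpha(2000), 4))  # 1.0 at t = 1; below 0.2 in the convergent stage
print(round(sigma(1, N=13), 3))         # initial width for a 13 x 13 network
```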
3.3 VM Anomaly Recognition Based on the SOM Model. The modeling of VM status based on SOM was described in the previous section. In this section we describe the recognition of an anomaly using the trained SOM network. After several rounds of fitting iterations, the SOM network can effectively describe the normal states of the virtual machines. The normal states are represented by the neurons' weight vectors in the SOM network; in other words, a neuron's weight vector in the SOM network describes a class of similar normal virtual machine states.

In order to check whether the current state of a VM on a Cloud platform is an anomaly, we can compare the current running performance of the virtual machine with the neurons' weight vectors in the SOM network. In this paper, Euclidean distance is used to determine similarity. If the current state is sufficiently similar to one of the weight vectors (that is, the minimum distance is below a given threshold δ), the virtual machine is identified as normal; otherwise it is considered abnormal.
Let VM_x represent a virtual machine on a Cloud platform. The corresponding SOM network of VM_x is defined as SOM(VM_x). After the training iterations have finished, the weight vector of each neuron can be represented as W^S_ij. The currently measured performance value of VM_x is SS, and the abnormal state of VM_x is VmStatus(VM_x). Then the
[Figure 2: Nearest neighborhood of neuron node C, shown within the N × N weight grid W_11, …, W_NN; C occupies a central position with its four nearest neighbors above, below, left, and right.]
abnormal state of the virtual machine can be determined by the following equation:

anomaly(VM_x) = true, if min{||SS − W^S_ij|| : i, j = 1, 2, 3, …, N} ≥ δ,
anomaly(VM_x) = false, if min{||SS − W^S_ij|| : i, j = 1, 2, 3, …, N} < δ.
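The decision rule can be sketched as follows; the trained weight grid and the threshold δ below are hypothetical.

```python
# Sketch of the decision rule above: a VM state SS is abnormal when its
# Euclidean distance to every trained weight vector W_ij is at least delta.
# The trained weights and delta here are hypothetical.
import math

def is_anomaly(W, ss, delta):
    """True iff min_{i,j} ||SS - W_ij|| >= delta."""
    n = len(W)
    d_min = min(math.dist(W[i][j], ss) for i in range(n) for j in range(n))
    return d_min >= delta

# toy trained 2 x 2 SOM over (cpu, mem) utilization in [0, 1]
W = [[[0.2, 0.3], [0.2, 0.7]],
     [[0.6, 0.3], [0.6, 0.7]]]
print(is_anomaly(W, [0.25, 0.32], delta=0.15))  # False: near a learned normal state
print(is_anomaly(W, [0.99, 0.99], delta=0.15))  # True: far from all learned states
```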
4 Performance Evaluation

4.1 Experimental Environment and Setup. In this paper, the experimental Cloud platform is built on the open source Cloud platform OpenStack [40, 41]. The operating system CentOS 6.3 is installed on the physical servers running the virtual machines, on which the hypervisor Xen 3.2 [42, 43] is installed. CentOS 6.3 is also installed on the physical servers running the Cloud management program, on which the OpenStack Cloud management components are installed. 100 virtual machines were deployed on this Cloud platform.
The performance metrics of the virtual machines in this experiment were collected by tools such as libxenstat and libvirt [44, 45]. For the fault injection method, we used tools to simulate system failures: memory leak, CPU Hog, and network Hog [46–48].
4.2 Experimental Program and Results
4.2.1 First Set of Experiments. The impact of the SOM network training neighborhood width and learning-rate factor values on the performance of the anomaly detection
Table 1: The impact of SOM net size on the detection accuracy.

Size of SOM net    Accuracy rate (%)
8 × 8              ≈96.3
13 × 13            ≈97.9
18 × 18            ≈97.5
20 × 20            ≈97.7
24 × 24            ≈97.8
Table 2: The impact of the initial training neighborhood size on the accuracy of SOM.

Initial width of the training neighborhood    Detection accuracy (%)
0.5 dsn                                       ≈97.8
0.4 dsn                                       ≈93.1
0.3 dsn                                       ≈90.4
0.2 dsn                                       ≈89.1
0.1 dsn                                       ≈73.7

dsn indicates the diameter of the SOM network.
Table 3: The impact of the initial value of the learning-rate factor on the accuracy of SOM.
mechanism of the SOM-based dynamic adaptive virtual machine was evaluated.
Training Stage. Firstly, several virtual machines were selected from the 100 virtual machines. One fault was then randomly selected (a memory leak, CPU Hog, or network Hog) and injected. 1000 virtual machine system performance measurements were collected as training samples for the model training during 10 rounds (one second per round) on the 100 virtual machines.
Anomaly Detection Stage. In order to simulate an anomaly in the objects under detection, one of the three faults was randomly injected in the 100 virtual machines. The anomalies in each of the 100 virtual machines were then detected based on the trained model, and the detection results were recorded.
Several sets of experimental results with different parameter values were obtained. It should be noted that the same fault was injected in each experiment to exclude unnecessary variables.
The experimental results are shown in Tables 1, 2, and 3. As can be seen from Table 1, there is no obvious change in accuracy using the proposed detection method for different SOM network sizes, which means that the proposed anomaly detection method is not sensitive to the size of the SOM network.
Table 4: The impact of SOM net size on the detection accuracy.

Size of SOM net    Accuracy rate (%)
8 × 8              ≈95.8
13 × 13            ≈97.1
18 × 18            ≈96.7
20 × 20            ≈96.9
24 × 24            ≈97.1
Table 5: The impact of the initial training neighborhood size on the accuracy of SOM.

Initial width of the training neighborhood    Detection accuracy
As can be seen from Table 2, the size of the initial training neighborhood has a significant impact on the detection accuracy. The main reason is that if the initial training neighborhood is too small, it may cause a metastable state in the training process, and further training iterations are then required to achieve a real steady state.
As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the abnormality detection significantly decreases. The reason is that if the initial value of the learning-rate factor is too small, the contribution of each training sample to the SOM network training is small too. Thus, the fitting ability of the SOM network for the detected objects is not sufficient, which leads to poor quality of model training and hence decreases the accuracy of the SOM network-based anomaly detection.
Analysis of the first set of experiments shows that better anomaly detection results can be obtained with DA_SOM when the parameters are set as follows: SOM network size = 13 × 13, initial size of the training neighborhood = 0.5 dsn, and initial value of the learning-rate factor = 1.
The above experiments were carried out on the training data set. To further demonstrate the effectiveness of the proposed algorithm, it was also tested on an untrained anomaly type (disk Hog).
The experimental results for disk Hog are shown in Tables 4, 5, and 6.
[Figure 3: Comparison of the three anomaly detection algorithms (DA_SOM, k-NN, and k-M): sensitivity (%) plotted against false positive rate (%) for (a) memory leak, (b) CPU Hog, and (c) network Hog.]
As can be seen from Tables 4, 5, and 6, the proposed algorithm still achieves good accuracy on the untrained data set. The impact of the three parameters (SOM net size, training neighborhood width, and learning-rate factor) on the accuracy is similar to that in the previous experiments.
4.2.2 Second Set of Experiments. The objective of this set of experiments was to evaluate the effectiveness of the SOM-based VM anomaly detection mechanism (represented by DA_SOM in the following sections). In order to compare it with other approaches, we use two typical unsupervised anomaly detection techniques in the experiments: (1) a k-nearest neighbor based anomaly detection technique (called k-NN), where prior training of the anomaly identification model is not required; (2) a cluster-based anomaly detection technique (called k-M), where training of the anomaly identification model is required in advance.
Several experiments for the different techniques and different parameters, with the same aforementioned configuration and experimental procedure, were carried out to obtain the corresponding results. It should be noted that, since a training process is not required for the k-NN technique, it started directly in the abnormality detection stage. In addition, to ensure comparability, the training process of the clustering-based method is the same as that of the proposed method: an anomaly detection model is built for the 100 virtual machines, and the training data set is the same as that used to train the SOM. Experimental results are shown in Figure 3.
Figure 3 shows that, compared with the other two injected failures, the sensitivities of the three techniques to the memory leak failure are relatively low. The main reason is that an anomaly does not immediately appear on the failed object when a fault is introduced by a memory leak; it takes some time for this fault to accumulate and eventually cause an obvious abnormality. The consequence is that detection systems tend to mistake these anomalous objects for normal ones. In contrast, faults caused by CPU Hog and network Hog events immediately lead to an abnormal state within the faulty object, thus minimizing misjudgments, which enhances the sensitivity of all three anomaly detection techniques, as shown in Figures 3(b) and 3(c).
Meanwhile, as shown in each subgraph of Figure 3, compared with the other two anomaly detection techniques, DA_SOM maintains a better balance between sensitivity and false alarm rate. In other words, at the same false alarm rate, the sensitivity of DA_SOM is better than that of the other two approaches, showing a strong performance in improving the warning effect and reducing the false alarm rate.
Moreover, the computational complexity of DA_SOM is much lower than that of k-NN in the anomaly detection stage, while it is equivalent to that of the k-M technique; their complexity is constant in the detected object size and in the parameter k of the k-M technique. Meanwhile, during the model training stage, the training cost of k-M is higher than that of DA_SOM for the same size of training data. The main reason is that k-M requires iteration over the entire training data set (i.e., the cluster centers need to be updated and the training data set needs to be reclassified according to the updated center points), while there is only one classification operation for each training sample in DA_SOM.
5 Conclusion
An anomaly detection algorithm based on SOM for Cloud platforms with large-scale virtual machines is proposed. The virtual machines are partitioned initially according to their similarity, and then, based on the results of the initial partition, the SOM is modeled. The proposed method has a high training speed, which is not possible with traditional methods when there are a large number of virtual machines. We also optimized the two main parameters in the SOM network modeling process, which greatly improves this process. The proposed method was verified on an incremental SOM anomaly detection model. The results showed strong improvements in detection accuracy and speed using the proposed anomaly detection method.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work of this paper is supported by the National Natural Science Foundation of China (Grants nos. 61272399 and 61572090) and the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20110191110038).
References
[1] J. Li, Y. Cui, and Y. Ma, "Modeling message queueing services with reliability guarantee in cloud computing environment using colored petri nets," Mathematical Problems in Engineering, vol. 2015, Article ID 383846, 20 pages, 2015.
[2] M. A. Rodriguez-Garcia, R. Valencia-Garcia, F. Garcia-Sanchez, and J. J. Samper-Zapater, "Ontology-based annotation and retrieval of services in the cloud," Knowledge-Based Systems, vol. 56, pp. 15–25, 2014.
[3] C.-C. Chang, C.-Y. Sun, and T.-F. Cheng, "A dependable storage service system in cloud environment," Security and Communication Networks, vol. 8, no. 4, pp. 574–588, 2015.
[4] W. He and L. Xu, "A state-of-the-art survey of cloud manufacturing," International Journal of Computer Integrated Manufacturing, vol. 28, no. 3, pp. 239–250, 2015.
[5] A. F. Barsoum and M. Anwar Hasan, "Provable multicopy dynamic data possession in cloud computing systems," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 485–497, 2015.
[6] J. Subirats and J. Guitart, "Assessing and forecasting energy efficiency on Cloud computing platforms," Future Generation Computer Systems, vol. 45, pp. 70–94, 2015.
[7] S. Ding, S. Yang, Y. Zhang, C. Liang, and C. Xia, "Combining QoS prediction and customer satisfaction estimation to solve cloud service trustworthiness evaluation problems," Knowledge-Based Systems, vol. 56, pp. 216–225, 2014.
[8] I. C. Paschalidis and Y. Chen, "Statistical anomaly detection with sensor networks," ACM Transactions on Sensor Networks, vol. 7, no. 2, article 17, 2010.
[9] M. GhasemiGol and A. Ghaemi-Bafghi, "E-correlator: an entropy-based alert correlation system," Security and Communication Networks, vol. 8, no. 5, pp. 822–836, 2015.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.
[11] M. Kourki and M. A. Riahi, "Seismic facies analysis from pre-stack data using self-organizing maps," Journal of Geophysics and Engineering, vol. 11, no. 6, Article ID 065005, 2014.
[12] L. Feng and S. LiQuan, "Enhanced dynamic self-organizing maps for data cluster," Information Technology Journal, vol. 26, no. 1, pp. 70–81, 2009.
[13] Z. Zhou, S. Chen, M. Lin, G. Wang, and Q. Yang, "Minimizing average startup latency of VMs by clustering-based template caching scheme in an IaaS system," International Journal of u- and e- Service, Science and Technology, vol. 6, no. 6, pp. 145–158, 2013.
[14] L. Jing, M. K. Ng, and J. Z. Huang, "An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026–1041, 2007.
[15] R. Smith, A. Bivens, M. Embrechts, C. Palagiri, and B. Szymanski, "Clustering approaches for anomaly based intrusion detection," in Proceedings of Intelligent Engineering Systems through Artificial Neural Networks, pp. 579–584, 2002.
[16] Y. Sani, A. Mohamedou, K. Ali, A. Farjamfar, M. Azman, and S. Shamsuddin, "An overview of neural networks use in anomaly intrusion detection systems," in Proceedings of the IEEE Student Conference on Research and Development (SCOReD '09), pp. 89–92, Serdang, Malaysia, November 2009.
[17] G. P. Zhang, "Neural networks for classification: a survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.
Mathematical Problems in Engineering 9
[18] W. Tylman, "Anomaly-based intrusion detection using Bayesian networks," in Proceedings of the International Conference on Dependability of Computer Systems, pp. 211–218, Szklarska Poręba, Poland, 2008.
[19] W. Pedrycz, V. Loia, and S. Senatore, "P-FCM: a proximity-based fuzzy clustering," Fuzzy Sets and Systems, vol. 148, no. 1, pp. 21–41, 2004.
[20] G. Ratsch, S. Mika, B. Scholkopf, and K.-R. Muller, "Constructing boosting algorithms from SVMs: an application to one-class classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1184–1199, 2002.
[21] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.
[22] B. Scholkopf, R. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Piatt, "Support vector method for novelty detection," in Proceedings of the 13th Annual Neural Information Processing Systems Conference (NIPS '99), pp. 582–588, December 1999.
[23] G. Wang, S. Chen, Z. Zhou, and M. Lin, "A dependable monitoring mechanism combining static and dynamic anomaly detection for network systems," International Journal of Future Generation Communication and Networking, vol. 7, no. 1, pp. 1–18, 2014.
[24] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.
[25] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, Reading, Mass, USA, 2005.
[26] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2000.
[27] D. Shin and S. Kim, "Nearest mean classification via one-class SVM," in Proceedings of the International Joint Conference on Computational Sciences and Optimization (CSO '09), pp. 593–596, Sanya, China, April 2009.
[28] T.-S. Li and C.-L. Huang, "Defect spatial pattern recognition using a hybrid SOM-SVM approach in semiconductor manufacturing," Expert Systems with Applications, vol. 36, no. 1, pp. 374–385, 2009.
[29] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[30] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[31] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[32] B. Liu, Y. Xiao, Y. Zhang, and Z. Hao, "An efficient approach to boost support vector data description," in Proceedings of the 2012 International Conference on Cybernetics and Informatics, vol. 163 of Lecture Notes in Electrical Engineering, pp. 2231–2238, Springer, New York, NY, USA, 2014.
[33] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, New York, NY, USA, 1988.
[34] L. Ertoz, M. Steinbach, and V. Kumar, "Finding topics in collections of documents: a shared nearest neighbor approach," in Clustering and Information Retrieval, vol. 11, pp. 83–103, 2003.
[35] S. Guha, R. Rastogi, and K. Shim, "Rock: a robust clustering algorithm for categorical attributes," Information Systems, vol. 25, no. 5, pp. 345–366, 2000.
[36] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han, and U. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, August 1996.
[37] M. M. Breuniq, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 29, no. 2, pp. 93–104, 2000.
[38] N. Ye and Q. Chen, "An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems," Quality and Reliability Engineering International, vol. 17, no. 2, pp. 105–112, 2001.
[39] T. Kohonen, Self-Organizing Maps, Springer, New York, NY, USA, 1997.
[40] J. M. Alcaraz Calero and J. G. Aguado, "MonPaaS: an adaptive monitoring platform as a service for cloud computing infrastructures and services," IEEE Transactions on Services Computing, vol. 8, no. 1, pp. 65–78, 2015.
[41] D. Milojicic, I. M. Llorente, and R. S. Montero, "OpenNebula: a cloud management tool," IEEE Internet Computing, vol. 15, no. 2, pp. 11–14, 2011.
[42] H. Jin, H. Qin, S. Wu, and X. Guo, "CCAP: a cache contention-aware virtual machine placement approach for HPC cloud," International Journal of Parallel Programming, vol. 43, no. 3, pp. 403–420, 2015.
[43] B. Egger, E. Gustafsson, C. Jo, and J. Son, "Efficiently restoring virtual machines," International Journal of Parallel Programming, vol. 43, no. 3, pp. 421–439, 2015.
[44] Y. Cho, J. Choi, J. Choi, and M. Lee, "Towards an integrated management system based on abstraction of heterogeneous virtual resources," Cluster Computing, vol. 17, no. 4, pp. 1215–1223, 2014.
[45] J. Li, Y. Jia, L. Liu, and T. Wo, "CyberLiveApp: a secure sharing and migration approach for live virtual desktop applications in a cloud environment," Future Generation Computer Systems, vol. 29, no. 1, pp. 330–340, 2013.
[46] Z. Xu, J. Zhang, and Z. Xu, "Melton: a practical and precise memory leak detection tool for C programs," Frontiers of Computer Science, vol. 9, no. 1, pp. 34–54, 2015.
[47] P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: an evaluation of the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, 2012.
[48] Y.-J. Chiu and T. Berger, "A software-only videocodec using pixelwise conditional differential replenishment and perceptual enhancements," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, pp. 438–450, 1999.
The recognition rate decreases when the normal dataset being detected does not have enough neighbors.
In a Cloud environment, the performance and running status of a virtual machine are represented mainly by its performance metrics. The performance metrics include five primary categories: CPU, memory, disk, network, and process [23]. These metrics can determine whether a virtual machine is abnormal. Reference [23] gives a more detailed explanation of the performance metrics of a virtual machine.
This paper proposes an SOM-based anomaly detection algorithm, which is based on determining the various performance metrics of each virtual machine. This algorithm differs from traditional strategies in that virtual machines with similar running environments are divided into detection domains, and each domain is trained iteratively in the SOM network. This enables reasonable adaptation of the training to the large-scale virtual machines of a Cloud platform and overcomes the shortcoming of traditional methods, where each virtual machine is treated as a separate training target. In addition, two important parameters of the SOM network training are optimized, which greatly reduces the training time of the SOM network on the performance metrics of the virtual machines and enhances the efficiency and accuracy of anomaly detection of virtual machines in the Cloud platform.
Various experiments were conducted to verify the efficiency and accuracy of the SOM-based anomaly detection algorithm. The results show that the sample training speed and detection accuracy are significantly improved by the proposed algorithm.
The rest of this paper is organized as follows. Section 2 describes existing anomaly detection methods. Section 3 describes the SOM-based virtual machine anomaly detection algorithm. Section 4 presents the performance evaluation. Finally, Section 5 lists the conclusions derived from the experiments.
2 Related Work
Current anomaly detection methods are mainly based on classification, clustering, statistics, and nearest neighbor methods [24]. These methods will now be introduced.
The classification-based method obtains a classifier model from a set of selected data and then uses the model to classify new data [25, 26]. Shin and Kim proposed a hybrid classification method that combines the One-Class SVM [27, 28] with the nearest mean classifier (NMC) [29]. The highly flexible nonlinear correlation model can be easily classified by the nonlinear kernel function in this method [30–32]. This method introduces a feature subset selection algorithm, which not only reduces the number of classification dimensions but also improves the performance of the classifier. However, the main disadvantages of this method are slow training and the potential for misclassification.
The clustering-based method is an unsupervised learning method [26, 33]. SNN [34], ROCK [35], and DBSCAN [36] are three typical clustering-based anomaly detection methods. All three methods assume that normal samples lie within a cluster of the dataset and that abnormal samples lie outside any cluster. However, if a cluster is formed by the anomaly data after a period of clustering, then the anomalies cannot be recognized properly. Additionally, it is important for the clustering that the width of the cluster is accurately selected. The advantages of the clustering-based approach are that a priori knowledge of the data distribution is not required and that it can be used for incremental modeling. For example, for anomaly detection of virtual machines, a newly collected virtual machine sample can be analyzed by a model already trained for anomaly detection.
A typical nearest neighbor based approach is proposed by Breuniq et al. [37], using a local outlier factor (LOF) for data abnormality detection. Any data point that requires analysis is associated with a local outlier factor, which is the ratio of the average local density of its k nearest neighbors to the local density of the point itself; the local density of a point is estimated from its k nearest neighbors (k divided by the volume of the smallest data-centered sphere containing them). If a data point is abnormal, then its local density should be significantly different from the local density of its nearest neighbors.
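The neighbor-density ratio described above can be sketched in a few lines of Python. This is an illustrative simplification rather than the full LOF definition from [37] (reachability distances are omitted), and the function names and toy data are our own:

```python
import math

def knn_distances(data, x, k):
    """Distances from x to its k nearest neighbors in data (excluding x)."""
    return sorted(math.dist(x, y) for y in data if y != x)[:k]

def local_density(data, x, k):
    """Inverse of the mean distance to the k nearest neighbors of x."""
    d = knn_distances(data, x, k)
    return 1.0 / (sum(d) / len(d) + 1e-12)

def outlier_factor(data, x, k=3):
    """Ratio of the average local density of x's k nearest neighbors to
    the local density of x itself; values well above 1 flag an outlier."""
    neighbors = [y for y in sorted(data, key=lambda y: math.dist(x, y)) if y != x][:k]
    avg_neighbor_density = sum(local_density(data, y, k) for y in neighbors) / k
    return avg_neighbor_density / local_density(data, x, k)

# four points in a tight cluster plus one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(outlier_factor(points, (10, 10)) > 5)  # True: isolated point
print(outlier_factor(points, (0, 0)))        # close to 1: normal point
```

For the isolated point, the neighbors' densities dwarf its own, so the ratio is large; for a point inside the cluster, the ratio stays near 1.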
The statistics-based approach is an earlier anomaly detection method, usually based on the assumption that an anomaly is an observation point that is not generated by the assumed model and is partly or completely irrelevant [37]. Ye and Chen [24, 38] used the $\chi^2$ statistic to detect anomalies in the operating system. Assuming that the normal data under training is subject to a multivariate normal distribution, the $\chi^2$ statistic is

$$\chi^2 = \sum_{i=1}^{n} \frac{(X_i - E_i)^2}{E_i}, \quad (1)$$

where $X_i$ is the observed value of the $i$th variable, $E_i$ is the expected value of the $i$th variable (obtained from the training data), and $n$ is the number of variables. A large value of $\chi^2$ indicates an anomaly.
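Equation (1) is straightforward to compute. The sketch below is only illustrative: the expected values and the metric vectors are made-up numbers, assuming the $E_i$ have already been estimated from training data.

```python
def chi_square_statistic(x, expected):
    """Chi-square statistic of equation (1): sum over i of (X_i - E_i)^2 / E_i."""
    return sum((xi - ei) ** 2 / ei for xi, ei in zip(x, expected))

# E_i: expected metric values learned from training data (illustrative numbers)
expected = [40.0, 55.0, 20.0]      # e.g. mean CPU, memory, network load
normal_ss = [42.0, 53.0, 21.0]     # observation close to the expectation
abnormal_ss = [95.0, 90.0, 70.0]   # observation far from the expectation

print(chi_square_statistic(normal_ss, expected))    # small value -> normal
print(chi_square_statistic(abnormal_ss, expected))  # large value -> anomaly
```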
3. SOM-Based Virtual Machine Anomaly Detection Algorithm

In a Cloud platform, virtual machines with similar running environments have similar system performance. The SOM-based virtual machine anomaly detection algorithm is aimed at Cloud platforms that host a large number of virtual machines. In this paper, we partition the virtual machines by running environment; that is, we assign a set of virtual machines with similar properties to the same detection domain. This avoids the need for SOM network modeling of every single virtual machine, significantly reducing the modeling time and the training cost. For instance, without the proposed method, 100 SOM network models need to be built for 100 virtual machines; with the proposed method, 100 virtual machines with similar running environments need only one SOM network model. In addition, a SOM network can be trained more accurately by collecting 100 samples than by training on one sample only.
After the partition of the virtual machines, SOM network training is performed in every domain. In this paper, the two most
[Figure 1: SOM anomaly detection logic diagram. The set of detected VMs is partitioned into detection domains based on the similarities of the VM running environments, using the running environment vector (RE) collected from each VM; the system performance vector (SS) of each VM in a detection domain is then collected and fed into the SOM-based anomaly detection of that domain, which produces the anomaly detection results.]
important parameters, the training neighborhood width and the learning-rate factor, are optimized to enhance the training speed. The flow chart of the anomaly detection algorithm is shown in Figure 1.
3.1. SOM-Based State Modeling of the Virtual Machine. Because no prior knowledge of which virtual machines have similar performance is available for classification, the k-medoids method is used in this paper for the initial classification; that is, the VMs on the Cloud platform are divided into multiple detection domains. The k-medoids method is chosen because, compared with the k-means algorithm, k-medoids is less susceptible to noise.
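As a rough illustration of this initial partition step, the following is a minimal PAM-style k-medoids sketch; the VM "running environment vectors" are hypothetical two-dimensional toy points, and the paper does not prescribe this exact implementation.

```python
import math
import random

def k_medoids(points, k, iters=20, seed=0):
    """Minimal PAM-style k-medoids: partition VMs (described by
    running-environment vectors) into k detection domains.  Medoids are
    actual data points, which is less susceptible to noise than k-means
    centroids."""
    rnd = random.Random(seed)
    medoids = rnd.sample(points, k)
    clusters = {}
    for _ in range(iters):
        # assignment: each VM joins the domain of its nearest medoid
        clusters = {i: [] for i in range(k)}
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, medoids[i]))].append(p)
        # update: the new medoid minimizes the total distance within its domain
        new_medoids = [min(c, key=lambda m: sum(math.dist(m, q) for q in c))
                       if c else medoids[i]
                       for i, c in clusters.items()]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, clusters

# six hypothetical running-environment vectors forming two similar groups
vms = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
medoids, domains = k_medoids(vms, k=2)
print(sum(len(c) for c in domains.values()))  # 6: every VM lands in one domain
```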
The SOM network is generated in each detection domain using the SOM algorithm. The network is constructed as a two-dimensional ($N \times N$) neuron array. Each neuron can be represented as $n_{ij}$, $i, j = 1, 2, 3, \ldots, N$, where $i$ and $j$ are the row and column subscripts, and each neuron is associated with a weight vector defined as $W_{ij}(w_1, w_2, w_3, \ldots, w_m)$. The dimension $m$ of a weight vector is the same as the dimension of the samples in the training set of its SOM network. The training set used in this paper includes performance metrics that reflect the running state of the virtual machine: its CPU utilization, its memory utilization, and its network throughput. These performance metrics are described by a vector defined as $SS(ss_1, ss_2, ss_3, \ldots, ss_m)$.
The modeling of a specific detection domain in SOM requires periodic measurement and adequate collection of the training data. A collected performance vector $SS \in R^m$ can be considered a random variable within the performance sample space. The VM performance samples collected within a certain time series can be expressed as $SS_t$ (where $t = 1, 2, 3, \ldots, n$). The iterative training on the samples collected within this time series is the modeling process of the SOM virtual machine. Therefore, the detection domain modeling algorithm can be summarized as follows.
Step 1 (initialization of the SOM network). SOM neurons are represented by weight vectors $W_{ij}(0)$, $i, j = 1, 2, 3, \ldots, N$, where $i$ and $j$ indicate the location of the neuron in the SOM network. In this paper, the weight vectors $W_{ij}(0)$ are initialized randomly.
Step 2 (defining the training neighborhood of the SOM network for training sample $SS_t$). When a training sample $SS_t$ at time $t$ is added to the SOM network, the most suitable neuron needs to be found as the training center of the neighborhood. For $SS_t$ at time $t$, the most suitable neuron $C$ can be found using (2), and $C$ will be the training center in the SOM network after $SS_t$ is added:

$$C = \arg\min_{i,j} \left\| SS_t - W_{ij}(t-1) \right\|, \quad t = 1, 2, 3, \ldots \quad (2)$$
After the training center $C$ is defined using (2), we need to set the training neighborhood. According to the definition of SOM, to ensure convergence of the training process of the SOM network, the training neighborhood can be defined as

$$P = H_{(ij)C}\left(t, \left\| l_C - (i,j) \right\|\right), \quad (3)$$

where $P$ is the training neighborhood function, a monotonically decreasing function of the distance $\| l_C - (i,j) \|$ and of the training iteration $t$; $l_C$ is the coordinate vector of the training center $C$ in the SOM network, and $(i,j)$ is the coordinate vector of neuron node $n_{ij}$ in the SOM network. Due to its effective smoothing, a Gaussian function is used
as the neighborhood training function in this paper, defined as follows:

$$H_{(ij)C} = \alpha(t) \cdot \exp\left( - \frac{\left\| l_C - (i,j) \right\|^2}{2\sigma^2(t)} \right). \quad (4)$$

In (4), $\alpha(t)$ represents the learning-rate factor, which determines the fitting ability of the SOM network for the training sample $SS_t$ in the training process; $\sigma(t)$ represents the width of the neighborhood, which determines the range of influence of a single training sample $SS_t$ on the SOM network. According to SOM theory, to ensure convergence of the training process, $\alpha(t)$ and $\sigma(t)$ should both be monotonically decreasing functions of the number of training iterations $t$.
Step 3 (SOM network training based on training sample $SS_t$). The training neighborhood was defined in Step 2. The neurons within the training neighborhood of the SOM network are trained on the training sample $SS_t$ according to (5). The fitting equation is defined as follows:

$$W_{ij}(t) = W_{ij}(t-1) + H_{(ij)C} \cdot \left( SS_t - W_{ij}(t-1) \right). \quad (5)$$
After the training process is completed using (5), the convergence of the training process needs to be verified. The process is convergent if every weight vector of a neuron in the SOM network has stabilized. The method is described in detail below.

Assume that there is a neuron $n_{ij}$ in the SOM network and that the time index of its latest training sample is $t_l^{(ij)}$. Meanwhile, assume that there is a sufficiently small real number $\varepsilon$. Convergence of the training process of the SOM network can then be checked using the following:

$$d(t) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left\| W_{ij}\left(t_l^{(ij)}\right) - W_{ij}\left(t_l^{(ij)} - 1\right) \right\| < \varepsilon. \quad (6)$$

In (6), $d(t)$ represents the average deviation between the latest fitted value and the previous value of every weight vector in the SOM network after $t$ training samples have been used. Obviously, when $d(t) < \varepsilon$, the weight vectors $W_{ij}$ have stabilized, indicating that the iterative training process can be stopped. When $d(t) > \varepsilon$, further collection of training samples is required, and Steps 2 and 3 need to be repeated.
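Steps 1–3 and the convergence check can be sketched as follows. This is a simplified illustration of the modeling algorithm, not the authors' implementation: the parameter schedules anticipate equations (8) and (9) of Section 3.2, the grid size and data are toy values, and the convergence test averages the most recent weight movements in the spirit of (6).

```python
import math
import random

def train_som(samples, N=5, f=0.05, eps=1e-3):
    """Sketch of the detection-domain modeling algorithm (Steps 1-3).

    samples : list of performance vectors SS_t
    N       : side length of the N x N neuron grid
    f       : neighborhood threshold; eps : convergence tolerance
    """
    rnd = random.Random(42)
    m = len(samples[0])

    # Step 1: random initialization of the weight vectors W_ij(0)
    W = [[[rnd.random() for _ in range(m)] for _ in range(N)] for _ in range(N)]

    def alpha(t):  # learning-rate factor, equation (9)
        base = math.exp(-(t - 1) / t)
        return base if t <= 1000 else 0.2 * base

    def sigma(t):  # neighborhood width, equation (8)
        return N / (2 * math.sqrt(2 * math.log(alpha(1) / f))) * math.exp(-(t - 1) / t)

    for t, ss in enumerate(samples, start=1):
        # Step 2: training center C = argmin_ij ||SS_t - W_ij(t-1)||, equation (2)
        ci, cj = min(((i, j) for i in range(N) for j in range(N)),
                     key=lambda p: math.dist(ss, W[p[0]][p[1]]))
        moved = 0.0
        for i in range(N):
            for j in range(N):
                # Gaussian neighborhood H_(ij)C, equation (4)
                h = alpha(t) * math.exp(-((ci - i) ** 2 + (cj - j) ** 2)
                                        / (2 * sigma(t) ** 2))
                if h < f:       # below threshold f: neuron n_ij is unaffected
                    continue
                # Step 3: W_ij(t) = W_ij(t-1) + H * (SS_t - W_ij(t-1)), equation (5)
                for d in range(m):
                    delta = h * (ss[d] - W[i][j][d])
                    W[i][j][d] += delta
                    moved += abs(delta)
        # convergence check in the spirit of equation (6)
        if moved / (N * N * m) < eps:
            break
    return W

# usage: learn the normal operating region of one detection domain
rnd = random.Random(0)
normal = [(0.3 + rnd.gauss(0, 0.02), 0.5 + rnd.gauss(0, 0.02)) for _ in range(200)]
W = train_som(normal, N=5)
best = min(math.dist((0.3, 0.5), W[i][j]) for i in range(5) for j in range(5))
print("nearest neuron distance to the normal state:", best)  # small after training
```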
3.2. Parameter Setting in the SOM-Based Modeling Process. The SOM network modeling process is an iterative fitting process that mainly consists of two stages: the initial ordered stage and the convergence stage. There are two important parameters in the training neighborhood function $H_{(ij)C}$: the width of the training neighborhood $\sigma(t)$ and the learning-rate factor $\alpha(t)$. Correct setting of these two parameters plays an important role in preventing the SOM network training from getting trapped in a metastable state. The processes for setting these two parameters are as follows.
(1) Setting the Width of the Training Neighborhood $\sigma(t)$. Based on the principle of SOM, $\sigma(t)$ is a monotonically decreasing function of $t$. At the beginning of the training process, the value of $\sigma(t)$ should be set so that the radius of the neighborhood defined by $H_{(ij)C}$ reaches at least half the diameter of the SOM network [39]; in this paper, the value is set to $N/2$.
Since $H_{(ij)C}$ is a monotonically decreasing function of $\| l_C - (i,j) \|$, it can be seen from (4) that, when the other variables remain unchanged, the value of $H_{(ij)C}$ is small if the neuron node is distant from the training center. Additionally, the smaller $H_{(ij)C}$ is, the lower its influence on the neuron node $n_{ij}$ in the fitting process; when the value of $H_{(ij)C}$ is small enough, the neuron node $n_{ij}$ is unaffected. Therefore, although the training neighborhood defined by the Gaussian function in this paper has no sharp boundary, the influential range of a single training sample $SS_t$ on the training of the SOM network can still be limited.

Assume that $f$ is a sufficiently small threshold of $H_{(ij)C}$. When $H_{(ij)C} < f$, the current iteration step has no influence on neuron node $n_{ij}$, while when $H_{(ij)C} > f$, the current iteration step will influence $n_{ij}$.
Therefore, when $t = 1$, at the beginning of the SOM network training process, the lower bound of $\sigma(t)$ can be determined from the threshold $f$ and (4). The detailed derivation is as follows.

When $t = 1$, assume that $\| l_C - (i,j) \| = N/2$; the lower bound of $\sigma(1)$ is then determined by the following inequality derivation:

$$\alpha(1) \cdot \exp\left( - \frac{(N/2)^2}{2\sigma^2(1)} \right) \ge f \Longrightarrow$$
$$\ln \alpha(1) - \frac{(N/2)^2}{2\sigma^2(1)} \ge \ln f \Longrightarrow$$
$$\frac{(N/2)^2}{2\sigma^2(1)} \le \ln \frac{\alpha(1)}{f} \Longrightarrow$$
$$\sigma^2(1) \ge \frac{N^2}{8 \ln \left( \alpha(1)/f \right)} \Longrightarrow$$
$$\sigma(1) \ge \frac{N}{2\sqrt{2 \ln \left( \alpha(1)/f \right)}} \quad (\because \sigma(1) > 0). \quad (7)$$
Based on this derivation, the lower bound of $\sigma(1)$ can be determined by (7), where the threshold $f = 0.05$ in this paper; the value of $\alpha(1)$ is discussed below, when setting $\alpha(t)$. According to (7), $\sigma(t)$ in $H_{(ij)C}$ during the initial ordered stage can be defined as follows:

$$\sigma(t) = \frac{N}{2\sqrt{2 \ln \left( \alpha(1)/f \right)}} \cdot \exp\left( - \frac{t-1}{t} \right), \quad t = 1, 2, 3, \ldots, 1000. \quad (8)$$

When the iteration of the SOM network training is gradually converging, the size of the training neighborhood defined by $H_{(ij)C}$ should be constant and should cover the nearest neighborhood of the training center $C$ in the SOM network. In this paper, the nearest neighborhood, that is, the four neurons nearest to neuron $C$ in the four directions (up, down, left, and right) of the SOM network, is shown in Figure 2.
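As a quick numeric sanity check of bound (7), using the values adopted in this paper's experiments ($f = 0.05$, $\alpha(1) = 1$, $N = 13$): at the lower bound, a neuron at distance $N/2$ from the training center receives exactly the threshold influence $f$.

```python
import math

N, f, alpha1 = 13, 0.05, 1.0

# lower bound of sigma(1) from inequality (7)
sigma1 = N / (2 * math.sqrt(2 * math.log(alpha1 / f)))

# with sigma(1) at this bound, equation (4) at t = 1 and distance N/2
# evaluates exactly to the threshold f
h = alpha1 * math.exp(-((N / 2) ** 2) / (2 * sigma1 ** 2))

print(sigma1)             # about 2.66 for N = 13
print(abs(h - f) < 1e-9)  # True: the bound is tight at distance N/2
```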
(2) Setting the Learning-Rate Factor $\alpha(t)$. Since $\alpha(t)$ is a monotonically decreasing function of $t$, the range of $\alpha(t)$ is $0.2 < \alpha(t) < 1$ in the initial ordered stage of the SOM training process and $0 < \alpha(t) < 0.2$ in the convergent stage. Then $\alpha(t)$ can be set to

$$\alpha(t) = \begin{cases} \exp\left( - \dfrac{t-1}{t} \right), & t = 1, 2, 3, \ldots, 1000, \\[2mm] 0.2 \cdot \exp\left( - \dfrac{t-1}{t} \right), & t > 1000. \end{cases} \quad (9)$$
3.3. VM Anomaly Recognition Based on the SOM Model. The SOM-based modeling of the VM status was described in the previous section. In this section, we describe the recognition of an anomaly using the trained SOM network. After several rounds of fitting iterations, the SOM network can effectively capture the normal states of the virtual machines. The normal states are represented by the weight vectors of the neurons in the SOM network; in other words, a neuron with its weight vector in the SOM network can be used to describe whether a class of similar virtual machines is normal.
To check whether the current state of a VM on a Cloud platform is an anomaly, we compare the current running performance of the virtual machine with the weight vectors of the neurons in the SOM network. In this paper, Euclidean distance is used to determine similarity. If the current state is sufficiently similar to one of the weight vectors (that is, its distance is less than a given threshold $\delta$), the virtual machine is identified as normal; otherwise, it is considered abnormal.
Let $VM_x$ represent a virtual machine on a Cloud platform. The corresponding SOM network of $VM_x$ is defined as $SOM(VM_x)$. After the training iterations have finished, the weight vector of each neuron can be represented as $W^S_{ij}$. The currently measured performance vector of $VM_x$ is $SS$, and the abnormal state of $VM_x$ is $VmStatus(VM_x)$. Then the
[Figure 2: Nearest neighborhood of neuron node $C$, shown as a $5 \times 5$ excerpt of the $N \times N$ grid of weight vectors $W_{11}, \ldots, W_{NN}$ with the training center $C$ in the middle.]
abnormal state of the virtual machine can be determined by the following equation:

$$\text{anomaly}(VM_x) = \begin{cases} \text{true}, & \min \left\{ \left\| SS - W^S_{ij} \right\| \mid i, j = 1, 2, 3, \ldots, N \right\} \ge \delta, \\ \text{false}, & \min \left\{ \left\| SS - W^S_{ij} \right\| \mid i, j = 1, 2, 3, \ldots, N \right\} < \delta. \end{cases} \quad (10)$$
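The decision rule above reduces to a nearest-neuron distance test. A minimal sketch, assuming a trained grid W of weight vectors and a threshold δ chosen beforehand (the grid values below are illustrative, not trained output):

```python
import math

def vm_anomaly(ss, W, delta):
    """Anomaly decision for a virtual machine VM_x: the current performance
    vector SS is abnormal when its distance to every trained weight vector
    W_ij is at least the threshold delta (the decision equation above)."""
    nearest = min(math.dist(ss, w) for row in W for w in row)
    return nearest >= delta

# toy 2 x 2 "trained" SOM grid of weight vectors (illustrative values)
W = [[(0.30, 0.50), (0.32, 0.48)],
     [(0.29, 0.52), (0.31, 0.49)]]

print(vm_anomaly((0.31, 0.50), W, delta=0.1))  # False: close to a normal state
print(vm_anomaly((0.95, 0.95), W, delta=0.1))  # True: far from every neuron
```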
4. Performance Evaluation

4.1. Experimental Environment and Setup. In this paper, the experimental Cloud platform is built on the open source Cloud platform OpenStack [40, 41]. The operating system CentOS 6.3 is installed on the physical servers that run the virtual machines, on which the hypervisor Xen 3.2 [42, 43] is installed. CentOS 6.3 is also installed on the physical servers that run the Cloud management program, on which the OpenStack Cloud management components are installed. 100 virtual machines were deployed on this Cloud platform.
The performance metrics of the virtual machines in this experiment were collected by tools such as libxenstat and libvirt [44, 45]. For fault injection, we used tools to simulate three system failures: memory leak, CPU Hog, and network Hog [46–48].
4.2. Experimental Program and Results
4.2.1. First Set of Experiments. The impact of the values of the SOM network training neighborhood width and the learning-rate factor on the performance of the anomaly detection
Table 1: The impact of SOM net size on the detection accuracy.

Size of SOM net | Accuracy rate (%)
8 × 8   | ≈96.3
13 × 13 | ≈97.9
18 × 18 | ≈97.5
20 × 20 | ≈97.7
24 × 24 | ≈97.8
Table 2: The impact of the initial training neighborhood size on the accuracy of SOM.

Initial width of the training neighborhood | Detection accuracy (%)
0.5 dsn | ≈97.8
0.4 dsn | ≈93.1
0.3 dsn | ≈90.4
0.2 dsn | ≈89.1
0.1 dsn | ≈73.7

dsn indicates the diameter of the SOM network.
Table 3: The impact of the initial value of the learning-rate factor on the accuracy of SOM.
mechanism of the SOM-based dynamic adaptive virtual machine was evaluated.
Training Stage. First, several virtual machines were selected from the 100 virtual machines. One fault (a memory leak, CPU Hog, or network Hog) was then randomly selected and injected. 1000 virtual machine system performance measurements were collected on the 100 virtual machines as training samples for the model training, over 10 rounds (one second per round).
Anomaly Detection Stage. To simulate an anomaly in the objects under detection, one of the three faults was randomly injected into the 100 virtual machines. The anomalies in each of the 100 virtual machines were then detected based on the trained model, and the detection results were recorded.
Several sets of experimental results with different parameter values were obtained. It should be noted that the same fault was injected in each experiment to exclude unnecessary variables.
The experimental results are shown in Tables 1, 2, and 3. As can be seen from Table 1, there is no obvious change in accuracy using the proposed detection method for different SOM network sizes, which means that the proposed anomaly detection method is not sensitive to the size of the SOM network.
Table 4: The impact of SOM net size on the detection accuracy.

Size of SOM net | Accuracy rate (%)
8 × 8   | ≈95.8
13 × 13 | ≈97.1
18 × 18 | ≈96.7
20 × 20 | ≈96.9
24 × 24 | ≈97.1
Table 5: The impact of the initial training neighborhood size on the accuracy of SOM.
As can be seen from Table 2, the size of the initial training neighborhood has a significant impact on the detection accuracy. The main reason is that, if the initial neighborhood is too small, the training process may fall into a metastable state, and further training iterations are required to reach the real steady state.
As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the abnormality detection decreases significantly. The reason is that, if the initial value of the learning-rate factor is too small, the contribution of each training sample to the SOM network training is also small. Thus the fitting ability of the SOM network for the detected objects is insufficient, which leads to poor model training quality and hence decreases the accuracy of the SOM network-based anomaly detection.
Analysis of the first set of experiments shows that betteranomaly detection results can be obtained in DA SOMwhenthe parameters are set as follows SOM network size = 13 times
13 initial size of training neighborhood = 05 dsn and initialvalue of learning-rate factor = 1
The above experiments have been carried out on thetraining data set To further demonstrate the effectivenessof the proposed algorithm the algorithm is tested on theuntrained anomaly set (disk Hog)
The experimental results about disk Hog are shown inTables 4 5 and 6
Mathematical Problems in Engineering 7
00 10
10
20
20
30
40
50
60
70
80
90
100
30 40 50 60 70 80 90 100False positive rate ()
Sens
itivi
ty (
)
DA_SOMk-NNk-M
(a) Memory leak
010
10
20
20
30
40
50
60
70
80
90
100
30 40 50 60 70 80 90False positive rate ()
Sens
itivi
ty (
)
DA_SOM
0
k-NNk-M
(b) CPU Hog
010
10
20
20
30
40
50
60
70
80
90
100
30 40 50 60 70 80 90 100False positive rate ()
Sens
itivi
ty (
)
DA_SOMk-NNk-M
0
(c) Network Hog
Figure 3 Comparison of the three anomaly detection algorithms DA SOM 119896-NN and 119896-M
As can be seen from Tables 4 5 and 6 the accuracy of theproposed algorithm still has better accuracy in the untraineddata set The impact of three parameters (som net size train-ing neighborhood width and learning-rate factor) on theaccuracy is similar with the previous experiments
422 Second Set of Experiments The objective of thisset of experiments was to evaluate the effect of the VManomaly detection mechanism based on SOM (representedby DA SOM in the following sections) In order to comparethis with other approaches we use two typical unsupervisedanomaly detection techniques in the experiments (1) 119896-nearest neighbor based anomaly detection technique (called119896-NN) where prior training of the anomaly identification
model is not required (2) cluster-based anomaly detectiontechnique (called 119896-M) where training of the anomaly iden-tification model is required in advance
Several experiments for different techniques and differentparameters with the same aforementioned configuration andexperimental procedure are applied to obtain the correspond-ing results It should be noted that since the training processis not required for the 119896-NN technique it started directlyin the abnormality detection stage In addition to ensurecomparability the training process of the clustering-basedmethod is the same as the proposed method where ananomaly detectionmodel is built for 100 virtualmachines andthe training data set is the same as training SOM Experimen-tal results are shown in Figure 3
8 Mathematical Problems in Engineering
Figure 3 shows that compared to the other two injectedfailures the sensitivities of the three techniques to memoryleak failure are relatively low The main reason is that ananomaly does not immediately appear on the failed objectwhen there is fault introduced by a memory leak It takessome time for this fault to accumulate to eventually cause anobvious abnormality The consequence of this is that detec-tion systems tend tomistake these objects with an anomaly asnormal In contrast faults caused by CPU Hog and networkHog events will immediately lead to an abnormal statewithin the fault object thusminimizingmisjudgments whichenhances the sensitivity of all three anomaly detection tech-niques as shown in Figures 3(b) and 3(c)
Meanwhile, as shown in each subgraph of Figure 3, compared with the other two anomaly detection techniques, DA_SOM maintains a better balance between sensitivity and false alarm rate. In other words, at the same false alarm rate, the sensitivity of DA_SOM is better than that of the other two approaches, showing a strong capability to improve the warning effect while reducing the false alarm rate.
Moreover, the computational complexity of DA_SOM is much lower than that of k-NN in the anomaly detection stage, while it is equivalent to that of the k-M technique; their complexity is constant with respect to the size of the detected object set and to the parameter k in the k-M technique. Meanwhile, during the model training stage, the training cost of k-M is higher than that of DA_SOM for the same size of training data. The main reason is that k-M requires iteration over the entire training data set (i.e., the cluster centers need to be updated and the training data set reclassified according to the updated center points), while there is only one classification operation for each training sample in DA_SOM.
5. Conclusion

An anomaly detection algorithm based on SOM for a Cloud platform with large-scale virtual machines is proposed. The virtual machines are first partitioned according to their similarity, and then, based on the results of the initial partition, the SOM is modeled. The proposed method has a high training speed, which is not achievable with traditional methods when there are a large number of virtual machines. We also optimized the two main parameters in the SOM network modeling process, which greatly improves this process. The proposed method is verified on an incremental SOM anomaly detection model. The results show strong improvements in detection accuracy and speed using the proposed anomaly detection method.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The work of this paper is supported by the National Natural Science Foundation of China (Grants no. 61272399 and no. 61572090) and the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20110191110038).
References
[1] J. Li, Y. Cui, and Y. Ma, "Modeling message queueing services with reliability guarantee in cloud computing environment using colored Petri nets," Mathematical Problems in Engineering, vol. 2015, Article ID 383846, 20 pages, 2015.
[2] M. A. Rodriguez-Garcia, R. Valencia-Garcia, F. Garcia-Sanchez, and J. J. Samper-Zapater, "Ontology-based annotation and retrieval of services in the cloud," Knowledge-Based Systems, vol. 56, pp. 15–25, 2014.
[3] C.-C. Chang, C.-Y. Sun, and T.-F. Cheng, "A dependable storage service system in cloud environment," Security and Communication Networks, vol. 8, no. 4, pp. 574–588, 2015.
[4] W. He and L. Xu, "A state-of-the-art survey of cloud manufacturing," International Journal of Computer Integrated Manufacturing, vol. 28, no. 3, pp. 239–250, 2015.
[5] A. F. Barsoum and M. Anwar Hasan, "Provable multicopy dynamic data possession in cloud computing systems," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 485–497, 2015.
[6] J. Subirats and J. Guitart, "Assessing and forecasting energy efficiency on Cloud computing platforms," Future Generation Computer Systems, vol. 45, pp. 70–94, 2015.
[7] S. Ding, S. Yang, Y. Zhang, C. Liang, and C. Xia, "Combining QoS prediction and customer satisfaction estimation to solve cloud service trustworthiness evaluation problems," Knowledge-Based Systems, vol. 56, pp. 216–225, 2014.
[8] I. C. Paschalidis and Y. Chen, "Statistical anomaly detection with sensor networks," ACM Transactions on Sensor Networks, vol. 7, no. 2, article 17, 2010.
[9] M. GhasemiGol and A. Ghaemi-Bafghi, "E-correlator: an entropy-based alert correlation system," Security and Communication Networks, vol. 8, no. 5, pp. 822–836, 2015.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.
[11] M. Kourki and M. A. Riahi, "Seismic facies analysis from pre-stack data using self-organizing maps," Journal of Geophysics and Engineering, vol. 11, no. 6, Article ID 065005, 2014.
[12] L. Feng and S. LiQuan, "Enhanced dynamic self-organizing maps for data cluster," Information Technology Journal, vol. 26, no. 1, pp. 70–81, 2009.
[13] Z. Zhou, S. Chen, M. Lin, G. Wang, and Q. Yang, "Minimizing average startup latency of VMs by clustering-based template caching scheme in an IaaS system," International Journal of u- and e- Service, Science and Technology, vol. 6, no. 6, pp. 145–158, 2013.
[14] L. Jing, M. K. Ng, and J. Z. Huang, "An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026–1041, 2007.
[15] R. Smith, A. Bivens, M. Embrechts, C. Palagiri, and B. Szymanski, "Clustering approaches for anomaly based intrusion detection," in Proceedings of Intelligent Engineering Systems through Artificial Neural Networks, pp. 579–584, 2002.
[16] Y. Sani, A. Mohamedou, K. Ali, A. Farjamfar, M. Azman, and S. Shamsuddin, "An overview of neural networks use in anomaly intrusion detection systems," in Proceedings of the IEEE Student Conference on Research and Development (SCOReD '09), pp. 89–92, Serdang, Malaysia, November 2009.
[17] G. P. Zhang, "Neural networks for classification: a survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.
Mathematical Problems in Engineering 9
[18] W. Tylman, "Anomaly-based intrusion detection using Bayesian networks," in Proceedings of the International Conference on Dependability of Computer Systems, pp. 211–218, Szklarska Poręba, Poland, 2008.
[19] W. Pedrycz, V. Loia, and S. Senatore, "P-FCM: a proximity-based fuzzy clustering," Fuzzy Sets and Systems, vol. 148, no. 1, pp. 21–41, 2004.
[20] G. Ratsch, S. Mika, B. Scholkopf, and K.-R. Muller, "Constructing boosting algorithms from SVMs: an application to one-class classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1184–1199, 2002.
[21] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.
[22] B. Scholkopf, R. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Piatt, "Support vector method for novelty detection," in Proceedings of the 13th Annual Neural Information Processing Systems Conference (NIPS '99), pp. 582–588, December 1999.
[23] G. Wang, S. Chen, Z. Zhou, and M. Lin, "A dependable monitoring mechanism combining static and dynamic anomaly detection for network systems," International Journal of Future Generation Communication and Networking, vol. 7, no. 1, pp. 1–18, 2014.
[24] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.
[25] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, Reading, Mass, USA, 2005.
[26] R. O. Duda, E. P. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2000.
[27] D. Shin and S. Kim, "Nearest mean classification via one-class SVM," in Proceedings of the International Joint Conference on Computational Sciences and Optimization (CSO '09), pp. 593–596, Sanya, China, April 2009.
[28] T.-S. Li and C.-L. Huang, "Defect spatial pattern recognition using a hybrid SOM-SVM approach in semiconductor manufacturing," Expert Systems with Applications, vol. 36, no. 1, pp. 374–385, 2009.
[29] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[30] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[31] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, Mass, USA, 2000.
[32] B. Liu, Y. Xiao, Y. Zhang, and Z. Hao, "An efficient approach to boost support vector data description," in Proceedings of the 2012 International Conference on Cybernetics and Informatics, vol. 163 of Lecture Notes in Electrical Engineering, pp. 2231–2238, Springer, New York, NY, USA, 2014.
[33] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, New York, NY, USA, 1988.
[34] L. Ertoz, M. Steinbach, and V. Kumar, "Finding topics in collections of documents: a shared nearest neighbor approach," in Clustering and Information Retrieval, vol. 11, pp. 83–103, 2003.
[35] S. Guha, R. Rastogi, and K. Shim, "Rock: a robust clustering algorithm for categorical attributes," Information Systems, vol. 25, no. 5, pp. 345–366, 2000.
[36] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han, and U. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, August 1996.
[37] M. M. Breuniq, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 29, no. 2, pp. 93–104, 2000.
[38] N. Ye and Q. Chen, "An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems," Quality and Reliability Engineering International, vol. 17, no. 2, pp. 105–112, 2001.
[39] T. Kohonen, Self-Organizing Maps, Springer, New York, NY, USA, 1997.
[40] J. M. Alcaraz Calero and J. G. Aguado, "MonPaaS: an adaptive monitoring platform as a service for cloud computing infrastructures and services," IEEE Transactions on Services Computing, vol. 8, no. 1, pp. 65–78, 2015.
[41] D. Milojicic, I. M. Llorente, and R. S. Montero, "OpenNebula: a cloud management tool," IEEE Internet Computing, vol. 15, no. 2, pp. 11–14, 2011.
[42] H. Jin, H. Qin, S. Wu, and X. Guo, "CCAP: a cache contention-aware virtual machine placement approach for HPC cloud," International Journal of Parallel Programming, vol. 43, no. 3, pp. 403–420, 2015.
[43] B. Egger, E. Gustafsson, C. Jo, and J. Son, "Efficiently restoring virtual machines," International Journal of Parallel Programming, vol. 43, no. 3, pp. 421–439, 2015.
[44] Y. Cho, J. Choi, J. Choi, and M. Lee, "Towards an integrated management system based on abstraction of heterogeneous virtual resources," Cluster Computing, vol. 17, no. 4, pp. 1215–1223, 2014.
[45] J. Li, Y. Jia, L. Liu, and T. Wo, "CyberLiveApp: a secure sharing and migration approach for live virtual desktop applications in a cloud environment," Future Generation Computer Systems, vol. 29, no. 1, pp. 330–340, 2013.
[46] Z. Xu, J. Zhang, and Z. Xu, "Melton: a practical and precise memory leak detection tool for C programs," Frontiers of Computer Science, vol. 9, no. 1, pp. 34–54, 2015.
[47] P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: an evaluation of the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, 2012.
[48] Y.-J. Chiu and T. Berger, "A software-only videocodec using pixelwise conditional differential replenishment and perceptual enhancements," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, pp. 438–450, 1999.
[Figure 1: SOM anomaly detection logic diagram. The system performance vector (SS) of each VM in a detection domain is collected and fed into the SOM-based anomaly detection of the VMs, which outputs the anomaly detection results.]
The important parameters of training width and learning-rate factor are optimized to enhance the training speed. The flow chart of the anomaly detection algorithm is shown in Figure 1.
3.1 SOM-Based State Modeling of the Virtual Machine. Because prior knowledge of similar performance for virtual machine classification is unavailable, the k-medoids method is used in this paper for the initial classification; that is, the VMs on the Cloud platform are divided into multiple detection domains. The k-medoids method is chosen because, compared with the k-means algorithm, it is less susceptible to noise.
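The initial partition into detection domains can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation; the function name, the greedy farthest-point initialization, and the parameters are assumptions:

```python
import numpy as np

def k_medoids(points, k, iters=20):
    """Partition VM performance vectors into k detection domains.

    Medoids are actual data points, so the partition is less sensitive
    to noisy measurements than k-means centroids.
    """
    points = np.asarray(points, dtype=float)
    # Greedy farthest-point initialization (deterministic, spread out).
    medoids = [0]
    while len(medoids) < k:
        d = np.linalg.norm(points[:, None] - points[medoids][None], axis=2).min(axis=1)
        medoids.append(int(d.argmax()))
    medoids = np.array(medoids)
    for _ in range(iters):
        # Assign every point to its nearest medoid (Euclidean distance).
        d = np.linalg.norm(points[:, None] - points[medoids][None], axis=2)
        labels = d.argmin(axis=1)
        # Move each medoid to the member minimizing total in-cluster distance.
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                intra = np.linalg.norm(points[members][:, None] - points[members][None], axis=2)
                new_medoids[c] = members[intra.sum(axis=0).argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels
```

Because the medoid update only ever selects existing samples, a single grossly abnormal measurement cannot drag a domain center away from the bulk of its members, which is the robustness property motivating the choice over k-means.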
The SOM network is generated in each detection domain using the SOM algorithm. The network is constructed as a two-dimensional (N × N) neuron array. Each neuron can be represented as n_ij, i, j = 1, 2, 3, ..., N, where i is the row subscript and j is the column subscript, and each neuron is associated with a weight vector defined as W_ij = (w_1, w_2, w_3, ..., w_m). The dimension m of a weight vector is the same as the dimension of the samples in the training set used to train the SOM network. The training set used in this paper includes performance metrics that reflect the running state of the virtual machine: its CPU utilization, its memory utilization, and its network throughput. These performance metrics are described by a vector defined as SS = (ss_1, ss_2, ss_3, ..., ss_m).
The modeling of a specific detection domain in SOM requires periodic measurement and adequate collection of the training data. The collected performance vector SS ∈ R^m can be considered a random variable within the performance sample space. The VM performance samples collected within a certain time series can be expressed as SS_t (where t = 1, 2, 3, ..., n). The iterative training on the samples collected within this time series is the modeling process of the SOM virtual machine. The detection domain modeling algorithm can therefore be summarized as follows.
Step 1 (initialization of the SOM network). SOM neurons are represented by weight vectors W_ij(0), i, j = 1, 2, 3, ..., N, where i and j indicate the location of the neuron in the SOM network. In this paper the weight vectors W_ij(0) are initialized randomly.
Step 2 (defining the training space of the SOM network for training sample SS_t). When a training sample SS_t at time t is added to the SOM network, the most suitable neuron must be found to serve as the training center of the neighborhood. For SS_t at time t, the most suitable neuron C can be found using (2), and C becomes the training center in the SOM network after SS_t is added:

    C = arg min_{i,j} ‖SS_t − W_ij(t − 1)‖ ,  t = 2, 3, ...  (2)
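As a sketch of the search in (2) — illustrative Python, with the NumPy representation of the weight grid assumed, since the paper provides no code — the training center C is the neuron whose weight vector is nearest to the current sample:

```python
import numpy as np

def find_training_center(weights, ss):
    """Locate the best-matching neuron C for a sample SS_t (equation (2)).

    weights: (N, N, m) array holding the weight vectors W_ij of the grid.
    ss:      (m,) performance vector (CPU, memory, network throughput, ...).
    Returns the grid coordinates (i, j) minimizing ||SS_t - W_ij||.
    """
    dist = np.linalg.norm(weights - ss[None, None, :], axis=2)  # (N, N) distances
    return np.unravel_index(dist.argmin(), dist.shape)
```

A full scan of the N × N grid is O(N^2 · m) per sample, which is constant in the number of monitored VMs once the network size is fixed.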
After the training center C is defined using (2), we need to set the training neighborhood. According to the definition of SOM, to ensure convergence of the training process of the SOM network, the training neighborhood can be defined as

    P = H^(ij)_C (t, ‖l_C − (i, j)‖) ,  (3)

where P is the training neighborhood function, a monotonically decreasing function of the distance ‖l_C − (i, j)‖ and of the number of training iterations t; l_C is the coordinate vector of the training center C in the SOM network, and (i, j) is the coordinate vector of neuron node n_ij in the SOM network.
Due to its effective smoothing, a Gaussian function is used as the neighborhood training function in this paper, defined as follows:

    H^(ij)_C = α(t) · exp( −‖l_C − (i, j)‖² / (2σ²(t)) ) .  (4)

In (4), α(t) represents the learning-rate factor, which determines the fitting ability of the SOM network for the training sample SS_t in the training process; σ(t) represents the width of the neighborhood, which determines the range of influence of a single training sample SS_t on the SOM network. According to SOM-related theory, to ensure convergence of the training process, α(t) and σ(t) should both be monotonically decreasing functions of the number of training iterations t.
Step 3 (SOM network training based on training sample SS_t). The training neighborhood was defined in Step 2. The neurons within the training domain of the SOM network are trained on the training sample SS_t according to (5). The fitting equation is defined as follows:

    W_ij(t) = W_ij(t − 1) + H^(ij)_C · (SS_t − W_ij(t − 1)) .  (5)
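One fitting iteration — the Gaussian neighborhood of (4) applied through the update of (5) — can be sketched as below. This is a hypothetical sketch with illustrative names rather than the authors' code:

```python
import numpy as np

def train_step(weights, ss, center, alpha_t, sigma_t):
    """Pull every neuron toward sample SS_t with a Gaussian falloff.

    H^(ij)_C = alpha(t) * exp(-||l_C - (i, j)||^2 / (2 * sigma(t)^2))   (4)
    W_ij    <- W_ij + H^(ij)_C * (SS_t - W_ij)                          (5)
    """
    n = weights.shape[0]
    ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Squared grid distance of every neuron from the training center l_C.
    sq_dist = (ii - center[0]) ** 2 + (jj - center[1]) ** 2
    h = alpha_t * np.exp(-sq_dist / (2.0 * sigma_t ** 2))  # (N, N) neighborhood
    return weights + h[:, :, None] * (ss[None, None, :] - weights)
```

Neurons at the training center move the most (h = α(t) there), while neurons far from C receive an exponentially smaller correction, which is exactly the bounded range of influence discussed in Section 3.2.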
After the training process using (5) is completed, the convergence of the training process needs to be verified. The process is convergent if every neuron weight vector in the SOM network has stabilized. The method is described in detail below.

Assume that there is a neuron n_ij in the SOM network and that the time index of its latest training sample is t_l^(ij). Assume further that there is a sufficiently small real number ε. Convergence of the training process of the SOM network can then be checked using the following:

    d(t) = (1/N²) · Σ_{i,j=1}^{N} ‖W_ij(t_l^(ij)) − W_ij(t_l^(ij) − 1)‖ < ε .  (6)

In (6), d(t) represents the average deviation between the latest fitting state and the previous one over every neuron weight vector in the SOM network after t training samples have been used in the training process. Obviously, when d(t) < ε, the weight vectors W_ij have stabilized, indicating that the iterative training process can be stopped. When d(t) > ε, further collection of training samples is required and Steps 2 and 3 need to be repeated.
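The convergence test can be sketched as below; the averaging of the per-neuron deviations follows the description of d(t) above, though the exact form used in the paper may differ:

```python
import numpy as np

def mean_deviation(weights_prev, weights_curr):
    """d(t): average deviation between the latest and the previous fitting
    state over all N*N neuron weight vectors (sketch of equation (6))."""
    n = weights_prev.shape[0]
    return np.linalg.norm(weights_curr - weights_prev, axis=2).sum() / (n * n)

def converged(weights_prev, weights_curr, eps=1e-3):
    """Stop the iterative training once d(t) < eps."""
    return mean_deviation(weights_prev, weights_curr) < eps
```

In practice the two snapshots would be the weight grid before and after the most recent batch of samples, so the check costs only one pass over the N × N grid.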
3.2 Parameter Setting in the SOM-Based Modeling Process. The SOM network modeling process is an iterative fitting process that mainly consists of two stages: the initial ordered stage and the convergence stage. There are two important parameters in the training neighborhood function H^(ij)_C: the width of the training neighborhood σ(t) and the learning-rate factor α(t). Correct setting of these two parameters plays an important role in preventing the SOM network training from getting trapped in a metastable state. The processes for setting these two parameters are as follows.
(1) Setting the Width of the Training Neighborhood σ(t). Based on the principle of SOM, σ(t) is a monotonically decreasing function of t. At the beginning of the training process, the value of σ(t) should be set so that the radius of the neighborhood defined by H^(ij)_C reaches at least half the diameter of the SOM network [39]. In this paper the value is set to N/2.
Since H^(ij)_C is a monotonically decreasing function of ‖l_C − (i, j)‖, it can be seen from (4) that, when the other variables remain unchanged, the value of H^(ij)_C is small if the neuron node is distant from the training center. A smaller H^(ij)_C has a lower influence on the neuron node n_ij in the fitting process, and when the value of H^(ij)_C is small enough, the neuron node n_ij is effectively unaffected. Therefore, although the training neighborhood defined by the Gaussian function in this paper has no sharp boundary, the range of influence of a single training sample SS_t on the training of the SOM network can still be limited.

Assume that f is a sufficiently small threshold for H^(ij)_C. When H^(ij)_C < f, the current iteration step has no influence on neuron node n_ij, while when H^(ij)_C > f, the current iteration step will influence n_ij.
Therefore, when t = 1 at the beginning of the SOM network training process, the lower bound of σ(1) can be determined from the threshold f and (4). The detailed derivation is as follows. When t = 1, assume that ‖l_C − (i, j)‖ = N/2; the lower bound of σ(1) is then determined by the following inequality derivation:

    α(1) · exp( −(N/2)² / (2σ²(1)) ) ≥ f
    ⟹ ln α(1) − (N/2)² / (2σ²(1)) ≥ ln f
    ⟹ (N/2)² / (2σ²(1)) ≤ ln(α(1)/f)
    ⟹ σ²(1) ≥ N² / (8 · ln(α(1)/f))
    ⟹ σ(1) ≥ N / (2 · √(2 · ln(α(1)/f)))  (since σ(1) > 0) .  (7)

Based on this derivation, the lower bound of σ(1) is determined by (7), where the threshold f = 0.05 in this paper. The value of α(1) used is given in the following discussion of setting α(t). According to (7), σ(t) in H^(ij)_C during the initial ordered stage can be defined as follows:

    σ(t) = N / (2 · √(2 · ln(α(1)/f))) · exp( −(t − 1)/t ) ,  t = 1, 2, 3, ..., 1000 .  (8)
As the iterative SOM network training gradually converges, the size of the training neighborhood defined by H^(ij)_C should become constant and cover the nearest neighborhood of the training center C in the SOM network. In this paper the nearest neighborhood, that is, the four neurons adjacent to neuron C in the four directions (up, down, left, and right) of the SOM network, is shown in Figure 2.
(2) Setting the Learning-Rate Factor α(t). Since α(t) is a monotonically decreasing function of t, the range of α(t) is 0.2 < α(t) < 1 in the initial ordered stage of the SOM training process and 0 < α(t) < 0.2 in the convergence stage. α(t) can then be set to

    α(t) = exp( −(t − 1)/t ) ,          t = 1, 2, 3, ..., 1000 ,
    α(t) = 0.2 · exp( −(t − 1)/t ) ,    t > 1000 .  (9)
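Under the paper's settings (f = 0.05, α(1) = 1, and, as in the later experiments, N = 13), the schedules of (7)-(9) can be written down directly. This is a sketch; the function names are assumptions:

```python
import math

def alpha(t):
    """Learning-rate factor of equation (9): ordered stage for t <= 1000,
    convergence stage (scaled by 0.2) afterwards."""
    base = math.exp(-(t - 1) / t)
    return base if t <= 1000 else 0.2 * base

def sigma(t, n=13, f=0.05):
    """Neighborhood width of equation (8), starting from the lower bound
    of equation (7): sigma(1) = N / (2 * sqrt(2 * ln(alpha(1) / f)))."""
    sigma1 = n / (2.0 * math.sqrt(2.0 * math.log(alpha(1) / f)))
    return sigma1 * math.exp(-(t - 1) / t)
```

With these values, α(1) = 1 (matching the best setting found in the first set of experiments) and σ(1) ≈ 2.66 for a 13 × 13 grid; both schedules decay monotonically, as the convergence argument of Section 3.2 requires.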
3.3 VM Anomaly Recognition Based on the SOM Model. The SOM-based modeling of VM status was described in the previous section. In this section we describe the recognition of an anomaly using the trained SOM network. After several rounds of fitting iterations, the SOM network can effectively capture the normal state of the virtual machines. The normal state is represented by the neuron weight vectors in the SOM network; in other words, a neuron's weight vector in the SOM network describes whether a class of similar virtual machines is normal.
To check whether the current state of a VM on a Cloud platform is anomalous, we can compare the current running performance of the virtual machine with the neuron weight vectors in the SOM network. In this paper, Euclidean distance is used to determine similarity. If the current state is sufficiently similar to one of the weight vectors (that is, the minimum distance is less than a given threshold δ), the virtual machine is identified as normal; otherwise, it is considered abnormal.
Let VM_x represent a virtual machine on a Cloud platform. The corresponding SOM network of VM_x is defined as SOM(VM_x). After the training iterations have finished, the weight vector of each neuron can be represented as W^S_ij. The currently measured performance vector of VM_x is SS, and the abnormal state of VM_x is VmStatus(VM_x). The abnormal state of the virtual machine can then be determined by the following equation:

    anomaly(VM_x) = true,   if min{ ‖SS − W^S_ij‖ : i, j = 1, 2, 3, ..., N } ≥ δ ;
    anomaly(VM_x) = false,  if min{ ‖SS − W^S_ij‖ : i, j = 1, 2, 3, ..., N } < δ .

[Figure 2: Nearest neighborhood of neuron node C — the neurons directly above, below, left, and right of C in the N × N weight grid W_11, ..., W_NN.]
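The decision rule can be sketched as below (illustrative; the threshold value δ is application-dependent and is not fixed by the equation itself):

```python
import numpy as np

def is_anomalous(weights, ss, delta):
    """Flag VM_x as abnormal when its current performance vector SS is
    farther than delta from every trained weight vector W^S_ij."""
    min_dist = np.linalg.norm(weights - ss[None, None, :], axis=2).min()
    return bool(min_dist >= delta)
```

Because the trained neurons summarize the normal states of a whole class of similar VMs, one such grid serves every machine in the detection domain; a measurement that lands far from all of them has no normal prototype and is reported as an anomaly.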
4.1 Experimental Environment and Setup. In this paper, the experimental Cloud platform is built on the open source Cloud platform OpenStack [40, 41]. The operating system CentOS 6.3 is installed on the physical servers running the virtual machines, on which the hypervisor Xen 3.2 [42, 43] is installed. CentOS 6.3 is also installed on the physical servers running the Cloud management program, on which the OpenStack Cloud management components are installed. 100 virtual machines were deployed on this Cloud platform.
The performance metrics of the virtual machines in this experiment were collected by tools such as libxenstat and libvirt [44, 45]. For fault injection, we used tools that simulate system failures: memory leak, CPU Hog, and network Hog [46–48].
4.2 Experimental Program and Results
4.2.1 First Set of Experiments. The impact of the SOM network training neighborhood width and the learning-rate factor values on the performance of the anomaly detection mechanism of the SOM-based dynamic adaptive virtual machine detection was evaluated.

Table 1: The impact of SOM net size on the detection accuracy.
Size of SOM net | Accuracy rate (%)
8 × 8   | ≈96.3
13 × 13 | ≈97.9
18 × 18 | ≈97.5
20 × 20 | ≈97.7
24 × 24 | ≈97.8

Table 2: The impact of the initial training neighborhood size on the accuracy of SOM.
Initial width of the training neighborhood | Detection accuracy (%)
0.5 dsn | ≈97.8
0.4 dsn | ≈93.1
0.3 dsn | ≈90.4
0.2 dsn | ≈89.1
0.1 dsn | ≈73.7
(dsn indicates the diameter of the SOM network.)

Table 3: The impact of the initial value of the learning-rate factor on the accuracy of SOM.
Training Stage. Firstly, several virtual machines were selected from the 100 virtual machines. One fault was then randomly selected (a memory leak, CPU Hog, or network Hog) and injected. 1000 virtual machine system performance measurements were collected as training samples for the model training during 10 rounds (one second per round) on the 100 virtual machines.

Anomaly Detection Stage. In order to simulate an anomaly in the objects under detection, one of the three faults was randomly injected in the 100 virtual machines. The anomalies in each of the 100 virtual machines were then detected based on the trained model, and the detection results were recorded.

Several sets of experimental results with different parameter values were obtained. It should be noted that the same fault was injected in each experiment to exclude unnecessary variables.

The experimental results are shown in Tables 1, 2, and 3. As can be seen from Table 1, there is no obvious change in accuracy using the proposed detection method for different SOM network sizes, which means that the proposed anomaly detection method is not strongly affected by the size of the SOM network.
Table 4: The impact of SOM net size on the detection accuracy.
Size of SOM net | Accuracy rate (%)
8 × 8   | ≈95.8
13 × 13 | ≈97.1
18 × 18 | ≈96.7
20 × 20 | ≈96.9
24 × 24 | ≈97.1

Table 5: The impact of the initial training neighborhood size on the accuracy of SOM.
Initial width of the training neighborhood | Detection accuracy (%)
As can be seen from Table 2, the size of the initial training neighborhood has a significant impact on the detection accuracy. The main reason is that, if the training neighborhood is too small, a metastable state may arise in the training process, and further training iterations are then required to reach a real steady state.

As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the abnormality detection significantly decreases. The reason is that, if the initial value of the learning-rate factor is too small, the contribution of each training sample to the SOM network training is small too. The fitting ability of the SOM network for the detected object is then insufficient, which leads to poor model training quality and hence decreases the accuracy of the SOM network-based anomaly detection.
Analysis of the first set of experiments shows that better anomaly detection results can be obtained with DA_SOM when the parameters are set as follows: SOM network size = 13 × 13, initial size of the training neighborhood = 0.5 dsn, and initial value of the learning-rate factor = 1.
The above experiments were carried out on the training data set. To further demonstrate the effectiveness of the proposed algorithm, it was also tested on an untrained anomaly type (disk Hog). The experimental results for disk Hog are shown in Tables 4, 5, and 6.
[Figure 3: Comparison of the three anomaly detection algorithms DA_SOM, k-NN, and k-M. Each panel plots sensitivity (%) against false positive rate (%) for one injected fault: (a) memory leak, (b) CPU Hog, and (c) network Hog.]
As can be seen from Tables 4 5 and 6 the accuracy of theproposed algorithm still has better accuracy in the untraineddata set The impact of three parameters (som net size train-ing neighborhood width and learning-rate factor) on theaccuracy is similar with the previous experiments
422 Second Set of Experiments The objective of thisset of experiments was to evaluate the effect of the VManomaly detection mechanism based on SOM (representedby DA SOM in the following sections) In order to comparethis with other approaches we use two typical unsupervisedanomaly detection techniques in the experiments (1) 119896-nearest neighbor based anomaly detection technique (called119896-NN) where prior training of the anomaly identification
model is not required (2) cluster-based anomaly detectiontechnique (called 119896-M) where training of the anomaly iden-tification model is required in advance
Several experiments for different techniques and differentparameters with the same aforementioned configuration andexperimental procedure are applied to obtain the correspond-ing results It should be noted that since the training processis not required for the 119896-NN technique it started directlyin the abnormality detection stage In addition to ensurecomparability the training process of the clustering-basedmethod is the same as the proposed method where ananomaly detectionmodel is built for 100 virtualmachines andthe training data set is the same as training SOM Experimen-tal results are shown in Figure 3
8 Mathematical Problems in Engineering
Figure 3 shows that compared to the other two injectedfailures the sensitivities of the three techniques to memoryleak failure are relatively low The main reason is that ananomaly does not immediately appear on the failed objectwhen there is fault introduced by a memory leak It takessome time for this fault to accumulate to eventually cause anobvious abnormality The consequence of this is that detec-tion systems tend tomistake these objects with an anomaly asnormal In contrast faults caused by CPU Hog and networkHog events will immediately lead to an abnormal statewithin the fault object thusminimizingmisjudgments whichenhances the sensitivity of all three anomaly detection tech-niques as shown in Figures 3(b) and 3(c)
Meanwhile as shown in each subgraph of Figure 3compared with the other two anomaly detection techniquesDA SOMmaintains a better balance between sensitivity andfalse alarm rate In other words with the same false alarmrate the sensitivity of DA SOM is better than that of the othertwo approaches showing a strong performance in improvingwarning effect and reducing the false alarm rate
Moreover the computational complexity of DA SOM ismuch lower than that of the 119896-NN in anomaly detection stagewhile the computational complexity of DA SOM is equiva-lent to the 119896-M technique Their complexity is constant withthe detected object size and with the parameter 119896 in the 119896-Mtechnique Meanwhile during the model training stage thetraining cost of 119896-M is higher than that of DA SOM for thesame size of training dataThemain reason is that iteration isrequired in 119896-M on the entire training data set (ie the clustercenters need to be updated and the training data set needs tobe reclassified according to the updated center point) whilethere is only one classification operation for each trainingsample in DA SOM
5 Conclusion
Ananomaly detection algorithmbased on SOMfor theCloudplatform with large-scale virtual machines is proposed Thevirtual machines are partitioned initially according to theirsimilarity and then based on the results of initial partitionthe SOM is modeled The proposed method has a hightraining speed which is not possible in traditional methodswhen there are a large number of virtual machines We alsooptimized the two main parameters in the SOM networkmodeling process which highly improved this process Theproposedmethod is verified on an incremental SOManomalydetection modelThe results showed strong improvements indetection accuracy and speed using the proposed anomalydetection method
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The work of this paper is supported by National NaturalScience Foundation of China (Grants no 61272399 and no61572090) and Research Fund for the Doctoral Program ofHigher Education of China (Grant no 20110191110038)
as the neighborhood training function in this paper, which is defined as follows:

$$H_C^{(i,j)} = \alpha(t) \cdot \exp\!\left(-\frac{\| l_C - (i,j) \|^2}{2\sigma^2(t)}\right). \tag{4}$$
In (4), α(t) represents the learning-rate factor, which determines the fitting ability of the SOM network for the training sample SS_t in the training process, and σ(t) represents the width of the neighborhood, which determines the range of influence of a single training sample SS_t on the SOM network. According to SOM theory, to ensure convergence of the training process, α(t) and σ(t) should both be monotonically decreasing functions of the number of training iterations t.
Step 3 (SOM network training based on training sample SS_t). With the training neighborhood defined in Step 2, the neurons within the training domain of the SOM network are trained on the training sample SS_t according to (5). The fitting equation is defined as follows:

$$W_{ij}(t) = W_{ij}(t-1) + H_C^{(i,j)} \cdot \left( SS_t - W_{ij}(t-1) \right). \tag{5}$$
After the training process is completed using (5), the convergence of the training process needs to be verified. The process is convergent if every weight vector in the SOM network has stabilized. The method is described in detail below.
Assume that there is a neuron n_ij in the SOM network and that the time index of its latest training sample is t_l^{(i,j)}. Further assume a sufficiently small real number ε. Convergence of the training process of the SOM network can then be checked using the following condition:

$$d(t) < \varepsilon. \tag{6}$$

In (6), d(t) represents the average deviation between the latest fitting state and the previous value, taken over every weight vector in the SOM network, after t training samples have been used in a training process. When d(t) < ε, the weight vectors W_ij have stabilized, indicating that the iterative training process can be stopped. When d(t) ≥ ε, further collection of training samples is required, and Steps 2 and 3 need to be repeated.
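The fitting iteration of Steps 2 and 3, together with the d(t) < ε stopping check of (6), can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the grid layout of `W`, the choice of training center, and the Kohonen-style update used for (5) are assumptions consistent with standard SOM practice.

```python
import math

def train_step(W, sample, center, alpha, sigma):
    """One SOM fitting iteration (Steps 2-3): every neuron is pulled
    toward the training sample SS_t, weighted by the Gaussian
    neighborhood H_C^(i,j) of (4) around the training center l_C."""
    N = len(W)
    ci, cj = center
    for i in range(N):
        for j in range(N):
            dist2 = (i - ci) ** 2 + (j - cj) ** 2          # ||l_C - (i,j)||^2
            h = alpha * math.exp(-dist2 / (2.0 * sigma ** 2))
            W[i][j] = [w + h * (s - w) for w, s in zip(W[i][j], sample)]

def avg_deviation(W_prev, W):
    """d(t): average change of the weight vectors between two fitting
    states; training stops once d(t) < epsilon, the check of (6)."""
    N = len(W)
    total = sum(math.sqrt(sum((a - b) ** 2 for a, b in zip(W_prev[i][j], W[i][j])))
                for i in range(N) for j in range(N))
    return total / (N * N)
```

With alpha = 1 the training center itself is fitted exactly onto the sample, and a vanishing `avg_deviation` between rounds signals convergence.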
3.2. Parameter Setting in the SOM-Based Modeling Process. The SOM network modeling process is an iterative fitting process that mainly consists of two stages: the initial ordered stage and the convergence stage. There are two important parameters in the training neighborhood function H_C^{(i,j)}: the width of the training neighborhood σ(t) and the learning-rate factor α(t). Correct setting of these two parameters plays an important role in preventing the SOM network training from becoming trapped in a metastable state. The processes for setting these two parameters are as follows.
(1) Setting the Width of the Training Neighborhood σ(t). Based on the principle of SOM, σ(t) is a monotonically decreasing function of t. At the beginning of the training process, the value of σ(t) should be set so that the radius of the neighborhood defined by H_C^{(i,j)} reaches at least half the diameter of the SOM network [39]. In this paper the value is set to N/2.
Since H_C^{(i,j)} is a monotonically decreasing function of ‖l_C − (i,j)‖, it can be seen from (4) that, with the other variables unchanged, H_C^{(i,j)} is small if the neuronal node is distant from the training center. The smaller H_C^{(i,j)} is, the lower its influence on the neuronal node n_ij in the fitting process; when H_C^{(i,j)} is small enough, the neuron node n_ij is unaffected. Therefore, although the training neighborhood defined by the Gaussian function in this paper has no sharp boundary, the influential range of a single training sample SS_t on the training of the SOM network can still be limited.

Assume that f is a sufficiently small threshold on H_C^{(i,j)}. When H_C^{(i,j)} < f, the current iteration step has no influence on neuronal node n_ij, while when H_C^{(i,j)} > f, the current iteration step influences n_ij.
Therefore, when t = 1 at the beginning of the SOM network training process, the lower bound of σ(t) can be determined from the threshold f and (4). When t = 1, assume that ‖l_C − (i,j)‖ = N/2; the lower bound of σ(1) then follows from the inequality derivation

$$\alpha(1)\cdot\exp\!\left(-\frac{(N/2)^2}{2\sigma^2(1)}\right) \ge f$$
$$\Longrightarrow\; \ln\alpha(1) - \frac{(N/2)^2}{2\sigma^2(1)} \ge \ln f$$
$$\Longrightarrow\; \frac{(N/2)^2}{2\sigma^2(1)} \le \ln\frac{\alpha(1)}{f}$$
$$\Longrightarrow\; \sigma^2(1) \ge \frac{N^2}{8\,\ln(\alpha(1)/f)}$$
$$\Longrightarrow\; \sigma(1) \ge \frac{N}{2\sqrt{2\,\ln(\alpha(1)/f)}} \quad (\text{since } \sigma(1) > 0). \tag{7}$$
Based on this derivation, the lower bound of σ(1) is given by (7), where the threshold f = 0.05 in this paper. The value of α(1) is described below when setting α(t).
According to (7), σ(t) in H_C^{(i,j)} during the initial ordered stage can be defined as

$$\sigma(t) = \frac{N}{2\sqrt{2\,\ln(\alpha(1)/f)}} \cdot \exp\!\left(-\frac{t-1}{t}\right), \quad t = 1, 2, 3, \ldots, 1000. \tag{8}$$
As the iteration of the SOM network training gradually converges, the size of the training neighborhood defined by H_C^{(i,j)} should become constant and cover the nearest neighborhood of the training center C in the SOM network. In this paper, the nearest neighborhood, that is, the four neurons adjacent to neuron C in the four directions (up, down, left, and right) of the SOM network, is shown in Figure 2.
(2) Setting the Learning-Rate Factor α(t). Since α(t) is a monotonically decreasing function of t, with range 0.2 < α(t) < 1 in the initial ordered stage of the SOM training process and 0 < α(t) < 0.2 in the convergence stage, α(t) can be set to

$$\alpha(t) = \begin{cases} \exp\!\left(-\dfrac{t-1}{t}\right), & t = 1, 2, 3, \ldots, 1000, \\[2ex] 0.2\cdot\exp\!\left(-\dfrac{t-1}{t}\right), & t > 1000. \end{cases} \tag{9}$$
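Equations (7)-(9) give a directly computable schedule for both parameters. A small sketch follows; the function names are illustrative, N is the side length of the SOM grid, and α(1) = 1, f = 0.05 are the settings used in this paper.

```python
import math

def alpha_schedule(t):
    """Learning-rate factor alpha(t) per (9): the initial ordered stage
    runs for t <= 1000; the convergence stage scales it by 0.2."""
    base = math.exp(-(t - 1) / t)
    return base if t <= 1000 else 0.2 * base

def sigma_schedule(t, N, alpha1=1.0, f=0.05):
    """Neighborhood width sigma(t) per (8); the leading factor is the
    lower bound sigma(1) >= N / (2*sqrt(2*ln(alpha(1)/f))) from (7)."""
    sigma1 = N / (2.0 * math.sqrt(2.0 * math.log(alpha1 / f)))
    return sigma1 * math.exp(-(t - 1) / t)
```

Both schedules decrease monotonically in t, as the convergence requirement in Step 2 demands; for the 13 × 13 network recommended in Section 4.2.1, the bound gives σ(1) ≈ 2.66.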
3.3. VM Anomaly Recognition Based on the SOM Model. The SOM-based modeling of VM status was described in the previous section. In this section, we describe how an anomaly is recognized using the trained SOM network. After several rounds of fitting iterations, the SOM network can effectively capture the normal state of the virtual machines. The normal state is represented by the weight vectors of the neurons in the SOM network; in other words, a neuron's weight vector describes the normal behavior of a class of similar virtual machines.
To check whether the current state of a VM on a Cloud platform is anomalous, we compare the current running performance of the virtual machine with the weight vectors of the neurons in the SOM network. In this paper, Euclidean distance is used to measure similarity. If the current state is sufficiently similar to one of the weight vectors (i.e., the minimum distance is below a given threshold δ), the virtual machine is identified as normal; otherwise, it is considered abnormal.
Let VM_x represent a virtual machine on a Cloud platform, and let SOM(VM_x) denote the corresponding SOM network of VM_x. After the training iterations have finished, the weight vector of each neuron is represented as W_ij^S. Let SS be the currently measured performance value of VM_x and VmStatus(VM_x) its abnormal state.

Figure 2: Nearest neighborhood of neuron node C (neuron C at the center of the N × N grid of weight vectors W_11, …, W_NN, together with its four adjacent neighbors).

The abnormal state of the virtual machine can then be determined by the following equation:

$$\text{anomaly}(VM_x) = \begin{cases} \text{true}, & \min\left\{ \| SS - W_{ij}^S \| \mid i,j = 1, 2, 3, \ldots, N \right\} \ge \delta, \\[1ex] \text{false}, & \min\left\{ \| SS - W_{ij}^S \| \mid i,j = 1, 2, 3, \ldots, N \right\} < \delta. \end{cases}$$
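The decision rule above amounts to a nearest-weight-vector lookup. A minimal sketch (identifiers are illustrative, not the authors' API):

```python
import math

def anomaly(sample, W, delta):
    """Decide VmStatus(VM_x): the VM is abnormal exactly when the
    minimum Euclidean distance from its measured state SS to every
    trained weight vector W_ij^S reaches the threshold delta."""
    nearest = min(
        math.sqrt(sum((s - w) ** 2 for s, w in zip(sample, wij)))
        for row in W for wij in row
    )
    return nearest >= delta
```

Because the SOM grid has a fixed N × N size, this check is constant-time in the number of monitored VMs and in the training-set size, which is the complexity property compared against k-NN in Section 4.2.2.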
4.1. Experimental Environment and Setup. The experimental Cloud platform is built on the open source Cloud platform OpenStack [40, 41]. The operating system CentOS 6.3 is installed on the physical servers running the virtual machines, on which the hypervisor Xen 3.2 [42, 43] is installed. CentOS 6.3 is also installed on the physical servers running the Cloud management program, on which the OpenStack Cloud management components are installed. 100 virtual machines were deployed on this Cloud platform.
The performance metrics of the virtual machines in this experiment were collected with tools such as libxenstat and libvirt [44, 45]. For fault injection, we used tools to simulate three system failures: memory leak, CPU Hog, and network Hog [46–48].
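The paper relies on external tools for fault injection [46–48]. Purely as an illustration of what two of the fault types do, minimal stand-ins might look like the following; the names and parameters are mine, and a real injector would run inside the target VM.

```python
import time

def cpu_hog(duration_s):
    """CPU Hog stand-in: a busy loop that saturates one core, causing
    an immediately visible abnormal state (cf. Figure 3(b))."""
    end = time.monotonic() + duration_s
    n = 0
    while time.monotonic() < end:
        n += 1          # pointless work to keep the core busy
    return n

def memory_leak(rounds, chunk_bytes=1 << 20):
    """Memory-leak stand-in: allocations that are never released, so
    the anomaly accumulates gradually (cf. Figure 3(a))."""
    leaked = []
    for _ in range(rounds):
        leaked.append(bytearray(chunk_bytes))
    return leaked

# A network Hog would similarly flood the VM's interface with traffic,
# e.g., by sending large datagrams in a loop (omitted here).
```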
4.2. Experimental Program and Results
4.2.1. First Set of Experiments. The impact of the SOM network size, training neighborhood width, and learning-rate factor values on the performance of the SOM-based dynamic adaptive virtual machine anomaly detection mechanism was evaluated.

Table 1: The impact of SOM net size on the detection accuracy.

    Size of SOM net    Accuracy rate (%)
    8 × 8              ≈96.3
    13 × 13            ≈97.9
    18 × 18            ≈97.5
    20 × 20            ≈97.7
    24 × 24            ≈97.8

Table 2: The impact of the initial training neighborhood size on the accuracy of SOM.

    Initial width of the training neighborhood    Detection accuracy (%)
    0.5 dsn                                       ≈97.8
    0.4 dsn                                       ≈93.1
    0.3 dsn                                       ≈90.4
    0.2 dsn                                       ≈89.1
    0.1 dsn                                       ≈73.7
    (dsn indicates the diameter of the SOM network)

Table 3: The impact of the initial value of the learning-rate factor on the accuracy of SOM.
Training Stage. First, several virtual machines were selected from the 100 virtual machines. One fault (a memory leak, CPU Hog, or network Hog) was then randomly selected and injected. 1000 virtual machine system performance measurements were collected from the 100 virtual machines as training samples for model training over 10 rounds (one second per round).
Anomaly Detection Stage. In order to simulate an anomaly in the objects under detection, one of the three faults was randomly injected into the 100 virtual machines. The anomalies in each of the 100 virtual machines were then detected with the trained model, and the detection results were recorded.
Several sets of experimental results with different parameter values were obtained. It should be noted that the same fault was injected in each experiment to exclude unnecessary variables.
The experimental results are shown in Tables 1, 2, and 3. As can be seen from Table 1, there is no obvious change in accuracy using the proposed detection method for different SOM network sizes, which means that the proposed anomaly detection method is largely insensitive to the size of the SOM network.
Table 4: The impact of SOM net size on the detection accuracy.

    Size of SOM net    Accuracy rate (%)
    8 × 8              ≈95.8
    13 × 13            ≈97.1
    18 × 18            ≈96.7
    20 × 20            ≈96.9
    24 × 24            ≈97.1

Table 5: The impact of the initial training neighborhood size on the accuracy of SOM.

    Initial width of the training neighborhood    Detection accuracy (%)
As can be seen from Table 2, the size of the initial training neighborhood has a significant impact on the detection accuracy. The main reason is that, if the training neighborhood is too small, the training process may fall into a metastable state, and further training iterations are then required to reach a truly steady state.
As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the anomaly detection decreases significantly. The reason is that, if the initial value of the learning-rate factor is too small, the contribution of each training sample to the SOM network training is also small. The fitting ability of the SOM network for the detected objects is then insufficient, which leads to poor model quality and hence decreases the accuracy of the SOM-network-based anomaly detection.
Analysis of the first set of experiments shows that better anomaly detection results are obtained with DA_SOM when the parameters are set as follows: SOM network size = 13 × 13, initial width of the training neighborhood = 0.5 dsn, and initial value of the learning-rate factor = 1.
The above experiments were carried out on the training data set. To further demonstrate the effectiveness of the proposed algorithm, it was also tested on an untrained anomaly type (disk Hog).
The experimental results for disk Hog are shown in Tables 4, 5, and 6.
Figure 3: Comparison of the three anomaly detection algorithms DA_SOM, k-NN, and k-M: sensitivity (%) versus false positive rate (%) for (a) memory leak, (b) CPU Hog, and (c) network Hog.
As can be seen from Tables 4, 5, and 6, the proposed algorithm still achieves good accuracy on the untrained data set. The impact of the three parameters (SOM net size, training neighborhood width, and learning-rate factor) on the accuracy is similar to that observed in the previous experiments.
4.2.2. Second Set of Experiments. The objective of this set of experiments was to evaluate the effectiveness of the SOM-based VM anomaly detection mechanism (denoted DA_SOM in the following). To compare it with other approaches, two typical unsupervised anomaly detection techniques were used in the experiments: (1) the k-nearest-neighbor-based anomaly detection technique (k-NN), in which prior training of the anomaly identification model is not required, and (2) the cluster-based anomaly detection technique (k-M), in which the anomaly identification model must be trained in advance.
Several experiments with different techniques and parameters were run under the same configuration and experimental procedure described above. Note that, since no training process is required for the k-NN technique, it starts directly in the anomaly detection stage. In addition, to ensure comparability, the training process of the clustering-based method is the same as that of the proposed method: an anomaly detection model is built for the 100 virtual machines, and the training data set is the same as that used to train the SOM. Experimental results are shown in Figure 3.
Figure 3 shows that, compared with the other two injected failures, the sensitivities of the three techniques to the memory leak failure are relatively low. The main reason is that an anomaly does not appear immediately on the failed object when the fault is introduced by a memory leak; the fault takes some time to accumulate before it causes an obvious abnormality. As a consequence, detection systems tend to mistake such anomalous objects for normal ones. In contrast, faults caused by CPU Hog and network Hog events immediately lead to an abnormal state in the faulty object, minimizing misjudgments and enhancing the sensitivity of all three anomaly detection techniques, as shown in Figures 3(b) and 3(c).
Meanwhile, as shown in each subgraph of Figure 3, DA_SOM maintains a better balance between sensitivity and false alarm rate than the other two anomaly detection techniques. In other words, at the same false alarm rate, the sensitivity of DA_SOM is higher than that of the other two approaches, showing strong performance in improving the warning effect while reducing the false alarm rate.
Moreover, the computational complexity of DA_SOM in the anomaly detection stage is much lower than that of k-NN and is equivalent to that of the k-M technique: both are constant with respect to the size of the detected data and, for k-M, the parameter k. During the model training stage, however, the training cost of k-M is higher than that of DA_SOM for the same amount of training data. The main reason is that k-M must iterate over the entire training data set (the cluster centers need to be updated and the training data reclassified according to the updated centers), whereas DA_SOM performs only one classification operation per training sample.
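The training-cost argument can be made concrete: k-M sweeps the whole training set repeatedly until the centers stop moving, whereas the SOM pass classifies each sample once. A schematic, simplified 1-D k-means sketch (illustrative only, not the exact variant used in the experiments):

```python
def kmeans_train(data, centers, max_iter=100):
    """k-M style training: repeated full passes (assign + re-center)
    over the entire training set until the cluster centers stabilize.
    Returns the final centers and the number of full passes made."""
    passes = 0
    for _ in range(max_iter):
        passes += 1
        groups = {i: [] for i in range(len(centers))}
        for x in data:                       # one full pass over the data set
            i = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[i].append(x)
        new = [sum(g) / len(g) if g else centers[i] for i, g in groups.items()]
        if new == centers:                   # centers stabilized -> stop
            break
        centers = new
    return centers, passes
```

Each returned `passes` value above one is a complete re-scan of the training data, which is exactly the extra cost that DA_SOM's single-classification-per-sample training avoids.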
5. Conclusion

An anomaly detection algorithm based on SOM for Cloud platforms with large-scale virtual machines is proposed. The virtual machines are first partitioned according to their similarity, and the SOM is then modeled on the results of this initial partition. The proposed method has a high training speed, which traditional methods cannot achieve when there are a large number of virtual machines. We also optimized the two main parameters of the SOM network modeling process, which greatly improved it. The proposed method was verified on an incremental SOM anomaly detection model, and the results showed strong improvements in detection accuracy and speed.
Competing Interests
The authors declare that they have no competing interests
Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grants no. 61272399 and no. 61572090) and the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20110191110038).
References
[1] J. Li, Y. Cui, and Y. Ma, "Modeling message queueing services with reliability guarantee in cloud computing environment using colored petri nets," Mathematical Problems in Engineering, vol. 2015, Article ID 383846, 20 pages, 2015.
[2] M. A. Rodriguez-Garcia, R. Valencia-Garcia, F. Garcia-Sanchez, and J. J. Samper-Zapater, "Ontology-based annotation and retrieval of services in the cloud," Knowledge-Based Systems, vol. 56, pp. 15–25, 2014.
[3] C.-C. Chang, C.-Y. Sun, and T.-F. Cheng, "A dependable storage service system in cloud environment," Security and Communication Networks, vol. 8, no. 4, pp. 574–588, 2015.
[4] W. He and L. Xu, "A state-of-the-art survey of cloud manufacturing," International Journal of Computer Integrated Manufacturing, vol. 28, no. 3, pp. 239–250, 2015.
[5] A. F. Barsoum and M. Anwar Hasan, "Provable multicopy dynamic data possession in cloud computing systems," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 485–497, 2015.
[6] J. Subirats and J. Guitart, "Assessing and forecasting energy efficiency on Cloud computing platforms," Future Generation Computer Systems, vol. 45, pp. 70–94, 2015.
[7] S. Ding, S. Yang, Y. Zhang, C. Liang, and C. Xia, "Combining QoS prediction and customer satisfaction estimation to solve cloud service trustworthiness evaluation problems," Knowledge-Based Systems, vol. 56, pp. 216–225, 2014.
[8] I. C. Paschalidis and Y. Chen, "Statistical anomaly detection with sensor networks," ACM Transactions on Sensor Networks, vol. 7, no. 2, article 17, 2010.
[9] M. GhasemiGol and A. Ghaemi-Bafghi, "E-correlator: an entropy-based alert correlation system," Security and Communication Networks, vol. 8, no. 5, pp. 822–836, 2015.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.
[11] M. Kourki and M. A. Riahi, "Seismic facies analysis from pre-stack data using self-organizing maps," Journal of Geophysics and Engineering, vol. 11, no. 6, Article ID 065005, 2014.
[12] L. Feng and S. LiQuan, "Enhanced dynamic self-organizing maps for data cluster," Information Technology Journal, vol. 26, no. 1, pp. 70–81, 2009.
[13] Z. Zhou, S. Chen, M. Lin, G. Wang, and Q. Yang, "Minimizing average startup latency of VMs by clustering-based template caching scheme in an IaaS system," International Journal of u- and e- Service, Science and Technology, vol. 6, no. 6, pp. 145–158, 2013.
[14] L. Jing, M. K. Ng, and J. Z. Huang, "An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026–1041, 2007.
[15] R Smith A Bivens M Embrechts C Palagiri and B Szy-manski ldquoClustering approaches for anomaly based intrusiondetectionrdquo in Proceedings of Intelligent Engineering Systemsthrough Artificial Neural Networks pp 579ndash584 2002
[16] Y Sani A Mohamedou K Ali A Farjamfar M Azman and SShamsuddin ldquoAn overview of neural networks use in anomalyintrusion detection systemsrdquo in Proceedings of the IEEE StudentConference on Research andDevelopment (SCOReD rsquo09) pp 89ndash92 Serdang Malaysia November 2009
[17] G P Zhang ldquoNeural networks for classification a surveyrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 30 no 4 pp 451ndash462 2000
Mathematical Problems in Engineering 9
[18] W Tylman ldquoAnomaly-based intrusion detection using Bayesiannetworksrdquo in Proceedings of the International Conference onDependability of Computer Systems pp 211ndash218 SzklarskaPoręba Poland 2008
[19] W Pedrycz V Loia and S Senatore ldquoP-FCM a proximity-based fuzzy clusteringrdquo Fuzzy Sets and Systems vol 148 no 1pp 21ndash41 2004
[20] G Ratsch S Mika B Scholkopf and K-R Muller ldquoConstruct-ing boosting algorithms from SVMs an application to one-class classificationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 24 no 9 pp 1184ndash1199 2002
[21] DM J Tax andR PWDuin ldquoSupport vector data descriptionrdquoMachine Learning vol 54 no 1 pp 45ndash66 2004
[22] B Scholkopf R Williamson A J Smola J Shawe-Taylor andJ C Piatt ldquoSupport vector method for novelty detectionrdquo inProceedings of the 13th Annual Neural Information ProcessingSystems Conference (NIPS rsquo99) pp 582ndash588 December 1999
[23] G Wang S Chen Z Zhou and M Lin ldquoA dependablemonitoringmechanism combining static and dynamic anomalydetection for network systemsrdquo International Journal of FutureGeneration Communication and Networking vol 7 no 1 pp 1ndash18 2014
[24] V Chandola A Banerjee and V Kumar ldquoAnomaly detection asurveyrdquo ACM Computing Surveys vol 41 no 3 article 15 2009
[25] P N Tan M Steinbach and V Kumar Introduction to DataMining Addison-Wesley Reading Mass USA 2005
[26] R O Duda E P Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2nd edition 2000
[27] D Shin and S Kim ldquoNearest mean classification via one-classSVMrdquo in Proceedings of the International Joint Conference onComputational Sciences and Optimization (CSO rsquo09) pp 593ndash596 Sanya China April 2009
[28] T-S Li and C-L Huang ldquoDefect spatial pattern recognitionusing a hybrid SOM-SVM approach in semiconductor manu-facturingrdquo Expert Systems with Applications vol 36 no 1 pp374ndash385 2009
[29] B Scholkopf J C Platt J Shawe-Taylor A J Smola and RC Williamson ldquoEstimating the support of a high-dimensionaldistributionrdquo Neural Computation vol 13 no 7 pp 1443ndash14712001
[30] C J C Burges ldquoA tutorial on support vector machines forpattern recognitionrdquo Data Mining and Knowledge Discoveryvol 2 no 2 pp 121ndash167 1998
[31] N Cristianini and J Shawe-Taylor An Introduction to SupportVector Machines and Other Kernel-Based Learning MethodsCambridge University Press Cambridge Mass USA 2000
[32] B Liu Y Xiao Y Zhang and Z Hao ldquoAn efficient approachto boost support vector data descriptionrdquo in Proceedings of the2012 International Conference on Cybernetics and Informaticsvol 163 of Lecture Notes in Electrical Engineering pp 2231ndash2238Springer New York NY USA 2014
[33] A K Jain and R C Dubes Algorithms for Clustering DataPrentice-Hall New York NY USA 1988
[34] L Ertoz M Steinbach and V Kumar ldquoFinding topics incollections of documents a shared nearest neighbor approachrdquoinClustering and Information Retrieval vol 11 pp 83ndash103 2003
[35] S Guha R Rastogi and K Shim ldquoRock a robust clusteringalgorithm for categorical attributesrdquo Information Systems vol25 no 5 pp 345ndash366 2000
[36] M Ester H P Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases
with noiserdquo in Proceedings of 2nd International Conference onKnowledge Discovery and DataMining E Simoudis J Han andU Fayyad Eds pp 226ndash231 AAAI Press Portland Ore USAAugust 1996
[37] M M Breuniq H-P Kriegel R T Ng and J Sander ldquoLOFidentifying density-based local outliersrdquoProceedings of theACMSIGMOD International Conference onManagement of Data vol29 no 2 pp 93ndash104 2000
[38] N Ye and Q Chen ldquoAn anomaly detection technique based ona chi-square statistic for detecting intrusions into informationsystemsrdquo Quality and Reliability Engineering International vol17 no 2 pp 105ndash112 2001
[39] T Kohonen Self-Organizing Maps Springer New York NYUSA 1997
[40] J M Alcaraz Calero and J G Aguado ldquoMonPaaS an adap-tive monitoring platformas a service for cloud computinginfrastructures and servicesrdquo IEEE Transactions on ServicesComputing vol 8 no 1 pp 65ndash78 2015
[41] D Milojicic I M Llorente and R S Montero ldquoOpenNebula acloud management toolrdquo IEEE Internet Computing vol 15 no2 pp 11ndash14 2011
[42] H Jin H Qin S Wu and X Guo ldquoCCAP a cache contention-aware virtual machine placement approach for HPC cloudrdquoInternational Journal of Parallel Programming vol 43 no 3 pp403ndash420 2015
[43] B Egger E Gustafsson C Jo and J Son ldquoEfficiently restoringvirtual machinesrdquo International Journal of Parallel Program-ming vol 43 no 3 pp 421ndash439 2015
[44] Y Cho J Choi J Choi and M Lee ldquoTowards an integratedmanagement system based on abstraction of heterogeneousvirtual resourcesrdquo Cluster Computing vol 17 no 4 pp 1215ndash1223 2014
[45] J Li Y Jia L Liu and T Wo ldquoCyberLiveApp a secure sharingandmigration approach for live virtual desktop applications in acloud environmentrdquo Future Generation Computer Systems vol29 no 1 pp 330ndash340 2013
[46] Z Xu J Zhang and Z Xu ldquoMelton a practical and precisememory leak detection tool for C programsrdquo Frontiers ofComputer Science vol 9 no 1 pp 34ndash54 2015
[47] P Dollar CWojek B Schiele and P Perona ldquoPedestrian detec-tion an evaluation of the state of the artrdquo IEEE Transactions onPatternAnalysis andMachine Intelligence vol 34 no 4 pp 743ndash761 2012
[48] Y-J Chiu and T Berger ldquoA software-only videocodec usingpixelwise conditional differential replenishment and perceptualenhancementsrdquo IEEE Transactions on Circuits and Systems forVideo Technology vol 9 no 3 pp 438ndash450 1999
for setting α(t). According to (7), σ(t) in H_C^{(i,j)} for the initial ordered stage can be defined as follows:

\sigma(t) = \frac{N}{2\sqrt{\ln\left(f_\alpha(1)\right)^2}} \cdot \exp\left(-\frac{t-1}{t}\right), \quad t = 1, 2, 3, \ldots, 1000. \tag{8}
When the iteration of the SOM network training gradually converges, the size of the training neighborhood defined by H_C^{(i,j)} should be constant and should cover the nearest neighborhood of the training center C in the SOM network. In this paper, the nearest neighborhood, that is, the four neurons around neuron C in all four directions (up, down, left, and right) in the SOM network, is shown in Figure 2.
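This four-neuron neighborhood amounts to a simple grid lookup. The helper below is only an illustrative sketch (the function name and interface are ours, not the paper's); it also clips neighbors at the grid boundary, a detail the paper does not discuss.

```python
def nearest_neighborhood(i, j, n):
    """Return the up/down/left/right neighbors of neuron (i, j) in an n x n SOM grid."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    # Keep only positions that actually fall inside the grid.
    return [(r, c) for r, c in candidates if 0 <= r < n and 0 <= c < n]
```

For the center neuron C of a 5 × 5 grid, `nearest_neighborhood(2, 2, 5)` returns exactly the four surrounding positions shown in Figure 2.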
(2) Setting the Learning-Rate Factor α(t). Since α(t) is a monotonically decreasing function of t, its range is 0.2 < α(t) < 1 in the initial ordered stage of the SOM training process and 0 < α(t) < 0.2 in the convergent stage. Then α(t) can be set to

\alpha(t) = \begin{cases} \exp\left(-\dfrac{t-1}{t}\right), & t = 1, 2, 3, \ldots, 1000, \\ 0.2 \cdot \exp\left(-\dfrac{t-1}{t}\right), & t > 1000. \end{cases} \tag{9}
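As a concrete illustration, the two parameter schedules read off (8) and (9) can be written as plain functions. This is a minimal sketch, not the authors' code; the grid width `n` and the argument `f_alpha_1` (standing for f_α(1)) are assumed parameters.

```python
import math

T0 = 1000  # length of the initial ordered stage, per (9)

def sigma(t, n, f_alpha_1):
    """Neighborhood width sigma(t) for the initial ordered stage, per (8)."""
    return n / (2.0 * math.sqrt(math.log(f_alpha_1) ** 2)) * math.exp(-(t - 1) / t)

def alpha(t):
    """Learning-rate factor alpha(t), per (9): it decays over the initial
    ordered stage (t <= 1000) and is scaled by 0.2 in the convergent stage."""
    base = math.exp(-(t - 1) / t)
    return base if t <= T0 else 0.2 * base
```

For example, `alpha(1)` evaluates to 1 at the start of training and drops below 0.2 once the convergent stage begins, matching the ranges stated above.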
3.3. VM Anomaly Recognition Based on SOM Model

The modeling method of VM status based on SOM was described in detail in the previous section. In this section, we describe how an anomaly is recognized using the trained SOM network. After several rounds of fitting iterations, the SOM network can effectively capture the normal state of the virtual machines. The normal state is represented by the weight vectors of the neurons in the SOM network; in other words, a neuron's weight vector describes the normal behavior of a class of similar virtual machines.

In order to check whether the current state of a VM on a Cloud platform is anomalous, we compare the current running performance of the virtual machine with the weight vectors of the neurons in the SOM network. In this paper, Euclidean distance is used to determine similarity. If the current state is sufficiently similar to one of the weight vectors (i.e., the minimum distance is below a given threshold), the virtual machine is identified as normal; otherwise, it is considered abnormal.
Let VM_x represent a virtual machine on a Cloud platform. The corresponding SOM network of VM_x is defined as SOM(VM_x), and the weight vector of each neuron after the training iterations have finished is represented as W^S_{ij}. The currently measured performance value of VM_x is S^S, and the abnormal state of VM_x is VmStatus(VM_x). Then the abnormal state of the virtual machine can be determined by the following equation:

\mathrm{anomaly}(\mathrm{VM}_x) = \begin{cases} \text{true}, & \min\left\{\left\|S^S - W^S_{ij}\right\| \mid i, j = 1, 2, 3, \ldots, N\right\} \ge \delta, \\ \text{false}, & \min\left\{\left\|S^S - W^S_{ij}\right\| \mid i, j = 1, 2, 3, \ldots, N\right\} < \delta. \end{cases}

Figure 2: Nearest neighborhood of neuron node C (the four neurons adjacent to C, up, down, left, and right, within the N × N grid of weight vectors W_11, ..., W_NN).
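This decision rule is a nearest-neuron distance test and translates directly into code. A minimal sketch, assuming the trained weights are stored in an N × N × d NumPy array (the array layout is our assumption, not the paper's):

```python
import numpy as np

def anomaly(sample, weights, delta):
    """Flag VM_x as abnormal when the Euclidean distance from its measured
    performance vector S^S to the closest neuron weight W^S_ij is >= delta.

    sample:  1-D performance vector of the VM (length d)
    weights: (N, N, d) array of trained SOM weight vectors
    delta:   anomaly threshold
    """
    dists = np.linalg.norm(weights - sample, axis=-1)  # (N, N) distance grid
    return bool(dists.min() >= delta)
```

The cost of one check is a single pass over the N × N grid, which is why detection time does not grow with the number of monitored VMs assigned to the same model.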
4.1. Experimental Environment and Setup

In this paper, the experimental Cloud platform is built on the open source Cloud platform OpenStack [40, 41]. The operating system CentOS 6.3 is installed on the physical servers that run the virtual machines, on which the hypervisor Xen 3.2 [42, 43] is installed. CentOS 6.3 is also installed on the physical servers running the Cloud management program, on which the OpenStack Cloud management components are installed. 100 virtual machines were deployed on this Cloud platform.
The performance metrics of the virtual machines in this experiment were collected by tools such as libxenstat and libvirt [44, 45]. For fault injection, we used tools to simulate system failures: memory leak, CPU Hog, and network Hog [46–48].
4.2. Experimental Program and Results

4.2.1. First Set of Experiments. The impact of the SOM network training neighborhood width and the learning-rate factor on the performance of the SOM-based dynamic adaptive virtual machine anomaly detection mechanism was evaluated.

Table 1: The impact of SOM net size on the detection accuracy.

Size of SOM net    Accuracy rate (%)
8 × 8              ≈96.3
13 × 13            ≈97.9
18 × 18            ≈97.5
20 × 20            ≈97.7
24 × 24            ≈97.8

Table 2: The impact of the initial training neighborhood size on the accuracy of SOM.

Initial width of the training neighborhood    Detection accuracy (%)
0.5 dsn                                       ≈97.8
0.4 dsn                                       ≈93.1
0.3 dsn                                       ≈90.4
0.2 dsn                                       ≈89.1
0.1 dsn                                       ≈73.7

(dsn indicates the diameter of the SOM network.)

Table 3: The impact of the initial value of the learning-rate factor on the accuracy of SOM.
Training Stage. Firstly, several virtual machines were selected from the 100 virtual machines. One fault was then randomly selected (a memory leak, CPU Hog, or network Hog) and injected. 1000 virtual machine system performance measurements were collected as training samples for the model training during 10 rounds (one second per round) on the 100 virtual machines.
Anomaly Detection Stage. In order to simulate an anomaly in the objects under detection, one of the three faults was randomly injected into the 100 virtual machines. The anomalies in each of the 100 virtual machines were then detected based on the trained model, and the detection results were recorded.
Several sets of experimental results with different parameter values were obtained. It should be noted that the same fault was injected in each experiment to exclude unnecessary variables.
The experimental results are shown in Tables 1, 2, and 3. As can be seen from Table 1, there is no obvious change in accuracy using the proposed detection method for different SOM network sizes, which means that the proposed anomaly detection method is not sensitive to the size of the SOM network.
Table 4: The impact of SOM net size on the detection accuracy.

Size of SOM net    Accuracy rate (%)
8 × 8              ≈95.8
13 × 13            ≈97.1
18 × 18            ≈96.7
20 × 20            ≈96.9
24 × 24            ≈97.1

Table 5: The impact of the initial training neighborhood size on the accuracy of SOM.

Initial width of the training neighborhood    Detection accuracy
As can be seen from Table 2, the size of the initial training neighborhood has a significant impact on the detection accuracy. The main reason is that if the training neighborhood is too small, it may cause a metastable state in the training process, and further training iterations are then required to reach a real steady state.
As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the abnormality detection significantly decreases. The reason is that if the initial value of the learning-rate factor is too small, the contribution of each training sample to the SOM network training is also small. The fitting ability of the SOM network for the detected object is then insufficient, which leads to poor model training quality and hence decreases the accuracy of the SOM network-based anomaly detection.
Analysis of the first set of experiments shows that better anomaly detection results can be obtained with DA_SOM when the parameters are set as follows: SOM network size = 13 × 13, initial size of the training neighborhood = 0.5 dsn, and initial value of the learning-rate factor = 1.
The above experiments were carried out on the training data set. To further demonstrate the effectiveness of the proposed algorithm, it was also tested on an untrained anomaly type (disk Hog). The experimental results for disk Hog are shown in Tables 4, 5, and 6.
Figure 3: Comparison of the three anomaly detection algorithms (DA_SOM, k-NN, and k-M) under (a) memory leak, (b) CPU Hog, and (c) network Hog. Each panel plots sensitivity (%) against false positive rate (%).
As can be seen from Tables 4, 5, and 6, the proposed algorithm still achieves good accuracy on the untrained data set. The impact of the three parameters (SOM net size, training neighborhood width, and learning-rate factor) on the accuracy is similar to that observed in the previous experiments.
4.2.2. Second Set of Experiments. The objective of this set of experiments was to evaluate the effect of the SOM-based VM anomaly detection mechanism (denoted DA_SOM in the following). In order to compare it with other approaches, we use two typical unsupervised anomaly detection techniques in the experiments: (1) the k-nearest-neighbor-based anomaly detection technique (called k-NN), where prior training of the anomaly identification model is not required; (2) the cluster-based anomaly detection technique (called k-M), where the anomaly identification model must be trained in advance.
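For concreteness, the decision rules of the two baselines can be sketched as follows. This is an illustrative reading of k-NN and k-M using NumPy, not the paper's implementation: k-NN scores a sample by the distance to its k-th nearest training sample (so every query scans the training set), while k-M only compares the sample against the cluster centers learned in advance.

```python
import numpy as np

def knn_score(x, train, k):
    """k-NN anomaly score: distance from x to its k-th nearest training sample.
    No model is trained, but each query costs a pass over the whole training set."""
    d = np.sort(np.linalg.norm(train - x, axis=1))
    return d[k - 1]

def km_is_anomaly(x, centers, threshold):
    """k-M style test: x is abnormal if it is far from every cluster center
    learned in advance (detection cost depends only on the number of centers)."""
    return bool(np.linalg.norm(centers - x, axis=1).min() >= threshold)
```

The contrast in query cost (full training-set scan versus a handful of centers) is what drives the complexity comparison discussed below.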
Several experiments for the different techniques and parameter settings were carried out with the same configuration and experimental procedure described above. It should be noted that, since no training process is required for the k-NN technique, it starts directly at the anomaly detection stage. In addition, to ensure comparability, the training process of the clustering-based method is the same as that of the proposed method: an anomaly detection model is built for the 100 virtual machines, and the training data set is the same as that used to train the SOM. Experimental results are shown in Figure 3.
Figure 3 shows that, compared with the other two injected failures, the sensitivities of the three techniques to the memory leak failure are relatively low. The main reason is that an anomaly does not immediately appear on the failed object when a fault is introduced by a memory leak; it takes some time for this fault to accumulate and eventually cause an obvious abnormality. The consequence is that detection systems tend to mistake these anomalous objects for normal ones. In contrast, faults caused by CPU Hog and network Hog events immediately lead to an abnormal state within the fault object, thus minimizing misjudgments, which enhances the sensitivity of all three anomaly detection techniques, as shown in Figures 3(b) and 3(c).
Meanwhile, as shown in each subgraph of Figure 3, compared with the other two anomaly detection techniques, DA_SOM maintains a better balance between sensitivity and false alarm rate. In other words, at the same false alarm rate, the sensitivity of DA_SOM is better than that of the other two approaches, showing a strong ability to improve the warning effect while reducing false alarms.
Moreover, the computational complexity of DA_SOM is much lower than that of k-NN in the anomaly detection stage, while it is equivalent to that of the k-M technique: their detection complexity is constant with respect to the number of detected objects (and, for k-M, with respect to the parameter k). Meanwhile, during the model training stage, the training cost of k-M is higher than that of DA_SOM for the same size of training data. The main reason is that k-M iterates over the entire training data set (i.e., the cluster centers need to be updated and the training data set needs to be reclassified according to the updated center points), while DA_SOM performs only one classification operation for each training sample.
5. Conclusion

An anomaly detection algorithm based on SOM for a Cloud platform with large-scale virtual machines is proposed. The virtual machines are first partitioned according to their similarity, and then, based on the results of the initial partition, the SOM is modeled. The proposed method has a high training speed, which is not achievable with traditional methods when there is a large number of virtual machines. We also optimized the two main parameters of the SOM network modeling process, which substantially improved this process. The proposed method was verified with an incremental SOM anomaly detection model, and the results showed strong improvements in detection accuracy and speed using the proposed anomaly detection method.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work of this paper is supported by the National Natural Science Foundation of China (Grants nos. 61272399 and 61572090) and the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20110191110038).
References

[1] J. Li, Y. Cui, and Y. Ma, "Modeling message queueing services with reliability guarantee in cloud computing environment using colored petri nets," Mathematical Problems in Engineering, vol. 2015, Article ID 383846, 20 pages, 2015.
[2] M. A. Rodriguez-Garcia, R. Valencia-Garcia, F. Garcia-Sanchez, and J. J. Samper-Zapater, "Ontology-based annotation and retrieval of services in the cloud," Knowledge-Based Systems, vol. 56, pp. 15–25, 2014.
[3] C.-C. Chang, C.-Y. Sun, and T.-F. Cheng, "A dependable storage service system in cloud environment," Security and Communication Networks, vol. 8, no. 4, pp. 574–588, 2015.
[4] W. He and L. Xu, "A state-of-the-art survey of cloud manufacturing," International Journal of Computer Integrated Manufacturing, vol. 28, no. 3, pp. 239–250, 2015.
[5] A. F. Barsoum and M. Anwar Hasan, "Provable multicopy dynamic data possession in cloud computing systems," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 485–497, 2015.
[6] J. Subirats and J. Guitart, "Assessing and forecasting energy efficiency on Cloud computing platforms," Future Generation Computer Systems, vol. 45, pp. 70–94, 2015.
[7] S. Ding, S. Yang, Y. Zhang, C. Liang, and C. Xia, "Combining QoS prediction and customer satisfaction estimation to solve cloud service trustworthiness evaluation problems," Knowledge-Based Systems, vol. 56, pp. 216–225, 2014.
[8] I. C. Paschalidis and Y. Chen, "Statistical anomaly detection with sensor networks," ACM Transactions on Sensor Networks, vol. 7, no. 2, article 17, 2010.
[9] M. GhasemiGol and A. Ghaemi-Bafghi, "E-correlator: an entropy-based alert correlation system," Security and Communication Networks, vol. 8, no. 5, pp. 822–836, 2015.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.
[11] M. Kourki and M. A. Riahi, "Seismic facies analysis from pre-stack data using self-organizing maps," Journal of Geophysics and Engineering, vol. 11, no. 6, Article ID 065005, 2014.
[12] L. Feng and S. LiQuan, "Enhanced dynamic self-organizing maps for data cluster," Information Technology Journal, vol. 26, no. 1, pp. 70–81, 2009.
[13] Z. Zhou, S. Chen, M. Lin, G. Wang, and Q. Yang, "Minimizing average startup latency of VMs by clustering-based template caching scheme in an IaaS system," International Journal of u- and e- Service, Science and Technology, vol. 6, no. 6, pp. 145–158, 2013.
[14] L. Jing, M. K. Ng, and J. Z. Huang, "An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026–1041, 2007.
[15] R. Smith, A. Bivens, M. Embrechts, C. Palagiri, and B. Szymanski, "Clustering approaches for anomaly based intrusion detection," in Proceedings of Intelligent Engineering Systems through Artificial Neural Networks, pp. 579–584, 2002.
[16] Y. Sani, A. Mohamedou, K. Ali, A. Farjamfar, M. Azman, and S. Shamsuddin, "An overview of neural networks use in anomaly intrusion detection systems," in Proceedings of the IEEE Student Conference on Research and Development (SCOReD '09), pp. 89–92, Serdang, Malaysia, November 2009.
[17] G. P. Zhang, "Neural networks for classification: a survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.
[18] W. Tylman, "Anomaly-based intrusion detection using Bayesian networks," in Proceedings of the International Conference on Dependability of Computer Systems, pp. 211–218, Szklarska Poręba, Poland, 2008.
[19] W. Pedrycz, V. Loia, and S. Senatore, "P-FCM: a proximity-based fuzzy clustering," Fuzzy Sets and Systems, vol. 148, no. 1, pp. 21–41, 2004.
[20] G. Rätsch, S. Mika, B. Schölkopf, and K.-R. Müller, "Constructing boosting algorithms from SVMs: an application to one-class classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1184–1199, 2002.
[21] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.
[22] B. Schölkopf, R. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt, "Support vector method for novelty detection," in Proceedings of the 13th Annual Neural Information Processing Systems Conference (NIPS '99), pp. 582–588, December 1999.
[23] G. Wang, S. Chen, Z. Zhou, and M. Lin, "A dependable monitoring mechanism combining static and dynamic anomaly detection for network systems," International Journal of Future Generation Communication and Networking, vol. 7, no. 1, pp. 1–18, 2014.
[24] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.
[25] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, Reading, Mass, USA, 2005.
[26] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2000.
[27] D. Shin and S. Kim, "Nearest mean classification via one-class SVM," in Proceedings of the International Joint Conference on Computational Sciences and Optimization (CSO '09), pp. 593–596, Sanya, China, April 2009.
[28] T.-S. Li and C.-L. Huang, "Defect spatial pattern recognition using a hybrid SOM-SVM approach in semiconductor manufacturing," Expert Systems with Applications, vol. 36, no. 1, pp. 374–385, 2009.
[29] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[30] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[31] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[32] B. Liu, Y. Xiao, Y. Zhang, and Z. Hao, "An efficient approach to boost support vector data description," in Proceedings of the 2012 International Conference on Cybernetics and Informatics, vol. 163 of Lecture Notes in Electrical Engineering, pp. 2231–2238, Springer, New York, NY, USA, 2014.
[33] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, New York, NY, USA, 1988.
[34] L. Ertöz, M. Steinbach, and V. Kumar, "Finding topics in collections of documents: a shared nearest neighbor approach," in Clustering and Information Retrieval, vol. 11, pp. 83–103, 2003.
[35] S. Guha, R. Rastogi, and K. Shim, "Rock: a robust clustering algorithm for categorical attributes," Information Systems, vol. 25, no. 5, pp. 345–366, 2000.
[36] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han, and U. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, August 1996.
[37] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 29, no. 2, pp. 93–104, 2000.
[38] N. Ye and Q. Chen, "An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems," Quality and Reliability Engineering International, vol. 17, no. 2, pp. 105–112, 2001.
[39] T. Kohonen, Self-Organizing Maps, Springer, New York, NY, USA, 1997.
[40] J. M. Alcaraz Calero and J. G. Aguado, "MonPaaS: an adaptive monitoring platform as a service for cloud computing infrastructures and services," IEEE Transactions on Services Computing, vol. 8, no. 1, pp. 65–78, 2015.
[41] D. Milojicic, I. M. Llorente, and R. S. Montero, "OpenNebula: a cloud management tool," IEEE Internet Computing, vol. 15, no. 2, pp. 11–14, 2011.
[42] H. Jin, H. Qin, S. Wu, and X. Guo, "CCAP: a cache contention-aware virtual machine placement approach for HPC cloud," International Journal of Parallel Programming, vol. 43, no. 3, pp. 403–420, 2015.
[43] B. Egger, E. Gustafsson, C. Jo, and J. Son, "Efficiently restoring virtual machines," International Journal of Parallel Programming, vol. 43, no. 3, pp. 421–439, 2015.
[44] Y. Cho, J. Choi, J. Choi, and M. Lee, "Towards an integrated management system based on abstraction of heterogeneous virtual resources," Cluster Computing, vol. 17, no. 4, pp. 1215–1223, 2014.
[45] J. Li, Y. Jia, L. Liu, and T. Wo, "CyberLiveApp: a secure sharing and migration approach for live virtual desktop applications in a cloud environment," Future Generation Computer Systems, vol. 29, no. 1, pp. 330–340, 2013.
[46] Z. Xu, J. Zhang, and Z. Xu, "Melton: a practical and precise memory leak detection tool for C programs," Frontiers of Computer Science, vol. 9, no. 1, pp. 34–54, 2015.
[47] P. Dollár, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: an evaluation of the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, 2012.
[48] Y.-J. Chiu and T. Berger, "A software-only videocodec using pixelwise conditional differential replenishment and perceptual enhancements," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, pp. 438–450, 1999.
mechanism of the SOM-based dynamic adaptive virtualmachine was evaluated
Training Stage Firstly several virtual machines were selectedfrom 100 virtual machines One fault was then randomlyselected (a memory leak CPU Hog or network Hog) andthen injected 1000 virtualmachine systemperformancemea-surements were collected as training samples for the modeltraining during 10 rounds (one second per round) on the 100virtual machines
Anomaly Detection Stage In order to simulate an anomalyin the objects under detection one of the three faults wasrandomly injected in the 100 virtualmachinesThe anomaliesin each of the 100 virtual machines were then detected basedon the trained model The detection results were recorded
Several sets of experimental resultswith different parame-ter valueswere obtained It should be noted that the same faultwas injected in each experiment to exclude unnecessaryvariables
The experimental results are shown in Tables 1 2 and 3As can be seen from Table 1 there is no obvious change in
accuracy using the proposed detection method for differentSOM network sizes which means that the proposed anomalydetection method is not affected by the size of the SOMnetwork
Table 4 The impact of SOM net size on the detection accuracy
Size of SOM net Accuracy rate ()8 times 8 asymp95813 times 13 asymp97118 times 18 asymp96720 times 20 asymp96924 times 24 asymp971
Table 5The impact of the initial training neighborhood size on theaccuracy of SOM
The initial width for thetraining neighborhood Detection accuracy
As can be seen from Table 2 the size of the initial trainedneighborhood has a significant impact on the detection accu-racy The main reason is that if the training size is too smallit may cause a metastable state in the training process andfurther training iterations are required to achieve real steadystate
As can be seen from Table 3, as the initial value of the learning-rate factor decreases, the accuracy of the abnormality detection significantly decreases. The reason is that if the initial value of the learning-rate factor is too small, the contribution of each training sample to the SOM network training is small too. Thus, the fitting ability of the SOM network to the detected object is not sufficient, which leads to poor model training quality and hence decreases the accuracy of the SOM-network-based anomaly detection.
Analysis of the first set of experiments shows that better anomaly detection results can be obtained with DA_SOM when the parameters are set as follows: SOM network size = 13 × 13, initial size of the training neighborhood = 0.5 dsn, and initial value of the learning-rate factor = 1.
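The roles these parameters play can be seen in a minimal SOM training sketch. The map size, initial neighborhood width (as a fraction of the map diagonal, the "dsn"), and initial learning-rate factor are taken from the text above; the linear decay schedule and Gaussian neighborhood function are common choices assumed here, not details given by the paper:

```python
import numpy as np

def train_som(samples, net_size=13, sigma0=0.5, eta0=1.0, epochs=10, seed=0):
    """Minimal 2-D SOM trainer for performance-metric vectors.

    net_size: side of the square map (the paper's best value is 13 x 13).
    sigma0:   initial neighborhood width as a fraction of the map diagonal
              (the paper's best value is 0.5 dsn).
    eta0:     initial learning-rate factor (the paper's best value is 1).
    """
    rng = np.random.default_rng(seed)
    dim = samples.shape[1]
    weights = rng.random((net_size, net_size, dim))
    # Grid coordinates of every neuron, used for neighborhood distances.
    ys, xs = np.mgrid[0:net_size, 0:net_size]
    coords = np.stack([ys, xs], axis=-1).astype(float)
    diag = np.sqrt(2.0) * (net_size - 1)        # map diagonal ("dsn")
    total = epochs * len(samples)
    t = 0
    for _ in range(epochs):
        for x in samples:
            # Best-matching unit: neuron whose weights are closest to x.
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Both the neighborhood width and the learning rate decay in time;
            # a too-small sigma0 or eta0 reproduces the failure modes the
            # tables describe (metastable maps, underfit maps).
            frac = t / total
            sigma = max(sigma0 * diag * (1.0 - frac), 1e-3)
            eta = eta0 * (1.0 - frac)
            g = np.exp(-np.linalg.norm(coords - np.array(bmu), axis=-1) ** 2
                       / (2.0 * sigma ** 2))
            weights += eta * g[..., None] * (x - weights)
            t += 1
    return weights

def quantization_error(weights, x):
    """Distance from a sample to its BMU; a large value flags an anomaly."""
    return np.linalg.norm(weights - x, axis=-1).min()
```

After training on normal samples, a detected object whose quantization error exceeds a threshold is reported as anomalous.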
The above experiments were carried out on the training data set. To further demonstrate the effectiveness of the proposed algorithm, it was also tested on an untrained anomaly set (disk Hog).
The experimental results for disk Hog are shown in Tables 4, 5, and 6.
Mathematical Problems in Engineering 7
Figure 3: Comparison of the three anomaly detection algorithms (DA_SOM, k-NN, and k-M), plotting sensitivity (%) against false positive rate (%) for (a) memory leak, (b) CPU Hog, and (c) network Hog.
As can be seen from Tables 4, 5, and 6, the proposed algorithm still achieves good accuracy on the untrained data set. The impact of the three parameters (SOM net size, training neighborhood width, and learning-rate factor) on the accuracy is similar to that observed in the previous experiments.
4.2.2. Second Set of Experiments. The objective of this set of experiments was to evaluate the effect of the VM anomaly detection mechanism based on SOM (denoted DA_SOM in the following sections). To compare it with other approaches, we use two typical unsupervised anomaly detection techniques in the experiments: (1) the k-nearest-neighbor-based anomaly detection technique (called k-NN), where prior training of the anomaly identification model is not required; (2) the cluster-based anomaly detection technique (called k-M), where training of the anomaly identification model is required in advance.
Several experiments with different techniques and different parameters, under the same configuration and experimental procedure described above, were run to obtain the corresponding results. It should be noted that, since no training process is required for the k-NN technique, it started directly in the anomaly detection stage. In addition, to ensure comparability, the training process of the clustering-based method is the same as that of the proposed method: an anomaly detection model is built for the 100 virtual machines, and the training data set is the same as that used to train the SOM. Experimental results are shown in Figure 3.
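The two baselines can be sketched as follows. The paper names the techniques but not their implementations, so the scoring rules below are the standard ones assumed for k-NN (distance to the k-th nearest sample, no training) and k-M (distance to the nearest trained cluster center):

```python
import numpy as np

def knn_score(x, data, k=5):
    """k-NN anomaly score: distance to the k-th nearest sample.
    No model is trained; the raw data set is consulted directly."""
    d = np.sort(np.linalg.norm(data - x, axis=1))
    return d[k - 1]

def kmeans_fit(data, k=3, iters=20, seed=0):
    """Plain k-means: the cluster centers are the trained k-M model.
    Each iteration reclassifies the whole training set and then updates
    the centers, which is why its training cost exceeds DA_SOM's."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(data[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers

def km_score(x, centers):
    """k-M anomaly score: distance to the nearest trained cluster center."""
    return np.linalg.norm(centers - x, axis=1).min()
```

Note that `knn_score` must scan the whole data set per query, whereas `km_score` only touches the k centers, matching the complexity comparison discussed below.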
Figure 3 shows that, compared with the other two injected failures, the sensitivities of the three techniques to the memory leak failure are relatively low. The main reason is that an anomaly does not immediately appear on the failed object when the fault is introduced by a memory leak; it takes some time for this fault to accumulate and eventually cause an obvious abnormality. As a consequence, detection systems tend to mistake these anomalous objects for normal ones. In contrast, faults caused by CPU Hog and network Hog events immediately lead to an abnormal state within the faulty object, thus minimizing misjudgments, which enhances the sensitivity of all three anomaly detection techniques, as shown in Figures 3(b) and 3(c).
Meanwhile, as shown in each subgraph of Figure 3, compared with the other two anomaly detection techniques, DA_SOM maintains a better balance between sensitivity and false alarm rate. In other words, at the same false alarm rate, the sensitivity of DA_SOM is better than that of the other two approaches, showing a strong performance in improving the warning effect while reducing the false alarm rate.
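The two axes of Figure 3 are the standard confusion-matrix rates; a minimal sketch of how each operating point is computed from detection counts:

```python
def sensitivity(tp, fn):
    """True positive rate (the vertical axis of Figure 3):
    fraction of genuinely anomalous VMs that were flagged."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """False alarm rate (the horizontal axis of Figure 3):
    fraction of healthy VMs that were wrongly flagged."""
    return fp / (fp + tn)
```

For example, a detector that flags 90 of 100 faulty VMs while raising 5 false alarms on 100 healthy VMs sits at sensitivity 0.90 and false positive rate 0.05.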
Moreover, the computational complexity of DA_SOM is much lower than that of k-NN in the anomaly detection stage, while it is equivalent to that of the k-M technique; the complexity of both is constant with respect to the size of the detected-object data set (and, for k-M, to the parameter k). Meanwhile, during the model training stage, the training cost of k-M is higher than that of DA_SOM for the same amount of training data. The main reason is that k-M must iterate over the entire training data set (i.e., the cluster centers need to be updated and the training data set needs to be reclassified according to the updated centers), while there is only one classification operation for each training sample in DA_SOM.
5 Conclusion
An anomaly detection algorithm based on SOM for a Cloud platform with large-scale virtual machines is proposed. The virtual machines are partitioned initially according to their similarity, and then, based on the results of the initial partition, the SOM is modeled. The proposed method has a high training speed, which is not achievable with traditional methods when there are a large number of virtual machines. We also optimized the two main parameters in the SOM network modeling process, which greatly improved this process. The proposed method was verified on an incremental SOM anomaly detection model. The results showed strong improvements in detection accuracy and speed using the proposed anomaly detection method.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work of this paper is supported by the National Natural Science Foundation of China (Grants nos. 61272399 and 61572090) and the Research Fund for the Doctoral Program of Higher Education of China (Grant no. 20110191110038).
Figure 3 Comparison of the three anomaly detection algorithms DA SOM 119896-NN and 119896-M
As can be seen from Tables 4 5 and 6 the accuracy of theproposed algorithm still has better accuracy in the untraineddata set The impact of three parameters (som net size train-ing neighborhood width and learning-rate factor) on theaccuracy is similar with the previous experiments
422 Second Set of Experiments The objective of thisset of experiments was to evaluate the effect of the VManomaly detection mechanism based on SOM (representedby DA SOM in the following sections) In order to comparethis with other approaches we use two typical unsupervisedanomaly detection techniques in the experiments (1) 119896-nearest neighbor based anomaly detection technique (called119896-NN) where prior training of the anomaly identification
model is not required (2) cluster-based anomaly detectiontechnique (called 119896-M) where training of the anomaly iden-tification model is required in advance
Several experiments for different techniques and differentparameters with the same aforementioned configuration andexperimental procedure are applied to obtain the correspond-ing results It should be noted that since the training processis not required for the 119896-NN technique it started directlyin the abnormality detection stage In addition to ensurecomparability the training process of the clustering-basedmethod is the same as the proposed method where ananomaly detectionmodel is built for 100 virtualmachines andthe training data set is the same as training SOM Experimen-tal results are shown in Figure 3
8 Mathematical Problems in Engineering
Figure 3 shows that compared to the other two injectedfailures the sensitivities of the three techniques to memoryleak failure are relatively low The main reason is that ananomaly does not immediately appear on the failed objectwhen there is fault introduced by a memory leak It takessome time for this fault to accumulate to eventually cause anobvious abnormality The consequence of this is that detec-tion systems tend tomistake these objects with an anomaly asnormal In contrast faults caused by CPU Hog and networkHog events will immediately lead to an abnormal statewithin the fault object thusminimizingmisjudgments whichenhances the sensitivity of all three anomaly detection tech-niques as shown in Figures 3(b) and 3(c)
Meanwhile as shown in each subgraph of Figure 3compared with the other two anomaly detection techniquesDA SOMmaintains a better balance between sensitivity andfalse alarm rate In other words with the same false alarmrate the sensitivity of DA SOM is better than that of the othertwo approaches showing a strong performance in improvingwarning effect and reducing the false alarm rate
Moreover the computational complexity of DA SOM ismuch lower than that of the 119896-NN in anomaly detection stagewhile the computational complexity of DA SOM is equiva-lent to the 119896-M technique Their complexity is constant withthe detected object size and with the parameter 119896 in the 119896-Mtechnique Meanwhile during the model training stage thetraining cost of 119896-M is higher than that of DA SOM for thesame size of training dataThemain reason is that iteration isrequired in 119896-M on the entire training data set (ie the clustercenters need to be updated and the training data set needs tobe reclassified according to the updated center point) whilethere is only one classification operation for each trainingsample in DA SOM
5 Conclusion
Ananomaly detection algorithmbased on SOMfor theCloudplatform with large-scale virtual machines is proposed Thevirtual machines are partitioned initially according to theirsimilarity and then based on the results of initial partitionthe SOM is modeled The proposed method has a hightraining speed which is not possible in traditional methodswhen there are a large number of virtual machines We alsooptimized the two main parameters in the SOM networkmodeling process which highly improved this process Theproposedmethod is verified on an incremental SOManomalydetection modelThe results showed strong improvements indetection accuracy and speed using the proposed anomalydetection method
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The work of this paper is supported by National NaturalScience Foundation of China (Grants no 61272399 and no61572090) and Research Fund for the Doctoral Program ofHigher Education of China (Grant no 20110191110038)
References
[1] J Li Y Cui and Y Ma ldquoModeling message queueing serviceswith reliability guarantee in cloud computing environmentusing colored petri netsrdquoMathematical Problems in Engineeringvol 2015 Article ID 383846 20 pages 2015
[2] MA Rodriguez-Garcia R Valencia-Garcia F Garcia-Sanchezand J J Samper-Zapater ldquoOntology-based annotation andretrieval of services in the cloudrdquoKnowledge-Based Systems vol56 pp 15ndash25 2014
[3] C-C Chang C-Y Sun and T-F Cheng ldquoA dependable storageservice system in cloud environmentrdquo Security and Commu-nication Networks vol 8 no 4 pp 574ndash588 2015
[4] W He and L Xu ldquoA state-of-the-art survey of cloud manufac-turingrdquo International Journal of Computer Integrated Manufac-turing vol 28 no 3 pp 239ndash250 2015
[5] A F Barsoum and M Anwar Hasan ldquoProvable multicopydynamic data possession in cloud computing systemsrdquo IEEETransactions on Information Forensics and Security vol 10 no 3pp 485ndash497 2015
[6] J Subirats and J Guitart ldquoAssessing and forecasting energyefficiency on Cloud computing platformsrdquo Future GenerationComputer Systems vol 45 pp 70ndash94 2015
[7] S Ding S Yang Y Zhang C Liang and C Xia ldquoCombiningQoS prediction and customer satisfaction estimation to solvecloud service trustworthiness evaluation problemsrdquoKnowledge-Based Systems vol 56 pp 216ndash225 2014
[8] I C Paschalidis and Y Chen ldquoStatistical anomaly detectionwith sensor networksrdquo ACM Transactions on Sensor Networksvol 7 no 2 article 17 2010
[9] M GhasemiGol and A Ghaemi-Bafghi ldquoE-correlator anentropy-based alert correlation systemrdquo Security and Commu-nication Networks vol 8 no 5 pp 822ndash836 2015
[10] A K Jain and R C Dubes Algorithms for Clustering DataPrentice Hall Englewood Cliffs NJ USA 1988
[11] M Kourki and M A Riahi ldquoSeismic facies analysis from pre-stack data using self-organizing mapsrdquo Journal of Geophysicsand Engineering vol 11 no 6 Article ID 065005 2014
[12] L Feng and S LiQuan ldquoEnhanced dynamic self-organizingmaps for data clusterrdquo Information Technology Journal vol 26no 1 pp 70ndash81 2009
[13] Z Zhou S Chen M Lin G Wang and Q Yang ldquoMinimizingaverage startup latency of VMs by clustering-based templatecaching scheme in an IaaS systemrdquo International Journal of u-and e- Service Science and Technology vol 6 no 6 pp 145ndash1582013
[14] L Jing M K Ng and J Z Huang ldquoAn entropy weighting k-means algorithm for subspace clustering of high-dimensionalsparse datardquo IEEE Transactions on Knowledge and Data Engi-neering vol 19 no 8 pp 1026ndash1041 2007
[15] R Smith A Bivens M Embrechts C Palagiri and B Szy-manski ldquoClustering approaches for anomaly based intrusiondetectionrdquo in Proceedings of Intelligent Engineering Systemsthrough Artificial Neural Networks pp 579ndash584 2002
[16] Y Sani A Mohamedou K Ali A Farjamfar M Azman and SShamsuddin ldquoAn overview of neural networks use in anomalyintrusion detection systemsrdquo in Proceedings of the IEEE StudentConference on Research andDevelopment (SCOReD rsquo09) pp 89ndash92 Serdang Malaysia November 2009
[17] G P Zhang ldquoNeural networks for classification a surveyrdquoIEEE Transactions on Systems Man and Cybernetics Part CApplications and Reviews vol 30 no 4 pp 451ndash462 2000
Mathematical Problems in Engineering 9
[18] W Tylman ldquoAnomaly-based intrusion detection using Bayesiannetworksrdquo in Proceedings of the International Conference onDependability of Computer Systems pp 211ndash218 SzklarskaPoręba Poland 2008
[19] W Pedrycz V Loia and S Senatore ldquoP-FCM a proximity-based fuzzy clusteringrdquo Fuzzy Sets and Systems vol 148 no 1pp 21ndash41 2004
[20] G Ratsch S Mika B Scholkopf and K-R Muller ldquoConstruct-ing boosting algorithms from SVMs an application to one-class classificationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 24 no 9 pp 1184ndash1199 2002
[21] DM J Tax andR PWDuin ldquoSupport vector data descriptionrdquoMachine Learning vol 54 no 1 pp 45ndash66 2004
[22] B Scholkopf R Williamson A J Smola J Shawe-Taylor andJ C Piatt ldquoSupport vector method for novelty detectionrdquo inProceedings of the 13th Annual Neural Information ProcessingSystems Conference (NIPS rsquo99) pp 582ndash588 December 1999
[23] G Wang S Chen Z Zhou and M Lin ldquoA dependablemonitoringmechanism combining static and dynamic anomalydetection for network systemsrdquo International Journal of FutureGeneration Communication and Networking vol 7 no 1 pp 1ndash18 2014
[24] V Chandola A Banerjee and V Kumar ldquoAnomaly detection asurveyrdquo ACM Computing Surveys vol 41 no 3 article 15 2009
[25] P N Tan M Steinbach and V Kumar Introduction to DataMining Addison-Wesley Reading Mass USA 2005
[26] R O Duda E P Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2nd edition 2000
[27] D Shin and S Kim ldquoNearest mean classification via one-classSVMrdquo in Proceedings of the International Joint Conference onComputational Sciences and Optimization (CSO rsquo09) pp 593ndash596 Sanya China April 2009
[28] T-S Li and C-L Huang ldquoDefect spatial pattern recognitionusing a hybrid SOM-SVM approach in semiconductor manu-facturingrdquo Expert Systems with Applications vol 36 no 1 pp374ndash385 2009
[29] B Scholkopf J C Platt J Shawe-Taylor A J Smola and RC Williamson ldquoEstimating the support of a high-dimensionaldistributionrdquo Neural Computation vol 13 no 7 pp 1443ndash14712001
[30] C J C Burges ldquoA tutorial on support vector machines forpattern recognitionrdquo Data Mining and Knowledge Discoveryvol 2 no 2 pp 121ndash167 1998
[31] N Cristianini and J Shawe-Taylor An Introduction to SupportVector Machines and Other Kernel-Based Learning MethodsCambridge University Press Cambridge Mass USA 2000
[32] B Liu Y Xiao Y Zhang and Z Hao ldquoAn efficient approachto boost support vector data descriptionrdquo in Proceedings of the2012 International Conference on Cybernetics and Informaticsvol 163 of Lecture Notes in Electrical Engineering pp 2231ndash2238Springer New York NY USA 2014
[33] A K Jain and R C Dubes Algorithms for Clustering DataPrentice-Hall New York NY USA 1988
[34] L Ertoz M Steinbach and V Kumar ldquoFinding topics incollections of documents a shared nearest neighbor approachrdquoinClustering and Information Retrieval vol 11 pp 83ndash103 2003
[35] S Guha R Rastogi and K Shim ldquoRock a robust clusteringalgorithm for categorical attributesrdquo Information Systems vol25 no 5 pp 345ndash366 2000
[36] M Ester H P Kriegel J Sander and X Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases
with noiserdquo in Proceedings of 2nd International Conference onKnowledge Discovery and DataMining E Simoudis J Han andU Fayyad Eds pp 226ndash231 AAAI Press Portland Ore USAAugust 1996
[37] M M Breuniq H-P Kriegel R T Ng and J Sander ldquoLOFidentifying density-based local outliersrdquoProceedings of theACMSIGMOD International Conference onManagement of Data vol29 no 2 pp 93ndash104 2000
[38] N Ye and Q Chen ldquoAn anomaly detection technique based ona chi-square statistic for detecting intrusions into informationsystemsrdquo Quality and Reliability Engineering International vol17 no 2 pp 105ndash112 2001
[39] T Kohonen Self-Organizing Maps Springer New York NYUSA 1997
[40] J M Alcaraz Calero and J G Aguado ldquoMonPaaS an adap-tive monitoring platformas a service for cloud computinginfrastructures and servicesrdquo IEEE Transactions on ServicesComputing vol 8 no 1 pp 65ndash78 2015
[41] D Milojicic I M Llorente and R S Montero ldquoOpenNebula acloud management toolrdquo IEEE Internet Computing vol 15 no2 pp 11ndash14 2011
[42] H Jin H Qin S Wu and X Guo ldquoCCAP a cache contention-aware virtual machine placement approach for HPC cloudrdquoInternational Journal of Parallel Programming vol 43 no 3 pp403ndash420 2015
[43] B Egger E Gustafsson C Jo and J Son ldquoEfficiently restoringvirtual machinesrdquo International Journal of Parallel Program-ming vol 43 no 3 pp 421ndash439 2015
[44] Y Cho J Choi J Choi and M Lee ldquoTowards an integratedmanagement system based on abstraction of heterogeneousvirtual resourcesrdquo Cluster Computing vol 17 no 4 pp 1215ndash1223 2014
[45] J Li Y Jia L Liu and T Wo ldquoCyberLiveApp a secure sharingandmigration approach for live virtual desktop applications in acloud environmentrdquo Future Generation Computer Systems vol29 no 1 pp 330ndash340 2013
[46] Z Xu J Zhang and Z Xu ldquoMelton a practical and precisememory leak detection tool for C programsrdquo Frontiers ofComputer Science vol 9 no 1 pp 34ndash54 2015
[47] P Dollar CWojek B Schiele and P Perona ldquoPedestrian detec-tion an evaluation of the state of the artrdquo IEEE Transactions onPatternAnalysis andMachine Intelligence vol 34 no 4 pp 743ndash761 2012
[48] Y-J Chiu and T Berger ldquoA software-only videocodec usingpixelwise conditional differential replenishment and perceptualenhancementsrdquo IEEE Transactions on Circuits and Systems forVideo Technology vol 9 no 3 pp 438ndash450 1999
Figure 3 shows that compared to the other two injectedfailures the sensitivities of the three techniques to memoryleak failure are relatively low The main reason is that ananomaly does not immediately appear on the failed objectwhen there is fault introduced by a memory leak It takessome time for this fault to accumulate to eventually cause anobvious abnormality The consequence of this is that detec-tion systems tend tomistake these objects with an anomaly asnormal In contrast faults caused by CPU Hog and networkHog events will immediately lead to an abnormal statewithin the fault object thusminimizingmisjudgments whichenhances the sensitivity of all three anomaly detection tech-niques as shown in Figures 3(b) and 3(c)
Meanwhile as shown in each subgraph of Figure 3compared with the other two anomaly detection techniquesDA SOMmaintains a better balance between sensitivity andfalse alarm rate In other words with the same false alarmrate the sensitivity of DA SOM is better than that of the othertwo approaches showing a strong performance in improvingwarning effect and reducing the false alarm rate
Moreover the computational complexity of DA SOM ismuch lower than that of the 119896-NN in anomaly detection stagewhile the computational complexity of DA SOM is equiva-lent to the 119896-M technique Their complexity is constant withthe detected object size and with the parameter 119896 in the 119896-Mtechnique Meanwhile during the model training stage thetraining cost of 119896-M is higher than that of DA SOM for thesame size of training dataThemain reason is that iteration isrequired in 119896-M on the entire training data set (ie the clustercenters need to be updated and the training data set needs tobe reclassified according to the updated center point) whilethere is only one classification operation for each trainingsample in DA SOM
5 Conclusion
Ananomaly detection algorithmbased on SOMfor theCloudplatform with large-scale virtual machines is proposed Thevirtual machines are partitioned initially according to theirsimilarity and then based on the results of initial partitionthe SOM is modeled The proposed method has a hightraining speed which is not possible in traditional methodswhen there are a large number of virtual machines We alsooptimized the two main parameters in the SOM networkmodeling process which highly improved this process Theproposedmethod is verified on an incremental SOManomalydetection modelThe results showed strong improvements indetection accuracy and speed using the proposed anomalydetection method
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The work of this paper is supported by National NaturalScience Foundation of China (Grants no 61272399 and no61572090) and Research Fund for the Doctoral Program ofHigher Education of China (Grant no 20110191110038)
References
[1] J. Li, Y. Cui, and Y. Ma, "Modeling message queueing services with reliability guarantee in cloud computing environment using colored Petri nets," Mathematical Problems in Engineering, vol. 2015, Article ID 383846, 20 pages, 2015.
[2] M. A. Rodriguez-Garcia, R. Valencia-Garcia, F. Garcia-Sanchez, and J. J. Samper-Zapater, "Ontology-based annotation and retrieval of services in the cloud," Knowledge-Based Systems, vol. 56, pp. 15–25, 2014.
[3] C.-C. Chang, C.-Y. Sun, and T.-F. Cheng, "A dependable storage service system in cloud environment," Security and Communication Networks, vol. 8, no. 4, pp. 574–588, 2015.
[4] W. He and L. Xu, "A state-of-the-art survey of cloud manufacturing," International Journal of Computer Integrated Manufacturing, vol. 28, no. 3, pp. 239–250, 2015.
[5] A. F. Barsoum and M. Anwar Hasan, "Provable multicopy dynamic data possession in cloud computing systems," IEEE Transactions on Information Forensics and Security, vol. 10, no. 3, pp. 485–497, 2015.
[6] J. Subirats and J. Guitart, "Assessing and forecasting energy efficiency on Cloud computing platforms," Future Generation Computer Systems, vol. 45, pp. 70–94, 2015.
[7] S. Ding, S. Yang, Y. Zhang, C. Liang, and C. Xia, "Combining QoS prediction and customer satisfaction estimation to solve cloud service trustworthiness evaluation problems," Knowledge-Based Systems, vol. 56, pp. 216–225, 2014.
[8] I. C. Paschalidis and Y. Chen, "Statistical anomaly detection with sensor networks," ACM Transactions on Sensor Networks, vol. 7, no. 2, article 17, 2010.
[9] M. GhasemiGol and A. Ghaemi-Bafghi, "E-correlator: an entropy-based alert correlation system," Security and Communication Networks, vol. 8, no. 5, pp. 822–836, 2015.
[10] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ, USA, 1988.
[11] M. Kourki and M. A. Riahi, "Seismic facies analysis from pre-stack data using self-organizing maps," Journal of Geophysics and Engineering, vol. 11, no. 6, Article ID 065005, 2014.
[12] L. Feng and S. LiQuan, "Enhanced dynamic self-organizing maps for data cluster," Information Technology Journal, vol. 26, no. 1, pp. 70–81, 2009.
[13] Z. Zhou, S. Chen, M. Lin, G. Wang, and Q. Yang, "Minimizing average startup latency of VMs by clustering-based template caching scheme in an IaaS system," International Journal of u- and e- Service, Science and Technology, vol. 6, no. 6, pp. 145–158, 2013.
[14] L. Jing, M. K. Ng, and J. Z. Huang, "An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1026–1041, 2007.
[15] R. Smith, A. Bivens, M. Embrechts, C. Palagiri, and B. Szymanski, "Clustering approaches for anomaly based intrusion detection," in Proceedings of Intelligent Engineering Systems through Artificial Neural Networks, pp. 579–584, 2002.
[16] Y. Sani, A. Mohamedou, K. Ali, A. Farjamfar, M. Azman, and S. Shamsuddin, "An overview of neural networks use in anomaly intrusion detection systems," in Proceedings of the IEEE Student Conference on Research and Development (SCOReD '09), pp. 89–92, Serdang, Malaysia, November 2009.
[17] G. P. Zhang, "Neural networks for classification: a survey," IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, vol. 30, no. 4, pp. 451–462, 2000.
Mathematical Problems in Engineering 9
[18] W. Tylman, "Anomaly-based intrusion detection using Bayesian networks," in Proceedings of the International Conference on Dependability of Computer Systems, pp. 211–218, Szklarska Poręba, Poland, 2008.
[19] W. Pedrycz, V. Loia, and S. Senatore, "P-FCM: a proximity-based fuzzy clustering," Fuzzy Sets and Systems, vol. 148, no. 1, pp. 21–41, 2004.
[20] G. Ratsch, S. Mika, B. Scholkopf, and K.-R. Muller, "Constructing boosting algorithms from SVMs: an application to one-class classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1184–1199, 2002.
[21] D. M. J. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, no. 1, pp. 45–66, 2004.
[22] B. Scholkopf, R. Williamson, A. J. Smola, J. Shawe-Taylor, and J. C. Platt, "Support vector method for novelty detection," in Proceedings of the 13th Annual Neural Information Processing Systems Conference (NIPS '99), pp. 582–588, December 1999.
[23] G. Wang, S. Chen, Z. Zhou, and M. Lin, "A dependable monitoring mechanism combining static and dynamic anomaly detection for network systems," International Journal of Future Generation Communication and Networking, vol. 7, no. 1, pp. 1–18, 2014.
[24] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: a survey," ACM Computing Surveys, vol. 41, no. 3, article 15, 2009.
[25] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, Reading, Mass, USA, 2005.
[26] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2nd edition, 2000.
[27] D. Shin and S. Kim, "Nearest mean classification via one-class SVM," in Proceedings of the International Joint Conference on Computational Sciences and Optimization (CSO '09), pp. 593–596, Sanya, China, April 2009.
[28] T.-S. Li and C.-L. Huang, "Defect spatial pattern recognition using a hybrid SOM-SVM approach in semiconductor manufacturing," Expert Systems with Applications, vol. 36, no. 1, pp. 374–385, 2009.
[29] B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Computation, vol. 13, no. 7, pp. 1443–1471, 2001.
[30] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[31] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[32] B. Liu, Y. Xiao, Y. Zhang, and Z. Hao, "An efficient approach to boost support vector data description," in Proceedings of the 2012 International Conference on Cybernetics and Informatics, vol. 163 of Lecture Notes in Electrical Engineering, pp. 2231–2238, Springer, New York, NY, USA, 2014.
[33] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, NJ, USA, 1988.
[34] L. Ertoz, M. Steinbach, and V. Kumar, "Finding topics in collections of documents: a shared nearest neighbor approach," in Clustering and Information Retrieval, vol. 11, pp. 83–103, 2003.
[35] S. Guha, R. Rastogi, and K. Shim, "ROCK: a robust clustering algorithm for categorical attributes," Information Systems, vol. 25, no. 5, pp. 345–366, 2000.
[36] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han, and U. Fayyad, Eds., pp. 226–231, AAAI Press, Portland, Ore, USA, August 1996.
[37] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 29, no. 2, pp. 93–104, 2000.
[38] N. Ye and Q. Chen, "An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems," Quality and Reliability Engineering International, vol. 17, no. 2, pp. 105–112, 2001.
[39] T. Kohonen, Self-Organizing Maps, Springer, New York, NY, USA, 1997.
[40] J. M. Alcaraz Calero and J. G. Aguado, "MonPaaS: an adaptive monitoring platform as a service for cloud computing infrastructures and services," IEEE Transactions on Services Computing, vol. 8, no. 1, pp. 65–78, 2015.
[41] D. Milojicic, I. M. Llorente, and R. S. Montero, "OpenNebula: a cloud management tool," IEEE Internet Computing, vol. 15, no. 2, pp. 11–14, 2011.
[42] H. Jin, H. Qin, S. Wu, and X. Guo, "CCAP: a cache contention-aware virtual machine placement approach for HPC cloud," International Journal of Parallel Programming, vol. 43, no. 3, pp. 403–420, 2015.
[43] B. Egger, E. Gustafsson, C. Jo, and J. Son, "Efficiently restoring virtual machines," International Journal of Parallel Programming, vol. 43, no. 3, pp. 421–439, 2015.
[44] Y. Cho, J. Choi, J. Choi, and M. Lee, "Towards an integrated management system based on abstraction of heterogeneous virtual resources," Cluster Computing, vol. 17, no. 4, pp. 1215–1223, 2014.
[45] J. Li, Y. Jia, L. Liu, and T. Wo, "CyberLiveApp: a secure sharing and migration approach for live virtual desktop applications in a cloud environment," Future Generation Computer Systems, vol. 29, no. 1, pp. 330–340, 2013.
[46] Z. Xu, J. Zhang, and Z. Xu, "Melton: a practical and precise memory leak detection tool for C programs," Frontiers of Computer Science, vol. 9, no. 1, pp. 34–54, 2015.
[47] P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: an evaluation of the state of the art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 4, pp. 743–761, 2012.
[48] Y.-J. Chiu and T. Berger, "A software-only videocodec using pixelwise conditional differential replenishment and perceptual enhancements," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, pp. 438–450, 1999.