
Journal of Babylon University/Pure and Applied Sciences/ No.(4)/ Vol.(23): 2015


Using DBSCAN Clustering Algorithm in Detecting DDoS Attack

Safaa O. Al-Mamory

Assistant Professor, College of Information Technology, University of Babylon [email protected]

Zahraa Mohammed Ali

Department of Computer Science, University of Kufa

[email protected]

Abstract

Distributed Denial of Service (DDoS) attacks have become one of the major threats to the Internet. A large number of agents generate a huge volume of useless packets that can easily exhaust the computing and communication resources of a victim, forcing it to deny normal services. In this paper we develop a method to detect DDoS attacks accurately and proactively. Entropy is used to measure abnormal changes in traffic according to the phases of the attack, and the resulting traffic samples are then clustered using the DBSCAN algorithm. Patterns for DDoS traffic are created from the centroid points extracted from each cluster and are used in the testing phase for distance-based classification. The system is characterized by processing and analyzing high-speed network traffic (based on the entropy approach), discovering and accurately identifying new types of DDoS attack so as to reduce false alarms (FA), detecting the attack in real time, and making use of the patterns from the training stage to increase the detection ratio.

Keywords: DDoS, Proactive detection, Clustering, DBSCAN

1.Introduction

Distributed denial of service (DDoS) attacks were first seen in early 1998 (CERT, 1998). In February 2000, a number of the world's largest e-commerce sites, including Yahoo.com, Amazon.com, Excite, E*Trade, eBay, CNN.com, Buy.com, and ZDNet, were brought offline for days by this kind of attack, even though they were designed to offer high availability. The outages caused huge economic losses to both the victim sites and their users (Wan, 2001).

The overarching aim of this paper is to develop a method to detect DDoS attacks accurately and proactively. This is achieved by using entropy to measure abnormal changes in traffic according to the phases of the attack; the traffic is then clustered using the DBSCAN algorithm, and the pattern for DDoS traffic is created from the output cluster set. The system is characterized by processing and analyzing high-speed network traffic (based on the entropy approach), discovering and accurately identifying new types of DDoS attack so as to reduce false alarms (FA), detecting the intrusion in real time, and making use of the patterns from the training stage to increase the detection ratio.

Section 2 describes related work. Section 3 explains the proposed system. The experimental results are discussed in Section 4, and conclusions are given in Section 5.

2.Related Works

A great deal of research has addressed DDoS attacks. To detect the attack proactively, Feinstein et al. (2003) presented statistical approaches that identify DDoS attacks by computing entropy and frequency-sorted distributions of selected packet attributes. DDoS attacks show anomalies in the characteristics of the selected packet attributes. Detection accuracy and performance were analyzed using live traffic traces from a variety of network environments, ranging from points in the core of the Internet to those inside an edge network. The results indicate that these methods can be effective against current attacks and suggest directions for improving the detection of stealthier attacks.

Jin et al. (2004) proposed a covariance analysis model for detecting SYN flooding attacks. The correlations among features may provide additional essential information: in terms of correlation, normal patterns differ from abnormal ones. In this sense, detecting changes in the correlation among different features can reveal the occurrence of anomalies. They presented a two-variable covariance model as a possible approach to detecting DDoS attacks.

Gavrilis et al. (2005) proposed a Radial-Basis-Function neural network (RBF-NN) to distinguish DDoS attacks from normal traffic. The RBF-NN detector is a two-layer neural network. It uses nine packet parameters, whose frequencies are estimated; based on these frequencies, the RBF-NN classifies traffic as either attack or normal.

Lee et al. (2008) proposed a method for proactive detection of DDoS attacks that exploits the attack architecture, which consists of the selection of handlers and agents, the communication and compromise, and the attack itself. The features are selected based on the procedure of a DDoS attack, and cluster analysis is then performed for proactive detection. Their method was evaluated on the 2000 DARPA Intrusion Detection Scenario Specific Data Set. The results show that each phase of the attack scenario is partitioned well and that the method can detect precursors of a DDoS attack as well as the attack itself.

Rahmani et al. (2009) presented entropy-based anomaly detection using joint entropy analysis of multiple traffic distributions. They observed that the time series of the number of IP flows and of the aggregate traffic size are strongly statistically dependent. An attack disturbs this dependence and causes a rupture in the time series of joint entropy values.

Xia et al. (2010) presented a method that identifies the occurrence of a DDoS flood attack and determines its intensity using fuzzy logic. The process consists of two stages: (i) statistical analysis of the network traffic time series using the discrete wavelet transform (DWT) and the Schwarz information criterion (SIC) to find the change point of the Hurst parameter resulting from a DDoS flood attack, and (ii) adaptive determination of the intensity of the DDoS flood attack by using fuzzy logic to analyze the Hurst parameter and its rate of change.

Zhong et al. (2010) presented a DDoS attack detection model based on data mining algorithms. The FCM cluster algorithm and the Apriori association algorithm are used to extract a network traffic model and a network packet protocol status model. The Apriori association algorithm is used to mine the packet protocol status: packet protocol statuses that appear frequently in the network can be combined into one association record. Data collected continuously over a period is used to calculate the packet protocol status threshold through the FCM cluster algorithm.

Liu et al. (2013) proposed an anomaly detection method for DDoS attacks based on the Gini coefficient. First, the Gini coefficient is introduced to measure the inequality of the distributions of packet attributes (IP addresses and ports) during attacks. Then, an improved Transductive Confidence Machines for K-Nearest Neighbors (TCM-KNN) algorithm is applied to identify attacks by classifying the Gini coefficient samples extracted from real-time network traffic. Experiments were carried out on the DDoS attack data set (LLDoS 2.0.2) from MIT Lincoln Laboratory.

Chen et al. (2013) proposed a detection model based on conditional random fields (CRF). The CRF-based model combines signature-based and anomaly-based detection methods into a hybrid system. The selected features include source IP entropy, destination IP entropy, source port entropy, destination port entropy, protocol number, etc. The CRF-based model combines these IP flow entropies and other fingerprints into a normalized entropy that serves as the feature vector describing the state of the monitored traffic. The detection model is trained using the L-BFGS algorithm.

3.Problem Formulation and Methodology

To achieve early detection of DDoS attacks, we employ the entropy concept and cluster analysis. The idea of this research is to separate each phase of a DDoS attack: the DBSCAN clustering algorithm is used in the training phase, and the corresponding cluster centroids (the average of each cluster) are then used as patterns for efficient distance-based detection in the testing phase. Figure 1 illustrates the flow chart of the proposed system.



Figure (1): The general architecture of the proposed system.

3.1 Extraction of the detection features

According to the DDoS architecture, a DDoS attack is performed in the following steps (Douligeris C. et al., 2004):

1. Selection of handlers and agents

2. Compromise

3. Communication

4. Attack

From this procedure we can identify the traffic parameters that change abnormally in each step. Lee et al. (2008) presented nine features based on an analysis of the characteristics of DDoS attacks. We use these features in our method.

"In the first step, real attacker sends ICMP Echo Request packets to find handlers and agents that help attack, which is called IPsweep" (Lee et al., 2008). A lot of ICMP traffic is generated, so the occurrence rate of ICMP packets may be abnormally high compared with normal traffic. In this period, the destination IP addresses in the network flow are also distributed randomly.

[Figure 1 flow chart boxes: Start; Data set; Features extraction for each sample of consecutive packets; Generate a database from the extracted features; Training: clustering by DBSCAN, extract set of centroid points (mean of each cluster) as pattern; Testing: distance-based classification; System validation; End]

In the second and third steps, specific traffic types such as ICMP, UDP and TCP SYN packets can be used for message exchange. Hence, the occurrence rates of these packet types can indicate the preparation for launching a DDoS attack (Zi et al., 2010).

During a DDoS attack, the agents randomly generate the source IP addresses of attack packets to hide their real addresses. They also randomize the destination and source port numbers depending on the attack type, so this randomization provides useful information for detecting a DDoS attack. To measure the degree of divergence, Lee et al. (2008) suggest using the concept of entropy.

Let an information source have n independent symbols, each with probability of choice P_i. Then the entropy H is defined as follows (Shannon, 1948):

$$H = -\sum_{i=1}^{n} P_i \log_2 P_i \tag{1}$$

Entropy is computed on a sample of consecutive packets. Comparing the entropy value of one sample with that of another provides a mechanism for detecting changes in randomness (Lee et al., 2008).

In the IPsweep phase, the entropy value of the source IP address becomes small and that of the destination IP address increases. In the attack phase, attack packets have diverse source IP addresses and a single target destination IP address, so the entropy value of the source IP address increases and that of the destination IP address converges to a very small value. Similarly, the entropy values of the source and destination port numbers can be useful for DDoS detection, since some types of DDoS attack use random port numbers. In addition, because a DDoS attack may use a specific type of packet, the entropy value of the packet type may be useful: if it is very small, it is possible that some kind of DDoS attack is being launched.
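The behaviour described above can be illustrated with a minimal Python sketch of Equation (1) applied to one packet attribute. The function name `shannon_entropy` and the sample addresses are hypothetical and chosen only for illustration; they are not taken from the DARPA data.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    # sum of p * log2(1/p), which equals -sum of p * log2(p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical samples of 8 consecutive packets each.
sweep_src = ["10.0.0.9"] * 8                          # one scanning host
sweep_dst = [f"172.16.115.{i}" for i in range(8)]     # many probed hosts
attack_src = [f"198.51.100.{i}" for i in range(8)]    # spoofed sources
attack_dst = ["172.16.115.20"] * 8                    # a single victim

print(shannon_entropy(sweep_src), shannon_entropy(sweep_dst))    # 0.0 3.0
print(shannon_entropy(attack_src), shannon_entropy(attack_dst))  # 3.0 0.0
```

In the sweep sample the source entropy is zero and the destination entropy is maximal for eight packets; in the attack sample the roles are reversed, which is the pattern the text describes.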

In our experiments, we use the same nine features presented in (Lee et al., 2008). The features are:

Entropy of source IP address and port number.

Entropy of destination IP address and port number.

Entropy of packet type.

Number of packets.

Occurrence rate of packet type (ICMP, UDP, TCP SYN).
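As a sketch of how one sample of consecutive packets could be mapped to the nine-dimensional feature vector listed above, consider the following Python fragment. The packet representation (a dict with the keys `src_ip`, `dst_ip`, `src_port`, `dst_port`, `ptype`), the type labels, and the ordering of the features are assumptions made for illustration; the paper does not specify them.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    total = len(values)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c)
               for c in Counter(values).values())


def extract_features(packets):
    """Return the nine features for one sample of consecutive packets.

    Each packet is a dict with the hypothetical keys
    src_ip, dst_ip, src_port, dst_port and ptype.
    """
    n = len(packets)
    types = [p["ptype"] for p in packets]

    def rate(ptype):
        # Occurrence rate of one packet type within the sample.
        return types.count(ptype) / n if n else 0.0

    return [
        entropy([p["src_ip"] for p in packets]),
        entropy([p["src_port"] for p in packets]),
        entropy([p["dst_ip"] for p in packets]),
        entropy([p["dst_port"] for p in packets]),
        entropy(types),
        float(n),
        rate("ICMP"),
        rate("UDP"),
        rate("TCP_SYN"),
    ]


# Hypothetical IPsweep-like sample: one source pings four destinations.
sample = [
    {"src_ip": "10.0.0.9", "dst_ip": f"172.16.115.{i}",
     "src_port": None, "dst_port": None, "ptype": "ICMP"}
    for i in range(4)
]
print(extract_features(sample))
# [0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 1.0, 0.0, 0.0]
```

The sample shows the expected IPsweep signature: zero source-IP entropy, high destination-IP entropy, and an ICMP occurrence rate of 1.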

3.2 Clustering analysis by DBSCAN (Training phase)

Clustering is a method by which large sets of data are grouped into clusters of similar data. Using cluster analysis, we can separate normal traffic and each phase of the DDoS attack into distinct groups, provided the variables used to form the clusters differ among them. Hence, in this paper we apply cluster analysis to separate each phase of the DDoS attack. We first employ a clustering algorithm to partition a training data set into clusters that represent normal traffic and each phase of the DDoS attack, and then extract patterns from these clusters for use in online detection.

We adopt the DBSCAN algorithm for clustering. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a density-based clustering algorithm. It forms clusters by growing high-density areas and can find clusters of arbitrary shape. The basic idea of using DBSCAN in DDoS attack detection is that most of the data is normal traffic, while the attack data is sparse and differs from the normal data.

In training mode, we modify the DBSCAN algorithm by adding a new step that computes the centroid µ of each cluster as follows:

$$\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x$$

where C_k denotes the set of points assigned to cluster k.

These centroids represent the patterns used to detect the DDoS attack phases in online mode. The steps of the modified DBSCAN algorithm are shown below:

Algorithm 1 DBSCAN (D, ε, MinPts)

Input: training data set D, neighbourhood radius ε, density threshold MinPts

Output: labels the data with cluster id (or NOISE), centroid points set µk

Begin
    label all data x ∈ D as UNCLASSIFIED
    initialize cluster counter cid = 0
    foreach x ∈ D
        if x is labelled as UNCLASSIFIED
            if expand(D, x, cid, ε, MinPts)
                increment cluster counter cid = cid + 1
            end
        end
    end
    foreach cluster k
        µk = average of points assigned to cluster k
    end
    return set of µ
End

Algorithm expand (D, x, cid, ε, MinPts) : bool

Input: data set D, x ∈ D, currently unused cluster id cid, neighbourhood radius ε, density threshold MinPts

Output: returns true iff a new cluster has been found

Begin
    let S = { y ∈ D | ||x - y|| ≤ ε } (range query)
    if not enough data in neighbourhood of x (|S| < MinPts)
        re-label x as NOISE and return false
    end
    foreach x' ∈ S
        re-label x' with current cluster id cid
    end
    remove x from S
    foreach x' ∈ S
        T = { y ∈ D | ||x' - y|| ≤ ε } (range query)
        if enough data in neighbourhood of x' (|T| ≥ MinPts)
            foreach y ∈ T
                if y does not belong to a cluster (labelled as NOISE or UNCLASSIFIED)
                    if y is labelled UNCLASSIFIED: insert y into S
                    re-label y with current cluster id cid
                end
            end
        end
        remove x' from S
    end
    return true
End
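The pseudocode above can be sketched in plain Python as follows. This is an illustrative reading of Algorithm 1, not the authors' C# implementation; the function names, the integer label constants, and the toy data are assumptions, and the range query is a simple linear scan using Euclidean distance.

```python
import math

UNCLASSIFIED, NOISE = -2, -1  # hypothetical label constants


def region_query(data, i, eps):
    """Indices of all points within distance eps of data[i] (including i)."""
    return [j for j, y in enumerate(data) if math.dist(data[i], y) <= eps]


def expand(data, labels, i, cid, eps, min_pts):
    """Try to grow cluster `cid` from point i; return True if a cluster formed."""
    seeds = region_query(data, i, eps)
    if len(seeds) < min_pts:
        labels[i] = NOISE
        return False
    for j in seeds:
        labels[j] = cid
    seeds.remove(i)
    while seeds:
        j = seeds.pop(0)
        neighbours = region_query(data, j, eps)
        if len(neighbours) >= min_pts:          # j is a core point
            for k in neighbours:
                if labels[k] in (UNCLASSIFIED, NOISE):
                    if labels[k] == UNCLASSIFIED:
                        seeds.append(k)          # expand further from k later
                    labels[k] = cid
    return True


def dbscan_with_centroids(data, eps, min_pts):
    """Label every point and return the centroid (mean) of each cluster."""
    labels = [UNCLASSIFIED] * len(data)
    cid = 0
    for i in range(len(data)):
        if labels[i] == UNCLASSIFIED and expand(data, labels, i, cid, eps, min_pts):
            cid += 1
    centroids = {}
    for k in range(cid):
        members = [x for x, lab in zip(data, labels) if lab == k]
        centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy two-dimensional data: two dense groups and one isolated point.
points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (9, 0)]
labels, centroids = dbscan_with_centroids(points, eps=0.2, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point is labelled NOISE and therefore contributes to no centroid, which matches the role of the centroid step in the pseudocode.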



3.3 Distance-based detection (Classification phase)

Clustering yields the set of centroid points µk, and from the description of the training data set we can determine which cluster corresponds to which phase of the DDoS attack. The distances from an object to the cluster centroids of each traffic class are calculated using the Euclidean distance function. An object is classified as normal if it is closest to the normal cluster centroid; otherwise it is assigned to the DDoS attack phase whose centroid is closest. This is illustrated in Figure 2 with a two-dimensional feature space: object P is closer to the normal cluster, therefore P is normal. This distance-based classification allows detecting known kinds of anomalies, i.e. anomalous traffic with characteristics similar to those in the training data sets.

Figure (2): Distance-based classification for two centroids.

The distance-based classification algorithm is shown below:

Algorithm 2 distance-based classification (D, µk)

Input: testing data set D, centroid points set µk with their classes

Output: labels the data with class

Begin
    foreach x ∈ D
        foreach c ∈ µk
            compute the Euclidean distance between x and c
        end
        label x with the class of the centroid that has the minimum distance to x
    end
End
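A minimal Python sketch of this nearest-centroid rule is given below. The centroid coordinates, class names, and the test point P are hypothetical values in a two-dimensional feature space, in the spirit of Figure 2; they are not taken from the experiments.

```python
import math


def classify(sample, centroids):
    """Return the class of the centroid nearest to `sample`.

    `centroids` is a list of (class_label, centroid_vector) pairs.
    """
    label, _ = min(centroids, key=lambda item: math.dist(sample, item[1]))
    return label


# Hypothetical centroids obtained from the training phase.
patterns = [("normal", (0.2, 0.1)), ("phase 1", (0.9, 0.8))]

P = (0.3, 0.2)
print(classify(P, patterns))           # normal
print(classify((0.8, 0.9), patterns))  # phase 1
```

Because every sample is assigned to its nearest centroid, this rule always produces a known class; it has no reject option for traffic that is far from all centroids.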



3.4 Performance Metrics

"The performance measures for intrusion detection can be calculated by a confusion matrix" (Tsai C. F. et al., 2010). A confusion matrix summarizes the predictive performance of a classifier on test data. It is commonly given in a two-class format, but can be generated for any number of classes (Mukherjee S. et al., 2012). A confusion matrix for two classes is shown in Table 1, where rows give the actual class and columns the predicted class.

Table 1: Confusion matrix (Tsai C. F. et al., 2010)

Actual    Predicted
          Normal    Attack
Normal    TN        FP
Attack    FN        TP

To evaluate the results, we use the standard metrics detection rate (DR), false alarm (FA) and accuracy.

A successful anomaly detection algorithm should achieve a high DR, high accuracy and a low FA.
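The formulas for these metrics are not reproduced in the text. The Python sketch below uses definitions that are an assumption on our part, chosen because they reproduce the percentages reported in Section 4 from the counts in Tables 3 and 5: accuracy = (TP + TN) / total, DR = TP / (TP + FP), and FA = FP / (FP + TN). The function name `metrics` is hypothetical.

```python
def metrics(tn, fp, fn, tp):
    """Accuracy, detection rate and false alarm rate from a 2x2 confusion matrix.

    The definitions are assumptions inferred from the reported results.
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fp),
        "false_alarm": fp / (fp + tn),
    }


# Counts taken from Table 3 (DBSCAN result) and Table 5 (classification result).
table3 = metrics(tn=5770, fp=21, fn=28, tp=22)
table5 = metrics(tn=2491, fp=7, fn=12, tp=11)

for name, result in (("Table 3", table3), ("Table 5", table5)):
    print(name, {k: f"{100 * v:.5f}%" for k, v in result.items()})
```

Note that TP / (TP + FP) is the quantity usually called precision. Under the more common definition of detection rate, TP / (TP + FN), the Table 3 counts would give 22 / 50 = 44% rather than the reported 51.16279%.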

4.Experimental Results

This work was implemented in C# and executed on an Intel(R) Celeron(R) 2.00 GHz processor with 2 GB of main memory under the Windows 7 Ultimate operating system.

4.2 DARPA 2000 Datasets Description

DARPA (Defense Advanced Research Projects Agency) 2000 is an intrusion detection evaluation data set. It is mainly designed to evaluate the detection probability and false detection probability of network security systems under test, especially in the intrusion detection research field (MITLLab, 2000).

The DARPA 2000 data sets contain multiple specific network attack scenarios. In this research, the LLDDoS 1.0 scenario is used to evaluate the proposed model. Figure 3 shows the network structure of this data set.



Figure (3): Network structure used in the DARPA 2000 data set.

The LLDDoS 1.0 scenario from the 2000 DARPA data sets includes a DDoS attack run by a novice attacker. This attack scenario is carried out over multiple network and audit sessions, which have been grouped into five attack phases. The five phases are as follows:

1. IPsweep of the DMZ hosts from a remote site.

2. Probe of live IPs to look for the sadmind daemon running on Solaris hosts.

3. Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts.

4. Installation of the Trojan mstream DDoS software on three hosts in the DMZ.

5. Launching the DDoS.

This data set has two types of Tcpdump file. One is the DMZ Tcpdump, which is collected at the sniffer on the DMZ network; the other is the inside Tcpdump, which is collected at the sniffer on the inside network. In this attack scenario, the attacker only communicates with agent hosts in the DMZ network and cannot communicate with the victim host in the inside network. For this reason, we use the DMZ Tcpdump file to detect the DDoS attack in its early phases. "In phase 5 of the attack, packets collected to DMZ Tcpdump are not the attack packet but the response packets to the spoofed IP of the attack packets" (Lee K. et al., 2008).

4.3 Features Extraction

In the proposed DDoS detection method, the features are extracted based on an analysis of the DDoS procedure. During the attack the source IP addresses are generated randomly, and the destination and source port numbers are also randomized depending on the attack type. To measure this randomness, entropy is computed on a sample of consecutive packets. In our experiment, each input variable is calculated over a fixed time interval of 1 second.
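One way to form these 1-second samples is sketched below in Python. The input format (a list of `(timestamp, packet)` pairs with timestamps in seconds) and the choice to align the first window with the first packet are assumptions for illustration; the paper does not describe how the windows were aligned.

```python
from collections import defaultdict


def split_into_windows(packets, interval=1.0):
    """Group (timestamp, packet) pairs into consecutive fixed-length windows.

    Windows start at the earliest timestamp; empty windows are kept so that
    the position in the returned list corresponds to elapsed time.
    """
    if not packets:
        return []
    start = min(ts for ts, _ in packets)
    windows = defaultdict(list)
    for ts, pkt in packets:
        windows[int((ts - start) // interval)].append(pkt)
    return [windows[i] for i in range(max(windows) + 1)]


# Hypothetical capture: four packets over roughly four seconds.
capture = [(10.0, "a"), (10.4, "b"), (11.2, "c"), (13.9, "d")]
print(split_into_windows(capture))  # [['a', 'b'], ['c'], [], ['d']]
```

Each returned window would then be passed to the feature extraction step to produce one feature vector per second.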

[Figure 3 diagram labels: Attacker, Firewall, DMZ network, Firewall, Inside network]

Figure (4): Entropy values of a) destination IP, b) source IP, c) destination port, d) source port; e) number of packets; f) packet type; occurrence of g) TCP SYN, h) ICMP, i) UDP. (Plot annotations: ICMP Flooding, DDoS Flooding.)

4.4 Clustering by DBSCAN

The basic idea of DDoS detection based on DBSCAN is that most of the data is normal and gathers into a high-density cluster, while the intrusion data is sparse and differs from the normal data. The DBSCAN algorithm is based on the concepts of density-reachability and density-connectivity. These concepts depend on two input parameters: epsilon (Eps) and the minimum number of points (MinPts). Eps is the radius around an object that defines its Eps-neighborhood.

It is difficult to determine accurate values of Eps and MinPts in advance. However, a probable range for their values can be chosen from experience. We tried 40 values of Eps in the range 0.1 to 0.5 with a step of 0.01, and the MinPts values 2, 3, 4, 5, 6, 7, 8 and 9. After running the algorithm 320 times with different combinations of Eps and MinPts, we found that the best accuracy and the maximum number of detected DDoS phases are obtained with Eps = 0.12 and MinPts = 3. The results of the experiment are shown graphically in Figure 5.
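The parameter search can be sketched as an exhaustive grid search in Python. Which 40 Eps values were used is not stated exactly, so the grid below assumes 0.10, 0.11, ..., 0.49; the `score` callback is a hypothetical placeholder for whatever criterion is applied to each run (the text mentions accuracy and the number of detected phases).

```python
from itertools import product

# Assumed grid: 40 Eps values (0.10 .. 0.49) and 8 MinPts values (2 .. 9).
eps_values = [round(0.10 + 0.01 * i, 2) for i in range(40)]
min_pts_values = range(2, 10)
grid = list(product(eps_values, min_pts_values))

print(len(grid))  # 320


def best_parameters(grid, score):
    """Return the (eps, min_pts) pair with the highest score.

    `score` is a caller-supplied function score(eps, min_pts) -> float,
    for example the training accuracy obtained with those parameters.
    """
    return max(grid, key=lambda params: score(*params))
```

With 40 Eps values and 8 MinPts values the grid contains 40 x 8 = 320 combinations, which agrees with the 320 runs reported.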

Figure (5): Experiments to determine the values of Eps and MinPts.

After determining the best values of Eps and MinPts, we obtained 10 clusters, and the information provided with the DARPA 2000 data set about each phase of the DDoS attack was used to evaluate the result of the algorithm. Tables 2 and 3 show the confusion matrices for the six classes and for the binary classes (normal and attack), respectively. The accuracy of the training phase is 99.1611%, and the detection rate and false alarm rate are 51.16279% and 0.362632%, respectively.

Table 2: Confusion matrix of the DBSCAN algorithm result for the six classes

Actual     Predicted
           Normal   Phase 1   Phase 2   Phase 3   Phase 4   Phase 5   Total
Normal     5770     0         19        2         0         0         5791
Phase 1    0        4         0         0         0         0         4
Phase 2    0        0         13        0         0         0         13
Phase 3    8        0         0         2         0         0         10
Phase 4    8        0         0         0         0         0         8
Phase 5    12       0         0         0         0         3         15
Total      5798     4         32        4         0         3         5841



Table 3: Confusion matrix of the DBSCAN algorithm result for the normal and attack classes

Actual    Predicted
          Normal    Attack
Normal    5770      21
Attack    28        22
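The binary matrix can be derived from the six-class matrix by merging the five attack phases into a single attack class. The short Python check below does this for the counts in Table 2; the function name `collapse_to_binary` is hypothetical, and the sketch counts an attack sample as detected whenever it is assigned to any attack phase.

```python
# Rows are actual classes, columns are predicted classes
# (Normal, Phase 1, ..., Phase 5), copied from Table 2 without the totals.
six_class = [
    [5770, 0, 19, 2, 0, 0],
    [0, 4, 0, 0, 0, 0],
    [0, 0, 13, 0, 0, 0],
    [8, 0, 0, 2, 0, 0],
    [8, 0, 0, 0, 0, 0],
    [12, 0, 0, 0, 0, 3],
]


def collapse_to_binary(matrix, normal=0):
    """Merge all non-normal classes into one attack class."""
    tn = matrix[normal][normal]
    fp = sum(matrix[normal]) - tn
    attack_rows = [row for i, row in enumerate(matrix) if i != normal]
    fn = sum(row[normal] for row in attack_rows)
    tp = sum(sum(row) for row in attack_rows) - fn
    return [[tn, fp], [fn, tp]]


print(collapse_to_binary(six_class))  # [[5770, 21], [28, 22]]
```

The result agrees with Table 3: 21 normal samples fall into attack clusters, and 28 of the 50 attack samples are labelled normal.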

4.5 Classification

Clustering yields the set of centroid points used as the pattern to detect normal traffic and the phases of the DDoS attack. To evaluate the proposed classification method we use one third of the DARPA 2000 data set.

Tables 4 and 5 show the confusion matrices for the six classes and for the binary classes (normal and attack), respectively. The accuracy of the classification phase is 99.24633%, and the detection rate and false alarm rate are 61.11111% and 0.280224%, respectively.

Table 4: Confusion matrix of the classification algorithm result for the six classes

Actual     Predicted
           Normal   Phase 1   Phase 2   Phase 3   Phase 4   Phase 5   Total
Normal     2491     0         7         0         0         0         2498
Phase 1    0        7         0         0         0         0         7
Phase 2    0        0         2         0         0         0         2
Phase 3    4        0         0         0         0         0         4
Phase 4    0        0         0         0         0         0         0
Phase 5    8        0         0         0         0         2         10
Total      2503     7         9         0         0         2         2521

Table 5: Confusion matrix of the classification algorithm result for the normal and attack classes

Actual    Predicted
          Normal    Attack
Normal    2491      7
Attack    12        11

5.Conclusions

In this paper we presented a method for the early detection of DDoS attacks using cluster analysis. Many studies on DDoS attack detection have been carried out; however, they focus only on changes in network traffic. Methods based on data mining are suitable for this detection task. Our method first selects nine packet/traffic features that are widely found in the various phases of the attack. Then, the current network status is classified to determine the class to which it belongs. Hence, our method can classify the network status well enough to detect DDoS attacks early.



6.References

CERT. (1998, January 5). CERT Advisory CA-1998-01 Smurf IP Denial-of-Service Attacks. Retrieved January 5, 2014, from http://www.cert.org/advisories/CA-1998-01.html.

Chen, S. W., Wu, J. X., Ye, X. L., & Tong, G. U. O. (2013). Distributed Denial of Service Attacks Detection Method Based on Conditional Random Fields. Journal of Networks, 8(4).

Douligeris, C., & Mitrokotsa, A. (2004). DDoS attacks and defense mechanisms: classification and state-of-the-art. Computer Networks, 44(5), 643-666.

Feinstein, L., Schnackenberg, D., Balupari, R., & Kindred, D. (2003, April). Statistical approaches to DDoS attack detection and response. In DARPA Information Survivability Conference and Exposition, 2003. Proceedings (Vol. 1, pp. 303-314).

Gavrilis, D., & Dermatas, E. (2005). Real-time detection of distributed denial-of-service attacks using RBF networks and statistical features. Computer Networks, 48(2), 235-245.

Jin, S., & Yeung, D. S. (2004, June). A covariance analysis model for DDoS attack detection. In Communications, 2004 IEEE International Conference on (Vol. 4, pp. 1882-1886). IEEE.

Lee, K., Kim, J., Kwon, K. H., Han, Y., & Kim, S. (2008). DDoS attack detection method using cluster analysis. Expert Systems with Applications, 34(3), 1659-1665.

Liu, Y., Jiang, S., & Huang, J. (2013, August). Anomaly Detection for DDoS Attacks Based on Gini Coefficient. In 2013 International Conference on Advanced ICT and Education (ICAICTE-13). Atlantis Press.

MITLLab (2000). 2000 DARPA intrusion detection scenario specific datasets. Retrieved July 30, 2012, from http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html.

Mukherjee, S., & Sharma, N. (2012). Intrusion detection using naive Bayes classifier with feature reduction. Procedia Technology, 4, 119-128.

Rahmani, H., Sahli, N., & Kammoun, F. (2009, August). Joint entropy analysis model for DDoS attack detection. In Information Assurance and Security, 2009. IAS'09. Fifth International Conference on (Vol. 2, pp. 267-271). IEEE.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, vol. 27, pp. 379-423 & 623-656.

Tsai, C. F., & Lin, C. Y. (2010). A triangle area based nearest neighbors approach to intrusion detection. Pattern Recognition, 43(1), 222-229.

Wan, K. (2001). An infrastructure to defend against distributed denial-of-service attack (M. Sc. Thesis, The Hong Kong Polytechnic University).

Xia, Z., Lu, S., Li, J., & Tang, J. (2010). Enhancing DDoS flood attack detection via intelligent fuzzy logic. Informatica (Slovenia), 34(4), 497-507.

Zhong, R., & Yue, G. (2010, April). DDoS detection system based on data mining. In Proceedings of the Second International Symposium on Networking and Network Security, Jinggangshan, China (pp. 62-65).

Zi, L., Yearwood, J., & Wu, X. W. (2010, September). Adaptive clustering with feature ranking for DDoS attacks detection. In Network and System Security (NSS), 2010 4th International Conference on (pp. 281-286). IEEE.
