
Post on 14-Feb-2017


Ensemble of K-Nearest Neighbour Classifiers for Intrusion Detection

Presented By

Imran Ahmed Malik

M.Tech CSE Networking Final Year

Sys ID 2014016942

Under the Guidance of

Mrs. Amrita

Asst. Professor

SHARDA UNIVERSITY, GREATER NOIDA

Contents

• Objective

• Problem Statement

• Proposed system

• Introduction to the implemented algorithm

• Results and Graphs

• Conclusion

Objective

• Can a GP-based numeric classifier show better performance than individual K-NN classifiers?

• Can a GP-based combination technique produce a higher-performance OCC (composite classifier) than the K-NN component classifiers?

Problem Statement

OPTIMIZATION AND COMBINATION OF KNN CLASSIFIERS USING GENETIC PROGRAMMING FOR INTRUSION DETECTION SYSTEM

Proposed Model

[Flowchart: the KDD CUP 1999 data set feeds the K-NN classifiers. Steps: Import KDD Dataset → Select Initial K-Nearest Neighbors → Optimization possible? → Set GA Parameters → Generate initial random population → Evaluate fitness of each classifier → Parent selection for next generation → Crossover → Is optimization met? (Yes → End; No → loop back to selection)]

Figure 3 shows the operations of a general genetic algorithm, according to which the GA is implemented in our system.
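The loop in Figure 3 can be sketched in a few lines of Python, assuming a toy one-dimensional dataset in place of KDD CUP 1999 and classification accuracy as the fitness value; all names and parameter values below are illustrative, not taken from the slides:

```python
import random

random.seed(0)

# Toy labeled data standing in for the KDD features (illustrative only).
train = [((i / 10,), 0 if i < 25 else 1) for i in range(50)]
test = [((i / 10 + 0.03,), 0 if i < 25 else 1) for i in range(50)]

def knn_predict(k, x, data):
    """Majority vote among the k nearest training points (1-D distance)."""
    nearest = sorted(data, key=lambda p: abs(p[0][0] - x[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def fitness(k):
    """Accuracy of a k-NN classifier with this k on the held-out set."""
    correct = sum(knn_predict(k, x, train) == y for x, y in test)
    return correct / len(test)

# GA over the single gene k: random init, fitness-ranked parents,
# crossover, mutation, and elitism, mirroring the flowchart loop.
pop = [random.randrange(1, 20) for _ in range(8)]
for generation in range(10):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                        # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = random.sample(parents, 2)
        child = (a + b) // 2                 # "crossover": average of parent genes
        if random.random() < 0.3:            # mutation: nudge k by ±1
            child = max(1, min(25, child + random.choice([-1, 1])))
        children.append(child)
    pop = parents + children                 # elitism: best parents survive

best_k = max(pop, key=fitness)
print(best_k, fitness(best_k))
```

The stopping test here is a fixed generation count; the deck's "Is optimization met?" check could equally be a fitness threshold or stall detection.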

GP Based Learning Algorithm: Training Pseudo Code

Notation: Stst and St denote the test and training data; C(x) is the class of instance x; OCC is a composite classifier; Ck is the kth component classifier; Ck(x) is the prediction of Ck.

Train-Composite-Classifier(St, OCC)
Step 1: All input data examples x ∈ St are given to the K component classifiers.
Step 2: Collect [C1(x), C2(x), …, Ck(x)] for all x ∈ St to form a set of prediction classes.
Step 3: Start the GP combining method, using the predictions as unary functions in the GP tree. A threshold T is used as a variable to compute the ROC curve.

GP Based Learning Algorithm (continued)

Pseudo Code for Classification
1. Apply the composite classifier (OCC, x) to data examples x taken from Stst.
2. X = [C1(x), C2(x), …, Ck(x)]: stack the predictions to form new derived data.
3. Compute OCC(x).
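The training and classification pseudocode together describe a stacking scheme: each component classifier predicts, the predictions form a derived vector, and OCC maps that vector to a class. A minimal sketch, with threshold rules standing in for the k-NN components and a majority vote standing in for the evolved GP combiner (both are assumptions for illustration):

```python
# Component classifiers C1..Ck: trivial threshold rules standing in
# for k-NN models trained with different k (illustrative only).
component_classifiers = [
    lambda x: int(x > 0.4),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.6),
]

def stack(x):
    """Steps 1-2: collect [C1(x), ..., Ck(x)] as the derived representation."""
    return [c(x) for c in component_classifiers]

def occ(x):
    """Composite classifier over the stacked predictions. The slides evolve
    this combiner with GP; a majority vote is the simplest stand-in."""
    preds = stack(x)
    return int(sum(preds) * 2 > len(preds))

print(stack(0.55), occ(0.55))  # two of three components fire
```

The GP-evolved combiner replaces only the `occ` function; the stacking of component predictions into derived data is unchanged.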

Working of Genetic Programming
1. The algorithm begins by creating a random initial population.
2. The algorithm then creates a sequence of new populations. At each step, the algorithm uses the individuals in the current generation to create the next population. To create the new population, the algorithm performs the following steps:
I. Scores each member of the current population by computing its fitness value.
II. Scales the raw fitness scores to convert them into a more usable range of values.
III. Selects members, called parents, based on their fitness.
IV. Some of the individuals in the current population that have the best fitness are chosen as elite. These elite individuals are passed to the next population.
V. Produces children from the parents. Children are produced either by making random changes to a single parent (mutation) or by combining the vector entries of a pair of parents (crossover).
VI. Replaces the current population with the children to form the next generation.
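Steps I-VI can be sketched as one generational update; the toy fitness function, rank-based selection weights, and mutation rate below are assumptions for illustration, not values from the slides:

```python
import random

random.seed(1)

def fitness(ind):
    """Toy fitness: higher when the vector entries are close to 1 (illustrative)."""
    return -sum((g - 1.0) ** 2 for g in ind)

def next_generation(pop, elite_n=2, mut_rate=0.2):
    # I-II: score each member and scale raw scores by rank.
    ranked = sorted(pop, key=fitness, reverse=True)
    # III: parents are drawn with probability proportional to rank.
    weights = list(range(len(ranked), 0, -1))
    # IV: the best individuals pass to the next population as elite.
    children = ranked[:elite_n]
    # V: produce children by crossover of vector entries, then mutation.
    while len(children) < len(pop):
        a, b = random.choices(ranked, weights=weights, k=2)
        cut = random.randrange(1, len(a))
        child = a[:cut] + b[cut:]                      # one-point crossover
        child = [g + random.gauss(0, 0.1) if random.random() < mut_rate else g
                 for g in child]                       # Gaussian mutation
        children.append(child)
    # VI: the children replace the current population.
    return children

pop = [[random.uniform(-2, 2) for _ in range(4)] for _ in range(10)]
init_best = max(pop, key=fitness)
for _ in range(30):
    pop = next_generation(pop)
best = max(pop, key=fitness)
```

Because the elite survive unchanged, the best fitness never decreases across generations.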

Dataset And Operations on Dataset

• KDD CUP 1999 dataset
• Remove redundancy
• Conversion of values
• Normalization
• PCA
• Final corrected data
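A dependency-free Python sketch of the listed preprocessing steps on toy rows standing in for KDD records (field names and values are illustrative; PCA is noted but not implemented here):

```python
# Toy records standing in for KDD CUP 1999 rows: (protocol, src_bytes, label).
rows = [
    ("tcp", 181, "normal"), ("udp", 239, "normal"),
    ("tcp", 181, "normal"),                  # exact duplicate to be removed
    ("icmp", 1032, "smurf"),
]

# 1. Remove redundancy: drop exact duplicate records, preserving order.
rows = list(dict.fromkeys(rows))

# 2. Conversion of values: encode symbolic fields as integers.
protocols = sorted({r[0] for r in rows})
encoded = [(protocols.index(p), b, y) for p, b, y in rows]

# 3. Normalization: min-max scale the numeric feature into [0, 1].
lo = min(b for _, b, _ in encoded)
hi = max(b for _, b, _ in encoded)
normalized = [(p, (b - lo) / (hi - lo), y) for p, b, y in encoded]

# 4. PCA would follow here to reduce the 41 KDD features; omitted to keep
#    this sketch dependency-free.
print(normalized)
```

On the real dataset the same pipeline applies column-wise across all symbolic and numeric features before PCA.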

Tools Used

• Genetic Programming Tool Kit

• Windows operating system

• 4 GB RAM

• Intel i5 processor

• MATLAB

RESULTS GRAPHS AND ANALYSIS

Fitness Function

• Records: the number of records should be maximum
• Num_folds: the number of folds should be minimum
• K_value: k should be close to optimal
• Time: time should be minimum
• Model: the best model is preferred
• Accuracy: the most accurate model is preferred

f = records + num_folds + K_value + time + model + accuracy
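The slides do not give scales or weights for these criteria, so the sketch below assumes each term is normalized to [0, 1] with minimization criteria inverted; the `model` term is omitted because its intended scale is unclear from the slides, and all reference values are assumptions:

```python
def fitness(records, num_folds, k_value, time_s, accuracy,
            max_records=494021, max_folds=10, k_opt=5, max_time=300.0):
    """Scalar fitness combining the slide's criteria (weights are assumptions):
    maximize records and accuracy, minimize folds and time, keep k near k_opt."""
    f_records = records / max_records                # more training records is better
    f_folds = 1.0 - num_folds / max_folds            # fewer folds is better
    f_k = 1.0 - abs(k_value - k_opt) / k_opt         # k close to an assumed optimum
    f_time = 1.0 - min(time_s, max_time) / max_time  # faster is better
    return f_records + f_folds + f_k + f_time + accuracy

print(fitness(494021, 2, 5, 30.0, 0.99))
```

Without some normalization of this kind, the raw sum in f would be dominated by whichever term has the largest numeric range.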

Current Best Individual

[Plot: the current best individual across the variables records, num_folds, model, time, K_value, and accuracy.]

GP Stopping Criteria

GP Selection Function

Confusion Matrix For Normal Class

Confusion Matrix For DoS Class

Confusion Matrix For R2L Class

Confusion Matrix For U2R Class

Confusion Matrix For Probe Class

Confusion matrix
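A confusion matrix for each class can be computed directly from paired actual and predicted labels; a minimal sketch over the five classes named above, with toy labels (the counts are illustrative, not the reported results):

```python
def confusion_matrix(actual, predicted, classes):
    """counts[a][p] = number of instances of class a predicted as class p."""
    counts = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

# Toy labels over the five KDD attack classes (illustrative).
classes = ["Normal", "DoS", "R2L", "U2R", "Probe"]
actual = ["Normal", "DoS", "DoS", "R2L", "Probe", "Normal"]
predicted = ["Normal", "DoS", "Normal", "R2L", "Probe", "Normal"]
cm = confusion_matrix(actual, predicted, classes)

# Per-class recall: correct predictions over actual instances of the class.
recall = {c: cm[c][c] / max(1, sum(cm[c].values())) for c in classes}
print(cm["DoS"], recall["DoS"])
```

The per-class slides above correspond to one row of this matrix each, read against all predicted classes.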

• Scatter plot of src_bytes versus count for Class using KNN

• Scatter plot of src_bytes versus dst_host_same_src_port_rate for Class using KNN

• ROC curve for the GP-based classifier, showing 0.99976 area under the curve
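The area under the ROC curve equals the probability that a randomly chosen positive instance is scored above a randomly chosen negative one; a minimal sketch with toy attack-vs-normal scores (illustrative values, not the reported 0.99976 result):

```python
def roc_auc(labels, scores):
    """AUC via the rank interpretation: the fraction of positive-negative
    pairs where the positive is scored higher (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy attack-vs-normal scores (illustrative): one misranked pair.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))
```

Sweeping the slides' threshold T over all score values and plotting true-positive against false-positive rate traces the same curve this quantity summarizes.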

• Classification Results using Ensemble of Classifiers

Conclusion

• Ensembling increases performance

• It reduces error rates

• The GP-based ensemble provides better results than individual classifiers

References

• Gianluigi Folino, Giandomenico Spezzano, and Clara Pizzuti. Ensemble Techniques for Parallel Genetic Programming Based Classifiers.

• Michał Woźniak, Manuel Graña, and Emilio Corchado. 2014. A Survey of Multiple Classifier Systems as Hybrid Systems. Elsevier.

• Urvesh Bhowan, Mark Johnston, Mengjie Zhang, and Xin Yao. 2013. Evolving Diverse Ensembles Using Genetic Programming for Classification with Unbalanced Data. IEEE Transactions on Evolutionary Computation, vol. 17, no. 3, June 2013.

• H. Nguyen, K. Franke, and S. Petrovic. 2010. Improving Effectiveness of Intrusion Detection by Correlation Feature Selection. 2010 International Conference on Availability, Reliability and Security, IEEE.

• Shelly Xiaonan Wu and Wolfgang Banzhaf. 2010. The Use of Computational Intelligence in Intrusion Detection Systems: A Review. Applied Soft Computing 10, 1-35.

• Ahmad Taher Azar, Hanaa Ismail Elshazly, Aboul Ella Hassanien, and Abeer Mohamed Elkorany. 2013. A Random Forest Classifier for Lymph Diseases. Computer Methods and Programs in Biomedicine.

Thank You
