Jan 12, 2016
PROJECT
• Compare SVM and standard models (neural networks, etc.)
• Bagging + SVM + data reduction for fast and efficient learning on large data
By: Nitin Chaudhary    Prof: Dr. Slobodan Vucetic
INTRODUCTION
SVM: The goal in training a Support Vector Machine is to find the separating hyperplane with the largest margin; we expect that the larger the margin, the better the generalization of the classifier.
[Figure: data plotted on axes y1 and y2, with the optimal hyperplane separating the two classes and the support vectors highlighted.]
The Support Vectors are equally close to the hyperplane
The Support Vectors are the training samples that define the optimal separating hyperplane and are the most difficult patterns to classify.
Informally speaking, they are the patterns most informative for the classification task.
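To make the margin statement precise (standard linear-SVM facts, added here for reference):

\[
f(x) = \operatorname{sign}(w^T x + b), \qquad
y_i (w^T x_i + b) = 1 \ \text{for the support vectors}, \qquad
\text{margin} = \frac{2}{\lVert w \rVert}
\]

so maximizing the margin is equivalent to minimizing \( \tfrac{1}{2} w^T w \), which is exactly the first term of the primal problems on the next slide.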
FORMULATIONS
1) C-Support Vector Classification:

   \[
   \min_{w,b,\xi} \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
   \quad \text{subject to} \quad y_i \bigl( w^T \phi(x_i) + b \bigr) \ge 1 - \xi_i, \;\; \xi_i \ge 0
   \]

2) nu-Support Vector Classification:

   \[
   \min_{w,b,\xi,\rho} \;\; \frac{1}{2} w^T w - \nu\rho + \frac{1}{l} \sum_{i=1}^{l} \xi_i
   \quad \text{subject to} \quad y_i \bigl( w^T \phi(x_i) + b \bigr) \ge \rho - \xi_i, \;\; \xi_i \ge 0, \;\; \rho \ge 0
   \]

Above are the primal problems for both classifications; here l is the number of training examples and \(\phi\) is the kernel feature map.
Remember the form of the polynomial kernel:
(Gamma*<X(:,i),X(:,j)> + Coefficient)^Degree
and the form of the rbf or Gaussian kernel:
exp(-Gamma*|X(:,i)-X(:,j)|^2)
Wow! I remember these; they were taught in the CIS 595 class.
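As a quick sketch, both kernels can be written as MATLAB anonymous functions (illustrative only; u and v stand for the column vectors X(:,i) and X(:,j)):

poly_k = @(u, v, Gamma, Coefficient, Degree) (Gamma * (u' * v) + Coefficient)^Degree;  % polynomial kernel
rbf_k  = @(u, v, Gamma) exp(-Gamma * norm(u - v)^2);                                   % rbf/Gaussian kernel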
For the first part of the project I have used Pima.txt as my data set
Gamma: if the input value is zero, the function sets Gamma by default to 1/(max_pattern_dimension); if the input value is non-zero, Gamma is used unchanged.
C: cost of constraint violation.
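A minimal sketch of how these options map onto the LIBSVM MATLAB interface (assuming LIBSVM's svmtrain/svmpredict are compiled and on the path; the Pima.txt column layout and the parameter values below are assumptions taken from the tables that follow):

% Assumed layout: features in the leading columns, the class label in the last.
data = load('Pima.txt');
X = data(:, 1:end-1);
y = data(:, end);

% C-SVC (-s 0) with the rbf kernel (-t 2); C via -c, Gamma via -g.
model_rbf = svmtrain(y, X, '-s 0 -t 2 -c 100 -g 0.0001');

% nu-SVC (-s 1) with the polynomial kernel (-t 1);
% Gamma via -g, Coefficient via -r, Degree via -d, nu via -n (0.5 is LIBSVM's default).
model_poly = svmtrain(y, X, '-s 1 -t 1 -g 0.00001 -r 1 -d 5 -n 0.5');

% Prediction on a held-out set would then use svmpredict:
% [predicted, acc, ~] = svmpredict(y_test, X_test, model_rbf);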
METHODOLOGY
For nu-SVC and C-SVC using the rbf (Gaussian) kernel:

C        Gamma      ConfMat             Acc
1        1          [260  0; 123  0]    0.6789
100      0.0001     [216 27;  70 70]    0.7467
200      0.0001     [218 32;  66 67]    0.7441
1000     0.00001    [214 34;  60 75]    0.7546
0.00001  0.000001   [218 31;  49 85]    0.7911
For nu-SVC and C-SVC using the polynomial kernel:

Gamma    C    Coeff  Degree  ConfMat             Acc
0.0001   100  2      3       [225 31; 54 73]     0.7781
0.00001  10   1      5       [232 22; 55 74]     0.7990
For neural networks:

Hidden Neurons  # Training Iterations  Show  Max_fail  ConfMat             Accu
5               100                    10    5         [120 28; 32 49]     0.7380
8               100                    10    5         [134 16; 32 47]     0.7904
10              100                    10    50        [136 17; 25 51]     0.8166
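For reference, a minimal sketch of how the settings above map onto MATLAB's Neural Network Toolbox (patternnet as the constructor is my assumption about the exact toolbox function used; the variable names are illustrative):

net = patternnet(10);                    % 10 hidden neurons
net.trainParam.epochs   = 100;           % number of training iterations
net.trainParam.show     = 10;            % progress display interval
net.trainParam.max_fail = 50;            % validation-failure (early stopping) limit

T_train = full(ind2vec(y_train' + 1));   % 0/1 labels -> one-hot targets (2 x N)
net = train(net, X_train', T_train);     % the toolbox expects samples as columns
y_pred = vec2ind(net(X_test')) - 1;      % winning output unit -> 0/1 labels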
RESULTS
1) Maximum accuracy in the case of nu-SVC using the rbf (Gaussian) kernel is 79.11%, at Gamma = 0.000001 and C = 0.00001.
2) Maximum accuracy in the case of C-SVC using the rbf (Gaussian) kernel is 80.94%, at Gamma = 0.000001 and C = 100000.
3) Maximum accuracy in the case of nu-SVC using the polynomial kernel is 79.90%, at Gamma = 0.00001, C = 10, Coeff = 1, and Degree = 5.
RESULTS cont.
4) Maximum accuracy in the case of C-SVC using the polynomial kernel is 80.68%, at Gamma = 0.00001, C = 100, Coeff = 10, and Degree = 3.
5) Maximum accuracy in the case of neural networks is 81.66%, at 10 hidden neurons, 100 training iterations, show = 10, and max_fail = 50.
Bagging + SVM + Data Reduction for fast and efficient learning on large data
Goal: to perform bagging with SVMs on very large data sets.
The data set I used here is cover_type.txt.
test_covertype.txt has 20 attributes and 7 classes, so the last 7 columns are class indicators. I used class 2 (i.e., column 22) as the positive class and all other classes as the negative class, which means I am dealing with a binary classification problem.
The same is true for train_covertype.txt.
But what is Bagging?
Bagging is a "bootstrap" ensemble method that creates the individuals for its ensemble by training each classifier on a random redistribution of the training set. Each classifier's training set is generated by randomly drawing, with replacement, N examples, where N is the size of the original training set.
But how do you actually do it?
METHODOLOGY
1) Divide the data set into train_covertype.txt and test_covertype.txt.
2) Take 20% of train_covertype.txt and train an SVC using one of the kernels; use this same kernel for all later experiments.
3) Test the model on test_covertype.txt to get the predicted labels (predictions).
4) Record these predicted labels as the first column of the PreLabel matrix.
5) Repeat steps 2, 3, and 4 a few times, say 3 or 5.
6) Record the resulting predicted labels as the second, third, and fourth columns, respectively.
7) Remember that the 20% samples of train_covertype.txt are drawn with replacement (a MATLAB sketch of this loop is given below).
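A sketch of the bagging loop just described. The variable names, and the assumption that features sit in X_train/X_test with 0/1 labels in y_train/y_test, are mine; the kernel settings are placeholders, not the ones actually used:

frac = 0.20;                              % fraction of the training set per bag
bags = 5;                                 % number of bagged classifiers
N    = size(X_train, 1);
k    = round(frac * N);
PreLabel = zeros(size(X_test, 1), bags);

for b = 1:bags
    idx = randi(N, k, 1);                 % draw k indices WITH replacement
    model = svmtrain(y_train(idx), X_train(idx, :), '-s 0 -t 2 -c 100 -g 0.0001');
    PreLabel(:, b) = svmpredict(y_test, X_test, model);   % one column per bag
end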
So now we get a nice PreLabel matrix (one column per classifier):

1 2 3 4 5
1 1 0 0 1
0 1 1 0 0
1 0 1 1 1
1 1 0 1 1
0 0 1 0 0
1 1 0 0 0
1 0 1 1 1
0 1 1 1 1
1 1 0 0 0
... and so on
Once I have the PreLabel matrix, I get the final predictions by majority vote over each row:

1 2 3 4 5    Prediction
1 1 0 0 1    1
0 1 1 0 0    0
1 0 1 1 1    1
1 1 0 1 1    1
0 0 1 0 0    0
1 1 0 0 0    0
1 0 1 1 1    1
0 1 1 1 1    1
1 1 0 0 0    0
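In MATLAB this majority vote is a one-liner (mode breaks a 0/1 tie toward 0; with an odd number of bags no ties occur):

Predictions = mode(PreLabel, 2);   % most frequent label in each row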
Now, with these predictions and the true labels from test_covertype.txt, I calculate the accuracy using the accuracy.m file provided in class.
Similarly, I repeat the above steps in exactly the same way with 30%, 40%, 50%, ... of train_covertype.txt and calculate the accuracy at each size.
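The exact interface of the class-provided accuracy.m isn't reproduced here; as a sketch, the quantity it reports is equivalent to:

acc = mean(Predictions == y_test);   % fraction of test labels predicted correctly

where y_test denotes the true labels from test_covertype.txt (the variable name is mine).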
RESULTS
% of Train Data Set   Accuracy
0.01%                 0.5207 ± 0.0263
0.05%                 0.5977 ± 0.0122
0.1%                  0.6154 ± 0.0232
0.25%                 0.6659 ± 0.0075
0.5%                  0.7089 ± 0.0084
0.75%                 0.7156 ± 0.0027
1%                    0.7297 ± 0.0066
2.5%                  0.7457 ± 0.0047
5%                    0.7511 ± 0.0043
RESULTS cont.
% of Train Data Set   Accuracy
7.5%                  0.7529 ± 0.0016
10%                   0.7530 ± 0.0029
20%                   0.7605 ± 0.0024
30%                   0.7619 ± 0.0026
40%                   0.7630 ± 0.0036
50%                   takes too much time
RESULTS cont.
We plot the accuracy against the percentage of train_covertype.txt used.
[Figure: accuracy (y-axis) vs. % of train_covertype.txt (x-axis)]
QUESTIONS?????
Goodbye