Jan 12, 2016
PROJECT
• Compare SVM and standard models (neural networks, etc.)
• Bagging + SVM + data reduction for fast and efficient learning on large data
By: Nitin Chaudhary    Prof: Dr. Slobodan Vucetic
INTRODUCTION
SVM: The goal in training a Support Vector Machine is to find the separating hyperplane with the largest margin; we expect that the larger the margin, the better the generalization of the classifier.
[Figure: data plotted on axes y1 and y2, with the optimal hyperplane separating the two classes and the support vectors highlighted.]
The Support Vectors are equally close to the hyperplane
The Support Vectors are the training samples that define the optimal separating hyperplane and are the most difficult patterns to classify.
Informally speaking, they are the patterns most informative for the classification task.
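To make the margin statement precise (standard linear-SVM facts, added here for reference):

\[
f(x) = \operatorname{sign}(w^T x + b), \qquad
y_i (w^T x_i + b) = 1 \ \text{for the support vectors}, \qquad
\text{margin} = \frac{2}{\lVert w \rVert}
\]

so maximizing the margin is equivalent to minimizing \( \tfrac{1}{2} w^T w \), which is exactly the first term of the primal problems on the next slide.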
FORMULATIONS
1) C-Support Vector Classification:

   \[
   \min_{w,b,\xi} \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
   \quad \text{subject to} \quad y_i \bigl( w^T \phi(x_i) + b \bigr) \ge 1 - \xi_i, \;\; \xi_i \ge 0
   \]

2) nu-Support Vector Classification:

   \[
   \min_{w,b,\xi,\rho} \;\; \frac{1}{2} w^T w - \nu\rho + \frac{1}{l} \sum_{i=1}^{l} \xi_i
   \quad \text{subject to} \quad y_i \bigl( w^T \phi(x_i) + b \bigr) \ge \rho - \xi_i, \;\; \xi_i \ge 0, \;\; \rho \ge 0
   \]

Above are the primal problems for both classifications; here l is the number of training examples and \(\phi\) is the kernel feature map.
Remember the form of the polynomial kernel:
(Gamma*<X(:,i),X(:,j)> + Coefficient)^Degree
and the form of the rbf or Gaussian kernel:
exp(-Gamma*|X(:,i)-X(:,j)|^2)
Wow! I remember these; they were taught in the CIS 595 class.
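As a quick sketch, both kernels can be written as MATLAB anonymous functions (illustrative only; u and v stand for the column vectors X(:,i) and X(:,j)):

poly_k = @(u, v, Gamma, Coefficient, Degree) (Gamma * (u' * v) + Coefficient)^Degree;  % polynomial kernel
rbf_k  = @(u, v, Gamma) exp(-Gamma * norm(u - v)^2);                                   % rbf/Gaussian kernel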
For the first part of the project I have used Pima.txt as my data set
Gamma: if the input value is zero, the function sets Gamma by default to 1/(max_pattern_dimension); if the input value is non-zero, Gamma is used unchanged.
C: cost of constraint violation.
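A minimal sketch of how these options map onto the LIBSVM MATLAB interface (assuming LIBSVM's svmtrain/svmpredict are compiled and on the path; the Pima.txt column layout and the parameter values below are assumptions taken from the tables that follow):

% Assumed layout: features in the leading columns, the class label in the last.
data = load('Pima.txt');
X = data(:, 1:end-1);
y = data(:, end);

% C-SVC (-s 0) with the rbf kernel (-t 2); C via -c, Gamma via -g.
model_rbf = svmtrain(y, X, '-s 0 -t 2 -c 100 -g 0.0001');

% nu-SVC (-s 1) with the polynomial kernel (-t 1);
% Gamma via -g, Coefficient via -r, Degree via -d, nu via -n (0.5 is LIBSVM's default).
model_poly = svmtrain(y, X, '-s 1 -t 1 -g 0.00001 -r 1 -d 5 -n 0.5');

% Prediction on a held-out set would then use svmpredict:
% [predicted, acc, ~] = svmpredict(y_test, X_test, model_rbf);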
METHODOLOGY
For nu-SVC and C-SVC using the rbf (Gaussian) kernel:

C        Gamma      ConfMat             Acc
1        1          [260  0; 123  0]    0.6789
100      0.0001     [216 27;  70 70]    0.7467
200      0.0001     [218 32;  66 67]    0.7441
1000     0.00001    [214 34;  60 75]    0.7546
0.00001  0.000001   [218 31;  49 85]    0.7911
For nu-SVC and C-SVC using the polynomial kernel:

Gamma    C    Coeff  Degree  ConfMat             Acc
0.0001   100  2      3       [225 31; 54 73]     0.7781
0.00001  10   1      5       [232 22; 55 74]     0.7990
For neural networks:

Hidden Neurons  # Training Iterations  Show  Max_fail  ConfMat             Accu
5               100                    10    5         [120 28; 32 49]     0.7380
8               100                    10    5         [134 16; 32 47]     0.7904
10              100                    10    50        [136 17; 25 51]     0.8166
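For reference, a minimal sketch of how the settings above map onto MATLAB's Neural Network Toolbox (patternnet as the constructor is my assumption about the exact toolbox function used; the variable names are illustrative):

net = patternnet(10);                    % 10 hidden neurons
net.trainParam.epochs   = 100;           % number of training iterations
net.trainParam.show     = 10;            % progress display interval
net.trainParam.max_fail = 50;            % validation-failure (early stopping) limit

T_train = full(ind2vec(y_train' + 1));   % 0/1 labels -> one-hot targets (2 x N)
net = train(net, X_train', T_train);     % the toolbox expects samples as columns
y_pred = vec2ind(net(X_test')) - 1;      % winning output unit -> 0/1 labels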
RESULTS
1) Maximum accuracy in the case of nu-SVC using the rbf (Gaussian) kernel is 79.11%, at Gamma = 0.000001 and C = 0.00001.
2) Maximum accuracy in the case of C-SVC using the rbf (Gaussian) kernel is 80.94%, at Gamma = 0.000001 and C = 100000.
3) Maximum accuracy in the case of nu-SVC using the polynomial kernel is 79.90%, at Gamma = 0.00001, C = 10, Coeff = 1, and Degree = 5.
RESULTS cont.
4) Maximum accuracy in the case of C-SVC using the polynomial kernel is 80.68%, at Gamma = 0.00001, C = 100, Coeff = 10, and Degree = 3.
5) Maximum accuracy in the case of neural networks is 81.66%, at 10 hidden neurons, 100 training iterations, show = 10, and max_fail = 50.
Bagging + SVM + Data Reduction for fast and efficient learning on large data
Goal: to perform bagging with SVMs on very large data sets.
The data set I used here is cover_type.txt.
test_covertype.txt has 20 attributes and 7 classes, so the last 7 columns are class indicators. I used class 2 (i.e., column 22) as the positive class and all other classes as the negative class, which means I am dealing with a binary classification problem.
The same is true for train_covertype.txt.
But what is Bagging?
Bagging is a "bootstrap" ensemble method that creates the individuals for its ensemble by training each classifier on a random redistribution of the training set. Each classifier's training set is generated by randomly drawing, with replacement, N examples, where N is the size of the original training set.
But how do you actually do it?
METHODOLOGY
1) Divide the data set into train_covertype.txt and test_covertype.txt.
2) Take 20% of train_covertype.txt and train an SVC using one of the kernels; use this same kernel for all later experiments.
3) Test the model on test_covertype.txt to get the predicted labels (predictions).
4) Record these predicted labels as the first column of the PreLabel matrix.
5) Repeat steps 2, 3, and 4 a few times, say 3 or 5.
6) Record the resulting predicted labels as the second, third, and fourth columns, respectively.
7) Remember that the 20% samples of train_covertype.txt are drawn with replacement (a MATLAB sketch of this loop is given below).
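A sketch of the bagging loop just described. The variable names, and the assumption that features sit in X_train/X_test with 0/1 labels in y_train/y_test, are mine; the kernel settings are placeholders, not the ones actually used:

frac = 0.20;                              % fraction of the training set per bag
bags = 5;                                 % number of bagged classifiers
N    = size(X_train, 1);
k    = round(frac * N);
PreLabel = zeros(size(X_test, 1), bags);

for b = 1:bags
    idx = randi(N, k, 1);                 % draw k indices WITH replacement
    model = svmtrain(y_train(idx), X_train(idx, :), '-s 0 -t 2 -c 100 -g 0.0001');
    PreLabel(:, b) = svmpredict(y_test, X_test, model);   % one column per bag
end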
So now we get a nice PreLabel matrix (one column per classifier):

1 2 3 4 5
1 1 0 0 1
0 1 1 0 0
1 0 1 1 1
1 1 0 1 1
0 0 1 0 0
1 1 0 0 0
1 0 1 1 1
0 1 1 1 1
1 1 0 0 0
... and so on
Once I have the PreLabel matrix, I get the final predictions by majority vote over each row:

1 2 3 4 5    Prediction
1 1 0 0 1    1
0 1 1 0 0    0
1 0 1 1 1    1
1 1 0 1 1    1
0 0 1 0 0    0
1 1 0 0 0    0
1 0 1 1 1    1
0 1 1 1 1    1
1 1 0 0 0    0
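In MATLAB this majority vote is a one-liner (mode breaks a 0/1 tie toward 0; with an odd number of bags no ties occur):

Predictions = mode(PreLabel, 2);   % most frequent label in each row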
Now, with these predictions and the true labels from test_covertype.txt, I calculate the accuracy using the accuracy.m file provided in class.
Similarly, I repeat the above steps in exactly the same way with 30%, 40%, 50%, ... of train_covertype.txt and calculate the accuracy at each size.
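The exact interface of the class-provided accuracy.m isn't reproduced here; as a sketch, the quantity it reports is equivalent to:

acc = mean(Predictions == y_test);   % fraction of test labels predicted correctly

where y_test denotes the true labels from test_covertype.txt (the variable name is mine).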
RESULTS
% of Train Data Set   Accuracy
0.01%                 0.5207 ± 0.0263
0.05%                 0.5977 ± 0.0122
0.1%                  0.6154 ± 0.0232
0.25%                 0.6659 ± 0.0075
0.5%                  0.7089 ± 0.0084
0.75%                 0.7156 ± 0.0027
1%                    0.7297 ± 0.0066
2.5%                  0.7457 ± 0.0047
5%                    0.7511 ± 0.0043
RESULTS cont.
% of Train Data Set   Accuracy
7.5%                  0.7529 ± 0.0016
10%                   0.7530 ± 0.0029
20%                   0.7605 ± 0.0024
30%                   0.7619 ± 0.0026
40%                   0.7630 ± 0.0036
50%                   takes too much time
RESULTS cont.
We plot the accuracy against the percentage of train_covertype.txt used.
[Figure: accuracy (y-axis) vs. % of train_covertype.txt (x-axis)]
QUESTIONS?????
Goodbye