Probabilistic Classification using Fuzzy Support Vector Machines (PFSVM)
Marzieh Parandehgheibi, ORC - MIT
INFORMS DM-HI, 11/12/2011
Content
• Motivation
• Problem
• Methodology
• Simulation Results
• Conclusion
Motivation
• Is cancer misdiagnosis more common than you thought? It is estimated that nearly 12 percent of all cancer diagnoses may be in error.
• When a positive cancer diagnosis is missed, the consequences can be deadly. For example, a woman whose breast cancer is diagnosed in its early stages will survive at least 5 years longer.
• Being misdiagnosed with cancer can be devastating: patients who are misdiagnosed are often subjected to unnecessary, harmful, painful, and expensive treatments.
• A diagnosis can be confirmed via methods such as seeking second opinions, consulting specialists, getting further medical tests, and researching the medical condition.
When can we trust a diagnosis? When do we need to have additional tests?
Problem/Solution
• What we do: given the data, is the cancer benign or malignant?
• What we need to do: is the given data sufficient to decide on the type of cancer?
  – YES: determine the type of cancer.
  – NO: do more tests.
• Find the criteria under which most of the errors occur.
• Find the probability of error (Pe).
• If Pe > α, wait for more tests.
PFSVM Methodology
Probabilistic Fuzzy Support Vector Machine (PFSVM) is a two-phase classification method that probabilistically assigns uncertain points to each of the classes.
1. Apply FSVM to the whole training data so that most of the uncertain points are placed in the margin, while the certain points are assigned to the appropriate classes.
2. Define a fuzzy membership function and an appropriate rule to classify the points located in the margin.
This results in assigning each uncertain point to each of the classes with a specific probability.
SVM
Suppose training data: N pairs (X1, Y1), …, (XN, YN), where Yi ∈ {−1, 1}.
Separable data: a separating hyperplane {X : f(X) = Xᵀβ + β0 = 0} separates the data (Xᵀβ + β0 > 0 on one side, Xᵀβ + β0 < 0 on the other).
Classification rule: g(X) = sign(Xᵀβ + β0)
Maximum-margin problem:
  max M over β, β0 with ‖β‖ = 1,  s.t.  yi (xiᵀβ + β0) ≥ M,  i = 1, …, N
SVM (non-separable data)
SVM maximizes the margin M between the training points for classes 1 and −1, but allows some points to be on the wrong side of the margin via slack variables ξi:
  min ‖β‖²/2 + C Σi ξi
  s.t.  yi (xiᵀβ + β0) ≥ 1 − ξi,  ξi ≥ 0,  i = 1, …, N
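In practice this soft-margin optimization is handed to a library solver. A minimal sketch, assuming scikit-learn and a toy two-class data set (not the presentation's data):

```python
# Minimal soft-margin linear SVM sketch (assumes scikit-learn; toy data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0)   # C penalizes the slack sum over all points
clf.fit(X, y)

beta, beta0 = clf.coef_[0], clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(beta)   # geometric width of the margin
pred = np.sign(X @ beta + beta0)            # classification rule g(x)
print("margin width:", margin_width)
print("training accuracy:", (pred == y).mean())
```

Smaller C permits more slack and a wider margin, which is the lever FSVM later adjusts per point.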
FSVM
• In many real-world applications, the training points differ in importance: some training points are more important than others.
• A training point need not belong exactly to one of the two classes; it may belong 90% to one class and 10% to the other.
• A fuzzy membership 0 < si ≤ 1 is associated with each training point Xi.
• In the objective function, replace Σi ξi with Σi si ξi.
FSVM
• Suppose that, out of N training points, N1 points are in class 1 and the remaining N2 points are in class 2. Define the weight of each point as:
  W(xi) = exp( − Σ_{j=1..P} (xij − μjk)² / (2 σjk²) ),  xi ∈ Class k
where μjk and σjk are the mean and standard deviation of the jth feature over all points in class k, respectively, and xij is the jth feature value of the ith point.
• Normalize the weights so that they sum to N, which is the total of the error costs in the classic SVM:
  Wn(xi) = N · W(xi) / Σ_{i=1..N} W(xi)
• The weights show up in the objective function:
  min ‖β‖²/2 + C Σ_{i=1..N} Wn(xi) ξi
Points near the center of each class have a higher weight than those farther away. As a result, near points are classified with certainty, while the points lying between the two classes, called uncertain points, fall inside the margin.
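The weighting above can be sketched as follows (NumPy assumed; function and variable names are my own, and the exponential form follows this slide's weight definition):

```python
# Sketch of the FSVM class-wise Gaussian weighting (names are illustrative).
import numpy as np

def fsvm_weights(X, y):
    """W(x_i) = exp(-sum_j (x_ij - mu_jk)^2 / (2 sigma_jk^2)) for x_i in
    class k, then normalized so the weights sum to N (the classic SVM's
    total error cost)."""
    W = np.empty(len(X))
    for k in np.unique(y):
        idx = (y == k)
        mu = X[idx].mean(axis=0)          # mu_jk: per-feature mean of class k
        sd = X[idx].std(axis=0) + 1e-12   # sigma_jk (guard against zero std)
        W[idx] = np.exp(-(((X[idx] - mu) ** 2) / (2 * sd ** 2)).sum(axis=1))
    return len(X) * W / W.sum()           # normalize: weights sum to N

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, (30, 2)), rng.normal(1.0, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
w = fsvm_weights(X, y)
print(round(w.sum(), 6))  # normalized to N = 60
```

The resulting weights can then act as per-sample error costs in a weighted SVM, e.g. `SVC(kernel="linear").fit(X, y, sample_weight=w)` in scikit-learn.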
Fuzzy Classification
• Apply a fuzzy classification to the marginal points.
• Define a Gaussian fuzzy membership function Aik for every test point Yi located in the margin:
  Aik = exp( − Σ_{j=1..P} (yij − μ'jk)² / (2 σ'jk²) ),  k = 1, 2
where μ'jk and σ'jk are the mean and standard deviation of the training points of class k located in the margin, respectively.
• This membership measures the closeness of point Yi to the center of the kth class. To measure the relative closeness of a point to both centers, a "membership probability" is defined for each marginal point:
  Pi,C1 = Ai,C1 / (Ai,C1 + Ai,C2),  Pi,C2 = 1 − Pi,C1
Points whose membership probability in a class exceeds 90% are assigned to that class. Otherwise, the given information is not sufficient to make a decision.
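Phase 2 can be sketched as below (NumPy assumed; μ'jk and σ'jk would come from the marginal training points but are filled with illustrative values here, and all names are my own):

```python
# Sketch of the fuzzy membership probability and the 90% decision rule.
import numpy as np

def membership_prob_c1(Y, mu, sigma):
    """A_ik = exp(-sum_j (y_ij - mu'_jk)^2 / (2 sigma'_jk^2)); mu and sigma
    hold the per-class stats of the marginal training points, shape (2, P).
    Returns P_{i,C1}; P_{i,C2} = 1 - P_{i,C1}."""
    A = np.exp(-(((Y[:, None, :] - mu[None]) ** 2)
                 / (2.0 * sigma[None] ** 2)).sum(axis=2))   # shape (n, 2)
    return A[:, 0] / (A[:, 0] + A[:, 1])

def decide(p1, alpha=0.9):
    """Assign class 1 or 2 only above the alpha threshold; 0 = undecided."""
    return np.where(p1 > alpha, 1, np.where(1.0 - p1 > alpha, 2, 0))

mu = np.array([[-1.0, -1.0], [1.0, 1.0]])   # illustrative marginal class means
sigma = np.ones((2, 2))                     # illustrative marginal class stds
Y = np.array([[-1.2, -0.9], [0.0, 0.1], [1.1, 0.8]])
p1 = membership_prob_c1(Y, mu, sigma)
print(decide(p1))  # -> [1 0 2]: clear class 1, undecided, clear class 2
```

The middle point sits between both centers, so neither probability clears 90% and it is flagged as needing more tests.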
DATA SET
• Wisconsin breast cancer diagnostic dataset.
• 569 instances in two classes, Malignant (M) and Benign (B), with 32 features per instance.
• Reduce the number of features from 32 to 23 by keeping just one feature out of every set of features with correlation above 0.95.
• Determine the training and test sets using 10-fold cross-validation.
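A sketch of the correlation-based pruning, assuming scikit-learn's copy of the data (which exposes the 30 numeric features rather than the raw file's 32 columns, so the resulting count differs from the slide's 23; the greedy rule is one reasonable reading of "keep one feature per correlated set"):

```python
# Correlation-based feature pruning on WDBC (sketch, not the authors' code).
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)    # 569 instances, 30 features
corr = np.abs(np.corrcoef(X, rowvar=False))   # 30 x 30 feature correlations

keep = []
for j in range(X.shape[1]):
    # greedily keep feature j unless it correlates > 0.95 with a kept one
    if all(corr[j, k] <= 0.95 for k in keep):
        keep.append(j)

X_reduced = X[:, keep]
print(X.shape, "->", X_reduced.shape)
```

Radius, perimeter, and area features are near-perfectly correlated in this data, so several columns are dropped.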
Widen the Margin by FSVM
[Figure: training data with the separating hyperplanes. SVM width of margin: 0.895; FSVM width of margin: 1.931.]
Error Location in FSVM Methods
On average, more than 80% of errors are inside the margin
Comparison of different classification methods

Method \ Run    1    2    3    4    5    6    7    8    9   10   Avg err (%)
SVM err         1    1    5    3    4    1    2    4    0    1      3.86
FSVM err        4    4    5    7    7    3    2    1    3    4      7.02
Fuzzy err       3    3    5    8    4    6    5    3    3    1      7.19
PFSVM err      1+1   1    0   0+2  3+1  0+2   0    1   2+1  0+2     1.63
PFSVM undet     1    1    1    2    0    1    2    1    0    0      1.58
Double-Cost PFSVM
1) Misdiagnosis of positive cancer is deadly.
2) Most of the errors happen in positive-cancer diagnoses.
Therefore: double the cost of error for a positive cancer diagnosis.
On average, more than 98% of errors fall inside the margin.
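One way to realize the doubled error cost, assuming scikit-learn (whose `class_weight` scales the per-class slack penalty; in sklearn's copy of the data, label 0 is malignant):

```python
# Doubling the error cost of the malignant class (sketch; scikit-learn's
# class_weight stands in for the doubled slack penalty on positive cancer).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)    # label 0 = malignant in sklearn

base = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
double = make_pipeline(
    StandardScaler(),
    SVC(kernel="linear", C=1.0, class_weight={0: 2.0, 1: 1.0}))

# 10-fold cross-validation, as on the data-set slide
acc_base = cross_val_score(base, X, y, cv=10).mean()
acc_double = cross_val_score(double, X, y, cv=10).mean()
print("plain SVM accuracy:", round(acc_base, 3))
print("double-cost SVM accuracy:", round(acc_double, 3))
```

The doubled weight shifts the boundary so that missing a malignant case costs twice as much as the opposite error.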
Comparison of different classification methods (double cost)

Method \ Run    1    2    3    4    5    6    7    8    9   10   Avg err (%)
SVM err         4    2    4    2    1    2    3    1    2    3      4.29
FSVM err        6    2    3    2    1    2    4    1    4    5      5.36
Fuzzy err       7    2    5    4    1    3    6    3    4    4      6.96
PFSVM err       0    2    1    0    0    2    0    0   0+1   1      1.23
PFSVM undet     1    0    3    1    0    1    1    1    3    1      2.14
QUESTIONS?