ICP3083 L.I. Kuncheva
Lecture 8: Classifiers
Linear discriminant classifier
Rule-based classifiers
Classifier Models
1. Nearest mean classifier
2. Linear discriminant classifier (LDC)
3. Rule-based classifiers
4. k-Nearest Neighbour Classifier (k-nn)
5. Decision tree classifier
6. Support Vector Machine classifier (SVM)
7. Classifier Ensembles
Linear discriminant classifier

The name shows the type of discriminant functions: they are linear in the features,

g_1(x) = a_{10} + a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n
...
g_c(x) = a_{c0} + a_{c1} x_1 + a_{c2} x_2 + ... + a_{cn} x_n

The coefficients a_{ij} can take any value: positive, negative or zero.
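A minimal sketch of how such discriminant functions can be evaluated (the function name, the matrix layout and the example coefficients are my own illustration, not from the slides):

```python
import numpy as np

def ldc_predict(A, x):
    """Evaluate g_i(x) = a_i0 + a_i1*x_1 + ... + a_in*x_n for every class i
    and return the index of the largest discriminant function."""
    g = A @ np.concatenate(([1.0], x))   # prepend 1 so the bias a_i0 is absorbed
    return int(np.argmax(g))

# Two classes in a 2-d feature space: row i holds [a_i0, a_i1, a_i2]
A = np.array([[5.0, -2.0,  3.0],
              [1.0,  4.0, -1.0]])
print(ldc_predict(A, np.array([0.5, 0.5])))   # 0, i.e. class 1 wins here
```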
An example: two classes, 1-d feature space
x ∈ ℝ, c = 2

Discriminant functions:

g_1(x) = x + 2
g_2(x) = 3 − 2x

[Figure: plot of g_1(x) and g_2(x) against x, with the classification regions marked on the x-axis]

Q1. Find the threshold point that determines the classification regions.

Setting g_1(x) = g_2(x):
x + 2 = 3 − 2x
3x = 1
x = 1/3

So class 2 is assigned for x < 1/3 and class 1 for x > 1/3.
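A quick symbolic check of the threshold (my own sketch using sympy; the slide solves it by hand):

```python
import sympy as sp

x = sp.symbols('x')
# The two discriminant functions from the example
print(sp.solve(sp.Eq(x + 2, 3 - 2*x), x))   # [1/3]
```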
An example: three classes, 1-d feature space

x ∈ ℝ, c = 3

Discriminant functions:

g_1(x) = x + 2
g_2(x) = 3 − 2x
g_3(x) = 3

[Figure: plot of g_1(x), g_2(x) and g_3(x) against x]

Q2. Draw a graph and find the classification regions.

Classification regions:
Class 1: from 1 to ∞
Class 2: from −∞ to 0
Class 3: from 0 to 1
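A small numerical check of these regions (my own sketch): take the argmax of the three discriminant functions on a grid of x values.

```python
import numpy as np

# The three discriminant functions from the example
g = [lambda x: x + 2,      # g_1
     lambda x: 3 - 2*x,    # g_2
     lambda x: 3 + 0*x]    # g_3 (constant)

xs = np.linspace(-2.0, 3.0, 11)
regions = np.argmax([f(xs) for f in g], axis=0) + 1   # class labels 1..3
for x, cls in zip(xs, regions):
    print(f"x = {x:5.2f}  ->  class {cls}")
```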
An example: two classes, 2-d feature space

x ∈ ℝ², c = 2

Each discriminant function is a plane, e.g.

g_1(x) = 5 − 2x_1 + 3x_2
g_2(x) = 1 + 4x_1 − x_2

[Figure: the two discriminant planes over the (x_1, x_2) plane]
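Setting the two planes equal gives the straight-line boundary between the two classification regions; worked through with the coefficients above:

```latex
g_1(\mathbf{x}) = g_2(\mathbf{x})
\;\Rightarrow\; 5 - 2x_1 + 3x_2 = 1 + 4x_1 - x_2
\;\Rightarrow\; 4x_2 = 6x_1 - 4
\;\Rightarrow\; x_2 = \tfrac{3}{2}\,x_1 - 1
```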
Linear discriminant classifier (LDC)
• How do we get the discriminant functions?
We train the classifier so that the separability between the classes
is maximised. To train an LDC means to find all the coefficients a_{ij} in the discriminant functions.
• What do the classification regions of LDC look like?
For a 1-d feature space these are intervals on the x-axis, one interval per class.
For a 2-d feature space the classification regions are separated by straight lines. In a 2-class problem, the discriminant functions define a single straight line (the classification boundary) that separates the two classification regions.
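As an illustration of training, scikit-learn's LinearDiscriminantAnalysis fits exactly such coefficients; a sketch on made-up data (the library choice and the data are my own, not from the slides):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy 2-d, 2-class data (made up for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

ldc = LinearDiscriminantAnalysis().fit(X, y)
print(ldc.coef_, ldc.intercept_)      # the learned a_ij and a_i0
print(ldc.predict([[1.5, 1.5]]))      # class label for a new point
```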
Rule-based classifiers

[Figure: scatter plot of the three classes (grey, green, blue) on the unit square]

if 0.25 < x_1 < 0.5 and x_2 > 0.2
then class = "green"
else class = "grey"

if x_1 > 0.55 and x_2 > 0.35
then class = "blue"
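The same rules written as a small Python function (a sketch; the inequality directions follow the reconstruction above):

```python
def classify(x1, x2):
    """Classify a point with the two hand-written rules; grey is the default."""
    if 0.25 < x1 < 0.5 and x2 > 0.2:   # Rule 1
        return "green"
    if x1 > 0.55 and x2 > 0.35:        # Rule 2
        return "blue"
    return "grey"                      # else branch of Rule 1

print(classify(0.3, 0.6))   # green
print(classify(0.8, 0.5))   # blue
```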
Rule-based classifiers

[Figure: the same scatter plot of the three classes]

Class counts (priors) in the data set of 2000 points:
Grey:  831 (42%)
Green: 480 (24%)
Blue:  689 (34%)
Q1. What would be the error rate of the Largest Prior (Majority) classifier?
The Majority classifier will label everything as Grey: error rate = 100% − 42% = 58%.
Zero-R classifier (0 rules) = Largest Prior classifier
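A Zero-R classifier takes only a couple of lines; a sketch using the class counts from above:

```python
from collections import Counter

def zero_r(train_labels):
    """Zero-R: always predict the most frequent (largest prior) class."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda x: majority   # the prediction ignores the input entirely

predict = zero_r(["grey"] * 831 + ["green"] * 480 + ["blue"] * 689)
print(predict(None))   # grey
```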
Rule-based classifiers

[Figure: the same scatter plot; Grey 831 (42%), Green 480 (24%), Blue 689 (34%)]

One-R classifier (1 rule)

Check each feature separately and calculate the accuracy at each split. Keep the ONE split with the highest accuracy.
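A sketch of the One-R search over one numeric feature (my own code; the slide describes the procedure only in words):

```python
import numpy as np

def one_r_split(x, y):
    """One-R on a single numeric feature: try every observed value as a
    threshold, label each side with its majority class, and keep the split
    with the highest training (resubstitution) accuracy."""
    best_acc, best_t = 0.0, None
    for t in np.unique(x):
        correct = 0
        for side in (x <= t, x > t):
            if side.any():
                _, counts = np.unique(y[side], return_counts=True)
                correct += counts.max()   # majority class count on this side
        acc = correct / len(y)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t
```

Repeating this for every feature and keeping the single best split gives the One-R rule.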
Rule-based classifiers

Move the split across the whole span of the feature.

[Figure: scatter plot with the best split shown as a vertical line; best accuracy 0.6425]

Label the region to the left of the split as Grey:  Grey 650, Green 444, Blue 67
Label the region to the right of the split as Blue: Grey 147, Green 57, Blue 635
Rule-based classifiers

[Figure: the same scatter plot and split]

The split classifies (650 + 635)/2000 = 1285/2000 = 64.25% of the training points correctly, so the resubstitution error rate is 100% − 64.25% = 35.75%.

Even though the error is estimated on the TRAINING data only (resubstitution), the classifier is very robust, i.e., its generalisation is good!
Rule-based classifiers

[Figure: the same scatter plot and split]

Posterior probabilities for ANY x in the respective region:

Left (Grey) region, 1161 points:  Grey 650/1161 = 0.56, Green 444/1161 = 0.38, Blue 67/1161 = 0.06
Right (Blue) region, 839 points:  Grey 147/839 = 0.18, Green 57/839 = 0.07, Blue 635/839 = 0.75
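These posteriors are just relative frequencies within each region; a small sketch (my own code):

```python
def region_posteriors(counts):
    """Estimate P(class | region) as relative frequencies in the region."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

print(region_posteriors({"grey": 147, "green": 57, "blue": 635}))
# {'grey': 0.175..., 'green': 0.067..., 'blue': 0.756...}
```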