ICP3083 L.I. Kuncheva
Lecture 8: Classifiers
Linear discriminant classifier
Rule-based classifiers
Classifier Models
1. Nearest mean classifier
2. Linear discriminant classifier (LDC)
3. Rule-based classifiers
4. k-Nearest Neighbour Classifier (k-nn)
5. Decision tree classifier
6. Support Vector Machine classifier (SVM)
7. Classifier Ensembles
Linear discriminant classifier

The name shows the type of discriminant functions: they are linear in the features,

g_1(x) = a_{10} + a_{11} x_1 + a_{12} x_2 + ... + a_{1n} x_n
...
g_c(x) = a_{c0} + a_{c1} x_1 + a_{c2} x_2 + ... + a_{cn} x_n

The coefficients a_{ij} can take any value: positive, negative or zero.
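A minimal sketch of how such discriminant functions can be evaluated (the function name, the matrix layout and the example coefficients are my own illustration, not from the slides):

```python
import numpy as np

def ldc_predict(A, x):
    """Evaluate g_i(x) = a_i0 + a_i1*x_1 + ... + a_in*x_n for every class i
    and return the index of the largest discriminant function."""
    g = A @ np.concatenate(([1.0], x))   # prepend 1 so the bias a_i0 is absorbed
    return int(np.argmax(g))

# Two classes in a 2-d feature space: row i holds [a_i0, a_i1, a_i2]
A = np.array([[5.0, -2.0,  3.0],
              [1.0,  4.0, -1.0]])
print(ldc_predict(A, np.array([0.5, 0.5])))   # 0, i.e. class 1 wins here
```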
An example: two classes, 1-d feature space
x ∈ ℝ, c = 2

Discriminant functions:

g_1(x) = x + 2
g_2(x) = 3 − 2x

[Figure: plot of g_1(x) and g_2(x) against x, with the classification regions marked on the x-axis]

Q1. Find the threshold point that determines the classification regions.

Setting g_1(x) = g_2(x):
x + 2 = 3 − 2x
3x = 1
x = 1/3

So class 2 is assigned for x < 1/3 and class 1 for x > 1/3.
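A quick symbolic check of the threshold (my own sketch using sympy; the slide solves it by hand):

```python
import sympy as sp

x = sp.symbols('x')
# The two discriminant functions from the example
print(sp.solve(sp.Eq(x + 2, 3 - 2*x), x))   # [1/3]
```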
An example: three classes, 1-d feature space

x ∈ ℝ, c = 3

Discriminant functions:

g_1(x) = x + 2
g_2(x) = 3 − 2x
g_3(x) = 3

[Figure: plot of g_1(x), g_2(x) and g_3(x) against x]

Q2. Draw a graph and find the classification regions.

Classification regions:
Class 1: from 1 to ∞
Class 2: from −∞ to 0
Class 3: from 0 to 1
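A small numerical check of these regions (my own sketch): take the argmax of the three discriminant functions on a grid of x values.

```python
import numpy as np

# The three discriminant functions from the example
g = [lambda x: x + 2,      # g_1
     lambda x: 3 - 2*x,    # g_2
     lambda x: 3 + 0*x]    # g_3 (constant)

xs = np.linspace(-2.0, 3.0, 11)
regions = np.argmax([f(xs) for f in g], axis=0) + 1   # class labels 1..3
for x, cls in zip(xs, regions):
    print(f"x = {x:5.2f}  ->  class {cls}")
```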
An example: two classes, 2-d feature space

x ∈ ℝ², c = 2

Each discriminant function is a plane, e.g.

g_1(x) = 5 − 2x_1 + 3x_2
g_2(x) = 1 + 4x_1 − x_2

[Figure: the two discriminant planes over the (x_1, x_2) plane]
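Setting the two planes equal gives the straight-line boundary between the two classification regions; worked through with the coefficients above:

```latex
g_1(\mathbf{x}) = g_2(\mathbf{x})
\;\Rightarrow\; 5 - 2x_1 + 3x_2 = 1 + 4x_1 - x_2
\;\Rightarrow\; 4x_2 = 6x_1 - 4
\;\Rightarrow\; x_2 = \tfrac{3}{2}\,x_1 - 1
```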
Linear discriminant classifier (LDC)
• How do we get the discriminant functions?
We train the classifier so that the separability between the classes
is maximised. To train an LDC means to find all the coefficients a_{ij} in the discriminant functions.
• What do the classification regions of LDC look like?
For a 1-d feature space these are intervals on the x-axis, one interval per class.
For a 2-d feature space the classification regions are separated by straight lines. In a 2-class problem, the discriminant functions define a single straight line (the classification boundary) that separates the two classification regions.
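As an illustration of training, scikit-learn's LinearDiscriminantAnalysis fits exactly such coefficients; a sketch on made-up data (the library choice and the data are my own, not from the slides):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy 2-d, 2-class data (made up for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

ldc = LinearDiscriminantAnalysis().fit(X, y)
print(ldc.coef_, ldc.intercept_)      # the learned a_ij and a_i0
print(ldc.predict([[1.5, 1.5]]))      # class label for a new point
```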
Rule-based classifiers

[Figure: scatter plot of the three classes (grey, green, blue) on the unit square]

if 0.25 < x_1 < 0.5 and x_2 > 0.2
then class = "green"
else class = "grey"

if x_1 > 0.55 and x_2 > 0.35
then class = "blue"
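The same rules written as a small Python function (a sketch; the inequality directions follow the reconstruction above):

```python
def classify(x1, x2):
    """Classify a point with the two hand-written rules; grey is the default."""
    if 0.25 < x1 < 0.5 and x2 > 0.2:   # Rule 1
        return "green"
    if x1 > 0.55 and x2 > 0.35:        # Rule 2
        return "blue"
    return "grey"                      # else branch of Rule 1

print(classify(0.3, 0.6))   # green
print(classify(0.8, 0.5))   # blue
```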
Rule-based classifiers

[Figure: the same scatter plot of the three classes]

Class counts (priors) in the data set of 2000 points:
Grey:  831 (42%)
Green: 480 (24%)
Blue:  689 (34%)
Q1. What would be the error rate of the Largest Prior (Majority) classifier?
The Majority classifier will label everything as Grey: error rate = 100% − 42% = 58%.
Zero-R classifier (0 rules) = Largest Prior classifier
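A Zero-R classifier takes only a couple of lines; a sketch using the class counts from above:

```python
from collections import Counter

def zero_r(train_labels):
    """Zero-R: always predict the most frequent (largest prior) class."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return lambda x: majority   # the prediction ignores the input entirely

predict = zero_r(["grey"] * 831 + ["green"] * 480 + ["blue"] * 689)
print(predict(None))   # grey
```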
Rule-based classifiers

[Figure: the same scatter plot; Grey 831 (42%), Green 480 (24%), Blue 689 (34%)]

One-R classifier (1 rule)

Check each feature separately and calculate the accuracy at each split. Keep the ONE split with the highest accuracy.
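A sketch of the One-R search over one numeric feature (my own code; the slide describes the procedure only in words):

```python
import numpy as np

def one_r_split(x, y):
    """One-R on a single numeric feature: try every observed value as a
    threshold, label each side with its majority class, and keep the split
    with the highest training (resubstitution) accuracy."""
    best_acc, best_t = 0.0, None
    for t in np.unique(x):
        correct = 0
        for side in (x <= t, x > t):
            if side.any():
                _, counts = np.unique(y[side], return_counts=True)
                correct += counts.max()   # majority class count on this side
        acc = correct / len(y)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t
```

Repeating this for every feature and keeping the single best split gives the One-R rule.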
Rule-based classifiers

Move the split across the whole span of the feature.

[Figure: scatter plot with the best split shown as a vertical line; best accuracy 0.6425]

Label the region to the left of the split as Grey:  Grey 650, Green 444, Blue 67
Label the region to the right of the split as Blue: Grey 147, Green 57, Blue 635
Rule-based classifiers

[Figure: the same scatter plot and split]

The split classifies (650 + 635)/2000 = 1285/2000 = 64.25% of the training points correctly, so the resubstitution error rate is 100% − 64.25% = 35.75%.

Even though the error is estimated on the TRAINING data only (resubstitution), the classifier is very robust, i.e., its generalisation is good!
Rule-based classifiers

[Figure: the same scatter plot and split]

Posterior probabilities for ANY x in the respective region:

Left (Grey) region, 1161 points:  Grey 650/1161 = 0.56, Green 444/1161 = 0.38, Blue 67/1161 = 0.06
Right (Blue) region, 839 points:  Grey 147/839 = 0.18, Green 57/839 = 0.07, Blue 635/839 = 0.75
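These posteriors are just relative frequencies within each region; a small sketch (my own code):

```python
def region_posteriors(counts):
    """Estimate P(class | region) as relative frequencies in the region."""
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

print(region_posteriors({"grey": 147, "green": 57, "blue": 635}))
# {'grey': 0.175..., 'green': 0.067..., 'blue': 0.756...}
```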