Lec6: K-NN Classifier, DT and Fuzzy Classification 1
K-NN Classifier,
Decision Tree and
Fuzzy Classification
Prof. Daniel Yeung
School of Computer Science and Engineering
South China University of Technology
Lecture 6
Pattern Recognition
Lec6: K-NN Classifier, DT and Fuzzy Classification 2
Outline
• K-Nearest Neighbor (K-NN) (4.5)
• Decision Tree (DT) (8.2 – 8.4)
• Fuzzy Classification (4.7)
Lec6: K-NN Classifier, DT and Fuzzy Classification 3
K-Nearest Neighbor (K-NN)
• A new pattern is classified by a majority vote of its neighbors, with the pattern being assigned to the class most common amongst its K nearest neighbors
Lec6: K-NN Classifier, DT and Fuzzy Classification 4
K-Nearest Neighbor (K-NN)
• Algorithm (a Python sketch follows below):
  – Given an unseen sample
  – Calculate the distance between the unseen sample and each training sample
  – Select the K nearest training samples
  – Take a majority vote among these K samples
[Figure: an unseen sample classified by its K = 1, K = 3, and K = 5 neighborhoods of Triangle and Circle training samples]
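A minimal sketch of this algorithm in Python (NumPy assumed; the function and variable names are illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classify sample x by a majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

# Tiny made-up example: y_train must be a NumPy array for the fancy indexing above
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array(["Circle", "Circle", "Triangle", "Triangle"])
print(knn_classify(np.array([5.5, 4.8]), X_train, y_train, k=3))  # -> Triangle
```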
Lec6: K-NN Classifier, DT and Fuzzy Classification 5
K-Nearest Neighbor (K-NN)
• The target function for the entire space may be described as a combination of less complex local approximations
Lec6: K-NN Classifier, DT and Fuzzy Classification 6
K-Nearest Neighbor (K-NN)
• Noisy Data
  – A simple linearly separable dataset
  – Obviously, the unseen sample in question should be identified as a Triangle
  – However, if there is a noisy sample next to this unseen sample, then a 1-NN classifier will wrongly classify it as a Circle
  – A larger K can solve this problem
[Figure: linearly separable Circle and Triangle classes, with one noisy Circle sample lying next to the unseen sample]
Lec6: K-NN Classifier, DT and Fuzzy Classification 7
K-Nearest Neighbor (K-NN)
• Is a larger K better?
  – Obviously, this unseen sample should be identified as a Circle, and a 5-NN classifier will classify it correctly
  – But a 19-NN classifier will classify it as a Triangle
[Figure: the unseen sample's 5 nearest neighbors are mostly Circles, while its 19 nearest neighbors are dominated by Triangles]
Lec6: K-NN Classifier, DT and Fuzzy Classification 8
K-NN: Characteristics
• Advantages:
  – Very simple
  – All computations deferred until classification
  – No training is needed
• Disadvantages:
  – Difficult to determine K (a common workaround is sketched below)
  – Affected by noisy training data
  – Classification is time consuming: the distance between the unseen sample and each training sample must be calculated
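Since K-NN has no training phase, one common remedy for the "difficult to determine K" problem (an assumed practice, not from the slides) is to try a few candidate values of K on a held-out validation set and keep the most accurate one. A sketch, reusing the knn_classify function above:

```python
import numpy as np

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    """Pick the K whose K-NN predictions score highest on a validation set."""
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        preds = [knn_classify(x, X_train, y_train, k) for x in X_val]
        acc = float(np.mean([p == y for p, y in zip(preds, y_val)]))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```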
Lec6: K-NN Classifier, DT and Fuzzy Classification 9
Decision Tree (DT)
• One of the most widely used and practical methods for inductive inference
• Approximates discrete-valued functions (including disjunctions)
Lec6: K-NN Classifier, DT and Fuzzy Classification 10
DT: Example
• Do we go to play tennis today? (written as code below)
  – If Outlook is Sunny AND Humidity is Normal → Yes
  – If Outlook is Overcast → Yes
  – If Outlook is Rain AND Wind is Weak → Yes
  – Any other situation → No
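The same rules can be written directly as a small Python function (a sketch; the attribute values follow the slide):

```python
def play_tennis(outlook, humidity, wind):
    """Decision rules from the slide. Returns 'Yes' or 'No'."""
    if outlook == "Sunny" and humidity == "Normal":
        return "Yes"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain" and wind == "Weak":
        return "Yes"
    return "No"  # any other situation

print(play_tennis("Sunny", "High", "Weak"))  # -> No
```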
Lec6: K-NN Classifier, DT and Fuzzy Classification 11
DT: Example
• Internal node: corresponds to a test
• Branch: corresponds to a result of the test
• Leaf node: assigns a classification result
Lec6: K-NN Classifier, DT and Fuzzy Classification 12
DT: Classification
• Internal nodes can be univariate: only one feature is used in each test
• Decision Region:
[Figure: the (x1, x2) plane partitioned at x1 = a and x2 = b, matching the tree: test x1 > a (No / Yes), then x2 > b (No / Yes)]
Lec6: K-NN Classifier, DT and Fuzzy Classification 13
DT: Classification
• Internal nodes can be multivariate: more than one feature is used in each test (a tiny sketch of both kinds of test follows below)
• The shape of the Decision Region is irregular
[Figure: the (x1, x2) plane split by the test a·x1 + b·x2 + c > 0 (No / Yes branches)]
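As a tiny illustration of the difference, a univariate test reads a single feature while a multivariate test combines several (the names and coefficients here are illustrative):

```python
def univariate_test(x, a):
    # uses one feature only: x1 > a
    return x[0] > a

def multivariate_test(x, a, b, c):
    # linear combination of two features: a*x1 + b*x2 + c > 0
    return a * x[0] + b * x[1] + c > 0
```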
Lec6: K-NN Classifier, DT and Fuzzy Classification 14
DT: Learning Algorithm
• LOOP (a recursive sketch follows below):
  1. Select the best feature (A)
  2. For each value of A, create a new descendant of the node
  3. Sort the training samples to the leaf nodes
• STOP when the training samples are perfectly classified

Example data:

#   x1  x2  x3
1    1   3   5
2    1   4   2
3    3   1   5
4    3   5   6
5    3   3   4
6    4   5   7

Resulting tree:
x1 > 2?
  No  → STOP
  Yes → x2 > 2?
          No  → STOP
          Yes → STOP
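A hedged sketch of this loop as a recursive procedure (ID3-style; samples are assumed to be dicts mapping feature names to values, and choose_best is any function that picks the best feature, as discussed on the next slides):

```python
def build_tree(samples, labels, features, choose_best):
    """Grow a decision tree until the training samples are perfectly classified."""
    if len(set(labels)) == 1:                 # STOP: node is pure
        return labels[0]                      # leaf node
    best = choose_best(samples, labels, features)             # step 1
    node = {best: {}}
    for value in set(s[best] for s in samples):               # step 2: a branch per value
        sub = [(s, l) for s, l in zip(samples, labels) if s[best] == value]
        sub_samples = [s for s, _ in sub]                     # step 3: sort samples
        sub_labels = [l for _, l in sub]                      #          into branches
        rest = [f for f in features if f != best]
        node[best][value] = build_tree(sub_samples, sub_labels, rest, choose_best)
    return node
```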
Lec6: K-NN Classifier, DT and Fuzzy Classification 15
DT: Learning Algorithm
• Observation
  – For a given training set, many trees may code it without any error
  – Finding the smallest tree is an NP-hard problem
• Use a local search algorithm to find reasonable solutions
  – What is the best feature?
Lec6: K-NN Classifier, DT and Fuzzy Classification 16
DT: Feature Measurement
• Entropy can be used as a feature measurement (a small sketch follows below)
  – A measure of uncertainty
  – Range: 0 – 1 (for a two-class problem)
  – The smaller the value, the less the uncertainty

H(X) = − Σ_{i=1}^{n} p(x_i) log2 p(x_i)

where X is a random variable with n outcomes {x_i : i = 1, …, n} and p(x) is the probability mass function of outcome x.

For example, if X ∈ class x1, then p(x1) = 1 and p(x_i) = 0 for all i ≠ 1. Thus H(X) = 0, the smallest value possible (no uncertainty).
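A small sketch of this measure in Python (the function name is illustrative):

```python
import math

def entropy(probs):
    """H(X) = -sum_i p(x_i) * log2 p(x_i), taking 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1.0, 0.0]))  # 0.0 -> no uncertainty (one certain class)
print(entropy([0.5, 0.5]))  # 1.0 -> maximum uncertainty for two classes
```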
Lec6: K-NN Classifier, DT and Fuzzy Classification 17
DT: Feature Measurement
• Information Gain
  – Reduction in entropy (reduced uncertainty) due to sorting on a feature A

Gain(X, A) = H(X) − H(X|A)

where H(X) is the current entropy and H(X|A) is the entropy after using feature A.
Lec6: K-NN Classifier, DT and Fuzzy Classification 18
DT: Example
Recall: H(X) = − Σ_{i=1}^{n} p(x_i) log2 p(x_i)

Which feature is the best? Currently, the outcomes are x1 = Yes (9 of 14 samples) and x2 = No (5 of 14 samples), so:

H(X) = − (9/14) log2(9/14) − (5/14) log2(5/14)
     = 0.410 + 0.531
     = 0.941

The uncertainty is high without any sorting by feature.
Lec6: K-NN Classifier, DT and Fuzzy Classification 19
DT: Example
Let A = Outlook:

Outlook | Sunny | Rain | Overcast
No      |   3   |  2   |    0
Yes     |   2   |  3   |    4

Recall: Gain(X, A) = H(X) − H(X|A), where

H(X|A) = H(X | A = Sunny) P(A = Sunny)
       + H(X | A = Rain) P(A = Rain)
       + H(X | A = Overcast) P(A = Overcast)
Lec6: K-NN Classifier, DT and Fuzzy Classification 20
DT: Example

Let A = Outlook (counts as above: Sunny 3 No / 2 Yes; Rain 2 No / 3 Yes; Overcast 0 No / 4 Yes):

H(X | Sunny)    = − (3/5) log2(3/5) − (2/5) log2(2/5) = 0.971
H(X | Rain)     = − (2/5) log2(2/5) − (3/5) log2(3/5) = 0.971
H(X | Overcast) = − (0/4) log2(0/4) − (4/4) log2(4/4) = 0    (taking 0 · log2 0 = 0)

H(X|A) = 0.971 × (5/14) + 0.971 × (5/14) + 0 × (4/14) = 0.694
Lec6: K-NN Classifier, DT and Fuzzy Classification 21
DT: Example
Current: H(X) = 0.941

H(X | Outlook)     = 0.694
Similarly, for each feature:
H(X | Temperature) = 0.911
H(X | Humidity)    = 0.789
H(X | Wind)        = 0.892

Recall: Gain(X, A) = H(X) − H(X|A). The Information Gain is:

Gain(X, Outlook)     = 0.247
Gain(X, Temperature) = 0.030
Gain(X, Humidity)    = 0.152
Gain(X, Wind)        = 0.049

Outlook is the best feature and should be used as the first (root) node. (These numbers can be checked with the sketch below.)
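These numbers can be verified with a short script, reusing the entropy function sketched on the feature-measurement slide:

```python
# Overall entropy: 9 Yes and 5 No out of 14 samples
H_X = entropy([9/14, 5/14])                       # ~0.941

# Conditional entropy given Outlook: Sunny (2 Yes, 3 No),
# Rain (3 Yes, 2 No), Overcast (4 Yes, 0 No)
H_X_outlook = (5/14) * entropy([2/5, 3/5]) \
            + (5/14) * entropy([3/5, 2/5]) \
            + (4/14) * entropy([4/4, 0/4])        # ~0.694

print(H_X - H_X_outlook)                          # Gain(X, Outlook) ~ 0.247
```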
Lec6: K-NN Classifier, DT and Fuzzy Classification 22
DT: Example
• Next Step
  – Repeat the steps for each sub-branch
  – Until there is no ambiguity (all samples are of the same class)

Outlook | Sunny | Rain | Overcast
No      |   3   |  2   |    0
Yes     |   2   |  3   |    4

The Overcast branch is done (all samples are Yes); the Sunny and Rain branches continue to select the next features.
Lec6: K-NN Classifier, DT and Fuzzy Classification 23
DT: Continuous-Valued Feature
• So far, we have handled features with nominal values
• How do we build a decision tree whose features have continuous values? (a common approach is sketched below)

[Table: 14 observed values of a continuous feature: 29.9, 28.2, 35.2, 26.4, 18.9, 21.2, 20.4, 24.4, 17.0, 25.1, 24.0, 24.5, 27.7, 25.5]
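A common answer (sketched here as an assumption; the slides stop at the question) is to sort the observed values and consider binary tests of the form "feature > t", taking the candidate thresholds t at the midpoints between consecutive sorted values; each candidate can then be scored with Gain(X, A) just like a nominal feature:

```python
def candidate_thresholds(values):
    """Midpoints between consecutive sorted values of a continuous feature."""
    v = sorted(values)
    return [(v[i] + v[i + 1]) / 2 for i in range(len(v) - 1)]

values = [29.9, 28.2, 35.2, 26.4, 18.9, 21.2, 20.4,
          24.4, 17.0, 25.1, 24.0, 24.5, 27.7, 25.5]
print(candidate_thresholds(values)[:3])  # [17.95, 19.65, 20.8]
```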