Source: isl.ee.boun.edu.tr/courses/ee573/lectures/DHSch2part1.pdf
Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 2 (Part 1): Bayesian Decision Theory
(Sections 2.1-2.5, 2.7, 2.10)
• Introduction
• Bayesian Decision Theory–Continuous Features
Pattern Classification, Chapter 2 (Part 1)
Introduction
• The sea bass/salmon example
• State of nature, prior
• State of nature is a random variable
• The catch of salmon and sea bass is equiprobable
• P(ω1) = P(ω2) (uniform priors)
• P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)
• Decision rule with only the prior information
• Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
• PROBLEM!!!
• If P(ω1) >> P(ω2): correct most of the time
• If P(ω1) = P(ω2): only a 50% chance of being correct
• Probability of error?
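As a minimal sketch of the prior-only rule (the prior values below are invented for illustration):

```python
# Prior-only decision rule (hypothetical prior values).
P = {"omega1": 0.7, "omega2": 0.3}  # P(omega1) + P(omega2) = 1

# Decide omega1 if P(omega1) > P(omega2), otherwise omega2.
decision = "omega1" if P["omega1"] > P["omega2"] else "omega2"

# With no observation, every x gets the same label, so the
# probability of error is the prior of the rejected class.
p_error = min(P.values())

print(decision, p_error)  # omega1 0.3
```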
• Use of the class-conditional information. Suppose x is the observed lightness.
• P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon
Likelihood
• Posterior, likelihood, evidence
• Bayes formula
P(ωj | x) = P(x | ωj) P(ωj) / P(x)
where, in the case of two categories,
P(x) = Σj=1..2 P(x | ωj) P(ωj)
• Posterior = (Likelihood × Prior) / Evidence
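A minimal numeric sketch of the formula, with made-up likelihood and prior values:

```python
# Bayes formula for two categories with illustrative (invented) numbers:
# posterior = likelihood * prior / evidence.
likelihood = {"omega1": 0.6, "omega2": 0.2}  # P(x | omega_j) at some fixed x
prior = {"omega1": 0.5, "omega2": 0.5}       # P(omega_j)

# Evidence: P(x) = sum_j P(x | omega_j) P(omega_j)
evidence = sum(likelihood[w] * prior[w] for w in prior)

posterior = {w: likelihood[w] * prior[w] / evidence for w in prior}

print(posterior)  # posteriors sum to 1
```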
• Decision given the posterior probabilities
x is an observation for which:
if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1
if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2
Therefore, whenever we observe a particular x, the probability of error is:
P(error | x) = P(ω1 | x) if we decide ω2
P(error | x) = P(ω2 | x) if we decide ω1
• Minimizing the probability of error
• Bayes decision rule (minimize the probability of error):
Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
Therefore:
P(error | x) = min [P(ω1 | x), P(ω2 | x)]
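The minimum-error-rate rule can be sketched directly from the posteriors (hypothetical values):

```python
# Minimum-error-rate decision from posteriors (invented values).
posterior = {"omega1": 0.75, "omega2": 0.25}  # P(omega_j | x)

decision = max(posterior, key=posterior.get)  # decide the larger posterior
p_error = min(posterior.values())             # P(error | x) = min posterior

print(decision, p_error)  # omega1 0.25
```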
Bayesian Decision Theory – Continuous Features
• Generalization of the preceding ideas:
• Use of more than one feature
• Use of more than two states of nature
• Allowing actions, and not only deciding on the state of nature
• Introduce a loss function which is more general than the probability of error
• Allowing actions other than classification primarily allows the possibility of rejection
• Refusing to make a decision in close or bad cases!
• The loss function states how costly each action is
Let {ω1, ω2, …, ωc} be the set of c states of nature (or "categories")
Let {α1, α2, …, αa} be the set of a possible actions
Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj
Conditional risk (expected loss):
R(αi | x) = Σj=1..c λ(αi | ωj) P(ωj | x), for i = 1, …, a
Overall risk:
R = ∫ R(α(x) | x) p(x) dx
To minimize R: for every x, choose the action αi (i = 1, …, a) that minimizes R(αi | x)
The resulting minimum overall risk is called the Bayes risk R*
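A sketch of the conditional-risk computation, assuming an invented 2×2 loss matrix and invented posteriors:

```python
# Conditional risk R(alpha_i | x) = sum_j lam[i][j] * P(omega_j | x),
# with an illustrative loss matrix (rows: actions, cols: states).
lam = [[0.0, 1.0],      # lam(alpha1 | omega1), lam(alpha1 | omega2)
       [2.0, 0.0]]      # lam(alpha2 | omega1), lam(alpha2 | omega2)
posterior = [0.6, 0.4]  # P(omega1 | x), P(omega2 | x)

risk = [sum(l * p for l, p in zip(row, posterior)) for row in lam]
best_action = min(range(len(risk)), key=risk.__getitem__)  # argmin R(alpha_i | x)

print(risk, best_action)  # [0.4, 1.2] -> action index 0 (alpha1)
```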
• Two-category classification
α1: deciding ω1
α2: deciding ω2
λij = λ(αi | ωj): loss incurred for deciding ωi when the true state of nature is ωj
Conditional risk:
R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)
Our rule is the following:
if R(α1 | x) < R(α2 | x), take action α1: "decide ω1"
This results in the equivalent rule: decide ω1 if
(λ21 − λ11) P(x | ω1) P(ω1) > (λ12 − λ22) P(x | ω2) P(ω2)
and decide ω2 otherwise
Likelihood ratio:
The preceding rule is equivalent to the following rule:
if P(x | ω1) / P(x | ω2) > [(λ12 − λ22) / (λ21 − λ11)] · [P(ω2) / P(ω1)]
then take action α1 (decide ω1); otherwise take action α2 (decide ω2)
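The likelihood-ratio test can be sketched with the same kind of invented losses, priors, and likelihoods:

```python
# Likelihood-ratio test against the fixed threshold (all numbers invented).
lam11, lam12, lam21, lam22 = 0.0, 1.0, 2.0, 0.0  # losses lam(alpha_i | omega_j)
p1, p2 = 0.5, 0.5                                # priors P(omega1), P(omega2)
lx1, lx2 = 0.6, 0.2                              # P(x | omega1), P(x | omega2)

ratio = lx1 / lx2                                # likelihood ratio
threshold = (lam12 - lam22) / (lam21 - lam11) * (p2 / p1)

decision = "alpha1" if ratio > threshold else "alpha2"
print(ratio, threshold, decision)  # 3.0 0.5 alpha1
```

Note that the threshold depends only on the losses and priors, not on x.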
Optimal decision property
“If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions”
Minimax Decision
Missing Features
Discriminant Functions
• Feature space divided into c decision regions:
if gi(x) > gj(x) for all j ≠ i, then x is in Ri
(Ri means: assign x to ωi)
• The two-category case
• A classifier is a "dichotomizer" that has two discriminant functions g1 and g2
Let g(x) ≡ g1(x) − g2(x)
Decide ω1 if g(x) > 0; otherwise decide ω2
Classifiers, Discriminant Functions
• Bayes classifier: gi(x) = −R(αi | x)
(max. discriminant corresponds to min. risk!)
• For the minimum error rate, we take gi(x) = P(ωi | x)
(max. discriminant corresponds to max. posterior!)
Equivalent choices:
gi(x) = P(x | ωi) P(ωi)
gi(x) = ln P(x | ωi) + ln P(ωi)
(ln: natural logarithm!)
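As an illustration of the log form, a sketch assuming hypothetical 1-D Gaussian class-conditional densities (all parameter values invented):

```python
import math

# Discriminant g_i(x) = ln p(x | omega_i) + ln P(omega_i) for
# illustrative 1-D Gaussian class-conditional densities.
def log_gauss(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def g(x, mu, sigma, prior):
    return log_gauss(x, mu, sigma) + math.log(prior)

x = 1.0
g1 = g(x, mu=0.0, sigma=1.0, prior=0.5)
g2 = g(x, mu=3.0, sigma=1.0, prior=0.5)

decision = "omega1" if g1 > g2 else "omega2"
print(decision)  # omega1 (x = 1.0 is closer to mu = 0)
```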
Decision Surfaces
Error Probabilities
• Two-category case:
P(error) = P(x ∈ R2, ω1) + P(x ∈ R1, ω2)
= P(x ∈ R2 | ω1) P(ω1) + P(x ∈ R1 | ω2) P(ω2)
= ∫R2 p(x | ω1) P(ω1) dx + ∫R1 p(x | ω2) P(ω2) dx
• c-category case (it is simpler to compute the probability of being correct):
P(correct) = Σi=1..c P(x ∈ Ri, ωi)
= Σi=1..c P(x ∈ Ri | ωi) P(ωi)
= Σi=1..c ∫Ri p(x | ωi) P(ωi) dx
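A numerical sketch of the two-category error integral, assuming two invented 1-D Gaussians with equal priors (the Bayes boundary is then the midpoint of the means):

```python
import math

# P(error) = int_{R2} p(x|omega1) P(omega1) dx + int_{R1} p(x|omega2) P(omega2) dx,
# approximated numerically for two illustrative 1-D Gaussians.
def gauss(x, mu, sigma):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

mu1, mu2, sigma, prior = 0.0, 2.0, 1.0, 0.5
boundary = (mu1 + mu2) / 2          # R1: x < boundary, R2: x > boundary

def integrate(f, a, b, n=10000):    # simple trapezoid rule
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

p_error = (prior * integrate(lambda x: gauss(x, mu1, sigma), boundary, 10.0)
           + prior * integrate(lambda x: gauss(x, mu2, sigma), -10.0, boundary))
print(round(p_error, 4))  # ~0.1587, i.e. the standard normal tail beyond 1
```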
Error Bounds
• Bayes decision rule: lowest error rate
• Question: what is the actual probability of error?
• Gaussian case
• Chernoff bound
• Bhattacharyya bound