  • Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher

  • Chapter 2 (Part 1): Bayesian Decision Theory

(Sections 2.1-2.5, 2.7, 2.10)

    • Introduction

    • Bayesian Decision Theory – Continuous Features


    Introduction

    • The sea bass/salmon example

    • State of nature, prior

    • State of nature is a random variable

    • The catch of salmon and sea bass is equiprobable

    • P(ω1) = P(ω2) (uniform priors)

    • P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)


    • Decision rule with only the prior information:

    • Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

    • PROBLEM!!!

    • If P(ω1) >> P(ω2), we will be correct most of the time

    • If P(ω1) = P(ω2), we have only a 50% chance of being correct

    • Probability of error?
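
    As a minimal sketch of this prior-only rule (the 0.7/0.3 priors are made-up numbers for illustration), in Python:

```python
# Hypothetical priors (illustrative numbers, not from the slides).
priors = {"sea_bass": 0.7, "salmon": 0.3}  # P(w1), P(w2); must sum to 1

def decide_from_priors(priors):
    """Prior-only rule: always pick the state of nature with the larger prior."""
    return max(priors, key=priors.get)

# The rule always returns the same class, so its probability of error
# is simply the prior of the other class: min(P(w1), P(w2)).
print(decide_from_priors(priors), min(priors.values()))  # sea_bass 0.3
```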


    • Use of the class-conditional information. Suppose x is the observed lightness.

    • P(x | ω1) and P(x | ω2) describe the difference in lightness between the populations of sea bass and salmon


    Likelihood


    • Posterior, likelihood, evidence

    • Bayes Formula

    P(ωj | x) = P(x | ωj) P(ωj) / P(x)

    • Where, in the case of two categories, the evidence is

    P(x) = Σj=1..2 P(x | ωj) P(ωj)

    • Posterior = (Likelihood × Prior) / Evidence
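
    A small numeric sketch of Bayes formula (the Gaussian class-conditionals and the priors below are illustrative assumptions, not values from the slides):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density, used as an assumed class-conditional p(x|w)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustrative lightness models: w1 = sea bass, w2 = salmon (made-up parameters).
likelihoods = [lambda x: gaussian_pdf(x, 8.0, 1.0),   # p(x | w1)
               lambda x: gaussian_pdf(x, 5.0, 1.0)]   # p(x | w2)
priors = [0.6, 0.4]                                   # P(w1), P(w2)

def posteriors(x):
    """Bayes formula: P(wj|x) = p(x|wj) P(wj) / p(x), with p(x) the evidence sum."""
    joint = [lik(x) * p for lik, p in zip(likelihoods, priors)]
    evidence = sum(joint)                             # p(x) = sum_j p(x|wj) P(wj)
    return [j / evidence for j in joint]

print(posteriors(6.0))  # the two posteriors sum to 1
```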


    • Decision given the posterior probabilities

    x is an observation for which:

    if P(ω1 | x) > P(ω2 | x), the true state of nature = ω1
    if P(ω1 | x) < P(ω2 | x), the true state of nature = ω2

    Therefore, whenever we observe a particular x, the probability of error is:

    P(error | x) = P(ω1 | x) if we decide ω2
    P(error | x) = P(ω2 | x) if we decide ω1


    • Minimizing the probability of error

    • Bayes decision rule (minimizing the probability of error):

    Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2

    Therefore:

    P(error | x) = min [P(ω1 | x), P(ω2 | x)]
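
    A minimal sketch of this minimum-error rule, assuming the posteriors P(ωj | x) have already been computed (e.g. by the Bayes-formula sketch above):

```python
def bayes_decide(posteriors):
    """Minimum-error Bayes rule: pick the class with the largest posterior.

    posteriors: list of P(wj | x) for j = 1..c (must sum to 1).
    Returns the 0-based index of the chosen class and P(error | x).
    """
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    p_error = 1.0 - posteriors[best]  # equals min of the posteriors for 2 classes
    return best, p_error

print(bayes_decide([0.8, 0.2]))  # (0, 0.2): decide w1, error probability 0.2
```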

    Bayesian Decision Theory – Continuous Features

    • Generalization of the preceding ideas

    • Use of more than one feature

    • Use of more than two states of nature

    • Allowing actions, not only deciding on the state of nature

    • Introduce a loss function which is more general than the probability of error


    • Allowing actions other than classification primarily allows the possibility of rejection

    • Refusing to make a decision in close or bad cases!

    • The loss function states how costly each action taken is


    Let {ω1, ω2, …, ωc} be the set of c states of nature

    (or “categories”)

    Let {α1, α2, …, αa} be the set of a possible actions

    Let λ(αi | ωj) be the loss incurred for taking action αi when the state of nature is ωj


    Conditional risk (expected loss) of taking action αi:

    R(αi | x) = Σj=1..c λ(αi | ωj) P(ωj | x)    for i = 1, …, a

    Overall risk:

    R = ∫ R(α(x) | x) p(x) dx

    To minimize R, choose the action αi (i = 1, …, a) that minimizes the conditional risk R(αi | x)

    The resulting minimum overall risk is called the Bayes risk R*
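
    A sketch of this risk minimization under an assumed loss matrix (the λ values below are illustrative, not from the slides):

```python
# lam[i][j] = loss lambda(alpha_i | w_j) for taking action i when nature is w_j.
# Illustrative values: action 0 = "decide w1", action 1 = "decide w2",
# with an intentionally asymmetric zero-one-style loss.
lam = [[0.0, 2.0],
       [1.0, 0.0]]

def conditional_risks(posteriors, lam):
    """R(alpha_i | x) = sum_j lambda(alpha_i | w_j) P(w_j | x)."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in lam]

def bayes_action(posteriors, lam):
    """Choose the action with minimum conditional risk."""
    risks = conditional_risks(posteriors, lam)
    return min(range(len(risks)), key=lambda i: risks[i]), risks

print(bayes_action([0.7, 0.3], lam))  # (0, [0.6, 0.7]): take action alpha_1
```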


    • Two-category classification

    α1: deciding ω1
    α2: deciding ω2
    λij = λ(αi | ωj): the loss incurred for deciding αi when the true state of nature is ωj

    Conditional risk:

    R(α1 | x) = λ11 P(ω1 | x) + λ12 P(ω2 | x)
    R(α2 | x) = λ21 P(ω1 | x) + λ22 P(ω2 | x)


    Our rule is the following:

    if R(α1 | x) < R(α2 | x), action α1 (“decide ω1”) is taken

    This results in the equivalent rule:

    decide ω1 if:

    (λ21 - λ11) P(x | ω1) P(ω1) > (λ12 - λ22) P(x | ω2) P(ω2)

    and decide ω2 otherwise


    Likelihood ratio:

    The preceding rule is equivalent to the following rule:

    if

    P(x | ω1) / P(x | ω2) > [(λ12 - λ22) / (λ21 - λ11)] · [P(ω2) / P(ω1)]

    then take action α1 (decide ω1)
    Otherwise take action α2 (decide ω2)
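
    A sketch of the likelihood-ratio test (the Gaussian densities, priors, and losses are illustrative assumptions):

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio_decide(x, p1, p2, priors, lam):
    """Decide w1 iff p(x|w1)/p(x|w2) exceeds the loss/prior threshold.

    lam[i][j] = lambda(alpha_i | w_j); the threshold is independent of x.
    """
    threshold = ((lam[0][1] - lam[1][1]) / (lam[1][0] - lam[0][0])) * (priors[1] / priors[0])
    ratio = p1(x) / p2(x)
    return "w1" if ratio > threshold else "w2"

# Illustrative setup (made-up numbers).
p1 = lambda x: gaussian_pdf(x, 8.0, 1.0)
p2 = lambda x: gaussian_pdf(x, 5.0, 1.0)
print(likelihood_ratio_decide(6.2, p1, p2, priors=[0.5, 0.5], lam=[[0.0, 2.0], [1.0, 0.0]]))
```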


    Optimal decision property

    “If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions”


    Minimax Decision

  • Missing Features

  • Discriminant Functions


    • Feature space is divided into c decision regions:
    if gi(x) > gj(x) for all j ≠ i, then x is in Ri

    (Ri means: assign x to ωi)

    • The two-category case

    • A classifier is a “dichotomizer” that has two discriminant functions g1 and g2

    Let g(x) ≡ g1(x) - g2(x)

    Decide ω1 if g(x) > 0; otherwise decide ω2
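
    A minimal dichotomizer sketch (the two discriminant functions are arbitrary placeholders):

```python
def dichotomize(x, g1, g2):
    """Two-category classifier: decide w1 iff g(x) = g1(x) - g2(x) > 0."""
    return "w1" if g1(x) - g2(x) > 0 else "w2"

# Placeholder discriminants, e.g. (unnormalized) log posterior scores.
g1 = lambda x: -(x - 8.0) ** 2   # peaks at x = 8
g2 = lambda x: -(x - 5.0) ** 2   # peaks at x = 5
print(dichotomize(6.2, g1, g2))  # 6.2 is closer to 5 than to 8 -> "w2"
```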

    • Bayes classifier: gi(x) = - R(αi | x)

    (max. discriminant corresponds to min. risk!)

    • For the minimum error rate, we take gi(x) = P(ωi | x)

    (max. discriminant corresponds to max. posterior!)

    gi(x) = P(x | ωi) P(ωi)

    gi(x) = ln P(x | ωi) + ln P(ωi)
    (ln: natural logarithm!)
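
    Since ln is monotonic, these discriminants all give the same decisions. A sketch with assumed univariate Gaussian class-conditionals (the parameters are illustrative):

```python
import math

def log_discriminant(x, mu, sigma, prior):
    """g_i(x) = ln p(x|w_i) + ln P(w_i) for a univariate Gaussian class model."""
    log_lik = -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return log_lik + math.log(prior)

# Illustrative class models (made-up parameters): (mu, sigma, prior) per class.
classes = [(8.0, 1.0, 0.6), (5.0, 1.0, 0.4)]
scores = [log_discriminant(6.2, *c) for c in classes]
print("decide w%d" % (scores.index(max(scores)) + 1))  # "decide w2" here
```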

  • Classifiers, Discriminant Functions


  • Decision Surfaces



    Error Probabilities

    • Two-category case:

    P(error) = P(x ∈ R2, ω1) + P(x ∈ R1, ω2)
             = P(x ∈ R2 | ω1) P(ω1) + P(x ∈ R1 | ω2) P(ω2)
             = ∫_R2 p(x | ω1) P(ω1) dx + ∫_R1 p(x | ω2) P(ω2) dx

    • c-category case (it is simpler to compute the probability of being correct):

    P(correct) = Σi=1..c P(x ∈ Ri, ωi)
               = Σi=1..c P(x ∈ Ri | ωi) P(ωi)
               = Σi=1..c ∫_Ri p(x | ωi) P(ωi) dx
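
    A numeric sketch that approximates the two-category error integral on a grid, assuming Gaussian class-conditionals (all parameters are illustrative):

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustrative two-class setup (made-up parameters).
p = [lambda x: gaussian_pdf(x, 8.0, 1.0), lambda x: gaussian_pdf(x, 5.0, 1.0)]
P = [0.5, 0.5]

def bayes_error(lo=-10.0, hi=20.0, n=20000):
    """Riemann-sum approximation of
    P(error) = int_R2 p(x|w1)P(w1) dx + int_R1 p(x|w2)P(w2) dx,
    where Ri is the region in which the Bayes rule decides wi.
    At each x the error contribution is the smaller of the two joint densities."""
    dx = (hi - lo) / n
    err = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        joint = [p[0](x) * P[0], p[1](x) * P[1]]
        err += min(joint) * dx
    return err

print(round(bayes_error(), 4))  # about 0.0668 for these parameters
```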

    Error Bounds

    • Bayes decision rule gives the lowest error rate

    • Question: what is the actual probability of error?

    • Gaussian Case

    • Chernoff Bound

    • Bhattacharyya Bound