Two-Category Case (Review)

• A classifier that places a pattern in one of two classes is often referred to as a dichotomizer.
• We can reshape the decision rule:
$$g(\mathbf{x}) \equiv g_1(\mathbf{x}) - g_2(\mathbf{x}); \qquad \text{decide } \omega_1 \text{ if } g(\mathbf{x}) > 0,\ \text{otherwise decide } \omega_2$$

• If we use log of the posterior probabilities:

$$g(\mathbf{x}) = P(\omega_1|\mathbf{x}) - P(\omega_2|\mathbf{x})$$

$$g(\mathbf{x}) = \ln\frac{p(\mathbf{x}|\omega_1)}{p(\mathbf{x}|\omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$
• A dichotomizer can be viewed as a machine that computes a single discriminant function and classifies x according to the sign of g(x) (e.g., support vector machines).
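To make this concrete, here is a minimal Python sketch (not from the original slides) of a dichotomizer for one-dimensional Gaussian class conditionals; the means, variances, priors, and the test point are made-up example values:

    import numpy as np
    from scipy.stats import norm

    # Hypothetical 1-D Gaussian class-conditional densities (example values)
    mu1, sigma1 = 0.0, 1.0     # p(x|w1)
    mu2, sigma2 = 2.0, 1.5     # p(x|w2)
    P1, P2 = 0.6, 0.4          # priors P(w1), P(w2)

    def g(x):
        """g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)]."""
        return (norm.logpdf(x, mu1, sigma1)
                - norm.logpdf(x, mu2, sigma2)
                + np.log(P1 / P2))

    x = 0.8
    print("class 1" if g(x) > 0 else "class 2")   # decide w1 if g(x) > 0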
Unconstrained or “Full” Covariance (Review)
• This has a simple geometric interpretation:
$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\left\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\right\|^2} \ln\!\left[\frac{P(\omega_i)}{P(\omega_j)}\right] (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$
• When the priors are equal and the support regions are spherical, the decision boundary lies simply halfway between the means (a minimum Euclidean distance classifier).
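As an illustration, a short Python sketch of the boundary point x₀ from the formula above; the means, shared variance, and priors are hypothetical example values:

    import numpy as np

    # Hypothetical means, shared spherical covariance sigma^2 * I, and priors
    mu_i = np.array([0.0, 0.0])
    mu_j = np.array([4.0, 2.0])
    sigma2 = 1.0
    P_i, P_j = 0.5, 0.5

    # Point x0 on the decision boundary (formula above)
    diff = mu_i - mu_j
    x0 = 0.5 * (mu_i + mu_j) - (sigma2 / diff.dot(diff)) * np.log(P_i / P_j) * diff

    print(x0)                                      # -> [2. 1.]
    assert np.allclose(x0, 0.5 * (mu_i + mu_j))    # equal priors: midpoint of means

With equal priors the log term vanishes, so x₀ is exactly the midpoint of the two means.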
Error Bounds

• The Bayes decision rule guarantees the lowest average error rate.
• A closed-form solution exists for two-class Gaussian distributions.
• The full calculation is difficult in high-dimensional spaces.
• Bounds provide a way to get insight into a problem and engineer better solutions.
• Need the following inequality:

$$\min[a, b] \le a^{\beta} b^{1-\beta}, \qquad a, b \ge 0 \ \text{and} \ 0 \le \beta \le 1$$

Assume $a \ge b$ without loss of generality: $\min[a, b] = b$.

Also, $a^{\beta} b^{1-\beta} = (a/b)^{\beta}\, b$ and $(a/b)^{\beta} \ge 1$.

Therefore, $b \le (a/b)^{\beta}\, b$, which implies $\min[a, b] \le a^{\beta} b^{1-\beta}$.
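As a quick sanity check (not part of the original argument), a short Python sketch that spot-checks the inequality on random values:

    import numpy as np

    # Spot-check min[a, b] <= a**beta * b**(1 - beta) for a, b >= 0, 0 <= beta <= 1
    rng = np.random.default_rng(0)
    for _ in range(100_000):
        a, b = rng.uniform(0.0, 10.0, size=2)
        beta = rng.uniform(0.0, 1.0)
        assert min(a, b) <= a**beta * b**(1.0 - beta) + 1e-12  # tolerance for rounding
    print("inequality held on all samples")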
• Apply to our standard expression for P(error).
Chernoff Bound

• Recall:

$$P(\text{error}) = \int \min\left[P(\omega_1|\mathbf{x}),\, P(\omega_2|\mathbf{x})\right] p(\mathbf{x})\, d\mathbf{x}$$

$$= \int \min\left[\frac{p(\mathbf{x}|\omega_1)P(\omega_1)}{p(\mathbf{x})},\ \frac{p(\mathbf{x}|\omega_2)P(\omega_2)}{p(\mathbf{x})}\right] p(\mathbf{x})\, d\mathbf{x}$$

$$= \int \min\left[p(\mathbf{x}|\omega_1)P(\omega_1),\ p(\mathbf{x}|\omega_2)P(\omega_2)\right] d\mathbf{x}$$

$$\le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2) \int p^{\beta}(\mathbf{x}|\omega_1)\, p^{1-\beta}(\mathbf{x}|\omega_2)\, d\mathbf{x}, \qquad 0 \le \beta \le 1$$
• Note that this integral is over the entire feature space, not just the decision regions, which makes it simpler to evaluate.
• If the conditional probabilities are normal, this expression can be simplified.
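As a numerical illustration of the bound, here is a Python sketch that evaluates the integral by quadrature for two hypothetical one-dimensional Gaussians; the helper name chernoff_bound and all parameter values are made up for the example:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    # Hypothetical 1-D Gaussian class conditionals and priors (example values)
    mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 1.0
    P1, P2 = 0.5, 0.5

    def chernoff_bound(beta):
        """P(error) <= P1^beta * P2^(1-beta) * integral of p1^beta * p2^(1-beta)."""
        integrand = lambda x: (norm.pdf(x, mu1, s1)**beta
                               * norm.pdf(x, mu2, s2)**(1.0 - beta))
        integral, _ = quad(integrand, -20.0, 20.0)  # effectively the whole feature space
        return P1**beta * P2**(1.0 - beta) * integral

    print(chernoff_bound(0.5))   # beta = 1/2 gives the Bhattacharyya bound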
Chernoff Bound for Normal Densities

• If the conditional probabilities are normal, our bound can be evaluated analytically:
$$\int p^{\beta}(\mathbf{x}|\omega_1)\, p^{1-\beta}(\mathbf{x}|\omega_2)\, d\mathbf{x} = e^{-k(\beta)}$$

where:

$$k(\beta) = \frac{\beta(1-\beta)}{2}\, (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^t \left[(1-\beta)\boldsymbol{\Sigma}_1 + \beta\boldsymbol{\Sigma}_2\right]^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) + \frac{1}{2} \ln\frac{\left|(1-\beta)\boldsymbol{\Sigma}_1 + \beta\boldsymbol{\Sigma}_2\right|}{\left|\boldsymbol{\Sigma}_1\right|^{1-\beta} \left|\boldsymbol{\Sigma}_2\right|^{\beta}}$$
• Procedure: find the value of β that minimizes exp(−k(β)), and then compute P(error) using the bound.
• Benefit: a one-dimensional optimization over β.
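A minimal sketch of that procedure in Python, assuming hypothetical class statistics and a simple grid search over β (the helper name k is chosen for illustration):

    import numpy as np

    # Hypothetical class statistics (example values)
    mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
    S1 = np.array([[2.0, 0.5], [0.5, 2.0]])
    S2 = np.array([[1.0, 0.0], [0.0, 1.0]])
    P1, P2 = 0.5, 0.5

    def k(beta):
        """k(beta) from the Chernoff bound for normal densities (formula above)."""
        S = (1.0 - beta) * S1 + beta * S2
        d = mu2 - mu1
        quad_term = 0.5 * beta * (1.0 - beta) * (d @ np.linalg.inv(S) @ d)
        log_term = 0.5 * np.log(np.linalg.det(S)
                                / (np.linalg.det(S1)**(1.0 - beta)
                                   * np.linalg.det(S2)**beta))
        return quad_term + log_term

    # One-dimensional search: pick the beta that minimizes exp(-k(beta))
    betas = np.linspace(0.01, 0.99, 99)
    best = betas[np.argmin([np.exp(-k(b)) for b in betas])]
    bound = P1**best * P2**(1.0 - best) * np.exp(-k(best))
    print(f"beta* = {best:.2f}, Chernoff bound on P(error) = {bound:.4f}")

A coarse grid is enough here; any one-dimensional optimizer over β ∈ (0, 1) would serve equally well.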
Bhattacharyya Bound

• The Chernoff bound is loose for extreme values of β.
• The Bhattacharyya bound can be derived by setting β = 0.5:
$$P(\text{error}) \le P^{\beta}(\omega_1)\, P^{1-\beta}(\omega_2) \int p^{\beta}(\mathbf{x}|\omega_1)\, p^{1-\beta}(\mathbf{x}|\omega_2)\, d\mathbf{x}$$

$$= \sqrt{P(\omega_1)P(\omega_2)} \int \sqrt{p(\mathbf{x}|\omega_1)\, p(\mathbf{x}|\omega_2)}\, d\mathbf{x}$$

$$= \sqrt{P(\omega_1)P(\omega_2)}\; e^{-k(1/2)}$$
where:
$$k(1/2) = \frac{1}{8}\, (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^t \left[\frac{\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2}{2}\right]^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) + \frac{1}{2} \ln\frac{\left|\frac{\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2}{2}\right|}{\sqrt{\left|\boldsymbol{\Sigma}_1\right|\left|\boldsymbol{\Sigma}_2\right|}}$$
• These bounds can still be used if the distributions are not Gaussian (why? hint: Occam’s Razor). However, they might not be adequately tight.
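For completeness, a Python sketch of the Bhattacharyya bound using the same hypothetical Gaussian statistics as the previous example (all values are illustrative):

    import numpy as np

    # Same hypothetical class statistics as in the previous sketch
    mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
    S1 = np.array([[2.0, 0.5], [0.5, 2.0]])
    S2 = np.array([[1.0, 0.0], [0.0, 1.0]])
    P1, P2 = 0.5, 0.5

    # k(1/2): the Bhattacharyya distance between the two Gaussians
    S = 0.5 * (S1 + S2)
    d = mu2 - mu1
    k_half = (d @ np.linalg.inv(S) @ d) / 8.0 + 0.5 * np.log(
        np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))

    # Bhattacharyya bound: P(error) <= sqrt(P(w1) P(w2)) * exp(-k(1/2))
    print(np.sqrt(P1 * P2) * np.exp(-k_half))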