Pattern Recognition: Bayesian Decision Theory
Charles Tappert
Seidenberg School of CSIS, Pace University
Pattern Classification
Most of the material in these slides was taken from the figures in Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2001
Bayesian Decision Theory
A fundamental, purely statistical approach
Assumes the relevant probabilities are known perfectly
Makes theoretically optimal decisions
Bayesian Decision Theory
Based on Bayes formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
which is easily derived by writing the joint probability density in two ways:
p(ωj, x) = P(ωj | x) p(x)
p(ωj, x) = p(x | ωj) P(ωj)
Note: uppercase P(·) denotes a probability mass function and lowercase p(·) a density function
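As a quick numeric sketch of the formula, here is a short Python computation; the likelihood and prior values are hypothetical, chosen only to illustrate the arithmetic:

    # Bayes formula with hypothetical numbers: two classes w1, w2
    likelihoods = [0.30, 0.10]   # p(x | w1), p(x | w2) at some observed x (assumed values)
    priors      = [0.40, 0.60]   # P(w1), P(w2) (assumed values)

    # evidence: p(x) = sum_j p(x | wj) P(wj)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))

    # posteriors: P(wj | x) = p(x | wj) P(wj) / p(x)
    posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
    print(posteriors)  # [0.666..., 0.333...] -- posteriors sum to 1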
Bayes Formula
Bayes formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
can be expressed informally in English as
posterior = likelihood × prior / evidence
and the Bayes decision chooses the class ωj with the greatest posterior probability
Bayes Formula
Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x)
The Bayes decision chooses the class ωj with the greatest P(ωj | x)
Since p(x) is the same for all classes, the greatest P(ωj | x) means the greatest p(x | ωj) P(ωj)
Special case: if all classes are equally likely, i.e., have the same P(ωj), we get a further simplification – the greatest P(ωj | x) is given by the greatest likelihood p(x | ωj)
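A minimal sketch of this decision rule (hypothetical values again); note that with equal priors the comparison reduces to the likelihoods alone:

    # Bayes decision: pick the class with the greatest p(x | wj) P(wj)
    def bayes_decide(likelihoods, priors):
        scores = [l * p for l, p in zip(likelihoods, priors)]
        return max(range(len(scores)), key=scores.__getitem__)

    print(bayes_decide([0.30, 0.10], [0.40, 0.60]))  # 0: class w1 has the greater score
    # Equal priors: the maximum likelihood alone decides
    print(bayes_decide([0.30, 0.10], [0.50, 0.50]))  # 0 again, decided by p(x | wj)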
Bayesian Decision Theory
Now, let’s look at the fish example of two classes – sea bass and salmon – and one feature – lightness
Let p(x | ω1) and p(x | ω2) describe the difference in lightness between the populations of sea bass and salmon (see next slide)
Bayesian Decision Theory
In the previous slide, if the two classes are equally likely, we get the simplification – greatest posterior means greatest likelihood – and the Bayes decision is to choose class ω1 when p(x | ω1) > p(x | ω2), i.e., when the lightness is greater than approximately 12.4
However, if the two classes are not equally likely, we get a case like the next slide
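A sketch of this two-class, one-feature decision, assuming Gaussian lightness densities; the means, variances, and priors below are invented for illustration and are not the textbook's figures:

    import math

    def gaussian_pdf(x, mu, sigma):
        # Univariate normal density
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Hypothetical lightness densities: w1 = sea bass, w2 = salmon
    mu1, sigma1 = 13.0, 1.0          # assumed
    mu2, sigma2 = 11.5, 1.0          # assumed
    P1, P2 = 2.0 / 3.0, 1.0 / 3.0    # unequal priors (assumed)

    x = 12.4  # an observed lightness value
    score1 = gaussian_pdf(x, mu1, sigma1) * P1
    score2 = gaussian_pdf(x, mu2, sigma2) * P2
    print("sea bass" if score1 > score2 else "salmon")

With unequal priors the decision threshold shifts away from the point where the two likelihood curves cross.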
Bayesian Parameter Estimation
Because the actual probabilities are rarely known, they are usually estimated after assuming the form of the distributions
The most commonly assumed form of the distributions is the multivariate normal
Bayesian Parameter Estimation
Assuming multivariate normal probability density functions, it is necessary to estimate, for each pattern class:
Feature means
Feature covariance matrices
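A minimal sketch of this estimation step with NumPy; the sample matrix is random placeholder data standing in for one class's training patterns:

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(size=(100, 3))  # 100 training patterns, 3 features (placeholder)

    # Estimates for one pattern class:
    mean_vector = samples.mean(axis=0)          # feature means
    cov_matrix = np.cov(samples, rowvar=False)  # feature covariance matrix (n-1 normalization)
    print(mean_vector.shape, cov_matrix.shape)  # (3,) (3, 3)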
Multivariate Normal Densities
Simplifying assumptions can be made for multivariate normal density functions
Statistically independent features with equal variances yield hyperplane decision surfaces
Equal covariance matrices for each class also yield hyperplane decision surfaces
Arbitrary normal distributions yield hyperquadric decision surfaces (sketched below)
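A sketch of the general (arbitrary-covariance) case using SciPy's multivariate normal log-density; the two classes' means, covariance matrices, and priors are invented, and the unequal covariances make the implied decision surface hyperquadric:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical two-class parameters with unequal covariance matrices
    means  = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
    covs   = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
    priors = [0.5, 0.5]

    def discriminant(x, j):
        # g_j(x) = ln p(x | wj) + ln P(wj)
        return multivariate_normal.logpdf(x, mean=means[j], cov=covs[j]) + np.log(priors[j])

    x = np.array([1.0, 0.5])
    print(max(range(2), key=lambda j: discriminant(x, j)))  # index of the chosen class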
Nonparametric Techniques
When the probabilities are not known, there are two approaches
Estimate the density functions from sample patterns
Bypass probability estimation entirely and use a nonparametric method such as k-Nearest-Neighbor
k-Nearest-Neighbor
k-Nearest-Neighbor (k-NN) Method
Used where the probabilities are not known
Bypasses probability estimation entirely
Easy to implement – see the sketch below
Asymptotic error is never worse than twice the Bayes error
Computationally intensive, therefore slow
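A minimal k-NN sketch in NumPy; the training data and the choice k = 3 are placeholders:

    import numpy as np

    def knn_classify(x, train_X, train_y, k=3):
        # Majority vote among the k training patterns nearest to x (Euclidean distance)
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        return labels[np.argmax(counts)]

    # Placeholder training data: two 2-D classes
    train_X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.7]])
    train_y = np.array([0, 0, 1, 1, 1])
    print(knn_classify(np.array([0.8, 0.8]), train_X, train_y))  # 1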
Simple PR System with k-NN
Good for feasibility studies – easy to implement
Typical procedural steps
Extract feature measurements
Normalize features to the 0–1 range (formula and sketch below)
Classify by the k nearest neighbors, using Euclidean distance
Min–max normalization: x' = (x – min) / (max – min)
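A sketch of the 0–1 (min–max) normalization step, computed per feature; the feature matrix is placeholder data:

    import numpy as np

    X = np.array([[2.0, 30.0], [4.0, 10.0], [3.0, 20.0]])  # placeholder feature matrix

    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    X_norm = (X - x_min) / (x_max - x_min)  # x' = (x - min) / (max - min), per feature
    print(X_norm)  # every column now spans [0, 1]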
Simple PR System with k-NN (cont): Two Modes of Operation
Leave-one-out procedure (sketched below)
One input file of training/test patterns
Repeatedly train on all samples except one, which is left out for testing
Good for a feasibility study with little data
Train and test on separate files
One input file for training and one for testing
Good for measuring performance change when varying an independent variable (e.g., different keyboards for keystroke biometrics)
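A sketch of the leave-one-out mode with a 1-NN classifier; the data are placeholders, and this is a generic illustration rather than the actual system used in the studies on the next slide:

    import numpy as np

    def nn_label(x, X, y):
        # Label of the single nearest training pattern (Euclidean distance)
        return y[np.argmin(np.linalg.norm(X - x, axis=1))]

    X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])  # placeholder patterns
    y = np.array([0, 0, 1, 1])

    # Each sample in turn is the test pattern; all the others train
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        correct += nn_label(X[i], X[mask], y[mask]) == y[i]
    print(correct / len(X))  # leave-one-out accuracy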
Simple PR System with k-NN (cont)
Used in keystroke biometric studies
Feasibility study – Dr. Mary Curtin
Different keyboards/modes – Dr. Mary Villani
Used in other studies involving keystroke data
Study of procedures for handling incomplete and missing data – e.g., fallback procedures in the keystroke biometric system – Dr. Mark Ritzmann
New kNN-ROC procedures – Dr. Robert Zack
Used in other biometric studies
Mouse movement – Larry Immohr
Stylometry + keystroke study – John Stewart
Conclusions
The Bayes decision method is best if the probabilities are known
The Bayes method is okay if you are good with statistics and the form of the probability distributions can be assumed, especially if there is justification for simplifying assumptions like independent features
Otherwise, stay with easier-to-implement methods that provide reasonable results, like k-NN