Pattern Recognition: Bayesian Decision Theory
Charles Tappert
Seidenberg School of CSIS, Pace University
Pattern Classification
Most of the material in these slides was taken from the figures in Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2001
Bayesian Decision Theory
A fundamental, purely statistical approach
Assumes the relevant probabilities are known perfectly
Makes theoretically optimal decisions
Bayesian Decision Theory
Based on Bayes formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
which is easily derived by writing the joint probability density in two ways:
p(ωj, x) = P(ωj | x) p(x)
p(ωj, x) = p(x | ωj) P(ωj)
Note: uppercase P(·) denotes a probability mass function and lowercase p(·) a density function
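As a quick numeric sketch of the formula, here is a short Python computation; the likelihood and prior values are hypothetical, chosen only to illustrate the arithmetic:

    # Bayes formula with hypothetical numbers: two classes w1, w2
    likelihoods = [0.30, 0.10]   # p(x | w1), p(x | w2) at some observed x (assumed values)
    priors      = [0.40, 0.60]   # P(w1), P(w2) (assumed values)

    # evidence: p(x) = sum_j p(x | wj) P(wj)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))

    # posteriors: P(wj | x) = p(x | wj) P(wj) / p(x)
    posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
    print(posteriors)  # [0.666..., 0.333...] -- posteriors sum to 1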
Bayes Formula
Bayes formula
P(ωj | x) = p(x | ωj) P(ωj) / p(x)
can be expressed informally in English as
posterior = likelihood × prior / evidence
and the Bayes decision chooses the class ωj with the greatest posterior probability
Bayes Formula
Bayes formula: P(ωj | x) = p(x | ωj) P(ωj) / p(x)
The Bayes decision chooses the class ωj with the greatest P(ωj | x)
Since p(x) is the same for all classes, the greatest P(ωj | x) means the greatest p(x | ωj) P(ωj)
Special case: if all classes are equally likely, i.e., have the same P(ωj), we get a further simplification – the greatest P(ωj | x) is given by the greatest likelihood p(x | ωj)
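A minimal sketch of this decision rule (hypothetical values again); note that with equal priors the comparison reduces to the likelihoods alone:

    # Bayes decision: pick the class with the greatest p(x | wj) P(wj)
    def bayes_decide(likelihoods, priors):
        scores = [l * p for l, p in zip(likelihoods, priors)]
        return max(range(len(scores)), key=scores.__getitem__)

    print(bayes_decide([0.30, 0.10], [0.40, 0.60]))  # 0: class w1 has the greater score
    # Equal priors: the maximum likelihood alone decides
    print(bayes_decide([0.30, 0.10], [0.50, 0.50]))  # 0 again, decided by p(x | wj)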
Bayesian Decision Theory
Now, let’s look at the fish example of two classes – sea bass and salmon – and one feature – lightness
Let p(x | ω1) and p(x | ω2) describe the difference in lightness between the populations of sea bass and salmon (see next slide)
Bayesian Decision Theory
In the previous slide, if the two classes are equally likely, we get the simplification – greatest posterior means greatest likelihood – and the Bayes decision is to choose class ω1 when p(x | ω1) > p(x | ω2), i.e., when the lightness is greater than approximately 12.4
However, if the two classes are not equally likely, we get a case like the next slide
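A sketch of this two-class, one-feature decision, assuming Gaussian lightness densities; the means, variances, and priors below are invented for illustration and are not the textbook's figures:

    import math

    def gaussian_pdf(x, mu, sigma):
        # Univariate normal density
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # Hypothetical lightness densities: w1 = sea bass, w2 = salmon
    mu1, sigma1 = 13.0, 1.0          # assumed
    mu2, sigma2 = 11.5, 1.0          # assumed
    P1, P2 = 2.0 / 3.0, 1.0 / 3.0    # unequal priors (assumed)

    x = 12.4  # an observed lightness value
    score1 = gaussian_pdf(x, mu1, sigma1) * P1
    score2 = gaussian_pdf(x, mu2, sigma2) * P2
    print("sea bass" if score1 > score2 else "salmon")

With unequal priors the decision threshold shifts away from the point where the two likelihood curves cross.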
Bayesian Parameter Estimation
Because the actual probabilities are rarely known, they are usually estimated after assuming the form of the distributions
The most commonly assumed form of the distributions is the multivariate normal
Bayesian Parameter Estimation
Assuming multivariate normal probability density functions, it is necessary to estimate, for each pattern class:
Feature means
Feature covariance matrices
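A minimal sketch of this estimation step with NumPy; the sample matrix is random placeholder data standing in for one class's training patterns:

    import numpy as np

    rng = np.random.default_rng(0)
    samples = rng.normal(size=(100, 3))  # 100 training patterns, 3 features (placeholder)

    # Estimates for one pattern class:
    mean_vector = samples.mean(axis=0)          # feature means
    cov_matrix = np.cov(samples, rowvar=False)  # feature covariance matrix (n-1 normalization)
    print(mean_vector.shape, cov_matrix.shape)  # (3,) (3, 3)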
Multivariate Normal Densities
Simplifying assumptions can be made for multivariate normal density functions
Statistically independent features with equal variances yield hyperplane decision surfaces
Equal covariance matrices for each class also yield hyperplane decision surfaces
Arbitrary normal distributions yield hyperquadric decision surfaces (sketched below)
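A sketch of the general (arbitrary-covariance) case using SciPy's multivariate normal log-density; the two classes' means, covariance matrices, and priors are invented, and the unequal covariances make the implied decision surface hyperquadric:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical two-class parameters with unequal covariance matrices
    means  = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
    covs   = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
    priors = [0.5, 0.5]

    def discriminant(x, j):
        # g_j(x) = ln p(x | wj) + ln P(wj)
        return multivariate_normal.logpdf(x, mean=means[j], cov=covs[j]) + np.log(priors[j])

    x = np.array([1.0, 0.5])
    print(max(range(2), key=lambda j: discriminant(x, j)))  # index of the chosen class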
Nonparametric Techniques
When the probabilities are not known, there are two approaches
Estimate the density functions from sample patterns
Bypass probability estimation entirely and use a nonparametric method such as k-Nearest-Neighbor
k-Nearest-Neighbor
k-Nearest-Neighbor (k-NN) Method
Used where the probabilities are not known
Bypasses probability estimation entirely
Easy to implement – see the sketch below
Asymptotic error is never worse than twice the Bayes error
Computationally intensive, therefore slow
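A minimal k-NN sketch in NumPy; the training data and the choice k = 3 are placeholders:

    import numpy as np

    def knn_classify(x, train_X, train_y, k=3):
        # Majority vote among the k training patterns nearest to x (Euclidean distance)
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        return labels[np.argmax(counts)]

    # Placeholder training data: two 2-D classes
    train_X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.7]])
    train_y = np.array([0, 0, 1, 1, 1])
    print(knn_classify(np.array([0.8, 0.8]), train_X, train_y))  # 1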
Simple PR System with k-NN
Good for feasibility studies – easy to implement
Typical procedural steps
Extract feature measurements
Normalize features to the 0–1 range (formula and sketch below)
Classify by the k nearest neighbors, using Euclidean distance
Min–max normalization: x' = (x – min) / (max – min)
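A sketch of the 0–1 (min–max) normalization step, computed per feature; the feature matrix is placeholder data:

    import numpy as np

    X = np.array([[2.0, 30.0], [4.0, 10.0], [3.0, 20.0]])  # placeholder feature matrix

    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    X_norm = (X - x_min) / (x_max - x_min)  # x' = (x - min) / (max - min), per feature
    print(X_norm)  # every column now spans [0, 1]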
Simple PR System with k-NN (cont): Two Modes of Operation
Leave-one-out procedure (sketched below)
One input file of training/test patterns
Repeatedly train on all samples except one, which is left out for testing
Good for a feasibility study with little data
Train and test on separate files
One input file for training and one for testing
Good for measuring performance change when varying an independent variable (e.g., different keyboards for keystroke biometrics)
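A sketch of the leave-one-out mode with a 1-NN classifier; the data are placeholders, and this is a generic illustration rather than the actual system used in the studies on the next slide:

    import numpy as np

    def nn_label(x, X, y):
        # Label of the single nearest training pattern (Euclidean distance)
        return y[np.argmin(np.linalg.norm(X - x, axis=1))]

    X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])  # placeholder patterns
    y = np.array([0, 0, 1, 1])

    # Each sample in turn is the test pattern; all the others train
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        correct += nn_label(X[i], X[mask], y[mask]) == y[i]
    print(correct / len(X))  # leave-one-out accuracy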
Simple PR System with k-NN (cont)
Used in keystroke biometric studies
Feasibility study – Dr. Mary Curtin
Different keyboards/modes – Dr. Mary Villani
Used in other studies involving keystroke data
Study of procedures for handling incomplete and missing data – e.g., fallback procedures in the keystroke biometric system – Dr. Mark Ritzmann
New kNN-ROC procedures – Dr. Robert Zack
Used in other biometric studies
Mouse movement – Larry Immohr
Stylometry + keystroke study – John Stewart
Conclusions
The Bayes decision method is best if the probabilities are known
The Bayes method is okay if you are good with statistics and the form of the probability distributions can be assumed, especially if there is justification for simplifying assumptions like independent features
Otherwise, stay with easier-to-implement methods that provide reasonable results, like k-NN