Page 1:

Performance Evaluation in Computer Vision

Kyungnam Kim
Computer Vision Lab, University of Maryland, College Park

Page 2:

Contents

- Error Estimation in Pattern Recognition
  Reference: Jain et al., “Statistical Pattern Recognition: A Review”, IEEE PAMI, 2000 (Section 7, Error Estimation).
- Assessing and Comparing Algorithms
  Reference: Adrian Clark and Christine Clark, “Performance Characterization in Computer Vision: A Tutorial”.
  - Receiver Operating Characteristic (ROC) curve
  - Detection Error Trade-off (DET) curve
  - Confusion Matrix
  - McNemar’s test

http://peipa.essex.ac.uk/benchmark/

Page 3:

Error Estimation in Pattern Recognition

Reference: Jain et al., “Statistical Pattern Recognition: A Review”, IEEE PAMI, 2000 (Section 7, Error Estimation).

- It is very difficult to obtain a closed-form expression for the error rate $P_e$.
- In practice, the error rate must be estimated from the available samples, which are split into training and test sets.
- Error estimate = percentage of misclassified test samples.
- A reliable error estimate requires (1) a large sample size and (2) independent training and test samples.

Page 4:

Error Estimation in Pattern Recognition

- The error estimate (a function of the specific training and test sets used) is a random variable.
- Given a classifier, let $t$ be the number of misclassified test samples out of $n$. The probability density function of $t$ is binomial.
- The maximum-likelihood estimate $\hat{P}_e$ of $P_e$ is $\hat{P}_e = t/n$, with $E(\hat{P}_e) = P_e$ and $\mathrm{Var}(\hat{P}_e) = P_e(1 - P_e)/n$.
- Because $\hat{P}_e$ is a random variable, it comes with a confidence interval, which shrinks as $n$ increases.
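To make these formulas concrete, here is a minimal Python sketch (the counts 12 and 200 are hypothetical) that computes $\hat{P}_e = t/n$ and an approximate 95% confidence interval from the variance above:

```python
import math

def error_estimate(t, n, z=1.96):
    """Maximum-likelihood error estimate t/n with an approximate 95%
    confidence interval (normal approximation to the binomial,
    reasonable for large n)."""
    p = t / n                        # ML estimate of the true error rate
    se = math.sqrt(p * (1 - p) / n)  # from Var = P_e(1 - P_e)/n
    return p, (p - z * se, p + z * se)

# Hypothetical counts: 12 misclassified out of 200 test samples.
p_hat, (lo, hi) = error_estimate(12, 200)
print(f"P_e estimate = {p_hat:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```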

Page 5:

Versions of the cross-validation approach

[Figure: versions of the cross-validation approach, including the ‘leave all in’ scheme, and bootstrap resampling based on the analogy population : sample :: sample : resample.]

http://www.uvm.edu/~dhowell/StatPages/Resampling/Bootstrapping.html
http://www.childrens-mercy.org/stats/ask/bootstrap.asp
http://www.cnr.colostate.edu/class_info/fw663/bootstrap.pdf
http://www.maths.unsw.edu.au/ForStudents/courses/math3811/lecture9.pdf
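As a rough illustration of the bootstrap idea behind these links, here is a minimal Python sketch, assuming 0/1 losses on a hypothetical 50-sample test set; it resamples the test sample with replacement, following the population : sample :: sample : resample analogy:

```python
import random

def bootstrap_ci(errors, n_boot=1000, seed=0):
    """Percentile bootstrap for the mean error rate: the test sample
    stands in for the population, and resamples drawn with replacement
    stand in for fresh samples from it."""
    rng = random.Random(seed)
    n = len(errors)
    means = sorted(
        sum(errors[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_boot)
    )
    # 95% percentile interval from the bootstrap distribution.
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]

# Hypothetical 0/1 losses on a 50-sample test set (6 errors).
losses = [1] * 6 + [0] * 44
print(bootstrap_ci(losses))
```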

Page 6:

Error Estimation in Pattern Recognition

- Receiver Operating Characteristic (ROC) curve: detailed later.
- ‘Reject rate’: reject doubtful patterns near the decision boundary (low confidence).
- A well-known reject option is to reject a pattern if its maximum a posteriori probability is below a threshold, as sketched below.
- There is a trade-off between the ‘reject rate’ and the ‘error rate’.
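A minimal Python sketch of this reject option; the posteriors and the 0.8 threshold are hypothetical:

```python
def classify_with_reject(posteriors, threshold=0.8):
    """Return the most probable class, or None (reject) when the maximum
    a posteriori probability falls below the threshold. Raising the
    threshold trades a lower error rate for a higher reject rate."""
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else None

# Hypothetical posteriors for a three-class problem.
print(classify_with_reject({"a": 0.55, "b": 0.30, "c": 0.15}))  # None: rejected
print(classify_with_reject({"a": 0.90, "b": 0.07, "c": 0.03}))  # 'a'
```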

Page 7:

Next seminar: Dimensionality Reduction / Manifold Learning?

Page 8:

[Figure: classification method]

Page 11:

Assessing and Comparing Algorithms

Reference: Adrian Clark and Christine Clark, “Performance Characterization in Computer Vision: A Tutorial”. http://peipa.essex.ac.uk/benchmark/tutorials/essex/tutorial.pdf

- Use the same training and test sets. Some standard sets: FERET, PETS.
- Simply seeing which algorithm has the better success rate is not enough; a standard statistical test, McNemar’s test, is required.
- Two types of testing:
  - Technology evaluation: the response of an underlying generic algorithm to factors such as adjustment of its tuning parameters, noisy input data, etc.
  - Application evaluation: how well an algorithm performs a particular task.

Page 12:

Assessing and Comparing Algorithms

Receiver Operating Characteristic (ROC) curve

- FP rate (false positive rate) $= \dfrac{FP}{FP + TN}$
- TP rate (true positive rate) $= \dfrac{TP}{TP + FN}$
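A minimal Python sketch of these definitions (the scores, labels, and thresholds are hypothetical); sweeping the threshold over the classifier’s scores traces out the ROC curve:

```python
def roc_points(scores, labels, thresholds):
    """For each decision threshold, count TP/FP/TN/FN and return the
    (FP rate, TP rate) pair; plotting the pairs traces the ROC curve."""
    points = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < th and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < th and y == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

# Hypothetical classifier scores (higher = more confidently positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_points(scores, labels, [0.25, 0.5, 0.75]))
```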

Page 13:

Assessing and Comparing Algorithms

Detection Error Trade-off (DET) curve

- logarithmic scales on both axes
- more spread out, so curves are easier to distinguish
- close to linear
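A minimal plotting sketch under those conventions, assuming matplotlib and the hypothetical roc_points helper from the ROC sketch above:

```python
import matplotlib.pyplot as plt

def plot_det(points):
    """Plot the DET curve: false negative rate (= 1 - TP rate) against
    false positive rate, with logarithmic scales on both axes, which
    spreads the curve out and often makes it close to linear."""
    eps = 1e-4  # clip zero rates so the log axes stay finite
    fpr = [max(fp, eps) for fp, tp in points]
    fnr = [max(1.0 - tp, eps) for fp, tp in points]
    plt.plot(fpr, fnr, marker="o")
    plt.xscale("log")
    plt.yscale("log")
    plt.xlabel("false positive rate")
    plt.ylabel("false negative rate")
    plt.show()

# plot_det(roc_points(scores, labels, thresholds))  # reusing the earlier sketch
```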

Page 14:

Assessing and Comparing Algorithms

Detection Error Trade-off (DET) curve

- Forensic applications: track down a suspect
- High-security applications: ATMs
- EER (equal error rate); see the sketch below
- Comparisons of algorithms tend to be performed with a specific set of tuning parameter values. (Running them with settings that correspond to the EER is probably the most sensible.)
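A minimal sketch of estimating the EER from sampled (FP rate, TP rate) operating points; the points below are hypothetical:

```python
def equal_error_rate(points):
    """Approximate the EER from sampled (FP rate, TP rate) operating
    points: pick the point where the FP rate and the FN rate
    (= 1 - TP rate) are closest to each other, and average the two."""
    fp, tp = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fp + (1.0 - tp)) / 2.0

# Hypothetical operating points sampled from a ROC curve.
print(equal_error_rate([(0.05, 0.60), (0.10, 0.85), (0.30, 0.95)]))  # ~0.125
```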

Page 15:

Assessing and Comparing Algorithms

Crossing ROC curves

When two ROC curves cross, neither algorithm is uniformly better. Comparisons of algorithms tend to be performed with a specific set of tuning parameter values; running them with settings that correspond to the EER is probably the most sensible.

Page 16:

Assessing and Comparing Algorithms

Confusion Matrices
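A confusion matrix is easiest to see built in code; a minimal Python sketch with hypothetical three-class labels:

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Cell (i, j) counts test samples of true class i that the
    classifier assigned to class j; the diagonal holds the correct
    decisions."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical three-class results.
true = ["cat", "cat", "dog", "dog", "bird", "bird"]
pred = ["cat", "dog", "dog", "dog", "bird", "cat"]
for row in confusion_matrix(true, pred, ["cat", "dog", "bird"]):
    print(row)
```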

Page 17:

Assessing and Comparing Algorithms

McNemar’s test

- An appropriate statistical test must take into account not only the numbers of false positives, etc., but also the number of tests.
- McNemar’s test is a form of chi-square test.

http://www.zephryus.demon.co.uk/geography/resources/fieldwork/stats/chi.html
http://www.isixsigma.com/dictionary/Chi_Square_Test-67.htm

Page 18:

Assessing and Comparing Algorithms

McNemar’s test

- If the number of tests is greater than 30, the central limit theorem applies.
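A minimal sketch of the test, assuming the common continuity-corrected form of the chi-square statistic; the discordant counts are hypothetical:

```python
def mcnemar_statistic(n_a_only, n_b_only):
    """McNemar's test for two classifiers evaluated on the same test set.
    n_a_only: tests only algorithm A got right; n_b_only: tests only
    algorithm B got right. With enough discordant tests (the slide's
    rule of thumb: more than 30, so the central limit theorem applies),
    the continuity-corrected statistic is approximately chi-square with
    one degree of freedom; values above 3.84 indicate a significant
    difference at the 5% level."""
    return (abs(n_a_only - n_b_only) - 1) ** 2 / (n_a_only + n_b_only)

# Hypothetical discordant counts from a shared test set.
print(mcnemar_statistic(25, 10))  # 5.6 > 3.84: significant difference
```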