A Bootstrap Interval Estimator for Bayes' Classification Error

Chad M. Hawes a,b, Carey E. Priebe a
a The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
b The Johns Hopkins University Applied Physics Laboratory

Abstract

• Given a finite-length classifier training set, we propose a new estimation approach that provides an interval estimate of the Bayes-optimal classification error L*, by:
  • Assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier
  • Constructing bootstrap-sampled training sets of varying size
  • Evaluating the kNN classifier on the bootstrap training sets to estimate its unconditional error rate
  • Fitting the resulting kNN error-rate decay, as a function of training-set size, to the assumed power-law form
• The standard kNN rule provides an upper bound on L*
• Hellman's (k,k') nearest neighbor rule with reject option provides a lower bound on L*
• The result is an asymptotic interval estimate of L* from a finite sample
• We apply this L* interval estimator to two classification datasets

Motivation

• Knowledge of the Bayes-optimal classification error L* tells us the best any classification rule could do on a given classification problem:
  • The difference between your classifier's error rate L_n and L* indicates how much improvement is possible through changes to your classifier, for a fixed feature set
  • If L* is small and |L_n − L*| is large, then it is worth spending time & money to improve your classifier
• Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
  • If L* is large and |L_n − L*| is small, then it is better to spend time & money finding better features (changing F_XY) than improving your classifier
• An estimate of the Bayes error L* is thus useful for deciding where to invest time & money: classifier improvement versus feature development

Theory

Model & Notation

• Training data: $D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, with feature vectors $X_i \in \mathbb{R}^d$ and class labels $Y_i \in \{0, 1\}$
• Testing data: $T_m = \{(X_1', Y_1'), \ldots, (X_m', Y_m')\}$
• From $D_n$ we build the k-nearest-neighbor (kNN) classification rule, denoted $g_n$
• Conditional probability of error for the kNN rule:
  Finite sample: $L_n = P(g_n(X) \ne Y \mid D_n)$; asymptotic: $L_\infty = \lim_{n \to \infty} L_n$
• Unconditional probability of error for the kNN rule:
  Finite sample: $\mathbb{E}[L_n]$; asymptotic: $\lim_{n \to \infty} \mathbb{E}[L_n] = L_\infty$
• The empirical distribution $\hat{F}_n$ puts mass 1/n on each of the n training samples

• No approach to estimating the Bayes error can work for all joint distributions F_XY:
  • Devroye 1982: For any (fixed) integer n, ε > 0, and classification rule g_n, there exists a distribution F_XY with Bayes error L* = 0 such that $\mathbb{E}[L_n] > 1/2 - \varepsilon$
  ⇒ there exist conditions on F_XY under which our technique applies
• Asymptotic kNN-rule error rates form an interval bound on L*:
  • Devijver 1979: For fixed k, $L_\infty^{(k,k')} \le L^* \le L_\infty^{(k)}$, where the lower bound is the asymptotic error rate of the (k,k') nearest neighbor rule with reject option (Hellman 1970)
  ⇒ if we estimate the asymptotic rates with a finite sample, we have an L* estimate
• The kNN rule's unconditional error follows a known form for a class of distributions F_XY:
  • Snapp & Venkatesh 1998: Under regularity conditions on F_XY, the finite-sample unconditional error rate of the kNN rule, for fixed k, follows the asymptotic expansion $\mathbb{E}[L_n] = L_\infty + \sum_{j=2}^{N} c_j\, n^{-j/d} + O(n^{-(N+1)/d})$
  ⇒ there exists a known parametric form for the kNN rule's error-rate decay

Approach: Part 1

1. Construct B bootstrap-sampled training datasets of size n_j from D_n using the empirical distribution $\hat{F}_n$:
  • For each bootstrap-constructed training dataset, estimate the kNN-rule conditional error rate on the test set T_m, yielding $\hat{L}_{n_j}^{(b)}$, b = 1, …, B
2. Estimate the mean & variance of $\hat{L}_{n_j}^{(b)}$ for training sample size n_j:
  • The mean provides an estimate of the unconditional error rate $\mathbb{E}[L_{n_j}]$
  • The variance is used for weighted fitting of the error-rate decay curve
3. Repeat steps 1 and 2 for the desired training sample sizes $n_1 < n_2 < \cdots < n_J$:
  • This yields the estimates $\hat{\mathbb{E}}[L_{n_j}]$, j = 1, …, J
4. Construct the estimated unconditional error-rate decay curve versus training sample size n

Approach: Part 2

1. Assume the kNN-rule error rates decay according to the simple power-law form $\mathbb{E}[L_n] \approx L_\infty + a\, n^{-b}$
2. Perform a weighted nonlinear least-squares fit to the constructed error-rate curve:
  • Use the variances of the bootstrapped conditional error-rate estimates as weights
3. The resulting estimate $\hat{L}_\infty$ forms the upper bound for L*:
  • The strong assumption on the form of the error-rate decay enables estimation of the asymptotic error rate using only a finite sample
4. Repeat the entire procedure using Hellman's (k,k') nearest neighbor rule with reject option to form the lower-bound estimate for L*:
  • This yields the interval estimate $[\hat{L}_\infty^{(k,k')}, \hat{L}_\infty^{(k)}]$ for the Bayes classification error

PMH Distribution

• The Priebe, Marchette, Healy (PMH) distribution has known L* = 0.0653
• Training size n = 200; test-set size m = 200
[Figure: error-rate decay versus training size; symbols are bootstrap estimates of the unconditional error rate, with fitted power-law curves and the resulting interval estimate]

Pima Indians

• The UCI Pima Indian Diabetes distribution has unknown L*; d = 8
• Training size n = 500; test-set size m = 268
[Figure: error-rate decay versus training size; symbols are bootstrap estimates of the unconditional error rate, with fitted power-law curves and the resulting interval estimate]

References

[1] Devijver, P. "New error bounds with the nearest neighbor rule," IEEE Trans. Information Theory, 25, 1979.
[2] Devroye, L. "Any discrimination rule can have an arbitrarily bad probability of error for finite sample size," IEEE Trans. Pattern Analysis & Machine Intelligence, 4, 1982.
[3] Hellman, M. "The nearest neighbor classification rule with a reject option," IEEE Trans. Systems Science & Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, & D. Healy. "Integrated sensing and processing decision trees," IEEE Trans. Pattern Analysis & Machine Intelligence, 26, 2004.
[5] Snapp, R. & S. Venkatesh. "Asymptotic expansions of the k nearest neighbor risk," Annals of Statistics, 26, 1998.
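The bootstrap procedure of Approach: Part 1 can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes binary {0,1} labels, Euclidean distance, and a brute-force majority-vote kNN; the function names `knn_predict` and `bootstrap_error_curve` are our own.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=5):
    """Brute-force kNN: majority vote among the k nearest training points
    under Euclidean distance (ties for even k go to class 1)."""
    preds = np.empty(len(test_X), dtype=int)
    for i, x in enumerate(test_X):
        d = np.linalg.norm(train_X - x, axis=1)
        nn = train_y[np.argsort(d)[:k]]
        preds[i] = int(nn.sum() * 2 >= k)
    return preds

def bootstrap_error_curve(X, y, X_test, y_test, sizes, B=20, k=5, rng=None):
    """For each training size n_j, draw B bootstrap training sets from the
    empirical distribution of (X, y) (sampling with replacement), evaluate
    the kNN rule on the fixed test set, and return the mean and variance of
    the conditional error-rate estimates at each n_j."""
    rng = np.random.default_rng(rng)
    means, variances = [], []
    for n_j in sizes:
        errs = []
        for _ in range(B):
            idx = rng.integers(0, len(X), size=n_j)  # bootstrap sample
            e = np.mean(knn_predict(X[idx], y[idx], X_test, k) != y_test)
            errs.append(e)
        means.append(np.mean(errs))       # estimate of E[L_{n_j}]
        variances.append(np.var(errs, ddof=1))  # weight for the later fit
    return np.array(means), np.array(variances)
```

The returned means trace the error-decay curve of step 4; the variances become the weights in the Part 2 fit.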
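Hellman's (k,k') rule used for the lower bound admits a similarly small sketch. This is our own illustration of the rule as described above: classify by majority vote only when at least k' of the k nearest neighbors agree, otherwise reject; the conditional error is computed over accepted points only. The -1 reject marker and function names are choices of this sketch.

```python
import numpy as np

def knn_reject_predict(train_X, train_y, test_X, k=5, k_prime=4):
    """Hellman's (k, k') nearest neighbor rule with reject option:
    predict a class only when at least k_prime of the k nearest
    neighbors share that label; otherwise reject (marked -1 here)."""
    preds = np.empty(len(test_X), dtype=int)
    for i, x in enumerate(test_X):
        d = np.linalg.norm(train_X - x, axis=1)
        nn = train_y[np.argsort(d)[:k]]
        ones = int(nn.sum())
        if ones >= k_prime:
            preds[i] = 1
        elif k - ones >= k_prime:
            preds[i] = 0
        else:
            preds[i] = -1  # reject: no k'-majority among the k neighbors
    return preds

def conditional_error_with_reject(preds, y_true):
    """Error rate among accepted (non-rejected) test points."""
    mask = preds != -1
    if not mask.any():
        return float("nan")
    return float(np.mean(preds[mask] != y_true[mask]))
```

Running the Part 1 bootstrap with this rule in place of the standard kNN rule produces the lower-bound decay curve.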
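The weighted power-law fit of Approach: Part 2 can be sketched with SciPy's `curve_fit`, passing the bootstrap standard deviations through `sigma` so that noisier points receive less weight. This is a sketch of the fitting step, not the authors' code; the initial guess, parameter bounds, and variance floor are our assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, L_inf, a, b):
    """Assumed decay form: E[L_n] ~ L_inf + a * n**(-b)."""
    return L_inf + a * n ** (-b)

def fit_error_decay(sizes, mean_errs, var_errs):
    """Weighted nonlinear least-squares fit of the error-decay curve.
    Returns (L_inf_hat, a_hat, b_hat); L_inf_hat is the finite-sample
    estimate of the asymptotic error rate (the interval endpoint)."""
    n = np.asarray(sizes, dtype=float)
    sigma = np.sqrt(np.maximum(var_errs, 1e-12))  # guard zero variances
    popt, _ = curve_fit(
        power_law, n, mean_errs,
        p0=[0.1, 1.0, 0.5],                   # rough starting point
        sigma=sigma, absolute_sigma=True,
        bounds=([0.0, 0.0, 0.0], [1.0, np.inf, 5.0]),  # error rate in [0,1]
    )
    return popt
```

Fitting the standard-kNN curve gives the upper endpoint; refitting the curve produced with the (k,k') reject rule gives the lower endpoint of the interval estimate.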