A Bootstrap Interval Estimator for Bayes' Classification Error

Chad M. Hawes a,b, Carey E. Priebe a
a The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
b The Johns Hopkins University Applied Physics Laboratory

Abstract

• Given a finite-length classifier training set, we propose a new estimation approach that provides an interval estimate of the Bayes-optimal classification error L*, by:
  • Assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier
  • Constructing bootstrap-sampled training sets of varying size
  • Evaluating the kNN classifier on the bootstrap training sets to estimate its unconditional error rate
  • Fitting the resulting kNN error-rate decay, as a function of training-set size, to the assumed power-law form
• The standard kNN rule provides an upper bound on L*
• Hellman's (k,k') nearest neighbor rule with reject option provides a lower bound on L*
• The result is an asymptotic interval estimate of L* from a finite sample
• We apply this L* interval estimator to two classification datasets

Motivation

• Knowledge of the Bayes-optimal classification error L* tells us the best any classification rule could do on a given classification problem:
  • The difference between your classifier's error rate L_n and L* indicates how much improvement is possible through changes to your classifier, for a fixed feature set
  • If L* is small and |L_n − L*| is large, then it is worth spending time & money to improve your classifier
• Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
  • If L* is large and |L_n − L*| is small, then it is better to spend time & money finding better features (changing F_XY) than improving your classifier
• An estimate of the Bayes error L* is thus useful for deciding where to invest time & money: classifier improvement versus feature development

Theory

Model & Notation

• Training data: $D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, with feature vectors $X_i \in \mathbb{R}^d$ and class labels $Y_i \in \{0, 1\}$
• Testing data: $T_m = \{(X_1', Y_1'), \ldots, (X_m', Y_m')\}$
• From $D_n$ we build the k-nearest-neighbor (kNN) classification rule, denoted $g_n$
• Conditional probability of error for the kNN rule:
  Finite sample: $L_n = P(g_n(X) \ne Y \mid D_n)$; asymptotic: $L_\infty = \lim_{n \to \infty} L_n$
• Unconditional probability of error for the kNN rule:
  Finite sample: $\mathbb{E}[L_n]$; asymptotic: $\lim_{n \to \infty} \mathbb{E}[L_n] = L_\infty$
• The empirical distribution $\hat{F}_n$ puts mass 1/n on each of the n training samples

• No approach to estimating the Bayes error can work for all joint distributions F_XY:
  • Devroye 1982: For any (fixed) integer n, ε > 0, and classification rule g_n, there exists a distribution F_XY with Bayes error L* = 0 such that $\mathbb{E}[L_n] > 1/2 - \varepsilon$
  ⇒ there exist conditions on F_XY under which our technique applies
• Asymptotic kNN-rule error rates form an interval bound on L*:
  • Devijver 1979: For fixed k, $L_\infty^{(k,k')} \le L^* \le L_\infty^{(k)}$, where the lower bound is the asymptotic error rate of the (k,k') nearest neighbor rule with reject option (Hellman 1970)
  ⇒ if we estimate the asymptotic rates with a finite sample, we have an L* estimate
• The kNN rule's unconditional error follows a known form for a class of distributions F_XY:
  • Snapp & Venkatesh 1998: Under regularity conditions on F_XY, the finite-sample unconditional error rate of the kNN rule, for fixed k, follows the asymptotic expansion $\mathbb{E}[L_n] = L_\infty + \sum_{j=2}^{N} c_j\, n^{-j/d} + O(n^{-(N+1)/d})$
  ⇒ there exists a known parametric form for the kNN rule's error-rate decay

Approach: Part 1

1. Construct B bootstrap-sampled training datasets of size n_j from D_n using the empirical distribution $\hat{F}_n$:
  • For each bootstrap-constructed training dataset, estimate the kNN-rule conditional error rate on the test set T_m, yielding $\hat{L}_{n_j}^{(b)}$, b = 1, …, B
2. Estimate the mean & variance of $\hat{L}_{n_j}^{(b)}$ for training sample size n_j:
  • The mean provides an estimate of the unconditional error rate $\mathbb{E}[L_{n_j}]$
  • The variance is used for weighted fitting of the error-rate decay curve
3. Repeat steps 1 and 2 for the desired training sample sizes $n_1 < n_2 < \cdots < n_J$:
  • This yields the estimates $\hat{\mathbb{E}}[L_{n_j}]$, j = 1, …, J
4. Construct the estimated unconditional error-rate decay curve versus training sample size n

Approach: Part 2

1. Assume the kNN-rule error rates decay according to the simple power-law form $\mathbb{E}[L_n] \approx L_\infty + a\, n^{-b}$
2. Perform a weighted nonlinear least-squares fit to the constructed error-rate curve:
  • Use the variances of the bootstrapped conditional error-rate estimates as weights
3. The resulting estimate $\hat{L}_\infty$ forms the upper bound for L*:
  • The strong assumption on the form of the error-rate decay enables estimation of the asymptotic error rate using only a finite sample
4. Repeat the entire procedure using Hellman's (k,k') nearest neighbor rule with reject option to form the lower-bound estimate for L*:
  • This yields the interval estimate $[\hat{L}_\infty^{(k,k')}, \hat{L}_\infty^{(k)}]$ for the Bayes classification error

PMH Distribution

• The Priebe, Marchette, Healy (PMH) distribution has known L* = 0.0653
• Training size n = 200; test-set size m = 200
[Figure: error-rate decay versus training size; symbols are bootstrap estimates of the unconditional error rate, with fitted power-law curves and the resulting interval estimate]

Pima Indians

• The UCI Pima Indian Diabetes distribution has unknown L*; d = 8
• Training size n = 500; test-set size m = 268
[Figure: error-rate decay versus training size; symbols are bootstrap estimates of the unconditional error rate, with fitted power-law curves and the resulting interval estimate]

References

[1] Devijver, P. "New error bounds with the nearest neighbor rule," IEEE Trans. Information Theory, 25, 1979.
[2] Devroye, L. "Any discrimination rule can have an arbitrarily bad probability of error for finite sample size," IEEE Trans. Pattern Analysis & Machine Intelligence, 4, 1982.
[3] Hellman, M. "The nearest neighbor classification rule with a reject option," IEEE Trans. Systems Science & Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, & D. Healy. "Integrated sensing and processing decision trees," IEEE Trans. Pattern Analysis & Machine Intelligence, 26, 2004.
[5] Snapp, R. & S. Venkatesh. "Asymptotic expansions of the k nearest neighbor risk," Annals of Statistics, 26, 1998.
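The bootstrap procedure of Approach: Part 1 can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes binary {0,1} labels, Euclidean distance, and a brute-force majority-vote kNN; the function names `knn_predict` and `bootstrap_error_curve` are our own.

```python
import numpy as np

def knn_predict(train_X, train_y, test_X, k=5):
    """Brute-force kNN: majority vote among the k nearest training points
    under Euclidean distance (ties for even k go to class 1)."""
    preds = np.empty(len(test_X), dtype=int)
    for i, x in enumerate(test_X):
        d = np.linalg.norm(train_X - x, axis=1)
        nn = train_y[np.argsort(d)[:k]]
        preds[i] = int(nn.sum() * 2 >= k)
    return preds

def bootstrap_error_curve(X, y, X_test, y_test, sizes, B=20, k=5, rng=None):
    """For each training size n_j, draw B bootstrap training sets from the
    empirical distribution of (X, y) (sampling with replacement), evaluate
    the kNN rule on the fixed test set, and return the mean and variance of
    the conditional error-rate estimates at each n_j."""
    rng = np.random.default_rng(rng)
    means, variances = [], []
    for n_j in sizes:
        errs = []
        for _ in range(B):
            idx = rng.integers(0, len(X), size=n_j)  # bootstrap sample
            e = np.mean(knn_predict(X[idx], y[idx], X_test, k) != y_test)
            errs.append(e)
        means.append(np.mean(errs))       # estimate of E[L_{n_j}]
        variances.append(np.var(errs, ddof=1))  # weight for the later fit
    return np.array(means), np.array(variances)
```

The returned means trace the error-decay curve of step 4; the variances become the weights in the Part 2 fit.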
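Hellman's (k,k') rule used for the lower bound admits a similarly small sketch. This is our own illustration of the rule as described above: classify by majority vote only when at least k' of the k nearest neighbors agree, otherwise reject; the conditional error is computed over accepted points only. The -1 reject marker and function names are choices of this sketch.

```python
import numpy as np

def knn_reject_predict(train_X, train_y, test_X, k=5, k_prime=4):
    """Hellman's (k, k') nearest neighbor rule with reject option:
    predict a class only when at least k_prime of the k nearest
    neighbors share that label; otherwise reject (marked -1 here)."""
    preds = np.empty(len(test_X), dtype=int)
    for i, x in enumerate(test_X):
        d = np.linalg.norm(train_X - x, axis=1)
        nn = train_y[np.argsort(d)[:k]]
        ones = int(nn.sum())
        if ones >= k_prime:
            preds[i] = 1
        elif k - ones >= k_prime:
            preds[i] = 0
        else:
            preds[i] = -1  # reject: no k'-majority among the k neighbors
    return preds

def conditional_error_with_reject(preds, y_true):
    """Error rate among accepted (non-rejected) test points."""
    mask = preds != -1
    if not mask.any():
        return float("nan")
    return float(np.mean(preds[mask] != y_true[mask]))
```

Running the Part 1 bootstrap with this rule in place of the standard kNN rule produces the lower-bound decay curve.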
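The weighted power-law fit of Approach: Part 2 can be sketched with SciPy's `curve_fit`, passing the bootstrap standard deviations through `sigma` so that noisier points receive less weight. This is a sketch of the fitting step, not the authors' code; the initial guess, parameter bounds, and variance floor are our assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, L_inf, a, b):
    """Assumed decay form: E[L_n] ~ L_inf + a * n**(-b)."""
    return L_inf + a * n ** (-b)

def fit_error_decay(sizes, mean_errs, var_errs):
    """Weighted nonlinear least-squares fit of the error-decay curve.
    Returns (L_inf_hat, a_hat, b_hat); L_inf_hat is the finite-sample
    estimate of the asymptotic error rate (the interval endpoint)."""
    n = np.asarray(sizes, dtype=float)
    sigma = np.sqrt(np.maximum(var_errs, 1e-12))  # guard zero variances
    popt, _ = curve_fit(
        power_law, n, mean_errs,
        p0=[0.1, 1.0, 0.5],                   # rough starting point
        sigma=sigma, absolute_sigma=True,
        bounds=([0.0, 0.0, 0.0], [1.0, np.inf, 5.0]),  # error rate in [0,1]
    )
    return popt
```

Fitting the standard-kNN curve gives the upper endpoint; refitting the curve produced with the (k,k') reject rule gives the lower endpoint of the interval estimate.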