The University of Montana
Department of Mathematical Sciences
Technical report #21/2008
Exact bootstrap k-nearest neighbor learners
Brian M. Steele
Dept. of Mathematical Sciences
The University of Montana
Missoula MT 59812
Abstract
Bootstrap aggregation, or bagging, is a method of reducing the prediction
error of a statistical learner. The goal of bagging is to construct a new learner which is
the expectation of the original learner with respect to the empirical distribution function. In nearly all cases, the expectation cannot be computed analytically, and bootstrap
sampling is used to produce an approximation. The k-nearest neighbor learners are
exceptions to this generalization, and exact bagging of many k-nearest neighbor learners
is straightforward. This article presents computationally simple and fast formulae for exact bagging of k-nearest neighbor learners and extends exact bagging methods from the
conventional bootstrap sampling (sampling n observations with replacement from a set
of n observations) to bootstrap sub-sampling schemes (with and without replacement).
In addition, a partially exact k-nearest neighbor regression learner is developed. The
article also compares the prediction error associated with elementary and exact bagging
k-nearest neighbor learners, and several other ensemble methods using a suite of publicly
available data sets.
Keywords: bagging, k-nearest neighbor, classication, regression, ensemble methods
AMS Subject Classification:
Download
Technical Report: pdf (171 KB)