Here's the "0.632 bootstrap":
Given a dataset D containing N training instances, generate a dataset D' by randomly selecting N instances from D with replacement, that is:

    initialize D' as an empty set
    repeat N times:
        randomly select an instance x from D
        add x to D'

Note that the selection is always made from the original D; this is what "sampling with replacement" means.
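A minimal Python sketch of this sampling step is shown below; the function name bootstrap_sample and the assumption that D is a plain Python list are illustrative choices, not part of the pseudocode above.

    import random

    def bootstrap_sample(D):
        """Draw len(D) instances from D with replacement (one bootstrap sample)."""
        # Every draw is made from the original D, never from the partially built sample,
        # so the same instance can appear several times.
        return [random.choice(D) for _ in range(len(D))]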
Since each of the N draws misses a given instance with probability 1 - 1/N:

=> P(a given instance is not chosen in any of the N passes) = (1 - 1/N)^N ≈ 1/e ≈ 0.368
=> P(a given instance is chosen at least once) = 1 - 0.368 = 0.632

The above may be used to evaluate a classifier: the 0.632 bootstrap sample D' is used for training, and the remaining 0.368N or so instances of D (those never chosen) are used for testing.
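To see the 0.632/0.368 split empirically, one can count how many distinct instances land in a bootstrap sample and how many are never drawn (the held-out test instances). The index-based sketch below is illustrative and assumes the dataset fits in a Python list; bootstrap_split is not a standard library function.

    import random

    def bootstrap_split(D):
        """Return (bootstrap training sample, never-chosen test instances) for one pass over D."""
        N = len(D)
        picked = [random.randrange(N) for _ in range(N)]    # N index draws, with replacement
        picked_set = set(picked)
        train = [D[i] for i in picked]
        test = [D[i] for i in range(N) if i not in picked_set]
        return train, test

    D = list(range(100_000))
    train, test = bootstrap_split(D)
    print(len(set(train)) / len(D))   # roughly 0.632 distinct instances in the training sample
    print(len(test) / len(D))         # roughly 0.368 instances left over for testing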
Bagging (prediction algorithm A, dataset D, iterations T)

1) model generation
    for i = 1 to T:
        generate a bootstrap sample D(i) from D
        let M(i) be the result of training A on D(i)

2) prediction for a given test instance x
    for i = 1 to T:
        let C(i) = output of M(i) on x
    return the class that appears most often among C(1)..C(T)
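The same procedure in runnable Python might look like the following sketch; the Bagger class, the make_learner factory, and the fit/predict interface assumed for the base models are illustrative assumptions rather than anything prescribed by the pseudocode.

    import random
    from collections import Counter

    class Bagger:
        """Train T copies of a base learner on bootstrap samples; predict by majority vote."""

        def __init__(self, make_learner, T):
            self.make_learner = make_learner   # factory returning a fresh, untrained learner
            self.T = T
            self.models = []

        def fit(self, D):
            # D is assumed to be a list of labelled training instances; each model M(i)
            # is trained on its own bootstrap sample D(i) of size N = len(D).
            N = len(D)
            for _ in range(self.T):
                sample = [random.choice(D) for _ in range(N)]
                model = self.make_learner()
                model.fit(sample)              # assumed interface: fit(list of instances)
                self.models.append(model)

        def predict(self, x):
            # C(i) = output of M(i) on x; return the most frequent class among the T votes.
            votes = Counter(model.predict(x) for model in self.models)
            return votes.most_common(1)[0][0]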
The classifier's error rate is the sum of two components:
- the classifier algorithm's intrinsic "bias", i.e. its error rate on an infinitely large training dataset
- the "variance" term associated with variations in training datasets of a given finite size
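In symbols (a schematic notation of my own, not Breiman's formal definitions), with A the learning algorithm and N the training-set size:

    error(A, N) = bias(A) + variance(A, N)

where bias(A) is what remains as N grows without bound, and variance(A, N) is the finite-sample contribution that bagging aims to reduce.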
Breiman (1996) suggested that bagging works by reducing the variance component of the error. Experiments carried out by Bauer and Kohavi (1999) confirm this statement for "unstable" learning techniques such as C4.5-like decision tree learners. However, for "stable" learners such as naive Bayes, bagging can actually increase the variance, as the size of the training dataset used for each learner is effectively reduced by bootstrap sampling.