abstract = "Historically, the quality of a solution in Genetic
Programming (GP) was often assessed based on its
performance on a given training sample. However, in
Machine Learning, we are more interested in achieving
reliable estimates of the quality of the evolving
individuals on unseen data. In this paper, we propose
to simulate the effect of unseen data during training
without actually using any additional data. We do this
by employing a technique called bootstrapping that
repeatedly re-samples with replacement from the
training data and helps estimate the sensitivity of the
individual in question to small variations across these
re-sampled data sets. We minimise this sensitivity, as
measured by the Bootstrap Standard Error, together with
the training error, in an effort to evolve models that
generalise better to unseen data.
We evaluate the proposed technique on four binary
classification problems and compare it with a standard
GP approach. The results show that, on the problems
undertaken, the proposed method not only generalises
significantly better than standard GP while improving
training performance, but also has the strong side
effect of containing tree sizes.",
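
A minimal sketch of the bootstrap component the abstract
describes, assuming an individual's quality is summarised by
its mean per-case error; the function names, the choice of
100 resamples, and the additive combination of training error
and sensitivity are illustrative assumptions, not details
taken from the paper:

import numpy as np

def bootstrap_se(errors, n_resamples=100, rng=None):
    """Bootstrap Standard Error of the mean training error.

    Repeatedly re-samples the per-case errors with
    replacement and returns the standard deviation of the
    resampled means, i.e. the individual's sensitivity to
    small variations in the training sample.
    """
    rng = np.random.default_rng(rng)
    errors = np.asarray(errors, dtype=float)
    n = errors.shape[0]
    means = np.array([
        rng.choice(errors, size=n, replace=True).mean()
        for _ in range(n_resamples)
    ])
    return means.std(ddof=1)

def fitness(errors):
    # Hypothetical combined objective: training error plus
    # the sensitivity penalty; the paper's actual way of
    # combining the two terms may differ.
    return np.mean(errors) + bootstrap_se(errors)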