abstract = "Generalization is a very important issue in Machine
Learning. In this paper, we present a new idea for
improving Genetic Programming generalization ability.
The idea is based on a dynamic two-layered selection
algorithm and it is tested on a real-life drug
discovery regression application. The algorithm begins
using root mean squared error as fitness and the usual
tournament selection. A list of individuals called
``repulsors'' is also kept in memory and initialized as
empty. When an individual is found to overfit the
training set, it is inserted into the list of
repulsors. When the list of repulsors is not empty,
selection becomes a two-layer algorithm: individuals
participating in the tournament are not randomly chosen
from the population but are themselves selected, using
the average dissimilarity to the repulsors as a
criterion to be maximized. Two kinds of
similarity/dissimilarity measures are tested for this
purpose: the well-known structural (or edit) distance and
the recently defined subtree crossover based similarity
measure. Although simple, this idea seems to improve
Genetic Programming generalization ability and the
presented experimental results show that Genetic
Programming generalizes better when the subtree
crossover based similarity measure is used, at least for the test
problems studied in this paper.",
notes = "GECCO-2009 A joint meeting of the eighteenth
international conference on genetic algorithms
(ICGA-2009) and the fourteenth annual genetic
programming conference (GP-2009).