abstract = "Symbolic regression based on Pareto Front GP is the
key approach for generating high-performance
parsimonious empirical models acceptable for industrial
applications. The paper addresses the issue of finding
the optimal parameter settings of Pareto Front GP which
direct the simulated evolution toward simple models
with acceptable prediction error. A generic methodology
based on statistical design of experiments is proposed.
It includes statistical determination of the number of
replicates by half-width confidence intervals,
determination of the significant inputs by fractional
factorial design of experiments, approaching the
optimum by steepest ascent/descent, and local
exploration around the optimum by Box-Behnken or by
central composite design of experiments. The results
from applying the proposed methodology to a
small-sized industrial data set show that the
statistically significant factors for symbolic
regression, based on Pareto Front GP, are the number of
cascades, the number of generations, and the population
size. A second-order regression model with a high R2 of
0.97 includes the three parameters, and their optimal
values have been defined. The optimal parameter
settings were validated with a separate small-sized
industrial data set. The optimal settings are
recommended for symbolic regression applications using
data sets with up to 5 inputs and up to 50 data
points.",
notes = "GECCO-2006 A joint meeting of the fifteenth
international conference on genetic algorithms
(ICGA-2006) and the eleventh annual genetic programming
conference (GP-2006).