abstract = "We consider the problem of optimizing a controller for
agents whose observation and action spaces are
continuous, i.e., where the controller is a
multivariate real function. We use genetic programming
(GP) for solving this optimization problem. Namely, we
employ a multi-tree-based GP variant, where a candidate
solution is an array of m trees, each encoding one
scalar component of the action as a function of the
agent observation. We
compare this form of optimization against the more
common one where the controller is a multi-layer
perceptron, with a predefined topology, whose weights
are optimized through (neuro)evolution (NE). Moreover,
we consider an evolutionary algorithm, GraphEA, that
directly evolves graphs, each having n input nodes and
m output nodes. We apply these three approaches to the
case of simulated modular soft robots, where a robot is
an aggregation of identical soft modules, each
employing a controller that processes the local
observation and produces the local action. We find
that, in our scenario, multi-tree-based GP is
competitive with NE and tends to produce different
behaviors. We then experimentally investigate the
possibility of optimizing a controller using another,
pre-optimized one as a teacher, i.e., we realize a form
of offline imitation learning. We consider all the
teacher-learner pairs resulting from the three
evolutionary algorithms and find that NE is a better
learner than GP and GraphEA. However, controllers
obtained through offline imitation learning are far
less effective than those obtained through direct
evolution. We hypothesize that this gap in
effectiveness may be explained by the fact that direct
evolution allows exploring, during the simulations, a
larger portion of the observation-action space.",