abstract = "Algorithms that learn through environmental
interaction and delayed rewards, or reinforcement
learning, increasingly face the challenge of scaling to
dynamic, high-dimensional environments. Video games
model these kinds of real-world decision-making and
control scenarios while remaining simple enough to
implement in an experimental setting. This work demonstrates
how emergent modularity and open-ended evolution allow
genetic programming (GP) to discover strategies for
difficult gaming scenarios while maintaining relatively
low model complexity. Two related learning algorithms
are considered: Policy Trees and Tangled Program Graphs
(TPG). In the case of Policy Trees, a methodology for
transfer learning is proposed that specifically
leverages both structural and behavioural modularity in
the learner representation. The utility of the approach
is empirically evaluated in two challenging task
domains: RoboCup Soccer and Ms. Pac-Man. In RoboCup,
decision-making policies are first evolved for simple
subtasks and then reused within a policy hierarchy in
order to learn the more complex task of Half-Field
Offense. The same methodology is applied to Ms.
Pac-Man, in which case the use of task-agnostic
diversity maintenance enables the automatic discovery
of suitable sub-policies, removing the need for a prior
human-specified task decomposition. In both task
domains, the final GP decision-making policies reach
state-of-the-art levels of play while being
significantly less complex than solutions from temporal
difference methods and neuroevolution.
TPG takes a more open-ended approach
to modularity, emphasizing the ability to adaptively
complexify policies through interaction with the task
environment. The challenging Atari video game
environment is used to show that this approach builds
decision-making policies that broadly match the quality
of several deep learning methods while being several
orders of magnitude less computationally demanding in
terms of both sample efficiency and model
complexity. Finally, the approach is capable of
evolving solutions to multiple game titles
simultaneously with no additional computational cost.
In this case, agent behaviours for individual games, as
well as single agents capable of playing up to 5 games,
emerge from the same evolutionary run.",