Interpreting Tangled Program Graphs Under Partially Observable Dota 2 Invoker Tasks
Created by W.Langdon from
gp-bibliography.bib Revision:1.7970
@Article{Smith:2024:TAI,
author = "Robert J. Smith and Malcolm I. Heywood",
title = "Interpreting Tangled Program Graphs Under Partially
Observable Dota 2 Invoker Tasks",
journal = "IEEE Transactions on Artificial Intelligence",
year = "2024",
volume = "5",
number = "4",
pages = "1511--1524",
month = apr,
keywords = "genetic algorithms, genetic programming, Task
analysis, Artificial intelligence, AI, XAI,
Reinforcement learning, Registers, Complexity theory,
Visualization, Emergence, evolutionary computation,
interpretable machine learning",
ISSN = "2691-4581",
DOI = "10.1109/TAI.2023.3279057",
size = "14 pages",
abstract = "Interpretable learning agents directly construct
models that provide insight into the relationships
learnt. Moreover, to date, there has been a lot of
emphasis on interpreting reactive models developed for
supervised learning tasks. In this article, we consider
the case of models developed to address a suite of six
partially observable tasks defined in the Dota 2 Online
Battle Arena game engine. This means that learning
agents need to make decisions based on the previous
state as developed by the learning agent's memory, in
addition to a 310-D state vector provided by the game
engine. Interpretability is addressed by adopting the
tangled program graph approach to developing learning
agents. Thus, decision making is explicitly
divide-and-conquer, with different parts of the
resulting graph visited depending on the task context.
We demonstrate that programs comprising the tangled
program graph approach self-organize such that: 1)
small subsets of task features are identified to define
conditions under which indexed memory is written and 2)
the subset of programs responsible for defining actions
typically query indexed memory rather than task
features. Particular preferences emerge for different
tasks; thus, the blocking (or evasion) tasks result in
a preference for specific actions, whereas more
open-ended tasks assume policies based on combinations
of behaviors. In short, the ability to evolve the
topology of the learning agent provides insights into
how the policies are being constructed for addressing
partially observable tasks.",
notes = "also known as \cite{10133893}",
}