Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model
Created by W.Langdon from
gp-bibliography.bib Revision:1.8178
@InProceedings{Heywood:2019:GPTP,
  author = "Robert J. Smith and Malcolm I. Heywood",
  title = "Evolving a {Dota 2 Hero} Bot with a Probabilistic
    Shared Memory Model",
  booktitle = "Genetic Programming Theory and Practice XVII",
  year = "2019",
  editor = "Wolfgang Banzhaf and Erik Goodman and
    Leigh Sheneman and Leonardo Trujillo and Bill Worzel",
  pages = "345--366",
  address = "East Lansing, MI, USA",
  month = "16-19 " # may,
  publisher = "Springer",
  keywords = "genetic algorithms, genetic programming",
  isbn13 = "978-3-030-39957-3",
  DOI = "doi:10.1007/978-3-030-39958-0_17",
  abstract = "Reinforcement learning (RL) tasks have often assumed a
    Markov decision process, which is to say, state
    information is complete, hence there is no need to
    learn what to learn from. However, recent advances
    (such as visual reinforcement learning) have enabled
    the tasks typically addressed using RL to expand to
    include significant amounts of partial observability.
    This implies that the representation needs to support
    multiple forms of memory, thus credit assignment needs
    to: find efficient ways to encode high dimensional
    data, as well as determining under what conditions to
    save and recall specific pieces of information, and for
    what purpose. In this work, we assume the tangled
    program graph (TPG) formulation for genetic
    programming, where this has already demonstrated
    competitiveness with deep learning solutions to
    multiple RL tasks (under complete information). In this
    work, TPG is augmented with indexed memory using a
    probabilistic formulation of a write operation (defines
    long and short term memory) and an indexed read.
    Moreover, the register information specific to the
    programs co-operating within a program is used to
    provide the low dimensional encoding of state. We
    demonstrate that TPG can then successfully evolve a
    behaviour for a hero bot in the Dota 2 game engine when
    playing in a single lane 1-on-1 configuration with the
    game engine hero bot as the opponent. Specific
    recommendations are made regarding the design of an
    appropriate fitness function. We show that TPG without
    indexed memory completely fails to learn any useful
    behaviour. Only with indexed memory are useful hero
    behaviours discovered.",
  notes = "Part of \cite{Banzhaf:2019:GPTP}, published after the
    workshop",
}