Genetic Programming for Reward Function Search
Created by W.Langdon from gp-bibliography.bib Revision:1.8051
@Article{Niekum:2010:ieeeTAMD,
  author =       "Scott Niekum and Andrew G. Barto and Lee Spector",
  title =        "Genetic Programming for Reward Function Search",
  journal =      "IEEE Transactions on Autonomous Mental Development",
  year =         "2010",
  month =        jun,
  volume =       "2",
  number =       "2",
  pages =        "83--90",
  keywords =     "genetic algorithms, genetic programming, agent
                  learning performance, genetic programming algorithm,
                  intrinsic motivation, nonstationary environment,
                  psychological notion, reinforcement learning, task
                  based reward function, learning (artificial
                  intelligence)",
  ISSN =         "1943-0604",
  URL =          "https://www.ece.uvic.ca/~bctill/papers/ememcog/Niekum_etal_2010.pdf",
  DOI =          "doi:10.1109/TAMD.2010.2051436",
  size =         "8 pages",
abstract = "Reward functions in reinforcement learning have
largely been assumed to be given as part of the problem
being solved by the agent. However, the psychological
notion of intrinsic motivation has recently inspired
inquiry into whether there exist alternate reward
functions that enable an agent to learn a task more
easily than the natural task-based reward function
allows. We present a genetic programming algorithm to
search for alternate reward functions that improve
agent learning performance. We present experiments that
show the superiority of these reward functions,
demonstrate the possible scalability of our method, and
define three classes of problems where reward function
search might be particularly useful: distributions of
environments, nonstationary environments, and problems
with short agent lifetimes.",
-
notes = "Hungry-Thirsty 6 by 6 closed square grid world
food/water sources can only be found in two of the four
corners (12 possibilities). Q-learning, GP teaching
'shaping' function applied as addition to usual RL
reward scheme. Markov. Single agent in one of
2*2*(2**36) states? GP has (may have) agent's hunger,
thirst, x,y co-ordinates, noise. Float only (much like
ordinary tree GP rather than PushGP). Although
statistically significant GP improved agents increase
seems slight in static case but more (Fig 4) in dynamic
cases and experiments when the agents are short
lived.
Java implementation of PushGP called Psh
http://github.com/jonklein/Psh
Also known as \cite{5473118}",
- }
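The notes above describe the mechanism at a high level: a GP-evolved, float-valued expression over the agent's observed features is added to the ordinary task-based reward, and the sum drives Q-learning. Below is a minimal sketch of that wiring, in Python rather than the Psh/Java implementation mentioned above; the toy environment dynamics, the feature set and the particular shaping expression are illustrative placeholders, not the paper's.

# Sketch only, not the authors' code. Shows a GP-evolved reward term added
# to the task reward inside tabular Q-learning on a Hungry-Thirsty-style
# 6 by 6 grid. Shaping expression and toy dynamics are assumptions.
import random
from collections import defaultdict

GRID = 6                                    # 6 by 6 closed grid world
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def task_reward(hungry):
    # Objective, task-based reward: positive only while the agent is fed.
    return 1.0 if not hungry else 0.0

def evolved_reward(hungry, thirsty, x, y):
    # Placeholder for a GP-evolved float expression over the agent's
    # observable features (hunger, thirst, x, y, noise).
    return -0.5 * thirsty + 0.1 * random.gauss(0.0, 1.0)

def toy_step(state, action):
    # Toy dynamics: move on the grid; hunger and thirst flip at random.
    x, y, hungry, thirsty = state
    x = min(GRID - 1, max(0, x + action[0]))
    y = min(GRID - 1, max(0, y + action[1]))
    return (x, y, random.random() < 0.5, random.random() < 0.5)

def run_episode(Q, alpha=0.1, gamma=0.95, epsilon=0.1, steps=200):
    state = (0, 0, True, True)              # (x, y, hungry, thirsty)
    score = 0.0                             # scored on the task reward only
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
        nxt = toy_step(state, ACTIONS[a])
        x, y, hungry, thirsty = nxt
        # Learning signal = task reward + evolved shaping term.
        r = task_reward(hungry) + evolved_reward(hungry, thirsty, x, y)
        best = max(Q[(nxt, i)] for i in range(len(ACTIONS)))
        Q[(state, a)] += alpha * (r + gamma * best - Q[(state, a)])
        score += task_reward(hungry)
        state = nxt
    return score

Q = defaultdict(float)
print(run_episode(Q))

Note that the episode score accumulates the task reward alone: in a reward-function search of this kind the evolved term only alters the learning signal, and the fitness of a candidate reward function would presumably be judged by how much objective, task-based reward the learning agent goes on to earn.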
Genetic Programming entries for Scott Niekum, Andrew G Barto, Lee Spector