Reinforcement learning with self-modifying policies
Created by W.Langdon from
gp-bibliography.bib Revision:1.8081
@InCollection{Schmidhuber:1997:Thrun,
author = "Juergen Schmidhuber and Jieyu Zhao and
Nicol N. Schraudolph",
title = "Reinforcement learning with self-modifying policies",
booktitle = "Learning to learn",
publisher = "Kluwer",
year = "1997",
editor = "S. Thrun and L. Pratt",
pages = "293--309",
keywords = "genetic algorithms, genetic programming",
URL = "ftp://ftp.idsia.ch/pub/juergen/ssabook.pdf",
URL = "http://www.idsia.ch/~juergen/ssabook/ssabook.html",
abstract = "A learner's modifiable components are called its
policy. An algorithm that modifies the policy is a
learning algorithm. If the learning algorithm has
modifiable components represented as part of the
policy, then we speak of a self-modifying policy (SMP).
SMPs can modify the way they modify themselves etc.
They are of interest in situations where the initial
learning algorithm itself can be improved by experience
-- this is what we call ``learning to learn''. How can
we force some (stochastic) SMP to trigger better and
better self-modifications? The success-story algorithm
(SSA) addresses this question in a lifelong
reinforcement learning context. During the learner's
life-time, SSA is occasionally called at times computed
according to SMP itself. SSA uses backtracking to undo
those SMP-generated SMP-modifications that have not
been empirically observed to trigger lifelong reward
accelerations (measured up until the current SSA call
-- this evaluates the long-term effects of
SMP-modifications setting the stage for later
SMP-modifications). SMP-modifications that survive SSA
represent a lifelong success history. Until the next
SSA call, they build the basis for additional
SMP-modifications. Solely by self-modifications our
SMP/SSA-based learners solve a complex task in a
partially observable environment (POE) whose state
space is far bigger than most reported in the POE
literature.",
}
Genetic Programming entries for
Jurgen Schmidhuber
Jieyu Zhao
Nicol N Schraudolph