Created by W.Langdon from gp-bibliography.bib Revision:1.8194
In this thesis we investigate methods for selecting the optimal action by applying numerical optimisation techniques when artificial neural networks are used to approximate the value function. The literature states that gradient-ascent methods can be applied to action selection [47]. It also states that solving this optimisation problem would be infeasible, and therefore claims that a second artificial neural network is needed to approximate the policy function [21,55].
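To make the idea of selecting an action by gradient ascent concrete, here is a minimal sketch. It is not the thesis's implementation. It assumes a scalar state, a scalar action bounded to a hypothetical range [-2, 2], and a hypothetical hand-weighted network with two tanh hidden units standing in for a trained critic Q(s, a). The names `HIDDEN`, `q_value`, `dq_da` and `greedy_action`, and the step size and iteration limits, are illustrative choices. The derivative of the network output with respect to the action input is back-propagated analytically. The action is then moved uphill and projected back into its bounds.

```python
import math

# Hypothetical toy critic: Q(s, a) = tanh(a - s + 1) - tanh(a - s - 1).
# It stands in for a trained single-hidden-layer network with two tanh units;
# by construction its maximum over a lies at a = s.
HIDDEN = [  # (state weight, action weight, bias, output weight) per tanh unit
    (-1.0, 1.0, 1.0, 1.0),
    (-1.0, 1.0, -1.0, -1.0),
]


def q_value(s, a):
    """Network output Q(s, a) for a scalar state and a scalar action."""
    return sum(v * math.tanh(w * s + u * a + b) for w, u, b, v in HIDDEN)


def dq_da(s, a):
    """Analytic derivative dQ/da, back-propagated through the tanh units."""
    return sum(
        v * u * (1.0 - math.tanh(w * s + u * a + b) ** 2) for w, u, b, v in HIDDEN
    )


def greedy_action(s, a_min=-2.0, a_max=2.0, lr=0.5, max_iters=500, tol=1e-10):
    """Approximate argmax_a Q(s, a) by projected gradient ascent on the action."""
    a = 0.5 * (a_min + a_max)  # start in the middle of the action range
    for _ in range(max_iters):
        # Take an ascent step, then project back into [a_min, a_max].
        a_next = max(a_min, min(a_max, a + lr * dq_da(s, a)))
        if abs(a_next - a) < tol:
            return a_next
        a = a_next
    return a
```

With these weights, `greedy_action(0.3)` returns approximately 0.3. `greedy_action(2.5)` stops at the upper bound 2.0, because Q is still increasing there. In a SARSA-style agent, this greedy step would usually be combined with some exploration rule. Gradient ascent only guarantees a local maximum, so a network with several peaks in Q may need restarts.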
The major contributions of this thesis include an investigation into whether numerical optimisation methods can be applied to action selection, covering gradient ascent along with other derivative-based and derivative-free methods. They also include the proposal of two novel algorithms, NM-SARSA [40] and NelderMead-SARSA, each based on one of two alternative action selection methods.
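As an illustration of derivative-free action selection, the sketch below maximises Q(s, ·) with a textbook Nelder-Mead simplex search. The search needs only Q evaluations and no gradients. This is a hedged sketch, not the NelderMead-SARSA algorithm from the thesis. It uses a hypothetical quadratic `q_value` over a two-dimensional action in place of a trained network, and its unconstrained maximiser is a* = (s, -0.5 s). It keeps actions inside an assumed box [-1, 1]² by evaluating Q at the clipped action. The coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), the initial step and the stopping tolerance are common defaults chosen for illustration.

```python
# Hypothetical stand-in for a trained critic Q(s, a) with a 2-D action.
# Its unconstrained maximiser is a* = (s, -0.5 * s).
def q_value(s, a):
    return -((a[0] - s) ** 2) - 2.0 * (a[1] + 0.5 * s) ** 2


def nelder_mead_max(f, x0, step=0.5, max_iters=500, xtol=1e-8):
    """Maximise f(x), x a list of floats, using only function values."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)] + [
        [x + (step if j == i else 0.0) for j, x in enumerate(x0)] for i in range(n)
    ]
    values = [f(p) for p in simplex]
    for _ in range(max_iters):
        # Order vertices from best (highest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k], reverse=True)
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        best, worst = simplex[0], simplex[-1]
        size = max(abs(x - b) for p in simplex[1:] for x, b in zip(p, best))
        if size < xtol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst) on the line through the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)  # reflection
        fr = f(xr)
        if fr > values[0]:
            xe = along(2.0)  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe > fr else (xr, fr)
        elif fr > values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            # Outside contraction if the reflected point beats the worst vertex, else inside.
            xc = along(0.5 if fr > values[-1] else -0.5)
            fc = f(xc)
            if fc > max(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink all vertices halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)] for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    k = max(range(n + 1), key=lambda i: values[i])
    return simplex[k]


def greedy_action(s, low=-1.0, high=1.0):
    """Derivative-free action selection: maximise Q(s, .) over a box of actions."""
    def clip(a):
        return [max(low, min(high, x)) for x in a]

    # Evaluate Q at the clipped action so the returned action respects the box.
    a = nelder_mead_max(lambda a: q_value(s, clip(a)), x0=[0.0, 0.0])
    return clip(a)
```

For example, `greedy_action(0.6)` should land close to `[0.6, -0.3]`. This approach trades the gradient computation for extra Q evaluations, which is what makes it applicable when derivatives of the approximator are unavailable or unreliable.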
We empirically compare the proposed methods with state-of-the-art methods on three continuous state- and action-space control benchmark problems from the literature: minimum-time full swing-up of the Acrobot, Cart-Pole balancing, and a double-pole variant of Cart-Pole balancing. We also present novel results from applying an existing direct policy search method, genetic programming, to the Acrobot benchmark problem [12, 14].
Genetic Programming entries for Barry D Nichols