Created by W.Langdon from gp-bibliography.bib Revision:1.8178
Symbolic Regression (SR) is an evolutionary optimization technique that automatically generates analytical expressions to fit numerical data, without human intervention. The method has gained attention in the scientific community not only for its ability to recover known physical laws, but also for suggesting previously unknown yet physically plausible and interpretable relationships. Additionally, because the resulting approximators are analytical, the full power of the mathematical apparatus can be applied to them.
This thesis aims to develop methods for integrating SR into reinforcement learning (RL) in the fully continuous case. To accomplish this goal, the following original contributions to the field have been developed.
(i) Introduction of policy derivation methods. Their main goal is to exploit the full potential of continuous action spaces, in contrast to the discretised action sets used by state-of-the-art methods.
(ii) Quasi-symbolic policy derivation (QSPD) algorithm, specifically designed to be used with a symbolic approximation of the value function. The goal of the proposed algorithm is to efficiently derive a continuous policy from a symbolic approximator. The experimental evaluation indicated the superiority of QSPD over state-of-the-art methods.
(iii) Design of a symbolic proxy-function concept. Such a function is successfully used to alleviate the negative impacts of approximation artifacts on policy derivation.
(iv) Study of the fitness criterion in the context of SR for RL. The analysis indicated a fundamental flaw in any symmetric error function, including the commonly used mean squared error. Instead, a new error function has been proposed, along with a novel fitting procedure. The experimental evaluation indicated a dramatic improvement in approximation quality for both numerical and symbolic approximators.
(v) Robust symbolic policy derivation (RSPD) algorithm, which adds an extra level of robustness against imperfections in symbolic approximators. The experimental evaluation demonstrated significant improvements in the reachability of the goal state.
All these contributions are then combined into a single, efficient SR for RL (ESRL) framework. The framework is able to tackle high-dimensional, fully continuous RL problems out of the box. It has been tested on three benchmarks: pendulum swing-up, magnetic manipulation, and a high-dimensional drone strike benchmark.
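Contribution (i) contrasts policies derived over a continuous action space with policies restricted to a discretised action set. The following is a minimal sketch of that contrast on a toy one-dimensional problem, not the thesis's QSPD or RSPD algorithm. Everything in it is an assumption made for illustration: the value function `value`, the dynamics `step`, the reward `reward`, the discount `GAMMA`, the action bounds, and the use of golden-section search as the continuous optimiser.

```python
import math

# All names and numbers below are hypothetical, chosen only for illustration.
GAMMA = 0.9                 # assumed discount factor
U_MIN, U_MAX = -1.0, 1.0    # assumed action bounds


def value(x):
    """Hypothetical symbolic value-function approximation V(x) = -x^2."""
    return -x * x


def step(x, u):
    """Hypothetical deterministic dynamics: next state after action u."""
    return x + u


def reward(x, u):
    """Hypothetical reward that penalises control effort."""
    return -0.1 * u * u


def q_value(x, u):
    """Right-hand side of the Bellman equation for action u in state x."""
    return reward(x, u) + GAMMA * value(step(x, u))


def discretised_policy(x, actions=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the best action from a fixed, discretised action set."""
    return max(actions, key=lambda u: q_value(x, u))


def continuous_policy(x, tol=1e-8):
    """Maximise q_value over the whole interval by golden-section search.

    This assumes q_value is unimodal in u on [U_MIN, U_MAX], which holds
    for the concave toy functions defined above.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = U_MIN, U_MAX
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if q_value(x, a) > q_value(x, b):
            hi = b   # the maximum lies in [lo, b]
        else:
            lo = a   # the maximum lies in [a, hi]
    return 0.5 * (lo + hi)
```

For this toy setup the maximiser can be found by hand as u = -0.9 x, so in state x = 0.5 the continuous policy settles near -0.45, while the discretised policy can only return the nearest good grid point, -0.5, and attains a slightly lower value.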
Supervisor: Olga Stepankova
Supervisor-specialist: Robert Babuska
Genetic Programming entries for Eduard Alibekov