In the context of machine learning, the need for interpretability stems from an incompleteness in the problem formalisation. Since complex real-world tasks in industry are almost never completely testable, enumerating all possible outputs for all possible inputs is infeasible. Hence, we are usually unable to flag all undesirable outputs. Especially for industrial systems, domain experts are more likely to deploy automatically learned controllers if these are understandable and convenient to assess. Moreover, novel legal frameworks such as the European Union General Data Protection Regulation enforce interpretability of systems that process personal data.
Two of the three novel reinforcement learning methods of this thesis learn policies represented as fuzzy rule-based controllers, since fuzzy controllers have served as interpretable and efficient system controllers in industry for decades. The first method, called fuzzy particle swarm reinforcement learning (FPSRL), uses swarm intelligence to optimize the parameters of a fixed fuzzy rule set, whereas the second method, called fuzzy genetic programming reinforcement learning (FGPRL), applies genetic programming to generate a new fuzzy rule set, including the optimization of all its parameters, from available building blocks. Empirical studies on benchmark problems show that FPSRL has advantages regarding computational cost on rather simple problems, where prior expert knowledge about informative state features and rule numbers is available. However, experiments on an industrial benchmark show that FGPRL can automatically select the most informative state features as well as the most compact fuzzy rule representation for a given level of performance. The third interpretable approach, called genetic programming reinforcement learning (GPRL), finally drops the constraint of learning rule-based policies by representing the policies as basic algebraic equations of low complexity. Experimental results show that the GPRL policies yield human-understandable and well-performing control results. Moreover, both FGPRL and GPRL return not just a single solution to the problem but a whole Pareto front containing the best-performing solutions for many different levels of complexity.
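To illustrate the kind of policy GPRL evolves, the following minimal Python sketch shows a low-complexity algebraic expression used as a deterministic controller and evaluated by rolling it out on a model of the system dynamics. The pendulum-style state variables, the toy model, and all function names are illustrative assumptions, not taken from the thesis.

import math

def policy(state):
    # Hypothetical low-complexity algebraic policy of the kind GPRL could evolve:
    # the control action is a short expression over the state features.
    theta, omega = state
    return -2.0 * theta - 0.5 * omega

def toy_model(state, action):
    # Toy stand-in for a learned dynamics and reward model (assumption for illustration).
    theta, omega = state
    omega_new = omega + 0.05 * (math.sin(theta) + action)
    theta_new = theta + 0.05 * omega_new
    reward = -(theta_new ** 2 + 0.1 * omega_new ** 2)
    return (theta_new, omega_new), reward

def rollout_return(policy, model, initial_state, horizon=100):
    # Model-based policy evaluation: roll the policy out on the model and
    # accumulate the predicted rewards.
    state, total_reward = initial_state, 0.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model(state, action)
        total_reward += reward
    return total_reward

print(rollout_return(policy, toy_model, initial_state=(0.3, 0.0)))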
Comparing the results of all three interpretable reinforcement learning approaches with the performance of standard neural fitted Q iteration, a novel model predictive control approach, and a non-interpretable neural network policy method gives a comprehensive overview of the performance of the methods as well as the interpretability of the produced policies. However, choosing the most interpretable form of presentation is highly subjective and depends on many prerequisites, such as the application domain, the ability to visualize solutions, or successive processing steps. Therefore, it is all the more important to have methods at hand that can search domain-specific policy representation spaces automatically. The empirical studies show that combining model-based reinforcement learning with genetic programming is a very promising approach to achieve this goal.
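As a hedged sketch of how a Pareto front over policy complexity and model-estimated return could be extracted from a set of candidate policies, the following Python snippet keeps only the non-dominated candidates; the complexity measure (counted here as expression-tree nodes) and the candidate list are purely illustrative assumptions.

def pareto_front(candidates):
    # candidates: list of (complexity, estimated_return, policy_id) tuples.
    # A candidate is kept if no other candidate is at most as complex and at
    # least as good, with a strict improvement in at least one of the two.
    front = []
    for c, r, p in candidates:
        dominated = any(
            c2 <= c and r2 >= r and (c2 < c or r2 > r)
            for c2, r2, _ in candidates
        )
        if not dominated:
            front.append((c, r, p))
    return sorted(front)

# Illustrative candidates with complexity counted as expression-tree nodes.
candidates = [(3, -12.0, "p1"), (5, -9.5, "p2"), (5, -11.0, "p3"), (9, -9.4, "p4")]
print(pareto_front(candidates))  # p3 is dominated by p2 and is dropped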
Thesis of Daniel Hein, TUM; supervisor: Thomas Runkler. In English.