Abstract
Deep reinforcement learning has met noticeable successes recently for a wide range of control problems. However, this is typically based on thousands of weights and non-linearities, making solutions complex, not easily reproducible, uninterpretable and heavy. The present paper presents genetic programming approaches for building symbolic controllers. Results are competitive, in particular in the case of delayed rewards, and the solutions are lighter by orders of magnitude and much more understandable.
M. Videau—Now at Meta AI Research.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
This DPS benchmark has been merged in Nevergrad, and our experiments can be reproduced by «python −m nevergrad.benchmark gp \(--\)num_workers = 60 \(--\)plot».
References
Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: ICML, p. 1 (2004)
Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robot. Auton. Syst. 57(5), 469–483 (2009)
Arrieta, A.B., et al.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. IF 58, 82–115 (2020)
Auger, A., Schoenauer, M., Teytaud, O.: Local and global order 3/2 convergence of a surrogate evolutionary algorithm. In: GECCO, p. 8 (2005)
Bastani, O., Pu, Y., Solar-Lezama, A.: Verifiable reinforcement learning via policy extraction. arXiv:1805.08328 (2018)
Beyer, H.G., Hellwig, M.: Controlling population size and mutation strength by meta-ES under fitness noise. In: FOGA, pp. 11–24 (2013)
Biecek, P., Burzykowski, T.: Explanatory Model Analysis: Explore, Explain And Examine Predictive Models. CRC Press, Boca Raton (2021)
Brameier, M.F., Banzhaf, W.: Linear Genetic Programming. Springer, Cham (2007). https://doi.org/10.1007/978-0-387-31030-5
Brockman, G., et al.: OpenAI Gym. arXiv:1606.01540 (2016)
Cazenave, T.: Nested Monte-Carlo search. In: IJCAI (2009)
Cazenille, L.: QDpy: a python framework for quality-diversity (2018). bit.ly/3s0uyVv
Coppens, Y., Efthymiadis, K., Lenaerts, T., Nowé, A., Miller, T., Weber, R., Magazzeni, D.: Distilling deep reinforcement learning policies in soft decision trees. In: CEX Workshop, pp. 1–6 (2019)
Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)
Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. JMLR 6, 503–556 (2005)
Flageat, M., Cully, A.: Fast and stable map-elites in noisy domains using deep grids. In: ALIFE, pp. 273–282 (2020)
Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. JMLR 13, 2171–2175 (2012)
Gaier, A., Asteroth, A., Mouret, J.B.: Data-efficient exploration, optimization, and modeling of diverse designs through surrogate-assisted illumination. In: GECCO, pp. 99–106 (2017)
Gilpin, L., Bau, D., Yuan, B., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an approach to evaluating interpretability of ML. arXiv:1806.00069 (2018)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: ICML, pp. 1861–1870 (2018)
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. ECO 11(1), 1–10 (2003)
Hein, D., et al.: A benchmark environment motivated by industrial control problems. In: IEEE SSCI, pp. 1–8 (2017)
Hein, D., Udluft, S., Runkler, T.A.: Interpretable policies for reinforcement learning by genetic programming. Eng. App. Artif. Intell. 76, 158–169 (2018)
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. JAIR 4, 237–285 (1996)
Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: GECCO, pp. 195–202 (2017)
Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: IJCNN, pp. 1942–1948 (1995)
Koza, J.R.: Genetic Programming: On the Programming of Computers by means of Natural Evolution. MIT Press, Massachusetts (1992)
Kubalík, J., Žegklitz, J., Derner, E., Babuška, R.: Symbolic regression methods for reinforcement learning. arXiv:1903.09688 (2019)
Kwee, I., Hutter, M., Schmidhuber, J.: Gradient-based reinforcement planning in policy-search methods. In: Wiering, M.A. (ed.) EWRL. vol. 27, pp. 27–29 (2001)
Landajuela, M., et al.: Discovering symbolic policies with deep reinforcement learning. In: ICML, pp. 5979–5989 (2021)
Liu, G., Schulte, O., Zhu, W., Li, Q.: Toward interpretable deep reinforcement learning with linear model u-trees. In: ECML PKDD, pp. 414–429 (2018)
Liventsev, V., Härmä, A., Petković, M.: Neurogenetic programming framework for explainable reinforcement learning. arXiv:2102.04231 (2021)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: NeurIPS, pp. 4768–4777 (2017)
Maes, F., Fonteneau, R., Wehenkel, L., Ernst, D.: Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning. In: ICDS, pp. 37–51 (2012)
Mania, H., Guy, A., Recht, B.: Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055 (2018)
Meunier, L., et al.: Black-box optimization revisited: Improving algorithm selection wizards through massive benchmarking. In: IEEE TEVC (2021)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: ICML, pp. 1928–1937 (2016)
Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv:1504.04909 (2015)
Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI 3, 40 (2016)
Rapin, J., Teytaud, O.: Nevergrad - a gradient-free optimization platform (2018). bit.ly/3g8wghU
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you? Explaining the predictions of any classifier. In: SIGKDD, pp. 1135–1144 (2016)
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS, pp. 627–635 (2011)
Roth, A.M., Topin, N., Jamshidi, P., Veloso, M.: Conservative q-improvement: reinforcement learning for an interpretable decision-tree policy. arXiv:1907.01180 (2019)
Russell, S.: Learning agents for uncertain environments. In: COLT, pp. 101–103 (1998)
Schoenauer, M., Ronald, E.: Neuro-genetic truck backer-upper controller. In: IEEE CEC, pp. 720–723 (1994)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
Selvaraju, R.R., et al.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: ICCV, pp. 618–626 (2017)
Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. In: ICML, pp. 3145–3153 (2017)
Sigaud, O., Stulp, F.: Policy search in continuous action domains: an overview. arXiv:1803.04706 (2018)
Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. JGO 11(4), 341–359 (1997)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT press, Cambridge (2018)
Verma, A., Murali, V., Singh, R., Kohli, P., Chaudhuri, S.: Programmatically interpretable reinforcement learning. In: ICML, pp. 5045–5054 (2018)
Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: GECCO, pp. 229–236 (2018)
Zhang, H., Zhou, A., Lin, X.: Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis. Complex Intell. Syst. 6(3), 741–753 (2020). https://doi.org/10.1007/s40747-020-00175-y
Acknowledgements
This research was partially funded by the European Commission within the HORIZON program (TRUST-AI Project, Contract No. 952060).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Videau, M., Leite, A., Teytaud, O., Schoenauer, M. (2022). Multi-objective Genetic Programming for Explainable Reinforcement Learning. In: Medvet, E., Pappa, G., Xue, B. (eds) Genetic Programming. EuroGP 2022. Lecture Notes in Computer Science, vol 13223. Springer, Cham. https://doi.org/10.1007/978-3-031-02056-8_18
Download citation
DOI: https://doi.org/10.1007/978-3-031-02056-8_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02055-1
Online ISBN: 978-3-031-02056-8
eBook Packages: Computer ScienceComputer Science (R0)