Multi-objective Genetic Programming for Explainable Reinforcement Learning

Videau, Mathurin; Leite, Alessandro; Teytaud, Olivier; Schoenauer, Marc

doi:10.1007/978-3-031-02056-8_18

Mathurin Videau¹¹,
Alessandro Leite¹¹,
Olivier Teytaud¹² &
…
Marc Schoenauer¹¹

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13223))

Included in the following conference series:

European Conference on Genetic Programming (Part of EvoStar)

1033 Accesses
4 Citations

Abstract

Deep reinforcement learning has met noticeable successes recently for a wide range of control problems. However, this is typically based on thousands of weights and non-linearities, making solutions complex, not easily reproducible, uninterpretable and heavy. The present paper presents genetic programming approaches for building symbolic controllers. Results are competitive, in particular in the case of delayed rewards, and the solutions are lighter by orders of magnitude and much more understandable.

M. Videau—Now at Meta AI Research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 54.99; Price excludes VAT (USA)

Softcover Book: USD 69.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
This DPS benchmark has been merged in Nevergrad, and our experiments can be reproduced by «python −m nevergrad.benchmark gp \(--\)num_workers = 60 \(--\)plot».

References

Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: ICML, p. 1 (2004)
Google Scholar
Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robot. Auton. Syst. 57(5), 469–483 (2009)
Article Google Scholar
Arrieta, A.B., et al.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. IF 58, 82–115 (2020)
Google Scholar
Auger, A., Schoenauer, M., Teytaud, O.: Local and global order 3/2 convergence of a surrogate evolutionary algorithm. In: GECCO, p. 8 (2005)
Google Scholar
Bastani, O., Pu, Y., Solar-Lezama, A.: Verifiable reinforcement learning via policy extraction. arXiv:1805.08328 (2018)
Beyer, H.G., Hellwig, M.: Controlling population size and mutation strength by meta-ES under fitness noise. In: FOGA, pp. 11–24 (2013)
Google Scholar
Biecek, P., Burzykowski, T.: Explanatory Model Analysis: Explore, Explain And Examine Predictive Models. CRC Press, Boca Raton (2021)
Book Google Scholar
Brameier, M.F., Banzhaf, W.: Linear Genetic Programming. Springer, Cham (2007). https://doi.org/10.1007/978-0-387-31030-5
Book MATH Google Scholar
Brockman, G., et al.: OpenAI Gym. arXiv:1606.01540 (2016)
Cazenave, T.: Nested Monte-Carlo search. In: IJCAI (2009)
Google Scholar
Cazenille, L.: QDpy: a python framework for quality-diversity (2018). bit.ly/3s0uyVv
Coppens, Y., Efthymiadis, K., Lenaerts, T., Nowé, A., Miller, T., Weber, R., Magazzeni, D.: Distilling deep reinforcement learning policies in soft decision trees. In: CEX Workshop, pp. 1–6 (2019)
Google Scholar
Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)
Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. JMLR 6, 503–556 (2005)
MathSciNet MATH Google Scholar
Flageat, M., Cully, A.: Fast and stable map-elites in noisy domains using deep grids. In: ALIFE, pp. 273–282 (2020)
Google Scholar
Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. JMLR 13, 2171–2175 (2012)
MathSciNet Google Scholar
Gaier, A., Asteroth, A., Mouret, J.B.: Data-efficient exploration, optimization, and modeling of diverse designs through surrogate-assisted illumination. In: GECCO, pp. 99–106 (2017)
Google Scholar
Gilpin, L., Bau, D., Yuan, B., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an approach to evaluating interpretability of ML. arXiv:1806.00069 (2018)
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: ICML, pp. 1861–1870 (2018)
Google Scholar
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. ECO 11(1), 1–10 (2003)
Google Scholar
Hein, D., et al.: A benchmark environment motivated by industrial control problems. In: IEEE SSCI, pp. 1–8 (2017)
Google Scholar
Hein, D., Udluft, S., Runkler, T.A.: Interpretable policies for reinforcement learning by genetic programming. Eng. App. Artif. Intell. 76, 158–169 (2018)
Article Google Scholar
Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: a survey. JAIR 4, 237–285 (1996)
Article Google Scholar
Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: GECCO, pp. 195–202 (2017)
Google Scholar
Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: IJCNN, pp. 1942–1948 (1995)
Google Scholar
Koza, J.R.: Genetic Programming: On the Programming of Computers by means of Natural Evolution. MIT Press, Massachusetts (1992)
MATH Google Scholar
Kubalík, J., Žegklitz, J., Derner, E., Babuška, R.: Symbolic regression methods for reinforcement learning. arXiv:1903.09688 (2019)
Kwee, I., Hutter, M., Schmidhuber, J.: Gradient-based reinforcement planning in policy-search methods. In: Wiering, M.A. (ed.) EWRL. vol. 27, pp. 27–29 (2001)
Google Scholar
Landajuela, M., et al.: Discovering symbolic policies with deep reinforcement learning. In: ICML, pp. 5979–5989 (2021)
Google Scholar
Liu, G., Schulte, O., Zhu, W., Li, Q.: Toward interpretable deep reinforcement learning with linear model u-trees. In: ECML PKDD, pp. 414–429 (2018)
Google Scholar
Liventsev, V., Härmä, A., Petković, M.: Neurogenetic programming framework for explainable reinforcement learning. arXiv:2102.04231 (2021)
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: NeurIPS, pp. 4768–4777 (2017)
Google Scholar
Maes, F., Fonteneau, R., Wehenkel, L., Ernst, D.: Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning. In: ICDS, pp. 37–51 (2012)
Google Scholar
Mania, H., Guy, A., Recht, B.: Simple random search provides a competitive approach to reinforcement learning. arXiv:1803.07055 (2018)
Meunier, L., et al.: Black-box optimization revisited: Improving algorithm selection wizards through massive benchmarking. In: IEEE TEVC (2021)
Google Scholar
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Article MathSciNet Google Scholar
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: ICML, pp. 1928–1937 (2016)
Google Scholar
Mouret, J.B., Clune, J.: Illuminating search spaces by mapping elites. arXiv:1504.04909 (2015)
Pugh, J.K., Soros, L.B., Stanley, K.O.: Quality diversity: a new frontier for evolutionary computation. Front. Robot. AI 3, 40 (2016)
Article Google Scholar
Rapin, J., Teytaud, O.: Nevergrad - a gradient-free optimization platform (2018). bit.ly/3g8wghU
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you? Explaining the predictions of any classifier. In: SIGKDD, pp. 1135–1144 (2016)
Google Scholar
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS, pp. 627–635 (2011)
Google Scholar
Roth, A.M., Topin, N., Jamshidi, P., Veloso, M.: Conservative q-improvement: reinforcement learning for an interpretable decision-tree policy. arXiv:1907.01180 (2019)
Russell, S.: Learning agents for uncertain environments. In: COLT, pp. 101–103 (1998)
Google Scholar
Schoenauer, M., Ronald, E.: Neuro-genetic truck backer-upper controller. In: IEEE CEC, pp. 720–723 (1994)
Google Scholar
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv:1707.06347 (2017)
Selvaraju, R.R., et al.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: ICCV, pp. 618–626 (2017)
Google Scholar
Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. In: ICML, pp. 3145–3153 (2017)
Google Scholar
Sigaud, O., Stulp, F.: Policy search in continuous action domains: an overview. arXiv:1803.04706 (2018)
Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. JGO 11(4), 341–359 (1997)
MathSciNet MATH Google Scholar
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT press, Cambridge (2018)
MATH Google Scholar
Verma, A., Murali, V., Singh, R., Kohli, P., Chaudhuri, S.: Programmatically interpretable reinforcement learning. In: ICML, pp. 5045–5054 (2018)
Google Scholar
Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: GECCO, pp. 229–236 (2018)
Google Scholar
Zhang, H., Zhou, A., Lin, X.: Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis. Complex Intell. Syst. 6(3), 741–753 (2020). https://doi.org/10.1007/s40747-020-00175-y
Article Google Scholar

Download references

Acknowledgements

This research was partially funded by the European Commission within the HORIZON program (TRUST-AI Project, Contract No. 952060).

Author information

Authors and Affiliations

TAU, Inria Saclay, LISN, Paris, France
Mathurin Videau, Alessandro Leite & Marc Schoenauer
Meta AI Research, Paris, France
Olivier Teytaud

Authors

Mathurin Videau
View author publications
You can also search for this author in PubMed Google Scholar
Alessandro Leite
View author publications
You can also search for this author in PubMed Google Scholar
Olivier Teytaud
View author publications
You can also search for this author in PubMed Google Scholar
Marc Schoenauer
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mathurin Videau .

Editor information

Editors and Affiliations

University of Trieste, Trieste, Italy
Eric Medvet
Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Gisele Pappa
Victoria University of Wellington, Wellington, New Zealand
Bing Xue

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Videau, M., Leite, A., Teytaud, O., Schoenauer, M. (2022). Multi-objective Genetic Programming for Explainable Reinforcement Learning. In: Medvet, E., Pappa, G., Xue, B. (eds) Genetic Programming. EuroGP 2022. Lecture Notes in Computer Science, vol 13223. Springer, Cham. https://doi.org/10.1007/978-3-031-02056-8_18

Download citation

DOI: https://doi.org/10.1007/978-3-031-02056-8_18
Published: 13 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-02055-1
Online ISBN: 978-3-031-02056-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Multi-objective Genetic Programming for Explainable Reinforcement Learning