
Hindsight Experience Replay with Evolutionary Decision Trees for Curriculum Goal Generation

  • Conference paper
  • Applications of Evolutionary Computation (EvoApplications 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14635)


Abstract

Reinforcement learning (RL) algorithms often require a significant number of experiences to learn a policy capable of achieving desired goals in multi-goal robot manipulation tasks with sparse rewards. Hindsight Experience Replay (HER) is an existing method that improves learning efficiency by using failed trajectories and replacing the original goals with hindsight goals that are uniformly sampled from the visited states. However, HER has a limitation: the hindsight goals are mostly near the initial state, which hinders solving tasks efficiently if the desired goals are far from the initial state. To overcome this limitation, we introduce a curriculum learning method called HERDT (HER with Decision Trees). HERDT uses binary DTs to generate curriculum goals that guide a robotic agent progressively from an initial state toward a desired goal. During the warm-up stage, DTs are optimized using the Grammatical Evolution algorithm. In the training stage, curriculum goals are then sampled by DTs to help the agent navigate the environment. Since binary DTs generate discrete values, we fine-tune these curriculum points by incorporating a feedback value (i.e., the Q-value). This fine-tuning enables us to adjust the difficulty level of the generated curriculum points, ensuring that they are neither overly simplistic nor excessively challenging. In other words, these points are precisely tailored to match the robot’s ongoing learning policy. We evaluate our proposed approach on different sparse reward robotic manipulation tasks and compare it with the state-of-the-art HER approach. Our results demonstrate that our method consistently outperforms or matches the existing approach in all the tested tasks.
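
To make the two mechanisms concrete, the following Python sketch shows (i) HER-style relabeling of a failed episode with hindsight goals drawn from later visited states, and (ii) selecting a curriculum goal by a Q-value feedback band so that the goal is neither trivial nor hopeless. The transition layout, the helper names (her_relabel, select_curriculum_goal, q_value), and the Q-value thresholds are illustrative assumptions, not the authors' implementation.

    # Sketch of HER relabeling and Q-value-based curriculum goal selection.
    # Data layout, helper names, and thresholds are illustrative assumptions.
    import random
    import numpy as np

    def her_relabel(episode, reward_fn, k=4):
        """Augment each transition with k hindsight goals sampled from
        states visited later in the same episode (the 'future' strategy)."""
        relabeled = []
        for t, tr in enumerate(episode):
            relabeled.append(tr)  # keep the original transition
            for _ in range(k):
                future = random.randint(t, len(episode) - 1)
                g = episode[future]["achieved_goal"]  # hindsight goal from a visited state
                relabeled.append(dict(tr, desired_goal=g,
                                      reward=reward_fn(tr["achieved_goal"], g)))
        return relabeled

    def select_curriculum_goal(candidates, state, q_value, q_lo=-40.0, q_hi=-5.0):
        """Keep candidate goals whose predicted return lies in a 'learnable' band:
        not hopeless (q < q_lo) and not already solved (q > q_hi)."""
        scored = [(q_value(state, g), g) for g in candidates]
        feasible = [g for q, g in scored if q_lo <= q <= q_hi]
        if feasible:
            return random.choice(feasible)
        return max(scored, key=lambda s: s[0])[1]  # fall back to the easiest candidate

    # Toy usage with the usual sparse reward: 0 within 5 cm of the goal, -1 otherwise.
    reward_fn = lambda ag, g: 0.0 if np.linalg.norm(np.asarray(ag) - np.asarray(g)) < 0.05 else -1.0
    episode = [dict(achieved_goal=np.array([0.01 * t, 0.0, 0.0]),
                    desired_goal=np.array([1.0, 0.0, 0.0]),
                    reward=-1.0) for t in range(10)]
    print(len(her_relabel(episode, reward_fn)))  # 10 originals + 40 relabeled copies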

Notes

  1.

    The physical interpretation of the achieved goal depends on the task at hand. For some robotic manipulation tasks, the robot needs to pick and place (Fig. 5b), push (Fig. 5c), or slide (Fig. 5d) an object. In this case, the achieved goal corresponds to the x-y-z position of the object. Conversely, if there is no object in the task (Fig. 5a), the achieved goal is defined as the position of the end-effector of the robot.
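
    A minimal sketch of this definition, assuming the simulator state is exposed as a
    dictionary with placeholder keys object_pos and end_effector_pos (hypothetical names,
    not the benchmark environments' actual fields):

    import numpy as np

    def achieved_goal(sim_state, has_object):
        """Return the 3-D achieved goal: the object's x-y-z position when the task
        manipulates an object (pick-and-place, push, slide), otherwise the
        end-effector position (reach)."""
        key = "object_pos" if has_object else "end_effector_pos"
        return np.asarray(sim_state[key])

    # Toy usage
    print(achieved_goal({"object_pos": [1.3, 0.7, 0.42],
                         "end_effector_pos": [1.1, 0.8, 0.5]}, has_object=True))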


Author information


Correspondence to Erdi Sayar.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sayar, E., Vintaykin, V., Iacca, G., Knoll, A. (2024). Hindsight Experience Replay with Evolutionary Decision Trees for Curriculum Goal Generation. In: Smith, S., Correia, J., Cintrano, C. (eds) Applications of Evolutionary Computation. EvoApplications 2024. Lecture Notes in Computer Science, vol 14635. Springer, Cham. https://doi.org/10.1007/978-3-031-56855-8_1

  • DOI: https://doi.org/10.1007/978-3-031-56855-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56854-1

  • Online ISBN: 978-3-031-56855-8

  • eBook Packages: Computer Science, Computer Science (R0)
