Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Tangled Program Graphs (TPG) is a framework by which multiple programs can be organized to cooperate and decompose a task with minimal a priori information. TPG agents begin with minimal complexity and incrementally coevolve to discover a complexity befitting the nature of the task. Previous research has demonstrated the TPG framework under visual reinforcement learning tasks from the Arcade Learning Environment and the VizDoom first-person shooter, producing policies competitive with those from deep learning. However, unlike deep learning, the emergent constructive properties of TPG result in solutions that are orders of magnitude simpler, so execution never needs specialized hardware support. In this work, our goal is to provide a tutorial overview of how the emergent properties of TPG are achieved, as well as specific examples of the decompositions discovered under the VizDoom task.
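
The following minimal sketch illustrates the decision-making loop implied by this organization: teams of programs bid on the current observation, and the winning program either emits an atomic action or defers to another team, so a decision is reached by traversing the graph from a root node. All names are hypothetical and the bidding programs are stand-ins for the linear GP programs used in practice; this is a sketch of the idea, not the authors' implementation.

    import random

    class Program:
        """A bidding program: its action is either an atomic action (int)
        or a pointer to another Team. The bid is a stand-in for executing
        a linear GP program over the observation."""
        def __init__(self, action, n=4):
            self.action = action
            self.weights = [random.uniform(-1, 1) for _ in range(n)]

        def bid(self, state):
            return sum(w * state[i % len(state)] for i, w in enumerate(self.weights))

    class Team:
        """A node of the graph: a set of programs that compete by bidding."""
        def __init__(self, programs):
            self.programs = programs

        def act(self, state, visited=None):
            # Highest bid wins; pointers back to already-visited teams are
            # skipped so that graph traversal always terminates.
            visited = (visited or set()) | {id(self)}
            for prog in sorted(self.programs, key=lambda p: p.bid(state), reverse=True):
                if isinstance(prog.action, Team):
                    if id(prog.action) not in visited:
                        return prog.action.act(state, visited)
                else:
                    return prog.action  # atomic action reached

    # Hypothetical usage: a root team that acts directly or defers to a leaf team.
    leaf = Team([Program(0), Program(1)])
    root = Team([Program(2), Program(leaf)])
    print(root.act([0.3, -1.2, 0.7, 0.05]))

One consequence of this structure is that only root teams (nodes not referenced by any program) need explicit fitness evaluation; internal teams contribute implicitly through the roots that reach them (see note 5 below).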

Notes

  1. In contrast, even once trained, the convolutional operation central to deep learning results in orders of magnitude higher computational cost.

  2. In practice, the sequence of frames as experienced by the agent might represent a stochastic sampling of the true underlying frame sequence [24]; a sketch of one such sampling scheme follows these notes.

  3. Game titles might not use all atomic actions.

  4. The action of a button press is game dependent and might make the avatar ‘jump’ in some games and ‘fire’ in others.

  5. Actually, as nodes are subsumed into graphs, it will become apparent that only a subset of nodes require explicit fitness evaluation (Sect. 3.4.1).

  6. The variation operators are actually applied multiplicatively, possibly resulting in any single operator being applied several times (see [21]); a sketch of this scheme follows these notes.

  7. This TPG individual actually represents a policy able to operate under ten different VizDoom tasks [31].

  8. Part of this might be due to the types of tasks that researchers choose to deploy GP on. For example, ‘expressive GP’ demonstrates its more interesting properties under tasks such as software synthesis [32].
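
Two of these notes lend themselves to short illustrations. For note 2, the "sticky actions" protocol of [24] is one concrete way the experienced frame sequence becomes a stochastic sampling of the true sequence: at every step the emulator repeats the agent's previous action with some probability (0.25 is the value recommended in [24]) instead of executing the newly requested one. A minimal sketch, assuming a Gym-style env.step interface rather than any particular ALE binding:

    import random

    def sticky_step(env, action, prev_action, stickiness=0.25):
        # With probability `stickiness` the previous action is repeated, so
        # the agent observes a stochastic sampling of the frame sequence.
        executed = prev_action if random.random() < stickiness else action
        observation, reward, done, info = env.step(executed)
        return observation, reward, done, executed

For note 6, "applied multiplicatively" can be read as: an operator is applied once, and then again for as long as successive coin flips succeed, so the probability of k applications decays geometrically as p^k. A hedged sketch of that scheme (names hypothetical; see [21] for the actual operator set):

    import random

    def apply_multiplicatively(individual, operator, p):
        # Repeat `operator` while Bernoulli(p) trials succeed; k applications
        # therefore occur with probability p**k.
        while random.random() < p:
            individual = operator(individual)
        return individual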

References

  1. Atkinson, T., Plump, D., Stepney, S.: Evolving graphs by graph programming. In: European Conference on Genetic Programming, Lecture Notes in Computer Science, vol. 10781, pp. 35–51. Springer (2018)

  2. Banzhaf, W.: Artificial regulatory networks and genetic programming. In: R. Riolo, B. Worzel (eds.) Genetic Programming Theory and Practice, pp. 43–62. Springer (2003)

  3. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279 (2013)

  4. Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genetic Programming and Evolvable Machines 2, 381–407 (2001)

  5. Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)

  6. Doucette, J.A., Lichodzijewski, P., Heywood, M.I.: Hierarchical task decomposition through symbiosis in reinforcement learning. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2012), pp. 97–104 (2012)

  7. Doucette, J.A., McIntyre, A.R., Lichodzijewski, P., Heywood, M.I.: Symbiotic coevolutionary genetic programming: a benchmarking study under large attribute spaces. Genetic Programming and Evolvable Machines 13, 71–101 (2012)

  8. Fogel, L.J., Owens, A.J., Walsh, M.J.: Artificial intelligence through a simulation of evolution. In: Proceedings of the Cybernetic Sciences Symposium, pp. 131–155 (1965)

  9. Hausknecht, M., Lehman, J., Miikkulainen, R., Stone, P.: A neuroevolution approach to general Atari game playing. IEEE Transactions on Computational Intelligence and AI in Games 6, 355–366 (2014)

  10. Jia, B., Ebner, M.: Evolving game state features from raw pixels. In: European Conference on Genetic Programming, Lecture Notes in Computer Science, vol. 10196, pp. 52–63. Springer (2017)

  11. Kelly, S., Heywood, M.I.: On diversity, teaming, and hierarchical policies: Observations from the keepaway soccer task. In: European Conference on Genetic Programming 2014, Lecture Notes in Computer Science, vol. 8599, pp. 75–86. Springer (2014)

  12. Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming 2017, Lecture Notes in Computer Science, vol. 10196, pp. 64–79. Springer (2017)

  13. Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2017), pp. 195–202 (2017)

  14. Kelly, S., Heywood, M.I.: Discovering agent behaviors through code reuse: Examples from Half-Field Offense and Ms. Pac-Man. IEEE Transactions on Games 10, 195–208 (2018)

  15. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multi-task reinforcement learning. Evolutionary Computation 26(3) (2018)

  16. Kelly, S., Lichodzijewski, P., Heywood, M.I.: On run time libraries and hierarchical symbiosis. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2012)

  17. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)

  18. Khanchi, S., Vahdat, A., Heywood, M.I., Zincir-Heywood, A.N.: On botnet detection with genetic programming under streaming data label budgets and class imbalance. Swarm and Evolutionary Computation 39, 123–140 (2018)

  19. Lichodzijewski, P., Heywood, M.I.: Coevolutionary bid-based genetic programming for problem decomposition in classification. Genetic Programming and Evolvable Machines 9, 331–365 (2008)

  20. Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2008), pp. 363–370 (2008)

  21. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (GECCO-2010), pp. 853–860 (2010)

  22. Lichodzijewski, P., Heywood, M.I.: The Rubik’s Cube and GP temporal sequence learning. In: R. Riolo, T. McConaghy, E. Vladislavleva (eds.) Genetic Programming Theory and Practice VIII, pp. 35–54. Springer (2011)

  23. Mabu, S., Hirasawa, K., Hu, J.: A graph-based evolutionary algorithm: Genetic network programming and its extension using reinforcement learning. Evolutionary Computation 15, 369–398 (2007)

  24. Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M.J., Bowling, M.: Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)

  25. Metzen, J.H., Edgington, M., Kassahun, Y., Kirchner, F.: Analysis of an evolutionary reinforcement learning method in a multiagent domain. In: ACM International Conference on Autonomous Agents and Multiagent Systems, pp. 291–298 (2008)

  26. Miikkulainen, R., Liang, J.Z., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., Duffy, N., Hodjat, B.: Evolving deep neural networks. CoRR abs/1703.00548 (2017)

  27. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: European Conference on Genetic Programming 2000, Lecture Notes in Computer Science, vol. 1802, pp. 121–132. Springer (2000)

  28. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)

  29. Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)

  30. Smith, R.J., Heywood, M.I.: Coevolving deep hierarchies of programs to solve complex tasks. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2017), pp. 1009–1016 (2017)

  31. Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming 2018, Lecture Notes in Computer Science, vol. 10781, pp. 135–150. Springer (2018)

  32. Spector, L., McPhee, N.F.: Expressive genetic programming: concepts and applications. In: ACM Genetic and Evolutionary Computation Conference (Tutorial) (2016)

  33. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2), 99–127 (2002)

  34. Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)

  35. Teller, A., Veloso, M.: PADO: A new learning architecture for object recognition. In: Symbolic Visual Learning. Oxford University Press (1996)

  36. Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2007), pp. 1708–1715 (2007)

  37. Turner, A.J., Miller, J.F.: Neuroevolution: Evolving heterogeneous artificial neural networks. Evolutionary Intelligence 7, 135–154 (2014)

  38. Vahdat, A., Morgan, J., McIntyre, A.R., Heywood, M.I., Zincir-Heywood, A.N.: Evolving GP classifiers for streaming data tasks with concept change and label budgets: A benchmarking study. In: A.H. Gandomi, A.H. Alavi, C. Ryan (eds.) Handbook of Genetic Programming Applications, pp. 451–480. Springer (2015)

  39. Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2018), pp. 229–236 (2018)

  40. Wu, S.X., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2011), pp. 1403–1410 (2011)

Acknowledgements

Stephen Kelly gratefully acknowledges support from the Nova Scotia Graduate Scholarship program. Malcolm Heywood gratefully acknowledges support from the NSERC Discovery and CRD programs (Canada).

Author information

Correspondence to Malcolm I. Heywood.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kelly, S., Smith, R.J., Heywood, M.I. (2019). Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial. In: Banzhaf, W., Spector, L., Sheneman, L. (eds) Genetic Programming Theory and Practice XVI. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-04735-1_3

  • DOI: https://doi.org/10.1007/978-3-030-04735-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04734-4

  • Online ISBN: 978-3-030-04735-1

  • eBook Packages: Computer Science; Computer Science (R0)
