Abstract
Tangled Program Graphs (TPG) is a framework in which multiple programs can be organized to cooperate and decompose a task with minimal a priori information. TPG agents begin with the least possible complexity and incrementally coevolve to discover a complexity befitting the nature of the task. Previous research has demonstrated that the TPG framework produces agents for visual reinforcement learning tasks from the Arcade Learning Environment and the VizDoom first-person shooter that are competitive with those from Deep Learning. However, unlike Deep Learning, the emergent constructive properties of TPG result in solutions that are orders of magnitude simpler, so execution never requires specialized hardware support. In this work, our goal is to provide a tutorial overview of how the emergent properties of TPG are achieved, as well as specific examples of the decompositions discovered under the VizDoom task.
Notes
- 1. In contrast, even once trained, the convolutional operation central to deep learning results in orders of magnitude higher computational cost.
- 2. In practice, the sequence of frames as experienced by the agent might represent a stochastic sampling of the actual true frame sequence [24].
- 3. Game titles might not use all atomic actions.
- 4. The action of a button press is game dependent and might make the avatar ‘jump’ in some games and ‘fire’ in others.
- 5. As nodes are subsumed into graphs, it will become apparent that only a subset of nodes require explicit fitness evaluation (Sect. 3.4.1).
- 6. The variation operators are actually applied multiplicatively, possibly resulting in any single operator being applied several times; see [21].
- 7. This TPG individual actually represents a policy able to operate under ten different VizDoom tasks [31].
- 8. Part of this might be due to the types of tasks that researchers choose to deploy GP on. For example, ‘expressive GP’ demonstrates its more interesting properties under tasks such as software synthesis [32].
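The multiplicative application of variation operators mentioned in note 6 can be sketched as follows: rather than applying an operator at most once, it is re-applied for as long as a coin flip succeeds, so the number of applications follows a geometric distribution. This is a minimal illustrative sketch only; the function name, the probability parameter, and the coin-flip scheme are assumptions for illustration, not the exact formulation of [21].

```python
import random


def apply_multiplicatively(individual, operator, p_apply=0.5, rng=random):
    """Apply a variation operator zero or more times (hypothetical sketch).

    The operator is re-applied as long as a coin flip with probability
    p_apply succeeds, so any single operator may fire several times on
    the same individual, or not at all.
    """
    count = 0
    while rng.random() < p_apply:
        individual = operator(individual)
        count += 1
    return individual, count


# Example: a toy "mutation" that appends a gene to a list-based genome.
genome, n_applied = apply_multiplicatively(
    [1, 2, 3], lambda g: g + [0], p_apply=0.5, rng=random.Random(1)
)
```

Because each application is an independent trial, the expected number of applications per operator is p_apply / (1 - p_apply), which lets small per-step probabilities still occasionally produce large, multi-step variations.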
References
Atkinson, T., Plump, D., Stepney, S.: Evolving graphs by graph programming. In: European Conference on Genetic Programming, Lecture Notes in Computer Science, vol. 10781, pp. 35–51. Springer (2018)
Banzhaf, W.: Artificial regulatory networks and genetic programming. In: R. Riolo, B. Worzel (eds.) Genetic Programming Theory and Practice, pp. 43–62. Springer (2003)
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279 (2013)
Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genetic Programming and Evolvable Machines 2, 381–407 (2001)
Brameier, M., Banzhaf, W.: Linear Genetic Programming, Springer (2007)
Doucette, J.A., Lichodzijewski, P., Heywood, M.I.: Hierarchical task decomposition through symbiosis in reinforcement learning. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2012), pp. 97–104 (2012)
Doucette, J.A., McIntyre, A.R., Lichodzijewski, P., Heywood, M.I.: Symbiotic coevolutionary genetic programming: a benchmarking study under large attribute spaces. Genetic Programming and Evolvable Machines 13, 71–101 (2012)
Fogel, L., Owens, A., Walsh, M.: Artificial intelligence through a simulation of evolution. In: Proceedings of the Cybernetic Sciences Symposium, pp. 131–155 (1965)
Hausknecht, M., Lehman, J., Miikkulainen, R., Stone, P.: A neuroevolution approach to general Atari game playing. IEEE Transactions on Computational Intelligence and AI in Games 6, 355–366 (2014)
Jia, B., Ebner, M.: Evolving game state features from raw pixels. In: European Conference on Genetic Programming, Lecture Notes in Computer Science, vol. 10196, pp. 52–63. Springer (2017)
Kelly, S., Heywood, M.I.: On diversity, teaming, and hierarchical policies: Observations from the keepaway soccer task. In: European Conference on Genetic Programming 2014, Lecture Notes in Computer Science, vol. 8599, pp. 75–86. Springer (2014)
Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming 2017, Lecture Notes in Computer Science, vol. 10196, pp. 64–79. Springer (2017)
Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2017), pp. 195–202 (2017)
Kelly, S., Heywood, M.I.: Discovering agent behaviors through code reuse: Examples from Half-Field Offense and Ms. Pac-Man. IEEE Transactions on Games 10, 195–208 (2018)
Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multi-task reinforcement learning. Evolutionary Computation 26(3) (2018)
Kelly, S., Lichodzijewski, P., Heywood, M.I.: On run time libraries and hierarchical symbiosis. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2012)
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)
Khanchi, S., Vahdat, A., Heywood, M.I., Zincir-Heywood, A.N.: On botnet detection with genetic programming under streaming data label budgets and class imbalance. Swarm and Evolutionary Computation 39, 123–140 (2018)
Lichodzijewski, P., Heywood, M.I.: Coevolutionary bid-based genetic programming for problem decomposition in classification. Genetic Programming and Evolvable Machines 9, 331–365 (2008)
Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2008), pp. 363–370 (2008)
Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (GECCO-2010), pp. 853–860 (2010)
Lichodzijewski, P., Heywood, M.I.: The Rubik’s Cube and GP temporal sequence learning. In: R. Riolo, T. McConaghy, E. Vladislavleva (eds.) Genetic Programming Theory and Practice VIII, pp. 35–54. Springer (2011)
Mabu, S., Hirasawa, K., Hu, J.: A graph-based evolutionary algorithm: Genetic network programming and its extension using reinforcement learning. Evolutionary Computation 15, 369–398 (2007)
Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M.J., Bowling, M.: Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)
Metzen, J.H., Edgington, M., Kassahun, Y., Kirchner, F.: Analysis of an evolutionary reinforcement learning method in multiagent domain. In: ACM International Conference on Autonomous Agents and Multiagent Systems, pp. 291–298 (2008)
Miikkulainen, R., Liang, J.Z., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., Duffy, N., Hodjat, B.: Evolving deep neural networks. CoRR abs/1703.00548 (2017)
Miller, J.F., Thomson, P.: Cartesian genetic programming. In: European Conference on Genetic Programming 2000, Lecture Notes in Computer Science, vol. 1802, pp. 121–132. Springer (2000)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)
Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)
Smith, R.J., Heywood, M.I.: Coevolving deep hierarchies of programs to solve complex tasks. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2017), pp. 1009–1016 (2017)
Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming 2018, Lecture Notes in Computer Science, vol. 10781, pp. 135–150. Springer (2018)
Spector, L., McPhee, N.F.: Expressive genetic programming: concepts and applications. In: ACM Genetic and Evolutionary Computation Conference (Tutorial) (2016)
Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10 (2002)
Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)
Teller, A., Veloso, M.: PADO: A new learning architecture for object recognition. In: Symbolic Visual Learning. Oxford University Press (1996)
Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2007), pp. 1708–1715 (2007)
Turner, A.J., Miller, J.F.: Neuroevolution: Evolving heterogeneous artificial neural networks. Evolutionary Intelligence 7, 135–154 (2014)
Vahdat, A., Morgan, J., McIntyre, A.R., Heywood, M.I., Zincir-Heywood, A.N.: Evolving GP classifiers for streaming data tasks with concept change and label budgets: A benchmarking study. In: A.H. Gandomi, A.H. Alavi, C. Ryan (eds.) Handbook of Genetic Programming Applications, pp. 451–480. Springer (2015)
Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2018), pp. 229–236 (2018)
Wu, S.X., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2011), pp. 1403–1410 (2011)
Acknowledgements
Stephen Kelly gratefully acknowledges support from the Nova Scotia Graduate Scholarship program. Malcolm Heywood gratefully acknowledges support from the NSERC Discovery and CRD programs (Canada).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Kelly, S., Smith, R.J., Heywood, M.I. (2019). Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial. In: Banzhaf, W., Spector, L., Sheneman, L. (eds) Genetic Programming Theory and Practice XVI. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-04735-1_3
Print ISBN: 978-3-030-04734-4
Online ISBN: 978-3-030-04735-1