Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

Tangled Program Graphs (TPG) is a framework by which multiple programs can be organized to cooperate and decompose a task with minimal a priori information. TPG agents begin with minimal complexity and incrementally coevolve to discover a complexity befitting the nature of the task. Previous research has demonstrated the TPG framework under visual reinforcement learning tasks from the Arcade Learning Environment and the VizDoom first-person shooter, producing policies competitive with those from deep learning. However, unlike deep learning, the emergent constructive properties of TPG result in solutions that are orders of magnitude simpler, so execution never needs specialized hardware support. In this work, our goal is to provide a tutorial overview of how the emergent properties of TPG are achieved, as well as specific examples of the decompositions discovered under the VizDoom task.
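
The following minimal sketch illustrates the decision-making loop implied by this organization: teams of programs bid on the current observation, and the winning program either emits an atomic action or defers to another team, so a decision is reached by traversing the graph from a root node. All names are hypothetical and the bidding programs are stand-ins for the linear GP programs used in practice; this is a sketch of the idea, not the authors' implementation.

    import random

    class Program:
        """A bidding program: its action is either an atomic action (int)
        or a pointer to another Team. The bid is a stand-in for executing
        a linear GP program over the observation."""
        def __init__(self, action, n=4):
            self.action = action
            self.weights = [random.uniform(-1, 1) for _ in range(n)]

        def bid(self, state):
            return sum(w * state[i % len(state)] for i, w in enumerate(self.weights))

    class Team:
        """A node of the graph: a set of programs that compete by bidding."""
        def __init__(self, programs):
            self.programs = programs

        def act(self, state, visited=None):
            # Highest bid wins; pointers back to already-visited teams are
            # skipped so that graph traversal always terminates.
            visited = (visited or set()) | {id(self)}
            for prog in sorted(self.programs, key=lambda p: p.bid(state), reverse=True):
                if isinstance(prog.action, Team):
                    if id(prog.action) not in visited:
                        return prog.action.act(state, visited)
                else:
                    return prog.action  # atomic action reached

    # Hypothetical usage: a root team that acts directly or defers to a leaf team.
    leaf = Team([Program(0), Program(1)])
    root = Team([Program(2), Program(leaf)])
    print(root.act([0.3, -1.2, 0.7, 0.05]))

One consequence of this structure is that only root teams (nodes not referenced by any program) need explicit fitness evaluation; internal teams contribute implicitly through the roots that reach them (see note 5 below).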

Notes

  1. In contrast, even once trained, the convolutional operation central to deep learning results in orders of magnitude higher computational cost.

  2. In practice, the sequence of frames as experienced by the agent might represent a stochastic sampling of the true underlying frame sequence [24]; a sketch of one such sampling scheme follows these notes.

  3. Game titles might not use all atomic actions.

  4. The action of a button press is game dependent and might make the avatar ‘jump’ in some games and ‘fire’ in others.

  5. Actually, as nodes are subsumed into graphs, it will become apparent that only a subset of nodes require explicit fitness evaluation (Sect. 3.4.1).

  6. The variation operators are actually applied multiplicatively, possibly resulting in any single operator being applied several times (see [21]); a sketch of this scheme follows these notes.

  7. This TPG individual actually represents a policy able to operate under ten different VizDoom tasks [31].

  8. Part of this might be due to the types of tasks that researchers choose to deploy GP on. For example, ‘expressive GP’ demonstrates its more interesting properties under tasks such as software synthesis [32].
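
Two of these notes lend themselves to short illustrations. For note 2, the "sticky actions" protocol of [24] is one concrete way the experienced frame sequence becomes a stochastic sampling of the true sequence: at every step the emulator repeats the agent's previous action with some probability (0.25 is the value recommended in [24]) instead of executing the newly requested one. A minimal sketch, assuming a Gym-style env.step interface rather than any particular ALE binding:

    import random

    def sticky_step(env, action, prev_action, stickiness=0.25):
        # With probability `stickiness` the previous action is repeated, so
        # the agent observes a stochastic sampling of the frame sequence.
        executed = prev_action if random.random() < stickiness else action
        observation, reward, done, info = env.step(executed)
        return observation, reward, done, executed

For note 6, "applied multiplicatively" can be read as: an operator is applied once, and then again for as long as successive coin flips succeed, so the probability of k applications decays geometrically as p^k. A hedged sketch of that scheme (names hypothetical; see [21] for the actual operator set):

    import random

    def apply_multiplicatively(individual, operator, p):
        # Repeat `operator` while Bernoulli(p) trials succeed; k applications
        # therefore occur with probability p**k.
        while random.random() < p:
            individual = operator(individual)
        return individual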

References

  1. Atkinson, T., Plump, D., Stepney, S.: Evolving graphs by graph programming. In: European Conference on Genetic Programming, Lecture Notes in Computer Science, vol. 10781, pp. 35–51. Springer (2018)

  2. Banzhaf, W.: Artificial regulatory networks and genetic programming. In: R. Riolo, B. Worzel (eds.) Genetic Programming Theory and Practice, pp. 43–62. Springer (2003)

  3. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47, 253–279 (2013)

  4. Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genetic Programming and Evolvable Machines 2, 381–407 (2001)

  5. Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)

  6. Doucette, J.A., Lichodzijewski, P., Heywood, M.I.: Hierarchical task decomposition through symbiosis in reinforcement learning. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2012), pp. 97–104 (2012)

  7. Doucette, J.A., McIntyre, A.R., Lichodzijewski, P., Heywood, M.I.: Symbiotic coevolutionary genetic programming: a benchmarking study under large attribute spaces. Genetic Programming and Evolvable Machines 13, 71–101 (2012)

  8. Fogel, L.J., Owens, A.J., Walsh, M.J.: Artificial intelligence through a simulation of evolution. In: Proceedings of the Cybernetic Sciences Symposium, pp. 131–155 (1965)

  9. Hausknecht, M., Lehman, J., Miikkulainen, R., Stone, P.: A neuroevolution approach to general Atari game playing. IEEE Transactions on Computational Intelligence and AI in Games 6, 355–366 (2014)

  10. Jia, B., Ebner, M.: Evolving game state features from raw pixels. In: European Conference on Genetic Programming, Lecture Notes in Computer Science, vol. 10196, pp. 52–63. Springer (2017)

  11. Kelly, S., Heywood, M.I.: On diversity, teaming, and hierarchical policies: Observations from the keepaway soccer task. In: European Conference on Genetic Programming 2014, Lecture Notes in Computer Science, vol. 8599, pp. 75–86. Springer (2014)

  12. Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming 2017, Lecture Notes in Computer Science, vol. 10196, pp. 64–79. Springer (2017)

  13. Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2017), pp. 195–202 (2017)

  14. Kelly, S., Heywood, M.I.: Discovering agent behaviors through code reuse: Examples from Half-Field Offense and Ms. Pac-Man. IEEE Transactions on Games 10, 195–208 (2018)

  15. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multi-task reinforcement learning. Evolutionary Computation 26(3) (2018)

  16. Kelly, S., Lichodzijewski, P., Heywood, M.I.: On run time libraries and hierarchical symbiosis. In: IEEE Congress on Evolutionary Computation, pp. 1–8 (2012)

  17. Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaśkowski, W.: ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)

  18. Khanchi, S., Vahdat, A., Heywood, M.I., Zincir-Heywood, A.N.: On botnet detection with genetic programming under streaming data label budgets and class imbalance. Swarm and Evolutionary Computation 39, 123–140 (2018)

  19. Lichodzijewski, P., Heywood, M.I.: Coevolutionary bid-based genetic programming for problem decomposition in classification. Genetic Programming and Evolvable Machines 9, 331–365 (2008)

  20. Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2008), pp. 363–370 (2008)

  21. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (GECCO-2010), pp. 853–860 (2010)

  22. Lichodzijewski, P., Heywood, M.I.: The Rubik’s Cube and GP temporal sequence learning. In: R. Riolo, T. McConaghy, E. Vladislavleva (eds.) Genetic Programming Theory and Practice VIII, pp. 35–54. Springer (2011)

  23. Mabu, S., Hirasawa, K., Hu, J.: A graph-based evolutionary algorithm: Genetic network programming and its extension using reinforcement learning. Evolutionary Computation 15, 369–398 (2007)

  24. Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M.J., Bowling, M.: Revisiting the arcade learning environment: Evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)

  25. Metzen, J.H., Edgington, M., Kassahun, Y., Kirchner, F.: Analysis of an evolutionary reinforcement learning method in a multiagent domain. In: ACM International Conference on Autonomous Agents and Multiagent Systems, pp. 291–298 (2008)

  26. Miikkulainen, R., Liang, J.Z., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., Duffy, N., Hodjat, B.: Evolving deep neural networks. CoRR abs/1703.00548 (2017)

  27. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: European Conference on Genetic Programming 2000, Lecture Notes in Computer Science, vol. 1802, pp. 121–132. Springer (2000)

  28. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)

  29. Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)

  30. Smith, R.J., Heywood, M.I.: Coevolving deep hierarchies of programs to solve complex tasks. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2017), pp. 1009–1016 (2017)

  31. Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming 2018, Lecture Notes in Computer Science, vol. 10781, pp. 135–150. Springer (2018)

  32. Spector, L., McPhee, N.F.: Expressive genetic programming: concepts and applications. In: ACM Genetic and Evolutionary Computation Conference (Tutorial) (2016)

  33. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2), 99–127 (2002)

  34. Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)

  35. Teller, A., Veloso, M.: PADO: A new learning architecture for object recognition. In: Symbolic Visual Learning. Oxford University Press (1996)

  36. Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2007), pp. 1708–1715 (2007)

  37. Turner, A.J., Miller, J.F.: Neuroevolution: Evolving heterogeneous artificial neural networks. Evolutionary Intelligence 7, 135–154 (2014)

  38. Vahdat, A., Morgan, J., McIntyre, A.R., Heywood, M.I., Zincir-Heywood, A.N.: Evolving GP classifiers for streaming data tasks with concept change and label budgets: A benchmarking study. In: A.H. Gandomi, A.H. Alavi, C. Ryan (eds.) Handbook of Genetic Programming Applications, pp. 451–480. Springer (2015)

  39. Wilson, D.G., Cussat-Blanc, S., Luga, H., Miller, J.F.: Evolving simple programs for playing Atari games. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2018), pp. 229–236 (2018)

  40. Wu, S.X., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: ACM Genetic and Evolutionary Computation Conference (GECCO-2011), pp. 1403–1410 (2011)

Acknowledgements

Stephen Kelly gratefully acknowledges support from the Nova Scotia Graduate Scholarship program. Malcolm Heywood gratefully acknowledges support from the NSERC Discovery and CRD programs (Canada).

Author information

Correspondence to Malcolm I. Heywood.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kelly, S., Smith, R.J., Heywood, M.I. (2019). Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial. In: Banzhaf, W., Spector, L., Sheneman, L. (eds) Genetic Programming Theory and Practice XVI. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-04735-1_3

  • DOI: https://doi.org/10.1007/978-3-030-04735-1_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04734-4

  • Online ISBN: 978-3-030-04735-1

  • eBook Packages: Computer Science; Computer Science (R0)
