Emergent Tangled Graph Representations for Atari Game Playing Agents

Kelly, Stephen; Heywood, Malcolm I.

doi:10.1007/978-3-319-55696-3_5

Conference paper
First Online: 15 March 2017

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10196)

Abstract

Organizing code into coherent programs and relating different programs to each other represents an underlying requirement for scaling genetic programming to more difficult task domains. A model in which policies are defined by teams of programs, with team and program represented using independent populations and coevolved, has previously been shown to support the development of variable-sized teams. In this work, we generalize the approach to provide a complete framework for organizing multiple teams into arbitrarily deep/wide structures through a process of continuous evolution; hereafter the Tangled Program Graph (TPG). Benchmarking is conducted using a subset of 20 games from the Arcade Learning Environment (ALE), an Atari 2600 video game emulator. The games considered here correspond to those in which deep learning was unable to reach a threshold of play consistent with that of a human. Information provided to the learning agent is limited to that which a human would experience: screen capture sensory input, Atari joystick actions, and game score. The performance of the proposed approach exceeds that of deep learning in 15 of the 20 games, with 7 of the 15 also exceeding that associated with a human level of competence. Moreover, in contrast to solutions from deep learning, solutions discovered by TPG are also very ‘sparse’. Rather than assuming that all of the state space contributes to every decision, each action in TPG is resolved following execution of a subset of an individual’s graph. This results in significantly lower computational requirements for model building than is presently the case for deep learning.
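
The sparse decision process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a bid-based scheme in which each program in a team produces a bid for the current state, the highest bidder's action is taken, and an action is either an atomic joystick action or a pointer to another team. The names Program, Team, and resolve_action are hypothetical, as is the assumption that every team holds at least one program with an atomic action, which guarantees that traversal terminates.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, Union

State = Sequence[float]


@dataclass(eq=False)
class Program:
    """Hypothetical stand-in for an evolved program."""
    bid: Callable[[State], float]   # strength of this program's claim on the state
    action: Union[int, "Team"]      # atomic action id, or a pointer to another team


@dataclass(eq=False)
class Team:
    """A node of the graph: a collection of bidding programs."""
    programs: List[Program] = field(default_factory=list)


def resolve_action(root: Team, state: State) -> Tuple[int, int]:
    """Follow winning bids from the root team until an atomic action is reached.

    Returns (atomic_action, number_of_teams_visited). Only the teams on the
    traversed path are executed, which is the source of sparsity.
    Assumes each team contains at least one program with an atomic action.
    """
    visited = set()
    team = root
    while True:
        visited.add(id(team))
        # Ignore programs that point back to a team already on the path,
        # so cycles in the graph cannot cause an infinite loop.
        candidates = [
            p for p in team.programs
            if not isinstance(p.action, Team) or id(p.action) not in visited
        ]
        winner = max(candidates, key=lambda p: p.bid(state))
        if not isinstance(winner.action, Team):
            return winner.action, len(visited)
        team = winner.action
```

Calling resolve_action(root, state) on a graph of many teams touches only the teams along one path, so the cost of a decision depends on path length rather than on the total size of the graph.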

Notes

1. ALE includes a parameter repeat_action_probability, for which we assumed the suggested value of 0.25.
2. Partial observability can be mitigated by averaging pixel colours across each pair of sequential frames, a preprocessing step not used in this work.
3. ALE provides SECAM as an alternative encoding to the default NTSC format.
4. Individuals in the team population merely index a subset of programs from the program population under a variable length representation. A valid team conforms to the constraint that it must index a minimum of 2 programs and have at least two different actions.
5. A vector of 50 double-precision values, or the program’s output when executed relative to each unique state stored in the archive.
6. All experiments were conducted on a shared cluster with a maximum run-time of 2 weeks. The nature of some games allowed for >1000 generations, while others limited evolution to the order of a few hundred.
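
The validity constraint in Note 4 can be expressed as a short check. This is a sketch under stated assumptions, not the authors' code: a team is represented as a collection of integer indices into the program population, action_of is a hypothetical mapping from program index to that program's action, and repeated indices are assumed to count as a single program.

```python
from typing import Dict, Hashable, Iterable

MIN_PROGRAMS = 2          # Note 4: a team must index at least 2 programs
MIN_DISTINCT_ACTIONS = 2  # Note 4: and have at least two different actions


def is_valid_team(program_ids: Iterable[int],
                  action_of: Dict[int, Hashable]) -> bool:
    """Return True if the indexed programs satisfy the Note 4 constraint."""
    ids = set(program_ids)                    # assumption: duplicates count once
    actions = {action_of[i] for i in ids}     # distinct actions across the team
    return len(ids) >= MIN_PROGRAMS and len(actions) >= MIN_DISTINCT_ACTIONS
```

A variation operator that adds or removes program indices could call such a check and reject or repair any team that fails it.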


Acknowledgments

S. Kelly gratefully acknowledges support from the Nova Scotia Graduate Scholarship program. M. Heywood gratefully acknowledges support from the NSERC Discovery program. All runs were completed on cloud computing infrastructure provided by ACENET, the regional computing consortium for universities in Atlantic Canada. The TPG code base is not parallelized in any way, but using ACENET allowed the five independent runs for each of the 20 games to be conducted in parallel.

Author information

Authors and Affiliations

Dalhousie University, Halifax, Nova Scotia, Canada
Stephen Kelly & Malcolm I. Heywood

Corresponding author

Correspondence to Stephen Kelly.

Editor information

Editors and Affiliations

University College Dublin, Dublin, Ireland
James McDermott
Universidade Nova de Lisboa, Lisbon, Portugal
Mauro Castelli
Brno University of Technology, Brno, Czech Republic
Lukas Sekanina
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Evert Haasdijk
University of Cádiz, Cádiz, Spain
Pablo García-Sánchez

About this paper

Cite this paper

Kelly, S., Heywood, M.I. (2017). Emergent Tangled Graph Representations for Atari Game Playing Agents. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2017. Lecture Notes in Computer Science, vol 10196. Springer, Cham. https://doi.org/10.1007/978-3-319-55696-3_5

DOI: https://doi.org/10.1007/978-3-319-55696-3_5
Published: 15 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55695-6
Online ISBN: 978-3-319-55696-3
eBook Packages: Computer Science, Computer Science (R0)