Emergent Tangled Graph Representations for Atari Game Playing Agents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10196)

Abstract

Organizing code into coherent programs and relating different programs to each other is an underlying requirement for scaling genetic programming to more difficult task domains. A model in which policies are defined by teams of programs, with team and program represented by independent, coevolved populations, has previously been shown to support the development of variable-sized teams. In this work, we generalize the approach to provide a complete framework for organizing multiple teams into arbitrarily deep/wide structures through a process of continuous evolution, hereafter the Tangled Program Graph (TPG). Benchmarking is conducted using a subset of 20 games from the Arcade Learning Environment (ALE), an Atari 2600 video game emulator. The games considered here correspond to those in which deep learning was unable to reach a threshold of play consistent with that of a human. Information provided to the learning agent is limited to that which a human would experience: screen capture sensory input, Atari joystick actions, and game score. The performance of the proposed approach exceeds that of deep learning in 15 of the 20 games, with 7 of the 15 also exceeding that associated with a human level of competence. Moreover, in contrast to solutions from deep learning, solutions discovered by TPG are also very ‘sparse’. Rather than assuming that all of the state space contributes to every decision, each action in TPG is resolved following execution of a subset of an individual’s graph. This results in significantly lower computational requirements for model building than is presently the case for deep learning.
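
The sparsity claim can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each program produces a numeric bid for the current state, that the highest bidder in a team decides, and that a program's action is either an atomic joystick action or a pointer to another team. The names `Program`, `Team`, and `act`, and the toy bidding functions, are hypothetical. The sketch also assumes every team holds at least one program with an atomic action, so a decision is always reachable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple, Union

State = Sequence[float]


@dataclass
class Program:
    bid: Callable[[State], float]   # hypothetical bidding function over the state
    action: Union[str, "Team"]      # atomic action, or a pointer to another team


@dataclass
class Team:
    programs: List[Program] = field(default_factory=list)


def act(root: Team, state: State) -> Tuple[str, int]:
    """Follow the highest bidder from the root team until an atomic action wins.

    Returns the chosen action and the number of programs executed on the way.
    """
    visited = set()
    team, executed = root, 0
    while True:
        visited.add(id(team))
        # Ignore pointers back to teams already visited, so cycles cannot loop.
        candidates = [
            p for p in team.programs
            if isinstance(p.action, str) or id(p.action) not in visited
        ]
        executed += len(candidates)
        winner = max(candidates, key=lambda p: p.bid(state))
        if isinstance(winner.action, str):
            return winner.action, executed
        team = winner.action


# Toy graph: a root team that defers to one of two sub-teams.
steer = Team([Program(lambda s: s[0], "LEFT"), Program(lambda s: -s[0], "RIGHT")])
shoot = Team([Program(lambda s: 1.0, "FIRE"), Program(lambda s: 0.0, "NOOP")])
root = Team([
    Program(lambda s: s[1], steer),
    Program(lambda s: -s[1], shoot),
    Program(lambda s: -10.0, "NOOP"),
])

action, executed = act(root, [0.5, 1.0])
```

In this toy graph only five of the seven programs run for a given decision, because the sub-team that loses the root-level bid is never executed.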

Notes

  1. ALE includes a parameter \(repeat\_action\_probability\), for which we assumed the suggested value of 0.25.

  2. Partial observability can be mitigated by averaging pixel colours across each pair of sequential frames, a preprocessing step not used in this work.

  3. ALE provides SECAM as an alternative encoding to the default NTSC format.

  4. Individuals in the team population merely index a subset of programs from the program population under a variable length representation. A valid team conforms to the constraint that it must index a minimum of 2 programs and have at least two different actions.

  5. A vector of 50 double-precision values, or the program’s output when executed relative to each unique state stored in the archive.

  6. All experiments were conducted on a shared cluster with a maximum run-time of 2 weeks. The nature of some games allowed for >1000 generations, while others limited evolution to the order of a few hundred.
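
Two of the notes above lend themselves to a short sketch: the frame-pair averaging of note 2 (which the note states was not used in this work) and the team validity constraint of note 4. The function names, the nested-list frame layout, the integer rounding, and the decision to count duplicate program indices only once are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # assumed (R, G, B) layout
Frame = List[List[Pixel]]      # assumed rows of pixels


def average_frames(prev: Frame, curr: Frame) -> Frame:
    """Average pixel colours across a pair of sequential frames (note 2)."""
    return [
        [tuple((a + b) // 2 for a, b in zip(p, q)) for p, q in zip(row_p, row_c)]
        for row_p, row_c in zip(prev, curr)
    ]


def is_valid_team(program_ids: Sequence[int], action_of: Dict[int, str]) -> bool:
    """Check the constraint in note 4: at least 2 programs and 2 distinct actions.

    `action_of` is a hypothetical lookup from program index to its action.
    Duplicate indices are counted once, which is an assumption.
    """
    unique = set(program_ids)
    actions = {action_of[i] for i in unique}
    return len(unique) >= 2 and len(actions) >= 2


# A sprite drawn only on the second frame survives at half intensity.
blended = average_frames(
    [[(0, 0, 0), (255, 255, 255)]],
    [[(255, 255, 255), (255, 255, 255)]],
)
```

Averaging leaves an object that flickers between frames visible at reduced intensity, which is why it can mitigate partial observability.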

Acknowledgments

S. Kelly gratefully acknowledges support from the Nova Scotia Graduate Scholarship program. M. Heywood gratefully acknowledges support from the NSERC Discovery program. All runs were completed on cloud computing infrastructure provided by ACENET, the regional computing consortium for universities in Atlantic Canada. The TPG code base is not in any way parallel, but in adopting ACENET the five independent runs for each of the 20 games were conducted in parallel.

Author information

Corresponding author

Correspondence to Stephen Kelly.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kelly, S., Heywood, M.I. (2017). Emergent Tangled Graph Representations for Atari Game Playing Agents. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2017. Lecture Notes in Computer Science, vol 10196. Springer, Cham. https://doi.org/10.1007/978-3-319-55696-3_5

  • DOI: https://doi.org/10.1007/978-3-319-55696-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55695-6

  • Online ISBN: 978-3-319-55696-3

  • eBook Packages: Computer Science, Computer Science (R0)
