
Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks

Genetic Programming and Evolvable Machines

Abstract

Reactive learning agents cannot solve partially observable sequential decision-making tasks as they are limited to defining outcomes purely in terms of the observable state. However, augmenting reactive agents with external memory might provide a path for addressing this limitation. In this work, external memory takes the form of a linked list data structure that programs have to learn how to use. We identify conditions under which additional recurrent connectivity from program output to input is necessary for state disambiguation. Benchmarking against recent results from the neural network literature on three scalable partially observable sequential decision-making tasks demonstrates that the proposed approach scales much more effectively. Indeed, solutions are shown to generalize to far more difficult sequences than those experienced under training conditions. Moreover, recommendations are made regarding the instruction set and additional benchmarking is performed with input state values designed to explicitly disrupt the identification of useful states for later recall. The protected division operator appears to be particularly useful in developing simple solutions to all three tasks.


Notes

  1. However, this says nothing about the ease with which reactive versus non-reactive agents might discover optimal policies.

  2. Potentially reflecting any number of events that the agent experienced in the past.

  3. Register values are not reset between consecutive program executions.

  4. Other drawbacks, such as the loss of gradient information, are specific to the gradient-descent form of credit assignment [12] and are thus outside the purview of this review.

  5. Such as a forwhile(a, b, c) instruction.

  6. Different control signals were given access to different subsets of ADFs.

  7. The stack and queue had 5 control signals, and the list 10.

  8. Identify whether a sequence of brackets has a matching number of open and close brackets.

  9. Uses three types of bracket; the goal is to declare whether the brackets in the sequence are correctly paired.

  10. This function was only necessary for the Copy Task (Sect. 4.3).

  11. For example, increasing ‘signal-to-noise’ as the ensemble size increases [1] or ‘over generalization’ when ensemble members fail to identify an appropriate specialization [21].

  12. When multiple actions are tied, the following priority order is assumed between actions: push > pop_head > no_op > pop_tail.

  13. Also referred to as corridor length.

  14. Task state, s(t), value at the head of the linked list, \(y_h(t-1)\), and past internal state, \(a_1(t-1)\).

  15. NEAT does not explicitly enforce an ensemble, thus potentially not benefiting from a prior decomposition of the task. We, therefore, provide NEAT with a larger population and more generations to search for solutions.

  16. We plot the performance of the best individual from each generation under test conditions.

  17. Non-parametric version of the ANOVA statistic.

  18. NEAT and three instances of ensemble GP (Div, Mult, All).

  19. Five depths of Sequence Recall and Sequence Classification and one multi-depth case of the Copy task.

  20. The original mean and standard deviation for the ‘Div’ instruction set under test were \(95\%\) (21.8), compared to \(100\%\) accuracy (0 variance) under test on the non-zero version of the task.
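The tie-break described in note 12 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each action has one program producing a scalar output and that the ensemble selects the action with the largest output. The function name `select_action` and the dictionary layout are hypothetical.

```python
# Priority order taken from note 12: earlier entries win ties.
PRIORITY = ("push", "pop_head", "no_op", "pop_tail")


def select_action(outputs):
    """Pick the action with the largest output, breaking ties by PRIORITY.

    `outputs` maps each action name to the scalar output of its program.
    Assumption (not stated in the source): the winning action is the one
    with the largest output value.
    """
    # max() returns the first maximal element it encounters, so iterating
    # in priority order resolves ties in favour of the higher-priority action.
    return max(PRIORITY, key=lambda action: outputs[action])


if __name__ == "__main__":
    tied = {"push": 0.5, "pop_head": 0.5, "no_op": 0.5, "pop_tail": 0.5}
    print(select_action(tied))
```

Encoding the priority as an ordered tuple keeps the tie-break rule in one place, so changing the order only requires editing `PRIORITY`.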

References

  1. A.K. Agogino, K. Tumer, Efficient evaluation functions for evolving coordination. Evol. Comput. 16(2), 257–288 (2008)


  2. D. Andre, Evolution of mapmaking: Learning, planning, and memory using genetic programming, in: Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. (IEEE, 1994), pp. 250–255

  3. S. Brave, The evolution of memory and mental models using genetic programming, in Proceedings of the Annual Conference on Genetic Programming. (MIT, 1996), pp. 261–266

  4. J. Demsar, Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)


  5. J.L. Elman, Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)


  6. F. Fortin, F.D. Rainville, M. Gardner, M. Parizeau, C. Gagné, DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)


  7. F.J. Gomez, J. Schmidhuber, R. Miikkulainen, Accelerated neural evolution through cooperatively coevolved synapses. J. Mach. Learn. Res. 9, 937–965 (2008)


  8. A. Graves, G. Wayne, I. Danihelka, Neural Turing machines, pp. 1–26 (2014). arXiv:1410.5401

  9. A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S.G. Colmenarejo, E. Grefenstette, T. Ramalho, J.P. Agapiou, A.P. Badia, K.M. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, D. Hassabis, Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)


  10. K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, J. Schmidhuber, LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017)


  11. R.B. Greve, E.J. Jacobsen, S. Risi, Evolving neural turing machines for reward-based learning, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2016), pp. 117–124

  12. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)


  13. S. Kelly, R.J. Smith, M.I. Heywood, W. Banzhaf, Emergent tangled program graphs in partially observable recursive forecasting and ViZDoom navigation tasks. ACM Trans. Evolut. Optim. Learn. 1(3), 1–41 (2021)


  14. S. Kelly, T. Voegerl, W. Banzhaf, C. Gondro, Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genet. Program Evol. Mach. 22(4), 573–605 (2021)


  15. S. Khadka, J.J. Chung, K. Tumer, Neuroevolution of a modular memory-augmented neural network for deep memory problems. Evol. Comput. 27(4), 639–664 (2019)


  16. A. Lalejini, M.A. Moreno, C. Ofria, Tag-based regulation of modules in genetic programming improves context-dependent problem solving. Genet. Program Evol. Mach. 22(3), 325–355 (2021)


  17. W.B. Langdon, Genetic Programming and Data Structures (Kluwer Academic, 1998)


  18. X. Luo, M.I. Heywood, A.N. Zincir-Heywood, Evolving recurrent models using linear GP, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2005), pp. 1787–1788

  19. M.A. Masalma, M.I. Heywood, Genetic programming with external memory in sequence recall task, in Proceedings of the Genetic and Evolutionary Computation Conference (companion). (ACM, 2022)

  20. N.F. McPhee, R. Poli, Memory with memory: soft assignment in genetic programming, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2008), pp. 1235–1242

  21. L. Panait, S. Luke, R.P. Wiegand, Biasing coevolutionary search for optimal multiagent behaviors. IEEE Trans. Evol. Comput. 10(6), 629–645 (2006)


  22. R. Poli, N.F. McPhee, L. Citi, E.F. Crane, Memory with memory in tree-based genetic programming, in Proceedings of the European Conference on Genetic Programming, LNCS, vol. 5481. (Springer, 2009), pp. 25–36

  23. M.A. Potter, K.A.D. Jong, Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000)


  24. A. Rawal, R. Miikkulainen, Evolving deep LSTM-based memory networks using an information maximization objective, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2016), pp. 501–508

  25. H.T. Siegelmann, E.D. Sontag, On the computational power of neural nets. J. Comput. Syst. Sci. 50(1), 132–150 (1995)


  26. A. Silva, A. Neves, E. Costa, Evolving controllers for autonomous agents using genetically programmed networks, in Proceedings of the European Conference on Genetic Programming, LNCS, vol. 1598. (Springer, 1999), pp. 255–269

  27. R.J. Smith, M.I. Heywood, Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2019), pp. 179–187

  28. L. Spector, S. Luke, Cultural transmission of information in genetic programming, in Proceedings of the Annual Conference on Genetic Programming. (MIT Press, 1996), pp. 209–214

  29. K.O. Stanley, R. Miikkulainen, Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)


  30. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT, 2018)


  31. A. Teller, Turing completeness in the language of genetic programming with indexed memory, in Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. (IEEE, 1994), pp. 136–141


Author information


Corresponding author

Correspondence to Malcolm Heywood.

Ethics declarations

Competing interests

The authors confirm that they have no competing interests that are directly or indirectly related to the work submitted for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Example solutions


Three examples of evolved solutions, and their corresponding function under simplified test sequences, are provided in Table 10. These solutions are sufficient to generalize to a wide range of alternative inputs. The code is presented as evolved, with no attempt at simplification. In all three cases the ‘Div’ instruction set is assumed (Table 1). Tables 11, 12 and 13 illustrate how the various inputs and program outputs change for an example of each task. For clarity, we only show \(y_h\) on Sequence Classification and ignore the data inputs on the Copy task, as they play no role in the evolved control behaviour(s).

Table 10 Example champion programs evolved using the Div instruction set
Table 11 Operation of sequence recall program from Table 10 for example task
Table 12 Operation of sequence classification program from Table 10 for an example task
Table 13 Operation of copy task program from Table 10 for an example task

Some general behaviours to note include the use of action priority order when ties appear between actions and the effective behaviour of \(a_1\) to identify the recall phase of the Copy task. From the evolved control action perspective, the programs associated with no_op tended to be the more complex, both behaviourally and from the perspective of the evolved code. It is also apparent that pop_tail output is never used in these examples. However, this need not be the case. For example, under the sequence classification task pop_head and pop_tail are interchangeable without detracting from the behaviour of the ensemble.
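The four control actions named above can be illustrated with a small sketch of a linked list memory. This is not the authors' implementation: the class name `ListMemory`, inserting at the head on push, treating a pop on an empty list as a no-op, and reading 0.0 from an empty list are all assumptions made for illustration. Python's `collections.deque` stands in for the linked list.

```python
from collections import deque


class ListMemory:
    """Hypothetical external memory driven by the four control actions."""

    def __init__(self):
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def head(self, default=0.0):
        # Value visible to the programs at the head of the list
        # (assumption: an empty list reads as `default`).
        return self._items[0] if self._items else default

    def apply(self, action, value=0.0):
        """Apply one control action and return the new head value."""
        if action == "push":
            self._items.appendleft(value)  # assumption: push inserts at the head
        elif action == "pop_head":
            if self._items:  # assumption: popping an empty list does nothing
                self._items.popleft()
        elif action == "pop_tail":
            if self._items:
                self._items.pop()
        elif action != "no_op":
            raise ValueError(f"unknown action: {action}")
        return self.head()


if __name__ == "__main__":
    memory = ListMemory()
    for value in (1.0, 2.0, 3.0):
        memory.apply("push", value)
    print(memory.apply("pop_head"), len(memory))
    print(memory.apply("pop_tail"), len(memory))
```

In this sketch both pop actions shorten the list by exactly one element. One possible reading of the interchangeability remark, not confirmed by the source, is that a policy depending only on how many items remain would behave identically under either pop.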

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Al Masalma, M., Heywood, M. Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks. Genet Program Evolvable Mach 23 (Suppl 1), 1–29 (2022). https://doi.org/10.1007/s10710-022-09446-8


  • DOI: https://doi.org/10.1007/s10710-022-09446-8
