Abstract
Reactive learning agents cannot solve partially observable sequential decision-making tasks because their policies are defined purely in terms of the currently observable state. Augmenting reactive agents with external memory, however, may provide a path for addressing this limitation. In this work, external memory takes the form of a linked-list data structure that programs have to learn how to use. We identify conditions under which additional recurrent connectivity from program output to input is necessary for state disambiguation. Benchmarking against recent results from the neural network literature on three scalable partially observable sequential decision-making tasks demonstrates that the proposed approach scales much more effectively. Indeed, solutions are shown to generalize to far more difficult sequences than those experienced under training conditions. Moreover, recommendations are made regarding the instruction set, and additional benchmarking is performed with input state values designed to explicitly disrupt the identification of useful states for later recall. The protected division operator appears to be particularly useful in developing simple solutions to all three tasks.
Notes
However, this says nothing about the ease with which reactive versus non-reactive agents might discover optimal policies.
Potentially reflecting any number of events that the agent experienced in the past.
Register values are not reset between consecutive program executions.
Other drawbacks, such as the loss of gradient information are specific to the gradient-descent form of credit assignment [12], thus outside the purview of this review.
Such as a forwhile(a, b, c) instruction.
Different control signals were given access to different subsets of ADFs.
The stack and queue had 5 control signals, and the list 10.
Identify whether a sequence of brackets has a matching number of open and close brackets.
There are three types of bracket; the goal is to declare whether the brackets in the sequence are correctly paired.
This function was only necessary for the Copy Task (Sect. 4.3).
In the case of multiple tied actions, the following priority order is assumed between actions: push > pop_head > no_op > pop_tail.
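The tie-breaking rule above can be sketched as follows. This is a minimal illustration only: the scoring interface (one real-valued output per action program) matches the paper's winner-take-all setup, but the function and dictionary names are hypothetical.

```python
# Winner-take-all action selection with the fixed tie-break priority
# push > pop_head > no_op > pop_tail.
PRIORITY = ["push", "pop_head", "no_op", "pop_tail"]

def select_action(outputs):
    """outputs: dict mapping action name -> real-valued program output."""
    best = max(outputs.values())
    # Among tied maxima, take the action that appears earliest in the
    # priority order.
    for action in PRIORITY:
        if outputs[action] == best:
            return action

# Three actions tie at 1.0; push wins by priority.
select_action({"push": 1.0, "pop_head": 1.0, "no_op": 0.2, "pop_tail": 1.0})
```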
Also referred to as corridor length.
Task state, s(t), value at the head of the linked list, \(y_h(t-1)\), and past internal state, \(a_1(t-1)\).
NEAT does not explicitly enforce an ensemble, so it potentially does not benefit from a prior decomposition of the task. We therefore provide NEAT with a larger population and more generations to search for solutions.
We plot the performance of the best individual from each generation under test conditions.
Non-parametric version of the ANOVA statistic.
NEAT and three instances of ensemble GP (Div, Mult, All).
Five depths of Sequence Recall and Sequence Classification and one multi-depth case of the Copy task.
The original mean and standard deviation for the ‘Div’ instruction set under test were \(95\%\) (21.8), compared to \(100\%\) accuracy (zero variance) under test on the non-zero version of the task.
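For reference, a common form of the protected division operator found in GP instruction sets can be sketched as below. The guard threshold and the fallback value returned on (near-)zero denominators are assumptions for illustration; the paper's exact definition may differ.

```python
def protected_div(a, b, eps=1e-9):
    # Return a / b, falling back to a safe constant when |b| is (near)
    # zero, so evolved programs never raise a divide-by-zero error.
    return a / b if abs(b) > eps else 1.0

protected_div(6.0, 3.0)  # normal case: 2.0
protected_div(6.0, 0.0)  # protected case: fallback value 1.0
```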
References
A.K. Agogino, K. Tumer, Efficient evaluation functions for evolving coordination. Evol. Comput. 16(2), 257–288 (2008)
D. Andre, Evolution of mapmaking: Learning, planning, and memory using genetic programming, in: Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. (IEEE, 1994), pp. 250–255
S. Brave, The evolution of memory and mental models using genetic programming, in Proceedings of the Annual Conference on Genetic Programming. (MIT, 1996), pp. 261–266
J. Demšar, Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
J.L. Elman, Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
F. Fortin, F.D. Rainville, M. Gardner, M. Parizeau, C. Gagné, DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)
F.J. Gomez, J. Schmidhuber, R. Miikkulainen, Accelerated neural evolution through cooperatively coevolved synapses. J. Mach. Learn. Res. 9, 937–965 (2008)
A. Graves, G. Wayne, I. Danihelka, Neural Turing machines, pp. 1–26 (2014). CoRR abs/1410.5401
A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S.G. Colmenarejo, E. Grefenstette, T. Ramalho, J.P. Agapiou, A.P. Badia, K.M. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, D. Hassabis, Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)
K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, J. Schmidhuber, LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017)
R.B. Greve, E.J. Jacobsen, S. Risi, Evolving neural turing machines for reward-based learning, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2016), pp. 117–124
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
S. Kelly, R.J. Smith, M.I. Heywood, W. Banzhaf, Emergent tangled program graphs in partially observable recursive forecasting and ViZDoom navigation tasks. ACM Trans. Evol. Learn. Optim. 1(3), 1–41 (2021)
S. Kelly, T. Voegerl, W. Banzhaf, C. Gondro, Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genet. Program. Evolvable Mach. 22(4), 573–605 (2021)
S. Khadka, J.J. Chung, K. Tumer, Neuroevolution of a modular memory-augmented neural network for deep memory problems. Evol. Comput. 27(4), 639–664 (2019)
A. Lalejini, M.A. Moreno, C. Ofria, Tag-based regulation of modules in genetic programming improves context-dependent problem solving. Genet. Program. Evolvable Mach. 22(3), 325–355 (2021)
W.B. Langdon, Genetic Programming and Data Structures (Kluwer Academic, 1998)
X. Luo, M.I. Heywood, A.N. Zincir-Heywood, Evolving recurrent models using linear GP, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2005), pp. 1787–1788
M.A. Masalma, M.I. Heywood, Genetic programming with external memory in sequence recall task, in Proceedings of the Genetic and Evolutionary Computation Conference (companion). (ACM, 2022)
N.F. McPhee, R. Poli, Memory with memory: soft assignment in genetic programming, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2008), pp. 1235–1242
L. Panait, S. Luke, R.P. Wiegand, Biasing coevolutionary search for optimal multiagent behaviors. IEEE Trans. Evol. Comput. 10(6), 629–645 (2006)
R. Poli, N.F. McPhee, L. Citi, E.F. Crane, Memory with memory in tree-based genetic programming, in Proceedings of the European Conference on Genetic Programming, LNCS, vol. 5481. (Springer, 2009), pp. 25–36
M.A. Potter, K.A. De Jong, Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000)
A. Rawal, R. Miikkulainen, Evolving deep LSTM-based memory networks using an information maximization objective, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2016), pp. 501–508
H.T. Siegelmann, E.D. Sontag, On the computational power of neural nets. J. Comput. Syst. Sci. 50(1), 132–150 (1995)
A. Silva, A. Neves, E. Costa, Evolving controllers for autonomous agents using genetically programmed networks, in Proceedings of the European Conference on Genetic Programming, LNCS, vol. 1598. (Springer, 1999), pp. 255–269
R.J. Smith, M.I. Heywood, Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2019), pp. 179–187
L. Spector, S. Luke, Cultural transmission of information in genetic programming, in Proceedings of the Annual Conference on Genetic Programming. (MIT Press, 1996), pp. 209–214
K.O. Stanley, R. Miikkulainen, Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT, 2018)
A. Teller, Turing completeness in the language of genetic programming with indexed memory, in Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. (IEEE, 1994), pp. 136–141
Ethics declarations
Competing interests
The authors confirm that they have no competing interests that are directly or indirectly related to the work submitted for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Example solutions
Three examples of evolved solutions, and their corresponding behaviour under simplified test sequences, are provided in Table 10. These solutions are sufficient to generalize to a wide range of alternative inputs. The code is presented as evolved, without attempting simplification. In all three cases the ‘Div’ instruction set is assumed (Table 1). Tables 11, 12 and 13 illustrate how the various inputs and program outputs change for examples of each task. For clarity, we only show \(y_h\) on Sequence Classification and ignore the data inputs on the Copy task, as they do not have a role to play in the evolved control behaviour(s).
Some general behaviours to note include the use of the action priority order when ties appear between actions, and the effective behaviour of \(a_1\) in identifying the recall phase of the Copy task. From the evolved control action perspective, the programs associated with no_op tended to be the most complex, both behaviourally and from the perspective of the evolved code. It is also apparent that the pop_tail output is never used in these examples. However, this need not be the case. For example, under the Sequence Classification task, pop_head and pop_tail are interchangeable without detracting from the behaviour of the ensemble.
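The execution cycle implied by the tables above can be sketched as follows: one evolved program per action reads the task state, the value at the head of the list, and the recurrent internal state; the winning action then updates the external memory. This is a simplified sketch under stated assumptions: a Python `deque` stands in for the linked list, all function and variable names are hypothetical, and feeding the winning program's output back as \(a_1\) is one plausible reading of the recurrent connection, not necessarily the paper's exact wiring.

```python
from collections import deque

def run_episode(programs, task_inputs):
    """programs: dict mapping action name -> callable(s, y_h, a1) -> float."""
    memory = deque()          # external linked-list memory
    a1 = 0.0                  # recurrent internal state, a_1(t-1)
    priority = ["push", "pop_head", "no_op", "pop_tail"]
    for s in task_inputs:
        y_h = memory[0] if memory else 0.0   # value at the head of the list
        outputs = {act: p(s, y_h, a1) for act, p in programs.items()}
        best = max(outputs.values())
        # Winner-take-all with the fixed tie-break priority order.
        action = next(a for a in priority if outputs[a] == best)
        if action == "push":
            memory.appendleft(s)
        elif action == "pop_head" and memory:
            memory.popleft()
        elif action == "pop_tail" and memory:
            memory.pop()
        a1 = outputs[action]  # assumed recurrent feedback of winning output
    return memory
```

Running this with a degenerate ensemble whose push program always wins simply accumulates the input sequence at the head of the list, which is the behaviour exploited during the write phase of the Copy task.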
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Al Masalma, M., Heywood, M. Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks. Genet Program Evolvable Mach 23 (Suppl 1), 1–29 (2022). https://doi.org/10.1007/s10710-022-09446-8