Abstract
Reactive learning agents cannot solve partially observable sequential decision-making tasks because their policies are defined purely in terms of the currently observable state. Augmenting reactive agents with external memory, however, may provide a path for addressing this limitation. In this work, external memory takes the form of a linked-list data structure that programs have to learn how to use. We identify conditions under which additional recurrent connectivity from program output to input is necessary for state disambiguation. Benchmarking against recent results from the neural network literature on three scalable partially observable sequential decision-making tasks demonstrates that the proposed approach scales much more effectively. Indeed, solutions are shown to generalize to far more difficult sequences than those experienced under training conditions. Moreover, recommendations are made regarding the instruction set, and additional benchmarking is performed with input state values designed to explicitly disrupt the identification of useful states for later recall. The protected division operator appears to be particularly useful in developing simple solutions to all three tasks.
Notes
However, this says nothing about the ease with which reactive versus non-reactive agents might discover optimal policies.
Potentially reflecting any number of events that the agent experienced in the past.
Register values are not reset between consecutive program executions.
Other drawbacks, such as the loss of gradient information are specific to the gradient-descent form of credit assignment [12], thus outside the purview of this review.
Such as a forwhile(a, b, c) instruction.
Different control signals were given access to different subsets of ADFs.
The stack and queue had 5 control signals, and the list 10.
Identify whether a sequence of brackets has a matching number of open and close brackets.
There are three types of bracket; the goal is to declare whether the brackets in the sequence are correctly paired.
This function was only necessary for the Copy Task (Sect. 4.3).
In the case of multiple tied actions, the following priority order is assumed between actions: push > pop_head > no_op > pop_tail.
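The tie-breaking rule above can be sketched as follows. This is a minimal illustration only: the scoring interface (one real-valued output per action program) matches the paper's winner-take-all setup, but the function and dictionary names are hypothetical.

```python
# Winner-take-all action selection with the fixed tie-break priority
# push > pop_head > no_op > pop_tail.
PRIORITY = ["push", "pop_head", "no_op", "pop_tail"]

def select_action(outputs):
    """outputs: dict mapping action name -> real-valued program output."""
    best = max(outputs.values())
    # Among tied maxima, take the action that appears earliest in the
    # priority order.
    for action in PRIORITY:
        if outputs[action] == best:
            return action

# Three actions tie at 1.0; push wins by priority.
select_action({"push": 1.0, "pop_head": 1.0, "no_op": 0.2, "pop_tail": 1.0})
```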
Also referred to as corridor length.
Task state, s(t), value at the head of the linked list, \(y_h(t-1)\), and past internal state, \(a_1(t-1)\).
NEAT does not explicitly enforce an ensemble, so it potentially does not benefit from a prior decomposition of the task. We therefore provide NEAT with a larger population and more generations to search for solutions.
We plot the performance of the best individual from each generation under test conditions.
Non-parametric version of the ANOVA statistic.
NEAT and three instances of ensemble GP (Div, Mult, All).
Five depths of Sequence Recall and Sequence Classification and one multi-depth case of the Copy task.
The original mean and standard deviation for the ‘Div’ instruction set under test were \(95\%\) (21.8), compared to \(100\%\) accuracy (zero variance) under test on the non-zero version of the task.
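For reference, a common form of the protected division operator found in GP instruction sets can be sketched as below. The guard threshold and the fallback value returned on (near-)zero denominators are assumptions for illustration; the paper's exact definition may differ.

```python
def protected_div(a, b, eps=1e-9):
    # Return a / b, falling back to a safe constant when |b| is (near)
    # zero, so evolved programs never raise a divide-by-zero error.
    return a / b if abs(b) > eps else 1.0

protected_div(6.0, 3.0)  # normal case: 2.0
protected_div(6.0, 0.0)  # protected case: fallback value 1.0
```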
References
A.K. Agogino, K. Tumer, Efficient evaluation functions for evolving coordination. Evol. Comput. 16(2), 257–288 (2008)
D. Andre, Evolution of mapmaking: Learning, planning, and memory using genetic programming, in: Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. (IEEE, 1994), pp. 250–255
S. Brave, The evolution of memory and mental models using genetic programming, in Proceedings of the Annual Conference on Genetic Programming. (MIT, 1996), pp. 261–266
J. Demšar, Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
J.L. Elman, Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
F. Fortin, F.D. Rainville, M. Gardner, M. Parizeau, C. Gagné, DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)
F.J. Gomez, J. Schmidhuber, R. Miikkulainen, Accelerated neural evolution through cooperatively coevolved synapses. J. Mach. Learn. Res. 9, 937–965 (2008)
A. Graves, G. Wayne, I. Danihelka, Neural Turing machines, pp. 1–26 (2014). CoRR abs/1410.5401
A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S.G. Colmenarejo, E. Grefenstette, T. Ramalho, J.P. Agapiou, A.P. Badia, K.M. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, D. Hassabis, Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)
K. Greff, R.K. Srivastava, J. Koutník, B.R. Steunebrink, J. Schmidhuber, LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017)
R.B. Greve, E.J. Jacobsen, S. Risi, Evolving neural turing machines for reward-based learning, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2016), pp. 117–124
S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
S. Kelly, R.J. Smith, M.I. Heywood, W. Banzhaf, Emergent tangled program graphs in partially observable recursive forecasting and ViZDoom navigation tasks. ACM Trans. Evol. Learn. Optim. 1(3), 1–41 (2021)
S. Kelly, T. Voegerl, W. Banzhaf, C. Gondro, Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genet. Program. Evolvable Mach. 22(4), 573–605 (2021)
S. Khadka, J.J. Chung, K. Tumer, Neuroevolution of a modular memory-augmented neural network for deep memory problems. Evol. Comput. 27(4), 639–664 (2019)
A. Lalejini, M.A. Moreno, C. Ofria, Tag-based regulation of modules in genetic programming improves context-dependent problem solving. Genet. Program. Evolvable Mach. 22(3), 325–355 (2021)
W.B. Langdon, Genetic Programming and Data Structures (Kluwer Academic, 1998)
X. Luo, M.I. Heywood, A.N. Zincir-Heywood, Evolving recurrent models using linear GP, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2005), pp. 1787–1788
M.A. Masalma, M.I. Heywood, Genetic programming with external memory in sequence recall task, in Proceedings of the Genetic and Evolutionary Computation Conference (companion). (ACM, 2022)
N.F. McPhee, R. Poli, Memory with memory: soft assignment in genetic programming, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2008), pp. 1235–1242
L. Panait, S. Luke, R.P. Wiegand, Biasing coevolutionary search for optimal multiagent behaviors. IEEE Trans. Evol. Comput. 10(6), 629–645 (2006)
R. Poli, N.F. McPhee, L. Citi, E.F. Crane, Memory with memory in tree-based genetic programming, in Proceedings of the European Conference on Genetic Programming, LNCS, vol. 5481. (Springer, 2009), pp. 25–36
M.A. Potter, K.A. De Jong, Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000)
A. Rawal, R. Miikkulainen, Evolving deep LSTM-based memory networks using an information maximization objective, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2016), pp. 501–508
H.T. Siegelmann, E.D. Sontag, On the computational power of neural nets. J. Comput. Syst. Sci. 50(1), 132–150 (1995)
A. Silva, A. Neves, E. Costa, Evolving controllers for autonomous agents using genetically programmed networks, in Proceedings of the European Conference on Genetic Programming, LNCS, vol. 1598. (Springer, 1999), pp. 255–269
R.J. Smith, M.I. Heywood, Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory, in Proceedings of the Genetic and Evolutionary Computation Conference. (ACM, 2019), pp. 179–187
L. Spector, S. Luke, Cultural transmission of information in genetic programming, in Proceedings of the Annual Conference on Genetic Programming. (MIT Press, 1996), pp. 209–214
K.O. Stanley, R. Miikkulainen, Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT, 2018)
A. Teller, Turing completeness in the language of genetic programming with indexed memory, in Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. (IEEE, 1994), pp. 136–141
Ethics declarations
Competing interests
The authors confirm that they have no competing interests that are directly or indirectly related to the work submitted for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Example solutions
Three examples of evolved solutions, and their corresponding behaviour under simplified test sequences, are provided in Table 10. These solutions are sufficient to generalize to a wide range of alternative inputs. The code is presented as evolved, without attempting simplification. In all three cases the ‘Div’ instruction set is assumed (Table 1). Tables 11, 12 and 13 illustrate how the various inputs and program outputs change for examples of each task. For clarity, we only show \(y_h\) on Sequence Classification and ignore the data inputs on the Copy task, as they do not have a role to play in the evolved control behaviour(s).
Some general behaviours to note include the use of the action priority order when ties appear between actions, and the effective behaviour of \(a_1\) in identifying the recall phase of the Copy task. From the evolved control action perspective, the programs associated with no_op tended to be the most complex, both behaviourally and from the perspective of the evolved code. It is also apparent that the pop_tail output is never used in these examples. However, this need not be the case. For example, under the Sequence Classification task, pop_head and pop_tail are interchangeable without detracting from the behaviour of the ensemble.
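The execution cycle implied by the tables above can be sketched as follows: one evolved program per action reads the task state, the value at the head of the list, and the recurrent internal state; the winning action then updates the external memory. This is a simplified sketch under stated assumptions: a Python `deque` stands in for the linked list, all function and variable names are hypothetical, and feeding the winning program's output back as \(a_1\) is one plausible reading of the recurrent connection, not necessarily the paper's exact wiring.

```python
from collections import deque

def run_episode(programs, task_inputs):
    """programs: dict mapping action name -> callable(s, y_h, a1) -> float."""
    memory = deque()          # external linked-list memory
    a1 = 0.0                  # recurrent internal state, a_1(t-1)
    priority = ["push", "pop_head", "no_op", "pop_tail"]
    for s in task_inputs:
        y_h = memory[0] if memory else 0.0   # value at the head of the list
        outputs = {act: p(s, y_h, a1) for act, p in programs.items()}
        best = max(outputs.values())
        # Winner-take-all with the fixed tie-break priority order.
        action = next(a for a in priority if outputs[a] == best)
        if action == "push":
            memory.appendleft(s)
        elif action == "pop_head" and memory:
            memory.popleft()
        elif action == "pop_tail" and memory:
            memory.pop()
        a1 = outputs[action]  # assumed recurrent feedback of winning output
    return memory
```

Running this with a degenerate ensemble whose push program always wins simply accumulates the input sequence at the head of the list, which is the behaviour exploited during the write phase of the Copy task.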
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Al Masalma, M., Heywood, M. Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks. Genet Program Evolvable Mach 23 (Suppl 1), 1–29 (2022). https://doi.org/10.1007/s10710-022-09446-8