
Reinforced Genetic Programming


Abstract

This paper introduces the Reinforced Genetic Programming (RGP) system, which enhances standard tree-based genetic programming (GP) with reinforcement learning (RL). RGP adds a new element to the GP function set: monitored action-selection points that provide hooks to a reinforcement-learning system. Using strong typing, RGP can restrict these choice points to leaf nodes, thereby turning GP trees into classify-and-act procedures. Environmental reinforcements channeled back through the choice points then provide the basis for both lifetime learning and general GP fitness assessment. This paves the way for evolutionary acceleration via both Baldwinian and Lamarckian mechanisms. In addition, the hybrid hints at potential improvements to RL, since evolution can be exploited to design proper abstraction spaces via the problem-state classifications of the internal tree nodes. This paper details the basic mechanisms of RGP and demonstrates its application to a series of static and dynamic maze-search problems.
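The classify-and-act scheme sketched in the abstract lends itself to a compact illustration. The following Python is a minimal sketch, assuming a toy maze state, epsilon-greedy action selection, and a one-step value update at each leaf; the class names (ConditionNode, ChoicePoint), the state representation, and all parameters are illustrative assumptions, not the paper's actual implementation. Internal nodes classify the problem state, leaf choice points select actions, and the environmental reward is channeled back to the leaf that acted.

import random

ACTIONS = ["north", "south", "east", "west"]

class ChoicePoint:
    # Leaf node: a monitored action-selection point backed by local Q-values.
    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = {a: 0.0 for a in ACTIONS}
        self.epsilon = epsilon
        self.alpha = alpha

    def act(self, state):
        # Epsilon-greedy selection over this leaf's Q-table; the leaf is
        # returned alongside the action so reward can be channeled back to it.
        if random.random() < self.epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(self.q, key=self.q.get)
        return action, self

    def reinforce(self, action, reward):
        # Lifetime learning: a one-step value update at the choice point.
        self.q[action] += self.alpha * (reward - self.q[action])

class ConditionNode:
    # Internal node: classifies the problem state and routes to a subtree.
    def __init__(self, predicate, if_true, if_false):
        self.predicate = predicate
        self.if_true = if_true
        self.if_false = if_false

    def act(self, state):
        branch = self.if_true if self.predicate(state) else self.if_false
        return branch.act(state)

# A two-leaf classify-and-act tree for a toy maze state.
tree = ConditionNode(
    predicate=lambda s: s["wall_ahead"],
    if_true=ChoicePoint(),
    if_false=ChoicePoint(),
)

state = {"wall_ahead": True}
action, leaf = tree.act(state)                 # classify state, pick action
reward = 1.0 if action == "east" else -0.1     # stand-in environmental reward
leaf.reinforce(action, reward)                 # reinforcement flows back

In a Baldwinian setting, the values learned at the choice points during an individual's lifetime would affect only its fitness, whereas a Lamarckian variant would also write the learned values back into the inherited genotype.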





About this article

Cite this article

Downing, K.L. Reinforced Genetic Programming. Genetic Programming and Evolvable Machines 2, 259–288 (2001). https://doi.org/10.1023/A:1011953410319

