neat Genetic Programming: Controlling bloat naturally
Introduction
Genetic Programming (GP) [9], [10], [21] is an evolutionary computation (EC) paradigm used for automatic program induction; its general goal is to generate computer programs through an evolutionary search. In its most common form, GP can be understood as a supervised learning algorithm that attempts to construct a syntactically valid expression using a finite set of basic functions and input variables, guided by a domain-dependent objective or cost function [21]. In its original form [9], GP is characterized by two main features that distinguish it from other EC techniques. First, evolved solutions represent valid syntactic expressions or programs that might be used as models, predictors, operators or classifiers. The ability of GP to construct syntactic expressions directly, without assuming a prior model, allows it to produce highly interpretable solutions that not only solve the problem but also provide insights into the problem domain. Second, GP uses a variable-length encoding scheme, where the set of candidate solutions contains programs of different size and shape.
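The basic scheme described above can be illustrated with a minimal sketch, not taken from the paper's system: a program is a syntax tree over a small function set and an input variable, and its cost is the error over a set of supervised training cases. All names below (`FUNCTIONS`, `evaluate`, `fitness`) are illustrative.

```python
import operator

# A GP individual: a syntax tree built from a finite function set
# (here: add, sub, mul) and terminals (the input variable "x" or constants).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, x):
    """Recursively evaluate a program tree on input x."""
    if tree == "x":                      # terminal: input variable
        return x
    if isinstance(tree, (int, float)):   # terminal: constant
        return tree
    op, left, right = tree               # internal node: (function, child, child)
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, cases):
    """Supervised cost: sum of absolute errors over (input, target) pairs."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)

# The tree ("mul", "x", ("add", "x", 1)) encodes the program x * (x + 1).
program = ("mul", "x", ("add", "x", 1))
cases = [(x, x * (x + 1)) for x in range(5)]
print(fitness(program, cases))  # prints 0: a perfect model on these cases
```

An evolutionary search would mutate and recombine such trees, which is also where the variable-length encoding, and hence the possibility of bloat, enters the picture.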
The EC literature contains many examples of the problem-solving abilities of GP that illustrate the flexibility of the search paradigm [8]. Indeed, GP can be understood as a hyper-heuristic, an algorithmic approach for the automatic synthesis of heuristics, a view that has a strong theoretical background and real-world applicability [19]. Despite its success, GP is still not used as an off-the-shelf methodology [16], in the way that, for example, Support Vector Machines or Linear Regression are used. This lack of wider acceptance stems from some important pragmatic limitations of the GP approach.
In particular, syntactic search can be inefficient, due to its poor local structure and ill-defined fitness landscape (see [18] and [22], where the authors review some of the main open issues in GP). Among these, one of the most studied problems is the bloat phenomenon, which occurs when program trees grow unnecessarily large without a corresponding increase in fitness [21], [25]. In some sense, bloat seems to be an unavoidable consequence of the nature of the GP search space and of fitness-driven search [11], [12]. Moreover, bloat causes several undesirable side effects, since evaluating large programs is more time consuming, and large solutions are more difficult to interpret. Therefore, multiple approaches have been studied to deal with bloat, ranging from modifications of the basic search operators to the use of different search spaces, such as semantic space [6], [33] and behavioral space [30].
This paper presents a novel approach toward bloat control, which leverages the insights of recent studies [23] and an algorithm originally developed for neuroevolution [28]. Silva [23] suggests that a powerful bloat control strategy is to induce a uniform distribution of program sizes within the evolving population. In particular, she proposed the Flat Operator Equalization bloat control method (Flat-OE), which explicitly forces the evolving population to follow a uniform distribution of program sizes, while the range of the distribution remains constant across all generations.
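The core idea of a flat size distribution can be sketched as follows. This is a simplified illustration of the general mechanism, not Silva's exact formulation: programs are binned by size, and every bin is assigned the same target capacity, so that selection can reject or resample entrants to overfull bins. The names `flat_oe_targets`, `bin_width` and `capacity` are illustrative.

```python
def flat_oe_targets(sizes, bin_width=5, capacity=None):
    """Target histogram for a flat (uniform) distribution of program sizes.

    sizes: node counts of the current population's programs.
    Returns a dict mapping bin index -> target capacity; bins that exceed
    their capacity would reject or resample new entrants during selection.
    """
    bins = {}
    for s in sizes:
        b = s // bin_width
        bins[b] = bins.get(b, 0) + 1
    n_bins = max(bins) + 1 if bins else 0
    if capacity is None and n_bins:
        capacity = len(sizes) // n_bins   # equal share per bin
    return {b: capacity for b in range(n_bins)}

sizes = [3, 4, 8, 9, 12, 14, 17, 21]      # node counts of a toy population
print(flat_oe_targets(sizes, bin_width=5))  # every bin gets the same capacity
```

The point of the flat target is that no size class is allowed to dominate, which removes the selective pressure that lets ever-larger programs take over the population.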
The contribution of our work is the development of a GP-based system based on the NeuroEvolution of Augmenting Topologies (NEAT) algorithm, which uses speciation to protect novel solution topologies and promote the incremental evolution of complexity [28]. In our recent work, we showed that NEAT can run bloat-free, given a careful parametrization and system configuration [29]. However, it was unclear whether the results obtained in neuroevolution could be replicated in a traditional GP domain.
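NEAT's speciation mechanism, which the proposed system borrows, can be summarized with a short sketch under simplifying assumptions: each individual joins the first species whose representative lies within a compatibility threshold under some distance measure, and otherwise founds a new species; fitness sharing within each species then protects novel structures while they mature. The function names and the toy size-difference distance below are illustrative, not NEAT's actual compatibility distance over network genes.

```python
def speciate(individuals, distance, threshold):
    """Assign each individual to the first compatible species.

    species is a list of lists; element 0 of each list acts as the
    species representative, as in NEAT's clustering step.
    """
    species = []
    for ind in individuals:
        for s in species:
            if distance(ind, s[0]) <= threshold:
                s.append(ind)   # compatible: join this species
                break
        else:
            species.append([ind])  # no compatible species: found a new one
    return species

# Toy example: individuals reduced to their program size (node count),
# with compatibility measured as the difference in size.
trees = [4, 5, 6, 20, 22, 50]
groups = speciate(trees, lambda a, b: abs(a - b), threshold=3)
print(groups)  # [[4, 5, 6], [20, 22], [50]]
```

Because selection pressure is applied within species rather than across the whole population, small and large programs compete mostly among themselves, which is the dynamic the proposed neat-GP exploits for implicit bloat control.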
The proposed algorithm is called neat-GP, and it can be understood as a stripped-down version of the original NEAT algorithm, adapted to the GP paradigm and designed to induce search dynamics similar to those shown by Flat-OE. Experiments are carried out using a tree-based representation and tested on several benchmark problems for both symbolic regression and classification. The results show that a neat-GP based search can outperform a standard GP search in terms of test performance, and especially with regard to solution size and depth. These results agree with those reported in [29], with the added advantage that the bloat control method does not incur the additional computational cost exhibited by other state-of-the-art bloat control methods [23], [26].
The remainder of this paper proceeds as follows. Section 2 provides a comprehensive overview of both the bloat phenomenon and the NEAT algorithm, discussing the theoretical causes of bloat, state-of-the-art bloat control methods, and how bloat relates to NEAT. The proposed neat-GP algorithm is presented in Section 3, discussing different possible variants and detailing important algorithm features. The experimental work is presented in Section 4, discussing system setup, benchmarking and results. Finally, a summary and concluding remarks are outlined in Section 5.
Background
This section presents a comprehensive discussion of the most relevant background topics related to the current research paper.
neat Genetic Programming
The goal of this work is to develop a bloat-free GP search, inspired by the insights of Flat-OE and built around the basic features of NEAT, that implicitly shapes the program size distribution. The proposed method is called neat-GP, after the two evolutionary paradigms on which it is based and which serve as its namesakes. Note that the name neat-GP should not be taken as an acronym, since what is evolved are program trees and not network structures. The name is just
Experimental setup
The proposed neat-GP algorithm is implemented using the Matlab GPLab toolbox developed by Silva and Almeida [24]. The GPLab-based implementation is freely available at our team’s homepage http://www.tree-lab.org/, along with an implementation that can be run over the DEAP framework for Python [5].
Different variants of the algorithm are tested, to illustrate the effect that each component has on performance, regarding
Conclusions and future work
This paper presents a new GP algorithm called neat-GP, that incorporates some of the main features of the NEAT algorithm and is able to implicitly shape the program size distribution during the search process. The method is loosely based on two well-known methods in evolutionary computation, Flat-OE and NEAT. It uses a similar overall strategy as the one proposed in Flat-OE, maintaining a diverse population of programs in terms of size throughout the search. However, while Flat-OE explicitly
Acknowledgments
Funding for this work was provided by CONACYT (Mexico) Basic Science Research Project no. 178323, DGEST (Mexico) Research Projects 5414.14-P and 5621.15-P, and FP7-PEOPLE-2013-IRSES project ACOBSEC financed by the European Commission with contract no. 612689. First author was supported by CONACYT scholarship no. 372126. The third author acknowledges funding provided by an ELEVATE Fellowship, the Irish Research Council’s Career Development Fellowship co-funded by Marie Curie Actions, and thanks
References (36)
- et al., A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput. (2011).
- The evolution of evolvability in genetic programming.
- K. Bache, M. Lichman, UCI machine learning repository (2013).
- et al., Operator equalisation and bloat free GP, Proceedings of the 11th European Conference on Genetic Programming, EuroGP ’08 (2008).
- et al., DEAP: evolutionary algorithms made easy, J. Mach. Learn. Res. (2012).
- et al., Using semantics in the selection mechanism in genetic programming: a simple method for promoting semantic diversity, 2013 IEEE Congress on Evolutionary Computation (CEC) (2013).
- et al., Genetic algorithms with sharing for multimodal function optimization, Proceedings of the Second International Conference on Genetic Algorithms and Their Application (1987).
- Human-competitive results produced by genetic programming, Gen. Prog. Evol. Mach. (2010).
- Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992).
- et al., Foundations of Genetic Programming (2002).
- Fitness causes bloat, Proceedings of the Second On-line World Conference on Soft Computing in Engineering Design and Manufacturing.
- Fitness causes bloat: mutation, Proceedings of the First European Workshop on Genetic Programming, EuroGP ’98.
- Abandoning objectives: evolution through the search for novelty alone, Evol. Comput.
- Searching for novel regression functions, Proceedings of the 2013 IEEE Congress on Evolutionary Computation (CEC).
- A comparison of fitness-case sampling methods for symbolic regression with genetic programming, EVOLVE – A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V.
- FFX: fast, scalable, deterministic symbolic regression technology.
- Genetic programming needs better benchmarks, Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, GECCO ’12.
- Open issues in genetic programming, Gen. Prog. Evol. Mach.