The influence of population size in geometric semantic GP
Introduction
As reported in several studies (see for instance [8], [4], [5], [16], [37], [39]), the performance of Genetic Programming (GP) [25] depends strongly on the values of a set of parameters. Among those parameters, one that has a deep impact on GP's functioning is the size of the population, i.e. the number of candidate solutions that are evolved. In particular, the size of the population is involved in several phenomena that characterize GP. For instance, population size and population diversity are related to premature convergence [45], [7], and it has also been hypothesized that the population size is related to the occurrence of bloat [34]. Furthermore, while some studies suggest that bloat and overfitting are unrelated phenomena [43], other studies hint at the existence of a relation between them [32]. From this perspective, an incorrect choice of the population size may be one of the reasons for overfitting the training data. For all these reasons, an accurate choice of the value of this parameter is often crucial. For instance, a small population may result in premature convergence or in poor performance of GP. On the other hand, a large population may slow down the algorithm because of the high number of fitness evaluations that are needed.
The study of the parameters that characterize GP is an active research topic, in particular when geometric semantic operators, defined by Moraglio and coauthors in 2013 [28], are used to explore the search space [44]. The definition of these operators has opened a new research line in the GP community, and many theoretical studies have appeared [29], [31]. Besides being grounded in a strong body of theory, these genetic operators have produced substantially better results than standard GP on a number of problems, both benchmarks [42] and real-world applications [12], [13].
The objective of this paper is to study the role of population size in the learning process of GP when geometric semantic operators are used. In particular, we want to investigate the role of the population size in achieving good-quality models, both on training and on unseen data. The study has been performed on several test problems and with different population size values, including GP with only one candidate solution in the population.
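The kind of comparison described above can be sketched as a loop over population sizes. The following is only an illustration under strong assumptions, not the experimental setup of the paper: individuals are represented directly by their output vectors, variation is a simple bounded perturbation, and the names `evolve`, `rmse`, the target vector, the step `ms`, the tournament of size 2 and the elitist replacement are all hypothetical choices made for the example.

```python
import random

def rmse(sem, target):
    """Root mean squared error between an output vector and the target vector."""
    return (sum((s - t) ** 2 for s, t in zip(sem, target)) / len(target)) ** 0.5

def evolve(pop_size, generations, target, seed=0, ms=0.5):
    """Mutation-only evolution of output vectors (hypothetical simplification).

    Returns the best training error after each generation; index 0 is the
    error of the best individual in the initial population.
    """
    rng = random.Random(seed)
    err = lambda s: rmse(s, target)
    pop = [[rng.uniform(-1.0, 1.0) for _ in target] for _ in range(pop_size)]
    best = min(pop, key=err)
    history = [err(best)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Tournament of size 2; with pop_size == 1 it always picks the
            # single individual.
            parent = min(rng.choice(pop), rng.choice(pop), key=err)
            # Perturb each output by at most ms in absolute value.
            child = [v + ms * (rng.random() - rng.random()) for v in parent]
            offspring.append(child)
        pop = offspring
        gen_best = min(pop, key=err)
        if err(gen_best) < err(best):
            best = gen_best
        else:
            # Elitism: the best-so-far individual replaces the worst offspring.
            worst = max(range(pop_size), key=lambda i: err(pop[i]))
            pop[worst] = best
        history.append(err(best))
    return history

# Hypothetical target outputs on five training cases.
target = [0.0, 0.25, 1.0, 2.25, 4.0]
for size in (1, 10, 50):
    history = evolve(size, generations=20, target=target)
    evaluations = size * 20  # offspring fitness evaluations used by this run
    print(size, evaluations, round(history[-1], 4))
```

Each run here uses the same number of generations, so larger populations consume proportionally more fitness evaluations; comparing runs at an equal evaluation budget instead would be a different, equally legitimate design choice.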
The paper is organized as follows: Section 2 presents previous work on the importance of population size in evolutionary computation, pointing out some interesting findings; Section 3 reports the definition of the geometric semantic operators presented in [28]; Section 4 presents the experimental settings and the obtained results, discussing the effect of different population size values on the learning process. In particular, it analyzes the quality of the obtained models and their ability to generalize to unseen data. Finally, Section 5 concludes the paper and provides hints for future research directions.
Section snippets
Population size: previous and related work
The effect of the population size in evolutionary algorithms has been investigated in several works so far. This section presents a brief literature review, in order to frame our work in the context of the existing studies. The first studies to appear concern Genetic Algorithms (GAs): in [18] the notion of genetic drift was introduced and a study of the relation between genetic drift and population size was reported. Genetic drift was defined as an effect based
Geometric semantic operators
Even though the term semantics can have several different interpretations, it is common in the GP community (and we do so here as well) to identify the semantics of a solution with the vector of its output values on the training data [44]. From this perspective, a GP individual can be identified with a point (its semantics) in a multidimensional space that we call semantic space. The term Geometric Semantic Genetic Programming (GSGP) indicates a recently introduced variant of GP
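To make the notion of semantics concrete, the following minimal sketch computes the semantics of two toy programs and applies geometric semantic crossover and mutation to them. The operator forms used here are the ones commonly given in the GSGP literature for real-valued problems (crossover as r * T1 + (1 - r) * T2, mutation as T + ms * (r1 - r2), with random functions bounded in [0, 1]); the excerpt above does not spell them out, so they should be read as an assumption. The helper `random_bounded_function`, the toy parents and the value of `ms` are hypothetical.

```python
import math
import random

def semantics(program, inputs):
    """Semantics of a program: the vector of its outputs on the training inputs."""
    return [program(x) for x in inputs]

def random_bounded_function(rng):
    """Hypothetical stand-in for a random tree whose output lies in (0, 1):
    a logistic function applied to a random linear expression."""
    a, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    return lambda x: 1.0 / (1.0 + math.exp(-(a * x + b)))

def gs_crossover(p1, p2, rng):
    """Assumed form of geometric semantic crossover: r * p1 + (1 - r) * p2."""
    r = random_bounded_function(rng)
    return lambda x: r(x) * p1(x) + (1.0 - r(x)) * p2(x)

def gs_mutation(p, rng, ms=0.1):
    """Assumed form of geometric semantic mutation: p + ms * (r1 - r2)."""
    r1, r2 = random_bounded_function(rng), random_bounded_function(rng)
    return lambda x: p(x) + ms * (r1(x) - r2(x))

rng = random.Random(42)
inputs = [0.0, 0.5, 1.0, 1.5]          # toy training inputs
parent1 = lambda x: x * x              # toy individual 1
parent2 = lambda x: 2.0 * x + 1.0      # toy individual 2

child = gs_crossover(parent1, parent2, rng)
mutant = gs_mutation(parent1, rng, ms=0.1)

print("parent1:", semantics(parent1, inputs))
print("parent2:", semantics(parent2, inputs))
print("child:  ", semantics(child, inputs))
print("mutant: ", semantics(mutant, inputs))
```

On every training case the child's output lies between the outputs of its two parents, and the mutant's output differs from its parent's by less than `ms`. This is the geometric behaviour in semantic space that the operators are designed to have.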
Test problems and experimental settings
For the experimental study presented in this section, we have decided to consider eight different test problems: six of them are complex real-life problems, while two of them are well-known theoretical synthetic functions. All these problems have been widely used as benchmarks for GP and a discussion of all of them can be found in [27]. The objective of three of the real-life problems taken into account is the prediction of different pharmacokinetic parameters of potentially new drugs: human
Conclusions
Several studies have discussed the importance of a correct choice of the parameters that characterize evolutionary algorithms and, more specifically, genetic programming (GP). These studies have shown that the performance of GP is strongly dependent on the values of some parameters. Hence, considering the difficulty that characterizes the parameter tuning phase, a plethora of contributions has appeared trying to analyze the impact of the different parameters. With the recent definition of
References (47)
- et al., Prediction of high performance concrete strength using genetic programming with geometric semantic genetic operators, Expert Syst. Appl. (2013)
- et al., Prediction of the unified Parkinson's disease rating scale assessment using a genetic programming system with geometric semantic genetic operators, Expert Syst. Appl. (2014)
- J. Arabas, Z. Michalewicz, J. Mulawka, Gavaps - a genetic algorithm with varying population size, in: 1994 Proceedings of...
- F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, Genetic programming for human oral bioavailability of drugs, in:...
- F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, Genetic programming and other machine learning approaches to predict...
- T. Bäck, Self-adaptation in genetic algorithms, in: Proceedings of the First European Conference on Artificial Life,...
- T. Bäck, Optimal mutation rates in genetic search, in: Proceedings of the 5th International Conference on Genetic...
- T. Brooks, D. Pope, A. Marcolini, Airfoil Self-Noise and Prediction, Technical Report, NASA RP-1218,...
- E. Burke, S. Gustafson, G. Kendall, N. Krasnogor, Advanced population diversity measures in genetic programming, in:...
- M. Castelli, L. Manzoni, L. Vanneschi, Parameter tuning of evolutionary reactions systems, in: GECCO ’12: Proceedings...
- Self-tuning geometric semantic genetic programming, Genet. Program. Evol. Mach.
- A C++ framework for geometric semantic genetic programming, Genet. Program. Evol. Mach.
- Parameter evaluation of geometric semantic genetic programming in pharmacokinetics, Int. J. BioInspir. Comput.
- Semantic search-based genetic programming and the effect of intron deletion, IEEE Trans. Cybern.
- Optimization of control parameters for genetic algorithms, IEEE Trans. Syst. Man Cybern.
- Simulation of concrete slump using neural networks, Constr. Mater.
Cited by (10)
A study of dynamic populations in geometric semantic genetic programming
2023, Information Sciences
Semantic schema based genetic programming for symbolic regression
2022, Applied Soft Computing
Citation Excerpt: Moreover, in this method, the standard crossover has a probability of occurrence, so that diversity could be kept high. The effect of population size on geometric semantic genetic programming has been studied in [63] and the operators probabilities have been adjusted by the proposed algorithm of [64]. In what follows, some of the best versions of genetic programming will be introduced that tried to make gradual evolution through data layering, have high generalization, and are employed for comparison with the proposed method.
Semantic tournament selection for genetic programming based on statistical analysis of error vectors
2018, Information Sciences
Citation Excerpt: The genetic search operators of crossover and mutation can be modified to improve the semantic locality of search [9,30,34]. In addition, the preservation of semantic diversity is a desirable feature of an evolving GP population to avoid local optima [5,12], thus, it is also attractive to examine whether using the error vectors of individuals on the fitness cases during selection can improve GP performance. In our preliminary research [6], we have proposed two forms of semantic tournament selection that are based on statistical analysis of the error vectors of individuals.
An investigation of geometric semantic gp with linear scaling
2023, GECCO 2023 - Proceedings of the 2023 Genetic and Evolutionary Computation Conference
Comparative Study of Impacts of Typical Bio-Inspired Optimization Algorithms on Source Inversion Performance
2022, Frontiers in Environmental Science