Semantic tournament selection for genetic programming based on statistical analysis of error vectors
Introduction
Genetic Programming (GP) is a biologically inspired method of using a computer to evolve solutions, in the form of computer programs, for a problem [24], [37]. To solve a problem using a GP system, a population of individuals is first initialised. The population is then evolved, under fitness based selection, through a number of generations by applying genetic operators. The evolutionary process terminates when a desired solution is found or when the maximum number of generations is exceeded.
There are several factors that can affect the performance of GP for a given problem. These factors include the size of the population, the fitness evaluation of individuals, the selection mechanisms for reproduction and the genetic operators for modifying individuals. Amongst these, selection plays a critical role in GP performance [4]. To date, there have been many selection schemes proposed [23] and the most widely used selection in GP is tournament selection [11].
Tournament selection compares the fitness values of a sample of individuals and selects the one with the best fitness as the winner. This implementation is simple and its effectiveness has been widely evidenced [11]. However, the standard approach uses only the aggregate fitness value and ignores the information in the individuals' error vectors over the fitness cases. Consequently, some information that is potentially useful for GP search may be lost. Recent research has shown that significant benefit can be gained by using semantic information of GP individuals (e.g., [21], [22], [28], [31], [35]). The genetic search operators of crossover and mutation can be modified to improve the semantic locality of search [9], [30], [34]. In addition, the preservation of semantic diversity is a desirable feature of an evolving GP population because it helps to avoid local optima [5], [12]. It is therefore attractive to examine whether using the error vectors of individuals on the fitness cases during selection can improve GP performance.
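The standard scheme described above can be illustrated with a minimal sketch. It assumes that fitness is the mean absolute error over the fitness cases (lower is better); the names `mean_abs_error`, `tournament_select`, and the toy `errors` table are hypothetical and are not taken from the paper. The example also shows how two individuals with different error vectors can collapse to the same fitness value, which is the information that a purely fitness-based comparison cannot see.

```python
import random


def mean_abs_error(error_vector):
    """Aggregate an error vector into a single fitness value (lower is better)."""
    return sum(abs(e) for e in error_vector) / len(error_vector)


def tournament_select(population, fitness, tournament_size, rng):
    """Sample `tournament_size` individuals without replacement and
    return the one with the lowest fitness value."""
    contestants = rng.sample(population, tournament_size)
    return min(contestants, key=fitness)


# Hypothetical error vectors of three individuals on four fitness cases.
errors = {
    "A": [0.5, 0.5, 0.5, 0.5],  # uniformly mediocre
    "B": [0.0, 0.0, 0.0, 2.0],  # perfect on three cases, poor on one
    "C": [1.0, 1.0, 1.0, 1.0],  # uniformly worse
}

rng = random.Random(0)
winner = tournament_select(
    list(errors), lambda name: mean_abs_error(errors[name]), 3, rng
)
```

Here `A` and `B` have the same mean absolute error although they behave very differently on individual fitness cases, so a comparison of fitness values alone cannot distinguish them.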
In our preliminary research [6], we proposed two forms of semantic tournament selection based on statistical analysis of the error vectors of individuals. The experimental results on a set of GP benchmark problems showed the benefit of the proposed techniques [6]. In this paper, we extend this research. The main contributions are:
- We introduce the use of statistical analysis of GP error vectors to create novel forms of tournament selection. Based on a Wilcoxon signed rank test, three variants of tournament selection are proposed to exploit semantic diversity and to explore the potential of the approach to control program bloat.
- The performance of the selection strategies is examined on a large set of regression problems comprising the original problems and their noisy variants. We observe that the new selection techniques help to reduce code growth and improve the generalization ability of the evolved solutions when compared to standard tournament selection and a state-of-the-art method for controlling code bloat in GP.
- The simplicity of the design of the proposed selection strategies allows for further improvements. In this paper, the addition of a state-of-the-art crossover operator is observed to further enhance performance.
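The statistical comparison behind the first contribution can be sketched as follows. This is only an illustration under stated assumptions, not the authors' exact procedure: the test is implemented with the normal approximation (no tie or continuity correction), the significance level `alpha=0.05` is an assumed default, and the random fallback for statistically indistinguishable individuals is one plausible reading of a "with random" variant. The function names `wilcoxon_signed_rank` and `statistical_winner` are hypothetical.

```python
import math
import random


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed rank test for paired samples.

    Returns (w_plus, w_minus, p_value). The p-value uses the normal
    approximation, so it is only approximate for short vectors.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0  # identical vectors: no evidence of a difference

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p_value


def statistical_winner(err_a, err_b, alpha=0.05, rng=random):
    """Compare two absolute-error vectors; return "a" or "b".

    If the test finds a significant difference, the individual with the
    smaller errors wins. Otherwise the winner is drawn at random
    (an assumption made for this sketch).
    """
    w_plus, w_minus, p_value = wilcoxon_signed_rank(err_a, err_b)
    if p_value < alpha:
        # A large w_plus means err_a tends to exceed err_b, so "b" is better.
        return "b" if w_plus > w_minus else "a"
    return rng.choice(["a", "b"])


# Hypothetical absolute errors on 20 fitness cases.
err_a = [k / 10 for k in range(1, 21)]
err_b = [e + 1.0 for e in err_a]  # consistently worse than err_a
result = statistical_winner(err_a, err_b)
```

Because the test operates on the paired per-case errors rather than on a single aggregate, two individuals with similar fitness values can still be separated, and two individuals with different fitness values can be judged statistically equivalent.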
In the next section, we present the background of the paper. Section 3 reviews the related work on improving tournament selection in GP. The three proposed tournament selection strategies are presented in Section 4. Section 5 presents the experimental settings adopted in the paper. Section 6 analyses and compares the performance of the proposed selection strategies with standard tournament selection. The approach is further enhanced through its coupling to a state-of-the-art crossover strategy in Section 7. Section 8 investigates the performance of the proposed techniques on noisy datasets. Finally, Section 9 concludes the paper and highlights some future work.
Section snippets
Background
This section presents some important concepts used in the proposed selection strategies, including the semantics of a GP individual, the error vector of an individual, and the Wilcoxon signed rank test.
In GP, it is common to define the semantics of a program simply as its behaviour with respect to a set of input values [27], [31]. Formally, the semantics of a program is defined as follows:
Definition 2.1 Let be the fitness cases of the problem. The program semantics S(P) of a program P is the
Related work
Selection is a key factor that affects the performance of Evolutionary Algorithms (EAs) [10]. Commonly used selection strategies in EAs include fitness proportionate selection, rank selection, and tournament selection [4]. The most popular selection method in GP is tournament selection [45]. In standard tournament selection, a number of individuals (tournament size) are randomly selected from the population. These individuals are compared with each other and the winner (in terms of better
Methods
This section presents three statistics tournament selection techniques that use the Wilcoxon signed rank test. The objective is to select breeding parents based on the statistical test instead of on their fitness values. The first method is called statistics tournament selection with random [6], abbreviated as TS-R. The main objective of TS-R is to promote the semantic diversity of the GP population, and the process of TS-R is similar to standard tournament selection. However, instead of using the
Experimental settings
We tested the proposed tournament selection techniques on twenty-five regression problems. Among them, fifteen are GP benchmark problems recommended in the literature [43] and an additional ten were taken from the UCI machine learning repository [2]. For each problem, we also created a noisy version from the original (noiseless) form, resulting in twenty-five noisy datasets. In total, fifty datasets were used for the experiments. The detailed description of the tested problems
Performance analysis of statistics tournament selection
This section analyses the performance of the statistics tournament selection methods and compares them with GP and semantic in selection (SiS) by Galvan-Lopez et al. [13]. The first metric used in the comparison is the generalisation ability of the tested methods [1], [42]. The median of the testing error across 100 runs is shown in Table 3. We can see that the testing errors of SiS and GP are roughly equal. The difference between the testing errors of the two techniques is often marginal. SiS is only
Combining semantic tournament selection with semantic crossover
In this section, we improve the performance of TS-S (the best approach among the three proposed selection techniques) by combining it with a recently proposed crossover operator, the random desired operator (RDO) [36]. In other words, we used RDO instead of standard crossover in TS-S. The resulting GP system is called statistics tournament selection with random desired operator and referred to as TS-RDO. The reason for combining RDO with TS-S is that the training error of TS-S is often
Performance analysis on the noisy data
This section investigates the performance of the five methods from Section 7 on the noisy data. In data mining, it has been observed that problems become harder when noise is present [39], [40]. We created a noisy dataset from the original one by adding 10% Gaussian noise with zero mean and one standard deviation to all features and to the objective function of the problems in Table 1. Moreover, the noise is added to both the training and the testing data. The testing error
Conclusions and future work
In this paper, we introduced the idea of using a statistical test as part of an approach to semantic selection, which utilizes the error vectors of GP individuals. We proposed three variations of tournament selection that employ statistical analysis of these semantic vectors to select the winner for the mating pool. The proposed techniques aim at enhancing the semantic diversity and reducing the code bloat in the GP population. The effectiveness of the approach was examined on a large number of
Acknowledgement
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2014.09. MON acknowledges the support of Science Foundation Ireland grants 13/IA/1850 and 13/RC/2094.
References (49)
- et al. The influence of population size in geometric semantic GP. Swarm Evol. Comput. (2017)
- et al. A comparative analysis of selection schemes used in genetic algorithms. Found. Genet. Algorithms (1991)
- et al. GP made faster with semantic surrogate modelling. Inf. Sci. (2016)
- et al. Surrogate genetic programming: a semantic aware evolutionary search. Inf. Sci. (2015)
- et al. Evolving genetic programming classifiers with novelty search. Inf. Sci. (2016)
- et al. Analyzing the presence of noise in multi-class problems: alleviating its influence with the one-vs-one decomposition. Knowl. Inf. Syst. (2014)
- Introduction to Machine Learning (2014)
- K. Bache, M. Lichman, UCI machine learning repository, 2013,...
- Selective pressure in evolutionary algorithms: a characterization of selection mechanisms. Evolutionary Computation (1994)
- et al. A comparison of selection schemes used in evolutionary algorithms. Evol. Comput. (1996)
- Tournament selection based on statistical test in genetic programming. International Conference on Parallel Problem Solving from Nature
- Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis
- A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput.
- Semantic-based local search in multiobjective genetic programming. Proceedings of the Genetic and Evolutionary Computation Conference Companion
- Introduction to Evolutionary Computing
- A review of tournament selection in genetic programming. Proceedings of the International Conferences on Advances in Computation and Intelligence, ISICA 2010
- Introducing semantic-clustering selection in grammatical evolution. GECCO 2015 Semantic Methods in Genetic Programming (SMGP’15) Workshop
- Using semantics in the selection mechanism in genetic programming: a simple method for promoting semantic diversity
- An investigation of supervised learning in genetic programming
- Effects of lexicase and tournament selection on diversity recovery and maintenance. Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion (GECCO)
- Solving uncompromising problems with lexicase selection. IEEE Trans. Evol. Comput.
- Equivalence of probabilistic tournament and polynomial ranking selection. 2008 IEEE Congress on Evolutionary Computation
- Simulating exponential normalization with weighted k-tournaments. Evolutionary Computation
- 100 Statistical Tests