Genetic programming with semantic equivalence classes
Introduction
The use of semantic methods is one of the hottest topics in Genetic Programming (GP) [1] and has recently attracted noteworthy attention from researchers, particularly in the applied domain of symbolic regression [2]. Let $X = \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ be the set of input data, or fitness cases, of a symbolic regression problem, and $\vec{t} = [t_1, t_2, \ldots, t_n]$ the vector of the respective expected output or target values (in other words, for each $i = 1, 2, \ldots, n$, $t_i$ is the expected output corresponding to input $\vec{x}_i$). A GP individual (or program) $P$ can be seen as a function that for each input vector $\vec{x}_i$ returns the scalar value $P(\vec{x}_i)$. Following [3], we call the semantics of $P$ the vector $\vec{s}_P = [P(\vec{x}_1), P(\vec{x}_2), \ldots, P(\vec{x}_n)]$. This vector can be represented as a point in an $n$-dimensional space, which we call the semantic space, as opposed to the syntactic or genotypic space, where individuals are represented by programs. While each program has one and only one semantics, the mapping between the semantic space and the genotypic space is generally not a bijection because several programs can have the same semantics. The target vector $\vec{t}$ itself is a point in the semantic space and, in general, the objective of GP is to find at least one program in the genotypic space that maps into $\vec{t}$ in the semantic space.
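As a rough illustration of these definitions, the following Python sketch computes the semantics of a program as the vector of its outputs on the fitness cases. The toy fitness cases `X` and the two programs `p1` and `p2` are hypothetical and are not taken from the paper; they only show that two syntactically different programs can share the same semantics, which is why the mapping is not a bijection.

```python
from typing import Callable, List, Sequence

# A program is modelled here as any callable from an input vector to a scalar.
Program = Callable[[Sequence[float]], float]


def semantics(program: Program, fitness_cases: Sequence[Sequence[float]]) -> List[float]:
    """Return the vector of outputs of `program` on every fitness case."""
    return [program(x) for x in fitness_cases]


# Hypothetical fitness cases with two input variables each.
X = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]


def p1(x: Sequence[float]) -> float:
    # Syntactic form: x0 + x0
    return x[0] + x[0]


def p2(x: Sequence[float]) -> float:
    # Syntactic form: 2 * x0 (a different tree, same behaviour)
    return 2.0 * x[0]


s1 = semantics(p1, X)
s2 = semantics(p2, X)
# Two distinct genotypes map to the same point of the semantic space.
same_point = s1 == s2
```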
Several different methods to exploit semantic awareness in GP have been presented so far. For relatively complete surveys, the reader is referred to [4,5,6]. The work presented here proposes a new idea for exploiting semantic awareness in GP: semantics-based equivalence classes. Our concept of an equivalence class is such that, once an individual in a class is found, it must be possible, and easy, to generate all the other individuals in that class. In this way, if we are able to find one solution that is in the same equivalence class as a globally optimal solution, then we are able to solve the problem by reconstructing the global optimum analytically. Two possible realizations of this idea are proposed in this paper.
In the first, two individuals $P_1$ and $P_2$ are considered in the same equivalence class if $\vec{s}_{P_1} = \vec{s}_{P_2} + k\,\vec{1}$ for some constant $k$, where $\vec{s}_P$ denotes the semantics of $P$; that is, if their semantics differ by the same amount in every coordinate. In such a situation, we are identifying a collection of affine subspaces in the semantic space. In this collection, each subspace is a straight line (or hyperplane, if we think in n dimensions) and all the lines belonging to this collection are parallel to each other. In a system like this, it makes sense to use the variance of the differences between the coordinates of the semantic vector and those of the target as fitness. When this variance is equal to zero, the semantic vector of the individual is in the same equivalence class as the target, and the search can terminate, analytically reconstructing a globally optimal solution.
In the second proposed system, two individuals $P_1$ and $P_2$ are in the same equivalence class if $\vec{s}_{P_1} = k\,\vec{s}_{P_2}$, where $k$ is a constant such that $k \neq 0$. In such a situation, we are defining a projective space (i.e. the space of all the straight lines, or hyperplanes if we think in n dimensions, intersecting the origin) in the semantic space. The objective of GP then is to find any solution whose semantics is directly proportional to the target (i.e. that lies on the straight line passing through the target and the origin). It therefore makes sense to define a new fitness function, corresponding to the variance of the ratios between the coordinates of the semantics of an individual and those of the target vector $\vec{t}$. When this variance equals zero, the individual is in the same equivalence class as the target, and the search can terminate, analytically reconstructing a globally optimal solution.
The first of these two systems is called GPPLUS (given that addition is the operator that allows us to reconstruct the target), while the second one is called GPMUL (given that the target can be reconstructed using multiplication). Moreover, we introduce two new GP systems that are similar to GPPLUS and GPMUL, but in which a filter that allows us to reject individuals is applied. Specifically, a newly generated individual is rejected if another individual belonging to the same equivalence class already exists in the population. These “filtered” versions are called FGPPLUS and FGPMUL, respectively. The performance of GPPLUS, GPMUL, FGPPLUS, and FGPMUL is compared to that of one of the best-known state-of-the-art semantics-based GP systems: Geometric Semantic GP (GSGP), introduced in Ref. [3] and described in detail in Ref. [7]. As test cases, we choose seven complex real-life symbolic regression problems, from as many different applied domains. Finally, in order to show that semantic filters are also appropriate for systems that are not directly based on the idea of semantic equivalence classes, we apply filters to one of the most widely used GP techniques: linear scaling [8,9]. For this reason, linear scaling without filters (called LS) and linear scaling with filters (called FLS) are also included in the comparative study.
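The two variance-based fitness functions, the analytic reconstruction of the target, and the filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names and the toy vectors are hypothetical, the ratios are taken as semantics over target (which assumes every target coordinate is non-zero), and the tolerance `tol` used by the filter is an arbitrary choice.

```python
from statistics import fmean, pvariance
from typing import Callable, List, Sequence

Vector = Sequence[float]


def fitness_plus(s: Vector, t: Vector) -> float:
    """GPPLUS-style fitness: variance of the differences s_i - t_i."""
    return pvariance([si - ti for si, ti in zip(s, t)])


def fitness_mul(s: Vector, t: Vector) -> float:
    """GPMUL-style fitness: variance of the ratios s_i / t_i.

    Assumption: every t_i is non-zero.
    """
    return pvariance([si / ti for si, ti in zip(s, t)])


def reconstruct_plus(s: Vector, t: Vector) -> List[float]:
    """Shift s by the mean offset; exact when fitness_plus(s, t) is zero."""
    k = fmean([ti - si for si, ti in zip(s, t)])
    return [si + k for si in s]


def reconstruct_mul(s: Vector, t: Vector) -> List[float]:
    """Rescale s by the mean ratio; exact when fitness_mul(s, t) is zero."""
    r = fmean([si / ti for si, ti in zip(s, t)])
    return [si / r for si in s]


def is_filtered(
    new_s: Vector,
    population: Sequence[Vector],
    class_distance: Callable[[Vector, Vector], float] = fitness_plus,
    tol: float = 1e-12,
) -> bool:
    """Reject new_s if some individual of the population is in its class."""
    return any(class_distance(new_s, s) <= tol for s in population)


# Hypothetical semantics and targets, chosen so that each pair is equivalent.
s_a, t_a = [1.0, 2.0, 3.0], [3.5, 4.5, 5.5]    # t_a = s_a + 2.5
s_b, t_b = [1.0, 2.0, 4.0], [3.0, 6.0, 12.0]   # t_b = 3 * s_b

rebuilt_a = reconstruct_plus(s_a, t_a)
rebuilt_b = reconstruct_mul(s_b, t_b)
```

With these definitions, the same variance measure serves both as fitness (individual against target) and as the membership test of the filter (new individual against each member of the population); for the filtered multiplicative variant one would pass `class_distance=fitness_mul`, again assuming non-zero coordinates.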
The paper is structured as follows: Section 2 discusses previous and related works. In Section 3, we present the general concept of a semantics-based equivalence class and we define GPPLUS, GPMUL, FGPPLUS, and FGPMUL. Section 4 presents the experimental study. Specifically, the experimental settings and test problems are presented and, subsequently, the obtained results are reported and discussed. Finally, Section 5 concludes the paper, by also discussing ideas for future research.
Section snippets
Previous and related work
The definition of semantics-based GP systems is a hot topic in the field of evolutionary computation and several works that have appeared so far can be considered as relating to this paper or having inspired the work presented here. In Ref. [3], Moraglio and co-authors introduce genetic operators that have a direct effect on the semantics of the offspring. GP using these operators was called Geometric Semantic Genetic Programming (GSGP). These operators induce a unimodal fitness landscape for
Methodology
Our objective is to use equivalence classes (EQCs) to create partitions of the solution space and to explore the space of EQCs instead of the space of single solutions. To achieve this objective, we start by defining equivalence relations using criteria based on semantics. We introduce a relation that we call the equivalence function (EF). EF receives two semantics as arguments and returns a vector of the same cardinality. We say that two individuals are equivalent (i.e. they
Systems and test problems
The objective of the experimental study is to compare the methods GPPLUS, GPMUL, LS and their filtered variants FGPPLUS, FGPMUL, and FLS. To ensure a better understanding of the quality of the performance of these systems, we also consider the performance obtained by geometric semantic genetic programming (GSGP) [3]. We also decided not to report the results achieved by standard GP because a preliminary study indicated that standard GP was consistently outperformed by all the presented methods
Conclusions and future work
In this paper, the idea of semantics-based equivalence classes for Genetic Programming (GP) was presented. This idea is general, and it can be implemented in several different ways. In this paper, it was implemented by means of two simple GP systems, called GPPLUS and GPMUL. Each of these systems uses a different definition of equivalence: GPPLUS identifies a collection of affine subspaces in the semantic space, while GPMUL defines a projective space in the semantic space. Moreover, filtered
References (30)
- et al. Prediction of high performance concrete strength using genetic programming with geometric semantic genetic operators. Expert Syst. Appl. (2013)
- et al. Prediction of the unified Parkinson's disease rating scale assessment using a genetic programming system with geometric semantic genetic operators. Expert Syst. Appl. (2014)
- et al. Semantic genetic programming for fast and accurate data knowledge discovery. Swarm Evolut. Comput. (2016)
- Genetic Programming: on the Programming of Computers by Means of Natural Selection (1992)
- The goodness of fit of regression formulae, and the distribution of regression coefficients. J. Roy. Stat. Soc. (1922)
- et al. Geometric semantic genetic programming
- et al. A survey of semantic methods in genetic programming. Genet. Program. Evolvable Mach. (2014)
- Semantic methods in genetic programming. Genet. Program. Evolvable Mach. (2016)
- et al. Review and comparative analysis of geometric semantic crossovers. Genet. Program. Evolvable Mach. (2015)
- L. Vanneschi, An Introduction to Geometric Semantic Genetic Programming, Springer International Publishing, Cham, pp....
- Improving symbolic regression with interval arithmetic and linear scaling
- Scaled symbolic regression. Genet. Program. Evolvable Mach.
- A new implementation of geometric semantic GP and its application to problems in pharmacokinetics
- Semantic genetic programming operators based on projections in the phenotype space. Res. Comput. Sci.