Elsevier

Swarm and Evolutionary Computation

Volume 44, February 2019, Pages 453-469

Genetic programming with semantic equivalence classes

https://doi.org/10.1016/j.swevo.2018.06.001

Abstract

In this paper, we introduce the concept of semantics-based equivalence classes for symbolic regression problems in genetic programming. The idea is implemented by means of two different genetic programming systems, each using a different definition of equivalence. In both systems, whenever a solution in an equivalence class is found, any other solution in that equivalence class can be generated analytically. As such, these two systems allow us to shift the objective of genetic programming: instead of finding a globally optimal solution, the objective is now to find any solution that belongs to the same equivalence class as a global optimum. Further, we propose improvements to these genetic programming systems in which, once a solution that belongs to a particular equivalence class has been generated, no other solution in that class is accepted into the population for the remainder of the evolution. We call these improved versions filtered systems. Experimental results obtained on seven complex real-life test problems show that using equivalence classes is a promising idea and that filters are generally helpful for improving the systems' performance. Furthermore, the proposed methods produce individuals that are much smaller than those produced by geometric semantic genetic programming. Finally, we show that filters are also useful for improving the performance of a state-of-the-art method that is not explicitly based on semantic equivalence classes, namely linear scaling.

Introduction

The use of semantic methods is one of the hottest topics in Genetic Programming (GP) [1] and has recently attracted noteworthy attention from researchers, particularly in the applied domain of symbolic regression [2]. Let $X = \{x_1, x_2, \ldots, x_n\}$ be the set of input data, or fitness cases, of a symbolic regression problem, and $t = [t_1, t_2, \ldots, t_n]$ the vector of the respective expected output or target values (in other words, for each $i = 1, 2, \ldots, n$, $t_i$ is the expected output corresponding to input $x_i$). A GP individual (or program) $P$ can be seen as a function that for each input vector $x_i$ returns the scalar value $P(x_i)$. Following [3], we call the semantics of $P$ the vector $s_P = [P(x_1), P(x_2), \ldots, P(x_n)]$. This vector can be represented as a point in an $n$-dimensional space, which we call a semantic space, that can be counterpoised to the syntactic or genotypic space, where individuals are represented by programs. While each program has one and only one semantics, the mapping between the semantic space and the genotypic space is generally not a bijection because several programs can have the same semantics. The target vector $t$ itself is a point in the semantic space and, in general, the objective of GP is to find at least one program in the genotypic space that maps into $t$ in the semantic space.
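The definition of semantics above can be illustrated with a minimal Python sketch. The names `Program`, `semantics`, `p1`, and `p2`, as well as the toy fitness cases, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical alias: a GP program viewed as a function that maps
# one input vector x_i to a scalar P(x_i).
Program = Callable[[Sequence[float]], float]


def semantics(program: Program, fitness_cases: Sequence[Sequence[float]]) -> List[float]:
    """Return s_P = [P(x_1), ..., P(x_n)] for the given fitness cases."""
    return [program(x) for x in fitness_cases]


# Two syntactically different toy programs (illustrative only).
def p1(x: Sequence[float]) -> float:
    return 2.0 * x[0] + 2.0 * x[1]


def p2(x: Sequence[float]) -> float:
    return (x[0] + x[1]) * 2.0


# Toy fitness cases X and target vector t (made up for this sketch).
X = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
t = [2.0, 4.0, 10.0]

s1 = semantics(p1, X)
s2 = semantics(p2, X)

print(s1)        # [2.0, 4.0, 10.0]
print(s1 == s2)  # True: different genotypes, same point in semantic space
print(s1 == t)   # True: both programs map onto the target
```

Here `p1` and `p2` are distinct points in the genotypic space that map to the same point in the semantic space, which is why the mapping between the two spaces is not a bijection.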

Several different methods to exploit semantic awareness in GP have so far been presented. For relatively complete surveys, the reader is referred to [4,5,6]. The work presented here proposes a new idea for exploiting semantic awareness in GP: semantics-based equivalence classes. Our concept of an equivalence class is such that, once an individual in a class is found, it must be possible, and easy, to generate all the other individuals in that class. In this way, if we are able to find one solution that is in the same equivalence class as a globally optimal solution, then we are able to solve the problem by reconstructing the global optimum analytically. Two possible realizations of this idea are proposed in this paper. In the first, two individuals $P_1$ and $P_2$ are considered in the same equivalence class if $s_{P_1} = k + s_{P_2}$. In such a situation, we are identifying a collection of affine subspaces in the semantic space. In this collection, each subspace is a straight line (or hyperplane, if we think in $n$ dimensions) and all the lines belonging to this collection are parallel to each other. In a system like this, it makes sense to use the variance of the difference between the coordinates of the semantic vector and the target as fitness. When this variance is equal to zero, the semantic vector of the individual is in the same equivalence class as the target, and the search can terminate, analytically reconstructing a globally optimal solution. In the second proposed system, two individuals $P_1$ and $P_2$ are in the same equivalence class if $s_{P_1} = k \, s_{P_2}$, where $k \in \mathbb{R}$ is a constant such that $k \neq 0$. In such a situation, we are defining a projective space (i.e. the space of all the straight lines, or hyperplanes if we think in $n$ dimensions, intersecting the origin) in the semantic space. The objective of GP then is to find any solution whose semantics is directly proportional to the target $t$ (i.e. that stands on the same straight line as the one intersecting the target and the origin). It therefore makes sense to define a new fitness function, corresponding to the variance of the ratios between the coordinates of the semantics of an individual and $t$. When this variance equals zero, the individual is in the same equivalence class as the target, and the search can terminate, analytically reconstructing a globally optimal solution. The first of these two systems is called GPPLUS (given that addition is the operator that allows us to reconstruct the target), while the second one is called GPMUL (given that the target can be reconstructed using multiplication). Moreover, we introduce two new GP systems that are similar to GPPLUS and GPMUL, but in which a filter that allows us to reject individuals is applied. Specifically, a newly generated individual is rejected if another individual belonging to the same equivalence class already exists in the population. These “filtered” versions are called FGPPLUS and FGPMUL, respectively. The performance of GPPLUS, GPMUL, FGPPLUS, and FGPMUL is compared to one of the best known state-of-the-art semantics-based GP systems: Geometric Semantic GP (GSGP), introduced in Ref. [3] and described in detail in Ref. [7]. As test cases, we choose seven complex real-life symbolic regression problems, from as many different applied domains. Finally, in order to show the appropriateness of semantic filters also on systems that are not directly based on the idea of semantic equivalence classes, we apply filters to one of the most used GP techniques: linear scaling [8,9]. For this reason, linear scaling without filters (called LS) and linear scaling with filters (called FLS) are also included in the comparative study.
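The two variance-based fitness functions, the analytic reconstruction of the target, and the filter described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the ratio is taken as $t_i / s_i$ (the text does not fix the direction) and assumes no zero coordinates in the denominator, population variance is used, and the tolerance `tol` is an arbitrary choice introduced here.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def fitness_plus(s: Vector, t: Vector) -> float:
    """GPPLUS-style fitness: variance of the differences t_i - s_i.
    It is zero exactly when t = k + s for some constant k."""
    return pvariance([ti - si for si, ti in zip(s, t)])


def fitness_mul(s: Vector, t: Vector) -> float:
    """GPMUL-style fitness: variance of the ratios t_i / s_i.
    Assumption: every s_i is non-zero."""
    return pvariance([ti / si for si, ti in zip(s, t)])


def reconstruct_plus(s: Vector, t: Vector) -> Tuple[float, List[float]]:
    """Recover k and the shifted semantics k + s (equals t when fitness is 0)."""
    k = mean([ti - si for si, ti in zip(s, t)])
    return k, [si + k for si in s]


def reconstruct_mul(s: Vector, t: Vector) -> Tuple[float, List[float]]:
    """Recover k and the scaled semantics k * s (equals t when fitness is 0)."""
    k = mean([ti / si for si, ti in zip(s, t)])
    return k, [k * si for si in s]


def same_class_plus(a: Vector, b: Vector, tol: float = 1e-9) -> bool:
    """True if a and b differ by a constant shift (within tol, an assumption)."""
    return pvariance([x - y for x, y in zip(a, b)]) <= tol


def same_class_mul(a: Vector, b: Vector, tol: float = 1e-9) -> bool:
    """True if a = k * b with k != 0 (within tol); assumes no zero in b."""
    ratios = [x / y for x, y in zip(a, b)]
    return pvariance(ratios) <= tol and mean(ratios) != 0


def accept(candidate: Vector,
           population: Sequence[Vector],
           same_class: Callable[[Vector, Vector], bool]) -> bool:
    """Filter: reject a candidate whose class is already in the population."""
    return not any(same_class(candidate, s) for s in population)


# Toy data (made up for this sketch).
t = [3.0, 5.0, 9.0]
s_plus = [1.0, 3.0, 7.0]   # t = 2 + s_plus
s_mul = [1.5, 2.5, 4.5]    # t = 2 * s_mul

print(fitness_plus(s_plus, t), reconstruct_plus(s_plus, t))
print(fitness_mul(s_mul, t), reconstruct_mul(s_mul, t))
print(accept([0.0, 2.0, 6.0], [s_plus], same_class_plus))  # shifted copy of s_plus
print(accept([1.0, 2.0, 3.0], [s_plus], same_class_plus))  # a new class
```

In this sketch, a fitness of zero means the individual lies in the target's equivalence class, so the constant `k` can be computed directly from the data and the search can stop. The filter compares a candidate only against the semantics already in the population, which mirrors the rejection rule stated in the text.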

The paper is structured as follows: Section 2 discusses previous and related works. In Section 3, we present the general concept of a semantics-based equivalence class and we define GPPLUS, GPMUL, FGPPLUS, and FGPMUL. Section 4 presents the experimental study. Specifically, the experimental settings and test problems are presented and, subsequently, the obtained results are reported and discussed. Finally, Section 5 concludes the paper, by also discussing ideas for future research.

Section snippets

Previous and related work

The definition of semantics-based GP systems is a hot topic in the field of evolutionary computation and several works that have appeared so far can be considered as relating to this paper or having inspired the work presented here. In Ref. [3], Moraglio and co-authors introduce genetic operators that have a direct effect on the semantics of the offspring. GP using these operators was called Geometric Semantic Genetic Programming (GSGP). These operators induce a unimodal fitness landscape for

Methodology

Our objective is to use equivalence classes (EQCs) to create partitions of the solution space and to explore the space of EQCs instead of the space of single solutions. To achieve this objective, we start by defining equivalence relationships using criteria based on semantics. We introduce a relationship that we call equivalence function ($EF$). $EF$ receives two semantics as arguments, and returns a vector of the same cardinality: $EF: \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$. We say that two individuals are equivalent (i.e. they

Systems and test problems

The objective of the experimental study is to compare the methods GPPLUS, GPMUL, LS and their filtered variants FGPPLUS, FGPMUL, and FLS. To ensure a better understanding of the quality of the performance of these systems, we also consider the performance obtained by geometric semantic genetic programming (GSGP) [3]. We also decided not to report the results achieved by standard GP because a preliminary study indicated that standard GP was consistently outperformed by all the presented methods

Conclusions and future work

In this paper, the idea of semantics-based equivalence classes for Genetic Programming (GP) was presented. This idea is general, and it can be implemented in several different ways. In this paper, it was implemented by means of two simple GP systems, called GPPLUS and GPMUL. Each of these systems uses a different definition of equivalence: GPPLUS identifies a collection of affine subspaces in the semantic space, while GPMUL defines a projective space in the semantic space. Moreover, filtered

References (30)

  • M. Keijzer, Improving symbolic regression with interval arithmetic and linear scaling

  • M. Keijzer, Scaled symbolic regression, Genet. Program. Evolvable Mach. (2004)

  • L. Vanneschi et al., A new implementation of geometric semantic GP and its application to problems in pharmacokinetics

  • I. Gonçalves, S. Silva, C. M. Fonseca, On the Generalization Ability of Geometric Semantic Genetic Programming,...

  • M. Graff et al., Semantic genetic programming operators based on projections in the phenotype space, Res. Comput. Sci. (2015)

Cited by (9)

    • Optimizing genetic programming by exploiting semantic impact of sub trees

      2021, Swarm and Evolutionary Computation
      Citation Excerpt :

      This resulted in reducing the code bloat without adversely affecting the semantics of the tree. Ruberto et al. [22] introduced the concept of equivalence classes in genetic programming to solve symbolic regression problems.

    • A semantic genetic programming framework based on dynamic targets

      2021, Genetic Programming and Evolvable Machines
    • SGP-DT: Towards effective symbolic regression with a semantic GP approach based on dynamic targets

      2020, GECCO 2020 Companion - Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion