Elsevier

Applied Soft Computing

Volume 115, January 2022, 108143
Semantics in Multi-objective Genetic Programming

https://doi.org/10.1016/j.asoc.2021.108143

Highlights

  • Developed a new semantic-distance based approach for Multi-objective GP named SDO.

  • Show how this new method naturally promotes semantic diversity.

  • Results are significantly better compared to canonical EMO approaches (NSGA-II and SPEA2).

  • New method outperforms two other semantic-based methods.

Abstract

Semantics has become a key topic of research in Genetic Programming (GP). Semantics refers to the outputs (behaviour) of a GP individual when it is run on a dataset. The majority of works that focus on semantic diversity in single-objective GP indicate that it is highly beneficial in evolutionary search. Surprisingly, very little research has been conducted on semantics in Multi-objective GP (MOGP). In this work we make a leap beyond our understanding of semantics in MOGP and propose SDO: Semantic-based Distance as an additional criteriOn. This naturally encourages semantic diversity in MOGP. To do so, we find a pivot in the less dense region of the first Pareto front (most promising front). This is then used to compute a distance between the pivot and every individual in the population. The resulting distance is then used as an additional criterion to be optimised to favour semantic diversity. We also use two other semantic-based methods as baselines, called Semantic Similarity-based Crossover and Semantic-based Crowding Distance. Furthermore, we use the Non-dominated Sorting Genetic Algorithm II and the Strength Pareto Evolutionary Algorithm 2 for comparison. We use highly unbalanced binary classification problems and consistently show that our proposed SDO approach produces more non-dominated solutions and better diversity, leading to statistically significantly better results than the other four methods, using the hypervolume as the evaluation measure.

Introduction

Genetic Programming [1], one of the four canonical Evolutionary Algorithms paradigms, was popularised by Koza in the early 1990s. Over the years, researchers have been interested in making GP more amenable to evolutionary search. A key element that has been proven to make GP more robust is semantics. The latter has become a key topic of research in GP. Semantics can be seen as the behaviour of a GP program. This behaviour is the output of a GP program when executed on a set of fitness cases.
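As a minimal illustration of this notion of semantics, the sketch below records the outputs of a program over a set of fitness cases. The helper name `semantics`, the three toy programs and the fitness cases are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Callable, List, Sequence


def semantics(program: Callable[[float], float],
              fitness_cases: Sequence[float]) -> List[float]:
    """Return the vector of outputs of `program` over the fitness cases."""
    return [program(x) for x in fitness_cases]


# Hypothetical toy programs: prog_a and prog_b differ syntactically
# but behave identically; prog_c behaves differently.
prog_a = lambda x: x + x
prog_b = lambda x: 2 * x
prog_c = lambda x: x * x

cases = [0.0, 1.0, 2.0, 3.0]  # hypothetical fitness cases
sem_a = semantics(prog_a, cases)
sem_b = semantics(prog_b, cases)
sem_c = semantics(prog_c, cases)
```

Here `sem_a` and `sem_b` are equal even though the two programs are written differently, which is why diversity measured on semantics can differ from diversity measured on program structure.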

The number of scientific publications on GP semantics has increased significantly thanks to promising results found by the research community. We discuss in Section 2 some relevant works on semantics in GP. Interestingly, the vast majority of these works have concentrated on Single-objective GP (SOGP), with very little progress in Multi-objective GP (MOGP), with the exception of [2], [3], [4], [5]. Thus, this work significantly extends this line of research and uses three forms of semantics in a MOGP setting. Each of these is compared independently against two well-established Evolutionary Multi-objective Optimisation (EMO) approaches: the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [6] and the Strength Pareto Evolutionary Algorithm 2 (SPEA2) [7]. The semantic-based MOGP approaches used in this study are:

    Semantic Similarity-based Crossover (SSC).

    This is motivated by the SOGP approach presented in [8]. This approach was one of the early methods in SOGP semantics, where the authors were able to promote semantic diversity in continuous search spaces. We extended this well-known method to MOGP.

    Semantic-based Crowding Distance (SCD).

    Here, the main idea is to replace the crowding distance, commonly used in EMO algorithms, by a semantic-based distance, originally studied in the first author’s MOGP works [3], [4].

    Semantic-based Distance as an additional criteriOn (SDO).

    This approach draws from SCD and uses the resulting semantic distance as another component to optimise by an EMO algorithm, briefly studied in [5].

Using these three semantic-based methods allows us to show the following:

    Firstly, by using SSC, we show how the semantic distance computed in the crossover operator and used to successfully promote semantic diversity in single-objective GP does not have the same positive impact in MOGP.

    Secondly, by using SCD, inspired by the crowding distance commonly used in EMO, we show that a semantic distance can be naturally computed between every GP tree in the population and a “pivot”. The latter is an individual in the sparsest region of the first Pareto front. SCD thus moves away from SSC, which tries to force semantic diversity to emerge by applying crossover repeatedly, as proposed in [8].

    Finally, we build on our understanding of semantics drawn from SSC and SCD and propose a robust mechanism for the emergence of semantic diversity in MOGP. In particular, we use the semantic distance value as an additional indicator to evolve the population. This naturally promotes semantic diversity in MOGP, leading to statistically significantly better results, based on the average hypervolume of the evolved Pareto approximations, than the other four methods (two semantic-based methods and two EMO methods) on a range of highly imbalanced datasets.
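The pivot-based distance described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the pivot is taken to be the first-front individual with the largest finite crowding distance, and the semantic distance is taken to be the number of fitness cases on which two output vectors differ by more than a threshold. The function names, the `threshold` parameter and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Standard crowding distance over the objective vectors of one front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary individuals are always kept: infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def pick_pivot(front: Sequence[Sequence[float]]) -> int:
    """Assumption: pivot = individual with the largest finite crowding distance."""
    d = crowding_distance(front)
    finite = [i for i in range(len(front)) if math.isfinite(d[i])]
    candidates = finite or list(range(len(front)))
    return max(candidates, key=lambda i: d[i])


def semantic_distance(sem_p: Sequence[float], sem_q: Sequence[float],
                      threshold: float = 0.5) -> int:
    """Assumption: count fitness cases whose outputs differ by more than threshold."""
    return sum(1 for a, b in zip(sem_p, sem_q) if abs(a - b) > threshold)


# Hypothetical first front (two objectives) and the semantics of each individual.
front = [(0.0, 1.0), (0.1, 0.9), (0.6, 0.4), (1.0, 0.0)]
sems = [[0.0, 1.0, 0.0, 1.0],
        [0.2, 0.9, 0.1, 1.0],
        [1.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 1.0]]

pivot = pick_pivot(front)
distances = [semantic_distance(sems[pivot], s) for s in sems]
```

In an SCD-like variant the resulting `distances` would stand in for the crowding distance, while in an SDO-like variant they would be appended to each individual's objective vector as an extra criterion; how the distance is bounded and whether it is minimised or maximised is not specified in this excerpt.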

In our previous work [5], we carried out an initial limited study on semantics in Multi-objective Genetic Programming (MOGP). Specifically, we initially proposed and used three semantic-based methods, named Semantic Similarity-based Crossover (SSC), Semantic-based Distance as an additional criteriOn (SDO) and Pivot Similarity Semantic-based Distance as an additional criteriOn (PSDO).

The main conclusion from our initial investigation is that a semantic-based distance value, computed as in either SDO or PSDO and used as another objective to be optimised in an EMO setting, is robust enough to outperform the results yielded by the well-known NSGA-II and SPEA2 approaches. Furthermore, we found that the distance computed from a pivot, which is the furthest point in the search space, to every individual in the population, when used as an additional criterion to be optimised in an EMO setting, tends to improve the performance of our semantic-based approaches. Moreover, we were able to fine-tune how this distance is computed to significantly improve the evolutionary search. This is attained using the SDO approach, which is used again in this work. It is, however, worth noting that these conclusions were drawn from an initial limited study. Its restricted statistical analysis impeded drawing general conclusions, its results were limited, and it lacked an explanation of why SDO yields better results than the canonical methods and the other two semantic-based approaches. Furthermore, in our initial study [5], we omitted to discuss the limitations of SDO.

In this work, we have addressed all these issues. More specifically, the main contributions of this scientific study are as follows:

  • We consistently show how Semantic Similarity-based Crossover (SSC) used in single-objective GP and widely reported to be beneficial in GP does not have the same positive impact in a multi-objective GP (MOGP) setting.

  • From this, we show how a semantic-based distance approach can enhance the evolutionary search in MOGP. To this end we use two semantic-based approaches: Semantic-based Crowding Distance (SCD) and Semantic-based Distance as an additional criteriOn (SDO).

  • We demonstrate how SDO yields better results against all the approaches used in this work, including the semantic-based methods and canonical EMO approaches.

  • Another major contribution of this scientific study is to include detailed results using two well-established EMO approaches NSGA-II and SPEA2. By doing so, as opposed to the limited results reported in [5], we are now in a position to draw sound conclusions by carrying out a systematic statistical analysis, explained in detail in Section 6.

  • Another important contribution of this work is that we are able to explain why the semantic-based technique employed in SDO tends to improve evolutionary search. We do so by extensively analysing the behaviour of SDO in terms of the number of unique solutions, the duplicate frequency of solutions over generations, etc.

  • Finally, another major contribution of this work is the discussion of the limitations of SDO, examined using a Multi-objective Evolutionary Algorithm Based on Decomposition [9].

This work is organised as follows. Studies relevant to this work are presented in Section 2. The fundamental background in semantics and in MOGP is discussed in Section 3. Section 4 presents the MOGP semantic methods proposed and used in this work. The experimental setup is presented in Section 5. Section 6 presents in detail the results yielded by all the MOGP semantic approaches (SSC, SDO and SCD) and by the EMO methods (NSGA-II and SPEA2). It also offers an explanation as to why SDO finds better results than all the other algorithms. Section ?? discusses the limitations of SDO in Multi-Objective Evolutionary Algorithms based on Decomposition. In Section 8, we draw some conclusions.

Section snippets

Semantics

Semantics has become a key topic of research in GP and multiple definitions have been proposed. Semantics can be seen as the behaviour (recorded outputs over a dataset) of a GP program. We give a formal definition of semantics in Section 3. Research in semantics in GP has grown substantially in the last decade as a consequence of the research community reporting better results when semantics has been promoted in evolutionary search as compared to those GP approaches that do not promote it

Background

This section defines some of the basic concepts relevant to this work, namely semantics, MO and EMO algorithms.
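As a brief reminder of the multi-objective concept that the rest of this work relies on, the following sketch shows Pareto dominance and the extraction of the first (non-dominated) front. It assumes that every objective is minimised; the function names and the sample points are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def first_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of points that are not dominated by any other point."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]


# Hypothetical objective vectors (both objectives minimised).
points = [(1, 5), (2, 2), (3, 3), (5, 1)]
front_indices = first_front(points)
```

In this toy set the point `(3, 3)` is dominated by `(2, 2)`, so it is excluded from the first front.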

Semantic-based MOGP methods

Next, we present the semantic-based approaches employed in this work that are incorporated into the baseline MOGP algorithms, namely NSGA-II and SPEA2.

Experimental setup

The use of benchmark problems has allowed the research community to test, validate and explain a plethora of evolutionary algorithms. In this work, we also adopt well-known, robust and tested benchmark problems used in other studies [5], [28], [36] that will allow us to (i) test the algorithms used in this work, (ii) to use well-defined metrics that allow us to compare one method against another one, (iii) to allow us to explain why one particular method behaves better than others, (iv) to draw

Results and analysis

Using MOEA/D

To highlight some of the limitations of the SDO approach we also look at a decomposition approach known as Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D) [9]. With MOEA/D we decompose the optimisation problem into a set of scalar optimisation problems. A scalar optimisation function g and a uniform distribution of weight vectors λi are used to define each sub-problem. It is important to note that each sub-problem relies only on neighbouring sub-problems for
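The decomposition idea can be sketched as follows for two objectives. The excerpt does not state which scalar function g is used, so the Tchebycheff function below is an assumption, as are the function names, the ideal point and the neighbourhood size; neighbourhoods are formed here from the closest weight vectors in Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple


def uniform_weights(n: int) -> List[Tuple[float, float]]:
    """n evenly spread weight vectors for a two-objective problem (n >= 2)."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def tchebycheff(objs: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Assumed scalar function g: the largest weighted gap to the ideal point."""
    return max(w * abs(f - z) for f, w, z in zip(objs, weights, ideal))


def neighbours(weights: Sequence[Sequence[float]], i: int, t: int) -> List[int]:
    """Indices of the t sub-problems whose weight vectors are closest to vector i."""
    order = sorted(range(len(weights)),
                   key=lambda j: math.dist(weights[i], weights[j]))
    return order[:t]


lambdas = uniform_weights(5)  # one weight vector per sub-problem
g_value = tchebycheff((0.2, 0.8), lambdas[2], ideal=(0.0, 0.0))
hood = neighbours(lambdas, 2, 3)  # sub-problem 2 and its two nearest neighbours
```

Each weight vector defines one scalar sub-problem, and a sub-problem only exchanges solutions with the sub-problems listed in its neighbourhood.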

Conclusions

This work proposes a new approach, named Semantic-based Distance as an additional criteriOn (SDO), which consists of using semantic distance values as another criterion to optimise and favours solutions that are semantically attracted to the sparsest region of the first approximated Pareto front. We also use this distance in lieu of the crowding distance at the heart of the aforementioned EMO algorithm. Results for the new approach were tested against the canonical frameworks of NSGA-II and

CRediT authorship contribution statement

Edgar Galván: Conceptualization, Supervision, Methodology, Software, Visualisation, Writing – original draft, Validation, Writing – review & editing. Leonardo Trujillo: Writing – review & editing. Fergal Stapleton: Writing – review & editing, Software, Visualisation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 18/CRT/6049. The opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Science Foundation Ireland. The authors wish to acknowledge the DJEI/DES/SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support. We would

References (41)

  • Zhao H. A multi-objective genetic programming approach to developing Pareto optimal decision trees. Decis. Support Syst. (2007)
  • Koza J.R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. (1992)
  • Galván E. et al. Promoting semantic diversity in multi-objective genetic programming.
  • Galván-López E. et al. On the use of semantics in multi-objective genetic programming.
  • Galván-López E. et al. Stochastic semantic-based multi-objective genetic programming optimisation for classification of imbalanced data.
  • Galván E. et al. Semantic-based distance approaches in multi-objective genetic programming.
  • Deb K. et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. (2002)
  • E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm, Tech. rep.,...
  • Uy N.Q. et al. Semantically-based crossover in genetic programming: application to real-valued symbolic regression. Genet. Program. Evol. Mach. (2011)
  • Zhang Q. et al. MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. (2007)
  • Moraglio A. et al. Geometric semantic genetic programming.
  • McPhee N.F. et al. Semantic building blocks in genetic programming.
  • Beadle L. et al. Semantically driven crossover in genetic programming.
  • Beadle L. et al. Semantically driven mutation in genetic programming.
  • Nguyen Q.U. et al. Semantic aware crossover for genetic programming: The case for real-valued function regression.
  • Forstenlechner S. et al. Towards effective semantic operators for program synthesis in genetic programming.
  • Dou T. et al. Comparison of semantic-based local search methods for multiobjective genetic programming. Genet. Program. Evol. Mach. (2018)
  • Moraglio A. et al. Topological interpretation of crossover.
  • Galván-López E. et al. An empirical investigation of how and why neutrality affects evolutionary search.
  • Galván-López E. et al. Neutrality in evolutionary algorithms... What do we know? Evol. Syst. (2011)