Semantics in Multi-objective Genetic Programming

doi:10.1016/j.asoc.2021.108143

Applied Soft Computing

Volume 115, January 2022, 108143

https://doi.org/10.1016/j.asoc.2021.108143

Highlights

• Developed a new semantic-distance based approach for Multi-objective GP named SDO.
• Show how this new method naturally promotes semantic diversity.
• Results are significantly better compared to canonical EMO approaches (NSGA-II and SPEA2).
• New method outperforms two other semantic based methods.

Abstract

Semantics has become a key topic of research in Genetic Programming (GP). Semantics refers to the outputs (behaviour) of a GP individual when it is run on a dataset. The majority of works that focus on semantic diversity in single-objective GP indicate that it is highly beneficial in evolutionary search. Surprisingly, very little research has been conducted on semantics in Multi-objective GP (MOGP). In this work we advance our understanding of semantics in MOGP and propose SDO: Semantic-based Distance as an additional criteriOn. This naturally encourages semantic diversity in MOGP. To do so, we find a pivot in the least dense region of the first Pareto front (the most promising front). We then compute a distance between the pivot and every individual in the population. The resulting distance is used as an additional criterion to be optimised to favour semantic diversity. We also use two other semantic-based methods as baselines, called Semantic Similarity-based Crossover and Semantic-based Crowding Distance, together with the Non-dominated Sorting Genetic Algorithm II and the Strength Pareto Evolutionary Algorithm 2 for comparison. Using highly unbalanced binary classification problems, we consistently show that our proposed SDO approach produces more non-dominated solutions and better diversity, leading to statistically significantly better results than the other four methods when the hypervolume is used as the evaluation measure.

Introduction

Genetic Programming (GP) [1], one of the four canonical Evolutionary Algorithm paradigms, was popularised by Koza in the early 1990s. Over the years, researchers have been interested in making GP more amenable to evolutionary search. A key element that has been shown to make GP more robust is semantics, which has become a key topic of research in GP. Semantics can be seen as the behaviour of a GP program, that is, the output of the program when it is executed on a set of fitness cases.
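To make the notion of semantics as recorded outputs concrete, the following minimal Python sketch evaluates three toy programs on a handful of fitness cases. The programs, the fitness cases and the helper name `semantics` are illustrative assumptions and do not come from the paper.

```python
from typing import Callable, List, Sequence


def semantics(program: Callable[[float], float],
              fitness_cases: Sequence[float]) -> List[float]:
    """Return the vector of outputs of `program` over the fitness cases."""
    return [program(x) for x in fitness_cases]


# Hypothetical programs: prog_a and prog_b differ in syntax but not in behaviour.
prog_a = lambda x: x * 2
prog_b = lambda x: x + x
prog_c = lambda x: x * x

cases = [0.0, 1.0, 2.0, 3.0]  # hypothetical fitness cases
sem_a = semantics(prog_a, cases)
sem_b = semantics(prog_b, cases)
sem_c = semantics(prog_c, cases)

print(sem_a == sem_b)  # same semantics despite different syntax
print(sem_a == sem_c)  # different semantics
```

In this view, two individuals are semantically identical when their output vectors coincide, regardless of how their trees are written.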

The number of scientific publications on GP semantics has increased significantly thanks to promising results found by the research community. We discuss some relevant works on semantics in GP in Section 2. Interestingly, the vast majority of these works have concentrated on Single-objective GP (SOGP), with very little progress in Multi-objective GP (MOGP), with the exception of [2], [3], [4], [5]. This work significantly extends this line of research and uses three forms of semantics in a MOGP setting. Each of these is compared independently against two well-established Evolutionary Multi-objective Optimisation (EMO) approaches: the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [6] and the Strength Pareto Evolutionary Algorithm 2 (SPEA2) [7]. The semantic-based MOGP approaches used in this study are:

Semantic Similarity-based Crossover (SSC).

This is motivated by the SOGP approach presented in [8]. This approach was one of the early SOGP methods with which the authors were able to promote semantic diversity in continuous search spaces. We extend this well-known method to MOGP.

Semantic-based Crowding Distance (SCD).

Here, the main idea is to replace the crowding distance, commonly used in EMO algorithms, with a semantic-based distance, originally studied in the first author’s MOGP works [3], [4].

Semantic-based Distance as an additional criteriOn (SDO).

This approach draws on SCD and uses the resulting semantic distance as another objective to be optimised by an EMO algorithm, as briefly studied in [5].
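The idea of turning a semantic distance into an extra objective can be sketched as follows. This is only an illustration under stated assumptions: the distance used here (the number of fitness cases on which two output vectors differ by more than a threshold), the threshold value, and the function names are hypothetical, and the exact distance used by SCD and SDO is not given in this excerpt.

```python
from typing import List, Sequence, Tuple


def semantic_distance(sem_p: Sequence[float],
                      sem_pivot: Sequence[float],
                      threshold: float = 0.5) -> int:
    """Count the fitness cases where the two output vectors differ by more
    than `threshold` (an assumed distance, not the paper's definition)."""
    if len(sem_p) != len(sem_pivot):
        raise ValueError("semantic vectors must have the same length")
    return sum(1 for a, b in zip(sem_p, sem_pivot) if abs(a - b) > threshold)


def add_distance_objective(objectives: Sequence[Sequence[float]],
                           semantics_list: Sequence[Sequence[float]],
                           pivot_index: int,
                           threshold: float = 0.5) -> List[Tuple[float, ...]]:
    """Append the distance to the pivot as one more objective per individual."""
    pivot = semantics_list[pivot_index]
    return [tuple(obj) + (semantic_distance(sem, pivot, threshold),)
            for obj, sem in zip(objectives, semantics_list)]


# Hypothetical population: two objectives and four fitness cases per individual.
objectives = [(0.9, 0.2), (0.6, 0.7), (0.3, 0.95)]
semantics_list = [[1.0, 0.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0]]
extended = add_distance_objective(objectives, semantics_list, pivot_index=1)
print(extended)
```

The extended objective tuples can then be passed to any EMO algorithm that accepts an arbitrary number of objectives; whether and how the extra value is bounded or optimised is a design decision that this sketch leaves open.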

Using these three semantic-based methods allows us to show the following:

Firstly, by using SSC, we show how the semantic distance computed in the crossover operator and used to successfully promote semantic diversity in single-objective GP does not have the same positive impact in MOGP.

Secondly, by using SCD, inspired by the crowding distance commonly used in EMO, we show that a semantic distance can be naturally computed between every GP tree in the population and a “pivot”. The latter is an individual in the sparsest region of the first Pareto front. SCD thus moves away from SSC, which tries to promote semantic diversity by forcing diversity to emerge through repeated application of crossover, as proposed in [8].

Finally, we build on our understanding drawn from SSC and SCD and propose a robust mechanism for the emergence of semantic diversity in MOGP. In particular, we use the semantic distance value as an additional indicator to evolve the population. This naturally promotes semantic diversity in MOGP, leading to better, statistically significant results, based on the average hypervolume of the evolved Pareto approximations, than the other four methods (two semantic-based methods and two EMO methods) on a range of highly imbalanced datasets.
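Since the comparison rests on the hypervolume of the evolved Pareto approximations, a small sketch of how the hypervolume of a two-objective front can be computed may help. It assumes that both objectives are maximised and that the reference point is the origin; these choices, and the function name `hypervolume_2d`, are illustrative assumptions rather than the paper's setup.

```python
from typing import Iterable, Tuple


def hypervolume_2d(front: Iterable[Tuple[float, float]],
                   reference: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by `front` with respect to `reference`,
    assuming both objectives are maximised."""
    # Keep only points that improve on the reference in both objectives.
    pts = [(x, y) for x, y in front if x > reference[0] and y > reference[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)
    area = 0.0
    best_y = reference[1]
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip that this point newly covers.
            area += (x - reference[0]) * (y - best_y)
            best_y = y
    return area


# Hypothetical Pareto approximation.
front = [(1.0, 0.2), (0.6, 0.6), (0.2, 1.0)]
print(hypervolume_2d(front))
```

Dominated points add no area, so a front with more well-spread non-dominated solutions tends to obtain a larger hypervolume.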

In our previous work [5], we carried out an initial limited study on semantics in Multi-objective Genetic Programming (MOGP). Specifically, we initially proposed and used three semantic-based methods, named Semantic Similarity-based Crossover (SSC), Semantic-based Distance as an additional criteriOn (SDO) and Pivot Similarity Semantic-based Distance as an additional criteriOn (PSDO).

The main conclusion from our initial investigation is that using a semantic-based distance value, as computed in either SDO or PSDO, as another objective to be optimised in an EMO setting is robust enough to outperform the results yielded by the well-known NSGA-II and SPEA2 approaches. Furthermore, in our initial research, we found that the distance computed from a pivot, which is the furthest point in the search space, to every individual in the population, when used as an additional criterion to be optimised in an EMO setting, tends to improve the performance of our semantic-based approaches. Moreover, we were able to fine-tune how this distance is computed to significantly improve the evolutionary search. This is attained using the SDO approach, which is used again in this work. It is worth noting, however, that these conclusions were drawn from an initial limited study: its restricted statistical analysis prevented us from drawing general conclusions, its results were limited, and it lacked an explanation of why SDO yields better results than the canonical methods and the other two semantic-based approaches. Furthermore, in our initial study [5], we did not discuss the limitations of SDO.

In this work, we have addressed all these issues. More specifically, the main contributions of this scientific study are as follows:

• We consistently show that Semantic Similarity-based Crossover (SSC), which is used in single-objective GP and widely reported to be beneficial there, does not have the same positive impact in a multi-objective GP (MOGP) setting.
• From this, we show how a semantic-based distance approach can enhance the evolutionary search in MOGP. To this end we use two semantic-based approaches: Semantic-based Crowding Distance (SCD) and Semantic-based Distance as an additional criteriOn (SDO).
• We demonstrate that SDO yields better results than all the other approaches used in this work, including the semantic-based methods and the canonical EMO approaches.
• Another major contribution of this study is the inclusion of detailed results using two well-established EMO approaches, NSGA-II and SPEA2. By doing so, as opposed to the limited results reported in [5], we are now in a position to draw sound conclusions by carrying out a systematic statistical analysis, explained in detail in Section 6.
• Another important contribution of this work is that we are able to explain why the semantic-based technique employed in SDO tends to improve evolutionary search. We do so by extensively analysing the behaviour of SDO in terms of the number of unique solutions, the frequency of duplicate solutions over generations, etc.
• Finally, another major contribution of this work is the discussion of the limitations of SDO when used with a Multi-objective Evolutionary Algorithm Based on Decomposition [9].

This work is organised as follows. Studies relevant to this work are presented in Section 2. The fundamental background in semantics and in MOGP is discussed in Section 3. Section 4 presents the MOGP semantic methods proposed and used in this work. The experimental setup is presented in Section 5. Section 6 presents in detail the results yielded by all the MOGP semantic approaches (SSC, SDO and SCD) and by the EMO methods (NSGA-II and SPEA2). It also offers an explanation as to why SDO finds better results than all the other algorithms. Section 7 discusses the limitations of SDO in Multi-Objective Evolutionary Algorithms based on Decomposition. In Section 8, we draw some conclusions.

Section snippets

Semantics

Semantics has become a key topic of research in GP and multiple definitions have been proposed. Semantics can be seen as the behaviour (recorded outputs over a dataset) of a GP program. We give a formal definition of semantics in Section 3. Research in semantics in GP has grown substantially in the last decade as a consequence of the research community reporting better results when semantics has been promoted in evolutionary search as compared to those GP approaches that do not promote it

Background

This section defines some of the basic concepts relevant to this work, namely semantics, MO and EMO algorithms.
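As a reminder of the multi-objective concepts involved, the following sketch shows Pareto dominance and the extraction of the first (non-dominated) front. It assumes that all objectives are maximised; the function names and the sample points are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b` under maximisation: `a` is no worse
    in every objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def first_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]


# Hypothetical objective vectors for four individuals.
points = [(0.9, 0.2), (0.6, 0.7), (0.5, 0.5), (0.3, 0.95)]
print(first_front(points))
```

Here the third point is dominated by the second, so it does not belong to the first front.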

Semantic-based MOGP methods

Next, we present the semantic-based approaches employed in this work that are incorporated into the baseline MOGP algorithms, namely NSGA-II and SPEA2.
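One way to locate an individual in the sparsest region of the first Pareto front, which the semantic-based approaches use as a pivot, is through the NSGA-II crowding distance. The sketch below is an assumption-laden illustration: in particular, restricting the choice to interior points (those with a finite crowding distance) is a choice made here for the example and is not taken from the paper.

```python
import math
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """NSGA-II style crowding distance for each point of a front."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary points are always kept, so they get an infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            i = order[k]
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[i] += gap / (hi - lo)
    return dist


def select_pivot(front: Sequence[Sequence[float]]) -> int:
    """Index of the interior point with the largest crowding distance.
    Falls back to index 0 when the front has no interior point
    (an assumption made for this sketch)."""
    if not front:
        raise ValueError("front must not be empty")
    dist = crowding_distance(front)
    interior = [i for i, d in enumerate(dist) if math.isfinite(d)]
    if not interior:
        return 0
    return max(interior, key=lambda i: dist[i])


# Hypothetical first front with a visible gap around (0.3, 0.8).
front = [(1.0, 0.0), (0.9, 0.3), (0.8, 0.4), (0.3, 0.8), (0.0, 1.0)]
print(select_pivot(front))
```

The selected index points at the individual whose neighbours on the front are furthest apart, which is one reasonable reading of "sparsest region".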

Experimental setup

The use of benchmark problems has allowed the research community to test, validate and explain a plethora of evolutionary algorithms. In this work, we also adopt well-known, robust and tested benchmark problems used in other studies [5], [28], [36] that will allow us to (i) test the algorithms used in this work, (ii) to use well-defined metrics that allow us to compare one method against another one, (iii) to allow us to explain why one particular method behaves better than others, (iv) to draw

Results and analysis

Using MOEA/D

To highlight some of the limitations of the SDO approach, we also look at a decomposition approach known as the Multi-Objective Evolutionary Algorithm based on Decomposition (MOEA/D) [9]. With MOEA/D we decompose the optimisation problem into a set of scalar optimisation sub-problems. A scalar optimisation function $g$, along with a uniform distribution of weight vectors $\lambda_i$, is used to define each sub-problem. It is important to note that each sub-problem relies only on neighbouring sub-problems for
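The decomposition idea can be sketched for two objectives as follows. The excerpt does not say which scalar function $g$ is used, so the Tchebycheff form $g(x \mid \lambda, z^*) = \max_i \lambda_i |f_i(x) - z^*_i|$, a common choice in MOEA/D, is assumed here, together with minimisation, a hypothetical ideal point $z^*$ and hypothetical function names.

```python
import math
from typing import List, Sequence, Tuple


def uniform_weights(n: int) -> List[Tuple[float, float]]:
    """n evenly spaced weight vectors for two objectives (requires n >= 2)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def tchebycheff(f: Sequence[float], lam: Sequence[float],
                z_star: Sequence[float]) -> float:
    """Tchebycheff scalarisation of objective vector `f` (to be minimised)."""
    return max(l * abs(fi - zi) for fi, l, zi in zip(f, lam, z_star))


def neighbours(weights: Sequence[Sequence[float]], i: int, t: int) -> List[int]:
    """Indices of the `t` weight vectors closest to weights[i], itself included."""
    order = sorted(range(len(weights)),
                   key=lambda j: math.dist(weights[i], weights[j]))
    return order[:t]


weights = uniform_weights(5)          # one weight vector per sub-problem
z_star = (0.0, 0.0)                   # hypothetical ideal point
value = tchebycheff((0.2, 0.6), weights[2], z_star)
print(value, neighbours(weights, 2, 3))
```

Each weight vector defines one scalar sub-problem, and the neighbourhood of a sub-problem is taken from the closest weight vectors, which is how MOEA/D restricts the exchange of information between sub-problems.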

Conclusions

This work proposes a new approach, named Semantic-based Distance as an additional criteriOn (SDO), which consists of using semantic distance values as another criterion to optimise and which prefers solutions that are semantically attracted to the sparsest region of the first approximated Pareto front. We also use this distance in lieu of the crowding distance at the heart of the aforementioned EMO algorithm. Results for the new approach were tested against the canonical frameworks of NSGA-II and

CRediT authorship contribution statement

Edgar Galván: Conceptualization, Supervision, Methodology, Software, Visualisation, Writing – original draft, Validation, Writing – review & editing. Leonardo Trujillo: Writing – review & editing. Fergal Stapleton: Writing – review & editing, Software, Visualisation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland under Grant number 18/CRT/6049. The opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Science Foundation Ireland. The authors wish to acknowledge the DJEI/DES/SFI/HEA Irish Centre for High-End Computing (ICHEC) for the provision of computational facilities and support. We would

References (41)

Zhao H., A multi-objective genetic programming approach to developing Pareto optimal decision trees, Decis. Support Syst. (2007)
Koza J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
Galván E. et al., Promoting semantic diversity in multi-objective genetic programming
Galván-López E. et al., On the use of semantics in multi-objective genetic programming
Galván-López E. et al., Stochastic semantic-based multi-objective genetic programming optimisation for classification of imbalanced data
Galván E. et al., Semantic-based distance approaches in multi-objective genetic programming
Deb K. et al., A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. (2002)
Zitzler E., Laumanns M., Thiele L., SPEA2: Improving the Strength Pareto Evolutionary Algorithm, Tech. rep.,...
Uy N.Q. et al., Semantically-based crossover in genetic programming: application to real-valued symbolic regression, Genet. Program. Evol. Mach. (2011)
Zhang Q. et al., MOEA/D: A multiobjective evolutionary algorithm based on decomposition, IEEE Trans. Evol. Comput. (2007)
Moraglio A. et al., Geometric semantic genetic programming
McPhee N.F. et al., Semantic building blocks in genetic programming
Beadle L. et al., Semantically driven crossover in genetic programming
Beadle L. et al., Semantically driven mutation in genetic programming
Nguyen Q.U. et al., Semantic aware crossover for genetic programming: The case for real-valued function regression
Forstenlechner S. et al., Towards effective semantic operators for program synthesis in genetic programming
Dou T. et al., Comparison of semantic-based local search methods for multiobjective genetic programming, Genet. Program. Evol. Mach. (2018)
Moraglio A. et al., Topological interpretation of crossover
Galván-López E. et al., An empirical investigation of how and why neutrality affects evolutionary search
Galván-López E. et al., Neutrality in evolutionary algorithms... What do we know?, Evol. Syst. (2011)

