Soft target and functional complexity reduction: A hybrid regularization method for genetic programming

https://doi.org/10.1016/j.eswa.2021.114929

Highlights

  • Definition of soft target regularization for genetic programming (GP).

  • Definition of a novel measure of functional complexity for GP individuals.

  • Reduction of overfitting by using soft target regularization and the complexity measure.

  • Definition of a hybrid system that removes overfitting in all the benchmarks.

  • Study of the relation between generalization, complexity, and individuals’ size.

Abstract

Regularization is frequently used in supervised machine learning to prevent models from overfitting. This paper tackles the problem of regularization in genetic programming. We apply, for the first time, soft target regularization, a method recently defined for artificial neural networks, to genetic programming. Also, we introduce a novel measure of the functional complexity of genetic programming individuals, aimed at quantifying their degree of curvature. We experimentally demonstrate that both the use of soft target regularization and the minimization of complexity during learning are often able to reduce overfitting, but neither is able to eliminate it. On the other hand, we demonstrate that the integration of these two strategies into a novel hybrid genetic programming system can completely eliminate overfitting for all the studied test cases. Last but not least, consistently with what has been found in the literature, we offer experimental evidence that the size of genetic programming models has no correlation with their generalization ability.

Introduction

One of the most crucial aspects when training a supervised machine learning model is to avoid overfitting (N. A. of Sciences, 2018). Regularization (Zou and Hastie, 2005, Buhlmann and van de Geer, 2011, Kukacka et al., 2018) is one of the most popular sets of techniques used to counteract overfitting. It has been known for decades and is gaining more and more popularity, thanks to the rapid increase in the amount of available data. In possibly its most general definition, regularization can be understood as an attempt to correct for model overfitting by introducing additional information into the cost function. Many machine learning methods work by generating models that are combinations, of a predefined form, of the input features. Those combinations are generally characterized by coefficients, or weights, and the learning phase generally consists in optimizing the weights to improve the model’s accuracy. This is the case, for instance, of linear regression, where the model is a linear combination of the features and learning consists in looking for the most appropriate coefficients for that combination, or of artificial neural networks, where the outputs of the units that form the network are combined through weighted connections (synapses) and learning consists in optimizing the weights. In those cases, regularization often consists in constraining, or shrinking, the coefficients towards zero. This discourages learning complex models, thus limiting overfitting. Typical regularization algorithms of this type are Lasso, Ridge and Elastic Net (Zou & Hastie, 2005), which are becoming more and more popular thanks to the rapid diffusion of deep neural networks (Kukacka et al., 2018).
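
As a concrete illustration (using standard textbook notation, not anything defined in this paper), the regularized cost of a linear model with weight vector $w$ can be written as the data-fitting error plus a penalty on the coefficients:

```latex
% Regularized least squares for a linear model with weights w (standard notation).
% \lambda_1 > 0, \lambda_2 = 0  -> Lasso
% \lambda_1 = 0, \lambda_2 > 0  -> Ridge
% \lambda_1 > 0, \lambda_2 > 0  -> Elastic Net
J(w) = \sum_{i=1}^{n} \left( y_i - w^{\top} x_i \right)^2
       + \lambda_1 \, \lVert w \rVert_1
       + \lambda_2 \, \lVert w \rVert_2^{2}
```

Setting the penalty weights $\lambda_1$ and $\lambda_2$ to zero recovers ordinary least squares, while increasing them shrinks the coefficients towards zero and discourages complex models.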

Genetic Programming (GP) (Koza, 1992) generates models using predefined sets of primitive functions F, suitable for the problem at hand, and terminal symbols T, which typically contain the input features. In principle, any combination of symbols in F and T can be generated by GP. GP works by initializing a population of models at random (i.e. generating a set of syntactically correct random combinations of symbols from F and T), and breeding the population, by probabilistically selecting the best models and applying stochastic operators, like crossover and mutation, that randomly modify their syntax. Selection is generally guided by a cost function, called fitness, and is generally independent of the structure of the models. With this in mind, we can state that GP is, in principle, able to generate models of any shape, and no predefined form of the final model can be assumed in general. The combination of the features in the final model is implicit, and there are no explicit coefficients to optimize. For this reason, methods like Lasso, Ridge and Elastic Net are not appropriate for GP, unless we decide to constrain the individuals evolved by GP to some predefined forms, like for instance in (Icke and Bongard, 2013, Alonso et al., 2016, Montaña et al., 2016). Indeed, previous studies of regularization in GP rarely include these traditional methods, and instead typically rely on other ways to minimize the complexity of the model. This is often obtained by optimizing fitness together with some predefined complexity measure, which can be a quantification of either the size of the model or some of its functional characteristics, such as ruggedness.
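
To make the representation concrete, the following minimal Python sketch (the primitive and terminal sets below are arbitrary examples, not those used in this paper) shows how a random, syntactically correct GP individual can be grown from F and T, with no predefined model form:

```python
import random

# Hypothetical primitive sets: F maps each function symbol to its arity,
# T contains the terminal symbols (here, the input features).
F = {"+": 2, "-": 2, "*": 2}
T = ["x1", "x2", "x3"]

def random_tree(max_depth):
    """Grow a random, syntactically correct expression tree from F and T."""
    if max_depth == 0 or random.random() < 0.3:
        return random.choice(T)                       # leaf: a terminal symbol
    symbol = random.choice(list(F))                   # internal node: a function symbol
    children = [random_tree(max_depth - 1) for _ in range(F[symbol])]
    return [symbol] + children                        # prefix (Lisp-like) representation

# A random initial population: any combination of symbols from F and T may appear.
population = [random_tree(max_depth=4) for _ in range(10)]
print(population[0])
```

Crossover and mutation then act directly on such trees, which is why the features are combined only implicitly and no explicit coefficient vector is available for Lasso- or Ridge-style shrinkage.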

In 2017, a regularization method called SoftTarget was introduced for artificial neural networks (Aghajanyan, 2017). The method is general enough to be applied to any iterative machine learning method, because it does not assume any particular predefined form of the model. It is based on the idea that the model outputs in the early stages of the training process can provide useful information about the problem, and it works by integrating previous outputs into the fitness function. To the best of our knowledge, SoftTarget regularization has never been applied to GP before. This paper contains, at least, the following novel contributions:

  • We apply SoftTarget to GP for the first time.

  • We introduce a new complexity measure for GP individuals, able to quantify their degree of curvature, which is completely parameter-free.

The previous two points have allowed us to generate two novel GP systems: one using SoftTarget regularization, called softGP, and one co-optimizing error and complexity using nested tournaments (Vanneschi, Castelli, Scott, & Trujillo, 2019), called Nested (an illustrative sketch of nested tournament selection is given after the list below). The research hypothesis that we want to demonstrate in this paper is that, in cases where standard GP suffers from severe overfitting, neither softGP nor Nested is sufficient to eliminate overfitting. Overfitting, in fact, is a complex phenomenon that depends on several different aspects, and taking care of only one of those aspects at a time is not enough. On the other hand, using SoftTarget and minimizing complexity are two independent factors that can coexist in a single GP system. So, we finally introduce a method, called Hybrid, that joins SoftTarget regularization and complexity minimization. The objectives of this paper are to experimentally demonstrate the following hypotheses:

  • On problems for which traditional GP suffers from overfitting, Hybrid outperforms not only traditional GP, but also softGP and Nested, without overfitting.

  • On problems for which traditional GP does not overfit, the performance of Hybrid is not worse than the one of traditional GP.
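
As anticipated above, the sketch below shows one plausible way to implement a nested tournament that co-optimizes error and complexity: inner tournaments are decided on error, and their winners compete in an outer tournament decided on complexity. This is only an illustration under our own assumptions; the exact nesting and criteria used in Vanneschi et al. (2019), and in the Nested system studied here, may differ.

```python
import random

def tournament(pool, key, size=4):
    """Standard tournament: sample `size` candidates and return the best under `key`."""
    candidates = random.sample(pool, min(size, len(pool)))
    return min(candidates, key=key)

def nested_tournament(population, error, complexity, n_inner=3):
    """Illustrative nested tournament: inner rounds minimize error,
    the outer round picks the least complex among the inner winners."""
    inner_winners = [tournament(population, key=error) for _ in range(n_inner)]
    return min(inner_winners, key=complexity)
```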

Last but not least, in this paper we also want to confirm a finding that has already been observed in the literature (see for instance (De Falco et al., 2007, Silva and Vanneschi, 2009, Vanneschi and Silva, 2009, Vladislavleva et al., 2009, Vanneschi et al., 2010, Castelli et al., 2011)): the size of an evolved model has poor or no correlation with its generalization ability. For this reason, complexity measures used for regularization should be based on other (possibly functional) characteristics, instead of size.

The paper is organized as follows: in Section 2, we review previous and related work on the use of regularization in GP. Section 3 explains the functioning of SoftTarget regularization and presents softGP. Section 4 introduces our new measure of complexity, explains the functioning of nested tournaments and presents the Nested method. Section 5 contains our experimental study, and it is partitioned into a presentation of the studied test problems and the parameter settings used, and a discussion of the obtained results. Finally, Section 6 concludes the paper and suggests ideas for future research.

Section snippets

Previous and related work: Regularization in GP

Regularization techniques constitute a vast field of study and application in machine learning, and a general review is beyond the scope of this paper. The interested reader is referred to (Buhlmann & van de Geer, 2011) for a general introduction and to (Kukacka et al., 2018) for a survey of the modern techniques and trends. On the other hand, the focus of this section is on the analysis of previous work where regularization has been applied to GP.

Even though the term “regularization” has not

Soft Target Regularization

SoftTarget regularization was introduced by Aghajanyan for artificial neural networks in (Aghajanyan, 2017), and it was inspired by an idea presented a few years earlier in (Hinton, Vinyals, & Dean, 2015). The objective is to reduce overfitting without sacrificing the capacity of the model. The idea is that the models’ outputs in the early stages of the training process, even though not perfect, can provide some useful information about the problem itself, and thus do not deserve to
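
The snippet above is truncated, but the core mechanism of SoftTarget is the blending of the original targets with the model’s earlier outputs. The Python sketch below illustrates this blending under our own assumptions; the parameter names (beta, gamma) and the exponential moving average of past outputs are illustrative and may not match the exact formulation of Aghajanyan (2017) or of softGP.

```python
import numpy as np

def soft_targets(y, prev_outputs, beta=0.9, gamma=0.5):
    """Illustrative SoftTarget-style blending: keep a moving average of the model's
    outputs at earlier iterations and mix it with the original targets, so that
    early-training predictions still contribute information to the cost function."""
    ema = np.zeros_like(np.asarray(y, dtype=float))
    for outputs in prev_outputs:                      # outputs recorded at earlier iterations
        ema = gamma * ema + (1.0 - gamma) * np.asarray(outputs, dtype=float)
    return beta * np.asarray(y, dtype=float) + (1.0 - beta) * ema

def soft_error(individual_outputs, y, prev_outputs):
    """Fitness of a GP individual against the softened targets (RMSE, as an example)."""
    target = soft_targets(y, prev_outputs)
    return float(np.sqrt(np.mean((np.asarray(individual_outputs, dtype=float) - target) ** 2)))
```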

Measure of functional complexity for GP and nested tournament

Let $(X,\mathbf{y})$ be a supervised training set, characterized by a set of input vectors $X=\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n\}$ and the vector of the corresponding expected outputs $\mathbf{y}=\{y_1,y_2,\ldots,y_n\}$. Let $d_v$ be a metric of distance between vectors and $d_s$ a metric of distance between scalar values. Finally, let $f$ be a GP program. We now define the vector of the pairwise distances of the elements in $X$:

$$\mathbf{z} = \big[\, d_v(\mathbf{x}_i,\mathbf{x}_j) \,\big]_{i=1,2,\ldots,n,\ j=1,2,\ldots,n}$$

Vector z can be created using the pseudo-code in Algorithm 2.

Algorithm 2. Pseudo-code for creating the
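
Since the pseudo-code of Algorithm 2 is truncated in this snippet, the Python sketch below shows one straightforward way to build the vector z of pairwise input distances and, assuming the complexity measure compares it with the corresponding vector of output distances (as the name I/O Distance Correlation used later in the paper suggests), a possible correlation-based score. The distance choices (Euclidean d_v, absolute difference d_s) and the function names are our own illustrative assumptions, not the paper’s Algorithm 2.

```python
import numpy as np

def pairwise_distance_vector(points, dist):
    """Vector of pairwise distances dist(points[i], points[j]) over all pairs (i, j)."""
    n = len(points)
    return np.array([dist(points[i], points[j]) for i in range(n) for j in range(n)])

def io_distance_correlation(X, f):
    """Illustrative complexity score: Pearson correlation between the pairwise
    distances of the inputs and the pairwise distances of the program's outputs."""
    d_v = lambda a, b: float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))
    d_s = lambda a, b: abs(float(a) - float(b))
    z = pairwise_distance_vector(list(X), d_v)        # input distances
    outputs = [f(x) for x in X]                       # program outputs on the training inputs
    w = pairwise_distance_vector(outputs, d_s)        # output distances
    return float(np.corrcoef(z, w)[0, 1])
```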

Experimental study

This section presents our experimental study. It is partitioned into a discussion of the studied test problems and employed experimental settings (Section 5.1) and a presentation of the obtained experimental results (Section 5.2).

Conclusions and future work

In this paper, we have defined three novel Genetic Programming (GP) systems: the first, called softGP, uses SoftTarget regularization in GP for the first time; the second, called Nested, optimizes a novel complexity measure, called I/O Distance Correlation, together with the error, using nested tournaments; the third, called Hybrid, joins the two previous regularization strategies, integrating the use of SoftTarget with the minimization of the complexity of the solutions.

CRediT authorship contribution statement

Conceptualization, Funding acquisition, Investigation, Methodology, Validation, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by national funds through the FCT (Fundação para a Ciência e a Tecnologia) by the projects GADgET (DSAIPA/DS/0022/2018), BINDER (PTDC/CCIINF/29168/2017), and AICE (DSAIPA/DS/0113/2019). Mauro Castelli acknowledges the financial support from the Slovenian Research Agency (research core funding No. P5- 0410).

References (35)

  • M. Castelli et al.

    A quantitative study of learning and generalization in genetic programming

  • I. De Falco et al.

    Parsimony doesn’t mean simplicity: Genetic programming for inductive inference on noisy data

  • Fitzgerald, J. (2014). Bias and variance reduction strategies for improving generalisation performance of genetic...
  • C. Gagné et al.

    Genetic programming, validation sets, and parsimony pressure

  • Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. In arXiv [Online]. Available:...
  • I. Icke et al.

    Improving genetic programming based symbolic regression using deterministic machine learning

  • J.R. Koza

    Genetic programming: On the programming of computers by means of natural selection

    (1992)
