Elsevier

Knowledge-Based Systems

Volume 260, 25 January 2023, 110129

A hierarchical estimation of multi-modal distribution programming for regression problems

https://doi.org/10.1016/j.knosys.2022.110129

Abstract

Estimation of distribution programming is an iterative method to evolve program trees. It estimates the distribution of the most suitable program trees and then produces a new generation of program trees by sampling from the distribution. This paper proposes a hierarchical estimation of multimodal distribution programming (HEMMDP). First, the population is divided into K subpopulations by a clustering algorithm, and the distribution of each subpopulation is modified according to an objective function. Then, at each generation, a new subpopulation is generated from the modified distribution. The objective function aims to gradually improve the fitness of the program trees in each subpopulation. Finally, the appropriate program trees are added as new terminal nodes to the terminal set, resulting in a new hierarchy. The best-fitting program trees from each subpopulation with high synergistic value are chosen as basis functions. The proposed approach uses a linear function of the basis functions to solve the regression problem. The proposed method is evaluated on several real-world benchmark datasets. The datasets are divided into four classes: small-difficult, small-easy, large-difficult, and large-easy. The proposed method improves the results of the best methods for the regression problem by 232% and 62% for small-difficult and large-difficult datasets, respectively.

Introduction

In many real-world studies, regression is widely used to predict dependent variables. Depending on the complexity of the underlying equations, different approaches can be considered to reach an appropriate solution. The traditional approaches applied to solve regression problems are linear models [1], [2], [3], additive models [4], least squares [5], [6], [7], Bayesian models [8], [9], lasso regression [10], neural networks [11], [12], random forest [13], [14], quantile regression [15], [16], support vector regression [17], [18], [19], and bagging and boosting ensembles [20]. These approaches usually apply a specific scheme for approximating the output, such as linear or polynomial functions [1], [2], [3], [8], [10], [13], [15]. Splines are applied in additive models using a combination of polynomials. Gaussian and polynomial kernels are traditionally used in kernel-based methods to approximate the target function [17], [18], [19], [20]. Genetic programming (GP) [21] is an evolutionary computation technique used for solving different problems [22], [23], [24], [25], [26], and regression is one of the most common [27], [28], [29], [30], [31], [32], [33], [34]. GP has the benefit of not requiring the form of the regression model to be specified beforehand. It is highly flexible in providing regression models because it builds them from program trees [21]. The length of a GP program tree can vary, and it can use a function or terminal multiple times in almost any location (subject to syntactic restrictions) [35]. In general, there is no limit to the functions and terminals used to build a GP program tree, which makes them highly expressive.
However, GP is less often considered an effective method for solving the regression problem because of several challenges: (i) newly produced individuals are often less fit than their predecessors [35], [36], and the theoretical interpretation of evolution is highly complicated [37], so improving the evolution to obtain better solutions is difficult; (ii) during the evolution of GP, abrupt jumps occur in the search space, which make the search random and non-gradual; thus, the search is not guided and its effectiveness decreases [38], [39]. These challenges are mainly due to the destructive effects of the genetic operators, namely crossover and mutation [35], [36], [40], [41]. The search method also has theoretical shortcomings, as the behavior of the genetic operators in GP cannot be accurately explained. In addition, the main genetic operator in genetic programming, namely crossover, presents many problems that considerably affect the ability of the search method to find the best solutions [42], [43]. To take advantage of the evolution of GP trees while reducing the drawbacks mainly related to the use of genetic operators, it is recommended to apply estimation of distribution algorithms for genetic programming (EDA-GP) [44] or Estimation of Distribution Programming (EDP). The latter is a genetic programming technique that incorporates the idea of estimation of distribution algorithms [45] to address the challenges of genetic programming. EDP replaces the genetic operators of GP, crossover and mutation, with sampling of candidate solutions from a probabilistic model. The probabilistic model is learned from the most suitable program trees and then sampled to produce a new generation of program trees. Thus, EDP benefits from the advantages of using program trees from GP and does not suffer from the disadvantages of using genetic operators.
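The estimate-then-sample loop described above can be illustrated with a minimal Python sketch. It is not the model used in this paper: it assumes a fixed-shape prototype tree (a complete binary tree of depth 2 stored in heap order) and an independent categorical distribution per node, in the spirit of univariate EDP methods. All names (`run_edp`, `update_model`, the function and terminal sets, the `rate` and `elite_frac` parameters) are hypothetical and chosen only for illustration.

```python
import random

# Assumed toy primitives; the paper's function and terminal sets are not given here.
FUNCTIONS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["x", "1"]
N_INTERNAL, N_LEAVES = 3, 4  # complete binary tree of depth 2, heap layout


def symbols_at(pos):
    """Internal positions hold functions, leaf positions hold terminals."""
    return list(FUNCTIONS) if pos < N_INTERNAL else TERMINALS


def uniform_model():
    """One independent categorical distribution per tree position."""
    return [{s: 1.0 / len(symbols_at(p)) for s in symbols_at(p)}
            for p in range(N_INTERNAL + N_LEAVES)]


def sample_tree(model, rng):
    """Draw one symbol per position from the current model."""
    return [rng.choices(list(d), weights=list(d.values()))[0] for d in model]


def evaluate(tree, x, pos=0):
    """Recursively evaluate the heap-ordered tree at input x."""
    sym = tree[pos]
    if pos >= N_INTERNAL:
        return x if sym == "x" else float(sym)
    return FUNCTIONS[sym](evaluate(tree, x, 2 * pos + 1), evaluate(tree, x, 2 * pos + 2))


def error(tree, xs, ys):
    """Mean squared error of the tree on the data (lower is fitter)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def update_model(model, elite, rate=0.5):
    """Move each node distribution toward the symbol frequencies of the elite trees."""
    new_model = []
    for pos, dist in enumerate(model):
        counts = {s: 0 for s in dist}
        for tree in elite:
            counts[tree[pos]] += 1
        new_model.append({s: (1 - rate) * dist[s] + rate * counts[s] / len(elite)
                          for s in dist})
    return new_model


def run_edp(xs, ys, generations=30, pop_size=200, elite_frac=0.2, seed=0):
    """Sample a population, keep the fittest trees, re-estimate the model, repeat."""
    rng = random.Random(seed)
    model = uniform_model()
    best = None
    for _ in range(generations):
        population = [sample_tree(model, rng) for _ in range(pop_size)]
        population.sort(key=lambda t: error(t, xs, ys))
        if best is None or error(population[0], xs, ys) < error(best, xs, ys):
            best = population[0]
        elite = population[:max(1, int(elite_frac * pop_size))]
        model = update_model(model, elite)
    return best, model


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    ys = [x * x + x for x in xs]
    best_tree, _ = run_edp(xs, ys)
    print(best_tree, error(best_tree, xs, ys))
```

No crossover or mutation appears anywhere in the loop: new trees come only from sampling the model, and the model changes only through the elite frequencies. The mixing parameter `rate` is one simple way to keep successive models close to each other; it is an assumption of this sketch, not the learning rule derived in the paper.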

This paper proposes a new efficient learning method, called Hierarchical Estimation of Multimodal Distribution Programming (HEMMDP), to solve several problems in GP and EDP. It uses EDP with an appropriate learning method to address non-graduality in GP. For the search to be gradual, the difference between the average fitness of previous generations and the average fitness of the current generation must be small; this property exists in natural evolution and should also exist in evolutionary algorithms to guide the search process. HEMMDP proposes a multi-population and hierarchical model to solve problems of EDP such as (i) multi-modality and (ii) the computational cost of the conditional probabilities as the number of nodes in a tree increases. In standard GP, the coefficients are estimated from the subtrees of the main regression tree, and large subtrees are usually needed to estimate the optimized coefficients. In HEMMDP, by using subpopulations and obtaining multiple basis functions in parallel, it is no longer necessary to use large subtrees or other optimization methods to estimate the optimal values of the coefficients. The basis functions are used in a linear function to estimate the final output. The weights assigned to the basis functions are calculated separately by a different algorithm. Note that the basis functions and their probability distributions are independent of and different from each other. Based on this fact and using a clustering method, HEMMDP divides the initial random population into several different subpopulations such that each subpopulation represents a basis function. Each subpopulation then evolves hierarchically to obtain evolved basis functions. Evolution proceeds gradually by learning the probability distribution of the basis functions and sampling from it at each generation. A probabilistic graphical model is proposed to learn the probability distribution of the basis functions.
Furthermore, the learning rule is obtained by maximizing an objective function whose maximization increases the semantic similarity between the individual’s output and the expected output while gradually changing the probability distribution of the subpopulations.
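To make the role of the basis functions concrete, the following Python sketch fits the weights of a linear function of given basis functions. The text above does not state which algorithm computes the weights, so ordinary least squares (solved through the normal equations) is used here purely as an assumption. The helper names `fit_weights`, `predict`, and `solve_linear_system`, and the added intercept term, are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: basis functions are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_weights(basis_functions, xs, ys):
    """Least-squares weights (intercept first) for y ~ w0 + sum_k w_k * f_k(x).

    Assumption: ordinary least squares stands in for the paper's unspecified
    weight-estimation algorithm.
    """
    phi = [[1.0] + [f(x) for f in basis_functions] for x in xs]  # design matrix
    k = len(phi[0])
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def predict(weights, basis_functions, x):
    """Evaluate the fitted linear function of the basis functions at x."""
    return weights[0] + sum(w * f(x) for w, f in zip(weights[1:], basis_functions))


if __name__ == "__main__":
    # Two hand-written stand-ins for evolved basis functions.
    basis = [lambda x: x * x, lambda x: x]
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
    ys = [0.5 + 3.0 * x * x - 2.0 * x for x in xs]
    print(fit_weights(basis, xs, ys))
```

Because the coefficients are obtained in closed form from the basis-function outputs, the trees themselves never need to encode numeric constants, which is the point made above about avoiding large coefficient subtrees.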

Compared with existing EDP techniques, the main contributions of this study are summarized as follows:

  • In HEMMDP, a multi-objective optimization problem is designed and solved to learn the probability distributions in EDP, so that the search for GP solutions is conducted effectively (i.e., gradual learning).

  • Multi-modality and hierarchical estimation of GP solutions are considered in EDP. Multi-modality is used to solve the problem of non-diverse solutions in EDP. Moreover, the hierarchical estimation of solutions, which is another aspect of HEMMDP, reduces the computational cost of EDP when the GP trees grow in depth.
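The multi-population idea in the contributions above can be sketched as follows. The clustering algorithm and the features it operates on are not specified in this part of the text, so the Python example below assumes plain k-means applied to each program's output vector on the training inputs (its semantics). The names `kmeans` and `split_into_subpopulations` are hypothetical, and programs are represented as ordinary Python callables for brevity.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_into_subpopulations(programs, xs, k, seed=0):
    """Group programs by the similarity of their outputs on xs.

    Assumption: clustering on output vectors is an illustrative choice,
    not necessarily the features used by HEMMDP.
    """
    semantics = [[prog(x) for x in xs] for prog in programs]
    labels = kmeans(semantics, k, seed=seed)
    subpopulations = [[] for _ in range(k)]
    for prog, label in zip(programs, labels):
        subpopulations[label].append(prog)
    return subpopulations


if __name__ == "__main__":
    programs = [lambda x: x, lambda x: x + 0.1, lambda x: x * x + 10, lambda x: x * x + 9.9]
    groups = split_into_subpopulations(programs, [0.0, 1.0, 2.0, 3.0], k=2)
    print([len(g) for g in groups])
```

Each resulting group can then maintain its own probability distribution, so that a single model is not forced to represent several dissimilar modes at once.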

The rest of this paper is organized as follows: Section 2 briefly reviews prior research on EDP. Section 3 describes the proposed HEMMDP method. The experimental results are presented in Section 4. Section 5 concludes the paper.

Related work

The most significant approaches to the standard EDP are described in this section. Probabilistic Incremental Program Evolution (PIPE) is the pioneering EDP method proposed by Salustowicz et al. [46]. In this method, univariate probabilistic distributions are used to model the program trees where nodes are assumed to be independent. The probability parameters are incrementally updated to increase the probability of creating the best tree in the population using the history of the past

The proposed method

In this section, the proposed method, called Hierarchical Estimation of Multimodal Distribution Programming (HEMMDP), is discussed. To solve the regression problem, HEMMDP uses a linear combination of several synergistic basis functions. These basis functions are developed and optimized in the main loop of the algorithm. Since the basis functions must be different and independent of each other, a subpopulation is considered for each basis function. The evolution of the basis functions is done

Experimental results

The experiments are presented in five parts: (i) a comparison of HEMMDP with the best regression methods; (ii) a comparison with well-established GP-based methods; (iii) an analysis of the impact of the important parameters of HEMMDP; (iv) ablation studies of HEMMDP; and (v) a study of the robustness of HEMMDP against noise.

Conclusions

In this paper, a multi-population EDP approach is used to find the basis functions for a regression problem. The proposed method suggests a novel algorithm to address several issues in EDP. To cope with the non-graduality problem in GP, a suitable learning method is proposed for EDP. The method provides a multi-population and hierarchical model to address EDP issues such as multi-modality and the computational cost of conditional probabilities. The regression problem is considered as a

Code availability

The source code of HEMMDP required to reproduce the results is available at the public GitHub repository.

CRediT authorship contribution statement

Mohaddeseh Koosha: Investigation, Methodology, Visualization, Software. Ghazaleh Khodabandelou: Writing – original draft, Supervision. Mohammad Mehdi Ebadzadeh: Conceptualization, Methodology, Visualization, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (93)

  • Wang L. et al., Groupwise retargeted least-squares regression, IEEE Trans. Neural Netw. Learn Syst. (2018)
  • Cattaneo M.D. et al., Lspartition: Partitioning-based least squares regression, R J. (2020)
  • Yousof H.M. et al., Bayesian semi-parametric logistic regression model with application to credit scoring data, J. Data Sci. (2021)
  • Pérez-Rodríguez P. et al., A Bayesian genomic regression model with skew normal random errors, G3: Genes Genomes Genetics (2018)
  • McEligot A.J. et al., Logistic lasso regression for dietary intakes and breast cancer, Nutrients (2020)
  • Pyo J.C. et al., A convolutional neural network regression for quantifying cyanobacteria using hyperspectral imagery, Remote Sens. Environ. (2019)
  • Musikawan P. et al., Parallelized metaheuristic-ensemble of heterogeneous feedforward neural networks for regression problems, IEEE Access (2019)
  • Zhong Y. et al., Online random forests regression with memories, Knowl. Based Syst. (2020)
  • Li Y. et al., Random forest regression for online capacity estimation of lithium-ion batteries, Appl. Energy (2018)
  • Das K. et al., Quantile regression, Nature Methods (2019)
  • Demir A. et al., Fintech, financial inclusion and income inequality: a quantile regression approach, Euro. J. Finance (2020)
  • F. Zhang, L.J. O’Donnell, Support vector regression, in: Machine Learning: Methods and Applications To Brain Disorders,...
  • Parbat D. et al., A python based support vector regression model for prediction of COVID19 cases in India, Chaos Solitons Fractals (2020)
  • Astudillo G. et al., Copper price prediction using support vector regression technique, Appl. Sci. (Switzerland) (2020)
  • Baskin I.I. et al., Bagging and boosting of regression models
  • Koza J., Genetic programming: on the programming of computers by means of natural selection, 33, (1) (1992)
  • Zhu L. et al., A decomposition-based multi-objective genetic programming hyper-heuristic approach for the multi-skill resource constrained project scheduling problem, Knowl. Based Syst. (2021)
  • Yazdani S. et al., MBCGP-FE: A modified balanced cartesian genetic programming feature extractor, Knowl. Based Syst. (2017)
  • Ma J. et al., A filter-based feature construction and feature selection approach for classification using genetic programming, Knowl. Based Syst. (2020)
  • Gomes F.M. et al., Multiple response optimization: Analysis of genetic programming for symbolic regression and assessment of desirability functions, Knowl. Based Syst. (2019)
  • Stanovov V. et al., The automatic design of parameter adaptation techniques for differential evolution with genetic programming, Knowl. Based Syst. (2022)
  • Huang Z. et al., Semantic linear genetic programming for symbolic regression, IEEE Trans. Cybern. (2022)
  • Chen Q. et al., Improving generalization of genetic programming for symbolic regression with angle-driven geometric semantic operators, IEEE Trans. Evol. Comput. (2019)
  • Chen Q. et al., Preserving population diversity based on transformed semantics in genetic programming for symbolic regression, IEEE Trans. Evol. Comput. (2021)
  • Zojaji Z. et al., Semantic schema based genetic programming for symbolic regression, Appl. Soft. Comput. (2022)
  • Hemberg E. et al., An investigation of local patterns for estimation of distribution genetic programming
  • P.K. Wong, L.Y. Lo, M.L. Wong, K.S. Leung, Grammar-Based Genetic Programming with Bayesian network, in: Proceedings of...
  • Droste S. et al., Theory of evolutionary algorithms and genetic programming (2003)
  • Uy N.Q. et al., The role of syntactic and semantic locality of crossover in genetic programming
  • Rothlauf F. et al., On the locality of grammatical evolution
  • Manzoni L. et al., Specializing context-free grammars with a (1 + 1)-EA, IEEE Trans. Evol. Comput. (2020)
  • McKay R.I. et al., Grammar-based genetic programming: A survey, Genet. Program Evolvable Mach. (2010)
  • Langdon W.B., Genetic programming convergence, Genet. Program Evolvable Mach. (2021)
  • Majeed H. et al., Optimizing genetic programming by exploiting semantic impact of sub trees, Swarm Evol. Comput. (2021)
  • Shan Y. et al., A survey of probabilistic model building genetic programming, 33 (2007)
  • Salustowicz R. et al., Probabilistic incremental program evolution, Evol. Comput. (1997)