Neurocomputing
Volume 423, 29 January 2021, Pages 609-619

Interaction-transformation symbolic regression with extreme learning machine

https://doi.org/10.1016/j.neucom.2020.10.062

Abstract

Symbolic Regression searches for a mathematical expression that fits the input data set by minimizing the approximation error. The search space explored by this technique is composed of any mathematical function representable as an expression tree. This provides more flexibility for fitting the data but it also makes the task more challenging. The search space induced by this representation becomes filled with redundancy and ruggedness, sometimes requiring a higher computational budget in order to achieve good results. Recently, a new representation for Symbolic Regression was proposed, called Interaction-Transformation, which can represent function forms as a composition of interactions between predictors and the application of a single transformation function. In this work, we show how this representation can be modeled as a multi-layer neural network with the weights adjusted following the Extreme Learning Machine procedure. The results show that this approach is capable of finding equally good or better results than the current state-of-the-art with a smaller computational cost.

Introduction

Regression analysis [1] is a process used to describe the distribution of a dependent variable, also called the target or outcome, given the values of a set of independent variables, called predictors. This relationship can then be used for prediction, for understanding the system being studied, or even for inferring causal relationships. When describing the regression models, X is the n×d matrix of n samples, each one composed of d predictors, and X_i is the i-th sample. The corresponding target values of the samples are described by a vector y of n elements. A regression model is often described as y_i = f(X_i) + ε_i, where f(X_i) is a function of the predictors that outputs a corresponding target value and ε_i is a residual, a normally distributed random error centered at 0 with variance σ². A simple example of a regression model is the linear model y_i = f_linear(X_i) + ε_i = β_0 + β_1·x_1 + ... + β_d·x_d + ε_i, with x_j representing the j-th variable of the sample X_i. This model is often called interpretable, since the impact of each predictor on the target value is clearly described by the coefficients β. With this model, a regression algorithm has the task of finding the optimal β, also called the free parameters, that minimizes the prediction error.
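For instance, the free parameters of the linear model can be fitted by ordinary least squares. The sketch below uses illustrative synthetic data (the names X, y, and beta follow the notation above, but the values are not from the paper) and appends a column of ones to X so that β_0 is estimated jointly with the other coefficients:

```python
import numpy as np

# Illustrative sketch: fit the linear model y_i = beta_0 + beta_1*x_1 + ... +
# beta_d*x_d + eps_i by ordinary least squares on synthetic data.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_beta = np.array([2.0, -1.0, 0.5, 3.0])      # [beta_0, beta_1, ..., beta_d]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Append a column of ones so beta_0 is estimated together with the rest.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)  # close to true_beta
```

With a small noise variance, the recovered coefficients are close to the generating ones, which is what makes the linear model directly interpretable.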

As with the linear regression model, many regression algorithms [2] try to adjust the free parameters of a regression model described by a closed function form. Examples of such models are Polynomial Regression [3] and Multilayer Perceptron [4], [5]. Depending on the complexity of the model, we may lose the interpretation capabilities. In such cases, the model is termed a black-box model.

A different approach to regression analysis is given by Symbolic Regression [6], [7], [8], [9], as described in Section 2, where both the function form and the free parameters are adjusted concomitantly. This model is represented by an expression tree and this task is often performed with the use of Genetic Programming algorithms [6], [10], an evolutionary meta-heuristic that evolves a population of expressions. The main motivation behind this approach is that the returned model is expected to be close or equal to the generator function of the studied process, thus improving the understanding of the system.

The main challenge of applying this algorithm is that the explored search space is huge, rugged, noisy, and filled with local optima. This makes the search inefficient and often leads to bloat [11], [12], [13], [14], i.e., the generated models grow so large that they become comparable to a black-box model w.r.t. interpretability.

Recently, a new representation for Symbolic Regression was proposed in [15], named the Interaction-Transformation (IT) expression, which will be detailed in Section 2. This representation restricts the search space by removing function forms that can lead to bloating. An Interaction-Transformation function is a function of the form f(X_i) = β_0 + Σ_{j=1}^{n} β_j · g_j(p_j(X_i)), with the j-th interaction function described as p_j(X_i) = Π_{k=1}^{d} x_k^{s_jk}, where g_j : ℝ → ℝ is called a transformation function, p_j(X_i) is an interaction function, s_jk is the strength of the interaction of the k-th predictor in the j-th interaction, and n is the number of additive terms of the expression.
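The definition above can be made concrete with a short evaluation routine. In the sketch below, the strengths S, the coefficients β, and the transformation functions are illustrative choices, not values taken from the paper:

```python
import numpy as np

def it_eval(X, S, betas, gs, beta0=0.0):
    """Evaluate an IT expression f(x) = beta0 + sum_j betas[j] * gs[j](p_j(x)),
    with interaction p_j(x) = prod_k x_k**S[j, k].
    X: (n, d) data; S: (m, d) integer strengths; betas: (m,); gs: m unary funcs."""
    out = np.full(X.shape[0], beta0)
    for s_j, b_j, g_j in zip(S, betas, gs):
        p_j = np.prod(X ** s_j, axis=1)   # interaction term
        out += b_j * g_j(p_j)             # transformed, weighted term
    return out

# Illustrative expression: f(x) = 1 + 2*sin(x0*x1) + 0.5*x0^2
X = np.array([[1.0, 2.0], [0.5, 3.0]])
S = np.array([[1, 1], [2, 0]])
y = it_eval(X, S, betas=np.array([2.0, 0.5]),
            gs=[np.sin, lambda z: z], beta0=1.0)
```

Each row of S selects one interaction; a strength of 0 simply drops that predictor from the product, and the identity transformation recovers plain polynomial terms.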

In this same paper, the author introduced the algorithm called SymTree, a simple search heuristic that incrementally expands the current best expression by adding new interactions or experimenting with different transformation functions. Experiments performed in that paper with synthetic benchmark functions showed that SymTree was capable of finding regression models closer to the original generating function than those found by traditional nonlinear and symbolic regression algorithms. This study was further extended in [16] to 20 different equations from physics and engineering, resulting in the discovery of the correct generating function in 14 cases and a similar function in 5 of them. One problem with the SymTree algorithm is that, in the worst-case scenario, the complexity for t steps of the algorithm is O(d^(2^t)) for a d-dimensional data set.

Following Eq. 1, we can model a layered computational model, with the first layer adjusting the strengths of the interactions (s_jk), the second layer serving as a pass-through to the final layer, called the transformation layer, in which the free parameters β_j are adjusted. In this context, we want to find S, β such that y_i = f(X_i, S, β) + ε_i minimizes the approximation error.
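Under this layered view, and assuming strictly positive predictors, the interaction layer can even be expressed as a linear layer in log-space, since Π_k x_k^{s_jk} = exp(Σ_k s_jk · log x_k). The sketch below uses random illustrative shapes, strengths, and weights (none taken from the paper) and checks that the two formulations agree:

```python
import numpy as np

# Forward pass of the layered model on strictly positive inputs.
rng = np.random.default_rng(1)
n, d, m = 5, 3, 4
X = rng.uniform(0.5, 2.0, size=(n, d))        # positive predictors
S = rng.integers(-2, 3, size=(m, d))          # interaction strengths

P = np.exp(np.log(X) @ S.T)                   # interaction layer, shape (n, m)
H = np.sin(P)                                 # one transformation for all nodes
beta = rng.normal(size=m)                     # output-layer free parameters
y_hat = H @ beta                              # final linear combination
```

The log-space form makes clear that, once S is fixed, the model is linear in the output weights β, which is the key property exploited in the next sections.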

With this perceptron model, we can fix the size of the expression and adjust the free parameters. But, to use a gradient descent algorithm to find the optimal values, we depend on the gradient of the error function, which, for the model described in Eq. 1, is undefined if any x_j ≤ 0.
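The problem can be seen from the partial derivative ∂(x^s)/∂s = x^s · ln(x), which is undefined whenever x ≤ 0. A small numeric check (the values below are illustrative):

```python
import numpy as np

# The gradient of x**s with respect to the strength s is x**s * ln(x):
# finite for x > 0, but NaN for x = 0 and for x < 0.
x = np.array([2.0, 0.0, -1.5])
with np.errstate(divide="ignore", invalid="ignore"):
    grad = x ** 1.0 * np.log(x)   # d/ds x**s evaluated at s = 1

print(grad)  # finite only for the positive entry
```

This is why the interaction strengths cannot simply be trained by backpropagation on arbitrary data.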

As such, in this paper, we propose the use of an Extreme Learning Machine [17], [18] approach with the values of the parameters of the first layer being generated at random and the values of the last layer adjusted by a gradient descent algorithm with l0 and l1 regularization. The details of this new approach will be explained in Section 3.
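A minimal sketch of this idea follows: the interaction strengths S are drawn at random and kept fixed, and only the output weights β are fitted. Note that plain least squares stands in for the ℓ0/ℓ1-regularized fitting used in the paper, and the sine transformation, the strength ranges, and the synthetic target are all assumptions of this sketch:

```python
import numpy as np

def it_elm_fit(X, y, m=20, seed=0):
    """Random first layer (interaction strengths), fitted last layer.
    Simplified stand-in: least squares instead of OMP/Lasso."""
    rng = np.random.default_rng(seed)
    S = rng.integers(-2, 3, size=(m, X.shape[1]))            # random, fixed
    Z = np.prod(X[:, None, :] ** S[None, :, :], axis=2)      # interactions
    H = np.column_stack([np.ones(len(X)), np.sin(Z)])        # transform + bias
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)             # last layer only
    return S, beta

def it_elm_predict(X, S, beta):
    Z = np.prod(X[:, None, :] ** S[None, :, :], axis=2)
    H = np.column_stack([np.ones(len(X)), np.sin(Z)])
    return H @ beta

# Illustrative target on positive data: y = 1 + sin(x0 * x1).
rng = np.random.default_rng(42)
X = rng.uniform(0.5, 2.0, size=(200, 2))
y = 1.0 + np.sin(X[:, 0] * X[:, 1])
S, beta = it_elm_fit(X, y, m=15, seed=1)
rmse = np.sqrt(np.mean((it_elm_predict(X, S, beta) - y) ** 2))
```

Because only the linear output layer is trained, the fit reduces to a single linear solve, which is the source of the speed advantage reported later.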

A set of experiments is described in Section 4 to verify the performance of different variations of the proposed algorithm and compare them with the current state-of-the-art in Symbolic Regression. The results reported in Section 5 show that the proposed approach outperforms the state-of-the-art while being up to 30 times faster than its closest competitor. Following the encouraging results, in Section 6 we describe some ideas for further improvements together with some final discussions about the proposed algorithm.

Section snippets

Interaction-transformation representation for symbolic regression

Symbolic regression [6], [7], [8], [9] searches for a function form that describes the relationship of the predictors with the target variable while minimizing the approximation error. A common algorithm used to generate symbolic regression models is genetic programming [6], [8], [19], a population-based search algorithm that evolves expression trees.

In Genetic Programming (GP), the solutions are represented as expression trees where each node may represent an n-ary function, a predictor, or

Interaction-transformation extreme learning machine

This section will give the details of the proposed IT-based Symbolic Regression algorithm. In the first part, we will explain how to represent an IT expression as a Directed Acyclic Graph, followed by a formalization of the optimization problem used to adjust the free parameters, and finally a description of the whole process in algorithmic form.

Experiments

In order to assess the performance of IT-ELM, we have followed the same experimental setup as proposed in [21]. For this purpose, we have considered 8 of the real-world data sets used in that paper, using the same 5-fold split, as provided by the authors. A summary of the data sets is provided in Table 1, together with the abbreviations we will use in the following tables.

For each fold, we have trained the model using the training set and measured the Root Mean Squared Error (RMSE) on the training and

Results

In this section, we will analyze the different aspects of the obtained results. First, we will compare the three strategies for determining the values of n and τ and how the different regularizations impact the results. From now on, we will refer to the different strategies as named in the previous section and prepend the names with l0 for Orthogonal Matching Pursuit and l1 for Lasso. Following, we will verify how often the different combinations of parameters were selected during the grid search

Conclusion

In this paper, we have introduced an adaptation of the Extreme Learning Machine model to search for Interaction-Transformation expressions for Symbolic Regression. The proposed algorithm is composed of three stages: building the feed forward network, optimizing the free parameters, and extracting the symbolic expression.

In the first stage, a multi-layer feed forward network is built with one layer representing the interaction between predictors, another layer applies transformation functions to

CRediT authorship contribution statement

Fabricio Olivetti de Franca: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Supervision, Funding acquisition. Maira Zabuscha de Lima: Conceptualization, Methodology, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This project was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Grant No. 2018/14173-8. The experiments also made use of Intel® AI DevCloud, to which Intel® provided free access.

Fabrício Olivetti de França received his B.S.E.E. from Catholic University of Santos (2002), and an MSc (2005) and Ph.D. (2010) in Computer and Electrical Engineering from University of Campinas. In 2012 he joined Federal University of ABC as an adjunct professor. Currently, his main research topics are Symbolic Regression, Evolutionary Computation and Functional Data Structures.

References (32)

  • J. Albinati et al., The effect of distinct geometric semantic crossover operators in regression problems
  • E.J. Vladislavleva et al., Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming, IEEE Trans. Evol. Comput. (2009)
  • J.V.C. Fracasso et al., Multi-objective semantic mutation for genetic programming
  • I. Arnaldo et al., Multiple regression genetic programming
  • W. La Cava, J.H. Moore, Semantic variation operators for multidimensional genetic programming, in: Proceedings of the...
  • F.O. de França, A greedy search tree heuristic for symbolic regression, Inf. Sci. (2018)


Maira Zabuscha de Lima holds a degree in Mathematics (2008) from University of São Paulo and a degree in Computer Science (2018) from Federal University of ABC. Currently she works as an IT consultant at VILT as a Java developer.
