Neurocomputing
Volume 423, 29 January 2021, Pages 609-619

Interaction-transformation symbolic regression with extreme learning machine

https://doi.org/10.1016/j.neucom.2020.10.062

Abstract

Symbolic Regression searches for a mathematical expression that fits the input data set by minimizing the approximation error. The search space explored by this technique is composed of any mathematical function representable as an expression tree. This provides more flexibility for fitting the data but it also makes the task more challenging. The search space induced by this representation becomes filled with redundancy and ruggedness, sometimes requiring a higher computational budget in order to achieve good results. Recently, a new representation for Symbolic Regression was proposed, called Interaction-Transformation, which can represent function forms as a composition of interactions between predictors and the application of a single transformation function. In this work, we show how this representation can be modeled as a multi-layer neural network with the weights adjusted following the Extreme Learning Machine procedure. The results show that this approach is capable of finding equally good or better results than the current state-of-the-art with a smaller computational cost.

Introduction

Regression analysis [1] is a process used to describe the distribution of a dependent variable, also called the target or outcome, given the values of a set of independent variables, called predictors. This relationship can then be used for prediction, for understanding the system being studied, or even for inferring causal relationships. When describing the regression models, X is the n×d matrix of n samples, each one composed of d predictors, and X_i is the i-th sample. The corresponding target values of the samples are described by a vector y of n elements. A regression model is often described as y_i = f(X_i) + ε_i, where f(X_i) is a function of the predictors that outputs a corresponding target value and ε_i is a residual, a normally distributed random error centered at 0 with variance σ². A simple example of a regression model is the linear model y_i = f_linear(X_i) + ε_i = β_0 + β_1·x_1 + ... + β_d·x_d + ε_i, with x_j representing the j-th variable of the sample X_i. This model is often called interpretable, since the impact of each predictor on the target value is clearly described by the coefficients β. With this model, a regression algorithm has the task of finding the optimal β, also called the free parameters, that minimizes the prediction error.
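For instance, the free parameters of the linear model can be fitted by ordinary least squares. The sketch below uses illustrative synthetic data (the names X, y, and beta follow the notation above, but the values are not from the paper) and appends a column of ones to X so that β_0 is estimated jointly with the other coefficients:

```python
import numpy as np

# Illustrative sketch: fit the linear model y_i = beta_0 + beta_1*x_1 + ... +
# beta_d*x_d + eps_i by ordinary least squares on synthetic data.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
true_beta = np.array([2.0, -1.0, 0.5, 3.0])      # [beta_0, beta_1, ..., beta_d]
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

# Append a column of ones so beta_0 is estimated together with the rest.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta_hat)  # close to true_beta
```

With a small noise variance, the recovered coefficients are close to the generating ones, which is what makes the linear model directly interpretable.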

As with the linear regression model, many regression algorithms [2] try to adjust the free parameters of a regression model described by a closed function form. Examples of such models are Polynomial Regression [3] and Multilayer Perceptron [4], [5]. Depending on the complexity of the model, we may lose the interpretation capabilities. In such cases, the model is termed a black-box model.

A different approach to regression analysis is given by Symbolic Regression [6], [7], [8], [9], as described in Section 2, where both the function form and the free parameters are adjusted concomitantly. This model is represented by an expression tree and this task is often performed with the use of Genetic Programming algorithms [6], [10], an evolutionary meta-heuristic that evolves a population of expressions. The main motivation behind this approach is that the returned model is expected to be close or equal to the generator function of the studied process, thus improving the understanding of the system.

The main challenge of applying this algorithm is that the explored search space is huge, rugged, noisy, and filled with local optima. This makes the search inefficient and often leads to bloat [11], [12], [13], [14], i.e., the generated models grow so large that they become comparable to a black-box model w.r.t. interpretability.

Recently, a new representation for Symbolic Regression was proposed in [15], named the Interaction-Transformation (IT) expression, which will be detailed in Section 2. This representation restricts the search space by removing function forms that can lead to bloating. An Interaction-Transformation function is a function of the form f(X_i) = β_0 + Σ_{j=1}^{n} β_j · g_j(p_j(X_i)), with the j-th interaction function described as p_j(X_i) = Π_{k=1}^{d} x_k^{s_jk}, where g_j : ℝ → ℝ is called a transformation function, p_j(X_i) is an interaction function, s_jk is the strength of the interaction of the k-th predictor in the j-th interaction, and n is the number of additive terms of the expression.
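The definition above can be made concrete with a short evaluation routine. In the sketch below, the strengths S, the coefficients β, and the transformation functions are illustrative choices, not values taken from the paper:

```python
import numpy as np

def it_eval(X, S, betas, gs, beta0=0.0):
    """Evaluate an IT expression f(x) = beta0 + sum_j betas[j] * gs[j](p_j(x)),
    with interaction p_j(x) = prod_k x_k**S[j, k].
    X: (n, d) data; S: (m, d) integer strengths; betas: (m,); gs: m unary funcs."""
    out = np.full(X.shape[0], beta0)
    for s_j, b_j, g_j in zip(S, betas, gs):
        p_j = np.prod(X ** s_j, axis=1)   # interaction term
        out += b_j * g_j(p_j)             # transformed, weighted term
    return out

# Illustrative expression: f(x) = 1 + 2*sin(x0*x1) + 0.5*x0^2
X = np.array([[1.0, 2.0], [0.5, 3.0]])
S = np.array([[1, 1], [2, 0]])
y = it_eval(X, S, betas=np.array([2.0, 0.5]),
            gs=[np.sin, lambda z: z], beta0=1.0)
```

Each row of S selects one interaction; a strength of 0 simply drops that predictor from the product, and the identity transformation recovers plain polynomial terms.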

In this same paper, the author introduced the algorithm called SymTree, a simple search heuristic that incrementally expands the current best expression by adding new interactions or experimenting with different transformation functions. Experiments performed in that paper with synthetic benchmark functions showed that SymTree was capable of finding regression models closer to the original generating function than those found by traditional nonlinear and symbolic regression algorithms. This study was further extended in [16] to 20 different equations from physics and engineering, resulting in the discovery of the correct generating function in 14 cases and a similar function in 5 of them. One problem with the SymTree algorithm is that, in the worst-case scenario, the complexity for t steps of the algorithm is O(d^(2^t)) for a d-dimensional data set.

Following Eq. 1, we can model a layered computational model, with the first layer adjusting the strengths of the interactions (s_jk), the second layer serving as a pass-through to the final layer, called the transformation layer, in which the free parameters β_j are adjusted. In this context, we want to find S, β such that y_i = f(X_i, S, β) + ε_i minimizes the approximation error.
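Under this layered view, and assuming strictly positive predictors, the interaction layer can even be expressed as a linear layer in log-space, since Π_k x_k^{s_jk} = exp(Σ_k s_jk · log x_k). The sketch below uses random illustrative shapes, strengths, and weights (none taken from the paper) and checks that the two formulations agree:

```python
import numpy as np

# Forward pass of the layered model on strictly positive inputs.
rng = np.random.default_rng(1)
n, d, m = 5, 3, 4
X = rng.uniform(0.5, 2.0, size=(n, d))        # positive predictors
S = rng.integers(-2, 3, size=(m, d))          # interaction strengths

P = np.exp(np.log(X) @ S.T)                   # interaction layer, shape (n, m)
H = np.sin(P)                                 # one transformation for all nodes
beta = rng.normal(size=m)                     # output-layer free parameters
y_hat = H @ beta                              # final linear combination
```

The log-space form makes clear that, once S is fixed, the model is linear in the output weights β, which is the key property exploited in the next sections.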

With this perceptron model, we can fix the size of the expression and adjust the free parameters. But, to use a gradient descent algorithm to find the optimal values, we depend on the gradient of the error function, which, for the model described in Eq. 1, is undefined if any x_j ≤ 0.
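The problem can be seen from the partial derivative ∂(x^s)/∂s = x^s · ln(x), which is undefined whenever x ≤ 0. A small numeric check (the values below are illustrative):

```python
import numpy as np

# The gradient of x**s with respect to the strength s is x**s * ln(x):
# finite for x > 0, but NaN for x = 0 and for x < 0.
x = np.array([2.0, 0.0, -1.5])
with np.errstate(divide="ignore", invalid="ignore"):
    grad = x ** 1.0 * np.log(x)   # d/ds x**s evaluated at s = 1

print(grad)  # finite only for the positive entry
```

This is why the interaction strengths cannot simply be trained by backpropagation on arbitrary data.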

As such, in this paper, we propose the use of an Extreme Learning Machine [17], [18] approach with the values of the parameters of the first layer being generated at random and the values of the last layer adjusted by a gradient descent algorithm with l0 and l1 regularization. The details of this new approach will be explained in Section 3.
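A minimal sketch of this idea follows: the interaction strengths S are drawn at random and kept fixed, and only the output weights β are fitted. Note that plain least squares stands in for the ℓ0/ℓ1-regularized fitting used in the paper, and the sine transformation, the strength ranges, and the synthetic target are all assumptions of this sketch:

```python
import numpy as np

def it_elm_fit(X, y, m=20, seed=0):
    """Random first layer (interaction strengths), fitted last layer.
    Simplified stand-in: least squares instead of OMP/Lasso."""
    rng = np.random.default_rng(seed)
    S = rng.integers(-2, 3, size=(m, X.shape[1]))            # random, fixed
    Z = np.prod(X[:, None, :] ** S[None, :, :], axis=2)      # interactions
    H = np.column_stack([np.ones(len(X)), np.sin(Z)])        # transform + bias
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)             # last layer only
    return S, beta

def it_elm_predict(X, S, beta):
    Z = np.prod(X[:, None, :] ** S[None, :, :], axis=2)
    H = np.column_stack([np.ones(len(X)), np.sin(Z)])
    return H @ beta

# Illustrative target on positive data: y = 1 + sin(x0 * x1).
rng = np.random.default_rng(42)
X = rng.uniform(0.5, 2.0, size=(200, 2))
y = 1.0 + np.sin(X[:, 0] * X[:, 1])
S, beta = it_elm_fit(X, y, m=15, seed=1)
rmse = np.sqrt(np.mean((it_elm_predict(X, S, beta) - y) ** 2))
```

Because only the linear output layer is trained, the fit reduces to a single linear solve, which is the source of the speed advantage reported later.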

A set of experiments is described in Section 4 to verify the performance of different variations of the proposed algorithm and compare them with the current state-of-the-art in Symbolic Regression. The results reported in Section 5 show that the proposed approach outperforms the state-of-the-art while being up to 30 times faster than its closest competitor. Following the encouraging results, in Section 6 we describe some ideas for further improvements together with some final discussions about the proposed algorithm.

Section snippets

Interaction-transformation representation for symbolic regression

Symbolic regression [6], [7], [8], [9] searches for a function form that describes the relationship of the predictors with the target variable while minimizing the approximation error. A common algorithm used to generate symbolic regression models is genetic programming [6], [8], [19], a population-based search algorithm that evolves expression trees.

In Genetic Programming (GP), the solutions are represented as expression trees where each node may represent an n-ary function, a predictor, or

Interaction-transformation extreme learning machine

This section will give the details of the proposed IT-based Symbolic Regression algorithm. In the first part, we will explain how to represent an IT expression as a Directed Acyclic Graph, followed by a formalization of the optimization problem used to adjust the free parameters, and finally a description of the whole process in algorithmic form.

Experiments

In order to assess the performance of IT-ELM, we have followed the same experimental setup as proposed in [21]. For this purpose, we have considered 8 of the real-world data sets used in that paper, using the same 5-fold split, as provided by the authors. A summary of the data sets is provided in Table 1, together with the abbreviations we will use in the following tables.

For each fold, we have trained the model using the training set and measured the Root Mean Squared Error (RMSE) on the training and

Results

In this section, we will analyze the different aspects of the obtained results. First, we will compare the three strategies for determining the values of n and τ and how the different regularizations impact the results. From now on, we will refer to the different strategies as named in the previous section and prepend the names with l0 for Orthogonal Matching Pursuit and l1 for Lasso. Following, we will verify how often the different combinations of parameters were selected during the grid search

Conclusion

In this paper, we have introduced an adaptation of the Extreme Learning Machine model to search for Interaction-Transformation expressions for Symbolic Regression. The proposed algorithm is composed of three stages: building the feed forward network, optimizing the free parameters, and extracting the symbolic expression.

In the first stage, a multi-layer feed forward network is built with one layer representing the interaction between predictors, another layer applies transformation functions to

CRediT authorship contribution statement

Fabricio Olivetti de Franca: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Supervision, Funding acquisition. Maira Zabuscha de Lima: Conceptualization, Methodology, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This project was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Grant No. 2018/14173-8. The experiments also made use of Intel® AI DevCloud, to which Intel® provided free access.

Fabrício Olivetti de França received his B.S.E.E. from Catholic University of Santos (2002), and an MSc (2005) and Ph.D. (2010) in Computer and Electrical Engineering from University of Campinas. In 2012 he joined Federal University of ABC as an adjunct professor. Currently, his main research topics are Symbolic Regression, Evolutionary Computation and Functional Data Structures.

References (32)

  • J. Albinati et al., The effect of distinct geometric semantic crossover operators in regression problems
  • E.J. Vladislavleva et al., Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming, IEEE Trans. Evol. Comput. (2009)
  • J.V.C. Fracasso et al., Multi-objective semantic mutation for genetic programming
  • I. Arnaldo et al., Multiple regression genetic programming
  • W. La Cava, J.H. Moore, Semantic variation operators for multidimensional genetic programming, in: Proceedings of the...
  • F.O. de França, A greedy search tree heuristic for symbolic regression, Inf. Sci. (2018)


Maira Zabuscha de Lima holds a degree in Mathematics (2008) from University of São Paulo and a degree in Computer Science (2018) from Federal University of ABC. Currently she works as an IT consultant at VILT as a Java developer.
