Elsevier

Neurocomputing

Volume 137, 5 August 2014, Pages 293-301
An improved Gene Expression Programming approach for symbolic regression problems

https://doi.org/10.1016/j.neucom.2013.05.062

Abstract

Gene Expression Programming (GEP) is a powerful evolutionary method for knowledge discovery and model learning. Building on the basic GEP algorithm, this paper proposes an improved algorithm named S_GEP, which is especially suitable for symbolic regression problems. The major advantages of S_GEP include: (1) a new method for evaluating individuals without constructing an expression tree; (2) a corresponding expression tree construction scheme for the new evaluation method, for cases where a tree is required by particularly complex problems; and (3) a new approach for manipulating numeric constants that improves convergence. The paper includes a thorough comparative study of the proposed S_GEP method against primitive GEP and other methods. The comparative results show that S_GEP can significantly improve GEP performance. Several well-studied benchmark test cases and real-world test cases demonstrate the efficiency and capability of S_GEP for symbolic regression problems.

Introduction

Gene Expression Programming (GEP) was developed in 2001 by the Portuguese scientist Ferreira, who derived it from, and improved upon, the Genetic Algorithm (GA) and Genetic Programming (GP) [1]. It is a new member of the genetic computing family that evolves computer programs and combines the merits of GP and GA. These programs can take many forms, such as mathematical expressions, neural networks, decision trees, polynomial constructs, logical expressions, and so on [1].

With simple, linear, and compact chromosomes and easy-to-apply genetic operators, GEP is a powerful global search tool. Since Ferreira released the first GEP research results, GEP has become an active research area in evolutionary computation and has been applied successfully to a wide variety of complex problems in many fields, such as classification [2], [3], symbolic regression and function mining [4], [5], [6], time-series analysis [7], optimization [8], and so on.

In recent years, numerous researchers have investigated GEP and proposed a series of improved GEP methods that process data in specific fields with high effectiveness and efficiency. Li et al. [9] proposed a prefix K-expression structure that tries to preserve good structures and achieves better convergence and efficiency in classification. Duan et al. [10] proposed dynamically adjusting the individual coding length through an ORF filter operator, to limit the loss of efficiency caused by overly long individuals. In all of the methods mentioned above, however, as in primitive GEP, the genotype must be translated into an expression tree, which is then traversed to calculate the fitness. Because tree construction and traversal are very time-consuming operations, they greatly reduce the efficiency of the algorithm. Elena et al. [11] presented an adaptive GEP algorithm that automatically adapts the number of genes used by the chromosome. The adaptation takes place at the chromosome level, allowing chromosomes in the population to evolve with different numbers of genes and thereby reducing the computational effort. Ryan and Hibler [12] presented Robust Gene Expression Programming, which uses the simple grammar of prefix expressions and the simple encoding of bit vectors, reaping the benefits of encoding the expressive structures of trees and the power of breaking the “phenotype barrier”.
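
For context, the tree-based decoding step of primitive GEP discussed above can be sketched as follows. This is a minimal illustration, assuming Ferreira's breadth-first (level-order) K-expression layout; the function set, the `Node` class, and the names `build_tree` and `evaluate_tree` are hypothetical and are not taken from any of the cited papers.

```python
import operator
from collections import deque

# Hypothetical function set (an assumption for illustration): symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division
}


class Node:
    """One node of the expression tree."""

    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []


def build_tree(gene):
    """Build an expression tree from a gene read in breadth-first order.

    Symbols after the coding region are simply never consumed.
    """
    symbols = iter(gene)
    root = Node(next(symbols))
    pending = deque([root])  # nodes whose children are not yet attached
    while pending:
        node = pending.popleft()
        arity = FUNCTIONS[node.symbol][0] if node.symbol in FUNCTIONS else 0
        for _ in range(arity):
            child = Node(next(symbols))
            node.children.append(child)
            pending.append(child)
    return root


def evaluate_tree(node, env):
    """Recursively traverse the tree to compute its value for the inputs in env."""
    if not node.children:  # terminal: look up the variable value
        return env[node.symbol]
    args = [evaluate_tree(child, env) for child in node.children]
    return FUNCTIONS[node.symbol][1](*args)


# "-a/bc" followed by unused tail symbols encodes a - (b / c) in level order.
tree = build_tree("-a/bcab")
print(evaluate_tree(tree, {"a": 2.0, "b": 3.0, "c": 4.0}))
```

Every fitness evaluation in this scheme allocates one node object per symbol and then walks the tree again, which is the overhead the paragraph above refers to.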

Symbolic regression, also known as symbolic function identification, is a function discovery approach for analyzing and modeling numeric multivariate data sets in order to gain insight into the data-generating systems [14]. Symbolic regression has had successful applications in both academia [15], [16], [17], [18] and industry [19], [20].
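
As a toy illustration of what function discovery means here, the sketch below scores several candidate expressions of different structure against sample data and keeps the one with the lowest error. The data, the candidate set, and the helper names are all assumptions made for this example; a real symbolic regression method such as GEP would generate and evolve the candidates rather than enumerate a fixed list.

```python
# Hypothetical sample data generated from y = x*x + x (an assumed target function).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * x + x for x in xs]

# Candidate models with different structures; no single model form is fixed in advance.
candidates = {
    "2*x": lambda x: 2 * x,
    "x*x": lambda x: x * x,
    "x*x + x": lambda x: x * x + x,
    "x*x*x": lambda x: x * x * x,
}


def mean_squared_error(model, xs, ys):
    """Average squared difference between model predictions and observed outputs."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def best_candidate(candidates, xs, ys):
    """Return the name of the candidate expression with the lowest error."""
    return min(candidates, key=lambda name: mean_squared_error(candidates[name], xs, ys))


print(best_candidate(candidates, xs, ys))
```

The error measure plays the role of the fitness function: the search is over expression structures, not only over coefficients of one predetermined structure.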

Building on the basic GEP proposed by Ferreira, this paper describes S_GEP, which is specifically suited to symbolic regression problems. S_GEP contains several improvements: a new method for decoding and evaluating chromosomes, based on our previous research [13]; a corresponding scheme for constructing and traversing the expression tree (ET); and a new approach for manipulating constants. The new decoding and evaluation method does not require constructing and traversing the ET; instead, it uses stacks directly to decode the chromosome and evaluate the fitness, which reduces the time and space complexity. The new approach for manipulating numeric constants improves the convergence of the population. Experimental results indicate that S_GEP outperforms classical GEP, with less computational effort and higher effectiveness.
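
The general idea of evaluating a chromosome with a stack instead of a tree can be sketched as below. This is not the S_GEP decoding scheme, whose details are not given in this excerpt; it is a generic sketch that assumes a prefix-ordered gene (rather than Ferreira's breadth-first K-expression) because prefix order is the simplest layout to evaluate with a single stack. The function set and the names `coding_length` and `evaluate_gene` are hypothetical.

```python
import operator

# Hypothetical function set (an assumption for illustration): symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division
}


def coding_length(gene):
    """Number of leading symbols that form one complete prefix expression."""
    needed = 1  # argument slots still waiting to be filled
    for index, symbol in enumerate(gene):
        if symbol in FUNCTIONS:
            needed += FUNCTIONS[symbol][0] - 1  # fills one slot, opens `arity` new ones
        else:
            needed -= 1  # a terminal fills one slot
        if needed == 0:
            return index + 1
    raise ValueError("gene does not encode a complete expression")


def evaluate_gene(gene, env):
    """Evaluate the coding region right to left with a value stack, building no tree."""
    stack = []
    for symbol in reversed(gene[:coding_length(gene)]):
        if symbol in FUNCTIONS:
            arity, implementation = FUNCTIONS[symbol]
            # The first value popped is the leftmost argument.
            args = [stack.pop() for _ in range(arity)]
            stack.append(implementation(*args))
        else:
            stack.append(env[symbol])
    return stack.pop()


# "-a/bc" followed by unused tail symbols encodes a - (b / c) in prefix order.
print(evaluate_gene("-a/bcab", {"a": 2.0, "b": 3.0, "c": 4.0}))
```

Each symbol of the coding region is visited once and the only auxiliary storage is a list of intermediate values, so no node objects are allocated per evaluation.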

Section snippets

Brief overview of symbolic regression

In detail, the task of regression is to identify the variables (inputs) in the data that are related to the changes in the important control variables (outputs), to express these relationships in mathematical models, and to analyze the quality and generality of the constructed models [14]. Symbolic regression differs from traditional regression since it does not rely on a specific a priori determined model structure. The only assumption made in symbolic regression is that the response surface

Numeric constants in S_GEP

In traditional GEP algorithms, the head/tail domain is translated in the usual fashion, but after translation additional processing is needed to replace the question marks in the tree with the numeric constants they represent. S_GEP introduces a simple yet highly effective strategy to handle this problem, enabling GEP to discover functions with constant terms and accelerating convergence. The strategies allow the constant to be a special terminal

Experiment and result analyses

In this section, several experiments applying S_GEP to selected symbolic regression problems are designed to demonstrate the advantages of S_GEP. Two of the test cases are real-world project problems; the others have all been studied by other researchers in the context of symbolic regression. The experiments in this section serve two main purposes: (1) testing the improvements featured in S_GEP, and (2) testing S_GEP capability for both

Conclusions

In this paper we present an improved gene expression programming algorithm (S_GEP) especially suitable for symbolic regression, and we compare its performance against traditional GEP algorithms on several instances of symbolic regression problems. We also show experimentally that all the improvements proposed in S_GEP over primitive GEP and other related methods are advantageous. It not only can directly and quickly evaluate chromosomes using a stack instead of building and then traversing the

Acknowledgments

This work is supported by the National Science Foundation of China (Grant #60763012), the National Science Foundation of Guangxi (Grant #2012GXNSFBA053161), and the scientific research project of the Guangxi Education Department (#201010LX293). Chang-an Yuan is the corresponding author.

YuZhong Peng received his master's degree in Computer Application Technology from Guangxi Teachers Education University, China, in 2009. He is an Associate Professor at Guangxi Teachers Education University. His research interests are in Evolutionary Computation and Data modeling.

References (24)

  • Noah Ryan et al.

    Robust Gene Expression Programming

    Proced. Comput. Sci.

    (2011)
  • C. Ferreira

    Gene Expression Programming: a new adaptive algorithm for solving problems

    Complex Syst.

    (2001)
  • C Zhou et al.

    Evolving accurate and compact classification rules with Gene Expression Programming

    IEEE Trans. Evolut. Comput.

    (2003)
  • V K Karakasis et al.

    Efficient evolution of accurate classification rules using a combination of gene expression programming and clonal selection

    IEEE Trans. Evolut. Comput.

    (2008)
  • H.S. Lopes et al.

    EGIPSYS: an enhanced Gene Expression Programming approach for symbolic regression problems

    Int. J. Appl. Math. Comput. Sci.

    (2004)
  • C.A Yuan et al.

    Convergency of genetic regression in data mining based on Gene Expression Programming and optimized solution

    Int. J. Comput. Appl.

    (2006)
  • A. Gandomi et al.

    Novel approach to strength modeling of concrete under triaxial compression

    J. Mater. Civ. Eng.

    (2012)
  • Zuo Jie et al.

    Time series prediction based on Gene Expression Programming[C]

    WAIM

    (2004)
  • Yuan Chang-an, Peng Yu-zhong

    Multi-cellular Gene Expression Programming algorithm for function optimization

    Control Theory Appl.

    (2010)
  • X. Li, C. Zhou, W. Xiao, P.C. Nelson, Prefix Gene Expression Programming[C], in: Genetic and Evolutionary Computation...
  • Lei Duan et al.

    Design and implementation of ORF filter in Gene Expression Programming

    J. Sichuan Univ. (Eng. Sci. Ed.)

    (2007)
  • B. Elena et al.

    AdaGEP – an adaptive Gene Expression Programming algorithm[C], ninth international symposium on symbolic and numeric algorithms for scientific computing

    IEEE Comput. Soc.

    (2008)
Dr. ChangAn Yuan received the Ph.D. degree in Computer Application Technology from the Sichuan University, China, in 2006. He is a professor at Guangxi Teachers Education University. His research interests include Computational intelligence and Data mining.

Xiao Qin received her master's degree in Computer Application Technology from Guangxi Teachers Education University, China, in 2009. She is an Associate Professor at Guangxi Teachers Education University. Her research interests are in Evolutionary Computation and Data mining.

Dr. JiangTao Huang received his Ph.D. degree in Computer Application Technology from the Sichuan University, China, in 2011. He is an Associate Fellow at Guangxi Teachers Education University. His research interests are in Data mining and Information Fusion.

YaBing Shi received her master's degree in Computer Application Technology from Guangxi Teachers Education University, China, in 2009. She is a lecturer at Guangxi Teachers Education University. Her research interests are in Data mining.
