An improved Gene Expression Programming approach for symbolic regression problems
Introduction
Gene Expression Programming (GEP) was developed in 2001 by the Portuguese scientist Ferreira, as a derivation and improvement of the Genetic Algorithm (GA) and Genetic Programming (GP) [1]. It is a relatively recent member of the genetic computing family that evolves computer programs, combines the merits of GA and GP, and is used as a knowledge discovery technique. The evolved programs can take many forms, such as mathematical expressions, neural networks, decision trees, polynomial constructs, and logical expressions [1].
With simple, linear, and compact chromosomes and easy-to-apply genetic operators, GEP is a powerful global search tool. Since Ferreira released the first GEP research results, GEP has become an active research area in evolutionary computation. It has been applied in many fields and has solved a large variety of complex problems, such as classification [2], [3], symbolic regression and function mining [4], [5], [6], time-series analysis [7], and optimization [8].
In recent years, numerous researchers have investigated GEP and proposed improved GEP methods that process data in specific fields effectively and efficiently. Li et al. [9] proposed a prefix K-expression structure that tries to preserve good structures and achieves better convergence and efficiency in classification. Duan et al. [10] proposed dynamically adjusting the individual coding length through an ORF filter operator, to mitigate the loss of efficiency caused by overly long individual strings. With all of the aforementioned methods, however, primitive GEP must still translate the genotype into an expression tree and then traverse that tree to calculate fitness. Because tree construction and traversal are very time-consuming operations, they greatly reduce the efficiency of the algorithm. Elena et al. [11] presented an adaptive GEP algorithm that automatically adapts the number of genes used by the chromosome. The adaptation takes place at the chromosome level, allowing chromosomes in the population to evolve with different numbers of genes and thereby reducing computational effort. Ryan and Hibler [12] presented Robust Gene Expression Programming, which uses the simple grammar of prefix expressions and the simple encoding of bit vectors, reaping the benefits of encoding the expressive structures of trees and the power of breaking the “phenotype barrier”.
Symbolic regression, also called symbolic function identification, is a function discovery approach for analyzing and modeling numeric multivariate data sets in order to gain insight into data-generating systems [14]. Symbolic regression has had both successful academic [15], [16], [17], [18] and industrial applications [19], [20].
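As a minimal illustration of the idea, the sketch below scores a few candidate expressions of different structure against the same data and keeps the best one. The sample data, the candidate set, and the use of mean squared error as the fitness measure are assumptions made for this example only; they are not taken from the paper.

```python
# Hypothetical sample data generated from y = x**2 + x (illustrative assumption).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x + x for x in xs]

# Candidate models with different structures; no model form is fixed in advance.
candidates = {
    "x*x + x": lambda x: x * x + x,
    "2*x": lambda x: 2 * x,
    "x*x": lambda x: x * x,
}


def mean_squared_error(model, xs, ys):
    """Average squared difference between model predictions and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Score every candidate on the same data and pick the lowest error.
scores = {name: mean_squared_error(f, xs, ys) for name, f in candidates.items()}
best = min(scores, key=scores.get)
```

In an evolutionary method such as GEP, the candidate set is not enumerated by hand as it is here; it is generated and refined by the genetic operators, with a score of this kind guiding selection.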
Based on the basic GEP proposed by Ferreira, this paper describes S_GEP, a more powerful variant that is specifically suited to symbolic regression problems. S_GEP includes several improvements: a new method for decoding and evaluating chromosomes, based on our previous research [13]; a corresponding expression tree (ET) construction and traversal scheme; and a new approach for manipulating numeric constants. The new decoding and evaluation method does not require constructing and traversing the ET. Instead, it uses stacks directly to decode the chromosome and evaluate fitness, which reduces time and space complexity. The new approach for manipulating numeric constants improves the convergence of the population. Experimental results indicate that S_GEP outperforms classical GEP, with less computational effort and higher effectiveness.
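The general idea of evaluating a linear gene with a stack, without building a tree, can be sketched as follows. This excerpt does not spell out the S_GEP decoding scheme, so the sketch is not the authors' method: it assumes a prefix-encoded gene (classical GEP uses level-order Karva notation), and the function set and the helper names `coding_length` and `evaluate_prefix` are hypothetical.

```python
import operator

# Hypothetical function set: symbol -> (arity, implementation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def coding_length(gene):
    """Number of leading symbols that form one complete prefix expression."""
    needed = 1  # operands still required to complete the expression
    for i, sym in enumerate(gene):
        if sym in FUNCTIONS:
            needed += FUNCTIONS[sym][0] - 1  # fills one slot, opens `arity` more
        else:
            needed -= 1  # a terminal fills one slot
        if needed == 0:
            return i + 1
    raise ValueError("gene does not contain a complete expression")


def evaluate_prefix(gene, env):
    """Evaluate the coding region of a prefix gene using only a stack."""
    stack = []
    # Scan right to left so operands are pushed before the function that uses them.
    for sym in reversed(gene[:coding_length(gene)]):
        if sym in FUNCTIONS:
            arity, fn = FUNCTIONS[sym]
            args = [stack.pop() for _ in range(arity)]  # first pop is the first argument
            stack.append(fn(*args))
        else:
            stack.append(env[sym])  # terminal: look up the variable value
    return stack.pop()


# The gene "-*xxyxyy" encodes x*x - y; the trailing "xyy" is non-coding.
result = evaluate_prefix("-*xxyxyy", {"x": 3, "y": 2})
```

Each symbol in the coding region is visited once, and the only auxiliary storage is the stack, which is the kind of saving the tree-free approach aims for.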
Brief overview of symbolic regression
In detail, the task of regression is to identify the variables (inputs) in the data that are related to the changes in the important control variables (outputs), to express these relationships in mathematical models, and to analyze the quality and generality of the constructed models [14]. Symbolic regression differs from traditional regression since it does not rely on a specific a priori determined model structure. The only assumption made in symbolic regression is that the response surface
Numeric constants in S_GEP
In traditional GEP algorithms, the head/tail domain is translated in the usual fashion, but after translation, additional processing is needed to replace the question marks in the tree with the numeric constants they represent. S_GEP introduces a simple yet highly effective strategy to handle this problem, enabling GEP to discover functions with constant terms and accelerating convergence. The strategy allows the constant to be a special terminal
Experiment and result analyses
In this section, several experiments applying S_GEP to selected symbolic regression problems are designed to demonstrate its advantages. Two of the test cases are real-world project problems, and the others have all been studied by other researchers in the context of symbolic regression. The experiments serve two main purposes: (1) testing the improvements featured in S_GEP, and (2) testing S_GEP capability for both
Conclusions
In this paper we present an improved gene expression programming approach (S_GEP) that is especially suitable for symbolic regression, and we compare its performance against traditional GEP algorithms on several instances of symbolic regression problems. We also show experimentally that all improvements proposed in S_GEP over primitive GEP and other related methods are advantageous. It not only can directly and quickly evaluate chromosomes based on the stack instead of building and then traversing the
Acknowledgments
We gratefully acknowledge that this work is supported by the National Science Foundation of China (Grant #60763012), the National Science Foundation of Guangxi (Grant #2012GXNSFBA053161), and the scientific research project of the Guangxi Education Department (#201010LX293). Chang-an Yuan is the corresponding author.
YuZhong Peng received his master's degree in Computer Application Technology from Guangxi Teachers Education University, China, in 2009. He is an Associate Professor at Guangxi Teachers Education University. His research interests are in Evolutionary Computation and Data modeling.
References (24)
- et al. Robust Gene Expression Programming. Proced. Comput. Sci. (2011)
- Gene Expression Programming: a new adaptive algorithm for solving problems. Complex Syst. (2001)
- et al. Evolving accurate and compact classification rules with Gene Expression Programming. IEEE Trans. Evolut. Comput. (2003)
- et al. Efficient evolution of accurate classification rules using a combination of gene expression programming and clonal selection. IEEE Trans. Evolut. Comput. (2008)
- et al. EGIPSYS: an enhanced Gene Expression Programming approach for symbolic regression problems. Int. J. Appl. Math. Comput. Sci. (2004)
- et al. Convergency of genetic regression in data mining based on Gene Expression Programming and optimized solution. Int. J. Comput. Appl. (2006)
- et al. Novel approach to strength modeling of concrete under triaxial compression. J. Mater. Civ. Eng. (2012)
- et al. Time series prediction based on Gene Expression Programming[C]. WAIM (2004)
- Multi-cellular Gene Expression Programming algorithm for function optimization. Control Theory Appl. (2010)
- X. Li, C. Zhou, W. Xiao, P.C. Nelson, Prefix Gene Expression Programming[C], in: Genetic and Evolutionary Computation...
- Design and implementation of ORF filter in Gene Expression Programming. J. Sichuan Univ. (Eng. Sci. Ed.)
- AdaGEP – an adaptive Gene Expression Programming algorithm[C], ninth international symposium on symbolic and numeric algorithms for scientific computing. IEEE Comput. Soc.
Dr. ChangAn Yuan received his Ph.D. degree in Computer Application Technology from Sichuan University, China, in 2006. He is a Professor at Guangxi Teachers Education University. His research interests include Computational intelligence and Data mining.
Xiao Qin received her master's degree in Computer Application Technology from Guangxi Teachers Education University, China, in 2009. She is an Associate Professor at Guangxi Teachers Education University. Her research interests are in Evolutionary Computation and Data mining.
Dr. JiangTao Huang received his Ph.D. degree in Computer Application Technology from Sichuan University, China, in 2011. He is an Associate Fellow at Guangxi Teachers Education University. His research interests are in Data mining and Information Fusion.
YaBing Shi received her master's degree in Computer Application Technology from Guangxi Teachers Education University, China, in 2009. She is a Lecturer at Guangxi Teachers Education University. Her research interests are in Data mining.