Abstract
Background
Symbolic regression is one of the most common applications of genetic programming (GP), a popular evolutionary algorithm for automatic computer program generation. Despite GP's success on symbolic regression, its accuracy and efficiency can still be improved, especially on complicated problems. Such improvements would allow GP to be applied in more fields.
Purpose
This paper proposes a novel memetic GP framework to improve the accuracy and search efficiency of GP on complicated symbolic regression problems. The framework consists of two components: feature construction and feature combination. The first component constructs diverse features. The second component filters out redundant features and linearly combines the remaining independent features.
Methods
The first component (feature construction) builds polynomial features derived from polynomial functions and evolves additional features with a GP solver. The second component (feature combination) applies a filtering mechanism to discard redundant features and embeds Levenberg-Marquardt (LM), a gradient-based nonlinear least squares algorithm, to locally adjust the weights of the independent features. The polynomial features and the evolved features thus work together in the framework to improve the performance of GP.
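The two components can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the "evolved" features are hand-written stand-ins for the output of a GP solver, and the correlation-based redundancy filter is an assumed placeholder, since the abstract does not specify the filtering rule. The LM step uses SciPy's `least_squares` with `method="lm"`.

```python
import itertools
import numpy as np
from scipy.optimize import least_squares


def polynomial_features(X, degree=2):
    """Return all monomials of the input columns up to `degree` (no constant term)."""
    cols = []
    for d in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)


def filter_redundant(F, threshold=0.999):
    """Assumed filter: greedily keep columns not near-collinear with a kept column."""
    kept = []
    for j in range(F.shape[1]):
        col = F[:, j]
        if np.std(col) < 1e-12:  # constant columns duplicate the bias term
            continue
        if all(abs(np.corrcoef(col, F[:, k])[0, 1]) < threshold for k in kept):
            kept.append(j)
    return kept


def fit_weights_lm(F, y):
    """Fit the bias and weights of a linear feature combination with LM."""
    A = np.column_stack([np.ones(len(y)), F])
    result = least_squares(
        lambda w: A @ w - y,      # residuals of the linear model
        x0=np.zeros(A.shape[1]),
        jac=lambda w: A,          # exact Jacobian of the residuals
        method="lm",
    )
    return result.x


# Synthetic target that is linear in the candidate features.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = 1.5 * X[:, 0] ** 2 - 2.0 * X[:, 0] * X[:, 1] + np.sin(X[:, 0])

poly = polynomial_features(X, degree=2)  # columns: x0, x1, x0^2, x0*x1, x1^2
# Stand-ins for GP-evolved features; the second is a redundant copy of the first.
evolved = np.column_stack([np.sin(X[:, 0]), 2.0 * np.sin(X[:, 0])])

F = np.column_stack([poly, evolved])
kept = filter_redundant(F)
weights = fit_weights_lm(F[:, kept], y)

A_kept = np.column_stack([np.ones(len(y)), F[:, kept]])
rmse = np.sqrt(np.mean((A_kept @ weights - y) ** 2))
print("kept feature indices:", kept)
print("training RMSE:", rmse)
```

Because the model is linear in its weights, ordinary least squares would solve this toy case directly. LM is used here only to mirror the local search named in the text.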
Results
Experimental results on nine benchmark regression problems and three real-world regression problems show that the proposed framework achieves better accuracy and search efficiency than four state-of-the-art algorithms.
Conclusion
In this study, a novel memetic genetic programming framework is proposed to improve the performance of GP on symbolic regression. Experimental results demonstrate that the proposed framework can improve the accuracy and search efficiency of GP on complicated symbolic regression problems compared with four state-of-the-art algorithms.
Acknowledgements
This work is supported by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X183), the Guangdong Natural Science Foundation Research Team (Grant No. 2018B030312003), and the Fundamental Research Funds for the Central Universities (Grant No. D2191200).
Cite this article
Cheng, T., Zhong, J. An efficient memetic genetic programming framework for symbolic regression. Memetic Comp. 12, 299–315 (2020). https://doi.org/10.1007/s12293-020-00311-8
DOI: https://doi.org/10.1007/s12293-020-00311-8