An efficient memetic genetic programming framework for symbolic regression

  • Regular Research Paper
Memetic Computing

Abstract

Background

Symbolic regression is one of the most common applications of genetic programming (GP), a popular evolutionary algorithm for automatic program generation. Although GP has been applied successfully to symbolic regression, its accuracy and efficiency can still be improved, especially on complicated problems; such improvements would allow GP to be applied in more fields.

Purpose

This paper proposes a novel memetic GP framework to improve the accuracy and search efficiency of GP on complicated symbolic regression problems. The proposed framework consists of two components: feature construction and feature combination. The first component constructs diverse features. The second component filters out redundant features and linearly combines the remaining independent features.
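The two components can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how features are built or which redundancy criterion is used, so the monomial construction, the Pearson-correlation test, and the names `polynomial_features`, `filter_redundant`, and `threshold` are all assumptions made for illustration.

```python
import math
from itertools import combinations_with_replacement


def polynomial_features(rows, degree):
    """Build one column per monomial of total degree 1..degree (assumed scheme)."""
    n_vars = len(rows[0])
    names, columns = [], []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(n_vars), d):
            names.append("*".join(f"x{i}" for i in combo))
            columns.append([math.prod(row[i] for i in combo) for row in rows])
    return names, columns


def pearson(a, b):
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def filter_redundant(names, columns, threshold=0.999):
    """Keep a feature only if it is not (almost) perfectly correlated
    with a feature that was kept earlier. The threshold is hypothetical."""
    kept_names, kept_columns = [], []
    for name, column in zip(names, columns):
        if all(abs(pearson(column, kept)) < threshold for kept in kept_columns):
            kept_names.append(name)
            kept_columns.append(column)
    return kept_names, kept_columns


# Toy data: x1 is always 2 * x0, so every feature involving x1 is redundant.
rows = [(x, 2 * x) for x in (1, 2, 3, 4, 5)]
names, columns = polynomial_features(rows, degree=2)
kept_names, kept_columns = filter_redundant(names, columns)
print(kept_names)  # ['x0', 'x0*x0']
```

On this toy data the five candidate features collapse to two, because `x1`, `x0*x1`, and `x1*x1` are exact multiples of features already kept.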

Methods

The first component (feature construction) builds polynomial features and evolves further features with a GP solver. The second component (feature combination) applies a filtering mechanism to discard redundant features and embeds Levenberg-Marquardt (LM), a gradient-based nonlinear least squares algorithm, to locally adjust the weights of the remaining independent features. In this way, the polynomial features and the evolved features work together in the framework to improve the performance of GP.
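The weight-adjustment step can be sketched in plain Python as a small Levenberg-Marquardt loop. This is a hedged illustration rather than the paper's procedure: it assumes the model is a bias plus a weighted sum of feature values, and the damping schedule (divide by 10 after an accepted step, multiply by 10 after a rejected one) is a common textbook choice, not something the abstract states. The helper names `solve`, `predict`, `sse`, and `levenberg_marquardt` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def predict(weights, features):
    """weights[0] is the bias; the rest multiply the feature values."""
    return [weights[0] + sum(w * f for w, f in zip(weights[1:], row))
            for row in features]


def sse(weights, features, targets):
    """Sum of squared errors of the weighted feature combination."""
    return sum((t - p) ** 2 for t, p in zip(targets, predict(weights, features)))


def levenberg_marquardt(features, targets, iterations=50, lam=1e-3):
    n_w = len(features[0]) + 1
    weights = [0.0] * n_w
    # For a model that is linear in its weights the Jacobian is constant.
    jac = [[1.0] + list(row) for row in features]
    jtj = [[sum(r[i] * r[j] for r in jac) for j in range(n_w)] for i in range(n_w)]
    err = sse(weights, features, targets)
    for _ in range(iterations):
        resid = [t - p for t, p in zip(targets, predict(weights, features))]
        jtr = [sum(r[i] * e for r, e in zip(jac, resid)) for i in range(n_w)]
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        damped = [[jtj[i][j] + (lam * jtj[i][i] if i == j else 0.0)
                   for j in range(n_w)] for i in range(n_w)]
        step = solve(damped, jtr)
        trial = [w + s for w, s in zip(weights, step)]
        trial_err = sse(trial, features, targets)
        if trial_err < err:   # accept the step and trust the model more
            weights, err, lam = trial, trial_err, lam / 10
        else:                 # reject the step and damp more strongly
            lam *= 10
    return weights


# Toy target y = 1 + 2*x + 3*x^2 over the features (x, x^2).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
features = [(x, x * x) for x in xs]
targets = [1 + 2 * x + 3 * x * x for x in xs]
weights = levenberg_marquardt(features, targets)
print([round(w, 6) for w in weights])
```

Because the toy model is linear in its weights, ordinary least squares would give the same answer in one step; the iterative form is shown only because the abstract names LM as the local search method.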

Results

Experimental results demonstrate that the proposed framework offers enhanced performance compared with several state-of-the-art algorithms in terms of accuracy and search efficiency on nine benchmark regression problems and three real-world regression problems.

Conclusion

In this study, a novel memetic genetic programming framework is proposed to improve the performance of GP on symbolic regression. Experimental results demonstrate that the proposed framework can improve the accuracy and search efficiency of GP on complicated symbolic regression problems compared with four state-of-the-art algorithms.

Notes

  1. https://archive.ics.uci.edu/ml/index.php.

Acknowledgements

This work is supported by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X183), the Guangdong Natural Science Foundation Research Team (Grant No. 2018B030312003), and the Fundamental Research Funds for the Central Universities (Grant No. D2191200).

Corresponding author

Correspondence to Jinghui Zhong.

About this article

Cite this article

Cheng, T., Zhong, J. An efficient memetic genetic programming framework for symbolic regression. Memetic Comp. 12, 299–315 (2020). https://doi.org/10.1007/s12293-020-00311-8
