DOI: 10.1145/3583131.3595918
research-article

Relieving Genetic Programming from Coefficient Learning for Symbolic Regression via Correlation and Linear Scaling

Published: 12 July 2023

ABSTRACT

The difficulty of learning optimal coefficients in regression models using only genetic operators has long been a challenge in genetic programming for symbolic regression. As a simple but effective remedy, linear scaling of model outputs prior to fitness evaluation has been proposed. Recently, a correlation coefficient-based fitness function combined with a post-processing linear scaling step for model alignment has been shown to outperform error-based fitness functions in generating symbolic regression models. In this study, we compare the impact of four evaluation strategies on relieving genetic programming (GP) from learning coefficients in symbolic regression, so that it can focus on learning the more crucial model structure. The results from 12 datasets, comprising ten real-world tasks and two synthetic datasets, confirm that all of these strategies assist GP in learning coefficients, to varying degrees. Among them, correlation fitness with one-time linear scaling as post-processing is the recommended strategy for relieving GP from learning coefficients, because it is the most efficient while bringing notable benefits to performance.
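The recommended strategy can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the fitness is assumed here to be 1 - r^2 with r the Pearson correlation, which is one common way to turn a correlation coefficient into a quantity to minimise. The scaling step uses the ordinary least-squares intercept and slope, applied once to the final model rather than at every evaluation.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def correlation_fitness(y_true, y_pred):
    """Assumed fitness: 1 - r^2, r being the Pearson correlation (lower is better)."""
    mt, mp = _mean(y_true), _mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    if var_t == 0.0 or var_p == 0.0:
        return 1.0  # a constant output is treated as the worst case
    r = cov / math.sqrt(var_t * var_p)
    return 1.0 - r * r


def linear_scaling(y_true, y_pred):
    """Least-squares intercept a and slope b such that a + b * y_pred fits y_true."""
    mt, mp = _mean(y_true), _mean(y_pred)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    if var_p == 0.0:
        return mt, 0.0  # fall back to predicting the target mean
    b = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / var_p
    a = mt - b * mp
    return a, b


# Hypothetical example: the evolved model has the right structure (x^2)
# but none of the coefficients of the target 3 + 2 * x^2.
xs = [i / 10 for i in range(-10, 11)]
raw_output = [x * x for x in xs]
target = [3.0 + 2.0 * x * x for x in xs]

# During evolution only the correlation is evaluated, so the missing
# offset and slope do not penalise the model.
fitness = correlation_fitness(target, raw_output)

# After evolution, a single linear scaling step aligns the final model.
a, b = linear_scaling(target, raw_output)
aligned = [a + b * p for p in raw_output]
mse = _mean([(t - q) ** 2 for t, q in zip(target, aligned)])
```

In this toy case the raw model is perfectly correlated with the target despite having the wrong scale and offset, so the search pressure falls on structure alone, and the coefficients are recovered in one closed-form step at the end.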


    Published in

      GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
      July 2023
      1667 pages
      ISBN: 9798400701191
      DOI: 10.1145/3583131

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher: Association for Computing Machinery, New York, NY, United States
