Symbolic-regression boosting

Abstract

Modifying standard gradient boosting by replacing the embedded weak learner with a strong(er) one, we present SyRBo: symbolic-regression boosting. Experiments over 98 regression datasets show that adding a small number of boosting stages (between 2 and 5) to a symbolic regressor often attains statistically significant improvements. Coding SyRBo on top of any symbolic regressor is straightforward, and the added cost is simply a few more evolutionary rounds. SyRBo is essentially a simple add-on that can be readily attached to an extant symbolic regressor, often with beneficial results.
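To make the method concrete, here is a minimal sketch of residual-style boosting around a symbolic regressor, assuming gplearn's SymbolicRegressor as the embedded (strong) learner. The class name, default stage count, and the hyperparameters passed through are illustrative assumptions, not the paper's code.

import numpy as np
from gplearn.genetic import SymbolicRegressor

class SyRBoSketch:
    """Sketch of boosting a symbolic regressor: each stage fits the
    residual left over by the previous stages (an assumption here,
    not a transcription of the paper's implementation)."""

    def __init__(self, n_stages=3, **sr_params):
        self.n_stages = n_stages      # e.g., 2-5 stages, as in the experiments
        self.sr_params = sr_params    # forwarded to each SymbolicRegressor
        self.stages = []

    def fit(self, X, y):
        residual = np.asarray(y, dtype=float)
        for _ in range(self.n_stages):
            sr = SymbolicRegressor(**self.sr_params)
            sr.fit(X, residual)                    # model whatever error remains
            residual = residual - sr.predict(X)    # shrink the residual
            self.stages.append(sr)
        return self

    def predict(self, X):
        # the ensemble prediction is the sum of all stage predictions
        return np.sum([sr.predict(X) for sr in self.stages], axis=0)

A call such as SyRBoSketch(n_stages=3, generations=20).fit(X_train, y_train).predict(X_test) then behaves like a single symbolic regressor whose training simply ran a few extra evolutionary rounds.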

Acknowledgements

This work was supported by National Institutes of Health (USA) Grants LM010098, LM012601, AI116794. We thank Hagai Ravid for spotting an error in an earlier version of the code.

Author information

Correspondence to Moshe Sipper.

Appendix: detailed results

The results of all experiments over all datasets are given in Tables 3, 4, 5, and 6 for 2, 3, 4, and 5 boosting stages, respectively. As noted in Sect. 3, for each of the 98 datasets we recorded the mean absolute error attained by each algorithm over each of the 30 replicate runs, on each of the 5 test folds. We then computed the median of these scores, presented under 'mean absolute error' in the tables. Under 'pval' we show the results of the 10,000-round permutation tests between the scores of SyRBo and SymbolicRegressor, with '!' denoting a significant win for SyRBo and '=' denoting an insignificant loss for SyRBo (a sketch of such a test appears after the table list below). Under 'run times' we show the median run times of SyRBo and SymbolicRegressor. 'SR' denotes SymbolicRegressor.

Table 3 2-stage SyRBo: results of all datasets
Table 4 3-stage SyRBo: results of all datasets
Table 5 4-stage SyRBo: results of all datasets
Table 6 5-stage SyRBo: results of all datasets
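The permutation tests reported under 'pval' can be sketched as follows: a generic two-sample permutation test on the absolute difference of mean scores. The function name, arguments, and defaults are illustrative assumptions rather than the paper's code.

import numpy as np

def permutation_test(scores_a, scores_b, n_rounds=10000, seed=None):
    """Estimate a p-value by repeatedly relabeling the pooled scores at random."""
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    observed = abs(a.mean() - b.mean())            # the test statistic
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(pooled)                        # random relabeling
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        if diff >= observed:
            hits += 1
    return hits / n_rounds                         # fraction of rounds at least as extreme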

Cite this article

Sipper, M., Moore, J.H. Symbolic-regression boosting. Genet Program Evolvable Mach 22, 357–381 (2021). https://doi.org/10.1007/s10710-021-09400-0
