Abstract
Genetic programming (GP) is an evolutionary machine learning method that can be applied to a wide range of classification and regression problems. However, traditional GP algorithms can suffer from unnecessary code growth, known as bloating. Bloating can slow convergence, lead to over-fitting, and increase the computational cost of the algorithm. The main focus of this paper is to control bloating in GP trees for symbolic regression. To address the bloating issue, this paper introduces a novel tree substitution method that reduces tree size while increasing the exploration ability of the GP algorithm. The proposed method incorporates a comprehensive analysis to detect bloating in parent trees. When a bloated tree is detected, a new, smaller tree is generated, leveraging the function frequency of the identified bloated tree. A set of regression experiments has been conducted on six real-world datasets. The results show that the proposed GP method reduces the size of the best individual while maintaining performance similar to that of standard GP with a tree height limit.
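The substitution step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the bloat-detection analysis or the tree generator, so the sketch assumes that a tree counts as bloated when its node count exceeds a fixed `size_limit`, and that the replacement is grown by sampling function symbols in proportion to how often they occur in the bloated tree. The function set, terminal set, tuple-based tree encoding, and all names below are hypothetical.

```python
import random
from collections import Counter

# Hypothetical function set (name -> arity) and terminal set.
FUNCTIONS = {"add": 2, "sub": 2, "mul": 2, "sin": 1}
TERMINALS = ["x0", "x1", "1.0"]

# A tree is either a terminal string or a tuple (function_name, child, ...).


def size(tree):
    """Return the number of nodes in the tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


def function_counts(tree):
    """Count how often each function symbol occurs in the tree."""
    counts = Counter()
    if not isinstance(tree, str):
        counts[tree[0]] += 1
        for child in tree[1:]:
            counts.update(function_counts(child))
    return counts


def grow(counts, max_depth, rng, p_terminal=0.3):
    """Grow a random tree, sampling functions in proportion to `counts`."""
    if max_depth == 0 or not counts or rng.random() < p_terminal:
        return rng.choice(TERMINALS)
    names = list(counts)
    name = rng.choices(names, weights=[counts[n] for n in names])[0]
    children = [grow(counts, max_depth - 1, rng, p_terminal)
                for _ in range(FUNCTIONS[name])]
    return (name, *children)


def substitute_if_bloated(tree, size_limit, max_depth, rng, attempts=100):
    """Replace a tree deemed bloated with a smaller, frequency-guided tree.

    Assumption: 'bloated' simply means size(tree) > size_limit. The paper's
    actual detection analysis is not described in the abstract.
    """
    if size(tree) <= size_limit:
        return tree
    counts = function_counts(tree)
    for _ in range(attempts):
        candidate = grow(counts, max_depth, rng)
        if size(candidate) < size(tree):
            return candidate
    return tree  # no smaller candidate found; keep the original


if __name__ == "__main__":
    bloated = ("add",
               ("mul", ("add", "x0", "x1"), ("mul", "x0", "x0")),
               ("add", ("sin", "x1"), ("mul", "x1", "1.0")))
    rng = random.Random(0)
    print(size(bloated), function_counts(bloated))
    print(substitute_if_bloated(bloated, size_limit=10, max_depth=2, rng=rng))
```

Because the replacement draws only from functions already present in the bloated tree, it keeps a rough record of which operators that tree used while discarding its structure; whether this matches the paper's notion of "leveraging the function frequency" is an assumption of the sketch.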
Acknowledgement
This work is supported in part by the Marsden Fund of the New Zealand Government under Contracts MFP-VUW2016 and MFP-VUW1913.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rimas, M., Chen, Q., Zhang, M. (2024). Bloating Reduction in Symbolic Regression Through Function Frequency-Based Tree Substitution in Genetic Programming. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. AI 2023. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_34
DOI: https://doi.org/10.1007/978-981-99-8391-9_34
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8390-2
Online ISBN: 978-981-99-8391-9
eBook Packages: Computer Science, Computer Science (R0)