ABSTRACT
Evolutionary feature construction is an important machine learning technique for enhancing learning performance. However, traditional genetic programming-based feature construction methods often suffer from bloat: the sizes of constructed features grow excessively without any corresponding improvement in performance. To address this issue, this paper proposes a double-stage lexicase selection operator that controls bloat without damaging search effectiveness. The operator uses a two-stage selection process, in which the first stage selects individuals based on fitness values and the second stage selects individuals based on tree sizes. The proposed operator can therefore control bloat while retaining the advantages of lexicase selection. Experimental results on 98 regression datasets show that, compared with the traditional bloat control method of imposing a depth limit, the proposed selection operator significantly reduces the sizes of constructed features on all datasets while maintaining a similar level of predictive performance. A comparative experiment with seven bloat control methods shows that the double lexicase selection operator achieves the best trade-off between model performance and model size.
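The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract states only that the first stage selects on fitness and the second on tree size. The sketch assumes that stage one builds a small candidate pool by running epsilon-lexicase selection several times, that the tolerance is the median absolute deviation of the errors on each case, and that stage two keeps the smallest tree in the pool. The names `epsilon_lexicase`, `double_lexicase`, and `pool_size` are hypothetical.

```python
import random
from statistics import median


def epsilon_lexicase(errors, rng):
    """Pick one individual index by epsilon-lexicase selection.

    errors[i][j] is the error of individual i on training case j.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # each selection event visits the cases in random order
    for case in cases:
        if len(candidates) == 1:
            break
        case_errors = [errors[i][case] for i in candidates]
        best = min(case_errors)
        # Assumed tolerance: median absolute deviation of errors on this case.
        med = median(case_errors)
        eps = median([abs(e - med) for e in case_errors])
        # Keep only individuals within eps of the best error on this case.
        candidates = [i for i in candidates if errors[i][case] <= best + eps]
    return rng.choice(candidates)


def double_lexicase(errors, sizes, pool_size=3, rng=None):
    """Two-stage selection sketch: fitness first, tree size second."""
    rng = rng or random.Random()
    # Stage 1 (assumed form): build a pool from repeated lexicase selections.
    pool = [epsilon_lexicase(errors, rng) for _ in range(pool_size)]
    # Stage 2 (assumed form): prefer the smallest tree among pool members.
    return min(pool, key=lambda i: sizes[i])


if __name__ == "__main__":
    # Three individuals, two training cases; sizes are tree node counts.
    errors = [[0.0, 0.0], [0.0, 0.0], [100.0, 100.0]]
    sizes = [10, 3, 1]
    print(double_lexicase(errors, sizes, pool_size=5, rng=random.Random(0)))
```

In this toy population the third individual is the smallest tree but has much larger errors, so it never survives the first stage and cannot be selected. Size only breaks ties among individuals that lexicase selection already favours, which is the sense in which such a scheme can limit bloat without weakening the fitness-driven search.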
Index Terms
- A Double Lexicase Selection Operator for Bloat Control in Evolutionary Feature Construction for Regression