DOI: 10.1145/3583131.3590365
research-article

A Double Lexicase Selection Operator for Bloat Control in Evolutionary Feature Construction for Regression

Published: 12 July 2023

ABSTRACT

Evolutionary feature construction is an important machine learning technique for enhancing learning performance. However, traditional genetic programming-based feature construction methods often suffer from bloat: the sizes of constructed features grow excessively without any improvement in performance. To address this issue, this paper proposes a double-stage lexicase selection operator that controls bloat without damaging search effectiveness. The operator uses a two-stage selection process, where the first stage selects individuals based on fitness values and the second stage selects individuals based on tree sizes. The proposed operator can therefore control bloat while retaining the advantages of lexicase selection. Experimental results on 98 regression datasets show that, compared to the traditional bloat control method of imposing a depth limit, the proposed selection operator not only significantly reduces the sizes of constructed features on all datasets but also maintains a similar level of predictive performance. A comparative experiment with seven bloat control methods shows that the double lexicase selection operator achieves the best trade-off between model performance and model size.
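
The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact procedure: the abstract only states that the first stage selects on fitness and the second on tree size, so the candidate pool (`pool_size`), the "keep the smallest tree" rule, and all function names here are assumptions. Stage one uses plain lexicase selection with an exact per-case minimum; regression implementations usually relax this with an ε threshold, which is omitted for brevity.

```python
import random


def lexicase_select(errors, rng):
    """Plain lexicase selection: return the index of one individual.

    errors[i][j] is the error of individual i on training case j
    (lower is better).
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # consider the training cases in random order
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # keep only the individuals that are best on this case
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def double_lexicase_select(errors, sizes, pool_size, rng):
    """Two-stage selection (illustrative assumption, see lead-in).

    Stage 1 (fitness): build a pool via repeated lexicase selection.
    Stage 2 (size): return the smallest tree in that pool,
    breaking ties at random.
    """
    pool = [lexicase_select(errors, rng) for _ in range(pool_size)]
    smallest = min(sizes[i] for i in pool)
    return rng.choice([i for i in pool if sizes[i] == smallest])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical per-case errors and tree sizes for four individuals.
    errors = [[0.1, 0.9], [0.1, 0.9], [0.8, 0.2], [1.0, 1.0]]
    sizes = [25, 7, 12, 3]
    parent = double_lexicase_select(errors, sizes, pool_size=5, rng=rng)
    print("selected parent index:", parent)
```

Because size is only compared among individuals that already survived the fitness stage, a tiny but inaccurate tree (index 3 in the usage example) cannot be selected, which is one way to read the claim that bloat is controlled without damaging search effectiveness.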

    Published in

      GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
      July 2023
      1667 pages
      ISBN: 9798400701191
      DOI: 10.1145/3583131

      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
