research-article

A Double Lexicase Selection Operator for Bloat Control in Evolutionary Feature Construction for Regression

Authors:
Hengzhe Zhang, Victoria University of Wellington, Wellington, New Zealand, https://orcid.org/0000-0002-2254-8304
Qi Chen, Victoria University of Wellington, Wellington, New Zealand, https://orcid.org/0000-0001-9367-4757
Bing Xue, Victoria University of Wellington, Wellington, New Zealand, https://orcid.org/0000-0002-4865-8026
Wolfgang Banzhaf, Michigan State University, East Lansing, United States of America, https://orcid.org/0000-0002-6382-3245
Mengjie Zhang, Victoria University of Wellington, Wellington, New Zealand, https://orcid.org/0000-0003-4463-9538

GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference, July 2023, Pages 1194–1202, https://doi.org/10.1145/3583131.3590365

Published: 12 July 2023

ABSTRACT

Evolutionary feature construction is an important machine learning technique for enhancing learning performance. However, traditional genetic programming-based feature construction methods often suffer from bloat: the constructed features grow excessively in size without any improvement in performance. To address this issue, this paper proposes a double-stage lexicase selection operator that controls bloat without harming search effectiveness. The operator uses a two-stage selection process, where the first stage selects individuals based on fitness values and the second stage selects individuals based on tree sizes. The proposed operator can therefore control bloat while retaining the advantages of lexicase selection. Experimental results on 98 regression datasets show that, compared to the traditional bloat control method of imposing a depth limit, the proposed selection operator significantly reduces the sizes of constructed features on all datasets while maintaining a similar level of predictive performance. A comparative experiment with seven bloat control methods shows that the double lexicase selection operator achieves the best trade-off between model performance and model size.
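The two-stage process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `pool_size` parameter, and the rule used in the second stage (keep the smallest tree in the pool, breaking ties at random) are all assumptions made for illustration. The abstract states only that the first stage selects on fitness values and the second on tree sizes.

```python
import random
from typing import List, Sequence


def lexicase_select(errors: Sequence[Sequence[float]], rng: random.Random) -> int:
    """Standard lexicase selection; returns the index of one individual.

    errors[i][c] is the error of individual i on training case c.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)  # training cases are considered in random order
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        # keep only the individuals that are elite on this case
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def double_lexicase_select(
    errors: Sequence[Sequence[float]],
    sizes: Sequence[int],
    pool_size: int,
    rng: random.Random,
) -> int:
    """Hypothetical two-stage selection.

    Stage 1: build a pool of parents with lexicase selection (fitness only).
    Stage 2: choose from that pool by tree size (assumed rule: smallest wins).
    """
    pool: List[int] = [lexicase_select(errors, rng) for _ in range(pool_size)]
    smallest = min(sizes[i] for i in pool)
    return rng.choice([i for i in pool if sizes[i] == smallest])


if __name__ == "__main__":
    rng = random.Random(42)
    # three individuals, two training cases; individual 2 is never elite
    errors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    sizes = [10, 3, 1]
    print(double_lexicase_select(errors, sizes, pool_size=4, rng=rng))
```

In this sketch, size only matters among individuals that already survived the fitness-based stage, so a small but inaccurate tree (individual 2 in the demo) cannot be selected. A larger `pool_size` applies stronger pressure toward small trees.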

Supplemental Material

p1194-zhang-suppl.pdf (318.3 KB)

Index Terms

Computing methodologies → Machine learning → Machine learning approaches → Bio-inspired approaches → Genetic programming

Published in
GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference
July 2023
1667 pages
ISBN: 9798400701191
DOI: 10.1145/3583131
Chair: Sara Silva
Program Chair: Luís Paquete
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evolutionary feature construction
genetic programming
bloat control
Acceptance Rates
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%