Abstract
Evolutionary feature construction has been applied successfully in a variety of scenarios, and multi-tree genetic programming-based feature construction methods in particular have shown promising results. However, existing crossover operators for multi-tree genetic programming mainly exchange genetic material between two trees and neglect the interaction among the multiple trees within an individual. To increase search effectiveness, we take inspiration from the geometric semantic crossover operator used in single-tree genetic programming and propose a geometric semantic macro-crossover operator for multi-tree genetic programming. The operator is designed for feature construction, with the goal of generating offspring that contain informative and complementary features. Experiments on 98 regression datasets show that the proposed geometric semantic macro-crossover operator significantly improves the predictive performance of the constructed features. Moreover, experiments on a state-of-the-art regression benchmark demonstrate that multi-tree genetic programming with the geometric semantic macro-crossover operator can significantly outperform all 22 machine learning algorithms on the benchmark.
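For readers unfamiliar with the single-tree operator that the abstract cites as inspiration, the following is a minimal sketch of classic geometric semantic crossover applied directly to semantics vectors (the outputs of each program on the training samples). It is not the macro-crossover operator proposed in the article; the function name, the toy vectors, and the use of a single scalar mixing weight are illustrative assumptions.

```python
import numpy as np


def geometric_semantic_crossover(sem_a, sem_b, rng):
    """Illustrative single-tree geometric semantic crossover (assumed form).

    The offspring's outputs are a convex combination of the parents' outputs,
    so they lie on the segment between the two parents in semantic space.
    """
    r = rng.uniform(0.0, 1.0)  # one random mixing weight in [0, 1)
    return r * sem_a + (1.0 - r) * sem_b


def distance(u, v):
    """Euclidean distance between two semantics vectors."""
    return float(np.linalg.norm(u - v))


# Toy data (hypothetical): target outputs and the outputs of two parent programs.
rng = np.random.default_rng(0)
target = np.array([1.0, 2.0, 3.0, 4.0])
parent_a = np.array([0.0, 2.5, 2.0, 5.0])
parent_b = np.array([2.0, 1.0, 4.5, 3.0])

child = geometric_semantic_crossover(parent_a, parent_b, rng)

# Because the child is a convex combination of the parents, it can never be
# farther from the target than the worse of the two parents.
worst_parent = max(distance(parent_a, target), distance(parent_b, target))
print(distance(child, target) <= worst_parent)
```

The guarantee illustrated here (the child is no worse than the worse parent) is the property that makes geometric semantic operators attractive; how the article extends the idea to several trees per individual is described in the full text, not in this sketch.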
Notes
Source code: https://tinyurl.com/MAPMX-GPFC
Acknowledgements
The authors would like to acknowledge the assistance of the volunteer evaluators and the helpful comments of the reviewers, which have significantly improved the paper.
Funding
This work was supported in part by the Marsden Fund of the New Zealand Government under contracts VUW1913, VUW1914, and VUW2016, the MBIE Data Science SSIF Fund under contract RTVU1914, Huayin Medical under grant E3791/4165, and the MBIE Endeavor Research Programme under contracts C11X2001 and UOCX2104.
Author information
Contributions
Hengzhe Zhang, Qi Chen, and Mengjie Zhang designed the algorithm and experimental protocol. Hengzhe Zhang implemented the code and conducted the experiments. All authors analyzed the results. Hengzhe Zhang drafted the paper, and all authors edited the manuscript.
Ethics declarations
Conflict of interest
The authors are not aware of any competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, H., Chen, Q., Xue, B. et al. A geometric semantic macro-crossover operator for evolutionary feature construction in regression. Genet Program Evolvable Mach 25, 2 (2024). https://doi.org/10.1007/s10710-023-09465-z
DOI: https://doi.org/10.1007/s10710-023-09465-z