Abstract
An approach to evolutionary ensemble learning for classification is proposed using genetic programming in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion and a residual dataset, i.e. the training records that thus far were not correctly classified. The next program is only trained against the residual, with the process iterating until some maximum ensemble size or no further residual remains. Training against a residual dataset actively reduces the cost of training. Deploying the ensemble as a stack also means that only one classifier might be necessary to make a prediction, so improving interpretability. Benchmarking studies are conducted to illustrate competitiveness with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.
Supported by 2Keys Corporation - An Interac Company.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Classification tasks often assume the majority vote, although voting/weighting schemes might be evolved [3].
- 2.
- 3.
Laptop with Intel i7 10700k CPU, 4.3 GHz single core.
- 4.
2SEGP parameterization: pop. size 500, ensemble size 50, max. tree size 500.
References
Agapitos, A., Loughran, R., Nicolau, M., Lucas, S.M., O’Neill, M., Brabazon, A.: A survey of statistical machine learning elements in genetic programming. IEEE Trans. Evol. Comput. 23(6), 1029–1048 (2019)
Badran, K.M.S., Rockett, P.I.: Multi-class pattern classification using single, multi-dimensional feature-space feature extraction evolved by multi-objective genetic programming and its application to network intrusion detection. Genet. Program Evolvable Mach. 13(1), 33–63 (2012)
Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genet. Program Evolvable Mach. 2(4), 381–407 (2001)
Cava, W.G.L., Silva, S., Danai, K., Spector, L., Vanneschi, L., Moore, J.H.: Multidimensional genetic programming for multiclass classification. Swarm Evol. Comput. 44, 260–272 (2019)
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM (2016)
Curry, R., Lichodzijewski, P., Heywood, M.I.: Scaling genetic programming to large datasets using hierarchical dynamic subset selection. IEEE Trans. Syst. Man, Cybern. - Part B 37(4), 1065–1073 (2007)
Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40(2), 139–157 (2000)
Fahlman, S.E., Lebiere, C.: The cascade-correlation learning architecture. In: Advances in Neural Information Processing Systems, vol. 2, pp. 524–532. Morgan Kaufmann (1989)
Folino, G., Pizzuti, C., Spezzano, G.: Training distributed GP ensemble with a selective algorithm based on clustering and pruning for pattern classification. IEEE Trans. Evol. Comput. 12(4), 458–468 (2008)
García, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. 45, 100–123 (2014)
Gathercole, C., Ross, P.: Dynamic training subset selection for supervised learning in genetic programming. In: Davidor, Y., Schwefel, H.-P., Männer, R. (eds.) PPSN 1994. LNCS, vol. 866, pp. 312–321. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58484-6_275
Iba, H.: Bagging, boosting, and bloating in genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1053–1060. Morgan Kaufmann (1999)
Imamura, K., Soule, T., Heckendorn, R.B., Foster, J.A.: Behavioral diversity and a probabilistically optimal GP ensemble. Genet. Program Evolvable Mach. 4(3), 235–253 (2003)
Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 363–370. ACM (2008)
Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 853–860. ACM (2010)
McIntyre, A.R., Heywood, M.I.: Classification as clustering: a pareto cooperative-competitive GP approach. Evol. Comput. 19(1), 137–166 (2011)
Muni, D.P., Pal, N.R., Das, J.: A novel approach to design classifiers using genetic programming. IEEE Trans. Evol. Comput. 8(2), 183–196 (2004)
Muñoz, L., Silva, S., Trujillo, L.: M3GP – multiclass classification with GP. In: Machado, P., et al. (eds.) EuroGP 2015. LNCS, vol. 9025, pp. 78–91. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16501-1_7
Potter, M.A., Jong, K.A.D.: Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Burlington (1993)
Rodrigues, N.M., Batista, J.E., Silva, S.: Ensemble genetic programming. In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds.) EuroGP 2020. LNCS, vol. 12101, pp. 151–166. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-44094-7_10
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)
Sipper, M., Moore, J.H.: Symbolic-regression boosting. CoRR abs/2206.12082 (2022)
Song, D., Heywood, M.I., Zincir-Heywood, A.N.: Training genetic programming on half a million patterns: an example from anomaly detection. IEEE Trans. Evol. Comput. 9(3), 225–239 (2005)
Soule, T.: Voting teams: a cooperative approach to non-typical problems using genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 916–922. Morgan Kaufmann (1999)
Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1708–1715. ACM (2007)
Virgolin, M.: Genetic programming is naturally suited to evolve bagging ensembles. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 830–839. ACM (2021)
Wang, S., Mei, Y., Zhang, M.: Novel ensemble genetic programming hyper-heuristics for uncertain capacitated arc routing problem. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1093–1101. ACM (2019)
Wu, S.X., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1403–1410. ACM (2011)
Acknowledgements
This research was enabled by the support of the Natural Science and Engineering Research Council (NSERC) of Canada Alliance Grant.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, Z. et al. (2023). A Boosting Approach to Constructing an Ensemble Stack. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds) Genetic Programming. EuroGP 2023. Lecture Notes in Computer Science, vol 13986. Springer, Cham. https://doi.org/10.1007/978-3-031-29573-7_9
Download citation
DOI: https://doi.org/10.1007/978-3-031-29573-7_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29572-0
Online ISBN: 978-3-031-29573-7
eBook Packages: Computer ScienceComputer Science (R0)