A Boosting Approach to Constructing an Ensemble Stack

  • Conference paper
Genetic Programming (EuroGP 2023)

Abstract

An approach to evolutionary ensemble learning for classification is proposed using genetic programming, in which boosting is used to construct a stack of programs. Each application of boosting identifies a single champion program and a residual dataset, i.e. the training records that have not yet been correctly classified. The next program is trained only against this residual, and the process iterates until a maximum ensemble size is reached or no residual remains. Training against a residual dataset reduces the cost of training. Deploying the ensemble as a stack also means that a prediction may require only one classifier, which improves interpretability. Benchmarking studies illustrate that the approach is competitive with the prediction accuracy of current state-of-the-art evolutionary ensemble learning algorithms, while providing solutions that are orders of magnitude simpler. Further benchmarking with a high-cardinality dataset indicates that the proposed method is also more accurate and efficient than XGBoost.
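The Python sketch below illustrates the residual-driven training loop and stack-style deployment described in the abstract. It is not the authors' genetic programming implementation. A single-feature threshold rule (`Stump`) stands in for each evolved champion. The rule deciding when a stage commits to a prediction or defers to the next stage (commit only on a side of the split that was label-pure in training) is an assumption made for illustration, as is the `fallback` label. All names (`fit_stump`, `build_stack`, `predict_stack`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple

Record = Tuple[Sequence[float], int]  # (feature vector, class label)


def majority(labels: Iterable[int]) -> int:
    """Most frequent label (first-seen label wins ties)."""
    return Counter(labels).most_common(1)[0][0]


@dataclass
class Stump:
    """Hypothetical stand-in for one evolved GP champion: a one-feature threshold rule.

    Assumption: a side of the split 'commits' to its label only if every training
    record on that side had that label; otherwise the stage defers (returns None).
    """
    feature: int
    threshold: float
    labels: Tuple[int, int]    # labels for the x[feature] <= threshold side and the > side
    commits: Tuple[bool, bool]

    def predict(self, x: Sequence[float]) -> Optional[int]:
        side = 0 if x[self.feature] <= self.threshold else 1
        return self.labels[side] if self.commits[side] else None


def fit_stump(records: List[Record]) -> Stump:
    """Exhaustive search for the split that correctly commits to the most records."""
    best, best_score = None, -1
    for f in range(len(records[0][0])):
        for t in sorted({x[f] for x, _ in records}):
            sides = ([y for x, y in records if x[f] <= t],
                     [y for x, y in records if x[f] > t])
            labels = tuple(majority(s) if s else 0 for s in sides)
            commits = tuple(bool(s) and len(set(s)) == 1 for s in sides)
            score = sum(len(s) for s, c in zip(sides, commits) if c)
            if score > best_score:
                best, best_score = Stump(f, t, labels, commits), score
    return best


def build_stack(records: List[Record], max_size: int):
    """Boosting loop: each new stage is trained only on the current residual."""
    stack: List[Stump] = []
    residual = list(records)
    records_seen = 0  # total training records presented across all stages
    while residual and len(stack) < max_size:
        records_seen += len(residual)
        champion = fit_stump(residual)
        # Residual = records this stage did not correctly classify (here: the deferred ones).
        next_residual = [(x, y) for x, y in residual if champion.predict(x) != y]
        if len(next_residual) == len(residual):
            break  # assumption: stop if a stage makes no progress
        stack.append(champion)
        residual = next_residual
    # Assumption: default label for inputs that every stage defers on.
    fallback = majority(y for _, y in (residual or records))
    return stack, fallback, records_seen


def predict_stack(stack: List[Stump], fallback: int, x: Sequence[float]) -> int:
    """Evaluate stages in order; the first stage that commits gives the prediction."""
    for stage in stack:
        label = stage.predict(x)
        if label is not None:
            return label  # later stages are never evaluated for this input
    return fallback


data = [([1], 0), ([2], 0), ([3], 1), ([4], 1), ([5], 0), ([6], 0)]
stack, fallback, seen = build_stack(data, max_size=50)
print(len(stack), seen)  # 2 10
print([predict_stack(stack, fallback, x) for x, _ in data])  # [0, 0, 1, 1, 0, 0]
```

On this toy dataset two stages suffice. The second stage is trained on only the four records the first stage deferred, so 10 rather than 12 training records are processed in total, which illustrates how training against the residual reduces cost. An input with feature value 1.5 is answered by the first stage alone, without evaluating the rest of the stack.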

Supported by 2Keys Corporation - An Interac Company.

Notes

  1. Classification tasks often assume a majority vote, although voting/weighting schemes might also be evolved [3].

  2. https://archive-beta.ics.uci.edu.

  3. Laptop with Intel i7-10700K CPU, 4.3 GHz single core.

  4. 2SEGP parameterization: pop. size 500, ensemble size 50, max. tree size 500.

References

  1. Agapitos, A., Loughran, R., Nicolau, M., Lucas, S.M., O’Neill, M., Brabazon, A.: A survey of statistical machine learning elements in genetic programming. IEEE Trans. Evol. Comput. 23(6), 1029–1048 (2019)

  2. Badran, K.M.S., Rockett, P.I.: Multi-class pattern classification using single, multi-dimensional feature-space feature extraction evolved by multi-objective genetic programming and its application to network intrusion detection. Genet. Program Evolvable Mach. 13(1), 33–63 (2012)

  3. Brameier, M., Banzhaf, W.: Evolving teams of predictors with linear genetic programming. Genet. Program Evolvable Mach. 2(4), 381–407 (2001)

  4. La Cava, W.G., Silva, S., Danai, K., Spector, L., Vanneschi, L., Moore, J.H.: Multidimensional genetic programming for multiclass classification. Swarm Evol. Comput. 44, 260–272 (2019)

  5. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. ACM (2016)

  6. Curry, R., Lichodzijewski, P., Heywood, M.I.: Scaling genetic programming to large datasets using hierarchical dynamic subset selection. IEEE Trans. Syst. Man, Cybern. - Part B 37(4), 1065–1073 (2007)

  7. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40(2), 139–157 (2000)

  8. Fahlman, S.E., Lebiere, C.: The cascade-correlation learning architecture. In: Advances in Neural Information Processing Systems, vol. 2, pp. 524–532. Morgan Kaufmann (1989)

  9. Folino, G., Pizzuti, C., Spezzano, G.: Training distributed GP ensemble with a selective algorithm based on clustering and pruning for pattern classification. IEEE Trans. Evol. Comput. 12(4), 458–468 (2008)

  10. García, S., Grill, M., Stiborek, J., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. 45, 100–123 (2014)

  11. Gathercole, C., Ross, P.: Dynamic training subset selection for supervised learning in genetic programming. In: Davidor, Y., Schwefel, H.-P., Männer, R. (eds.) PPSN 1994. LNCS, vol. 866, pp. 312–321. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-58484-6_275

  12. Iba, H.: Bagging, boosting, and bloating in genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1053–1060. Morgan Kaufmann (1999)

  13. Imamura, K., Soule, T., Heckendorn, R.B., Foster, J.A.: Behavioral diversity and a probabilistically optimal GP ensemble. Genet. Program Evolvable Mach. 4(3), 235–253 (2003)

  14. Lichodzijewski, P., Heywood, M.I.: Managing team-based problem solving with symbiotic bid-based genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 363–370. ACM (2008)

  15. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 853–860. ACM (2010)

  16. McIntyre, A.R., Heywood, M.I.: Classification as clustering: a pareto cooperative-competitive GP approach. Evol. Comput. 19(1), 137–166 (2011)

  17. Muni, D.P., Pal, N.R., Das, J.: A novel approach to design classifiers using genetic programming. IEEE Trans. Evol. Comput. 8(2), 183–196 (2004)

  18. Muñoz, L., Silva, S., Trujillo, L.: M3GP – multiclass classification with GP. In: Machado, P., et al. (eds.) EuroGP 2015. LNCS, vol. 9025, pp. 78–91. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16501-1_7

  19. Potter, M.A., De Jong, K.A.: Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000)

  20. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Burlington (1993)

  21. Rodrigues, N.M., Batista, J.E., Silva, S.: Ensemble genetic programming. In: Hu, T., Lourenço, N., Medvet, E., Divina, F. (eds.) EuroGP 2020. LNCS, vol. 12101, pp. 151–166. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-44094-7_10

  22. Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)

  23. Sipper, M., Moore, J.H.: Symbolic-regression boosting. CoRR abs/2206.12082 (2022)

  24. Song, D., Heywood, M.I., Zincir-Heywood, A.N.: Training genetic programming on half a million patterns: an example from anomaly detection. IEEE Trans. Evol. Comput. 9(3), 225–239 (2005)

  25. Soule, T.: Voting teams: a cooperative approach to non-typical problems using genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 916–922. Morgan Kaufmann (1999)

  26. Thomason, R., Soule, T.: Novel ways of improving cooperation and performance in ensemble classifiers. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1708–1715. ACM (2007)

  27. Virgolin, M.: Genetic programming is naturally suited to evolve bagging ensembles. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 830–839. ACM (2021)

  28. Wang, S., Mei, Y., Zhang, M.: Novel ensemble genetic programming hyper-heuristics for uncertain capacitated arc routing problem. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1093–1101. ACM (2019)

  29. Wu, S.X., Banzhaf, W.: Rethinking multilevel selection in genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1403–1410. ACM (2011)

Acknowledgements

This research was enabled by the support of an Alliance Grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Malcolm I. Heywood.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, Z. et al. (2023). A Boosting Approach to Constructing an Ensemble Stack. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds) Genetic Programming. EuroGP 2023. Lecture Notes in Computer Science, vol 13986. Springer, Cham. https://doi.org/10.1007/978-3-031-29573-7_9

  • DOI: https://doi.org/10.1007/978-3-031-29573-7_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29572-0

  • Online ISBN: 978-3-031-29573-7

  • eBook Packages: Computer Science, Computer Science (R0)
