
Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution

  • Conference paper
  • Published in: Applications of Evolutionary Computation (EvoApplications 2020)

Abstract

The deployment of Machine Learning (ML) models is a difficult and time-consuming job that comprises a series of sequential and correlated tasks, ranging from data pre-processing and the design and extraction of features to the choice of the ML algorithm and its parameterisation. The task is even more challenging considering that the design of features is in many cases problem-specific, and thus requires domain expertise. To overcome these limitations, Automated Machine Learning (AutoML) methods seek to automate, with little or no human intervention, the design of pipelines, i.e., the selection of the sequence of methods that have to be applied to the raw data. These methods have the potential to enable non-expert users to use ML, and to provide expert users with solutions that they would be unlikely to consider. In particular, this paper describes AutoML-DSGE, a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines. The experimental results compare AutoML-DSGE with another grammar-based AutoML framework, Resilient Classification Pipeline Evolution (RECIPE), and show that the average performance of the classification pipelines generated by AutoML-DSGE is always superior to that of RECIPE; the differences are statistically significant in 3 out of the 10 datasets used.
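The core idea of the abstract, mapping an integer genotype to a pipeline through a grammar, can be sketched with a minimal, standard-library-only example. In DSGE each non-terminal owns its own list of genes, and the mapping consumes one gene per expansion of that non-terminal. The grammar, production choices, and method names below are illustrative assumptions, not the actual AutoML-DSGE grammar:

```python
# Minimal sketch of a DSGE-style genotype-to-pipeline mapping.
# The grammar and terminal names below are hypothetical placeholders,
# not the grammar used by AutoML-DSGE.

GRAMMAR = {
    "<pipeline>": [["<preprocess>", "<classifier>"], ["<classifier>"]],
    "<preprocess>": [["StandardScaler"], ["MinMaxScaler"], ["PCA"]],
    "<classifier>": [["RandomForest"], ["SVM"], ["GaussianNB"]],
}


def map_genotype(genotype, grammar, start="<pipeline>"):
    """Expand `start` into a list of terminals, consuming one gene per
    expansion of each non-terminal (the DSGE mapping scheme)."""
    counters = {nt: 0 for nt in grammar}  # next gene to read, per non-terminal

    def expand(symbol):
        if symbol not in grammar:              # terminal: emit as-is
            return [symbol]
        options = grammar[symbol]
        gene = genotype[symbol][counters[symbol]]
        counters[symbol] += 1
        chosen = options[gene % len(options)]  # mod keeps the choice valid
        out = []
        for s in chosen:
            out.extend(expand(s))
        return out

    return expand(start)


# Gene 0 for <pipeline> selects "preprocess then classify";
# gene 2 picks PCA, gene 1 picks SVM.
genotype = {"<pipeline>": [0], "<preprocess>": [2], "<classifier>": [1]}
print(map_genotype(genotype, GRAMMAR))  # ['PCA', 'SVM']
```

In the actual framework the terminals would correspond to Scikit-Learn primitives and their hyperparameters, and the resulting sequence would be assembled into a `sklearn.pipeline.Pipeline` and evaluated on the training data to obtain the individual's fitness.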



Acknowledgments

This work is partially funded by Fundação para a Ciência e Tecnologia (FCT), Portugal, under PhD grant SFRH/BD/114865/2016 and project grant DSAIPA/DS/0022/2018 (GADgET), and is based upon work from COST Action CA15140 (ImAppNIO), supported by COST (European Cooperation in Science and Technology): www.cost.eu. We also thank the NVIDIA Corporation for the hardware granted to this research.

Author information

Correspondence to Filipe Assunção.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Assunção, F., Lourenço, N., Ribeiro, B., Machado, P. (2020). Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution. In: Castillo, P.A., Jiménez Laredo, J.L., Fernández de Vega, F. (eds) Applications of Evolutionary Computation. EvoApplications 2020. Lecture Notes in Computer Science, vol 12104. Springer, Cham. https://doi.org/10.1007/978-3-030-43722-0_34


  • DOI: https://doi.org/10.1007/978-3-030-43722-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43721-3

  • Online ISBN: 978-3-030-43722-0

