
Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution

  • Conference paper
  • Published in: Applications of Evolutionary Computation (EvoApplications 2020)

Abstract

The deployment of Machine Learning (ML) models is a difficult and time-consuming job that comprises a series of sequential and correlated tasks, ranging from data pre-processing and the design and extraction of features to the choice of the ML algorithm and its parameterisation. The task is even more challenging considering that the design of features is in many cases problem-specific, and thus requires domain expertise. To overcome these limitations, Automated Machine Learning (AutoML) methods seek to automate, with little or no human intervention, the design of pipelines, i.e., the selection of the sequence of methods that have to be applied to the raw data. These methods have the potential to enable non-expert users to use ML, and to provide expert users with solutions that they would be unlikely to consider. In particular, this paper describes AutoML-DSGE, a novel grammar-based framework that adapts Dynamic Structured Grammatical Evolution (DSGE) to the evolution of Scikit-Learn classification pipelines. The experimental results compare AutoML-DSGE with another grammar-based AutoML framework, Resilient Classification Pipeline Evolution (RECIPE), and show that the average performance of the classification pipelines generated by AutoML-DSGE is always superior to that of RECIPE; the differences are statistically significant in 3 out of the 10 datasets used.
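The core idea of the abstract, mapping an integer genotype to a pipeline through a grammar, can be sketched with a minimal, standard-library-only example. In DSGE each non-terminal owns its own list of genes, and the mapping consumes one gene per expansion of that non-terminal. The grammar, production choices, and method names below are illustrative assumptions, not the actual AutoML-DSGE grammar:

```python
# Minimal sketch of a DSGE-style genotype-to-pipeline mapping.
# The grammar and terminal names below are hypothetical placeholders,
# not the grammar used by AutoML-DSGE.

GRAMMAR = {
    "<pipeline>": [["<preprocess>", "<classifier>"], ["<classifier>"]],
    "<preprocess>": [["StandardScaler"], ["MinMaxScaler"], ["PCA"]],
    "<classifier>": [["RandomForest"], ["SVM"], ["GaussianNB"]],
}


def map_genotype(genotype, grammar, start="<pipeline>"):
    """Expand `start` into a list of terminals, consuming one gene per
    expansion of each non-terminal (the DSGE mapping scheme)."""
    counters = {nt: 0 for nt in grammar}  # next gene to read, per non-terminal

    def expand(symbol):
        if symbol not in grammar:              # terminal: emit as-is
            return [symbol]
        options = grammar[symbol]
        gene = genotype[symbol][counters[symbol]]
        counters[symbol] += 1
        chosen = options[gene % len(options)]  # mod keeps the choice valid
        out = []
        for s in chosen:
            out.extend(expand(s))
        return out

    return expand(start)


# Gene 0 for <pipeline> selects "preprocess then classify";
# gene 2 picks PCA, gene 1 picks SVM.
genotype = {"<pipeline>": [0], "<preprocess>": [2], "<classifier>": [1]}
print(map_genotype(genotype, GRAMMAR))  # ['PCA', 'SVM']
```

In the actual framework the terminals would correspond to Scikit-Learn primitives and their hyperparameters, and the resulting sequence would be assembled into a `sklearn.pipeline.Pipeline` and evaluated on the training data to obtain the individual's fitness.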



Acknowledgments

This work is partially funded by Fundação para a Ciência e Tecnologia (FCT), Portugal, under PhD grant SFRH/BD/114865/2016 and project grant DSAIPA/DS/0022/2018 (GADgET), and is based upon work from COST Action CA15140 (ImAppNIO), supported by COST (European Cooperation in Science and Technology): www.cost.eu. We also thank the NVIDIA Corporation for the hardware granted to this research.

Author information

Correspondence to Filipe Assunção.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Assunção, F., Lourenço, N., Ribeiro, B., Machado, P. (2020). Evolution of Scikit-Learn Pipelines with Dynamic Structured Grammatical Evolution. In: Castillo, P.A., Jiménez Laredo, J.L., Fernández de Vega, F. (eds) Applications of Evolutionary Computation. EvoApplications 2020. Lecture Notes in Computer Science, vol 12104. Springer, Cham. https://doi.org/10.1007/978-3-030-43722-0_34


  • DOI: https://doi.org/10.1007/978-3-030-43722-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43721-3

  • Online ISBN: 978-3-030-43722-0

