Grammar-Based Evolutionary Approach for Automatic Workflow Composition with Open Preprocessing Sequence

Barbudo, Rafael; Ventura, Sebastián; Romero, José Raúl

doi:10.1007/978-3-030-96302-6_61



Conference paper
First Online: 22 February 2022


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 417)

Abstract

Knowledge discovery is a complex process involving several phases. Some of them are repetitive and time-consuming, which makes them good candidates for automation. For example, the large number of machine learning algorithms, together with their hyper-parameters, constitutes a vast search space to explore. In this vein, the term AutoML was coined to encompass the approaches that automate such phases. Automatic workflow composition is an AutoML task that involves both the selection and the hyper-parameter optimisation of the algorithms addressing different phases, thus providing more comprehensive assistance during the knowledge discovery process. Unlike other proposals that predetermine the structure of the preprocessing sequence, and in some cases the size of the workflow, our proposal generates workflows made up of an arbitrary number of preprocessing algorithms of any type and a classifier. Because this avoids oversimplifying the solution space, it allows more accurate workflows to be returned. The optimisation is conducted by a grammar-guided genetic programming algorithm. The proposal has been validated and compared against TPOT and RECIPE, and it generates workflows with greater predictive performance.
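The abstract does not give the grammar itself, so the following is only a minimal sketch of the idea of an open preprocessing sequence, assuming a hypothetical context-free grammar. Its right-recursive `<preprocessing>` rule lets a random derivation chain any number of preprocessors of any type (repeats included) before one classifier. The rule names, the algorithm names (plain strings such as `PCA` or `SMOTE`), the hyper-parameter ranges and the `max_depth` limit are all illustrative assumptions rather than the authors' grammar, and the fitness evaluation and genetic operators are omitted.

```python
import random

# Hypothetical grammar (illustrative only, NOT the grammar used in the paper).
# Each non-terminal maps to its alternative productions (lists of symbols).
GRAMMAR = {
    "<workflow>": [["<classifier>"], ["<preprocessing>", "<classifier>"]],
    # Right recursion: one or more preprocessors of any type, in any order.
    "<preprocessing>": [["<preprocessor>"], ["<preprocessor>", "<preprocessing>"]],
    "<preprocessor>": [["StandardScaler"], ["MinMaxScaler"], ["PCA"],
                       ["SelectKBest"], ["SMOTE"]],
    "<classifier>": [["DecisionTree"], ["KNN"], ["LogisticRegression"]],
}

# Hypothetical hyper-parameter samplers for some terminals (assumed ranges).
HYPERPARAMS = {
    "PCA": lambda rng: {"n_components": round(rng.uniform(0.5, 0.99), 2)},
    "SelectKBest": lambda rng: {"k": rng.randint(1, 20)},
    "DecisionTree": lambda rng: {"max_depth": rng.randint(2, 10)},
    "KNN": lambda rng: {"n_neighbors": rng.randint(1, 15)},
}


def derive(symbol, rng, max_depth, depth=0):
    """Randomly expand `symbol` into a list of terminal symbols."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: an algorithm name
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # At the depth limit, drop self-recursive alternatives so the
        # derivation terminates (every rule here has a non-recursive one).
        options = [p for p in options if symbol not in p]
    production = rng.choice(options)
    terminals = []
    for child in production:
        terminals.extend(derive(child, rng, max_depth, depth + 1))
    return terminals


def random_workflow(rng, max_depth=6):
    """Return one workflow as a list of (algorithm, hyper-parameters) steps."""
    steps = []
    for name in derive("<workflow>", rng, max_depth):
        sampler = HYPERPARAMS.get(name)
        steps.append((name, sampler(rng) if sampler else {}))
    return steps


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        workflow = random_workflow(rng)
        print(" -> ".join(step for step, _params in workflow))
```

Because `<preprocessing>` is recursive, the same grammar yields workflows with no preprocessor, one, or several, while the depth limit keeps every derivation finite (at most `max_depth` preprocessors here). In grammar-guided genetic programming, crossover and mutation typically act on such derivation trees so that offspring remain valid sentences of the grammar.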

Supported by the University of Cordoba and FEDER funds, project UCO-FEDER 18 REF.1263116 MOD, and by the Ministry of Science and Innovation, project PID2020-115832GB-I00, FPU17/00799.


Notes

1. WEKA: https://cs.waikato.ac.nz/ml/weka/ (last access: 30/09/2021).
2. scikit-learn: https://scikit-learn.org/ (last access: 30/09/2021).
3. imbalanced-learn: https://imbalanced-learn.org/ (last access: 30/09/2021).

References

Bilalli, B., Abelló, A., Aluja-Banet, T., Wrembel, R.: Automated data pre-processing via meta-learning. In: International Conference on Model and Data Engineering, pp. 194–208 (2016)
Díaz-Pacheco, A., Reyes-García, C.A.: Full model selection in huge datasets and for proxy models construction. In: Batyrshin, I., Martínez-Villaseñor, M.L., Ponce Espinosa, H.E. (eds.) MICAI 2018. LNCS (LNAI), vol. 11288, pp. 171–182. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04491-6_13
Elkholy, A., Yang, F., Gustafson, S.: Interpretable automated machine learning in maana™ knowledge platform. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1937–1939 (2019)
Estévez-Velarde, S., Gutiérrez, Y., Almeida-Cruz, Y., Montoyo, A.: General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution. Inf. Sci. 543, 58–71 (2020)
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: an overview. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 1–34. American Association for Artificial Intelligence, Menlo Park, CA, USA (1996)
Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Advances in Neural Information Processing Systems, pp. 2962–2970 (2015)
Gijsbers, P., Vanschoren, J., Olson, R.S.: Layered TPOT: speeding up tree-based pipeline optimization. In: 2017 International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms, pp. 49–68 (2017)
Hutter, F., Kotthoff, L., Vanschoren, J.: Automated Machine Learning: Methods, Systems, Challenges. Springer Nature, Heidelberg (2019)
McKay, R.I., Hoai, N.X., Whigham, P.A., Shan, Y., O’Neill, M.: Grammar-based genetic programming: a survey. Genet. Program Evolvable Mach. 11(3–4), 365–396 (2010). https://doi.org/10.1007/s10710-010-9109-y
Mohr, F., Wever, M., Hüllermeier, E.: ML-Plan: automated machine learning via hierarchical planning. Mach. Learn. 107(8–10), 1495–1515 (2018)
Olson, R.S., Bartley, N., Urbanowicz, R.J., Moore, J.H.: Evaluation of a tree-based pipeline optimization tool for automating data science. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 485–492 (2016)
Rice, J.R.: The algorithm selection problem. In: Advances in Computers, vol. 15, pp. 65–118. Elsevier (1976)
De Sá, A.G.C., Pinto, W.J.G.S., Oliveira, L.O.V.B., Pappa, G.L.: RECIPE: a grammar-based framework for automatically evolving classification pipelines. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds.) EuroGP 2017. LNCS, vol. 10196, pp. 246–261. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55696-3_16
Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855 (2013)
Yang, L., Shami, A.: On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415, 295–316 (2020)


Author information

Authors and Affiliations

Department of Computer Science and Numerical Analysis, University of Córdoba, 14071, Córdoba, Spain
Rafael Barbudo, Sebastián Ventura & José Raúl Romero
Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), Córdoba, Spain
Rafael Barbudo, Sebastián Ventura & José Raúl Romero


Corresponding author

Correspondence to Rafael Barbudo.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Department of Industrial Engineering and Computer Science, Stellenbosch University, Matieland, South Africa
Andries Engelbrecht
Department of Computer Science, Università degli Studi di Milano, Milan, Italy
Fabio Scotti
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
University of Mumbai, Mumbai, Maharashtra, India
Pooja Manghirmalani Mishra
University of Calabria (Unical), Rende, Italy
Giancarlo Fortino
Department of Informatics, Vilnius University, Kaunas, Lithuania
Virgilijus Sakalauskas
Center for Smart Computing Continuum, Forschung Burgenland, Eisenstadt, Austria
Sabri Pllana


About this paper

Cite this paper

Barbudo, R., Ventura, S., Romero, J.R. (2022). Grammar-Based Evolutionary Approach for Automatic Workflow Composition with Open Preprocessing Sequence. In: Abraham, A., et al. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). SoCPaR 2021. Lecture Notes in Networks and Systems, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-96302-6_61


DOI: https://doi.org/10.1007/978-3-030-96302-6_61
Published: 22 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96301-9
Online ISBN: 978-3-030-96302-6
eBook Packages: Intelligent Technologies and Robotics (R0)
