Abstract
Knowledge discovery is a complex process involving several phases. Some of them are repetitive and time-consuming, which makes them amenable to automation. As an example, the large number of machine learning algorithms, together with their hyper-parameters, constitutes a vast search space to explore. In this vein, the term AutoML was coined to encompass the approaches that automate such phases. Automatic workflow composition is an AutoML task that involves both the selection and the hyper-parameter optimisation of the algorithms addressing different phases, thus giving more comprehensive assistance during the knowledge discovery process. Unlike other proposals that predetermine the structure of the preprocessing sequence, and in some cases the size of the workflow, our proposal generates workflows made up of an arbitrary number of preprocessing algorithms of any type followed by a classifier. This allows returning more accurate results, since it avoids oversimplifying the solution space. The optimisation is conducted by a grammar-guided genetic programming algorithm. The proposal has been validated and compared against TPOT and RECIPE, generating workflows with greater predictive performance.
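To illustrate what an open preprocessing sequence means in practice, the following is a minimal sketch of how a context-free grammar can derive workflows with any number of preprocessing steps followed by one classifier. The grammar, the symbol names and the `max_depth` limit are hypothetical and chosen only for illustration; they are not the grammar or the operators used in the paper, and no hyper-parameters or genetic operators are modelled here.

```python
import random

# Hypothetical toy grammar (an assumption, not the paper's grammar).
# A workflow is an optional, arbitrarily long preprocessing sequence
# followed by exactly one classifier.
GRAMMAR = {
    "<workflow>": [["<preprocessing>", "<classifier>"], ["<classifier>"]],
    "<preprocessing>": [["<step>"], ["<step>", "<preprocessing>"]],
    "<step>": [["scaler"], ["feature_selector"], ["resampler"]],
    "<classifier>": [["decision_tree"], ["naive_bayes"], ["svm"]],
}


def derive(symbol, rng, depth=0, max_depth=6):
    """Expand a grammar symbol into a flat list of terminal names."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: an algorithm name
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Force the shortest production so the derivation terminates.
        productions = [min(productions, key=len)]
    chosen = rng.choice(productions)
    terminals = []
    for child in chosen:
        terminals.extend(derive(child, rng, depth + 1, max_depth))
    return terminals


def random_workflow(seed=None):
    """Sample one workflow, e.g. as an individual of an initial population."""
    return derive("<workflow>", random.Random(seed))


if __name__ == "__main__":
    for seed in range(3):
        print(random_workflow(seed))
```

In a grammar-guided genetic programming setting, individuals sampled this way would then be evaluated and varied by crossover and mutation operators that keep them valid with respect to the grammar; that part is omitted from this sketch.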
Supported by the University of Cordoba and FEDER funds, project UCO-FEDER 18 REF.1263116 MOD, and by the Ministry of Science and Innovation, project PID2020-115832GB-I00, FPU17/00799.
Notes
1. WEKA: https://cs.waikato.ac.nz/ml/weka/ (last access: 30/09/2021).
2. scikit-learn: https://scikit-learn.org/ (last access: 30/09/2021).
3. imbalanced-learn: https://imbalanced-learn.org/ (last access: 30/09/2021).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Barbudo, R., Ventura, S., Romero, J.R. (2022). Grammar-Based Evolutionary Approach for Automatic Workflow Composition with Open Preprocessing Sequence. In: Abraham, A., et al. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). SoCPaR 2021. Lecture Notes in Networks and Systems, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-96302-6_61
DOI: https://doi.org/10.1007/978-3-030-96302-6_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96301-9
Online ISBN: 978-3-030-96302-6
eBook Packages: Intelligent Technologies and Robotics (R0)