Grammar-Based Evolutionary Approach for Automatic Workflow Composition with Open Preprocessing Sequence

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 417)

Abstract

Knowledge discovery is a complex process involving several phases. Some of them are repetitive and time-consuming, which makes them good candidates for automation. For example, the large number of machine learning algorithms, together with their hyper-parameters, constitutes a vast search space to explore. The term AutoML was coined to encompass the approaches that automate such phases. Automatic workflow composition is an AutoML task that involves both the selection and the hyper-parameter optimisation of the algorithms addressing different phases, thus providing more comprehensive assistance during the knowledge discovery process. Unlike other proposals that predetermine the structure of the preprocessing sequence, and in some cases the size of the workflow, our proposal generates workflows made up of an arbitrary number of preprocessing algorithms of any type followed by a classifier. This allows more accurate results to be returned, since it avoids oversimplifying the solution space. The optimisation is conducted by a grammar-guided genetic programming algorithm. The proposal has been validated and compared against TPOT and RECIPE, generating workflows with greater predictive performance.
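To make the idea of an open preprocessing sequence concrete, the following minimal sketch shows how a context-free grammar can derive workflows with any number of preprocessing steps followed by one classifier. The grammar, the algorithm names and the `derive` function are hypothetical illustrations, not the grammar or implementation used in the paper, and hyper-parameters are omitted for brevity.

```python
import random

# Hypothetical toy grammar (an assumption, NOT the grammar from the paper):
# a workflow is an open-ended sequence of preprocessing steps followed by
# exactly one classifier. The recursive <preprocessing> rule is what leaves
# the length and the order of the sequence unconstrained.
GRAMMAR = {
    "<workflow>": [["<classifier>"], ["<preprocessing>", "<classifier>"]],
    "<preprocessing>": [["<step>"], ["<step>", "<preprocessing>"]],
    "<step>": [["scaler"], ["pca"], ["feature_selection"], ["oversampler"]],
    "<classifier>": [["decision_tree"], ["knn"], ["naive_bayes"]],
}


def derive(symbol, rng, depth=0, max_depth=6):
    """Randomly expand `symbol` into a flat list of terminal names."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: an algorithm name
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Bound the recursion by forcing the shortest production.
        productions = [min(productions, key=len)]
    chosen = rng.choice(productions)
    terminals = []
    for child in chosen:
        terminals.extend(derive(child, rng, depth + 1, max_depth))
    return terminals


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" -> ".join(derive("<workflow>", rng)))
```

In a grammar-guided genetic programming setting, individuals like these would form the initial population; crossover and mutation would then operate on their derivation trees so that every offspring remains a valid workflow, and each workflow would be scored by its predictive performance. Those parts are left out of this sketch.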

Supported by the University of Cordoba and FEDER funds, project UCO-FEDER 18 REF.1263116 MOD, and by the Ministry of Science and Innovation, project PID2020-115832GB-I00, FPU17/00799.

Notes

  1. WEKA: https://cs.waikato.ac.nz/ml/weka/ (last access: 30/09/2021).

  2. scikit-learn: https://scikit-learn.org/ (last access: 30/09/2021).

  3. imbalanced-learn: https://imbalanced-learn.org/ (last access: 30/09/2021).

References

  1. Bilalli, B., Abelló, A., Aluja-Banet, T., Wrembel, R.: Automated data pre-processing via meta-learning. In: International Conference on Model and Data Engineering, pp. 194–208 (2016)

  2. Díaz-Pacheco, A., Reyes-García, C.A.: Full model selection in huge datasets and for proxy models construction. In: Batyrshin, I., Martínez-Villaseñor, M.L., Ponce Espinosa, H.E. (eds.) MICAI 2018. LNCS (LNAI), vol. 11288, pp. 171–182. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-04491-6_13

  3. Elkholy, A., Yang, F., Gustafson, S.: Interpretable automated machine learning in maana™ knowledge platform. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 1937–1939 (2019)

  4. Estévez-Velarde, S., Gutiérrez, Y., Almeida-Cruz, Y., Montoyo, A.: General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution. Inf. Sci. 543, 58–71 (2020)

  5. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: an overview. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 1–34. American Association for Artificial Intelligence, Menlo Park, CA, USA (1996)

  6. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Advances in Neural Information Processing Systems, pp. 2962–2970 (2015)

  7. Gijsbers, P., Vanschoren, J., Olson, R.S.: Layered TPOT: speeding up tree-based pipeline optimization. In: 2017 International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms, pp. 49–68 (2017)

  8. Hutter, F., Kotthoff, L., Vanschoren, J.: Automated Machine Learning: Methods, Systems, Challenges. Springer Nature, Heidelberg (2019)

  9. McKay, R.I., Hoai, N.X., Whigham, P.A., Shan, Y., O’Neill, M.: Grammar-based genetic programming: a survey. Genet. Program. Evolvable Mach. 11(3–4), 365–396 (2010). https://doi.org/10.1007/s10710-010-9109-y

  10. Mohr, F., Wever, M., Hüllermeier, E.: ML-Plan: automated machine learning via hierarchical planning. Mach. Learn. 107(8–10), 1495–1515 (2018)

  11. Olson, R.S., Bartley, N., Urbanowicz, R.J., Moore, J.H.: Evaluation of a tree-based pipeline optimization tool for automating data science. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 485–492 (2016)

  12. Rice, J.R.: The algorithm selection problem. In: Advances in Computers, vol. 15, pp. 65–118. Elsevier (1976)

  13. De Sá, A.G.C., Pinto, W.J.G.S., Oliveira, L.O.V.B., Pappa, G.L.: RECIPE: a grammar-based framework for automatically evolving classification pipelines. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds.) EuroGP 2017. LNCS, vol. 10196, pp. 246–261. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55696-3_16

  14. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 847–855 (2013)

  15. Yang, L., Shami, A.: On hyperparameter optimization of machine learning algorithms: theory and practice. Neurocomputing 415, 295–316 (2020)

Correspondence to Rafael Barbudo.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Barbudo, R., Ventura, S., Romero, J.R. (2022). Grammar-Based Evolutionary Approach for Automatic Workflow Composition with Open Preprocessing Sequence. In: Abraham, A., et al. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). SoCPaR 2021. Lecture Notes in Networks and Systems, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-96302-6_61
