Fitness Landscape Analysis of Automated Machine Learning Search Spaces

  • Conference paper
Evolutionary Computation in Combinatorial Optimization (EvoCOP 2020)

Abstract

The main goal of Automated Machine Learning (AutoML) is to automate the creation of complete Machine Learning (ML) pipelines for any dataset without requiring deep user expertise in ML. Several AutoML methods have been proposed so far, but no single one clearly stands out. Furthermore, there is a lack of studies on the characteristics of the fitness landscape of AutoML search spaces. Such an analysis may help to explain the performance of different optimization methods for AutoML and to show how to improve them. This paper adapts classic fitness landscape analysis measures to the context of AutoML. This is a challenging task, as AutoML search spaces include discrete, continuous, categorical and conditional hyperparameters. We propose an ML pipeline representation, a neighborhood definition and a distance metric between pipelines, and use them to evaluate the fitness distance correlation (FDC) and the neutrality ratio for a given AutoML search space. The FDC results are counter-intuitive and require a more in-depth analysis of a range of search spaces. The neutrality results, in turn, show a strong positive correlation between the mean neutrality ratio and the fitness value.
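
As a minimal sketch of the two measures named in the abstract, the Python below computes FDC in its usual form (the Pearson correlation between sampled fitness values and their distances to the best known solution) and a neutrality ratio (the fraction of a solution's neighbors whose fitness equals its own). The function names and the tolerance `tol` are assumptions made for illustration. The pipeline representation, neighborhood definition and distance metric proposed in the paper are not reproduced here, so fitness values and distances are taken as plain numbers.

```python
import math
from typing import Sequence


def fitness_distance_correlation(fitness: Sequence[float],
                                 distance: Sequence[float]) -> float:
    """Pearson correlation between fitness and distance to the best known solution."""
    n = len(fitness)
    if n != len(distance) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_f = sum(fitness) / n
    mean_d = sum(distance) / n
    # Covariance and standard deviations, all normalised by n.
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitness, distance)) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitness) / n)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in distance) / n)
    if std_f == 0 or std_d == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (std_f * std_d)


def neutrality_ratio(own_fitness: float,
                     neighbor_fitness: Sequence[float],
                     tol: float = 1e-9) -> float:
    """Fraction of neighbors whose fitness matches own_fitness within tol (tol is an assumption)."""
    if not neighbor_fitness:
        raise ValueError("a solution needs at least one neighbor")
    neutral = sum(1 for f in neighbor_fitness if abs(f - own_fitness) <= tol)
    return neutral / len(neighbor_fitness)


# Toy data: fitness falls linearly as distance to the optimum grows.
fdc = fitness_distance_correlation([1.0, 0.8, 0.6, 0.4], [0, 1, 2, 3])
ratio = neutrality_ratio(0.9, [0.9, 0.9, 0.8, 0.7])
```

When fitness is maximised, an FDC close to -1 means that fitness rises as the distance to the optimum shrinks, which is the case usually read as easy for search. The toy data above is built to give that value.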

Notes

  1. When determining the search space size, for continuous hyperparameters, we simplify and always consider 100 values, regardless of the size of the interval.

  2. https://cgpimenta.github.io/EvoCOP2020_CGPimenta/.
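
The counting rule in note 1 can be illustrated with a small sketch. It assumes that the hyperparameters of one algorithm are independent, so the number of configurations is the product of the per-hyperparameter value counts. That assumption, the `spec` dictionary layout and the example hyperparameters are all hypothetical, and conditional hyperparameters are ignored.

```python
from math import prod

# Note 1: every continuous hyperparameter is counted as 100 values,
# regardless of the size of its interval.
CONTINUOUS_VALUES = 100


def value_count(spec: dict) -> int:
    """Number of values counted for one hyperparameter (hypothetical spec layout)."""
    if spec["type"] == "continuous":
        return CONTINUOUS_VALUES
    # Discrete and categorical hyperparameters list their values explicitly.
    return len(spec["values"])


def configuration_count(hyperparameters: list) -> int:
    """Product of per-hyperparameter counts, assuming independent hyperparameters."""
    return prod(value_count(spec) for spec in hyperparameters)


# Hypothetical algorithm with one categorical, one discrete and one
# continuous hyperparameter: 2 * 5 * 100 configurations.
example = [
    {"type": "categorical", "values": ["gini", "entropy"]},
    {"type": "discrete", "values": [1, 2, 3, 4, 5]},
    {"type": "continuous", "low": 0.0, "high": 1.0},
]
size = configuration_count(example)
```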

Acknowledgments

The UFMG authors would like to thank FAPEMIG, CNPq and CAPES for their financial support. This work has also been partially funded by ATMOSPHERE (H2020 777154 and MCTIC/RNP 51119).

Corresponding author

Correspondence to Cristiano G. Pimenta.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Pimenta, C.G., de Sá, A.G.C., Ochoa, G., Pappa, G.L. (2020). Fitness Landscape Analysis of Automated Machine Learning Search Spaces. In: Paquete, L., Zarges, C. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2020. Lecture Notes in Computer Science, vol 12102. Springer, Cham. https://doi.org/10.1007/978-3-030-43680-3_8

  • DOI: https://doi.org/10.1007/978-3-030-43680-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43679-7

  • Online ISBN: 978-3-030-43680-3

  • eBook Packages: Computer Science, Computer Science (R0)
