Fitness Landscape Analysis of Automated Machine Learning Search Spaces

Pimenta, Cristiano G.; de Sá, Alex G. C.; Ochoa, Gabriela; Pappa, Gisele L.

doi:10.1007/978-3-030-43680-3_8

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12102)

Included in the following conference series:

European Conference on Evolutionary Computation in Combinatorial Optimization (Part of EvoStar)

Abstract

The main goal of Automated Machine Learning (AutoML) is to automate the creation of complete Machine Learning (ML) pipelines for any dataset without requiring deep user expertise in ML. Several AutoML methods have been proposed so far, but no single one clearly stands out. Furthermore, there is a lack of studies on the characteristics of the fitness landscape of AutoML search spaces. Such an analysis may help to understand the performance of different optimization methods for AutoML and how to improve them. This paper adapts classic fitness landscape analysis measures to the context of AutoML. This is a challenging task, as AutoML search spaces include discrete, continuous, categorical and conditional hyperparameters. We propose an ML pipeline representation, a neighborhood definition and a distance metric between pipelines, and use them to evaluate the fitness distance correlation (FDC) and the neutrality ratio for a given AutoML search space. The FDC results are counter-intuitive and require a more in-depth analysis of a range of search spaces. The neutrality results, in turn, show a strong positive correlation between the mean neutrality ratio and the fitness value.
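
As a rough illustration of the two measures named in the abstract, the sketch below computes a fitness distance correlation in the usual Jones and Forrest sense (the Pearson correlation between sampled fitness values and their distances to the best-known solution) and a per-solution neutrality ratio (the fraction of neighbors whose fitness equals that of the solution). It does not reproduce the paper's pipeline representation, neighborhood definition or distance metric; the function names, the precomputed `distances` input and the tolerance `eps` are assumptions made for this example only.

```python
import math


def fitness_distance_correlation(fitnesses, distances):
    """Pearson correlation between fitness values and distances to the optimum.

    `distances[i]` is assumed to be the precomputed distance from sample i
    to the best-known solution; how that distance is defined is up to the caller.
    """
    n = len(fitnesses)
    if n != len(distances) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_f = sum(fitnesses) / n
    mean_d = sum(distances) / n
    cov = sum((f - mean_f) * (d - mean_d) for f, d in zip(fitnesses, distances)) / n
    std_f = math.sqrt(sum((f - mean_f) ** 2 for f in fitnesses) / n)
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in distances) / n)
    if std_f == 0 or std_d == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (std_f * std_d)


def neutrality_ratio(fitness, neighbor_fitnesses, eps=1e-9):
    """Fraction of neighbors whose fitness is within `eps` of `fitness`.

    `eps` is a hypothetical tolerance for comparing floating-point fitness values.
    """
    if not neighbor_fitnesses:
        raise ValueError("a solution needs at least one neighbor")
    neutral = sum(1 for nf in neighbor_fitnesses if abs(nf - fitness) <= eps)
    return neutral / len(neighbor_fitnesses)


if __name__ == "__main__":
    # Toy data: fitness drops linearly as distance to the optimum grows.
    print(fitness_distance_correlation([0.9, 0.8, 0.7, 0.6], [0, 1, 2, 3]))
    # Two of the four neighbors share the solution's fitness.
    print(neutrality_ratio(0.8, [0.8, 0.8, 0.7, 0.9]))
```

For a maximization problem, an FDC value close to -1 suggests that fitness tends to increase as samples get closer to the optimum, while values near 0 indicate little relationship between fitness and distance.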

Notes

1. When determining the search space size, for continuous hyperparameters, we simplify and always consider 100 values, regardless of the size of the interval.
2. https://cgpimenta.github.io/EvoCOP2020_CGPimenta/
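
The simplification in Note 1 can be sketched as follows: when counting configurations, each continuous hyperparameter contributes a fixed 100 values, while discrete and categorical hyperparameters contribute the number of values they can take. The dictionary layout, the function names and the assumption that all hyperparameters are independent (no conditional hyperparameters, a single algorithm) are hypothetical and only serve this example; the paper's actual counting procedure is not given in this excerpt.

```python
from math import prod

# Fixed number of values assumed for every continuous hyperparameter (Note 1).
CONTINUOUS_VALUES = 100


def count_values(spec):
    """Number of values one hyperparameter contributes to the search space size.

    `spec` is a hypothetical (kind, domain) pair, e.g. ("continuous", (0.0, 1.0))
    or ("categorical", ["gini", "entropy"]).
    """
    kind, domain = spec
    if kind == "continuous":
        # The width of the interval is deliberately ignored.
        return CONTINUOUS_VALUES
    return len(domain)


def search_space_size(hyperparameters):
    """Product of per-hyperparameter counts, assuming they are all independent."""
    return prod(count_values(spec) for spec in hyperparameters.values())


if __name__ == "__main__":
    example = {
        "criterion": ("categorical", ["gini", "entropy"]),
        "learning_rate": ("continuous", (0.0, 1.0)),
        "max_depth": ("discrete", range(1, 6)),
    }
    print(search_space_size(example))  # 2 * 100 * 5
```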

Acknowledgments

The UFMG authors would like to thank FAPEMIG, CNPq and CAPES for their financial support. This work has also been partially funded by ATMOSPHERE (H2020 777154 and MCTIC/RNP 51119).

Author information

Authors and Affiliations

Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Cristiano G. Pimenta, Alex G. C. de Sá & Gisele L. Pappa
University of Stirling, Stirling, UK
Gabriela Ochoa

Corresponding author

Correspondence to Cristiano G. Pimenta.

Editor information

Editors and Affiliations

University of Coimbra, Coimbra, Portugal
Luís Paquete
Aberystwyth University, Aberystwyth, UK
Christine Zarges

About this paper

Cite this paper

Pimenta, C.G., de Sá, A.G.C., Ochoa, G., Pappa, G.L. (2020). Fitness Landscape Analysis of Automated Machine Learning Search Spaces. In: Paquete, L., Zarges, C. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2020. Lecture Notes in Computer Science, vol 12102. Springer, Cham. https://doi.org/10.1007/978-3-030-43680-3_8

DOI: https://doi.org/10.1007/978-3-030-43680-3_8
Published: 09 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43679-7
Online ISBN: 978-3-030-43680-3
eBook Packages: Computer Science, Computer Science (R0)
