Information Sciences

Volume 543, 8 January 2021, Pages 58-71

General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution

https://doi.org/10.1016/j.ins.2020.07.035

Highlights

  • HML-Opt allows a researcher to define a complex space of machine learning pipelines.

  • HML-Opt automatically finds the best pipelines within time and memory constraints.

  • Meaningful statistics and insights are extracted from the experimentation process.

  • Freely available source code is provided for the research community.

Abstract

This paper introduces Hierarchical Machine Learning Optimisation (HML-Opt), an AutoML framework that is based on probabilistic grammatical evolution. HML-Opt has been designed to provide a flexible framework where a researcher can define the space of possible pipelines to solve a specific machine learning problem, which can range from high-level decisions about representation and features to low-level hyper-parameter values. The evaluation of HML-Opt is presented via two different case studies, both of which demonstrate that it is competitive with existing AutoML tools on a variety of benchmarks. Furthermore, HML-Opt can be applied to novel problems, such as knowledge extraction from natural language text, whereas other techniques are insufficiently flexible to capture the complexity of these scenarios. The source code for HML-Opt is available online for the research community.

Introduction

The research process for solving a machine learning problem typically involves experimenting with several different approaches (neural networks, supervised classifiers, clustering algorithms, etc.). Designing an effective solution requires a lengthy experimentation phase in which researchers try different datasets, algorithms, and specific parameter settings. As an example, Fig. 1 shows a hypothetical pipeline composed of several steps. In each step, different options are available, and suitable combinations of these options yield different values for the performance metric being evaluated. These steps can range from applying data preprocessing techniques to selecting specific algorithms and determining the values of their hyper-parameters.
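As a minimal illustration of such a pipeline space (a hypothetical sketch using scikit-learn, not code from the paper; the specific steps and option lists are assumptions chosen for exposition):

    # Each step of a hypothetical pipeline offers several interchangeable
    # options; a concrete pipeline is one choice per step.
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler, MinMaxScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression

    scalers = {"standard": StandardScaler(), "minmax": MinMaxScaler()}
    reducers = {"pca": PCA(n_components=10), "none": "passthrough"}
    classifiers = {"svm": SVC(C=1.0), "logreg": LogisticRegression()}

    # One concrete pipeline out of 2 x 2 x 2 = 8 possible combinations.
    pipeline = Pipeline([
        ("scale", scalers["standard"]),
        ("reduce", reducers["pca"]),
        ("classify", classifiers["svm"]),
    ])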

Exploring all the possible combinations of algorithms and parameters for a given problem is generally infeasible, since the number of possibilities grows exponentially with the number of steps in a pipeline. Furthermore, when algorithms have numerical (discrete or continuous) parameters (e.g., regularisation rate, number of neurons in a neural network layer), it is impossible to evaluate all the combinations. The problem is further complicated by the fact that each experiment may have a high computational cost (e.g., training a neural network from scratch on a large dataset). Researchers are therefore often compelled to select a small number of possibilities based on previous experience and domain knowledge about the problem at hand.
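To make this growth concrete, consider a toy calculation (the figures below are purely illustrative):

    # With k discrete options per step, an n-step pipeline space contains
    # k**n candidates, before accounting for continuous hyper-parameters.
    n_steps, options_per_step = 8, 5
    print(options_per_step ** n_steps)  # 390625 candidate pipelines

At one minute per evaluation, exhaustively testing these 390,625 candidates would take roughly nine months of sequential compute, which is why researchers prune the space by experience instead.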

The automation of this lengthy experimentation process is known as Automatic Machine Learning (AutoML). AutoML is a rapidly growing field that has been applied to find optimal machine learning pipelines in a variety of scenarios. For example, in computer vision, where several neural network architectures have been extensively explored, [44] applies reinforcement learning to learn to build an optimal neural network architecture for a given image dataset. Likewise, general frameworks such as Auto-Sklearn [12], Auto-Weka [39], and Auto-Keras [15] have appeared; these build upon existing machine learning frameworks and automatically explore different combinations of the algorithms available in them.

Current AutoML techniques focus mostly on a specific subset of algorithms, often tailored to a particular framework or toolset. Solving complex problems, on the other hand, requires combining different tools that might not be available within a single framework. Moreover, challenging machine learning problems are not restricted to finding the best architecture or hyper-parameter set for a given algorithm; they often involve higher-level decisions. For example, in natural language processing, prior to deciding which machine learning algorithm to use, the researcher must decide on representation (e.g., whether to use embeddings, and which ones specifically), features (e.g., whether to incorporate knowledge-based features), preprocessing (e.g., whether to remove stop-words or apply stemming), etc. Advancing towards the automation of this process requires more expressive tools that allow a researcher to define a complex and problem-specific space of possible pipelines, covering everything from high-level decisions to low-level hyper-parameter ranges.
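A hierarchical space of this kind might be sketched as nested choices, where high-level decisions expose further low-level ones (a hypothetical structure for illustration, not HML-Opt's actual syntax):

    # High-level decisions (representation, preprocessing, classifier)
    # branch into progressively lower-level ones (which embedding model,
    # which hyper-parameter range).
    search_space = {
        "representation": {
            "bag-of-words": {"min_df": (1, 10)},            # integer range
            "embeddings": {"model": ["word2vec", "glove"]},
        },
        "preprocessing": {
            "stopwords": [True, False],
            "stemming": [True, False],
        },
        "classifier": {
            "svm": {"C": (1e-3, 1e3)},                      # continuous range
            "logreg": {"C": (1e-3, 1e3)},
        },
    }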

The main objective of this research is to present Hierarchical Machine Learning Optimisation (HML-Opt), an AutoML technique based on probabilistic grammatical evolution. This work extends and generalises previous research by Estevez-Velarde et al. [11]. The most important contributions of this research are:

  • HML-Opt allows a researcher to define a large and complex space of possible pipelines to solve a specific problem, potentially combining different frameworks and technologies.

  • HML-Opt automatically searches this space of pipelines, within given time and computational resource constraints, to find an optimal or near-optimal architecture.

  • Meaningful statistics and insights about the problem are extracted from the experimentation process.

  • Freely available source code is provided for the research community.

The rest of the paper is organised as follows. Section 2 presents a brief review of the relevant concepts and related work. Section 3 introduces HML-Opt, describing its overall design and relevant implementation details. Section 4 presents two case studies involving different datasets, ranging from simple classification problems to state-of-the-art knowledge discovery problems. Finally, Section 5 discusses the main contributions and highlights of the research and the case study results, and Section 6 presents the main conclusions and future lines of research.

Section snippets

Related work

The idea of designing meta-algorithms that select the best algorithms for specific problem domains is a recurrent trend in artificial intelligence research, motivated by several factors: the complexity of parameter tuning in practical problems, the wide variety of existing algorithms with similar performance, and the existence of no-free-lunch results. In the domains of continuous and combinatorial optimisation, hybrid strategies have been developed to select from …

Hierarchical Machine Learning Optimisation

This section presents Hierarchical Machine Learning Optimisation (HML-Opt), a technique that searches for the optimal pipeline for a specific machine learning problem. The core of our proposal is a process that involves three different parts. First, the researcher designs a grammar that defines the space of all possible pipelines that are interesting to analyse for a given machine learning problem (see Section 3.1). This grammar allows the generation of different pipelines, which can be …
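The following toy sketch conveys the general idea of sampling pipelines from a probabilistic context-free grammar; the production rules and weights are invented for illustration and do not reproduce the framework's implementation:

    import random

    # Toy probabilistic grammar: each non-terminal maps to weighted
    # alternatives; symbols without productions are terminals.
    GRAMMAR = {
        "<pipeline>": [(1.0, ["<preproc>", "<classifier>"])],
        "<preproc>": [(0.5, ["scaler"]), (0.5, ["pca"])],
        "<classifier>": [(0.6, ["svm"]), (0.4, ["logreg"])],
    }

    def sample(symbol="<pipeline>"):
        """Expand a symbol by sampling productions according to their weights."""
        if symbol not in GRAMMAR:  # terminal symbol
            return [symbol]
        weights, bodies = zip(*GRAMMAR[symbol])
        body = random.choices(bodies, weights=weights, k=1)[0]
        return [token for part in body for token in sample(part)]

    print(sample())  # e.g. ['pca', 'svm']

In a probabilistic grammatical evolution loop, the sampled pipelines would be evaluated and the production weights shifted towards rules that occur in the best-performing candidates.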

Results

This section presents two different evaluation scenarios to demonstrate the applicability and effectiveness of our proposal. The first scenario (Section 4.1) compares HML-Opt against five AutoML frameworks on several standard machine learning problems. The purpose of this scenario is to demonstrate that HML-Opt can be used as an out-of-the-box AutoML tool that is competitive with the state of the art. The second scenario (Section 4.2) applies HML-Opt to a complex knowledge …

Discussion

In this section, we present a high-level analysis of the experimental results and discuss key characteristics of HML-Opt. Unlike other proposals, our technique is not restricted to particular types of pipelines, such as neural networks or shallow classifiers. Since the grammar defines the space of possible experimentation, anything can be included, such as natural language preprocessing techniques or knowledge bases. Our proposal optimises at many different levels, from high-level decisions …

Conclusions

This paper presents Hierarchical Machine Learning Optimisation (HML-Opt), an AutoML framework for automatically finding a close-to-optimal pipeline for a specific machine learning problem. This novel technique allows researchers to evaluate a much larger number of experimental setups than would be manually possible, given specific time frames and computational resources. Moreover, a key and innovative design feature of HML-Opt is to provide a declarative and expressive framework whereby a …

CRediT authorship contribution statement

Suilan Estevez-Velarde: Conceptualization, Software, Investigation, Writing - original draft. Yoan Gutiérrez: Methodology, Validation, Writing - review & editing. Yudivián Almeida-Cruz: Conceptualization, Methodology, Supervision. Andrés Montoyo: Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

Funding: This research has been supported by a Carolina Foundation grant in agreement with the University of Alicante and the University of Havana. Moreover, it has also been partially funded by both aforementioned universities, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089).

This manuscript has been greatly improved by the valuable …

References (46)

  • F. Assuncao, N. Lourenco, P. Machado, B. Ribeiro, Automatic generation of neural networks with structured Grammatical...
  • L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proc. COMPSTAT’2010, Springer, 2010, pp....
  • E.K. Burke, M. Gendreau, M. Hyde, G. Kendall, G. Ochoa, E. Özcan, R. Qu, Hyper-heuristics: A survey of the state of the...
  • F. Caraffini, F. Neri, M. Epitropakis, HyperSPAM: A study on hyper-heuristic coordination strategies in the continuous...
  • A.R. Carvalho, F.M. Ramos, A.A. Chaves, Metaheuristics for the feedforward artificial neural network (ANN) architecture...
  • Z. Chi, Statistical properties of probabilistic context-free grammars, Comput. Linguist. 25 (1) (1999) 131–160, ISSN...
  • F. Chollet, Keras, https://keras.io, ...
  • P. Cortez, A. Cerdeira, F. Almeida, T. Matos, J. Reis, Modeling wine preferences by data mining from physicochemical...
  • A.G.C. de Sá, W.J.G.S. Pinto, L.O.V.B. Oliveira, G.L. Pappa, RECIPE: A grammar-based framework for automatically...
  • A.M. De Silva, P.H.W. Leong, Grammatical evolution, SpringerBriefs Appl. Sci. Technol. 5(9789812874108) (2015) 25–33,...
  • S. Estevez-Velarde, Y. Gutiérrez, A. Montoyo, Y. Almeida-Cruz, AutoML strategy based on grammatical evolution: A case...
  • M. Feurer, A. Klein, K. Eggensperger, J.T. Springenberg, M. Blum, F. Hutter, Efficient and robust automated machine...
  • M. Hauschild, M. Pelikan, An introduction and survey of estimation of distribution algorithms, Swarm Evol. Comput. 1...
  • H. Jin, Q. Song, X. Hu, Auto-Keras: Efficient Neural Architecture Search with Network Morphism, 2018. ISSN 00086363...
  • H. Josiński, D. Kostrzewa, A. Michalczuk, A. Świtoński, The expanded invasive weed optimization metaheuristic for...
  • J. Józefowska, J. Weglarz, On a methodology for discrete-continuous scheduling. Eur. J. Oper. Res. 107 (2) (1998)...
  • P. Kerschke, H. Trautmann, Automated algorithm selection on continuous black-box problems by combining exploratory...
  • K. Khan, A. Sahai, A Comparison of BA, GA, PSO, BP and LM for Training Feed forward Neural Networks in e-Learning...
  • H.T. Kim, C.W. Ahn, A New Grammatical Evolution Based on Probabilistic Context-free Grammar, in: Proc. 18th Asia...
  • D.P. Kingma, J.L. Ba, Adam: A method for stochastic optimization, in: 3rd Int. Conf. Learn. Represent. ICLR 2015 -...
  • B. Komer, J. Bergstra, C. Eliasmith, Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn, in:...
  • K. Li et al., Learning to optimize

Cited by (15)

  • Intelligent ensembling of auto-ML system outputs for solving classification problems

    2022, Information Sciences

    Citation excerpt: "In contrast with other existing Auto-ML technologies, AutoGOAL can automatically build machine learning pipelines that combine techniques and algorithms from different frameworks, including shallow classifiers, natural language processing tools, and neural networks [13,14]. AutoGOAL performs a Probabilistic Grammatical Evolution Search to explore the space of available algorithms and their hyperparameters to build pipelines [15]. At the end of the search, the best-performing pipeline found according to the input loss function is given as a solution for the problem."
