Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-Based Genetic Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11102)

Abstract

This paper proposes \(\text {Auto-MEKA}_{\text {GGP}}\), an Automated Machine Learning (Auto-ML) method for Multi-Label Classification (MLC) based on the MEKA tool, which offers a number of MLC algorithms. In MLC, each example can be associated with one or more class labels, which makes MLC problems harder than conventional (single-label) classification problems. Hence, it is essential to select an MLC algorithm and a configuration tailored to the input dataset. \(\text {Auto-MEKA}_{\text {GGP}}\) addresses this problem with two key ideas. First, a large number of MLC algorithms and configurations from MEKA are encoded in a grammar. Second, our proposed Grammar-based Genetic Programming (GGP) method uses that grammar to search for the best MLC algorithm and configuration for the input dataset. \(\text {Auto-MEKA}_{\text {GGP}}\) was tested on 10 datasets and compared to two well-known MLC methods, Binary Relevance and Classifier Chain, as well as to GA-Auto-MLC, a genetic algorithm we recently proposed for the same task. Two versions of \(\text {Auto-MEKA}_{\text {GGP}}\) were tested: a full version with the proposed grammar, and a simplified version whose grammar includes only the algorithmic components used by GA-Auto-MLC. Overall, the full version of \(\text {Auto-MEKA}_{\text {GGP}}\) achieved the best predictive accuracy among the five evaluated methods, winning on six of the 10 datasets.
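To make the first key idea concrete, the following is a minimal sketch of how a grammar can encode algorithm and configuration choices, and how one candidate can be derived from it at random. It is not the grammar or the implementation used in the paper: the dictionary `GRAMMAR`, the functions `derive` and `random_individual`, and every algorithm name and parameter value shown are hypothetical placeholders chosen for illustration, and they are not actual MEKA options.

```python
import random

# Hypothetical toy grammar (NOT the paper's grammar). Keys are non-terminals;
# each maps to a list of alternative productions. Any symbol that is not a key
# is treated as a terminal. All names and values are illustrative placeholders.
GRAMMAR = {
    "<mlc>": [["<meta>", "<base>"]],
    "<meta>": [["BinaryRelevance"], ["ClassifierChain", "<chain_seed>"]],
    "<chain_seed>": [["seed=1"], ["seed=2"]],
    "<base>": [["NaiveBayes"], ["DecisionTree", "<pruning>"]],
    "<pruning>": [["pruning=0.10"], ["pruning=0.25"]],
}


def derive(symbol, rng):
    """Expand `symbol` into a list of terminals by picking productions at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    production = rng.choice(GRAMMAR[symbol])
    tokens = []
    for child in production:
        tokens.extend(derive(child, rng))
    return tokens


def random_individual(seed):
    """Derive one candidate (algorithm plus configuration) from the start symbol."""
    return derive("<mlc>", random.Random(seed))


if __name__ == "__main__":
    for seed in range(3):
        print(" ".join(random_individual(seed)))
```

Because every candidate is produced by expanding the grammar, any individual generated this way is a syntactically valid combination of components. A grammar-based genetic programming search would, in addition, evaluate each candidate on the input dataset and apply crossover and mutation that respect the grammar; those steps are omitted from this sketch.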

Notes

  1. Code and documentation are available at: https://github.com/laic-ufmg/automlc/.

  2. The implementation of the grammar(s) for EpochX is available at: https://github.com/laic-ufmg/automlc/tree/master/PPSN.

  3. The datasets are available at: http://www.uco.es/kdis/mllresources/.

  4. Available at: https://github.com/laic-ufmg/automlc/tree/master/PPSN/AutoMEKAS.bnf.

  5. Available at: https://github.com/laic-ufmg/automlc/tree/master/PPSN/AutoMEKA.bnf.

Acknowledgments

This work has been partially funded by the EUBra-BIGSEA project by the European Commission under the Cooperation Programme (MCTI/RNP 3rd Coordinated Call), Horizon 2020 grant agreement 690116. In addition, this work has been partially supported by the following Brazilian Research Support Agencies: CNPq, CAPES and FAPEMIG.

Author information

Corresponding author

Correspondence to Alex G. C. de Sá.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

de Sá, A.G.C., Freitas, A.A., Pappa, G.L. (2018). Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-Based Genetic Programming. In: Auger, A., Fonseca, C., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds) Parallel Problem Solving from Nature – PPSN XV. PPSN 2018. Lecture Notes in Computer Science, vol 11102. Springer, Cham. https://doi.org/10.1007/978-3-319-99259-4_25

  • DOI: https://doi.org/10.1007/978-3-319-99259-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99258-7

  • Online ISBN: 978-3-319-99259-4

  • eBook Packages: Computer Science, Computer Science (R0)
