FERMAT: Feature Engineering with Grammatical Evolution

  • Conference paper
Progress in Artificial Intelligence (EPIA 2021)

Abstract

Feature engineering is a key step in a machine learning study. We propose FERMAT, a grammatical evolution framework for the automatic discovery of an optimal set of engineered features, with enhanced ability to characterize data. The framework contains a grammar specifying the original features and the operations that can be applied to the data. The optimization process searches for a transformation strategy to apply to the original dataset, aiming to create a novel characterization composed of a combination of original and engineered attributes. FERMAT was applied to two real-world drug development datasets, and the results show that the framework can craft novel data representations that improve the predictive ability of tree-based regression models.
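To make the idea of a grammar over original features and operations concrete, the following is a minimal sketch of a classic grammatical evolution genotype-to-phenotype mapping. It is an illustration only, not the paper's implementation: the toy grammar, the feature names x0, x1, x2, the helper log1p_abs, and the functions decode and evaluate are all hypothetical, and the simple codon-modulo mapping stands in for whatever grammatical evolution variant the framework actually uses.

```python
import math

# Hypothetical toy grammar (an assumption, NOT the grammar from the paper).
# Each non-terminal maps to a list of productions; a production is a list of symbols.
GRAMMAR = {
    "<feature>": [
        ["<var>"],
        ["(", "<feature>", "<op>", "<feature>", ")"],
        ["log1p_abs(", "<feature>", ")"],
    ],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["x0"], ["x1"], ["x2"]],
}


def decode(codons, start="<feature>", max_wraps=2):
    """Map integer codons to an expression string by leftmost derivation.

    Each non-terminal consumes one codon; the production is chosen as
    codon % number_of_productions. Returns None when the codon budget
    (including wraps) runs out before the derivation finishes.
    """
    symbols = [start]
    out = []
    used = 0
    budget = len(codons) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            out.append(sym)  # terminal symbol: emit as-is
            continue
        if used >= budget:
            return None  # invalid individual: derivation did not terminate
        rules = GRAMMAR[sym]
        choice = rules[codons[used % len(codons)] % len(rules)]
        used += 1
        symbols = list(choice) + symbols
    return "".join(out)


def log1p_abs(v):
    """Protected logarithm, a hypothetical example of a safe unary operation."""
    return math.log1p(abs(v))


def evaluate(expr, row):
    """Compute one engineered feature value for a single data row."""
    env = {"x0": row[0], "x1": row[1], "x2": row[2], "log1p_abs": log1p_abs}
    # eval is acceptable here only because expr is produced by our own grammar.
    return eval(expr, {"__builtins__": {}}, env)


expr = decode([1, 0, 0, 2, 0, 1])
value = evaluate(expr, (2.0, 3.0, 5.0))
```

In a full search loop, each decoded expression would become a candidate column, and the fitness of an individual would be the validation error of a regressor trained on the transformed dataset. That part is omitted here because the page does not describe it.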

Notes

  1. DecisionTreeRegressor configuration details.

  2. RandomForestRegressor configuration details.

  3. Some of the best solutions are composed solely of engineered features. In these runs, it is not possible to design a solution for FERMAT-Sel. Accordingly, the number of repetitions for this variant is lower than for the remaining alternatives.

Acknowledgments

This work was funded by FEDER funds through the Operational Programme Competitiveness Factors - COMPETE and national funds by FCT - Foundation for Science and Technology (POCI-01-0145-FEDER-029297, CISUC - UID/CEC/00326/2020) and within the scope of the project A4A: Audiology for All (CENTRO-01-0247-FEDER-047083) financed by the Operational Program for Competitiveness and Internationalisation of PORTUGAL 2020 through the European Regional Development Fund.

Corresponding author

Correspondence to Nuno Lourenço.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Monteiro, M., Lourenço, N., Pereira, F.B. (2021). FERMAT: Feature Engineering with Grammatical Evolution. In: Marreiros, G., Melo, F.S., Lau, N., Lopes Cardoso, H., Reis, L.P. (eds) Progress in Artificial Intelligence. EPIA 2021. Lecture Notes in Computer Science, vol 12981. Springer, Cham. https://doi.org/10.1007/978-3-030-86230-5_19

  • DOI: https://doi.org/10.1007/978-3-030-86230-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86229-9

  • Online ISBN: 978-3-030-86230-5

  • eBook Packages: Computer Science, Computer Science (R0)
