Domain-Aware Feature Learning with Grammar-Guided Genetic Programming

  • Conference paper
Genetic Programming (EuroGP 2023)

Abstract

Feature Learning (FL) is key to well-performing machine learning models. However, the most popular FL methods lack interpretability, which is becoming a critical requirement of machine learning. We propose incorporating information from the problem domain into the structure of programs, building on the existing M3GP approach. This technique, named Domain-Knowledge M3GP, defines the possible feature transformations with a grammar, using Grammar-Guided Genetic Programming. Although it requires the user to specify the domain knowledge, this approach limits the search space by excluding programs that make no sense to humans. We extend this approach with complex, aggregating queries over historical data. This extension expands the search space to include relevant programs that could not be expressed before. We evaluate our methods on performance and interpretability in six use cases, showing promising results in both areas. We conclude that the performance and interpretability of FL methods can benefit from domain-knowledge incorporation and aggregation, and we give guidelines on when to use them.
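
To make the two ideas in the abstract concrete, the following Python sketch shows how a grammar can restrict feature transformations to ones that are meaningful in a domain, and how an aggregating query over earlier rows can appear as one of its productions. It is an illustration only, not the authors' implementation of Domain-Knowledge M3GP: the grammar, the column names (`temperature`, `feels_like`, `rentals`), the `mean_prev` operator, and the helpers `generate` and `evaluate` are all hypothetical, and the sketch only samples random programs rather than evolving them.

```python
import random

# Hypothetical toy grammar. Each non-terminal is a domain type; a production
# is (operator, *arguments), where an argument naming a grammar key is expanded
# recursively and anything else (column name, window size) is kept as a literal.
GRAMMAR = {
    "Feature": [("id", "Temp"), ("id", "Count")],
    "Temp": [
        ("var", "temperature"),
        ("var", "feels_like"),
        ("sub", "Temp", "Temp"),       # temperatures only combine with temperatures
    ],
    "Count": [
        ("mean_prev", "rentals", 2),   # aggregate over the 2 previous rows
        ("mean_prev", "rentals", 3),   # aggregate over the 3 previous rows
        ("add", "Count", "Count"),     # counts only combine with counts
    ],
}

# Made-up historical data, ordered oldest to newest.
ROWS = [
    {"temperature": 20.0, "feels_like": 18.0, "rentals": 100},
    {"temperature": 22.0, "feels_like": 21.0, "rentals": 120},
    {"temperature": 25.0, "feels_like": 27.0, "rentals": 90},
    {"temperature": 19.0, "feels_like": 17.0, "rentals": 110},
]


def is_nonterminal(arg):
    return isinstance(arg, str) and arg in GRAMMAR


def generate(symbol, depth, rng):
    """Randomly derive an expression tree for `symbol`, bounded by `depth`."""
    productions = GRAMMAR[symbol]
    if depth <= 0:
        # Prefer productions without non-terminals so the derivation terminates.
        leaves = [p for p in productions
                  if not any(is_nonterminal(a) for a in p[1:])]
        productions = leaves or productions
    op, *args = rng.choice(productions)
    expanded = [generate(a, depth - 1, rng) if is_nonterminal(a) else a
                for a in args]
    return (op, *expanded)


def evaluate(expr, rows, i):
    """Evaluate an expression tree for row `i` of `rows`."""
    op, *args = expr
    if op == "var":
        return rows[i][args[0]]
    if op == "id":
        return evaluate(args[0], rows, i)
    if op == "sub":
        return evaluate(args[0], rows, i) - evaluate(args[1], rows, i)
    if op == "add":
        return evaluate(args[0], rows, i) + evaluate(args[1], rows, i)
    if op == "mean_prev":
        column, window = args
        history = [r[column] for r in rows[max(0, i - window):i]]
        # Assumption: fall back to 0.0 when no earlier rows exist.
        return sum(history) / len(history) if history else 0.0
    raise ValueError(f"unknown operator: {op}")


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        feature = generate("Feature", 3, rng)
        print(feature, "->", evaluate(feature, ROWS, 3))
```

Because every production is typed, the sampler in this sketch cannot produce an expression such as a temperature minus a rental count, which is the kind of search-space restriction the abstract describes. The `mean_prev` production looks only at rows before the current one, so it adds programs that a row-wise transformation could not express.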

Acknowledgements

This work was supported by Fundação para a Ciência e Tecnologia (FCT) in the LASIGE Research Unit under the references UIDB/00408/2020 and UIDP/00408/2020, by the CMU-Portugal project CAMELOT (LISBOA-01-0247-FEDER-045915), by the RAP project (EXPL/CCI-COM/1306/2021), and by FCT Advanced Computing projects (2022.15800.CPCA.A1, CPCA/A1/402869/2021, CPCA/A2/6009/2020, and CPCA/A1/5613/2020). We thank Sara Silva for her feedback and José Eduardo Madeira for his help implementing the grammars.

Corresponding author

Correspondence to Alcides Fonseca.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ingelse, L., Fonseca, A. (2023). Domain-Aware Feature Learning with Grammar-Guided Genetic Programming. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds) Genetic Programming. EuroGP 2023. Lecture Notes in Computer Science, vol 13986. Springer, Cham. https://doi.org/10.1007/978-3-031-29573-7_15

  • DOI: https://doi.org/10.1007/978-3-031-29573-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29572-0

  • Online ISBN: 978-3-031-29573-7

  • eBook Packages: Computer Science (R0)
