Abstract
Feature Learning (FL) is key to well-performing machine learning models. However, the most popular FL methods lack interpretability, which is becoming a critical requirement in machine learning. We propose incorporating information from the problem domain into the structure of programs, building on the existing M3GP approach. This technique, named Domain-Knowledge M3GP, defines the possible feature transformations with a grammar, using Grammar-Guided Genetic Programming. Although it requires the user to specify the domain knowledge, this approach has the advantage of limiting the search space by excluding programs that make no sense to humans. We extend the approach with the possibility of introducing complex, aggregating queries over historic data. This extension expands the search space to include relevant programs that could not be expressed before. We evaluate our methods on performance and interpretability in six use cases, showing promising results in both areas. We conclude that the performance and interpretability of FL methods can benefit from domain-knowledge incorporation and aggregation, and we give guidelines on when to use them.
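As a rough illustration of the two ideas in the abstract, the sketch below encodes domain knowledge as a small grammar whose nonterminals carry units, and adds one aggregating query over historic rows. It is not the authors' implementation and does not use the API of any Grammar-Guided Genetic Programming framework. The grammar, the column names (`temp`, `riders`), the `mean_last` operator, and the helpers `derive` and `evaluate` are all hypothetical, and random derivation stands in for the evolutionary search.

```python
import random
from statistics import mean

# Hypothetical grammar. Nonterminals (keys) encode units, so the grammar can
# only derive sums of same-unit terms: a temperature is never added to a count.
GRAMMAR = {
    "<temp>": [
        ("col", "temp"),
        ("add", "<temp>", "<temp>"),
        ("mean_last", "temp", "<window>"),  # aggregation over historic rows
    ],
    "<count>": [
        ("col", "riders"),
        ("add", "<count>", "<count>"),
        ("mean_last", "riders", "<window>"),
    ],
    "<window>": [("const", 2), ("const", 3)],
}


def derive(symbol, rng, depth=3):
    """Randomly expand a nonterminal into an expression tree (nested tuples)."""
    options = GRAMMAR[symbol]
    if depth <= 0:
        # Depth budget exhausted: keep only non-recursive productions.
        options = [p for p in options if symbol not in p]
    production = rng.choice(options)
    return tuple(
        derive(s, rng, depth - 1) if s in GRAMMAR else s for s in production
    )


def evaluate(tree, rows, i):
    """Evaluate a tree for row i; rows before i count as historic data."""
    op, *args = tree
    if op == "col":
        return rows[i][args[0]]
    if op == "const":
        return args[0]
    if op == "add":
        return evaluate(args[0], rows, i) + evaluate(args[1], rows, i)
    if op == "mean_last":
        column, window = args[0], evaluate(args[1], rows, i)
        history = [r[column] for r in rows[max(0, i - window):i]]
        # With no history yet, fall back to the current value (an assumption).
        return mean(history) if history else rows[i][column]
    raise ValueError(f"unknown operator: {op}")


# Made-up rows, ordered by time.
rows = [
    {"temp": 10.0, "riders": 100},
    {"temp": 12.0, "riders": 120},
    {"temp": 14.0, "riders": 90},
    {"temp": 16.0, "riders": 150},
]

# Today's temperature plus the mean temperature of the two previous rows.
tree = ("add", ("col", "temp"), ("mean_last", "temp", ("const", 2)))
print(evaluate(tree, rows, 3))  # 16.0 + mean(12.0, 14.0) = 29.0

# A randomly derived temperature feature; it can never mention "riders".
print(derive("<temp>", random.Random(0)))
```

Encoding units in the nonterminals is one possible way to exclude programs that make no sense to humans, while the `mean_last` production shows how an aggregating query adds programs that a row-by-row grammar could not express.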
Acknowledgements
This work was supported by Fundação para a Ciência e Tecnologia (FCT) in the LASIGE Research Unit under the ref. UIDB/00408/2020 and UIDP/00408/2020, by the CMU-Portugal project CAMELOT (LISBOA-01-0247-FEDER-045915), the RAP project under the reference (EXPL/CCI-COM/1306/2021), and FCT Advanced Computing projects (2022.15800.CPCA.A1, CPCA/A1/402869/2021, CPCA/A2/6009/2020, and CPCA/A1/5613/2020). We thank Sara Silva for her feedback and José Eduardo Madeira for his help implementing the grammars.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ingelse, L., Fonseca, A. (2023). Domain-Aware Feature Learning with Grammar-Guided Genetic Programming. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds) Genetic Programming. EuroGP 2023. Lecture Notes in Computer Science, vol 13986. Springer, Cham. https://doi.org/10.1007/978-3-031-29573-7_15
DOI: https://doi.org/10.1007/978-3-031-29573-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29572-0
Online ISBN: 978-3-031-29573-7
eBook Packages: Computer Science, Computer Science (R0)