Abstract
Autoencoders are powerful models for non-linear dimensionality reduction. However, their neural network structure makes it difficult to interpret how the high-dimensional features relate to the low-dimensional embedding, which is an issue in applications where explainability is important. There have been attempts to replace both neural network components of an autoencoder with interpretable genetic programming (GP) models. However, for the purposes of interpretable dimensionality reduction, we observe that replacing only the encoder with GP is sufficient. In this work, we propose the Genetic Programming Encoder for Autoencoding (GPE-AE). GPE-AE uses a multi-tree GP individual as the encoder, while retaining the neural network decoder. We demonstrate that GPE-AE is a competitive non-linear dimensionality reduction technique compared to conventional autoencoders and to a GP-based method that does not use an autoencoder structure. As visualisation is a common goal of dimensionality reduction, we also evaluate the quality of the visualisations produced by our method, and highlight the value of functional mappings by demonstrating insights that can be gained from interpreting the GP encoders.
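To make the idea of a multi-tree GP encoder concrete, the following is a minimal sketch and not the authors' implementation. It assumes each tree in an individual produces one dimension of the embedding. The function set, the nested-tuple tree encoding, and the names `eval_tree`, `encode`, and `to_formula` are all hypothetical choices made for illustration.

```python
import operator

# Hypothetical function set; the abstract does not list the one used in the paper.
FUNCS = {
    "add": (operator.add, "+"),
    "sub": (operator.sub, "-"),
    "mul": (operator.mul, "*"),
}

def eval_tree(tree, row):
    """Evaluate one expression tree on a single sample (a list of feature values)."""
    if isinstance(tree, int):      # terminal: index of an input feature
        return row[tree]
    if isinstance(tree, float):    # terminal: numeric constant
        return tree
    name, left, right = tree       # internal node: (function name, child, child)
    return FUNCS[name][0](eval_tree(left, row), eval_tree(right, row))

def encode(trees, rows):
    """Multi-tree encoder: tree i produces embedding dimension i for every sample."""
    return [[eval_tree(t, row) for t in trees] for row in rows]

def to_formula(tree, names):
    """Render a tree as a readable expression over named features."""
    if isinstance(tree, int):
        return names[tree]
    if isinstance(tree, float):
        return repr(tree)
    name, left, right = tree
    return f"({to_formula(left, names)} {FUNCS[name][1]} {to_formula(right, names)})"

# Toy individual mapping 4 features to a 2-D embedding (illustrative only).
individual = [("add", 0, ("mul", 1, 2)), ("sub", 3, 0.5)]
rows = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, -1.0, 2.5]]
embedding = encode(individual, rows)
formulas = [to_formula(t, ["f0", "f1", "f2", "f3"]) for t in individual]
```

Here `formulas` holds one human-readable expression per embedding dimension, which is the kind of functional mapping that can be inspected directly. In GPE-AE the embedding would additionally be passed to the retained neural network decoder; using the resulting reconstruction quality to guide evolution is a natural assumption for an autoencoder, but it is not stated in the abstract and is omitted from this sketch.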
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schofield, F., Slyfield, L., Lensen, A. (2023). A Genetic Programming Encoder for Increasing Autoencoder Interpretability. In: Pappa, G., Giacobini, M., Vasicek, Z. (eds) Genetic Programming. EuroGP 2023. Lecture Notes in Computer Science, vol 13986. Springer, Cham. https://doi.org/10.1007/978-3-031-29573-7_2
DOI: https://doi.org/10.1007/978-3-031-29573-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29572-0
Online ISBN: 978-3-031-29573-7
eBook Packages: Computer Science, Computer Science (R0)