Abstract
Exploratory data analysis is a fundamental aspect of knowledge discovery that aims to find the main characteristics of a dataset. Dimensionality reduction, such as manifold learning, is often used to reduce the number of features in a dataset to a manageable level for human interpretation. Despite this, most manifold learning techniques do not explain anything about the original features nor the true characteristics of a dataset. In this paper, we propose a genetic programming approach to manifold learning called GP-MaL which evolves functional mappings from a high-dimensional space to a lower dimensional space through the use of interpretable trees. We show that GP-MaL is competitive with existing manifold learning algorithms, while producing models that can be interpreted and re-used on unseen data. A number of promising future directions of research are found in the process.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
An embedding here refers to the low-dimensional representation of the structure present in a dataset.
- 2.
Here, neighbours refer to the closest instances to a point by (Euclidean) distance.
- 3.
Five inputs were found to be a good balance between encouraging wider trees and minimising computing resources required.
- 4.
Information gain (mutual information) is often used in feature selection for classification to measure the dependency between a feature and the class label.
References
Bengio, Y., Courville, A.C., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
Cano, A., Ventura, S., Cios, K.J.: Multi-objective genetic programming for feature extraction and data visualization. Soft Comput. 21(8), 2069–2089 (2017)
Dheeru, D., Karra Taniskidou, E.: UCI machine learning repository (2017). http://archive.ics.uci.edu/ml
François, D., Wertz, V., Verleysen, M.: The concentration of fractional distances. IEEE Trans. Knowl. Data Eng. 19(7), 873–886 (2007)
Jolliffe, I.T.: Principal component analysis. In: Lovric, M. (ed.) International Encyclopedia of Statistical Science, pp. 1094–1096. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-04898-2
Kruskal, J.B.: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1), 1–27 (1964)
Lensen, A., Xue, B., Zhang, M.: New representations in genetic programming for feature construction in k-means clustering. In: Shi, Y., et al. (eds.) SEAL 2017. LNCS, vol. 10593, pp. 543–555. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68759-9_44
Lensen, A., Xue, B., Zhang, M.: Automatically evolving difficult benchmark feature selection datasets with genetic programming. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO, pp. 458–465. ACM (2018)
Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining, vol. 454. Springer, Boston (2012). https://doi.org/10.1007/978-1-4615-5689-3
van der Maaten, L.: Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15(1), 3221–3245 (2014)
van der Maaten, L., Hinton, G.E.: Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
Neshatian, K., Zhang, M., Andreae, P.: A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming. IEEE Trans. Evol. Comput. 16(5), 645–661 (2012)
Nguyen, S., Zhang, M., Alahakoon, D., Tan, K.C.: Visualizing the evolution of computer programs for genetic programming [research frontier]. IEEE Comput. Intell. Mag. 13(4), 77–94 (2018)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Poli, R., Langdon, W.B., McPhee, N.F.: A Field Guide to Genetic Programming. Lulu.com, Morrisville (2008)
Rodriguez-Coayahuitl, L., Morales-Reyes, A., Escalante, H.J.: Structurally layered representation learning: towards deep learning through genetic programming. In: Castelli, M., Sekanina, L., Zhang, M., Cagnoni, S., García-Sánchez, P. (eds.) EuroGP 2018. LNCS, vol. 10781, pp. 271–288. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77553-1_17
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
Sun, Y., Xue, B., Zhang, M., Yen, G.G.: A particle swarm optimization-based flexible convolutional auto-encoder for image classification. IEEE Trans. Neural Netw. Learn. Syst. (2018). https://doi.org/10.1109/TNNLS.2018.2881143
Sun, Y., Yen, G.G., Yi, Z.: Evolving unsupervised deep neural networks for learning meaningful representations. IEEE Trans. Evol. Comput. (2018). https://doi.org/10.1109/TEVC.2018.2808689
Tran, B., Xue, B., Zhang, M.: Genetic programming for feature construction and selection in classification on high-dimensional data. Memet. Comput. 8(1), 3–15 (2016)
Zhang, C., Liu, C., Zhang, X., Almpanidis, G.: An up-to-date comparison of state-of-the-art classification algorithms. Expert Syst. Appl. 82, 128–150 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lensen, A., Xue, B., Zhang, M. (2019). Can Genetic Programming Do Manifold Learning Too?. In: Sekanina, L., Hu, T., Lourenço, N., Richter, H., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2019. Lecture Notes in Computer Science(), vol 11451. Springer, Cham. https://doi.org/10.1007/978-3-030-16670-0_8
Download citation
DOI: https://doi.org/10.1007/978-3-030-16670-0_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16669-4
Online ISBN: 978-3-030-16670-0
eBook Packages: Computer ScienceComputer Science (R0)