Abstract
Manifold learning techniques have become increasingly valuable as datasets continue to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially reduce its dimensionality while preserving as much information as possible. However, state-of-the-art manifold learning algorithms are opaque in how they perform this transformation. Understanding how the embedding relates to the original high-dimensional space is critical in exploratory data analysis. We previously proposed a Genetic Programming method that performs manifold learning by evolving mappings that are transparent and interpretable. That method requires the dimensionality of the embedding to be known a priori, which makes it hard to use when little is known about a dataset. In this paper, we substantially extend our previous work by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality. Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods, while also providing a front of solutions that offer different trade-offs between quality and dimensionality. Furthermore, the learned models are often shown to be simple and efficient, utilising only a small number of features in an interpretable manner.
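To make the idea of a front of trade-off solutions concrete, the following minimal sketch filters a set of candidate embeddings down to its non-dominated subset. It is an illustration only, not the method proposed in the paper: the names `Candidate`, `dominates`, and `pareto_front` are hypothetical, the cost values are invented for the example, and it assumes both objectives (an embedding cost, where lower means higher quality, and the number of embedding dimensions) are to be minimised.

```python
from typing import List, Tuple

# Hypothetical candidate: (embedding_cost, n_dimensions), both to be minimised.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates, sorted by dimensionality."""
    front = [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
    return sorted(set(front), key=lambda c: c[1])


# Invented values for illustration; they are not results from the paper.
candidates = [
    (0.40, 1), (0.25, 2), (0.30, 2), (0.12, 3),
    (0.12, 5), (0.20, 4), (0.05, 8),
]
front = pareto_front(candidates)
for cost, dims in front:
    print(f"{dims} dimension(s): cost {cost:.2f}")
```

Each surviving candidate represents a different compromise: moving along the front towards more dimensions lowers the cost, and no candidate on the front can be improved on one objective without worsening the other.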
Notes
Data gathered using https://scholar.google.com/scholar?q=%22manifold+learning%22 on 15th October, 2019.
Technically, PCA does not perform manifold learning, as it does not perform non-linear dimensionality reduction, but it is still a useful baseline.
This was verified by examining the learned trees.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lensen, A., Zhang, M. & Xue, B. Multi-objective genetic programming for manifold learning: balancing quality and dimensionality. Genet Program Evolvable Mach 21, 399–431 (2020). https://doi.org/10.1007/s10710-020-09375-4
DOI: https://doi.org/10.1007/s10710-020-09375-4