Multi-objective genetic programming for manifold learning: balancing quality and dimensionality

Lensen, Andrew; Zhang, Mengjie; Xue, Bing

doi:10.1007/s10710-020-09375-4

Published: 05 February 2020

Volume 21, pages 399–431 (2020)

Genetic Programming and Evolvable Machines

Abstract

Manifold learning techniques have become increasingly valuable as data continues to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially reduce the dimensionality of a dataset while preserving as much information as possible. However, state-of-the-art manifold learning algorithms are opaque in how they perform this transformation. Understanding the way in which the embedding relates to the original high-dimensional space is critical in exploratory data analysis. We previously proposed a Genetic Programming method that performed manifold learning by evolving mappings that are transparent and interpretable. This method required the dimensionality of the embedding to be known a priori, which makes it hard to use when little is known about a dataset. In this paper, we substantially extend our previous work, by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality. Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods, while also providing a range (front) of solutions that give different trade-offs between quality and dimensionality. Furthermore, the learned models are shown to often be simple and efficient, utilising only a small number of features in an interpretable manner.
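
The idea of a front of solutions trading off quality against dimensionality can be illustrated with a small non-dominated filter. This is a minimal sketch, not the authors' algorithm: it assumes each candidate embedding is summarised by a hypothetical pair (cost, dimensionality), where a lower cost means better manifold quality, and both values are minimised. The candidate values are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical summary of one candidate embedding: (cost, dimensionality).
# Both objectives are minimised; lower cost stands in for better manifold quality.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates, ordered by dimensionality."""
    front = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
    return sorted(set(front), key=lambda c: c[1])


# Made-up candidates, for illustration only.
candidates = [(0.40, 1), (0.25, 2), (0.30, 2), (0.18, 3), (0.18, 5), (0.10, 8), (0.45, 3)]
front = pareto_front(candidates)

for cost, dims in front:
    print(f"dimensions={dims}  cost={cost:.2f}")
```

Each surviving pair is a different trade-off: adding dimensions is only worthwhile where it lowers the cost, so (0.18, 5) is discarded in favour of (0.18, 3). A user exploring an unfamiliar dataset could then pick a point on the front rather than fixing the dimensionality in advance.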

Notes

Data gathered using https://scholar.google.com/scholar?q=%22manifold+learning%22 on 15th October, 2019.
Technically, PCA does not perform manifold learning, as it does not perform non-linear dimensionality reduction, but it is still a useful baseline.
This was verified by examining the learned trees.

Author information

Authors and Affiliations

School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, Wellington, 6140, New Zealand
Andrew Lensen, Mengjie Zhang & Bing Xue

Corresponding author

Correspondence to Andrew Lensen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lensen, A., Zhang, M. & Xue, B. Multi-objective genetic programming for manifold learning: balancing quality and dimensionality. Genet Program Evolvable Mach 21, 399–431 (2020). https://doi.org/10.1007/s10710-020-09375-4

Received: 15 October 2019
Revised: 18 December 2019
Published: 05 February 2020
Issue Date: September 2020
DOI: https://doi.org/10.1007/s10710-020-09375-4
