
Multi-objective genetic programming for manifold learning: balancing quality and dimensionality

Published in Genetic Programming and Evolvable Machines.

Abstract

Manifold learning techniques have become increasingly valuable as data continues to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially reduce the dimensionality of a dataset while preserving as much information as possible. However, state-of-the-art manifold learning algorithms are opaque in how they perform this transformation. Understanding the way in which the embedding relates to the original high-dimensional space is critical in exploratory data analysis. We previously proposed a Genetic Programming method that performed manifold learning by evolving mappings that are transparent and interpretable. This method required the dimensionality of the embedding to be known a priori, which makes it hard to use when little is known about a dataset. In this paper, we substantially extend our previous work, by introducing a multi-objective approach that automatically balances the competing objectives of manifold quality and dimensionality. Our proposed approach is competitive with a range of baseline and state-of-the-art manifold learning methods, while also providing a range (front) of solutions that give different trade-offs between quality and dimensionality. Furthermore, the learned models are shown to often be simple and efficient, utilising only a small number of features in an interpretable manner.
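The core idea in the abstract — trading off embedding quality against dimensionality and returning a front of non-dominated solutions — can be illustrated with a minimal Pareto-dominance sketch. This is an illustrative stand-in, not the paper's actual fitness function or algorithm: the quality proxy here (correlation between high- and low-dimensional pairwise distances) and all function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X):
    """Condensed vector of pairwise Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(X), k=1)
    return D[i, j]

def objectives(X_high, X_low):
    """(cost, dimensionality) for one candidate embedding; lower is better for both.

    Cost is 1 minus the correlation of pairwise distances, a simple
    structure-preservation proxy (not the measure used in the paper).
    """
    r = np.corrcoef(pairwise_distances(X_high), pairwise_distances(X_low))[0, 1]
    return (1.0 - r, X_low.shape[1])

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Non-dominated subset: the quality/dimensionality trade-off front."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

A multi-objective EMO algorithm (e.g. an NSGA-II-style selection) would apply such dominance comparisons to GP individuals each generation, so the final population approximates the front of quality/dimensionality trade-offs described above.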




Notes

  1. Data gathered using https://scholar.google.com/scholar?q=%22manifold+learning%22 on 15 October 2019.

  2. Technically, PCA does not perform manifold learning, as it does not perform non-linear dimensionality reduction, but it is still a useful baseline.

  3. This was verified by examining the learned trees.
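Note 2 above uses PCA as a linear baseline for comparison against the non-linear manifold learners. A minimal numpy-only PCA via SVD of the centred data illustrates that baseline; this is a sketch for exposition (the function name `pca_embed` is hypothetical), and experiments would more typically use an established implementation such as scikit-learn's `PCA`.

```python
import numpy as np

def pca_embed(X, k):
    """Project X onto its top-k principal components (linear baseline).

    Unlike the non-linear manifold learning methods discussed in the
    paper, this can only recover linear structure in the data.
    """
    Xc = X - X.mean(axis=0)                        # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # k-dimensional scores
```

Because PCA is a linear orthogonal projection, data lying on a curved manifold (e.g. a Swiss roll) cannot be flattened by it, which is why it serves only as a baseline rather than a true manifold learning method.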


Author information


Corresponding author

Correspondence to Andrew Lensen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lensen, A., Zhang, M. & Xue, B. Multi-objective genetic programming for manifold learning: balancing quality and dimensionality. Genet Program Evolvable Mach 21, 399–431 (2020). https://doi.org/10.1007/s10710-020-09375-4

