ABSTRACT
Self-supervised learning (SSL) methods have been widely used to train deep learning models in the computer vision and natural language processing domains. They leverage large amounts of unlabeled data to pretrain models by learning patterns implicit in the data. Recently, new SSL techniques for tabular data have been developed, using pretext tasks that typically aim to reconstruct a corrupted input sample, ideally yielding models that act as robust feature transforms. In this paper, we pose the research question of whether genetic programming (GP) is capable of leveraging data processed with SSL methods to improve its performance. We test this hypothesis by varying the amount of labeled data across seven datasets (five OpenML benchmarking datasets and two real-world datasets). The results show that on almost all problems, standard GP is unable to capitalize on the learned representations, producing results equal to or worse than those obtained using the labeled partitions alone.
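To make the tabular pretext task concrete, the sketch below implements a VIME-style corrupt-and-reconstruct objective in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `corrupt`, `represent`, and `X_unlabeled` are hypothetical names, the corruption rate and layer size are arbitrary, and scikit-learn's `MLPRegressor` stands in for the network whose hidden layer would supply the learned features.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def corrupt(X, p=0.3, rng=None):
    """VIME-style corruption: with probability p, replace each entry with
    a value drawn from the same column of another (random) row."""
    rng = np.random.default_rng(rng)
    mask = rng.random(X.shape) < p
    shuffled = np.stack([rng.permutation(X[:, j]) for j in range(X.shape[1])],
                        axis=1)
    return np.where(mask, shuffled, X)

# Hypothetical unlabeled tabular data (1000 samples, 10 features).
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(1000, 10))

# Pretext task: reconstruct the clean sample from its corrupted version.
encoder = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
encoder.fit(corrupt(X_unlabeled, rng=0), X_unlabeled)

def represent(X, model):
    """Hidden-layer activations of the trained network: the learned
    representation that would replace the raw features."""
    W, b = model.coefs_[0], model.intercepts_[0]
    return np.maximum(X @ W + b, 0.0)  # ReLU, MLPRegressor's default

Z = represent(X_unlabeled, encoder)
print(Z.shape)  # (1000, 32)
```

In this setup, `Z` (or `represent` applied to the labeled partition) is what a GP run would consume in place of the original features when testing whether the learned representation helps.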