Designing genetic programming classifiers with feature selection and feature construction
Introduction
Classification is a common supervised machine learning task: classifiers are learned from training data and used to predict the class labels of unseen data according to predefined features. The quality of the features has a great influence on classification performance. Feature construction and feature selection are two common feature preprocessing methods applied before classification. Feature selection [1], [2] selects effective features from the original features and removes irrelevant or redundant ones; the selected features are a subset of the original features. Feature construction [3], [4], [5] constructs one or more new features from the original features; the constructed features are higher-level features that provide new representations of the original ones.
Genetic Programming (GP) [6], [7] is an effective evolutionary computation (EC) algorithm due to its global search ability. GP individuals are usually represented as tree-like structures, which can be transformed into mathematical and logical expressions [8]. Owing to this flexibility, GP can be used for feature construction [4], [5], feature selection [9], [10] and classifier construction [11], [12].
Depending on whether the classification performance of a learning algorithm is used as the evaluation criterion, feature selection and feature construction methods are divided into wrapper and filter approaches. Wrapper-based methods use the classification performance of a learning algorithm as the evaluation criterion and generally achieve better classification performance than filter-based methods. Filter-based methods employ information measures such as Information Gain (IG) [13], Information Gain Ratio (IGR) [14], the Fisher criterion [15], [16] and correlation [17], [18] as evaluation criteria, and produce less time-costly and more general models than wrapper-based methods. For feature construction methods using GP, some instances must be extracted from a training set to search for effective constructed features. For applications with insufficient instances, models trained on such limited data can generalize poorly, thus reducing classification performance.
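To make the filter criteria above concrete, the following sketch computes Information Gain for a discrete feature; the function names and the toy data are illustrative, not from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG of a discrete feature: H(labels) minus the weighted
    entropy of the label subsets induced by each feature value."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# A feature that perfectly separates the classes scores H(labels);
# a feature unrelated to the classes scores 0.
labels = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], labels))  # 1.0 (fully informative)
print(information_gain([0, 1, 0, 1], labels))  # 0.0 (irrelevant)
```

Because such measures score features without training a classifier, a filter method only ranks and thresholds these scores, which is what makes it cheaper and more general than a wrapper.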
Various classification algorithms, such as K-nearest neighbors (KNN) [19], decision trees [20], Naive Bayes (NB) [21] and support vector machines (SVM) [22], are used to train classifiers. Compared with these algorithms, less attention has been paid to the construction of GP classifiers. Some traditional machine learning algorithms assume that each feature is independent and that there are no relationships between features. In fact, some applications contain interrelated features. GP can automatically discover the hidden relationships between features and construct new features. A classifier constructed by GP thus has a built-in feature construction capability and can achieve particularly good classification performance in applications with interrelated features.
GP can automatically select effective original features as terminal nodes and choose arithmetic and logical operators from a function set as internal nodes. After evolving for generations, the GP expressions can discriminate between classes; a GP expression is then equivalent to a GP classifier. Because GP selects original features as terminal nodes automatically, some studies [23], [24], [25] do not restrict the number of distinct features among the terminal nodes and consider only the classification performance of the GP classification rules. Muni et al. (2006) [11] and Purohit et al. (2010) [12] restricted the number of selected features to achieve feature selection and classifier construction simultaneously. However, no further experiments verified the negative impact of irrelevant or redundant features on GP classifiers. To make GP classifiers easy to interpret, some studies [26], [27], [28], [29] consider not only the classification performance but also constrain the functional complexity of the classification rules. Other work focuses on solving multi-class classification problems using GP [30], [31], [32], [33], [34] and on designing multi-objective GP classifiers [35], [36], [37].
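Restricting the number of distinct features requires counting how many different original features appear as terminal nodes in a tree. A minimal sketch, with the nested-tuple tree encoding and the 'f<i>' terminal naming being our own illustrative conventions:

```python
# A GP individual sketched as a nested tuple (operator, child, child);
# terminals are either a feature name like 'f3' or a numeric constant.
def distinct_features(tree):
    """Count the distinct original features used as terminal nodes."""
    found = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):      # internal node: (op, *children)
            stack.extend(node[1:])
        elif isinstance(node, str):      # terminal: feature name
            found.add(node)
    return len(found)

# (f0 * f2) + (f2 - 1.5) uses two distinct features, f0 and f2.
rule = ('+', ('*', 'f0', 'f2'), ('-', 'f2', 1.5))
print(distinct_features(rule))  # 2
```

Note that a feature reused several times (f2 above) counts once, which is the quantity a feature-selection constraint would penalize.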
In applications with higher dimensionality, the irrelevant and redundant features among the original features impair the search ability of GP and make it prone to falling into local optima. However, the impact of irrelevant and redundant features on GP classifiers has not been verified in previous studies, and whether constraining these features can improve the performance of GP classifiers remains to be tested. Moreover, other characteristics of GP classifiers need further investigation. In this paper, we propose two GP classifier construction methods. The first (GPMO) uses a multi-objective fitness function that minimizes both the classification errors and the number of selected features, restricting the number of irrelevant and redundant original features used as GP terminal nodes. The second (FSGPMO) first uses a feature selection method (LFS) to remove irrelevant and redundant features and then applies GPMO to construct classifiers, further reducing the impact of irrelevant and redundant features on classification performance. In addition, extensive experiments are conducted to verify whether the proposed GPMO and FSGPMO have advantages over GP classifiers with a single-objective fitness function, other GP-based classifiers, other classification algorithms and wrapper-based feature construction methods, and whether bloat and overfitting occur during GP evolution.
The overall goal of this paper is to propose two GP classifier construction methods, GPMO and FSGPMO, to reduce the impact of irrelevant and redundant features on classification performance. The following five objectives will be investigated in order to achieve our overall goal.
Objective 1: Propose a GP classifier named GPMO that uses a multi-objective fitness function to restrict the number of irrelevant and redundant original features as GP’s terminal nodes.
Objective 2: Propose another GP classifier named FSGPMO that first uses a feature selection method to remove irrelevant and redundant features and then uses GPMO to construct classifiers.
Objective 3: Compare FSGPMO and GPMO with GPSO, a GP classifier with a single-objective fitness function, and investigate whether FSGPMO and GPMO can achieve better classification performance than GPSO.
Objective 4: Compare GPMO with two other GP-based classifiers, other classification algorithms and wrapper-based feature construction methods using GP, and verify the effectiveness of our proposed GPMO.
Objective 5: Investigate whether GPMO suffers from bloat and overfitting, and investigate the benefits of GP classifiers over other machine learning algorithms.
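The paper does not spell out here how GPMO combines its two objectives, but a simple weighted scalarization conveys the idea; the function name, the weight alpha and the example numbers are hypothetical.

```python
def gpmo_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Hypothetical weighted combination of the two GPMO objectives:
    minimize classification error and the fraction of features used."""
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)

# A rule with slightly higher error but far fewer features can still
# win under the combined criterion (lower fitness is better).
print(gpmo_fitness(0.10, 2, 50))   # 0.094
print(gpmo_fitness(0.08, 40, 50))  # 0.152
```

The point of such a criterion is exactly Objective 1: an individual that reaches comparable accuracy while touching fewer original features is preferred, which pressures evolution away from irrelevant and redundant terminals.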
The remainder of the paper is organized as follows. Section 2 provides background information involved in this paper. Section 3 describes the construction methods of proposed GP classifiers. Section 4 provides our experimental methods and Section 5 presents our experimental results and discussions. Section 6 is our conclusions and future work.
Genetic Programming (GP)
Evolutionary computation (EC) techniques are inspired by Darwin’s theory of evolution. Genetic algorithms (GA), particle swarm optimization (PSO) and genetic programming (GP) are commonly used EC algorithms. Compared with other EC algorithms, GP has the advantage of a flexible representation, so it can be designed for a variety of applications. GP evolves computer programs, which are usually represented as tree-like structures [6], [7]. GP starts with a population that is randomly generated and improves it over generations through selection and genetic operators.
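The generational scheme described above can be sketched generically; the operator rates, tournament size and the toy one-dimensional demo below are illustrative choices, not the paper's settings.

```python
import random

def evolve(pop_size, fitness, random_individual, crossover, mutate,
           generations=60, cx_rate=0.8, mut_rate=0.15):
    """Minimal generational evolutionary loop: tournament selection,
    then crossover, mutation or reproduction to fill the next population."""
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        def select():  # tournament of size 3, lower fitness is better
            return min(random.sample(pop, 3), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            r = random.random()
            if r < cx_rate:
                nxt.append(crossover(select(), select()))
            elif r < cx_rate + mut_rate:
                nxt.append(mutate(select()))
            else:
                nxt.append(select())  # reproduction copies a survivor
        pop = nxt
    return min(pop, key=fitness)

# Toy demo: individuals are plain numbers and we minimize |x - 7|.
random.seed(0)
best = evolve(30, lambda x: abs(x - 7),
              lambda: random.uniform(0, 100),
              lambda a, b: (a + b) / 2,
              lambda a: a + random.gauss(0, 1))
print(round(best, 2))  # converges near 7
```

In actual GP the individuals would be expression trees rather than numbers, with subtree crossover and subtree mutation, but the surrounding loop is the same.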
Methodology
We use standard GP representation methods to design a classifier. The GP individuals are represented as tree-like structures. The genetic operators of GP, including reproduction, crossover and mutation, follow traditional methods.
A GP individual is equivalent to a GP classifier. A demonstration of a GP classifier is shown in Fig. 1. The terminal nodes of a GP classifier are randomly selected from the original features {f1, f2, ..., fn}, where n is the number of original features and fi denotes the i-th original feature.
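Evaluating such an individual on an instance means recursively evaluating the tree; a minimal sketch, with the nested-tuple encoding, the 'f<i>' terminal naming and the protected-division choice being our own assumptions.

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
       '/': lambda a, b: a / b if b != 0 else 1.0}  # protected division

def evaluate(tree, x):
    """Evaluate a nested-tuple GP tree on instance x (a feature vector).
    Terminals are either 'f<i>' (the i-th original feature) or constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    if isinstance(tree, str):
        return x[int(tree[1:])]  # feature terminal, e.g. 'f2' -> x[2]
    return tree                  # numeric constant terminal

rule = ('-', ('*', 'f0', 'f2'), 2.0)    # f0 * f2 - 2.0
print(evaluate(rule, [1.0, 5.0, 3.0]))  # 1.0
```

The single numeric output of this evaluation is what the decision rule later thresholds to assign a class label.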
Benchmark techniques
To verify our proposed GP classifier, two benchmarks are chosen for comparison.
The first is proposed by Bojarczuk [28]. The GP classifier in this method is constrained by a fitness function that makes the classification rules simpler and easier to interpret. The fitness function considers three factors: sensitivity (Se), specificity (Sp) and simplicity (Sy). The goal of the GP classifier is to maximize both Se and Sp while simultaneously minimizing the size of the classification rule, which maximizes Sy. The fitness is the product of the three factors.
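A sketch of this product-form fitness from a rule's confusion-matrix counts; the linear size-based simplicity term and the max_nodes default are one plausible formulation, not necessarily the exact one used in [28].

```python
def sensitivity(tp, fn):
    """Fraction of positive instances the rule classifies correctly."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of negative instances the rule classifies correctly."""
    return tn / (tn + fp)

def simplicity(num_nodes, max_nodes=65):
    """Assumed size-based simplicity term: near 1.0 for a tiny rule,
    decreasing linearly as the tree approaches max_nodes."""
    return (max_nodes - 0.5 * num_nodes - 0.5) / (max_nodes - 1)

def fitness(tp, fn, tn, fp, num_nodes):
    """Product of the three factors: rewards rules that are both
    accurate on each class and small."""
    return sensitivity(tp, fn) * specificity(tn, fp) * simplicity(num_nodes)

# Se = 0.9, Sp = 0.8, Sy = 0.9375 for a 9-node rule.
print(round(fitness(45, 5, 40, 10, 9), 3))  # 0.675
```

Because the factors are multiplied, a rule that collapses on either class (Se or Sp near 0) scores near 0 regardless of its simplicity, which is the intent of the product form.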
The visualization of GP classifiers
A GP classifier is equivalent to a classification rule. An instance is the input of a GP classifier. For binary classification tasks, if the output of the GP classifier is less than 0, the instance belongs to class 1; otherwise it belongs to class 2. Fig. 4 demonstrates a GP classifier on the Liver-disorders, HillValley, Ionosphere and Wdbc datasets. In Fig. 4, the x-axis value is the output of the GP classifier and the y-axis value is a random number used to disperse the one-dimensional outputs into a two-dimensional scatter plot.
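The decision rule and the random-jitter visualization trick can be sketched in a few lines; the function names are illustrative.

```python
import random

def classify(output):
    """Decision rule from the text: output < 0 -> class 1, else class 2."""
    return 1 if output < 0 else 2

def scatter_points(outputs):
    """Pair each one-dimensional classifier output with a random y-value
    so the outputs can be dispersed into a 2-D scatter plot, as in Fig. 4."""
    return [(o, random.random(), classify(o)) for o in outputs]

points = scatter_points([-1.2, -0.3, 0.4, 2.1])
print([cls for _, _, cls in points])  # [1, 1, 2, 2]
```

Plotting the (output, jitter) pairs colored by class makes the margin around 0 visible at a glance: well-separated classes leave a clear gap along the x-axis.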
Conclusions and future work
To restrict the negative impact of irrelevant and redundant features on GP classifiers, this paper proposes a GP classifier named GPMO that uses a multi-objective fitness function to decrease both the classification error rate and the number of selected features, and another GP classifier named FSGPMO that first uses the LFS feature selection method to remove irrelevant and redundant features and then uses GPMO to construct a classifier. Experiments on twelve datasets show that GPMO and FSGPMO have advantages over the compared methods.
CRediT authorship contribution statement
Jianbin Ma: Conceptualization, Methodology, Software, Validation, Supervision, Investigation, Writing - original draft. Xiaoying Gao: Writing - review & editing, Formal analysis, Visualization.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the Key R&D Program of Hebei Province, China (No. 20327405D) and Hebei Provincial Department of Human Resources and Social Security, China (No. CN201709).
References (82)
- et al., Feature selection for imbalanced data based on neighborhood rough sets, Inform. Sci. (2019)
- et al., A hybrid multiple feature construction approach using genetic programming, Appl. Soft Comput. (2019)
- et al., Differentially private Naive Bayes learning over multiple data sources, Inform. Sci. (2018)
- et al., A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets, Artif. Intell. Med. (2004)
- et al., Multiobjective genetic programming for maximizing ROC performance, Neurocomputing (2014)
- Using J-pruning to reduce overfitting in classification trees, Knowl.-Based Syst. (2002)
- et al., Information complexity of neural networks, Neural Netw. (2000)
- et al., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. System Sci. (1997)
- Stacked generalization, Neural Netw. (1992)
- et al., Breast cancer diagnosis using genetic programming generated feature, Pattern Recognit. (2006)
- A filter-based feature construction and feature selection approach for classification using genetic programming, Knowl.-Based Syst.
- Classifier design with feature selection and feature extraction using layered genetic programming, Expert Syst. Appl.
- Classification of foreign fibers in cotton lint using machine vision and multi-class support vector machine, Comput. Electron. Agric.
- Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognit.
- A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput.
- Genetic programming for feature construction and selection in classification on high-dimensional data, Memet. Comput.
- A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming, IEEE Trans. Evol. Comput.
- Genetic Programming: On the Programming of Computers by Means of Natural Selection
- Genetic Programming III: Darwinian Invention and Problem Solving, IEEE Trans. Evol. Comput.
- A survey on the application of genetic programming to classification, IEEE Trans. Syst. Man Cybern. C
- Genetic programming for simultaneous feature selection and classifier design, IEEE Trans. Syst. Man Cybern. B
- Evolutionary constructive induction, IEEE Trans. Knowl. Data Eng.
- Genetic programming for attribute construction in data mining
- Feature generation using genetic programming with application to fault classification, IEEE Trans. Syst. Man Cybern. B
- Feature extraction and dimensionality reduction by genetic programming based on the Fisher criterion, Expert Syst.
- A hybrid method for feature construction and selection to improve wind-damage prediction in the forestry sector
- The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl.
- K-nearest neighbor classification over semantically secure encrypted relational data, IEEE Trans. Knowl. Data Eng.
- C4.5: Programs for Machine Learning
- The Nature of Statistical Learning Theory
- Genetic programming for automatic target classification and recognition in synthetic aperture radar imagery
- Discovering interesting classification rules with genetic programming, Appl. Soft Comput.
- Genetic programming for knowledge discovery in chest-pain diagnosis, IEEE Eng. Med. Biol. Mag.
- A novel approach to design classifiers using genetic programming, IEEE Trans. Evol. Comput.
- Using Gaussian distribution to construct fitness functions in genetic programming for multiclass object classification, Pattern Recognit. Lett.