Designing genetic programming classifiers with feature selection and feature construction

https://doi.org/10.1016/j.asoc.2020.106826

Highlights

  • A GP classifier with a multi-objective fitness function is proposed.

  • A second GP classifier that applies feature selection before GP classifier construction is proposed.

  • Our proposed GP classifiers have advantages over GP classifiers with a single-objective fitness function.

  • Our proposed GP classifier has advantages over two other GP-based classifiers, other classification algorithms and wrapper-based feature construction methods using GP.

  • Bloat occurs during GP evolution, while overfitting is not obvious.

Abstract

Due to its flexibility, Genetic Programming (GP) has been used for feature construction, feature selection and classifier construction. In this paper, GP classifiers with feature selection and feature construction are investigated to obtain simple and effective classification rules. During the construction of a GP classifier, irrelevant and redundant features impair the search ability of GP and make it prone to falling into local optima. This paper proposes two new GP classifier construction methods to limit the negative impact of irrelevant and redundant features on GP classifiers. The first, named GPMO, uses a multi-objective fitness function that decreases both the classification error rate and the number of selected features. The second, named FSGPMO, first applies a feature selection method, linear forward selection (LFS), to remove irrelevant and redundant features and then uses GPMO to construct classifiers. Experiments on twelve datasets show that GPMO and FSGPMO have advantages over GPSO, a GP classifier with a single-objective fitness function, in terms of classification performance, the number of selected features, time cost and function complexity. FSGPMO achieves better classification performance than GPMO on higher-dimensional datasets; however, it may remove potentially effective features for the GP classifier and achieve much lower classification performance than GPMO on some datasets. Compared with two other GP-based classifiers, GPMO significantly improves classification performance. Comparisons with other classification algorithms show that GPMO achieves better or comparable classification performance on most of the selected datasets. GPMO also outperforms wrapper-based feature construction methods using GP on applications with insufficient instances. Further investigation shows that bloat occurs during GP evolution, while overfitting is not obvious. Moreover, the benefits of GP over other machine learning algorithms are discussed.

Introduction

Classification is a common supervised machine learning task: classifiers are learned from training data and then used to predict the class labels of unseen data according to predefined features. The quality of features has a great influence on classification performance. Feature construction and feature selection are two common feature preprocessing methods applied before classification. Feature selection [1], [2] selects effective features from the original features and removes irrelevant or redundant ones; the selected features are a subset of the original features. Feature construction [3], [4], [5] builds one or more new features from the original features; constructed features are higher-level features that provide new representations of the original features.
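To make the distinction concrete, the following minimal sketch contrasts the two preprocessing steps; the data values, the chosen feature subset and the constructed expression are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy dataset: four original features per instance (illustrative values).
X = np.array([[1.0, 5.0, 2.0, 0.3],
              [4.0, 1.5, 0.5, 2.2]])

# Feature selection: keep a subset of the original columns (here f1 and f3).
selected = X[:, [0, 2]]

# Feature construction: build a new, higher-level feature from the originals,
# e.g. f1 * f3 - f2 (the operators and features are arbitrary examples).
constructed = X[:, 0] * X[:, 2] - X[:, 1]
```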

Genetic Programming (GP) [6], [7] is an effective evolutionary computation (EC) algorithm due to its global search ability. A GP individual is usually represented as a tree-like structure, which can be transformed into mathematical and logical expressions [8]. Due to this flexibility, GP can be used for feature construction [4], [5], feature selection [9], [10] and classifier construction [11], [12].

Depending on whether the classification performance of a learning algorithm is used as the evaluation criterion, feature selection and feature construction methods are divided into wrapper and filter approaches. Wrapper-based methods use the classification performance of a learning algorithm as the evaluation criterion and generally achieve better classification performance than filter-based methods, while filter-based methods employ information measures such as Information Gain (IG) [13], Information Gain Ratio (IGR) [14], the Fisher criterion [15], [16] and correlation [17], [18] as evaluation criteria, and produce less time-costly and more general models than wrapper-based methods. For feature construction methods using GP, some instances must be extracted from the training set to search for effective constructed features. For applications with insufficient instances, the models trained on such limited data can have poor generalization ability, thus reducing classification performance.
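The sketch below contrasts the two evaluation criteria; the dataset, the mutual-information measure and the KNN classifier are illustrative choices, not the experimental setup of this paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Filter approach: score each feature with an information measure
# (mutual information here), independent of any classifier.
filter_scores = mutual_info_classif(X, y, random_state=0)
filter_top = np.argsort(filter_scores)[::-1][:5]   # five best-scoring features

# Wrapper approach: score a candidate feature subset by the cross-validated
# accuracy of an actual learning algorithm (KNN here).
def wrapper_score(subset):
    return cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=5).mean()

print(wrapper_score(list(filter_top)))
```

The wrapper score is tied to a specific learner and must retrain it for every candidate subset, which is exactly the cost and generality trade-off described above.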

Different kinds of classification algorithms, such as K-nearest neighbors (KNN) [19], decision trees [20], Naive Bayes (NB) [21] and support vector machines (SVM) [22], are used to train classifiers. Compared with these algorithms, the construction of GP classifiers has received less attention. Some traditional machine learning algorithms assume that each feature is independent and that there is no relationship between features. In fact, some applications contain interrelated features. GP can automatically discover the hidden relationships between features and construct new features. A classifier constructed by GP therefore has the characteristic of feature construction built in, and can achieve particularly good classification performance in applications with interrelated features.

GP can automatically select effective original features as terminal nodes and choose arithmetic and logical operators from a function set as internal nodes. After evolving for a number of generations, the GP expressions acquire the ability to discriminate between classes; a GP expression is then equivalent to a GP classifier. Because GP can automatically select original features as terminal nodes, some studies [23], [24], [25] do not restrict the number of distinct features within terminal nodes and only consider the classification performance of GP classification rules. Muni et al. (2006) [11] and Purohit et al. (2010) [12] considered restricting the number of selected features to achieve simultaneous feature selection and classifier construction. However, no further experiments have verified the negative impact of irrelevant or redundant features on GP classifiers. To make GP classifiers easy to interpret, some studies [26], [27], [28], [29] consider not only the classification performance but also constrain the function complexity of the classification rules. Other work focuses on solving multi-class classification problems using GP [30], [31], [32], [33], [34] and on designing multi-objective GP classifiers [35], [36], [37].

In higher-dimensional applications, the irrelevant and redundant features among the original features impair the search ability of GP and make it prone to falling into local optima. However, the impact of irrelevant and redundant features on GP classifiers has not been verified in the previous literature, and whether constraining irrelevant and redundant features can improve the performance of GP classifiers remains to be tested. Moreover, other characteristics of GP classifiers need further investigation. In this paper, we propose two GP classifier construction methods. The first (GPMO) uses a multi-objective fitness function that decreases both the classification error and the number of selected features, thereby restricting the number of irrelevant and redundant original features used as GP's terminal nodes; a sketch of such a fitness function is given below. The second (FSGPMO) first uses a feature selection method (LFS) to remove irrelevant and redundant features and then uses GPMO to construct classifiers, further reducing the impact of irrelevant and redundant features on classification performance. In addition, extensive experiments are conducted to verify whether the proposed GPMO and FSGPMO have advantages over GP classifiers with a single-objective fitness function, other GP-based classifiers, other classification algorithms and wrapper-based feature construction methods, and whether bloat and overfitting occur during GP evolution.
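The paper's exact fitness definition is given in Section 3; the sketch below shows one common way to combine the two objectives, a weighted sum, where the weight alpha and the normalization by the total feature count are assumptions for illustration rather than the paper's definition.

```python
def gpmo_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Hypothetical weighted-sum fitness for GPMO-style evaluation.

    error_rate -- classification error rate of the GP tree on training data
    n_selected -- number of distinct original features used as terminal nodes
    n_total    -- number of original features in the dataset
    alpha      -- assumed trade-off weight (not from the paper)

    Lower is better: both objectives are minimized simultaneously.
    """
    return alpha * error_rate + (1 - alpha) * (n_selected / n_total)
```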

The overall goal of this paper is to propose two GP classifier construction methods, GPMO and FSGPMO, to reduce the impact of irrelevant and redundant features on classification performance. The following five objectives will be investigated in order to achieve our overall goal.

Objective 1: Propose a GP classifier named GPMO that uses a multi-objective fitness function to restrict the number of irrelevant and redundant original features as GP’s terminal nodes.

Objective 2: Propose another GP classifier named FSGPMO that first uses a feature selection method to remove irrelevant and redundant features and then uses GPMO to construct classifiers.

Objective 3: Compare FSGPMO and GPMO with GPSO, a GP classifier with a single-objective fitness function, and investigate whether FSGPMO and GPMO can achieve better classification performance than GPSO.

Objective 4: Compare GPMO with two other GP-based classifiers, other classification algorithms and wrapper-based feature construction methods using GP, and verify the effectiveness of our proposed GPMO.

Objective 5: Investigate whether GPMO exhibits bloat and overfitting, and investigate the benefits of GP classifiers over other machine learning algorithms.

The remainder of the paper is organized as follows. Section 2 provides background information involved in this paper. Section 3 describes the construction methods of proposed GP classifiers. Section 4 provides our experimental methods and Section 5 presents our experimental results and discussions. Section 6 is our conclusions and future work.

Section snippets

Genetic Programming (GP)

Evolutionary computation (EC) techniques are inspired by Darwin's theory of evolution. Genetic algorithms (GA), particle swarm optimization (PSO) and genetic programming (GP) are commonly used EC algorithms. Compared with other EC algorithms, GP has the advantage of a flexible representation, so it can be designed for a variety of applications. GP evolves computer programs, which can usually be represented as tree-like structures [6], [7]. GP starts with a population that …
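As a minimal sketch of this representation, the snippet below grows random expression trees over a set of original features and evaluates one on an instance; the function set, depth limit and growth probability are illustrative assumptions.

```python
import random

random.seed(1)
FUNCTIONS = {'+': lambda a, b: a + b,
             '-': lambda a, b: a - b,
             '*': lambda a, b: a * b}

def random_tree(n_features, depth=3):
    """Grow a random expression tree: internal nodes are operators from the
    function set, leaves are indices of original features (terminal nodes)."""
    if depth == 0 or random.random() < 0.3:
        return ('f', random.randrange(n_features))
    op = random.choice(list(FUNCTIONS))
    return (op, random_tree(n_features, depth - 1),
                random_tree(n_features, depth - 1))

def evaluate(tree, x):
    """Evaluate a tree bottom-up on one instance x (a feature vector)."""
    if tree[0] == 'f':
        return x[tree[1]]
    return FUNCTIONS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

# A population is a list of such trees; each generation, fitter trees are
# selected and recombined by subtree crossover and mutation.
population = [random_tree(n_features=4) for _ in range(10)]
print(evaluate(population[0], [0.5, 1.2, 3.0, 0.7]))
```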

Methodology

We use standard GP representation methods to design a classifier. The GP individuals are represented as tree-like structures. The genetic operators of GP, including reproduction, crossover and mutation, follow traditional methods.

A GP individual is equivalent to a GP classifier. A GP classifier is demonstrated in Fig. 1. The terminal nodes of a GP classifier are randomly selected from the original features F_o = {f_1, f_2, …, f_n}, where n is the number of original features and f_j denotes the j-th original …
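The following self-contained sketch shows such an individual as a nested-tuple tree, classified with the paper's binary decision rule (an output below zero means class 1, otherwise class 2); the particular tree, computing f0 * f2 - f1, and the instance values are hypothetical.

```python
# A hypothetical GP classifier, written as a nested-tuple expression tree
# that computes f0 * f2 - f1 over an instance's original features.
tree = ('-', ('*', ('f', 0), ('f', 2)), ('f', 1))

def evaluate(node, x):
    """Recursively evaluate the expression tree on instance x."""
    if node[0] == 'f':                      # terminal node: an original feature
        return x[node[1]]
    a, b = evaluate(node[1], x), evaluate(node[2], x)
    return {'+': a + b, '-': a - b, '*': a * b}[node[0]]

def classify(node, x):
    """Binary decision rule from the paper: output < 0 -> class 1, else class 2."""
    return 1 if evaluate(node, x) < 0 else 2

def used_features(node):
    """Distinct original features appearing as terminals; their count is the
    second objective minimized by GPMO."""
    if node[0] == 'f':
        return {node[1]}
    return used_features(node[1]) | used_features(node[2])

print(classify(tree, [2.0, 9.0, 1.5]))  # 2.0*1.5 - 9.0 = -6.0 < 0 -> class 1
print(used_features(tree))              # {0, 1, 2}: three selected features
```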

Benchmark techniques

To verify our proposed GP classifier, two benchmarks are chosen for comparison.

The first is the method proposed by Bojarczuk et al. [28]. The GP classifier in this method is constrained by a fitness function that makes the classification rules simpler and easier to interpret. The fitness function considers three factors: sensitivity (Se), specificity (Sp) and simplicity (Sy). The goal of the GP classifier is to maximize both Se and Sp while simultaneously minimizing the size of the classification rule. The fitness …
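Se and Sp have their standard confusion-matrix definitions; how the three factors are combined is truncated in the snippet above, so the product used below is an assumption for illustration, not necessarily the cited paper's exact formula.

```python
def sensitivity(tp, fn):
    """Se = TP / (TP + FN): fraction of positives correctly recognized."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Sp = TN / (TN + FP): fraction of negatives correctly recognized."""
    return tn / (tn + fp)

def bojarczuk_fitness(tp, tn, fp, fn, simplicity):
    """One plausible combination of the three factors: the product
    Se * Sp * Sy rewards accuracy on both classes and rule simplicity.
    The exact combination and the definition of Sy are truncated above,
    so this product is an illustrative assumption."""
    return sensitivity(tp, fn) * specificity(tn, fp) * simplicity
```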

The visualization of GP classifiers

A GP classifier is equivalent to a classification rule, and an instance is its input. For binary classification tasks, if the output of the GP classifier is less than 0, the instance belongs to class 1; otherwise it belongs to class 2. Fig. 4 demonstrates a GP classifier on the Liver-disorders, HillValley, Ionosphere and Wdbc datasets. In Fig. 4, the X-axis value is the output of the GP classifier and the Y-axis value is a random number used to disperse the one-dimensional data into …
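A plot of this kind can be reproduced with the sketch below; the Gaussian values stand in for a real GP classifier's outputs on an actual dataset and are placeholders only.

```python
import numpy as np
import matplotlib.pyplot as plt

# outputs: GP classifier outputs for each instance; labels: true classes.
# Random placeholders stand in for a real dataset's values here.
rng = np.random.default_rng(0)
outputs = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
labels = np.array([1] * 50 + [2] * 50)

# Y is random jitter purely to spread the one-dimensional outputs for display.
jitter = rng.uniform(0, 1, outputs.size)
for cls, marker in [(1, 'o'), (2, 'x')]:
    m = labels == cls
    plt.scatter(outputs[m], jitter[m], marker=marker, label=f'class {cls}')
plt.axvline(0, linestyle='--')   # decision threshold: output < 0 -> class 1
plt.xlabel('GP classifier output')
plt.legend()
plt.show()
```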

Conclusions and future work

To limit the negative impact of irrelevant and redundant features on GP classifiers, this paper proposes a GP classifier named GPMO, which uses a multi-objective fitness function to decrease both the classification error rate and the number of selected features, and another GP classifier named FSGPMO, which first uses the LFS feature selection method to remove irrelevant and redundant features and then uses GPMO to construct a classifier. The experiments on twelve datasets show that GPMO and FSGPMO have advantages …

CRediT authorship contribution statement

Jianbin Ma: Conceptualization, Methodology, Software, Validation, Supervision, Investigation, Writing - original draft. Xiaoying Gao: Writing - review & editing, Formal analysis, Visualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Key R&D Program of Hebei Province, China (No. 20327405D) and Hebei Provincial Department of Human Resources and Social Security, China (No. CN201709).

References (82)

  • Ma, J., et al., A filter-based feature construction and feature selection approach for classification using genetic programming, Knowl.-Based Syst. (2020).
  • Lin, J.Y., et al., Classifier design with feature selection and feature extraction using layered genetic programming, Expert Syst. Appl. (2008).
  • Li, D., et al., Classification of foreign fibers in cotton lint using machine vision and multi-class support vector machine, Comput. Electron. Agric. (2010).
  • Zhu, Z., et al., Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognit. (2007).
  • Xue, B., et al., A survey on evolutionary computation approaches to feature selection, IEEE Trans. Evol. Comput. (2016).
  • Tran, B., et al., Genetic programming for feature construction and selection in classification on high-dimensional data, Memet. Comput. (2016).
  • Neshatian, K., et al., A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming, IEEE Trans. Evol. Comput. (2012).
  • Koza, J.R., Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992).
  • Koza, J.R., et al., Genetic programming III - Darwinian invention and problem solving, IEEE Trans. Evol. Comput. (2002).
  • Espejo, P.G., et al., A survey on the application of genetic programming to classification, IEEE Trans. Syst. Man Cybern. C (2010).
  • Neshatian, K., Zhang, M., Pareto front feature selection: using genetic programming to explore feature space, in: ...
  • Neshatian, K., Zhang, M., Dimensionality reduction in face detection: A genetic programming approach, in: Proceedings of ...
  • Muni, D.P., et al., Genetic programming for simultaneous feature selection and classifier design, IEEE Trans. Syst. Man Cybern. B (2006).
  • Purohit, A., Chaudhari, N.S., Tiwari, A., Construction of classifier with feature selection based on genetic programming, ...
  • Muharram, M., et al., Evolutionary constructive induction, IEEE Trans. Knowl. Data Eng. (2005).
  • Otero, F.E.B., et al., Genetic programming for attribute construction in data mining.
  • Guo, H., et al., Feature generation using genetic programming with application to fault classification, IEEE Trans. Syst. Man Cybern. B (2005).
  • Guo, H., et al., Feature extraction and dimensionality reduction by genetic programming based on the Fisher criterion, Expert Syst. (2008).
  • Hart, E., et al., A hybrid method for feature construction and selection to improve wind-damage prediction in the forestry sector.
  • Hall, M., et al., The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl. (2009).
  • Samanthula, B.K., et al., K-nearest neighbor classification over semantically secure encrypted relational data, IEEE Trans. Knowl. Data Eng. (2014).
  • Quinlan, J.R., C4.5: Programs for Machine Learning (1993).
  • Vapnik, V.N., The Nature of Statistical Learning Theory (2000).
  • Rauss, P.J., Daida, J.M., Classification of spectral imagery using genetic programming, in: Proceedings of the 2000 ...
  • Stanhope, S., Genetic programming for automatic target classification and recognition in synthetic aperture radar imagery.
  • Chien, B.C., Yang, J.H., Lin, W.Y., Generating effective classifiers with supervised learning of genetic programming, in: ...
  • Falco, I.D., et al., Discovering interesting classification rules with genetic programming, Appl. Soft Comput. (2002).
  • Bojarczuk, C.C., et al., Genetic programming for knowledge discovery in chest-pain diagnosis, IEEE Eng. Med. Biol. Mag. (2000).
  • Zhang, M., Yun, Z., Smart, W.D., Program simplification in genetic programming for object classification, in: Proceedings ...
  • Muni, D., et al., A novel approach to design classifiers using genetic programming, IEEE Trans. Evol. Comput. (2004).
  • Zhang, M., et al., Using Gaussian distribution to construct fitness functions in genetic programming for multiclass object classification, Pattern Recognit. Lett. (2008).