Classifier design with feature selection and feature extraction using layered genetic programming

https://doi.org/10.1016/j.eswa.2007.01.006

Abstract

This paper proposes a novel method, called FLGP, to construct a classifier capable of both feature selection and feature extraction. FLGP is built on layered genetic programming, a kind of multiple-population genetic programming. Populations evolve toward an optimal discriminant function that divides the data into two classes. Two methods of feature selection are proposed. The new features extracted by one layer serve as the training set for the populations of the next layer. Experiments on several well-known datasets demonstrate the performance of FLGP.

Section snippets

Feature selection

This research concentrates on three topics: feature selection, feature generation, and classifier design. Feature selection is an important pattern-recognition technique for dealing with raw features. It focuses on removing useless, irrelevant, and redundant features. The classification accuracy obtained on data reduced to the selected features is better than that obtained with no selection. Many approaches to feature selection have been proposed (Ahmad and Dey, 2005, Dash and Liu, 1997, Jain and Zongker,
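To illustrate the general filter idea behind removing irrelevant and redundant features (this is a generic sketch, not one of the selection methods proposed in this paper; the thresholds and function name are hypothetical):

```python
import numpy as np

def filter_features(X, var_threshold=1e-3, corr_threshold=0.95):
    """Drop near-constant (irrelevant) features, then drop one member of
    each highly correlated (redundant) pair. Returns kept column indices."""
    X = np.asarray(X, dtype=float)
    # Irrelevant: features with almost no variance carry no information.
    candidates = [j for j in range(X.shape[1]) if X[:, j].var() > var_threshold]
    # Redundant: greedily drop a feature that is nearly a copy of one already kept.
    kept = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_threshold
               for k in kept):
            kept.append(j)
    return kept

# Column 1 is constant, column 2 duplicates column 0.
X = np.array([[1.0, 5.0, 1.0],
              [2.0, 5.0, 2.0],
              [3.0, 5.0, 3.0],
              [4.0, 5.0, 4.0]])
print(filter_features(X))  # → [0]
```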

FLGP

This section describes FLGP in detail. First, basic GP terms, including terminal, operation, individual, population, and the genetic operators, are introduced. Second, the layers and the relations between them are described.

We define the classification problem as follows.

Let T be the training set for a K-class classification problem containing n training samples, and let TS be the test set. A training sample in T is a pair of a class label and m significant real-valued elements: T = {t_i | t_i = (c_i, x_i), c_i
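Under the definitions above, each training sample pairs a class label with an m-dimensional real-valued vector, and an evolved GP individual acts as a discriminant function whose sign divides the data into two classes. A minimal sketch of this setup (the hand-written discriminant stands in for an evolved individual; FLGP's actual function set and thresholding may differ):

```python
# Each training sample t_i is a pair (c_i, x_i) of class label and features.
T = [(0, (0.2, 0.7)), (0, (0.1, 0.9)), (1, (0.8, 0.3)), (1, (0.9, 0.1))]

def discriminant(x):
    # Hypothetical evolved expression over the feature terminals: x1 - x2.
    return x[0] - x[1]

def classify(x):
    # The sign of the discriminant function splits samples into two classes.
    return 1 if discriminant(x) > 0 else 0

# Training accuracy of this candidate individual on T.
accuracy = sum(classify(x) == c for c, x in T) / len(T)
print(accuracy)  # → 1.0
```

In a GP run, a whole population of such expressions would be evolved, with fitness typically tied to this classification accuracy.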

Experiments

This section discusses the experiments and analyzes the classification results. We select three diagnostic problems, cancer, diabetes, and heart, from the PROBEN1 benchmark set (Prechelt, 1994). These problems originally come from the UCI repository (Blake, Keogh, & Merz, 1998) and were preprocessed by Prechelt (1994). All attribute values are normalized to the continuous range [0, 1]. Missing attribute values are filled in, and every attribute with m possible values is encoded by the 1-of-m method.
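The preprocessing steps described above, min-max normalization to [0, 1] and 1-of-m encoding of an m-valued attribute, can be sketched as follows (this mirrors the PROBEN1 conventions in spirit, not Prechelt's exact procedure):

```python
def minmax_normalize(column):
    """Scale a numeric attribute to the continuous range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant attribute: map everything to 0
    return [(v - lo) / (hi - lo) for v in column]

def one_of_m_encode(column):
    """Encode a categorical attribute with m possible values as m binary
    indicator features (the 1-of-m method)."""
    values = sorted(set(column))
    return [[1.0 if v == u else 0.0 for u in values] for v in column]

print(minmax_normalize([10, 20, 30]))    # → [0.0, 0.5, 1.0]
print(one_of_m_encode(["a", "b", "a"]))  # → [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
```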

Conclusions

This paper proposes a novel method called FLGP to construct a classifier with feature-selection and feature-extraction capabilities. FLGP employs multiple-population genetic programming in a layered architecture. Through a number of experiments, we show that FLGP not only achieves high classification accuracy but also performs feature selection and feature extraction simultaneously. The classification accuracy of FLGP is comparable to traditional single population

References (41)

  • W. Banzhaf et al.

Genetic programming: an introduction on the automatic evolution of computer programs and its applications

    (1998)
  • Blake, C., Keogh, E., & Merz, C. J. (1998). UCI repository of machine learning databases. Irvine, University of...
  • Bojarczuk, C. C., Lopes, H. S., & Freitas, A. A. (1999). Discovering comprehensible classification rules using genetic...
  • M. Brameier et al.

    A comparison of linear genetic programming and neural networks in medical data mining

    IEEE Transactions on Evolutionary Computation

    (2001)
  • B.C. Chien et al.

    Learning effective classifiers with Z-value measure based on genetic programming

    Pattern Recognition

    (2003)
  • I. De Falco et al.

    Discovering interesting classification rules with genetic programming

    Applied Soft Computing

    (2002)
  • F. Fernández et al.

    An empirical study of multipopulation genetic programming

    Genetic Programming and Evolvable Machines

    (2003)
  • Freitas, A. (1997). A genetic programming framework for two data mining tasks: classification and generalized rule...
  • J. Han et al.

    Data mining: concepts and techniques

    (2001)
  • G. Hong et al.

    Feature generation using genetic programming with application to fault classification

    IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics

    (2005)