Genetic programming based pattern classification with feature space partitioning

https://doi.org/10.1016/S0020-0255(00)00081-5

Abstract

Genetic programming (GP) is an evolutionary technique that is gaining attention for its ability to learn the underlying data relationships and express them in a mathematical manner. Although GP uses the same principles as genetic algorithms, it is a symbolic approach to program induction; i.e., it involves the discovery of a highly fit computer program from the space of computer programs that produces a desired output when presented with a particular input. We have successfully applied the GP paradigm to the n-category pattern classification problem. The ability of the GP classifier to learn the data distributions depends upon the number of classes and the spatial spread of the data. As the number of classes increases, it becomes increasingly difficult for the GP classifier to resolve between them. There is therefore a need to partition the feature space and identify sub-spaces with a reduced number of classes. The basic objective is to divide the feature space into sub-spaces, and hence the data set containing representative samples of n classes into sub-data sets corresponding to those sub-spaces, so that some of the sub-data sets/spaces contain data belonging to only p classes (p<n). The GP classifier is then evolved independently for each sub-data set/space of the feature space. The GP classifier becomes simpler for some of the sub-data sets/spaces, as only p classes are present. It also results in localized learning, as the GP classifier has to learn the data distribution in only a sub-space of the feature space rather than in the entire feature space. In this paper, we integrate the GP classifier with feature space partitioning (FSP) for localized learning to improve pattern classification.

Introduction

Classification has traditionally been done by the maximum likelihood classifier (MLC), which assumes a normal distribution for the input data [1]. Artificial neural networks have also been successfully applied to pattern classification problems in the areas of remote sensing [2], [3] and biomedical applications [4]. Genetic programming (GP), which is part of the evolutionary computing family, is gaining attention for its ability to learn the underlying data relationships and express them in a mathematical manner. GP uses the same principles as genetic algorithms [5]. However, it is a symbolic approach to program induction; i.e., it involves the discovery of a highly fit computer program from the space of computer programs that produces a desired output when presented with a particular input. GP has been successfully applied to diverse problems such as optimal control, planning in artificial intelligence, discovery of game-playing strategies [6], evolution of neural networks [7], fuzzy logic production rules [8], automated synthesis of analog electrical circuits [9], and decision support for vehicle dispatching [10]. In pattern classification, GP-based techniques have an advantage over statistical methods such as MLC in that they are distribution free; i.e., no a priori knowledge is assumed about the statistical distribution of the classes.

GP has been used for solving 2-category pattern classification problems such as XOR, the 2-spiral problem, etc. [6]. We have extended the GP paradigm to n-category pattern classification [11]. Unlike the binary strings of genetic algorithms [5], the structures undergoing adaptation in GP are hierarchical computer programs of dynamically varying size and shape. GP assumes that the solution to a problem can be formulated as a search for a highly fit individual computer program in the space of possible computer programs [6]. Let

  • F = {f1, f2, …, fn} be the set of functions.

  • T = {X1, X2, …, Xn} be the set of terminals.

The functions in the functions set may include:

  • Arithmetic operations (+, −, ×, ÷).

  • Mathematical functions (SINE, COS, EXP, LOG).

  • Boolean operators (AND, OR, NOT).

  • Conditional operators (IF THEN ELSE).

  • User-defined domain specific functions.

The set of possible structures in GP, i.e., computer programs, is the set of all possible compositions of the functions in F applied to the terminals in T. The computer programs are basically LISP S-expressions. As GP is an evolutionary process, a population of computer programs is evolved over successive generations. During each generation, the fitness of each solution is evaluated, and solutions are selected for the next generation based on their fitness. The selected solutions are subjected to the genetic operators of crossover and mutation. Mutation introduces variation into the computer programs so as to ensure a certain diversity in the population. In all our simulations, we use the GPQUICK software [12]. Appendix C defines the various GP parameters used in GPQUICK. Table 1 gives the values of the parameters used in our simulation experiments. The choice of parameters is, however, empirical.
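To make these ideas concrete, a GP individual can be sketched as a nested S-expression over a small function set F and terminal set T. This is an illustrative sketch only, not the GPQUICK representation; the function names and the protected-division convention below are assumptions:

```python
import math
import operator

# Illustrative function set F; protected division avoids divide-by-zero,
# a common GP convention (assumed here, not taken from GPQUICK).
FUNCTIONS = {
    '+': operator.add, '-': operator.sub, '*': operator.mul,
    '/': lambda a, b: a / b if b != 0 else 1.0,
    'sin': math.sin,
}

def evaluate(expr, terminals):
    """Recursively evaluate an S-expression against a feature vector."""
    if isinstance(expr, str):           # terminal: a feature name, e.g. 'X1'
        return terminals[expr]
    if isinstance(expr, (int, float)):  # numeric constant
        return expr
    op, *args = expr                    # function node: (op, arg1, arg2, ...)
    return FUNCTIONS[op](*(evaluate(a, terminals) for a in args))

# The program (X1 * X2) + sin(X1), evaluated at X1 = 2.0, X2 = 3.0
tree = ('+', ('*', 'X1', 'X2'), ('sin', 'X1'))
result = evaluate(tree, {'X1': 2.0, 'X2': 3.0})  # ≈ 6.9093
```

Crossover and mutation would then operate on subtrees of such expressions; evaluation like the above is what fitness computation repeats over the whole training set.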

The ability of the GP classifier to learn the data distributions is a function of the number of classes n and the spatial spread of the data. In this paper, we integrate the GP classifier with feature space partitioning (FSP) to improve pattern classification. Essentially, this partitions the data set containing representative samples of n classes into smaller data sets, so that some of the smaller data sets contain data belonging to only p classes (p<n). Thus, in certain regions of the feature space, the GP classifier is simplified considerably, as it has to resolve between p classes rather than n classes. The GP classifier is evolved independently in each sub-region of the feature space. The main motivation for integrating the GP classifier with FSP is that

  • (i) it results in localized learning of data distributions,

  • (ii) it can reduce the number of classes to be resolved in certain sub-spaces,

  • (iii) it can lead to improved pattern classification,

  • (iv) it gives scope for a parallel implementation.

This paper is organized as follows. Section 2 discusses the basic steps involved in GP-based n-category classification. In Section 3, we shall present the merit of integrating FSP with the GP classifier. Section 4 gives a description of the framework for the proposed FSP. Section 5 presents the experimental results for GP with FSP. Section 6 gives the conclusion.

Section snippets

GP-based n-category pattern classification

In this section, we will outline how we have applied GP to the n-category pattern classification problem [11]. The given data set, which contains representative samples of the n classes, is divided into a training set and a validation set. The training set is used for obtaining the GP classifier, and the validation set is used for evaluating it.
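As the Conclusions note, the n-class problem is formulated as n 2-class problems. A minimal sketch of that decomposition and of the training/validation split follows; the helper names and the ±1 labeling convention are illustrative assumptions, not the paper's exact formulation:

```python
import random

def one_vs_rest_datasets(samples, labels, n_classes):
    """Relabel an n-class data set into n binary data sets: for the
    classifier of class k, samples of class k become +1, all others -1."""
    return [
        [(x, 1 if y == k else -1) for x, y in zip(samples, labels)]
        for k in range(n_classes)
    ]

def split_train_validation(data, train_fraction=0.7, seed=0):
    """Shuffle and split one data set into a training and a validation set."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]
```

One GP classifier expression would then be evolved per binary data set, each scored on its own validation split.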

GP classifier and feature space partitioning

As mentioned earlier, in any pattern classification application, discrimination among classes is achieved by developing a discriminant function that accepts the input feature vector and assigns it to one of the n classes. We will consider a simple problem to illustrate the merit of localized learning due to FSP for the GP classifier. In this experiment, based on synthetic data, the evolution of the GPCEs is done with the GP parameters shown in Table 1, except that the number of generations was limited to

Feature space partitioning for data analysis

Clustering techniques, hierarchical classification, and data analysis in different sub-spaces have all been studied in pattern classification. While clustering techniques attempt to determine groups in the data distribution, hierarchical classification creates a tree structure for classification. Given n classes, a tree structure can be developed in which, at each node, a discriminant function successively divides the data into two groups; at the lowest level, a class is assigned to the input.

GP classifier with FSP

The basic framework for GP-based pattern classification with FSP can be divided into the following steps:

  • 1. Partitioning of the feature space into sub-regions, so that regions with fewer classes can be distinguished from regions with more classes.

  • 2. Creation of sub-region-specific data sets.

  • 3. Independent GP-based pattern classification in each sub-region of the feature space.

  • 4. Combination of the results obtained in the different sub-regions of the feature space.
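Steps 1–3 can be sketched as follows. The single-feature threshold partitioning and the helper names are illustrative assumptions for exposition, not the paper's actual FSP algorithm:

```python
from collections import defaultdict

def partition_feature_space(samples, labels, feature_index, thresholds):
    """Steps 1-2: split the data along one feature at the given thresholds,
    producing one sub-data set per sub-region of the feature space."""
    regions = defaultdict(list)
    for x, y in zip(samples, labels):
        # index of the sub-region this sample falls into
        r = sum(x[feature_index] > t for t in thresholds)
        regions[r].append((x, y))
    return regions

def classes_per_region(regions):
    """Report how many classes each sub-region must resolve (p <= n)."""
    return {r: len({y for _, y in data}) for r, data in regions.items()}

# Step 3 would evolve an independent GP classifier per sub-region; step 4
# routes a test sample to its sub-region's classifier via the same thresholds.
```

Sub-regions whose p is much smaller than n are exactly where the evolved GPCEs become simpler.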

Steps

Conclusions

GP is an evolutionary approach to problem solving. We have extended the GP paradigm to the n-class problem by formulating it as n 2-class problems. As the number of classes n increases, the complexity of the pattern classification problem increases, because more classes have to be resolved. So, a better understanding of the underlying data distribution of the various classes in different regions of the feature space is needed to reduce misclassification. In this paper, we have

References (12)

  • R.O. Duda et al.

    Pattern Classification and Scene Analysis

    (1973)
  • P.D. Heerman et al.

    Classification of multispectral remote sensing data using a back propagation neural network

    IEEE Trans. Geosci. Remote Sensing

    (1992)
  • T. Yoshida et al.

    Neural network approach to land cover mapping

    IEEE Trans. Geosci. Remote Sensing

    (1994)
  • L.C. Pretorius, C. Nel, Feature extraction from ECG for classification by artificial neural networks, in: IEEE Fifth...
  • D.E. Goldberg

    Genetic Algorithms in Search, Optimization and Machine Learning

    (1989)
  • J.R. Koza

    Genetic Programming: On the Programming of Computers by Means of Natural Selection

    (1992)
