Genetic programming based pattern classification with feature space partitioning
Introduction
Classification has traditionally been done by the maximum likelihood classifier (MLC), which assumes a normal distribution for the input data [1]. Artificial neural networks have also been applied successfully to pattern classification problems in remote sensing [2], [3] and biomedical applications [4]. Genetic programming (GP), a member of the evolutionary computing family, is gaining attention for its ability to learn the underlying data relationships and express them mathematically. GP uses the same principles as genetic algorithms [5]. However, it is a symbolic approach to program induction; i.e., it involves the discovery of a highly fit computer program, from the space of computer programs, that produces a desired output when presented with a particular input. GP has been applied successfully to diverse problems such as optimal control, planning in artificial intelligence, discovery of game-playing strategies [6], evolution of neural networks [7], fuzzy logic production rules [8], automated synthesis of analog electrical circuits [9] and decision support for vehicle dispatching [10]. In pattern classification, GP-based techniques have an advantage over statistical methods such as MLC because they are distribution-free; i.e., no a priori knowledge is assumed about the statistical distribution of the classes.
GP has been used for solving 2-category pattern classification problems such as the XOR and 2-spiral problems [6]. We have extended the GP paradigm to n-category pattern classification [10]. Unlike binary strings in genetic algorithms [5], the structures undergoing adaptation in GP are hierarchical computer programs of dynamically varying size and shape. GP assumes that the solution to a problem can be formulated as a search for a highly fit individual computer program in the space of possible computer programs [6]. Let
F = {f1, f2, …, fn} be the set of functions, and
T = {X1, X2, …, Xn} be the set of terminals.
The functions in the function set may include:
- Arithmetic operations (+, −, ×, ÷).
- Mathematical functions (SINE, COS, EXP, LOG).
- Boolean operators (AND, OR, NOT).
- Conditional operators (IF THEN ELSE).
- User-defined domain-specific functions.
The set of possible structures, i.e., computer programs in GP, is the set of all compositions that can be formed from F and T. The computer programs are basically LISP S-expressions. As GP is an evolutionary process, a population of computer programs is evolved over successive generations. During each generation, the fitness of each solution is evaluated, and solutions are selected for the next generation based on their fitness. The selected solutions are subjected to the genetic operators of crossover and mutation. Mutation introduces variation in the computer programs so as to ensure a certain diversity in the population. In all our simulations, we use the GPQUICK software [12]. Appendix C defines the various GP parameters used in GPQUICK. Table 1 gives the values of the parameters used in our simulation experiments. The choice of the parameters is, however, empirical.
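To make the representation concrete, the following sketch evaluates a GP individual stored as a LISP-style S-expression over F and T. This is an illustration of the idea only, not GPQUICK's implementation; the particular function set and the protected division are assumptions made here for the example.

```python
import math
import operator

# Illustrative function set F; protected division avoids divide-by-zero,
# a common (but here assumed) convention in GP systems.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if b != 0 else 1.0,
    "SIN": math.sin,
}

def evaluate(expr, terminals):
    """Recursively evaluate an S-expression against terminal bindings."""
    if isinstance(expr, str):           # terminal: X1, X2, ...
        return terminals[expr]
    if isinstance(expr, (int, float)):  # numeric constant
        return expr
    op, *args = expr
    return FUNCTIONS[op](*(evaluate(a, terminals) for a in args))

# (+ (* X1 X2) (SIN X1))  corresponds to  X1*X2 + sin(X1)
tree = ("+", ("*", "X1", "X2"), ("SIN", "X1"))
print(evaluate(tree, {"X1": 0.0, "X2": 3.0}))  # 0.0*3.0 + sin(0.0) = 0.0
```

Crossover and mutation then operate on such trees by exchanging and replacing subtrees, which is why program size and shape vary dynamically during evolution.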
The ability of the GP classifier to learn the data distributions is a function of the number of classes n and the spatial spread of the data. In this paper, we integrate the GP classifier with feature space partitioning (FSP) to improve pattern classification. Essentially, FSP partitions the data set containing representative samples of the n classes into smaller data sets, so that some of these contain data belonging to only p classes (p < n). Thus, in certain regions of the feature space, the GP classifier is simplified considerably, as it has to resolve only p classes rather than n. The GP classifier is evolved independently in each sub-region of the feature space. The main motivation for integrating the GP classifier with FSP is that
(i) it results in localized learning of data distributions,
(ii) it can reduce the number of classes to be resolved in certain sub-spaces,
(iii) it can lead to improved pattern classification,
(iv) it gives scope for a parallel implementation.
This paper is organized as follows. Section 2 discusses the basic steps involved in GP-based n-category classification. In Section 3, we shall present the merit of integrating FSP with the GP classifier. Section 4 gives a description of the framework for the proposed FSP. Section 5 presents the experimental results for GP with FSP. Section 6 gives the conclusion.
GP-based n-category pattern classification
In this section, we outline how we have applied GP to the n-category pattern classification problem [11]. The given data set, which contains representative samples of the n classes, is divided into a training set and a validation set. The training set is used for obtaining the GP classifier, and the validation set is used for evaluating it.
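One way to resolve an n-category decision from binary classifiers is to poll n class-specific expressions, one per class. The sketch below illustrates this idea; the sign convention and the handling of conflicts (zero or multiple claims) are assumptions made for the example, not the paper's exact scheme.

```python
def classify(gpces, x):
    """Assign a class by polling n binary classifiers (one per class).

    gpces: list of callables; gpces[i](x) > 0 is read here as
    'x belongs to class i' (an assumed sign convention).
    Returns the index of the single claiming classifier, or -1 when
    zero or several classifiers claim the sample, i.e., a conflict
    that would need a separate resolution step.
    """
    claims = [i for i, g in enumerate(gpces) if g(x) > 0]
    return claims[0] if len(claims) == 1 else -1

# Toy classifiers for a 1-D feature: class 0 if x < 0.5, class 1 otherwise
gpces = [lambda x: 0.5 - x, lambda x: x - 0.5]
print(classify(gpces, 0.2))  # 0
print(classify(gpces, 0.9))  # 1
```

In practice each callable would be an evolved S-expression rather than a hand-written lambda.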
GP classifier and feature space partitioning
As mentioned earlier, in any pattern classification application, discrimination among classes is done by developing a discriminant function that accepts the input feature vector and assigns it to one of the n classes. We will consider a simple problem to illustrate the merit of localized learning due to FSP for the GP classifier. In this experiment, based on synthetic data, the GPCEs are evolved with the GP parameters shown in Table 1, except that the number of generations was limited to
Feature space partitioning for data analysis
Clustering techniques, hierarchical classification and data analysis in different sub-spaces have been studied in pattern classification. While clustering techniques attempt to determine groups in the data distribution, hierarchical classification creates a tree structure for classification. Given n classes, a tree structure can be developed; at each node, a discriminant function successively divides the data into two groups, and at the lowest level a class is assigned to the input.
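The tree-structured scheme described above can be sketched as follows. This is a toy illustration with hypothetical discriminants and thresholds, not a method taken from the cited literature.

```python
class Node:
    """A node of a hierarchical classification tree.

    Internal nodes hold a discriminant that splits the remaining
    classes into two groups; leaf nodes assign a class label.
    """
    def __init__(self, discriminant=None, left=None, right=None, label=None):
        self.discriminant = discriminant  # callable: x -> bool (go left?)
        self.left, self.right, self.label = left, right, label

def assign(node, x):
    """Walk the tree until a leaf assigns a class to input x."""
    while node.label is None:
        node = node.left if node.discriminant(x) else node.right
    return node.label

# Three classes on one feature: {A} vs {B, C} at the root, then B vs C
tree = Node(discriminant=lambda x: x < 1.0,
            left=Node(label="A"),
            right=Node(discriminant=lambda x: x < 2.0,
                       left=Node(label="B"),
                       right=Node(label="C")))
print(assign(tree, 0.5))  # A
print(assign(tree, 2.5))  # C
```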
GP classifier with FSP
The basic framework for GP-based pattern classification with FSP can be divided into the following steps:
1. Partitioning of the feature space into sub-regions, so that regions containing fewer classes can be distinguished from regions containing more classes.
2. Creation of sub-region-specific data sets.
3. Independent GP-based pattern classification in each sub-region of the feature space.
4. Combination of the results obtained in the different sub-regions of the feature space.
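Steps 1 and 2 of this framework can be sketched as follows. The quadrant split of a 2-D feature space and the particular threshold are illustrative assumptions; the point is that some sub-region data sets end up containing fewer than n classes.

```python
from collections import defaultdict

def partition(samples, split=0.5):
    """Split a 2-D feature space into quadrants at `split` and build
    sub-region-specific data sets (steps 1-2 of the framework).
    A sub-region whose data set contains p < n classes poses a
    simpler classification problem for the per-region classifiers
    that steps 3-4 would evolve and combine."""
    regions = defaultdict(list)
    for (x1, x2), label in samples:
        key = (x1 >= split, x2 >= split)  # which quadrant the sample falls in
        regions[key].append(((x1, x2), label))
    return regions

samples = [((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
           ((0.9, 0.8), "B"), ((0.1, 0.9), "A"), ((0.2, 0.8), "B")]
regions = partition(samples)
for key, data in sorted(regions.items()):
    classes = {label for _, label in data}
    print(key, len(data), sorted(classes))
```

Here one quadrant contains only class A and another only class B, so classifiers evolved in those sub-regions resolve a single class, while the mixed quadrant still needs a two-class decision.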
Conclusions
GP is an evolutionary approach to problem solving. We have extended the GP paradigm to the n-class problem by formulating it as n 2-class problems. As the number of classes n increases, the complexity of the pattern classification problem grows because more classes have to be resolved. So a better understanding of the underlying data distribution of the various classes in different regions of the feature space is needed to reduce misclassification. In this paper, we have
References (12)
- et al., Pattern Classification and Scene Analysis (1973)
- et al., Classification of multispectral remote sensing data using a back propagation neural network, IEEE Trans. Geosci. Remote Sensing (1992)
- et al., Neural network approach to land cover mapping, IEEE Trans. Geosci. Remote Sensing (1994)
- L.C. Pretorius, C. Nel, Feature extraction from ECG for classification by artificial neural networks, in: IEEE Fifth...
- Genetic Algorithms in Search, Optimization and Machine Learning (1989)
- Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)