Multidimensional genetic programming for multiclass classification
Introduction
Feature selection and feature construction play fundamental roles in the application of machine learning (ML) to classification. Feature selection makes it possible, for example, to reduce high-dimensional datasets to a manageable size, and to refine experimental designs through measurement selection in some domains. The ML community has become increasingly aware of the need for automated and flexible feature engineering methods to complement the large set of classification methodologies that are now widely available in open-source packages such as Weka and Scikit-Learn [10,30]. Typical classification pipelines treat feature selection and feature construction as pre-processing steps, in which the attributes in the dataset are selected according to some heuristic [9] and then projected into more complex feature spaces using e.g. kernel functions [29]. In both cases the feature pre-processing is often conducted in a trial-and-error way rather than being automated or intrinsic to the learning method. The use of non-linear feature expansions can also lead to classifiers that are black-box, making it difficult for researchers to gain insight into the modelled process by studying the model itself. In this paper we investigate a multiclass classification strategy designed to integrate feature selection, construction and model intelligibility goals into a distance-based classifier to improve its ability to build accurate and simple classifiers.
A well-known learning method that implicitly conducts feature selection and construction is genetic programming (GP) [17], which has been proposed for classification [7,15]. GP incorporates feature selection and construction by optimizing a population of programs constructed from a set of instructions that operate on the dataset features to produce a model. Compared to traditional ML approaches such as logistic regression and decision tree classification, GP makes fewer a priori assumptions about the data [22] and allows for various program representations [26]. In addition, GP has well-established methods for optimizing the intelligibility of models [37]. There have been some promising real-world applications of GP to binary classification [43], but recent work has focused on extending GP to the multi-class classification problem [14,28], in which there are more than two outcomes to estimate. This previous work suggests that traditional GP fares worse than other classification methods in the multiclass setting. However, two recent GP-based methods, M2GP [14] and M3GP [28], were shown to perform on par with several other ML strategies in recent studies.
The performance improvements achieved by M2GP and M3GP stemmed from the incorporation of a distance-based classification strategy into a multi-output GP system. We recently proposed a new method [20] called M4GP that, although inspired by M2GP and M3GP, significantly improves upon these two methods. In this paper we extend M4GP by introducing an archiving strategy and by comparing it to recently published methods on data challenges from two different domains. The contributions of this work are:
- M4GP uses a novel (stack-based) program representation that simplifies the construction of multidimensional solutions compared to M2GP and M3GP (which instead used a tree-based representation). This makes the evolutionary process of M4GP more efficient and the final solutions more expressive, readable and easy to understand.
- M4GP incorporates a multiobjective parent selection and survival technique that allows it to clearly and consistently outperform M2GP and M3GP on a wide set of test problems. To the best of our knowledge, this technique had never been used for multi-class classification before.
- We introduce an archiving strategy that maintains a set of optimal trade-off solutions based on complexity and accuracy. The final model is selected from this archive using an internal validation set to reduce overfitting.
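To illustrate the archiving idea in the last contribution, the following is a minimal sketch rather than the M4GP implementation. It assumes each candidate model is summarized by two objectives to be minimized, a complexity score and a training error; the names `Candidate`, `pareto_archive` and `select_final`, as well as all numeric values, are hypothetical. The archive keeps the non-dominated candidates, and the final model is the archive member with the lowest error on a held-out validation set.

```python
from typing import Callable, List, NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    complexity: int       # assumed measure, e.g. number of program nodes
    train_error: float    # assumed measure, e.g. 1 - training accuracy


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.complexity <= b.complexity and a.train_error <= b.train_error
    better = a.complexity < b.complexity or a.train_error < b.train_error
    return no_worse and better


def pareto_archive(population: Sequence[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


def select_final(archive: Sequence[Candidate],
                 validation_error: Callable[[Candidate], float]) -> Candidate:
    """Pick the archived candidate with the lowest validation error."""
    return min(archive, key=validation_error)


# Hypothetical population: (name, complexity, training error).
population = [
    Candidate("A", 3, 0.30),
    Candidate("B", 7, 0.12),
    Candidate("C", 9, 0.12),   # dominated by B (same error, more complex)
    Candidate("D", 15, 0.05),
    Candidate("E", 5, 0.35),   # dominated by A (simpler and more accurate)
]
archive = pareto_archive(population)

# Hypothetical validation errors: D fits the training set best but
# generalizes worse than B, so B is selected.
val_error = {"A": 0.31, "B": 0.14, "D": 0.20}
best = select_final(archive, lambda c: val_error[c.name])
print([c.name for c in archive], best.name)
```

In this toy example the most accurate model on the training data is not the one selected, which is the kind of overfitting the validation step is meant to guard against.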
The paper is organized as follows: Section 2 presents M4GP. In Section 3 we discuss previous and related work, focusing on the similarities and differences between M2GP, M3GP and M4GP. Section 4 describes our experimental study, presenting the test problems and the experimental settings. In Section 5, we discuss the experimental results. Finally, Section 6 concludes the paper.
Section snippets
M4GP
In multiclass classification (classification into more than two classes), we wish to find a mapping that associates a vector of attributes with one of K > 2 class labels, using n paired examples from a training set.
One way to conduct classification is to measure the similarity of each attribute to the bulk properties of the attributes within each class, and then assign the label corresponding to the most similar group. This strategy is embodied
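As an illustration of this distance-based strategy, the following minimal sketch assigns a sample to the class whose centroid is nearest under the Mahalanobis distance. It is not the authors' code: it assumes two-dimensional features so that the covariance matrix can be inverted in closed form, it estimates one covariance matrix per class, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
# Symmetric 2x2 inverse covariance stored as (a, b, d) for [[a, b], [b, d]].
InvCov = Tuple[float, float, float]


def class_stats(X: Sequence[Point], y: Sequence[int]) -> Dict[int, Tuple[Point, InvCov]]:
    """Compute the centroid and inverse covariance of each class (2-D only)."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for point, label in zip(X, y):
        groups[label].append(point)
    stats = {}
    for label, pts in groups.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        det = sxx * syy - sxy ** 2   # assumed non-zero for this sketch
        stats[label] = ((mx, my), (syy / det, -sxy / det, sxx / det))
    return stats


def mahalanobis_sq(p: Point, centroid: Point, inv: InvCov) -> float:
    """Squared Mahalanobis distance from p to a class centroid."""
    dx, dy = p[0] - centroid[0], p[1] - centroid[1]
    a, b, d = inv
    return a * dx * dx + 2 * b * dx * dy + d * dy * dy


def predict(p: Point, stats: Dict[int, Tuple[Point, InvCov]]) -> int:
    """Return the label of the class with the nearest centroid."""
    return min(stats, key=lambda label: mahalanobis_sq(p, *stats[label]))


# Toy training set with K = 3 classes (hypothetical values).
X = [(0, 0), (1, 0.2), (0.2, 1), (1, 1.1),
     (5, 5), (6, 5.2), (5.2, 6), (6, 6.1),
     (5, -5), (6, -4.8), (5.2, -4), (6, -3.9)]
y = [0] * 4 + [1] * 4 + [2] * 4
stats = class_stats(X, y)
print(predict((0.5, 0.5), stats), predict((5.5, 5.5), stats), predict((5.4, -4.5), stats))
```

The same decision rule applies unchanged when the raw attributes are replaced by constructed features; only the inputs to `class_stats` and `predict` differ.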
Related work
GP has been used extensively for evolving classification functions ŷ(x) directly [7,15,26]. In application to multiple classes, the discriminant functions evolved by GP must be thresholded, or the problem must be split into several binary classification problems [7]. To overcome the need for arbitrary thresholds in multiclass problems, M2GP proposed a multi-output GP that evolved Φ(x) and used the nearest centroid approach (Eq. (1)) [14]. M2GP demonstrated in particular that Mahalanobis
Experimental analysis
The experimental analysis of M4GP is divided into three sections. First we conduct benchmark comparisons, comparing M4GP to alternative GP strategies and to results from related GP literature. In the subsequent two sections, we benchmark M4GP against published results from two different studies, one concerned with human activity recognition [3,33], and the other with disease prediction from genome-wide association studies [38]. Performance is quantified in a number of ways: 1) by classification
Results
The best classifiers generated by M4GP for each trial are compared first to benchmark methods in §5.1 and then to other published results in §5.2 and §5.3.
Discussion and conclusion
A new computational method for multi-class classification, based on GP, was studied in this paper. The new method is called M4GP, and it represents an improvement upon M2GP, M3GP and eM3GP, previous state-of-the-art techniques for multi-class classification with GP. It extends these methods by introducing a stack-based data flow, integrating advanced selection methods, and maintaining a Pareto archive that preserves concise models and integrates into the final model selection step. M4GP
Acknowledgments
The authors would like to thank Mauro Castelli for his feedback as well as members of the Computational Intelligence Laboratory at Hampshire College. This work is partially supported by the National Science Foundation (NSF)-sponsored IGERT: Offshore Wind Energy Engineering, Environmental Science, and Policy (Grant Number 1068864), as well as Grant No. 1017817, 1129139, and 1331283. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors
References (44)
- et al. The Opportunity challenge: a benchmark database for on-body sensor-based activity recognition. Pattern Recogn. Lett. (2013)
- et al. Genetic programming-based feature transform and classification for the automatic detection of pulmonary nodules on computed tomography images. Inf. Sci. (Dec. 2012)
- et al. A relevance feedback method based on genetic programming for classification of remote sensing images. Inf. Sci. (July 2011)
- et al. Evolutionary compact embedding for large-scale image classification. Inf. Sci. (Sept. 2015)
- et al. Building predictive models via feature synthesis
- The distributed genetic algorithm revisited
- et al. XGBoost: a scalable tree boosting system
- et al. A survey on the application of genetic programming to classification. Syst. Man Cybern. Part C: Appl. Rev. IEEE Trans. (2010)
- et al. (2001)
- et al. An introduction to variable and feature selection. J. Mach. Learn. Res. (Mar. 2003)
- The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl.
- Solving uncompromising problems with lexicase selection. IEEE Trans. Evol. Comput.
- Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
- Improving genetic programming based symbolic regression using deterministic machine learning
- A multi-dimensional genetic programming approach for multi-class classification problems
- Application of genetic programming for multicategory pattern classification. Evolut. Comput. IEEE Trans.
- Estimating attributes: analysis and extensions of RELIEF
- Genetic Programming: on the Programming of Computers by Means of Natural Selection
- Genetic programming-based construction of features for machine learning and knowledge discovery tasks. Genet. Program. Evolvable Mach.
- Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods
- Genetic programming representations for multi-dimensional feature learning in biomedical classification
- Epsilon-lexicase selection for regression