Swarm and Evolutionary Computation

Volume 44, February 2019, Pages 260-272

Multidimensional genetic programming for multiclass classification

https://doi.org/10.1016/j.swevo.2018.03.015

Abstract

We describe a new multiclass classification method that learns multidimensional feature transformations using genetic programming. This method optimizes models by first performing a transformation of the feature space into a new space of potentially different dimensionality, and then performing classification using a distance function in the transformed space. We analyze a novel program representation for using genetic programming to represent multidimensional features and compare it to other approaches. Similarly, we analyze the use of a distance metric for classification in comparison to simpler techniques more commonly used when applying genetic programming to multiclass classification. Finally, we compare this method to several state-of-the-art classification techniques across a broad set of problems and show that this technique achieves competitive test accuracies while also producing concise models. We also quantify the scalability of the method on problems of varying dimensionality, sample size, and difficulty. The results suggest the proposed method scales well to large feature spaces.

Introduction

Feature selection and feature construction play fundamental roles in the application of machine learning (ML) to classification. Feature selection makes it possible, for example, to reduce high-dimensional datasets to a manageable size, and to refine experimental designs through measurement selection in some domains. The ML community has become increasingly aware of the need for automated and flexible feature engineering methods to complement the large set of classification methodologies that are now widely available in open-source packages such as Weka and Scikit-Learn [10,30]. Typical classification pipelines treat feature selection and feature construction as pre-processing steps, in which the attributes in the dataset are selected according to some heuristic [9] and then projected into more complex feature spaces using e.g. kernel functions [29]. In both cases the feature pre-processing is often conducted in a trial-and-error way rather than being automated or intrinsic to the learning method. The use of non-linear feature expansions can also lead to classifiers that are black-box, making it difficult for researchers to gain insight into the modelled process by studying the model itself. In this paper we investigate a multiclass classification strategy designed to integrate feature selection, construction and model intelligibility goals into a distance-based classifier to improve its ability to build accurate and simple classifiers.

A well-known learning method that implicitly conducts feature selection and construction is genetic programming (GP) [17], which has been proposed for classification [7,15]. GP incorporates feature selection and construction by optimizing a population of programs, constructed from a set of instructions that operate on the dataset features, to produce a model. Compared to traditional ML approaches such as logistic regression and decision tree classification, GP makes fewer a priori assumptions about the data [22] and allows for various program representations [26]. In addition, GP has well-established methods for optimizing the intelligibility of models [37]. There have been some promising real-world applications of GP to binary classification [43], but recent work has focused on extending GP to the multiclass classification problem [14,28], in which there are more than two outcomes to estimate. This previous work suggests that traditional GP fares worse than other classification methods in the multiclass setting. However, two recent GP-based methods, M2GP [14] and M3GP [28], were shown to perform on par with several other ML strategies in recent studies.

The performance improvements observed with M2GP and M3GP stemmed from the incorporation of a distance-based classification strategy into a multi-output GP system. We recently proposed a new method called M4GP [20] that, although inspired by M2GP and M3GP, significantly improves upon both. In this paper we extend M4GP by introducing an archiving strategy and by comparing it to recently published methods on data challenges from two different domains. The contributions of this work are:

  • M4GP uses a novel (stack-based) program representation that simplifies the construction of multidimensional solutions compared to M2GP and M3GP (which instead used a tree-based representation). This makes the evolutionary process of M4GP more efficient and the final solutions more expressive, readable, and easier to understand.

  • M4GP incorporates a multiobjective parent selection and survival technique that allows it to clearly and consistently outperform M2GP and M3GP on a wide set of test problems. To the best of our knowledge, this technique had never been used for multi-class classification before.

  • We introduce an archiving strategy that maintains a set of optimal trade-off solutions based on complexity and accuracy. The final model is selected from this archive using an internal validation set to reduce overfitting.
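As a rough illustration of the first contribution (our own sketch, not the paper's implementation), a stack-based program is a linear sequence of instructions evaluated in postfix order; any values left on the stack after evaluation form the multidimensional output Φ(x), so producing several outputs requires no special tree structure:

```python
import math

def eval_stack_program(program, x):
    """Evaluate a postfix (stack-based) program on the attribute vector x.

    Whatever remains on the stack at the end is the multidimensional
    output Phi(x): its length is the dimensionality of the new space.
    """
    stack = []
    for op in program:
        if isinstance(op, int):            # terminal: push feature x[op]
            stack.append(x[op])
        elif op == "+" and len(stack) >= 2:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "*" and len(stack) >= 2:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "sin" and stack:
            stack.append(math.sin(stack.pop()))
        # instructions with too few arguments are skipped (a no-op),
        # a common convention in stack-based GP
    return stack

# Example: the program [0, 1, "+", 0, "sin"] maps a two-feature input
# to a two-dimensional transformed feature vector.
phi = eval_stack_program([0, 1, "+", 0, "sin"], [0.5, 1.5])
```

The instruction set, no-op convention, and helper names here are illustrative assumptions; the point is only that the stack depth at the end of evaluation directly determines the output dimensionality.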

Thanks to these improvements, M4GP improves upon the best known GP methods for multi-class classification and achieves results that are competitive with state-of-the-art methods on the studied problems (a set of 26 classification problems ranging in number of classes, attributes, and samples). Furthermore, on a set of biomedical data sets with up to 5000 attributes, M4GP is shown to perform on par with state-of-the-art methods while producing smaller models in less time. These features establish M4GP as the new state-of-the-art method for multi-class classification with GP.
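The archive described in the third contribution can be viewed as a standard Pareto (non-dominated) set over two objectives: accuracy (maximized) and complexity (minimized). A minimal sketch of such an update rule, assuming models are summarized as (accuracy, size) pairs (our own illustration, not the authors' code):

```python
def dominates(a, b):
    """a dominates b if a is no worse in both objectives (higher accuracy,
    smaller size) and strictly better in at least one of them."""
    acc_a, size_a = a
    acc_b, size_b = b
    return (acc_a >= acc_b and size_a <= size_b) and \
           (acc_a > acc_b or size_a < size_b)

def update_archive(archive, candidate):
    """Insert candidate into the archive, keeping only non-dominated
    accuracy/complexity trade-offs."""
    if any(dominates(kept, candidate) for kept in archive):
        return archive                      # candidate adds nothing
    # drop anything the candidate now dominates, then add it
    return [kept for kept in archive if not dominates(candidate, kept)] \
        + [candidate]
```

After training, the final model would be chosen from this archive using a held-out validation set, which is how the paper describes reducing overfitting.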

The paper is organized as follows: Section 2 presents M4GP. In Section 3 we discuss previous and related work, focusing on the similarities and differences among M2GP, M3GP, and M4GP. Section 4 describes our experimental study, presenting the test problems and experimental settings used. In Section 5 we discuss the experimental results obtained. Finally, Section 6 concludes the paper.

Section snippets

M4GP

In multiclass classification (classification into more than two classes), we wish to find a mapping ŷ(x): ℝᵖ → C that associates the vector of attributes x ∈ ℝᵖ with one of K > 2 class labels from the set C = {1, …, K}, using n paired examples from the training set T = {(xᵢ, yᵢ), i = 1, …, n}.

One way to conduct classification is to measure the similarity of each attribute to the bulk properties of the attributes within each class, and then assign the label corresponding to the most similar group. This strategy is embodied
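For context, the nearest-centroid idea in a transformed feature space can be sketched as follows, using a per-class Mahalanobis-style distance (a generic sketch with our own function names, not the paper's implementation; a pseudo-inverse stands in for the covariance inverse to handle singular cases):

```python
import numpy as np

def nearest_centroid_predict(Phi_train, y_train, Phi_test):
    """Assign each test point the label of the nearest class centroid in
    the transformed feature space, using a per-class squared Mahalanobis
    distance; np.linalg.pinv guards against singular covariances."""
    labels = np.unique(y_train)
    stats = []
    for c in labels:
        Z = Phi_train[y_train == c]            # transformed points in class c
        mu = Z.mean(axis=0)                    # class centroid
        VI = np.linalg.pinv(np.atleast_2d(np.cov(Z, rowvar=False)))
        stats.append((mu, VI))
    preds = []
    for z in Phi_test:
        # squared Mahalanobis distance to each class centroid
        d = [float((z - mu) @ VI @ (z - mu)) for mu, VI in stats]
        preds.append(labels[int(np.argmin(d))])
    return np.array(preds)
```

In the full method, `Phi_train` and `Phi_test` would be the outputs of the evolved transformation Φ(x) rather than the raw attributes.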

Related work

GP has been used extensively for evolving classification functions ŷ(x) directly [7,15,26]. When applied to problems with multiple classes, the discriminant functions evolved by GP must be thresholded, or the problem must be split into several binary classification problems [7]. To overcome the need for arbitrary thresholds in multiclass problems, M2GP proposed a multi-output GP that evolved Φ(x) and used the nearest centroid approach (Eq. (1)) [14]. M2GP demonstrated in particular that Mahalanobis

Experimental analysis

The experimental analysis of M4GP is divided into three sections. First we conduct benchmark comparisons, comparing M4GP to alternative GP strategies and to results from the related GP literature. In the subsequent two sections, we benchmark M4GP against published results from two different studies, one concerned with human activity recognition [3,33] and the other with disease prediction from genome-wide association studies [38]. Performance is quantified in a number of ways: 1) by classification

Results

The best classifiers generated by M4GP for each trial are compared first to benchmark methods in §5.1 and then to other published results in §5.2 and §5.3.

Discussion and conclusion

A new computational method for multi-class classification, based on GP, was studied in this paper. The new method is called M4GP, and it represents an improvement upon M2GP, M3GP and eM3GP, previous state-of-the-art techniques for multi-class classification with GP. It extends these methods by introducing a stack-based data flow, integrating advanced selection methods, and maintaining a Pareto archive that preserves concise models and integrates into the final model selection step. M4GP

Acknowledgments

The authors would like to thank Mauro Castelli for his feedback, as well as members of the Computational Intelligence Laboratory at Hampshire College. This work is partially supported by the National Science Foundation (NSF)-sponsored IGERT: Offshore Wind Energy Engineering, Environmental Science, and Policy (Grant Number 1068864), as well as Grant Nos. 1017817, 1129139, and 1331283. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors

References (44)

  • M. Hall et al., The WEKA data mining software: an update, ACM SIGKDD Explor. Newsl. (2009)
  • T. Helmuth et al., Solving uncompromising problems with lexicase selection, IEEE Trans. Evol. Comput. (2014)
  • G.E. Hinton et al., Improving Neural Networks by Preventing Co-adaptation of Feature Detectors (July 2012)
  • I. Icke et al., Improving genetic programming based symbolic regression using deterministic machine learning
  • V. Ingalalli et al., A multi-dimensional genetic programming approach for multi-class classification problems
  • J.K. Kishore et al., Application of genetic programming for multicategory pattern classification, IEEE Trans. Evol. Comput. (2000)
  • I. Kononenko, Estimating attributes: analysis and extensions of RELIEF
  • J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (1992)
  • K. Krawiec, Genetic programming-based construction of features for machine learning and knowledge discovery tasks, Genet. Program. Evolvable Mach. (2002)
  • W. La Cava et al., Ensemble representation learning: an analysis of fitness and survival for wrapper-based genetic programming methods
  • W. La Cava et al., Genetic programming representations for multi-dimensional feature learning in biomedical classification
  • W. La Cava et al., Epsilon-lexicase selection for regression