Elsevier

Pattern Recognition

Volume 93, September 2019, Pages 404-417

Genetic programming for multiple-feature construction on high-dimensional classification

https://doi.org/10.1016/j.patcog.2019.05.006

Highlights

  • Genetic programming (GP) is the most suitable technique for feature construction. This paper investigates what the key factors are and how they influence the performance of different approaches to GP for multiple-feature construction on high-dimensional data.

  • In terms of representation, a multi-tree representation achieves better classification performance than a single-tree representation.

  • In terms of evaluation, an appropriate combination of filter measures is more effective and efficient than a hybrid combination of wrapper and filter.

  • In multi-tree GP for feature construction, the class-dependent constructed features achieve significantly better classification performance than the class-independent ones.

Abstract

Data representation is an important factor in deciding the performance of machine learning algorithms including classification. Feature construction (FC) can combine original features to form high-level ones that can help classification algorithms achieve better performance. Genetic programming (GP) has shown promise in FC due to its flexible representation. Most GP methods construct a single feature, which may not scale well to high-dimensional data. This paper aims at investigating different approaches to constructing multiple features and analysing their effectiveness, efficiency, and underlying behaviours to reveal insights into multiple-feature construction using GP on high-dimensional data. The results show that multiple-feature construction achieves significantly better performance than single-feature construction. In multiple-feature construction, using multi-tree GP representation is shown to be more effective than using single-tree GP, thanks to the ability to consider the interaction of the newly constructed features during the construction process. Class-dependent constructed features achieve better performance than class-independent ones. A visualisation of the constructed features also demonstrates the interpretability of the GP-based FC approach, which is important to many real-world applications.

Introduction

In machine learning, data representation is a critical factor contributing to the performance of machine learning and pattern recognition methods. With the advances in data collection technologies, more and more high-dimensional data are collected. These datasets can have thousands of features or more, which may or may not interact with each other under unknown rules to determine the class label. Inducing patterns from these datasets is challenging for many common learning algorithms due to the curse of dimensionality. Furthermore, there may exist a large number of irrelevant and redundant features that are not actually useful in learning the target concept. The presence of these features may obscure the effect of the relevant features in revealing the hidden patterns of the data and thereby reduce the representational quality of the whole feature set [1]. Therefore, automatic data transformation to obtain a smaller and more discriminating feature set becomes an important process for effective machine learning and pattern recognition [2]. Feature learning has been shown to be effective in speech recognition [3], face recognition [4], robotics [5], disease detection [6], etc.

A popular approach to automatic feature learning is neural network (NN)-based deep learning, where high-level features are generated in hidden layers of neurons from input images [7], video [8], and text [9]. However, designing an appropriate deep NN architecture for a specific problem typically still requires a lot of trial and error or expert knowledge of the field. Furthermore, effectively training a deep NN usually requires a significant amount of data, which may not be available in many applications. These issues make deep feature learning less applicable to problems that cannot meet these requirements. This is where Feature Construction (FC), one type of feature learning, can be used to automatically learn more discriminating features from the data.

Genetic programming (GP) is an evolutionary computation technique that evolves a population of solutions or individuals based on the idea of the survival of the fittest. Using genetic operators such as crossover and mutation, GP can evolve better offspring from the fittest parents, which are evaluated against an objective set in the fitness function. This evolutionary principle is the same as in genetic algorithms (GAs). However, while GAs work only on vector-based representations, GP can work on more flexible representations such as trees or graphs. The tree-based representation is natural for FC: a constructed feature can be represented as a tree with features or constants in the leaf nodes serving as arguments of the internal nodes, which are typically arithmetic operators. A GP individual can represent a single feature (i.e. single-tree representation) or multiple features (i.e. multi-tree representation).
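The tree encoding described above can be sketched minimally in Python. All class and variable names here are illustrative, not taken from the paper; a multi-tree individual is simply a list of such trees, each producing one constructed feature.

```python
class Node:
    """One node of a GP expression tree: an operator, an original-feature
    index, or a constant. Illustrative sketch, not the paper's implementation."""

    def __init__(self, op=None, children=None, feature=None, const=None):
        self.op, self.children = op, children or []
        self.feature, self.const = feature, const

    def evaluate(self, x):
        """Evaluate the tree on one instance x (a list of feature values)."""
        if self.feature is not None:          # leaf: original feature
            return x[self.feature]
        if self.const is not None:            # leaf: constant
            return self.const
        vals = [c.evaluate(x) for c in self.children]
        if self.op == "+": return vals[0] + vals[1]
        if self.op == "-": return vals[0] - vals[1]
        if self.op == "*": return vals[0] * vals[1]
        # protected division, a common GP convention
        if self.op == "/": return vals[0] / vals[1] if vals[1] != 0 else 0.0

# Example: the constructed feature (x0 + x2) * x5
tree = Node(op="*", children=[
    Node(op="+", children=[Node(feature=0), Node(feature=2)]),
    Node(feature=5),
])

# A multi-tree individual is a list of trees; evaluating it maps one
# original instance to its constructed-feature values.
individual = [tree]
instance = [1.0, 0.0, 2.0, 0.0, 0.0, 3.0]
new_features = [t.evaluate(instance) for t in individual]  # [9.0]
```

Crossover and mutation then operate on subtrees of these structures, which is what makes the representation so flexible for FC.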

With a flexible tree-based representation and a population-based search, GP has been shown to be effective in automatically constructing a more discriminating feature without requiring a predefined model or a huge amount of training data [10]. Furthermore, the built-in feature selection process allows GP to select more relevant features to form the new feature. This ability is especially beneficial for big data, where a large number of features are collected before a specific task is created. This means that many features can be irrelevant to the task, and feeding all of them into the learning algorithms may unnecessarily increase the running time and degrade their performance. FC is also used for dimensionality reduction, especially for high-dimensional data. The number of constructed features can be very small compared to the original number of features, which significantly reduces the data size and helps machine learning methods improve their performance. In addition, the features constructed by GP have better interpretability than those constructed in the hidden layers of NNs. With these advantages, FC using GP can be an alternative for problems for which deep learning approaches cannot provide a good solution.

Many GP-based FC methods have been proposed to construct a single feature as an augmentation of, or multiple features as a replacement for, the original feature set. In the case of high-dimensional data, augmenting the original features with a single constructed feature can hardly change the performance [10]. On the other hand, how to create a small set of constructed features that can significantly improve the discriminating ability of the data is still challenging. An investigation of different approaches to multiple-feature construction using GP is needed to gain insight into this task. Multiple-feature construction methods can be categorised based on the representation, the evaluation method, and whether a constructed feature is class-dependent (i.e. constructed specifically for a particular class) or class-independent.

Representation: Multiple-feature construction methods have been proposed using both single-tree [11], [12] and multi-tree [13], [14] representations. When using single-tree representation, GP can construct multiple features by using all possible subtrees [11] or subtrees under predefined special nodes [12]. Another approach is to run single-tree GP multiple times, with each run constructing a new feature [15]. Multi-tree GP has also been proposed and shown to be effective in constructing multiple features on datasets with tens of features [13], [14] as well as thousands of features [16].

Evaluation: During the FC process, different types of methods can be used to evaluate the constructed features: filter, wrapper, embedded, or a combination of these approaches [17]. While filter methods use a measure such as information gain or correlation to evaluate the constructed features [15], wrapper methods use a classification algorithm to evaluate them [12]. Although wrapper methods are usually computationally more expensive than filter methods, they usually obtain better classification accuracy. On the other hand, filters are usually more general than wrappers. A combination of the two measures has also been proposed to better evaluate the constructed feature set [18]. Since a GP tree (i.e. the constructed feature) can be used as a binary classifier, its classification performance can be used directly as fitness. This scenario is referred to as embedded FC [10].
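As a minimal sketch of the filter idea, the following scores a constructed feature's values by information gain after a simple median split. This is illustrative only; the cited methods define their own measures and discretisation settings.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain of a constructed feature, discretised into two bins at the median.
    A hypothetical filter measure for illustration, not the paper's exact one."""
    median = sorted(feature_values)[len(feature_values) // 2]
    low  = [y for v, y in zip(feature_values, labels) if v <  median]
    high = [y for v, y in zip(feature_values, labels) if v >= median]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (low, high) if part)
    return entropy(labels) - conditional

# A constructed feature that perfectly separates the two classes gets the
# maximum possible gain, here entropy(labels) = 1.0.
values = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
labels = [0, 0, 0, 1, 1, 1]
gain = information_gain(values, labels)  # 1.0
```

Such a filter score is cheap to compute once per individual, which is why filters scale better than wrappers that must train a classifier for every fitness evaluation.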

Class-dependency: In addition to the choice between single-tree or multi-tree representation, constructing class-dependent or class-independent features is another option in designing FC methods. Most of the proposed GP-based FC methods are class-independent [11], [13], where a high-level feature is constructed without focusing on any particular class of the problem. In contrast, each class-dependent feature constructed in [15] aims at distinguishing instances of one class from the other classes. However, that method is limited to constructing one feature for each class, which may not scale well to high-dimensional data. Recently, a multiple class-dependent FC method for high-dimensional data [18] was proposed and shown to achieve better performance than class-independent FC methods.
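The class-dependent idea can be illustrated with a simple one-vs-rest relabelling: features dedicated to class c are evaluated against binary labels (c versus not-c). This is a sketch of the concept only, not the exact mechanism of the cited methods.

```python
def one_vs_rest_labels(labels, target_class):
    """Binary labels used when constructing features dedicated to target_class."""
    return [1 if y == target_class else 0 for y in labels]

labels = ["A", "B", "C", "A", "B"]

# Class-dependent FC solves one such binary task per class; class-independent
# FC would instead evaluate features against the original multi-class labels.
per_class_labels = {c: one_vs_rest_labels(labels, c) for c in sorted(set(labels))}
# per_class_labels["A"] == [1, 0, 0, 1, 0]
```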

Although FC using GP has been studied for decades, most methods have been applied on datasets with tens of features. Compared to feature selection, the search space of FC is larger since it requires choosing not only a good feature subset but also an appropriate set of operators to combine them into a more discriminating feature. This makes FC on high-dimensional data a challenging task. There have been many studies investigating GP operators and GP itself, but few on GP for FC, especially multiple-feature construction on high-dimensional data. It is necessary to investigate what the key factors are and how they influence the performance of different approaches to GP for FC.

In this study, three multiple-feature construction methods are investigated: two methods using multi-tree representation, namely the class-independent MCIFC [16] and the class-dependent CDFC [18], and one method using single-tree representation (1TGPFC) proposed by Neshatian et al. [15] to construct class-dependent features. The performance of the features constructed by the three methods is compared using the classification performance of common learning algorithms including k-Nearest Neighbour (KNN), Naive Bayes (NB), and Decision Tree (DT). Although this study is based on two previously published conference papers, it substantially extends them by providing more comparisons and analysis between different approaches to designing the key components of GP-based feature construction methods, including: (1) using single-tree (1TGPFC) versus multi-tree (CDFC) representation for multiple-feature construction; (2) using different evaluation methods to evaluate constructed features during the evolutionary process; (3) constructing class-dependent versus class-independent features within the same setting. More analysis is also provided to reveal the insights of the proposed approach by (4) visualising the constructed features and (5) comparing with more baseline methods.

Section snippets

Genetic programming algorithm

Algorithm 1 shows the pseudo code of a standard GP algorithm for multiple-feature construction using multi-tree representation to construct m new features.
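Algorithm 1 itself is not reproduced in this snippet, but the standard generational loop it describes can be sketched as below. The individual representation, selection scheme, and operators are placeholders for the paper's actual components; the toy usage stands in for a multi-tree individual of m trees.

```python
import random

def evolve(fitness, random_individual, crossover, mutate,
           pop_size=50, generations=20, crossover_rate=0.8):
    """Generic GP-style generational loop with elitist best-tracking.
    A sketch of a standard GP algorithm, not the paper's exact Algorithm 1."""
    population = [random_individual() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            if random.random() < crossover_rate:
                p1, p2 = random.sample(population, 2)  # selection placeholder
                offspring.append(crossover(p1, p2))
            else:
                offspring.append(mutate(random.choice(population)))
        population = offspring
        best = max(population + [best], key=fitness)   # keep the best so far
    return best

# Toy usage: an "individual" stands in for a list of m constructed-feature
# trees; here it is simply a list of 3 numbers and fitness is their sum.
random.seed(0)
rand_ind = lambda: [random.random() for _ in range(3)]
xover = lambda a, b: [max(x, y) for x, y in zip(a, b)]
mut = lambda ind: [min(1.0, g + random.uniform(0.0, 0.1)) for g in ind]
best = evolve(sum, rand_ind, xover, mut)
```

In the real methods, `random_individual` builds m random trees over the original features, `fitness` is one of the filter, wrapper, or combined measures discussed above, and `crossover`/`mutate` exchange or replace subtrees.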

Single-Tree GP-Based feature construction

One of the early GP-based FC methods using single-tree representation was proposed by Raymer et al. [19]. It aimed to improve a previously proposed GA that evolves weights to transform each original feature into a new one. GP was used instead to enable a non-linear transformation of each feature. Results on a water displacement problem with

Multiple-feature construction methods

To make this paper easy to follow, this section will briefly describe the class-independent and class-dependent multiple-feature construction methods that will be investigated as they were proposed in the original papers.

Datasets

In order to investigate different GP-based FC approaches that generally work on high-dimensional data, our experiments use eight gene expression datasets with thousands to tens of thousands of features. These datasets are commonly used in studies addressing dimensionality reduction [34], [35]. Details about these datasets are shown in Table 1. The small number of instances in these datasets is due to the nature of gene expression data, in which the cost of collecting one sample is high.

Results and discussions

This section has five subsections. Section 5.1 compares the multiple constructed features with the single constructed feature. Section 5.2 discusses the results of two approaches to multiple-feature construction using single-tree and multi-tree GP. Then comparisons between MCIFC and CDFC are presented in Sections 5.3 and 5.4 to reveal the effectiveness of different fitness evaluation functions and of the class-dependent versus class-independent multiple-feature construction approaches. Finally,

Conclusions

The goal of this study was to investigate the performance of different approaches to multiple-feature construction on high-dimensional data using GP. Different methods were compared in the same context to reveal the important factors. Multiple-feature construction has been shown to be more effective than single-feature construction. Using multi-tree GP representation achieved better results than single-tree GP thanks to the ability to consider the interaction of the newly constructed features during the construction process.

Acknowledgment

This work was supported in part by the Marsden Fund of the New Zealand Government under Contracts VUW1509 and VUW1615, the Huawei Industry Fund under Grant E2880/3663, and the University Research Fund at Victoria University of Wellington under Grants 209862/3580 and 213150/3662.


References (40)

  • Z. Zhu et al.

    Markov blanket-embedded genetic algorithm for gene selection

    Pattern Recognit.

    (2007)
  • T. Afouras et al.

    Deep audio-visual speech recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2018)
  • Y.N. Dauphin et al.

    Language modeling with gated convolutional networks

    Proceedings of the International Conference on Machine Learning (ICML)

    (2017)
  • B. Tran et al.

    Genetic programming for feature construction and selection in classification on high-dimensional data

    Memetic Comput.

    (2015)
  • S. Ahmed et al.

    A New GP-Based wrapper feature construction approach to classification and Biomarker identification

    Proceedings of the IEEE Congress on Evolutionary Computation

    (2014)
  • M. Garcia-Limon et al.

    Simultaneous generation of prototypes and features through genetic programming

    Proceedings of the Annual Conference on Genetic and Evolutionary Computation

    (2014)
  • M. Smith et al.

    Genetic programming with a genetic algorithm for feature construction and selection

    Genet. Program. Evol. Mach.

    (2005)
  • K. Neshatian et al.

    A filter approach to multiple feature construction for symbolic learning classifiers using genetic programming

    IEEE Trans. Evol. Comput.

    (2012)
  • B. Tran et al.

Multiple feature construction in classification on high-dimensional data using GP

    Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI)

    (2016)
  • B. Tran et al.

    Class dependent multiple feature construction using genetic programming for high-dimensional data

    Proceedings of the AI Advances in Artificial Intelligence, Vol. 10400 of Lecture Notes in Computer Science

    (2017)

Binh Tran (S’14) received her B.E. degree in Computer Science from Cantho University, Vietnam, in 1998, the M.Sc. degree in Applied Computer Science from the Free University of Brussels, Belgium, in 2002, and the Ph.D. degree in computer science from Victoria University of Wellington, New Zealand, in 2018. She is currently a Post-Doctoral Research Fellow in the School of Engineering and Computer Science at Victoria University of Wellington. Her research interests are in evolutionary computation, feature manipulation including feature selection and construction, high-dimensional data, and machine learning.

    Ms. Tran is a member of the IEEE Computational Intelligence Society (CIS). She has been serving as a reviewer for over 10 international journals and conferences in the field.

Bing Xue (M’10) received the B.Sc. degree from the Henan University of Economics and Law, Zhengzhou, China, in 2007, the M.Sc. degree in management from Shenzhen University, Shenzhen, China, in 2010, and the Ph.D. degree in computer science from Victoria University of Wellington, New Zealand, in 2014. She is currently a Senior Lecturer in the School of Engineering and Computer Science at Victoria University of Wellington. Her research focuses mainly on evolutionary computation, feature selection, feature construction, multi-objective optimisation, image analysis, transfer learning, data mining, and machine learning. She has over 100 papers published in fully refereed international journals and conferences, most of them on evolutionary feature selection and construction.

Dr. Xue is currently the Chair of the IEEE Task Force on Evolutionary Feature Selection and Construction, IEEE Computational Intelligence Society (CIS), Vice-Chair of the IEEE CIS Data Mining and Big Data Analytics Technical Committee, and Vice-Chair of the IEEE CIS Task Force on Transfer Learning and Transfer Optimisation. She is also an Associate Editor or Editorial Board member for five international journals and a reviewer for over 50 international journals. Dr. Xue is the Finance Chair of the IEEE Congress on Evolutionary Computation (CEC) 2019 and a Program Co-Chair of the 31st Australasian AI 2018, ACALCI 2018, and the 7th International Conference on SoCPaR 2015, and she has also served as a tutorial chair, special session chair, or publicity chair for many other international conferences.

Mengjie Zhang (M’04-SM’10) received the B.E. and M.E. degrees from the Artificial Intelligence Research Center, Agricultural University of Hebei, Hebei, China, and the Ph.D. degree in computer science from RMIT University, Melbourne, VIC, Australia, in 1989, 1992, and 2000, respectively. He is currently a Professor of Computer Science, Head of the Evolutionary Computation Research Group, and the Associate Dean (Research and Innovation) in the Faculty of Engineering. His current research interests include evolutionary computation, particularly genetic programming, particle swarm optimization, and learning classifier systems, with application areas of image analysis, multi-objective optimization, feature selection and reduction, job shop scheduling, and transfer learning. He has published over 350 research papers in refereed international journals and conferences.

Prof. Zhang is a Fellow of the Royal Society of New Zealand and has been a panel member of the Marsden Fund (New Zealand Government Funding). He is also a Senior Member of IEEE and a member of ACM. He is currently chairing the IEEE CIS Intelligent Systems and Applications Technical Committee, is the immediate Past Chair of the IEEE CIS Emergent Technologies Technical Committee and the Evolutionary Computation Technical Committee, and is a member of the IEEE CIS Award Committee. He is a Vice-Chair of the IEEE CIS Task Force on Evolutionary Feature Selection and Construction, a Vice-Chair of the Task Force on Evolutionary Computer Vision and Image Processing, and the Founding Chair of the IEEE Computational Intelligence Chapter in New Zealand. He is also a committee member of the IEEE NZ Central Section.
