Elsevier

Knowledge-Based Systems

Volume 255, 14 November 2022, 109611

Evolving ensembles using multi-objective genetic programming for imbalanced classification

https://doi.org/10.1016/j.knosys.2022.109611

Abstract

Multi-objective Genetic Programming (MGP) plays a prominent role in generating Pareto-optimal classifier sets and adaptively making trade-offs among multiple classes. However, existing MGP algorithms perform poorly and are difficult to implement when applied to imbalanced classification problems. This work proposes a new MGP-based algorithm designed for imbalanced classification. First, an efficient evolutionary strategy with nondominated sorting, environmental selection, and an archiving mechanism is designed to minimize the false positive rate and the false negative rate while reducing the size of the resulting tree. Then, a weighted ensemble decision is made according to each classifier’s performance on the majority and minority classes to obtain the final classification results. Experimental results on 21 binary-class datasets and 17 multi-class datasets show that the proposed method outperforms existing ones on several commonly used imbalanced classification metrics.

Introduction

Classifying imbalanced data is currently a major challenge in machine learning, with applications in fields such as medical diagnosis, fraud detection, and text categorization [1], [2], [3], [4], [5], [6]. A dataset is imbalanced when at least one class is rare (a minority class) while the other classes make up the rest (majority classes).

Although classification approaches have shown great ability in many machine learning tasks, they still struggle with imbalanced training data because learning becomes biased toward the majority class [7]. Re-weighting and re-sampling strategies are usually employed to address imbalanced classification, but they have deficiencies: for example, the weights must be set manually, and the re-sampled data contribute only weakly to learning. To our knowledge, genetic programming (GP) has an advantage in learning from imbalanced data. GP is an evolutionary computation technique based on the principles of evolution and natural selection, and it has been shown to solve a range of real-world classification problems [8], [9]. Some studies suggest that evolutionary methods outperform non-evolutionary models on imbalanced datasets [10]. Compared with traditional sampling-based methods, GP has two major advantages. First, it learns directly from the raw imbalanced training data, without prior sampling or manual rebalancing of the class distributions. Second, unlike a single classifier, the evolved classifiers can be combined into an ensemble whose collective knowledge generalizes better, which provides an effective way to tackle complex real-world classification problems.

This motivated us to combine a multi-objective GP framework with efficient many-objective evolutionary algorithms (MaOEAs) to optimize multiple objectives, such as classification accuracy and the size of the solution tree, thereby evolving a set of GP classifiers and further improving overall performance.

In this paper, a multi-objective genetic programming (MGP) based algorithm is designed for high-performance imbalanced classification. It combines an efficient MaOEA with a weighted ensemble decision strategy to evolve accurate and diverse classifiers. Its performance is compared with standard (single-prediction) GP and several well-performing imbalanced classification methods. The main contributions are as follows:

(1) An efficient evolutionary strategy is integrated into MGP to minimize the false positive rate and the false negative rate while reducing the size of the solution tree. Fast nondominated sorting, environmental selection, and an archiving mechanism are adopted to implement a novel algorithm, MGP+.

(2) Inspired by the ability of ensemble learning to improve classifier performance, a weighted ensemble decision strategy is built on the MGP+ framework; the resulting algorithm is termed WMGP+.

(3) Experimental results show that WMGP+, which combines classifiers evolved by MGP with a weighted ensemble decision strategy, outperforms other competitive algorithms on both binary-class and multi-class imbalanced datasets.
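Contribution (1) names environmental selection and an archiving mechanism, but this excerpt does not say how they work. As a rough illustration only, the sketch below uses the NSGA-II crowding-distance rule, a swapped-in technique that is not necessarily the authors' choice: the archive keeps the nondominated solutions among the old archive and the new offspring, and when it exceeds a capacity it drops the most crowded ones. All names (`update_archive`, `capacity`, and so on) are hypothetical, and every objective, for example (Fp, Fn, Nt), is assumed to be minimized.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # all objectives are minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance: a is no worse than b everywhere and strictly better once."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def crowding_distance(points: Sequence[Objectives]) -> List[float]:
    """NSGA-II crowding distance; boundary points of each objective get infinity."""
    n = len(points)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal on this objective: no spread information
        for k in range(1, n - 1):
            gap = points[order[k + 1]][m] - points[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def update_archive(archive: List[Objectives], offspring: List[Objectives],
                   capacity: int) -> List[Objectives]:
    """Hypothetical archive update: keep nondominated points, truncate by crowding."""
    pool = list(dict.fromkeys(archive + offspring))  # drop exact duplicates, keep order
    front = [p for p in pool if not any(dominates(q, p) for q in pool)]
    if len(front) <= capacity:
        return front
    dist = crowding_distance(front)
    # Prefer less crowded (larger distance) points; the sort is stable on ties.
    keep = sorted(range(len(front)), key=lambda i: dist[i], reverse=True)[:capacity]
    return [front[i] for i in sorted(keep)]
```

Keeping boundary points (infinite distance) first preserves the extreme trade-offs, such as the classifier with the lowest Fn, which matter most when the archive later feeds an ensemble.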

The rest of this paper is structured as follows. Existing imbalanced classification algorithms are reviewed in Section 2. Section 3 summarizes MGP+, which evolves the base classifiers. Section 4 presents details of WMGP+. Section 5 shows the experimental results and related comparisons. Section 6 concludes this paper and suggests future directions.


Overview of related work for class imbalance

Research on imbalanced classification algorithms can be roughly divided into two categories: data-level methods and algorithm-level methods. Data resampling is the most commonly used data-level method, including under-sampling and over-sampling algorithms. The former includes RU (random under-sampling), NCL (neighborhood cleaning rule) [11], and WU-SVM (weighted under-sampling support vector machine) [18]. The latter includes ADASYN (adaptive synthetic sampling approach) [12], SMOTE (small
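To make the two data-level families concrete, here is a minimal pure-Python sketch of random under-sampling (RU) and a simplified SMOTE-style over-sampler that places each synthetic point between a minority sample and one of its k nearest minority neighbours. It is illustrative only, not a reference implementation of either method; the function names and parameters (`n_keep`, `n_new`, `k`, `seed`) are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def random_undersample(majority: Sequence[Point], n_keep: int, seed: int = 0) -> List[Point]:
    """Random under-sampling (RU): keep n_keep majority samples chosen
    uniformly at random without replacement."""
    return random.Random(seed).sample(list(majority), n_keep)


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (enough for ranking neighbours)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def smote_like(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Simplified SMOTE-style over-sampling: each synthetic point lies on the
    segment between a random minority sample and one of its k nearest
    minority neighbours."""
    pts = list(minority)
    if len(pts) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(pts))
        neighbours = sorted((j for j in range(len(pts)) if j != i),
                            key=lambda j: _sq_dist(pts[i], pts[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(pts[i], pts[j])))
    return synthetic
```

Both functions change the training distribution before learning, which is exactly the preprocessing step that GP-based approaches such as the one in this paper aim to avoid.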

MGP+

In imbalanced classification, it is preferable to minimize the false positive rate Fp and the false negative rate Fn separately rather than the overall classification error, because the overall error is dominated by the majority class: a classifier that always predicts the majority class can achieve a low error while misclassifying every minority instance. To achieve this, we use an MGP framework that optimizes three objectives: (1) Fp, (2) Fn, and (3) the number of leaf nodes Nt of the resulting tree. The goal is to improve classification accuracy while reducing the complexity of the final model. In the MGP framework, the three learning objectives conflict with one another, where a set of
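The following is a minimal sketch of how the three MGP+ objectives might be computed for one GP tree, followed by NSGA-II fast nondominated sorting over the resulting (Fp, Fn, Nt) vectors. The tree encoding (nested tuples), the function set, the protected division, and the rule that an output above 0 predicts the minority class are all assumptions for illustration; this excerpt does not state the authors' representation or decision threshold.

```python
import operator
from typing import Dict, List, Sequence, Tuple, Union

# Assumed encoding: a leaf is a feature name or a constant; an inner node is (op, left, right).
Tree = Union[str, float, Tuple[str, "Tree", "Tree"]]

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul,
       "div": lambda a, b: a / b if b != 0 else 1.0}  # protected division


def evaluate(tree: Tree, x: Dict[str, float]) -> float:
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x[tree] if isinstance(tree, str) else float(tree)


def count_leaves(tree: Tree) -> int:
    """Nt: number of leaf nodes (terminals) in the tree."""
    if isinstance(tree, tuple):
        return count_leaves(tree[1]) + count_leaves(tree[2])
    return 1


def objectives(tree: Tree, X: Sequence[Dict[str, float]],
               y: Sequence[int]) -> Tuple[float, float, int]:
    """(Fp rate, Fn rate, Nt), all minimized; y[i] == 1 marks the minority class.
    Assumption: the tree predicts the minority class when its output is > 0."""
    pred = [1 if evaluate(tree, x) > 0 else 0 for x in X]
    neg = sum(1 for t in y if t == 0)
    pos = len(y) - neg
    fp = sum(1 for p, t in zip(pred, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y) if p == 0 and t == 1)
    return fp / max(neg, 1), fn / max(pos, 1), count_leaves(tree)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))


def fast_nondominated_sort(objs: Sequence[Tuple[float, ...]]) -> List[List[int]]:
    """NSGA-II fast nondominated sorting; returns fronts as lists of indices."""
    n = len(objs)
    dominated_set: List[List[int]] = [[] for _ in range(n)]  # solutions p dominates
    dom_count = [0] * n                                       # solutions dominating p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_set[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front
```

For instance, the two-leaf tree `("sub", "f0", "f1")` scores (Fp, Fn, Nt) on a training set, and the first front returned by `fast_nondominated_sort` holds the trees for which no other tree is at least as good on all three objectives and strictly better on one.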

Weighted ensemble decision

After MGP+ generates a set of optimized base classifiers, we adopt a weighted ensemble decision strategy to combine them and obtain the final classification results.
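This excerpt does not give the weighting formula, so the following is only a hypothetical binary-class sketch consistent with the abstract's description (weights derived from each classifier's performance on the majority and minority classes). Each base classifier's vote for a class is weighted by its training recall on that class, i.e. 1 - Fn for a minority vote and 1 - Fp for a majority vote, and ties go to the majority class. The names `class_recalls` and `weighted_vote` are illustrative, not the paper's.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical base-classifier type: returns 1 (minority) or 0 (majority).
Classifier = Callable[[Sequence[float]], int]


def class_recalls(clf: Classifier, X: Sequence[Sequence[float]],
                  y: Sequence[int]) -> Tuple[float, float]:
    """Return (majority recall, minority recall), i.e. (1 - Fp rate, 1 - Fn rate)."""
    majority = [x for x, t in zip(X, y) if t == 0]
    minority = [x for x, t in zip(X, y) if t == 1]
    r_maj = sum(clf(x) == 0 for x in majority) / max(len(majority), 1)
    r_min = sum(clf(x) == 1 for x in minority) / max(len(minority), 1)
    return r_maj, r_min


def weighted_vote(classifiers: List[Classifier],
                  weights: List[Tuple[float, float]], x: Sequence[float]) -> int:
    """Hypothetical weighted ensemble decision: each vote counts with the
    voter's recall on the class it votes for; ties go to the majority class."""
    score = [0.0, 0.0]  # score[0]: majority class, score[1]: minority class
    for clf, (w_maj, w_min) in zip(classifiers, weights):
        label = clf(x)
        score[label] += w_min if label == 1 else w_maj
    return 1 if score[1] > score[0] else 0
```

Weights are estimated once on the training data with `class_recalls` and then reused at prediction time. Under this scheme a single classifier with high minority recall (say 0.9) can outvote two classifiers whose majority recall is poor (0.3 each), which plain majority voting would not allow.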

Experiments and results

In this section, experiments are performed to evaluate the proposed WMGP+. All experiments are conducted on a computer with an Intel Xeon(R) Gold 6130 CPU @ 2.10 GHz, 16 GB of memory, and the Ubuntu 16.04 operating system, and are implemented in Python 3.6.13. We first introduce the benchmark datasets from the University of California, Irvine (UCI) repository, the experimental settings, and the evaluation metrics used for imbalanced class learning. Then, experimental results of
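The metric list is cut off in this excerpt. As a hedged illustration of why accuracy alone is misleading on imbalanced data, the sketch below computes accuracy alongside two metrics commonly used in imbalanced learning: the geometric mean (G-mean) of the two class recalls and the minority-class F1 score. Whether these are exactly the metrics reported in the paper is not stated here; the function name `imbalance_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Binary metrics with 1 = minority (positive) class and 0 = majority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # minority recall = 1 - Fn rate
    tnr = tn / (tn + fp) if tn + fp else 0.0  # majority recall = 1 - Fp rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {
        "accuracy": (tp + tn) / max(len(y_true), 1),
        "g_mean": math.sqrt(tpr * tnr),
        "f1": f1,
    }
```

For example, with 2 minority and 8 majority instances, predicting the majority class everywhere gives an accuracy of 0.8 but a G-mean and F1 of 0.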

Conclusion

This paper presents a novel algorithm called WMGP+ that solves imbalanced classification problems by combining multi-objective genetic programming with weighted ensemble decision making. During population evolution, both population diversity and convergence are improved. WMGP+ determines the weight of each classifier according to its performance and makes a weighted ensemble decision to obtain high-quality classification results. Based on the experimental results of binary classes and

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (51775385, 61703279); in part by the Strategy Research Project of Artificial Intelligence Algorithms of Ministry of Education of China (000011); in part by the Shanghai Industrial Collaborative Science and Technology Innovation Project, China (2021-cyxt2-kj10); in part by the Shanghai Municipal Science and Technology Major Project, China (2021SHZDZX0100); in part by the Science and Technology Project of Suzhou,

References (59)

  • Y. Bi et al., Genetic programming-based discriminative feature learning for low-quality image classification, IEEE Trans. Cybern. (Early Access) (2021)
  • Y. Bi et al., Genetic programming with image-related operators and a flexible program structure for feature learning in image classification, IEEE Trans. Evol. Comput. (2021)
  • J. Hamidzadeh et al., Combined weighted multi-objective optimizer for instance reduction in two-class imbalanced data problem, Eng. Appl. Artif. Intel. (2020)
  • J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, in: Proc....
  • H. He, Y. Bai, E.A. Garcia, S. Li, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, in: Proc....
  • C.F. Lin et al., Fuzzy support vector machines, IEEE Trans. Neural Netw. (2002)
  • L. Dongdong et al., Entropy-based hybrid sampling ensemble learning for imbalanced data, Int. J. Intel. Syst. (2021)
  • N.V. Chawla et al., SMOTE: Synthetic minority over-sampling technique, J. Artificial Intelligence Res. (2002)
  • H. Yu et al., Fuzzy support vector machine with relative density information for classifying imbalanced data, IEEE Trans. Fuzzy Syst. (2019)
  • J. Zhao et al., A weighted hybrid ensemble method for classifying imbalanced data, Knowl.-Based Syst. (2020)
  • Q. Kang et al., A distance-based weighted under sampling scheme for support vector machines and its application to imbalanced classification, IEEE Trans. Neural Netw. Learn. Syst. (2018)
  • H.M. Nguyen et al., Borderline over-sampling for imbalanced data classification, Int. J. Knowl. Eng. Soft Data Paradig. (2011)
  • Z. Wang et al., Sample and feature selecting based ensemble learning for imbalanced problems, Appl. Soft Comput. (2021)
  • W. Pei et al., Genetic programming for development of cost-sensitive classifiers for binary high-dimensional unbalanced classification, Appl. Soft Comput. (2021)
  • D. Devarriya et al., Unbalanced breast cancer data classification using novel fitness functions in genetic programming, Expert Syst. Appl. (2020)
  • A. Kumar et al., A novel fitness function in genetic programming for medical data classification, J. Biomed. Inform. (2020)
  • U. Bhowan et al., Evolving diverse ensembles using genetic programming for classification with unbalanced data, IEEE Trans. Evol. Comput. (2013)
  • U. Bhowan et al., Reusing genetic programming for ensemble selection in classification of unbalanced data, IEEE Trans. Evol. Comput. (2014)
  • E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm, TIK-Report, vol. 103,...