
Expert Systems with Applications

Volume 38, Issue 9, September 2011, Pages 10932-10939

Experimental evaluation of two new GEP-based ensemble classifiers

https://doi.org/10.1016/j.eswa.2011.02.135

Abstract

The paper proposes applying Gene Expression Programming (GEP) to induce ensemble classifiers. Two new algorithms inducing such classifiers are proposed. The proposed ensemble classifiers use two different measures to select genes produced by the Gene Expression Programming procedure. Selection of genes from the set of the non-dominated ones in the process of meta-learning is supported by a genetic algorithm. Integration of genes (i.e. learners) is based on the majority voting. The proposed algorithms were validated experimentally using several datasets and the results were compared with those of other well established classification methods.

Highlights

  • Gene Expression Programming is used to define two new ensemble classifiers.

  • Two quality measures are proposed and used in gene selection.

  • Validation experiments were performed and results compared with other methods.

  • Both classifiers give good classification accuracy.

  • Both classifiers are competitive in terms of area under the ROC curve.

Introduction

Gene Expression Programming (GEP), introduced by Ferreira (2001), is an automatic programming approach. In GEP, computer programs are represented as fixed-length linear character strings called chromosomes which, in the subsequent fitness evaluation, are expressed as expression trees of different sizes and shapes. The flexibility and power of the approach to explore the entire search space come from this separation of genotype and phenotype. As observed by Ferreira (2006), GEP can be used to design decision trees, with the advantage that all decisions concerning the growth of the tree are made by the algorithm itself without any human input; that is, the growth of the tree is entirely determined and refined by evolution.
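
To make the genotype-phenotype mapping concrete, the sketch below shows one common way a fixed-length Karva-notation gene can be expressed breadth-first into an expression tree and then evaluated; the function set, the gene and the attribute values are illustrative assumptions, not taken from the paper. In a classification setting, the numeric output of such a tree is typically compared against a threshold to assign one of the classes.

```python
# Illustrative sketch of GEP genotype-to-phenotype mapping (Karva notation);
# the function set, gene and attribute values below are assumptions, not the
# paper's. A fixed-length gene is read breadth-first, and only a prefix of it
# becomes the (variable-size) expression tree.
ARITY = {'+': 2, '-': 2, '*': 2, '/': 2}   # function set; all other symbols are terminals

def express(gene):
    """Decode a linear gene (string of symbols) into a nested-list expression tree."""
    root = [gene[0]]
    frontier, pos = [root], 1
    while frontier:
        next_frontier = []
        for node in frontier:
            for _ in range(ARITY.get(node[0], 0)):
                child = [gene[pos]]
                pos += 1
                node.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return root

def evaluate(tree, values):
    """Evaluate the expression tree on a dict of terminal (attribute) values."""
    op = tree[0]
    if op not in ARITY:
        return values[op]
    left, right = evaluate(tree[1], values), evaluate(tree[2], values)
    return {'+': left + right, '-': left - right,
            '*': left * right, '/': left / right if right else 1.0}[op]

# Only the first 7 symbols of this 13-symbol gene are expressed: (a + b) * (b - c).
tree = express('*+-abbcaabbcc')
print(evaluate(tree, {'a': 2.0, 'b': 3.0, 'c': 1.0}))   # (2 + 3) * (3 - 1) = 10.0
```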

The ability of GEP to generate decision trees makes it a natural tool for solving classification problems. Ferreira (2006) showed several example applications of GEP, including classification. Weinert and Lopes (2006) applied GEP to the data mining task of classification by inducing rules. The authors proposed a new method for rule encoding and genetic operators that preserve rule integrity. They also implemented a system named GEPCLASS, which allows for the automatic discovery of flexible rules better fitted to the data. Duan, Tang, Zhang, Wei, and Zhang (2006) claimed to improve the efficiency of GEP used as a classification tool. Their contribution includes new strategies for generating the classification threshold dynamically and a new approach called the Distance Guided Evolution Algorithm. Zeng, Xiang, Chen, and Liu (2007) proposed a novel Immune Gene Expression Programming as a tool for rule mining. Another approach to evolving classification rules with Gene Expression Programming was proposed by Zhou, Xiao, Tirpak, and Nelson (2003). A different example of applying GEP to classification problems was given by Li, Zhou, Xiao, and Nelson (2005), who proposed a new representation scheme based on prefix notation that brings some advantages over the traditional approach. Wang et al. (2006) proposed a GEP decision tree system that can construct decision trees for classification without prior knowledge of the distribution of the data. Karakasis and Stafylopatis (2006) proposed a hybrid evolutionary technique for data mining tasks which combines the Clonal Selection Principle with Gene Expression Programming; the authors claim that their approach outperforms GEP in terms of convergence rate and computational efficiency.

In this paper we propose two GEP-induced ensemble classifiers and report the results of a validating computational experiment. Ensemble methods first solve a classification problem by creating multiple learners, each able to solve the task independently, and then use a procedure specified by the particular ensemble method for selecting and integrating the individual learners. The proposed ensemble classifiers use two different measures to select genes produced by the Gene Expression Programming procedure. Selection of genes from the set of non-dominated ones in the process of meta-learning is supported by a genetic algorithm. Integration of genes (i.e. learners) is based on majority voting.
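
As a rough illustration of the two generic steps just described, selecting non-dominated learners and integrating them by majority voting, the sketch below uses two placeholder quality measures and plain Python callables as learners. None of this is the authors' implementation; the paper's own measures and domination relation are introduced in Section 2.

```python
from collections import Counter

# Rough sketch of two generic ensemble steps mentioned above: keeping only the
# non-dominated genes with respect to two quality measures, and integrating the
# selected genes (learners) by majority voting. The two measures here are
# placeholders, not the ones defined in the paper.
def non_dominated(genes, measure_a, measure_b):
    """Return genes not dominated on both measures (higher values are better)."""
    scores = [(g, (measure_a(g), measure_b(g))) for g in genes]
    keep = []
    for g, s in scores:
        dominated = any(t[0] >= s[0] and t[1] >= s[1] and t != s for _, t in scores)
        if not dominated:
            keep.append(g)
    return keep

def majority_vote(learners, x):
    """Each learner maps an attribute vector x to a class label; ties are broken arbitrarily."""
    return Counter(learner(x) for learner in learners).most_common(1)[0][0]

# Hypothetical usage: ensemble = non_dominated(genes, validation_accuracy, tree_simplicity)
#                     predicted_class = majority_vote(ensemble, x)
```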

The paper is organized as follows: in Section 2 the idea of using Gene Expression Programming to induce classifiers is explained. We also introduce the domination relation and describe learning and meta-learning procedures. In Section 3 the results of an extensive computational experiment are presented and discussed. Section 4 contains conclusions and suggestions for future research.

Section snippets

Using Gene Expression Programming to induce classifiers

Consider the data classification problem. In what follows, C is the set of categorical classes, denoted 1, …, |C|. We assume that the learning algorithm is provided with the training set TD = {⟨d, c⟩ : d ∈ D, c ∈ C} ⊂ D × C, where D is the space of attribute vectors d = (w₁ᵈ, …, wₙᵈ), with wᵢᵈ being symbolic or numeric values. The learning algorithm is used to find the best possible approximation f̄ of the unknown function f such that f(d) = c. Then f̄ can be used to find the class c̄ = f̄(d̄) for any d̄ ∈ D \ TD|_D, where TD|_D denotes the set of attribute vectors appearing in TD.
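
Read operationally, the setting above amounts to the following minimal sketch; it is an assumption-laden illustration in which a 1-nearest-neighbour learner merely stands in for the GEP-induced approximation f̄ described in the paper.

```python
# Minimal, assumption-laden sketch of the setting above: TD is a list of
# (attribute vector d, class c) pairs drawn from D x C, learn() returns an
# approximation f_bar of the unknown labelling function f, and f_bar is then
# queried on attribute vectors that do not appear in TD. The 1-nearest-neighbour
# learner is only a stand-in for the GEP-induced classifier described later.
from typing import Callable, Hashable, Sequence, Tuple

Attributes = Tuple   # d = (w_1, ..., w_n); values may be symbolic or numeric
Label = Hashable     # c in {1, ..., |C|}

def learn(td: Sequence[Tuple[Attributes, Label]]) -> Callable[[Attributes], Label]:
    def dist(a: Attributes, b: Attributes) -> float:
        # simple mixed-attribute distance: 0/1 mismatch for symbols,
        # squared difference for numbers
        return sum(0.0 if x == y else 1.0 if isinstance(x, str) else (x - y) ** 2
                   for x, y in zip(a, b))

    def f_bar(d: Attributes) -> Label:
        return min(td, key=lambda pair: dist(pair[0], d))[1]

    return f_bar

# Usage: f_bar = learn(TD); c_hat = f_bar(d_new) for a vector d_new outside TD.
```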

As

Computational experiment results

To evaluate the proposed approach, a computational experiment was carried out. The experiment involved the following two-class datasets from the UCI Machine Learning Repository (Asuncion & Newman, 2007): Wisconsin Breast Cancer (WBC), Diabetes, Sonar, Australian Credit (ACredit), German Credit (GCredit), Cleveland Heart (Heart), Hepatitis, and Ionosphere. Basic characteristics of these sets are shown in Table 1.
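
For readers who want to set up this kind of comparison, a hedged sketch is given below: it estimates classification accuracy and area under the ROC curve, the two criteria highlighted for this paper, by cross-validation on a related two-class Wisconsin breast cancer dataset bundled with scikit-learn. The fold count and the off-the-shelf ensemble used as a stand-in are assumptions; GEP-A and GEP-B themselves are not available as packaged library classifiers.

```python
# Hedged sketch of the kind of evaluation reported in this section: accuracy and
# area under the ROC curve of a classifier on a two-class dataset, estimated by
# cross-validation. The fold count and the stand-in learner are assumptions, and
# scikit-learn's bundled dataset is a related Wisconsin breast cancer set, not
# necessarily the exact WBC file used in the paper.
from sklearn.datasets import load_breast_cancer            # Wisconsin breast cancer data
from sklearn.ensemble import RandomForestClassifier        # stand-in ensemble learner
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```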

In the reported experiment, the following classification tools were used: GEP-A

Conclusions

The paper proposes two new GEP-induced ensemble classifiers. The main contributions of the paper can be summarized as follows:

  • Two quality measures, used in GEP-A and GEP-B respectively, are proposed and applied in the process of gene selection.

  • A non-dominance relation between genes is defined and used in the process of gene selection.

  • A class-specific GEP learning procedure is proposed and implemented.

  • A meta-learning algorithm supported by genetic programming is proposed and implemented.

The resulting

References (16)

  • Asuncion, A., & Newman, D. J. (2007). UCI machine learning repository.

  • Corder, G. W., et al. (2009). Nonparametric statistics for non-statisticians: A step-by-step approach.

  • Duan, L., Tang, C., Zhang, T., Wei, D., & Zhang, H. (2006). Distance guided classification with gene expression...

  • Fawcett, T. (2003). ROC graphs: Notes and practical considerations for researchers. HP Labs Tech Report HPL-2003-4,...

  • Ferreira, C. (2001). Gene expression programming: A new adaptive algorithm for solving problems. Complex Systems.

  • Ferreira, C. (2006). Gene expression programming. Studies in Computational Intelligence.

  • Jędrzejowicz, J., & Jędrzejowicz, P. (2008). GEP-induced expression trees as weak classifiers. In P. Perner (Ed.),...

  • Jędrzejowicz, J., & Jędrzejowicz, P. (2009). A family of GEP-induced ensemble classifiers. In N.T. Nguyen, R....

Cited by (19)

  • GEP-based classifier for mining imbalanced data

    2021, Expert Systems with Applications
    Citation Excerpt:

    Using GEP-based classifiers is justified by two factors. Several studies have shown that various variants of GEP-based learners perform well on difficult classification problems (see, for example, Jedrzejowicz & Jedrzejowicz, 2011; Li et al., 2007; Lv et al., 2017; Omkar et al., 2012; Zhou et al., 2003). The second factor is that expression trees produced by GEP-based classifiers can be transformed into a set of rules which are comprehensible to users.

  • A Novel Rotation Forest Modality Based on Hybrid NNs: RF (ScPSO-NN)

    2019, Journal of King Saud University - Computer and Information Sciences
    Citation Excerpt:

    There are many skillful classifier systems found in the literature, but higher classification performance is generally obtained by utilizing complex or hybrid classification structures. Jędrzejowicz and Jędrzejowicz (2011) generated Gene Expression Programming (GEP) based ensemble classifiers (GEP-A and GEP-B) and tested the techniques on the WBC and PID datasets. GEP-B generally preceded the others in various trials, achieving 97.21% (WBC) and 78.12% (PID) classification accuracies.

  • A new ensemble method for gold mining problems: Predicting technology transfer

    2012, Electronic Commerce Research and Applications
    Citation Excerpt:

    Varying these three factors in designing an ensemble allows current ensemble methods to be fully sophisticated enough to address gold mining problems. There is also a variety of classifier-inducing methods such as gene expression programming (Jedrzejowicz and Jedrzejowicz 2011), decision trees, support vector machines, artificial neural networks, parametric regression models (Borra and Di Ciaccio 2002) and generalized additive models (De Bock et al. 2010). The commonality among these classifier-inducing methods is that the training cases are randomly grouped.

  • Collective of Base Classifiers for Mining Imbalanced Data

    2022, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)