Elsevier

Information Sciences

Volume 240, 10 August 2013, Pages 1-20

An interpretable classification rule mining algorithm

https://doi.org/10.1016/j.ins.2013.03.038

Abstract

Obtaining comprehensible classifiers may be as important as achieving high accuracy in many real-life applications such as knowledge discovery tools and decision support systems. This paper introduces an efficient Evolutionary Programming algorithm for solving classification problems by means of very interpretable and comprehensible IF-THEN classification rules. This algorithm, called the Interpretable Classification Rule Mining (ICRM) algorithm, is designed to maximize the comprehensibility of the classifier by minimizing the number of rules and the number of conditions. The evolutionary process is conducted to construct classification rules using only relevant attributes, avoiding noisy and redundant information. The algorithm is evaluated and compared to nine other well-known classification techniques in 35 varied application domains. Experimental results are validated using several non-parametric statistical tests applied to multiple classification and interpretability metrics. The experiments show that the proposal obtains good results, significantly improving the interpretability measures over the rest of the algorithms while achieving competitive accuracy. This is a significant advantage over other algorithms, as it allows an accurate and very comprehensible classifier to be obtained quickly.

Introduction

Discovering knowledge in the large amounts of data collected over the last decades has become increasingly challenging, especially in large-scale databases. Data mining (DM) [60] involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. Classification and regression are two forms of data analysis which can be used to extract models describing important data classes or to predict future data trends. Classification predicts categorical labels, whereas regression predicts continuous-valued functions.

The data analysis tools used for DM include statistical models, mathematical methods, and machine learning algorithms. Classification is a common task in supervised machine learning, which seeks algorithms that learn from training examples to produce predictions about future examples.

Classification has been successfully solved using several approaches [26]. On the one hand, there are approaches such as artificial neural networks (ANN) [46], support vector machines (SVM) [16], and instance-based learning methods [2]. These approaches obtain accurate classification models but they must be regarded as black boxes, i.e., they are opaque to the user. Opaque predictive models prevent the user from tracing the logic behind a prediction and from obtaining interesting, previously unknown knowledge from the model. These classifiers do not permit human understanding and inspection: they are not directly interpretable by an expert, and it is not possible to discover which attributes are relevant to predicting the class of an example. This opacity prevents them from being used in many real-life knowledge discovery applications where both accuracy and comprehensibility are required, such as medical diagnosis [55], credit risk evaluation [42], and decision support systems [6], since the prediction model must explain the reasons for classification.

On the other hand, there are machine learning approaches which overcome this limitation and provide transparent and comprehensible classifiers, such as decision trees [62] and rule-based systems [49]. Evolutionary Algorithms [65], and specifically Evolutionary Programming (EP) [13], [64] and Genetic Programming (GP) [25], have been successfully applied to build decision trees and rule-based systems easily. Rule-based systems are especially user-friendly and offer compact, understandable, intuitive and accurate classification models. To obtain comprehensibility, accuracy is often sacrificed by using simpler but transparent models, achieving a trade-off between accuracy and comprehensibility. Even though there are many rule-based classification models, only recently has the comprehensibility of the models become a more relevant objective. Proof of this trend is found in recent studies of the issue [18], [27], [34], [57], i.e., the comprehensibility of the models is a new challenge as important as accuracy. This paper focuses on interpretability, trying to reach more comprehensible models than most of the current proposals and thus covering the needs of many application domains that require greater comprehensibility than that provided by current methods.

This paper presents an EP approach applied to classification problems to obtain comprehensible rule-based classifiers. This algorithm, called ICRM (Interpretable Classification Rule Mining), is designed to obtain a rule base with the minimum number of rules and conditions, in order to maximize its interpretability, while obtaining competitive accuracy results. The algorithm uses an individual = rule representation, following the Iterative Rule Learning (IRL) model. Individuals are constructed by means of a context-free grammar [33], [61], which establishes a formal definition of the syntactical restrictions of the problem to be solved and its possible solutions, so that only grammatically correct individuals are generated. The most important characteristics of the algorithm are as follows. Firstly, the algorithm guarantees obtaining the minimum number of rules. This is possible because it generates one rule per class, together with a default class prediction, which is assigned when none of the available rules are triggered. Moreover, it is guaranteed that there are no contradictory or redundant rules, i.e., there is no pair of rules with the same antecedents and different consequents. Finally, it also guarantees the minimum number of conditions forming the antecedents of these rules, which is achieved by selecting only the most relevant and discriminating attributes that separate the classes in the attribute domains.
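
The classifier structure described above (a few IF-THEN rules plus a default class) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple layout of conditions and rules, the helper names `rule_fires` and `predict`, the ordered first-match evaluation, the choice to leave the third class to the default, and the toy attribute names and thresholds are all assumptions made here for illustration.

```python
import operator
from typing import Dict, List, Tuple

# Hypothetical layout: a condition is (attribute name, comparison operator, value).
Condition = Tuple[str, str, float]
# Hypothetical layout: a rule is (conditions joined by AND, predicted class).
Rule = Tuple[List[Condition], str]

OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}


def rule_fires(conditions: List[Condition], example: Dict[str, float]) -> bool:
    """Return True when every condition of the antecedent holds for the example."""
    return all(OPS[op](example[attr], value) for attr, op, value in conditions)


def predict(rules: List[Rule], default_class: str, example: Dict[str, float]) -> str:
    """Return the class of the first rule that fires, else the default class.

    Checking rules in list order is an assumption of this sketch.
    """
    for conditions, label in rules:
        if rule_fires(conditions, example):
            return label
    return default_class


# Toy classifier with illustrative (made-up) thresholds: two explicit rules
# and a default class that is assigned when no rule is triggered.
rules: List[Rule] = [
    ([("petal_length", "<", 2.5)], "setosa"),
    ([("petal_width", "<", 1.7)], "versicolor"),
]
default_class = "virginica"

for example in (
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 4.5, "petal_width": 1.4},
    {"petal_length": 5.8, "petal_width": 2.2},
):
    print(example, "->", predict(rules, default_class, example))
```

Because each class appears as the consequent of at most one rule, two rules with identical antecedents and different consequents cannot both be useful under first-match evaluation, which is one simple way to see why such a rule base stays small and free of contradictions.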

The experiments, carried out on 35 different data sets against nine other algorithms, show the competitive performance of our proposal in terms of predictive accuracy and execution time, obtaining significantly better results than all the other algorithms in terms of all the interpretability measures considered: the minimum number of rules, minimum number of conditions per rule, and minimum number of conditions of the classifier. The experimental study includes a statistical analysis based on the Bonferroni–Dunn [24] and Wilcoxon [59] non-parametric tests [28], [29] in order to evaluate whether there are statistically significant differences in the results of the algorithms.
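
The three interpretability measures named above can be computed directly from a rule base. The following Python sketch shows one plausible way to do so; the rule layout, the function name `interpretability_metrics`, and the decision not to count the default class as a rule are assumptions of this sketch, not definitions taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical layout, chosen for this sketch only.
Condition = Tuple[str, str, float]
Rule = Tuple[List[Condition], str]


def interpretability_metrics(rules: List[Rule]) -> Dict[str, float]:
    """Count rules, total conditions, and average conditions per rule.

    The default class prediction is not counted as a rule here (an assumption).
    """
    n_rules = len(rules)
    n_conditions = sum(len(conditions) for conditions, _ in rules)
    per_rule = n_conditions / n_rules if n_rules else 0.0
    return {
        "rules": n_rules,
        "conditions_per_rule": per_rule,
        "conditions": n_conditions,
    }


# Toy rule base with made-up attributes: one rule with a single condition
# and one rule with two conditions.
toy_rules: List[Rule] = [
    ([("petal_length", "<", 2.5)], "setosa"),
    ([("petal_length", ">=", 2.5), ("petal_width", "<", 1.7)], "versicolor"),
]

metrics = interpretability_metrics(toy_rules)
print(metrics)
```

Lower values on all three counts correspond to a classifier that is shorter to read, which is the sense in which the text speaks of minimizing them.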

This paper is structured as follows. Section 2 briefly reviews the related background works. Section 3 describes the ICRM algorithm. Section 4 describes the experimental study, whose results are discussed in Section 5. Finally, Section 6 presents some conclusions drawn from the work.

Background

This section introduces the accuracy vs interpretability problem and discusses the interpretability definition and metrics. Finally, it briefly reviews the most important works related to genetic rule-based classification systems in recent years.

The ICRM algorithm

This section describes the most relevant features and the execution model of the ICRM algorithm. This paper presents a comprehensive and extended version of the ICRM algorithm, whose initial results were reported in [17]. The algorithm consists of three phases. In the first phase, the algorithm creates a pool of rules that explore the attribute domains. In the second phase, the algorithm iterates to find classification rules and builds the classifier. Finally, the third phase optimizes the

Experimental study

This section describes the details of the experiments performed in various problem domains to evaluate the capabilities of the proposal and compare it to other classification methods.

The experiments carried out compare the results of the ICRM algorithm and nine other classification algorithms over 35 data sets. These data sets were collected from the KEEL repository website [3] and the algorithms are available on the KEEL software tool [4]. The data sets together with their partitions are

Results

This section discusses the experimental results and compares our method to different algorithms. In order to demonstrate the effectiveness and efficiency of our model, the accuracy, the execution time, and the different interpretability measures are evaluated.

Conclusion

In this paper we have proposed an interpretable classification rule mining (ICRM) algorithm, which is an interpretable and efficient rule-based evolutionary programming classification algorithm. The algorithm solves the cooperation–competition problem by dealing with the interaction among the rules during the evolutionary process. The proposal minimizes the number of rules, the number of conditions per rule, and the number of conditions of the classifier, increasing the interpretability of the

Acknowledgments

This work has been supported by the Regional Government of Andalusia and the Ministry of Science and Technology, projects P08-TIC-3720 and TIN-2011–22408, FEDER funds, and Ministry of Education FPU Grant AP2010–0042.

References (66)

  • J. Huysmans et al.

    An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models

    Decision Support Systems

    (2011)
  • D. Martens et al.

    Comprehensible credit scoring models using rule extraction from support vector machines

    European Journal of Operational Research

    (2007)
  • C. Nguyen et al.

    A genetic design of linguistic terms for fuzzy rule based classifiers

    International Journal of Approximate Reasoning

    (2013)
  • M. Paliwal et al.

    Neural networks and statistical techniques: a review of applications

    Expert Systems with Applications

    (2009)
  • S. Tsumoto

    Mining diagnostic rules from clinical databases using rough sets and medical diagnostic model

    Information Sciences

    (2004)
  • W. Verbeke et al.

    Building comprehensible customer churn prediction models with advanced rule induction techniques

    Expert Systems with Applications

    (2011)
  • T. Wiens et al.

    Three way k-fold cross-validation of resource selection functions

    Ecological Modelling

    (2008)
  • J. Yang et al.

    Effective search for Pittsburgh learning classifier systems via estimation of distribution algorithms

    Information Sciences

    (2012)
  • A. Zafra et al.

    G3P-MI: a genetic programming algorithm for multiple instance learning

    Information Sciences

    (2010)
  • J.S. Aguilar-Ruiz et al.

    Natural encoding for evolutionary supervised learning

    IEEE Transactions on Evolutionary Computation

    (2007)
  • D.W. Aha et al.

    Instance-based learning algorithms

    Machine Learning

    (1991)
  • J. Alcalá-Fdez et al.

    KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework

    Journal of Multiple-Valued Logic and Soft Computing

    (2011)
  • J. Alcalá-Fdez et al.

    KEEL: a software tool to assess evolutionary algorithms for data mining problems

    Soft Computing

    (2009)
  • J. Alonso et al.

    Hilk++: an interpretability-guided fuzzy modeling methodology for learning readable and comprehensible fuzzy rule-based classifiers

    Soft Computing

    (2011)
  • R. Axelrod

    The Complexity of Cooperation: Agent-based Models of Competition and Collaboration

    (1997)
  • J. Bacardit et al.

    Bloat control and generalization pressure using the minimum description length principle for a Pittsburgh approach learning classifier system

  • J. Bacardit et al.

    Performance and efficiency of memetic Pittsburgh learning classifier systems

    Evolutionary Computation

    (2009)
  • E. Bernadó-Mansilla et al.

    Accuracy-based learning classifier systems: models, analysis and applications to classification tasks

    Evolutionary Computation

    (2003)
  • M.V. Butz et al.

    Toward a theory of generalization and learning in XCS

    IEEE Transactions on Evolutionary Computation

    (2004)
  • A. Cano, A. Zafra, S. Ventura, An EP algorithm for learning highly interpretable classifiers, in: Proceedings of the...
  • M. Cintra et al.

    On rule learning methods: a comparative analysis of classic and fuzzy approaches

    Studies in Fuzziness and Soft Computing

    (2013)
  • W. Cohen, Fast effective rule induction, in: Proceedings of the 12th International Conference on Machine Learning,...