Abstract
In this manuscript, we identify and evaluate some of the most widely used optimization models for rule extraction with genetic programming-based algorithms. Six different models, combining the most common fitness functions, were tested. These functions employ well-known metrics such as support, confidence, sensitivity, specificity, and accuracy. The models were then used to assess the performance of a single algorithm on several real classification problems. Results were compared under two criteria: accuracy and the product of sensitivity and specificity. This comparison, supported by statistical analysis, indicated that the product of sensitivity and specificity provides a more realistic estimate of classifier performance. It also showed that the accuracy metric can bias the classifier, especially on unbalanced databases.
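For concreteness, the metrics named above can be computed from the confusion-matrix counts of a rule for one target class, as in the sketch below. This is an illustrative Python sketch, not the authors' implementation; metric definitions vary in the literature, and the support definition shown (fraction of samples the rule covers correctly) is one common convention.

```python
def rule_metrics(tp, fp, tn, fn):
    """Metrics commonly combined in rule-extraction fitness functions.

    tp/fp/tn/fn are the confusion-matrix counts of a rule with respect
    to one target class.
    """
    n = tp + fp + tn + fn
    support = tp / n                                     # one common convention
    confidence = tp / (tp + fp) if (tp + fp) else 0.0    # a.k.a. precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    accuracy = (tp + tn) / n
    return {
        "support": support,
        "confidence": confidence,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        # the comparison criterion the study found most realistic:
        "sens_x_spec": sensitivity * specificity,
    }
```

For example, with hypothetical counts tp=40, fp=10, tn=45, fn=5, accuracy is 0.85 while the sensitivity-specificity product is about 0.73, since the rule misses part of the positive class.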
Acknowledgements
We thank the Laboratory of Evolutionary Computation (UFMG) and the UCI Machine Learning Repository for providing the datasets used in the experiments. This work has been supported in part by CNPq, CAPES, and FAPEMIG, governmental agencies of Brazil and of the state of Minas Gerais in charge of fostering scientific and technological development. Since their sole interest is the generation of knowledge relevant to society at large, any conflict of interest involving them is ruled out.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Appendix: Detailed results of each model in the datasets
1.1 Balance scale
The accuracy and the product of sensitivity and specificity for the Balance scale dataset are shown in Tables 16 and 17, respectively. These values are reported in average ± standard deviation notation.
The ordered models, according to the statistical procedure for a confidence level of 0.05, are presented in Fig. 4. Model 3 was the best with regard to accuracy, followed by models 4 and 6. On the other hand, models 4 and 6 were the best with regard to the product of sensitivity and specificity. Models 2 and 5 ranked last under both criteria.
1.2 Base 1 dataset
The average accuracy obtained in the base 1 dataset is shown in Table 18.
Table 19 shows the product of sensitivity and specificity obtained on the base 1 dataset, detailing the average values obtained in each class and the global average for each model.
The ordered models, according to the statistical procedure for a confidence level of 0.05, are presented in Fig. 5. There is a tie between models 3, 6, and 4 according to accuracy. When the comparison criterion is the product, models 6 and 4 outperform the others.
1.3 Base 2 dataset
The accuracy and the product of sensitivity and specificity for the base 2 dataset are shown in Tables 20 and 21, respectively. These values are reported in average ± standard deviation notation.
The ordered models, according to the statistical procedure for a confidence level of 0.05, are presented in Fig. 6. Again, model 3 outperforms the other models under the first comparison criterion. Models 4 and 6 obtain the best performance according to the second comparison criterion.
1.4 Base 3 dataset
The average accuracy obtained in the base 3 dataset is shown in Table 22.
Table 23 shows the product of sensitivity and specificity obtained on the base 3 dataset, detailing the average values obtained in each class and the global average for each model.
The ordered models, according to the statistical procedure for a confidence level of 0.05, are presented in Fig. 7. In this dataset, models 3, 4, and 6 were tied under both comparison criteria.
1.5 Climate model simulation crashes dataset
The accuracy and the product of sensitivity and specificity for the climate model simulation crashes dataset are shown in Tables 24 and 25, respectively. These values are reported in average ± standard deviation notation.
The ordered models, according to the statistical procedure for a confidence level of 0.05, are presented in Fig. 8. According to the comparison based on accuracy, model 3 ranks first, followed by model 6, with models 1 and 4 in third place. On the other hand, according to the product criterion, models 4 and 6 are tied in first place.
1.6 Flooding risk and infrastructure level databases
For the flooding risk dataset, model 3 was the best considering accuracy (Table 26). Under the other criterion, models 4 and 6 were the best (Table 27).
On the other hand, considering the infrastructure level dataset, models 4 and 3 were the best according to accuracy (Table 28), while models 4 and 6 were the best according to the second comparison criterion (Table 29). The p-values between the models can be seen in Figs. 9 and 10.
Table 30 shows the percentage obtained in each class of the flooding risk dataset by models 3, 4, and 6. Model 3 obtained the best values with accuracy as the comparison criterion, but in Table 30a, c a predominance of false negatives and true negatives is observed, which indicates that the rules classified most of the samples as not belonging to the target classes. Additionally, it is important to remember that the target classes in (a) and (c) are minority classes in the dataset. A complementary scenario is observed in Table 30b, where the rules classified most of the samples as belonging to the target class, which is also the predominant class.
The confusion matrices obtained with model 4 are more balanced, regardless of whether the target class is predominant (Table 30e) or not (Table 30d, f). Model 6 obtained values very close to those of model 4 under all criteria analyzed. In addition, the confusion matrices of these two models are very similar: only the values for class "high" differ (Table 30d, g).
A similar situation is observed in the infrastructure level dataset, where models 3, 4, and 6 again stand out. However, with the product of sensitivity and specificity as the comparison criterion, model 6 presents better values than model 4.
Table 31 shows the percentage obtained in each class of the infrastructure level dataset by models 3, 4, and 6.
As observed in the previous dataset, model 3 presented the best accuracy values. Even in minority classes (high and medium for infrastructure level), model 3 presented high accuracy values. This happened because the number of TN was very high, inflating the accuracy in all cases. It is important to highlight the situation observed in class medium of the infrastructure dataset, in which the number of TP was zero, making the product measure equal to zero as well. On the other hand, models 4 and 6 presented the best values of the sensitivity/specificity product. Another important point is that the use of the product in the optimization model did not imply a drastic reduction in accuracy: sometimes the values presented by models 4 and 6 are similar or even equal to those measured for model 3. For instance, the accuracies of class high in flooding risk were the same in models 3 and 4.
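The accuracy bias described above is easy to reproduce with hypothetical counts (the numbers below are illustrative, not taken from Table 31): a rule that never fires on a 5% minority class still scores 95% accuracy, while the sensitivity-specificity product correctly drops to zero.

```python
# Hypothetical confusion-matrix counts for a minority class (5% of samples)
# on which the rule predicts "not in class" for every sample.
tp, fp, tn, fn = 0, 0, 95, 5

accuracy = (tp + tn) / (tp + fp + tn + fn)   # inflated by the many TN
sensitivity = tp / (tp + fn)                 # 0.0: no positives detected
specificity = tn / (tn + fp)                 # 1.0: trivially perfect
product = sensitivity * specificity          # 0.0 exposes the failure

print(f"accuracy={accuracy:.2f}, product={product:.2f}")
```

This mirrors the class "medium" case noted above: a high TN count alone keeps accuracy high, whereas a single zero factor (TP = 0, hence sensitivity = 0) drives the product to zero.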
1.7 Wine dataset
The accuracy and the product of sensitivity and specificity for the wine dataset are shown in Tables 32 and 33, respectively. These values are reported in average ± standard deviation notation. Table 33 also details the average values obtained in each class and the global average for each model.
The ordered models, according to the statistical procedure for a confidence level of 0.05, are presented in Fig. 11. Models 6 and 4 presented good performance under both criteria. Under the first, accuracy, they are tied with model 3 in first place. Under the second criterion, the product, they precede model 3 and the other models.
About this article
Cite this article
Pereira, M.A., Carrano, E.G., Davis Júnior, C.A. et al. A comparative study of optimization models in genetic programming-based rule extraction problems. Soft Comput 23, 1179–1197 (2019). https://doi.org/10.1007/s00500-017-2836-8