A Grammar-Guided Genetic Programing Algorithm for Associative Classification in Big Data

Cognitive Computation

Abstract

State-of-the-art approaches to associative classification build accurate and interpretable classifiers. They generally work in four phases (data discretization, pattern mining, rule mining, and classifier building), some of which are computationally expensive. This work proposes a novel evolutionary algorithm for efficiently building associative classifiers in Big Data. The proposed model works in only two phases, each driven by a grammar-guided genetic programming framework: (1) mining reliable association rules; (2) building an accurate classifier by ranking and combining the previously mined rules. The proposal has been implemented on different architectures (multi-thread, Apache Spark, and Apache Flink) to take advantage of distributed computing. The experimental results were obtained on 40 well-known datasets and analyzed through non-parametric tests. Results were compared with multiple approaches in the field along three dimensions: quality of the predictions, level of interpretability, and efficiency. The proposed method obtained accurate and interpretable classifiers efficiently, even on high-dimensional data, outperforming the state-of-the-art algorithms on all three.
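
To make the second phase more concrete, the following minimal Python sketch shows one conventional way to rank class association rules and combine them into a classifier. It is only an illustration under stated assumptions: ranking by confidence and then support with first-match prediction is a common scheme in associative classification, not the grammar-guided genetic programming procedure of the paper, and the names `Rule`, `support_confidence`, `build_classifier`, `predict`, and the `min_conf` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Dict[str, str]  # one example: attribute name -> discrete value


@dataclass(frozen=True)
class Rule:
    """Class association rule: IF all (attribute, value) pairs hold THEN class."""
    antecedent: Tuple[Tuple[str, str], ...]
    consequent: str

    def covers(self, record: Record) -> bool:
        return all(record.get(attr) == val for attr, val in self.antecedent)


def support_confidence(rule: Rule, data: List[Record],
                       labels: List[str]) -> Tuple[float, float]:
    """Return (support, confidence) of a rule on a labelled dataset."""
    covered = [y for x, y in zip(data, labels) if rule.covers(x)]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for y in covered if y == rule.consequent)
    return hits / len(data), hits / len(covered)


def build_classifier(rules: List[Rule], data: List[Record],
                     labels: List[str], min_conf: float = 0.5) -> List[Rule]:
    """Keep rules with confidence >= min_conf (hypothetical threshold) and
    rank them by confidence, then support, then shorter antecedent."""
    scored = []
    for rule in rules:
        sup, conf = support_confidence(rule, data, labels)
        if conf >= min_conf:
            scored.append((conf, sup, rule))
    scored.sort(key=lambda t: (-t[0], -t[1], len(t[2].antecedent)))
    return [rule for _, _, rule in scored]


def predict(ranked: List[Rule], record: Record, default: str) -> str:
    """Return the class of the first ranked rule covering the record."""
    for rule in ranked:
        if rule.covers(record):
            return rule.consequent
    return default
```

In this sketch the ranked rule list is the classifier itself, which is what keeps such models readable: every prediction can be traced back to the single rule that fired, or to the default class when no rule covers the record.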

Funding

This research was financially supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund, project TIN2017-83445-P.

Author information

Corresponding author

Correspondence to S. Ventura.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Padillo, F., Luna, J.M. & Ventura, S. A Grammar-Guided Genetic Programing Algorithm for Associative Classification in Big Data. Cogn Comput 11, 331–346 (2019). https://doi.org/10.1007/s12559-018-9617-2

  • DOI: https://doi.org/10.1007/s12559-018-9617-2
