On the adaptability of G3PARM to the extraction of rare association rules

  • Regular Paper
Knowledge and Information Systems

Abstract

To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, it is often interesting to focus on patterns that occur infrequently. Existing algorithms for mining this kind of infrequent pattern are mainly based on exhaustive search methods and can be applied only over categorical domains. In a previous work, the use of grammar-guided genetic programming for the discovery of frequent association rules was introduced, showing that this proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that this proposal is also appropriate for the discovery of rare association rules. This approach allows one to obtain solutions within specified time limits and, unlike current algorithms, does not require large amounts of memory. It also provides mechanisms to discard noise from the rare association rule set by applying four different and specific fitness functions, which are compared and studied in depth. Finally, this approach is compared with other existing algorithms for mining rare association rules, and an analysis of the mined rules is performed. The results show that this approach mines rare rules with low and consistent execution times. The experimental study shows that this proposal obtains a small and accurate set of rules, close in size to the number specified by the data miner.
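To make the notion of a rare association rule concrete, the following minimal Python sketch computes the standard support and confidence of a rule over a toy transaction set and flags rules that are infrequent yet reliable. It is not the G3PARM algorithm and does not reproduce any of the four fitness functions studied in the paper; the function names, the toy data and the thresholds `max_support` and `min_confidence` are assumptions chosen purely for illustration.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]


def count(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(antecedent: FrozenSet[str],
                 consequent: FrozenSet[str],
                 transactions: List[Transaction]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    if not transactions:
        return 0.0, 0.0
    n_rule = count(antecedent | consequent, transactions)
    n_ant = count(antecedent, transactions)
    support = n_rule / len(transactions)
    confidence = n_rule / n_ant if n_ant > 0 else 0.0
    return support, confidence


def is_rare_rule(antecedent: FrozenSet[str],
                 consequent: FrozenSet[str],
                 transactions: List[Transaction],
                 max_support: float = 0.2,      # illustrative threshold (assumption)
                 min_confidence: float = 0.8    # illustrative threshold (assumption)
                 ) -> bool:
    """A rule is treated as rare here if it occurs, but seldom, and is reliable."""
    support, confidence = rule_metrics(antecedent, consequent, transactions)
    return 0.0 < support <= max_support and confidence >= min_confidence


# Toy data (hypothetical): a frequent pattern and an infrequent one.
TRANSACTIONS: List[Transaction] = (
    [frozenset({"bread", "milk"})] * 6
    + [frozenset({"bread"})] * 2
    + [frozenset({"caviar", "vodka"})] * 2
)

if __name__ == "__main__":
    rules = [
        (frozenset({"bread"}), frozenset({"milk"})),
        (frozenset({"caviar"}), frozenset({"vodka"})),
    ]
    for ant, con in rules:
        sup, conf = rule_metrics(ant, con, TRANSACTIONS)
        rare = is_rare_rule(ant, con, TRANSACTIONS)
        print(f"{sorted(ant)} -> {sorted(con)}: "
              f"support={sup:.2f}, confidence={conf:.2f}, rare={rare}")
```

In this toy set, the rule caviar -> vodka appears in only 2 of 10 transactions but always holds when its antecedent is present, so it satisfies the illustrative rarity test, whereas bread -> milk is too frequent to qualify. A frequent-pattern miner with a high minimum support would discard the former, which is the kind of rule the paper targets.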


Notes

  1. This dataset comprises 392 instances and 8 attributes, and is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Auto+MPG).

  2. The dataset used in this example is a real dataset (http://archive.ics.uci.edu/ml/datasets/Zoo) that will be used in the experimental study.

  3. Ankara weather, mushroom, soybean and vote datasets are available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets).

  4. The Zoo dataset comprises 102 instances and 17 categorical attributes, and it is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Zoo).

  5. The Automobile Performance dataset comprises 392 instances and 8 numerical attributes, and it is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Automobile).

  6. All these datasets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets).

  7. JCLEC is available for download (http://jclec.sourceforge.net).

Acknowledgments

The authors would like to acknowledge the very helpful comments and suggestions of Dr. Mykola Pechenizkiy (Technical University of Eindhoven) on previous versions of this paper. This work was supported by the Regional Government of Andalusia and the Spanish Ministry of Science and Technology projects, P08-TIC-3720, TIN2008-06681-C06-03 and TIN-2011-22408, respectively, and FEDER funds. This research was also supported by the Spanish Ministry of Education under FPU grant AP2010-0041.

Author information

Corresponding author

Correspondence to S. Ventura.

About this article

Cite this article

Luna, J.M., Romero, J.R. & Ventura, S. On the adaptability of G3PARM to the extraction of rare association rules. Knowl Inf Syst 38, 391–418 (2014). https://doi.org/10.1007/s10115-012-0591-9

  • DOI: https://doi.org/10.1007/s10115-012-0591-9
