On the adaptability of G3PARM to the extraction of rare association rules

Luna, J. M.; Romero, J. R.; Ventura, S.

doi:10.1007/s10115-012-0591-9


Regular Paper
Published: 09 February 2013

Volume 38, pages 391–418 (2014)

Knowledge and Information Systems

J. M. Luna¹,
J. R. Romero¹ &
S. Ventura¹


Abstract

To date, association rule mining has mainly focused on the discovery of frequent patterns. Nevertheless, it is often interesting to focus on patterns that occur infrequently. Existing algorithms for mining this kind of infrequent pattern are mainly based on exhaustive search methods and can be applied only to categorical domains. In a previous work, the use of grammar-guided genetic programming for the discovery of frequent association rules was introduced, showing that this proposal was competitive in terms of scalability, expressiveness, flexibility and the ability to restrict the search space. The goal of this work is to demonstrate that this proposal is also appropriate for the discovery of rare association rules. This approach allows one to obtain solutions within specified time limits and, unlike current algorithms, does not require large amounts of memory. It also provides mechanisms to discard noise from the rare association rule set by applying four specific fitness functions, which are compared and studied in depth. Finally, this approach is compared with other existing algorithms for mining rare association rules, and an analysis of the mined rules is performed. As a result, this approach mines rare rules with low and homogeneous execution times. The experimental study shows that this proposal obtains a small and accurate set of rules whose size is close to that specified by the data miner.
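The abstract refers to rare association rules without restating the measures behind them. As a minimal sketch, assuming the usual definitions of support and confidence from the association rule literature, a rule A → C can be treated as rare when it does occur in the data but its support stays below a maximum threshold while its confidence remains high. The function names, the thresholds `max_support` and `min_confidence`, and the toy records are hypothetical illustrations; this is not G3PARM's grammar-guided search, nor any of the four fitness functions studied in the paper.

```python
from typing import FrozenSet, List

# Hypothetical type alias: a transaction is the set of items present in one record.
Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(A | C) / support(A); defined here as 0.0 when A never occurs."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0.0:
        return 0.0
    return support(antecedent | consequent, transactions) / antecedent_support


def is_rare_rule(antecedent: FrozenSet[str], consequent: FrozenSet[str],
                 transactions: List[Transaction],
                 max_support: float = 0.2,
                 min_confidence: float = 0.9) -> bool:
    """Hypothetical filter: the rule occurs, but in fewer than `max_support`
    of the transactions, and it is reliable (confidence >= `min_confidence`)."""
    rule_support = support(antecedent | consequent, transactions)
    if not 0.0 < rule_support < max_support:
        return False
    return confidence(antecedent, consequent, transactions) >= min_confidence


# Made-up records loosely inspired by Zoo-style attributes (not real data).
TOY_TRANSACTIONS: List[Transaction] = [
    frozenset({"hair", "milk", "legs=4"}),
    frozenset({"hair", "milk", "legs=4"}),
    frozenset({"hair", "milk", "legs=4"}),
    frozenset({"hair", "milk", "legs=2"}),
    frozenset({"feathers", "eggs", "legs=2"}),
    frozenset({"feathers", "eggs", "legs=2", "airborne"}),
    frozenset({"feathers", "eggs", "legs=2", "airborne"}),
    frozenset({"fins", "eggs", "aquatic"}),
    frozenset({"fins", "eggs", "aquatic"}),
    frozenset({"hair", "eggs", "aquatic"}),
]

if __name__ == "__main__":
    # {hair, eggs} -> {aquatic}: support 0.1, confidence 1.0, so it counts as rare.
    print(is_rare_rule(frozenset({"hair", "eggs"}), frozenset({"aquatic"}),
                       TOY_TRANSACTIONS))  # True
    # {hair} -> {milk}: support 0.4 is not below max_support, so it is not rare.
    print(is_rare_rule(frozenset({"hair"}), frozenset({"milk"}),
                       TOY_TRANSACTIONS))  # False
```

An exhaustive miner would apply a test like this to every candidate rule it enumerates, which is where the memory costs mentioned in the abstract tend to arise; an evolutionary approach would only evaluate the rules present in its population.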


Notes

This dataset comprises 392 instances and 8 attributes, and is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Auto+MPG).
The dataset used in this example is a real dataset (http://archive.ics.uci.edu/ml/datasets/Zoo) that will be used in the experimental study.
Ankara weather, mushroom, soybean and vote datasets are available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets).
The Zoo dataset comprises 102 instances and 17 categorical attributes, and it is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Zoo).
The Automobile Performance dataset comprises 392 instances and 8 numerical attributes, and it is publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/Automobile).
All these datasets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets).
JCLEC is available for download (http://jclec.sourceforge.net).

References

Adda M, Wu L, Feng Y (2007) Rare itemset mining. In: Proceedings of the 6th international conference on machine learning and applications, ICMLA ’07, pp 73–80, Cincinnati, Ohio
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. American Association for Artificial Intelligence, Menlo Park, CA, pp 307–328. http://dl.acm.org/citation.cfm?id=257938.257975
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, VLDB ’94, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc., pp 487–499
Berzal F, Blanco I, Sánchez D, Vila MA (2002) Measuring the accuracy and interest of association rules: a new framework. Intell Data Anal 6(3):221–235
Borgelt C (2003) Efficient implementations of apriori and eclat. In: Proceedings of the 1st workshop on frequent itemset mining implementations, FIMI ’03, Melbourne, Florida, USA, pp 1–9
Chen Y, Peng W, Lee S (2011) CEMiner: an efficient algorithm for mining closed patterns from time interval-based data. In: Proceedings of the 11th IEEE international conference on data mining, ICDM ’11, Vancouver, BC, Canada, pp 121–130
Chi Y, Wang H, Yu PS, Muntz RR (2006) Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowl Inf Syst 10(3):265–294
Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman JD, Yang C (2001) Finding interesting associations without support pruning. IEEE Trans Knowl Data Eng 13(1):64–78
De Raedt L, Guns T, Nijssen S (2008) Constraint programming for itemset mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, ACM SIGKDD ’08, Las Vegas, USA, pp 204–212
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064
García S, Molina D, Lozano M, Herrera F (2009) A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 special session on real parameter optimization. J Heuristics 15(6):617–644
García-Piquer A, Fornells A, Orriols-Puig A, Corral G, Golobardes E (2011) Data classification through an evolutionary approach based on multiple criteria. Knowl Inf Syst. doi:10.1007/s10115-011-0462-9
Gruau F (1996) On using syntactic constraints with genetic programming. Adv Genet Progr 2:377–394
Ha H, Hwang D, Ryu B, Yun KH (2003) Mining association rules on significant rare data using relative support. J Syst Softw 67(3):181–191
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8:53–87
McKay RI, Hoai NX, Whigham PA, Shan Y, O’Neill M (2010) Grammar-based genetic programming: a survey. Genet Progr Evol Mach 11(3–4):365–396
Koh YS, Rountree N (2005) Finding sporadic rules using apriori-inverse. Lect Notes Comput Sci 3518:97–106
Koh YS, Rountree N (2010) Rare association rule mining and knowledge discovery: technologies for infrequent and critical event detection. Information Science Reference, Hershey, PA
Koufakou A, Secretan J, Georgiopoulos M (2011) Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data. Knowl Inf Syst 29:697–725
Li T, Li X (2010) Novel alarm correlation analysis system based on association rules mining in telecommunication networks. Inf Sci 180(16):2960–2978
Luna JM, Ramírez A, Romero JR, Ventura S (2010) An intruder detection approach based on infrequent rating pattern mining. In: Proceedings of the 10th international conference on intelligent systems design and applications, ISDA ’10, Cairo, Egypt, pp 682–688
Luna JM, Romero JR, Ventura S (2012) Design and behavior study of a grammar-guided genetic programming algorithm for mining association rules. Knowl Inf Syst 32(1):53–76
Mata J, Álvarez JL, Riquelme JC (2002) Discovering numeric association rules via evolutionary algorithm. In: Proceedings of the 6th Pacific-Asia conference on knowledge discovery and data mining, PAKDD ’02, pp 40–51
Ordoñez C, Ezquerra N, Santana C (2006) Constraining and summarizing association rules in medical data. Knowl Inf Syst 9(3):259–283
Piatetsky-Shapiro G (1991) Discovery, analysis and presentation of strong rules. In: Piatetsky-Shapiro G, Frawley W (eds) Knowledge discovery in databases. AAAI Press, Menlo Park, CA, pp 229–248
Rahman A, Ezeife CI, Aggarwal AK (2008) Wifi miner: an online apriori-infrequent based wireless intrusion system. In: Proceedings of the 2nd international workshop in knowledge discovery from sensor data, Sensor-KDD ’08, Las Vegas, USA, pp 76–93
Rastogi R, Shim K (2002) Mining optimized association rules with categorical and numeric attributes. IEEE Trans Knowl Data Eng 14(1):29–50
Romero C, Luna JM, Romero JR, Ventura S (2011) RM-Tool: a framework for discovering and evaluating association rules. Adv Eng Softw 42(8):566–576
Salam A, Khayal M (2012) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst 30:57–86
Sánchez D, Serrano JM, Cerda L, Vila MA (2008) Association rules applied to credit card fraud detection. Expert Syst Appl 36:3630–3640
Schuster A, Wolff R, Trock D (2004) A high-performance distributed algorithm for mining association rules. Knowl Inf Syst 7(4):458–475
Szathmary L, Napoli A, Valtchev P (2007) Towards rare itemset mining. In: Proceedings of the 19th IEEE international conference on tools with artificial intelligence, ICTAI ’07, Patras, Greece, pp 305–312
Szathmary L, Valtchev P, Napoli A (2010) Generating rare association rules using the minimal rare itemsets family. Int J Softw Inf 4(3):219–238
Tan P, Kumar V (2000) Interestingness measures for association patterns: a perspective. In: Proceedings of the workshop on postprocessing in machine learning and data mining, KDD ’00, New York, USA
Tung AKH, Lu H, Han J, Feng L (2003) Efficient mining of intertransaction association rules. IEEE Trans Knowl Data Eng 15(1):43–56. http://doi.ieeecomputersociety.org/10.1109/TKDE.2003.1161581
Ventura S, Romero C, Zafra A, Delgado JA, Hervás C (2008) JCLEC: a Java framework for evolutionary computation. Soft Comput 12(4):381–392
Yun U, Ryu KH (2011) Approximate weighted frequent pattern mining with/without noisy environments. Knowl Based Syst 24(1):73–82
Zhang C, Zhang S (2002) Association rule mining: models and algorithms. Springer, Berlin


Acknowledgments

The authors would like to acknowledge the very helpful comments and suggestions of Dr. Mykola Pechenizkiy (Technical University of Eindhoven) on previous versions of this paper. This work was supported by the Regional Government of Andalusia and the Spanish Ministry of Science and Technology projects, P08-TIC-3720, TIN2008-06681-C06-03 and TIN-2011-22408, respectively, and FEDER funds. This research was also supported by the Spanish Ministry of Education under FPU grant AP2010-0041.

Author information

Authors and Affiliations

Department of Computer Science and Numerical Analysis, University of Cordoba, Rabanales Campus, 14071, Cordoba, Spain
J. M. Luna, J. R. Romero & S. Ventura


Corresponding author

Correspondence to S. Ventura.


About this article

Cite this article

Luna, J.M., Romero, J.R. & Ventura, S. On the adaptability of G3PARM to the extraction of rare association rules. Knowl Inf Syst 38, 391–418 (2014). https://doi.org/10.1007/s10115-012-0591-9


Received: 16 January 2012
Revised: 21 August 2012
Accepted: 03 December 2012
Published: 09 February 2013
Issue Date: February 2014
DOI: https://doi.org/10.1007/s10115-012-0591-9
