Mining exceptional relationships with grammar-guided genetic programming

Luna, Jose Maria; Pechenizkiy, Mykola; Ventura, Sebastian

doi:10.1007/s10115-015-0859-y

Regular Paper
Published: 16 July 2015

Knowledge and Information Systems, volume 47, pages 571–594 (2016)

Jose Maria Luna¹,
Mykola Pechenizkiy² &
Sebastian Ventura¹,³

Abstract

Given a database of records, it might be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of data records. Finding such interesting relationships, which we call exceptional relationships, in an automated way would allow the discovery of unusual or exceptional hidden behaviour. In this paper, we formulate the problem of mining exceptional relationships as a special case of exceptional model mining and propose a grammar-guided genetic programming algorithm (MERG3P) that enables the discovery of any exceptional relationships. In particular, MERG3P can work directly not only with categorical but also with numerical data. In the experimental evaluation, we conduct a case study on mining exceptional relations between well-known and widely used quality measures of association rules, whose exceptional behaviour would be of interest to pattern mining experts. For this purpose, we constructed a data set comprising a wide range of values for each considered association rule quality measure, such that possible exceptional relations between measures could be discovered. Thus, besides the actual validation of MERG3P, we found that the Support and Leverage measures are in fact negatively correlated under certain conditions, while in general experts in the field expect these measures to be positively correlated.
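The idea of a subset whose behaviour departs from the rest of the data can be illustrated with a minimal Python sketch. It is not MERG3P and involves no grammar or genetic programming: it only scores one hand-written candidate condition by comparing the Pearson correlation of two numerical attributes inside the subset with the correlation in its complement. The function names (`pearson`, `correlation_gap`), the attribute names (`group`, `x`, `y`), the toy records, and the choice of the absolute correlation difference as a quality measure are all assumptions made for illustration and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_gap(records, condition, x, y):
    """Compare the x-y correlation inside a subset with that of its complement.

    `condition` is a predicate over a record (hypothetical stand-in for a
    pattern describing the subset). Returns (r_inside, r_outside, gap).
    """
    inside = [r for r in records if condition(r)]
    outside = [r for r in records if not condition(r)]
    r_inside = pearson([r[x] for r in inside], [r[y] for r in inside])
    r_outside = pearson([r[x] for r in outside], [r[y] for r in outside])
    return r_inside, r_outside, abs(r_inside - r_outside)


# Toy records (made up): y rises with x in group "a" and falls in group "b".
records = (
    [{"group": "a", "x": i, "y": 2 * i} for i in range(1, 6)]
    + [{"group": "b", "x": i, "y": 10 - i} for i in range(1, 6)]
)

r_in, r_out, gap = correlation_gap(
    records, lambda r: r["group"] == "b", "x", "y"
)
print(f"inside: {r_in:.2f}, outside: {r_out:.2f}, gap: {gap:.2f}")
```

In this toy setting the subset selected by the condition shows a negative correlation while its complement shows a positive one, which is the kind of contrast the abstract reports for Support and Leverage. A search procedure would have to generate and rank many such conditions rather than test a single one.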

Notes

The data set and the data generator can be reached at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.
A sensitivity analysis was carried out. The results and the statistical analysis are available at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.
JCLEC is available for download (http://jclec.sourceforge.net).
All the data sets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/).

Acknowledgments

This research was supported by the Spanish Ministry of Economy and Competitiveness, project TIN-2014-55252-P, and by FEDER funds. It was partly supported by the STW CAPA project. Finally, it was also supported by the Spanish Ministry of Education under FPU Grant AP2010-0041.

Author information

Authors and Affiliations

Department of Computer Science and Numerical Analysis, University of Cordoba, Rabanales Campus, 14071, Cordoba, Spain
Jose Maria Luna & Sebastian Ventura
Department of Computer Science, Eindhoven University of Technology, W&I, TU/e, P.O. Box 513, 5600 MB, Eindhoven, The Netherlands
Mykola Pechenizkiy
Department of Computer Science, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia
Sebastian Ventura

Corresponding author

Correspondence to Sebastian Ventura.

About this article

Cite this article

Luna, J.M., Pechenizkiy, M. & Ventura, S. Mining exceptional relationships with grammar-guided genetic programming. Knowl Inf Syst 47, 571–594 (2016). https://doi.org/10.1007/s10115-015-0859-y

Received: 08 August 2014
Revised: 02 February 2015
Accepted: 06 July 2015
Published: 16 July 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10115-015-0859-y
