Mining exceptional relationships with grammar-guided genetic programming

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Given a database of records, it may be possible to identify small subsets of data whose distribution is exceptionally different from the distribution in the complete set of records. Finding such relationships, which we call exceptional relationships, in an automated way would allow unusual or exceptional hidden behaviour to be discovered. In this paper, we formulate the problem of mining exceptional relationships as a special case of exceptional model mining and propose a grammar-guided genetic programming algorithm (MERG3P) that enables the discovery of any exceptional relationship. In particular, MERG3P can work directly not only with categorical data but also with numerical data. In the experimental evaluation, we conduct a case study on mining exceptional relations between well-known and widely used quality measures of association rules, whose exceptional behaviour would be of interest to pattern mining experts. For this purpose, we constructed a data set comprising a wide range of values for each considered association rule quality measure, so that possible exceptional relations between measures could be discovered. Thus, besides validating MERG3P, we found that the Support and Leverage measures are in fact negatively correlated under certain conditions, whereas experts in the field generally expect these measures to be positively correlated.
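
The core idea in the abstract, a subset of records in which two attributes relate differently than they do in the rest of the data, can be illustrated with a minimal sketch. The code below is not MERG3P and does not reproduce the paper's quality function; it only compares the Pearson correlation of two target attributes inside a candidate subset with the correlation in its complement. The names `pearson`, `correlation_gap`, the field names `group`, `x`, `y`, and the toy records are all assumptions introduced for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant attribute as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlation_gap(records, condition, target_x, target_y):
    """Correlation of two targets inside a subset versus its complement.

    `condition` is a hypothetical predicate describing the subset;
    the returned gap is one simple way to score how exceptional it is.
    """
    inside = [r for r in records if condition(r)]
    outside = [r for r in records if not condition(r)]
    r_in = pearson([r[target_x] for r in inside],
                   [r[target_y] for r in inside])
    r_out = pearson([r[target_x] for r in outside],
                    [r[target_y] for r in outside])
    return r_in, r_out, abs(r_in - r_out)


# Toy records, made up for illustration only (not data from the paper).
records = (
    [{"group": "a", "x": v, "y": 5 - v} for v in (1, 2, 3, 4)]   # y falls as x rises
    + [{"group": "b", "x": v, "y": v} for v in (1, 2, 3, 4)]     # y rises with x
)

r_in, r_out, gap = correlation_gap(
    records, lambda r: r["group"] == "a", "x", "y"
)
print(r_in, r_out, gap)
```

In the paper's case study the two targets would be association rule quality measures such as Support and Leverage, and the subset description would be evolved by the algorithm rather than written by hand as it is here.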

Notes

  1. The data set and the data generator can be reached at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.

  2. A sensitivity analysis was carried out. The results and the statistical analysis are available at http://www.uco.es/grupos/kdis/kdiswiki/index.php/Exceptional_ARM.

  3. JCLEC is available for download (http://jclec.sourceforge.net).

  4. All the data sets are publicly available for download from the UCI machine learning repository (http://archive.ics.uci.edu/ml/datasets/).

Acknowledgments

This research was supported by the Spanish Ministry of Economy and Competitiveness, project TIN-2014-55252-P, and by FEDER funds. It was partly supported by the STW CAPA project and by the Spanish Ministry of Education under FPU Grant AP2010-0041.

Author information

Corresponding author

Correspondence to Sebastian Ventura.

About this article

Cite this article

Luna, J.M., Pechenizkiy, M. & Ventura, S. Mining exceptional relationships with grammar-guided genetic programming. Knowl Inf Syst 47, 571–594 (2016). https://doi.org/10.1007/s10115-015-0859-y

  • DOI: https://doi.org/10.1007/s10115-015-0859-y
