Research article · DOI: 10.1145/2576768.2598327

A grammatical evolution based hyper-heuristic for the automatic design of split criteria

Published: 12 July 2014

ABSTRACT

Top-down induction of decision trees (TDIDT) is a powerful method for data classification. A central issue in TDIDT is deciding which attribute should be selected to divide a node into subsets as the tree is grown. To perform this task, decision-tree algorithms rely on a split criterion, usually an information-theoretic measure. As with most things in machine learning, there appears to be no free lunch regarding decision-tree split criteria: each application may benefit from a distinct criterion, and the problem we pose here is how to identify a suitable split criterion for each application that may emerge. In this paper we propose a grammatical evolution algorithm that automatically generates split criteria from a context-free grammar. We name this new approach ESC-GE (Evolutionary Split Criteria with Grammatical Evolution). It is empirically evaluated on public gene expression datasets, and its performance is compared with state-of-the-art split criteria, namely information gain and gain ratio. Results show that ESC-GE outperforms the baseline criteria in the gene expression domain, indicating its effectiveness for automatically designing tailor-made split criteria.
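To make the grammar-driven design of split criteria concrete, the sketch below shows, in Python, how a standard grammatical-evolution mapping turns an integer chromosome into a candidate split-criterion expression via a context-free grammar. The grammar, its terminal symbols, and the codon values are illustrative assumptions chosen for this example; they are not the actual grammar or parameters of ESC-GE.

```python
from typing import Dict, List

# A minimal sketch of the grammatical-evolution decoding step behind the idea of
# evolving split criteria from a context-free grammar. The grammar below is an
# illustrative assumption, NOT the grammar used by ESC-GE: its terminals stand for
# quantities a split criterion could combine (parent-node entropy, the weighted
# entropy of the child nodes, and the split information used by gain ratio).
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<term>"]],
    "<op>":   [["+"], ["-"], ["*"], ["/"]],
    "<term>": [["entropy(parent)"], ["weighted_entropy(children)"],
               ["split_info"], ["1"]],
}

def decode(codons: List[int], start: str = "<expr>", max_wraps: int = 2) -> str:
    """Map an integer codon sequence to an expression string using the standard
    grammatical-evolution rule: chosen production = codon % number_of_choices."""
    out: List[str] = []
    stack: List[str] = [start]        # leftmost derivation
    i, wraps = 0, 0
    while stack:
        sym = stack.pop(0)
        if sym not in GRAMMAR:        # terminal symbol: emit it
            out.append(sym)
            continue
        if i >= len(codons):          # wrap around the chromosome
            wraps += 1
            if wraps > max_wraps:
                break                 # incomplete derivation: invalid individual
            i = 0
        choices = GRAMMAR[sym]
        rule = choices[codons[i] % len(choices)]
        i += 1
        stack = list(rule) + stack
    return " ".join(out)

# A chromosome of (hypothetical) codons decodes to one candidate split criterion.
print(decode([0, 1, 0, 3, 1, 2]))    # -> entropy(parent) / split_info
```

With these example codons the chromosome decodes to entropy(parent) / split_info, an expression with the same shape as the gain ratio. In a complete hyper-heuristic, each decoded expression would be plugged into a TDIDT induction run as the split criterion, and the resulting classification performance would drive the evolutionary search.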


Published in

GECCO '14: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation
July 2014, 1478 pages
ISBN: 9781450326629
DOI: 10.1145/2576768
Copyright © 2014 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

GECCO '14 Paper Acceptance Rate: 180 of 544 submissions, 33%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
